Instance Sizes ● m1.xlarge is by far the most common size ● m1.large is OK for many use cases ● m2.4xlarge in some cases – keep the entire dataset in memory ● c1.xlarge / cc1.4xlarge – smallish but very hot set of data, regardless of how much data is on disk – extremely high request rate – encrypted node-node communications and high traffic ● Usually better off with many m1.xlarge instances because of the extra memory, but not always
Configuration ● Stripe All Ephemeral Drives ● data directory and commit log on the same volume ● Only applies to EC2 and SSDs, not physical HW ● Why? On a stripe/SSD there's no dedicated spindle to keep sequential for the commit log, so separating them buys nothing ● 6-8 GB heap on m1.xlarge ● 3-4 GB heap on m1.large ● Phi Convict Threshold? Maybe ...
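To make the striping step concrete, here's a minimal sketch of building a RAID0 stripe across the ephemeral drives and mounting it for both the data directory and commit log. The device names (/dev/xvdb through /dev/xvde, typical for an m1.xlarge) and the mount point are assumptions; check your AMI's block device mapping and run as root.

```python
#!/usr/bin/env python
# Sketch: stripe all ephemeral drives into a single RAID0 volume and
# put the data directory and commit log on it. Device names are an
# assumption; verify against your instance's block device mapping.
import subprocess

DEVICES = ["/dev/xvdb", "/dev/xvdc", "/dev/xvdd", "/dev/xvde"]  # assumed
MOUNT = "/var/lib/cassandra"  # assumed data root

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["mdadm", "--create", "/dev/md0", "--level=0",
     "--raid-devices", str(len(DEVICES))] + DEVICES)
run(["mkfs.ext4", "/dev/md0"])
run(["mkdir", "-p", MOUNT])
run(["mount", "/dev/md0", MOUNT])
# data/ and commitlog/ share the stripe -- on ephemeral/SSD storage
# there is no dedicated spindle to gain by separating them
run(["mkdir", "-p", MOUNT + "/data", MOUNT + "/commitlog"])
```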
EBS versus Ephemeral ● Ephemeral drives are: ● Generally faster for C* ● More stable (none of EBS's pauses/freezes or outages) ● Cheaper ● Easier to initially configure ● Striped EBS? ● yeah, about that … ● TL;DR: don't use EBS for C* on EC2
Multi-Zone ● Alternate zones in your token topology ● No really, this is important, alternate zones – We should probably fix this ... ● “complicated, but possible” to add new zones after initial deployment ● Never move a *token* to a different region or zone ● If you think that is what you want to do, really you want to bootstrap a new node at token-1 in the new region/zone and then decom the old one
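A sketch of what “alternate zones in your token topology” means in practice: compute evenly spaced tokens and assign zones round-robin down the ring, so consecutive replicas land in different zones. The zone names and node count here are illustrative assumptions.

```python
# Sketch: evenly spaced RandomPartitioner tokens with zones alternated
# down the ring so each replica set spans availability zones.
RING_SIZE = 2 ** 127  # RandomPartitioner token space
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed zones
NUM_NODES = 6  # assumed; a multiple of the zone count keeps it balanced

for i in range(NUM_NODES):
    token = i * RING_SIZE // NUM_NODES
    zone = ZONES[i % len(ZONES)]  # alternate: a, b, c, a, b, c ...
    print("node %d  zone=%s  initial_token=%d" % (i, zone, token))
```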
Multi-Region C* on EC2 ● Connectivity is the complicated part ● Ec2MultiRegionSnitch is not the entire answer – https://issues.apache.org/jira/browse/CASSANDRA-2452 ● Don't try to make a “failover” DC, just go with active-active ● If you insist, then do the failover in your application and configure C* the same as you would for active-active ● Generally requires a lot more storage ● Doesn't matter though, because you're using ephemeral drives (right?) and don't want a TB of data on each node anyway
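One topology detail worth making concrete for active-active: no two nodes may ever share a token, so a common convention is to offset the second region's tokens slightly from the first. A sketch, with the node counts and the +1 offset as assumptions.

```python
# Sketch: token layout for an active-active second region. Two nodes
# may never own the same token, so offset the second DC's tokens by a
# small constant relative to the first.
RING_SIZE = 2 ** 127
NODES_PER_DC = 4  # assumed
OFFSET = 1        # assumed convention: DC2 sits just above DC1

for i in range(NODES_PER_DC):
    t = i * RING_SIZE // NODES_PER_DC
    print("us-east  node %d  token=%d" % (i, t))
    print("us-west  node %d  token=%d" % (i, (t + OFFSET) % RING_SIZE))
```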
Multi-Region Connectivity Options ● VPN ● Encrypted node-node communication ● CPU utilization is often a downside ● VPNCubed / VPCPlus ● I've never deployed it, but heard good things about it ● Amazon VPC ● anyone know if a single VPC can span regions yet? ● SSH Tunnels ● EC2 security groups ● IPTables ● Encrypted node-node + public IP binding + AWS security groups + IPTables (EIPs may simplify this, never actually tried it)
Recovery From Failures ● Don't “fix” EC2 nodes, replace them ● bootstrap at token-1, remove old token – bootstrap can be slow, but will get better ● Other than that, it's the same in EC2 as anywhere else ...
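The token arithmetic for a replacement is simple; a sketch, with the dead node's token as an example value only.

```python
# Sketch: pick the replacement token for a dead node. Bootstrapping at
# (old token - 1) takes over effectively the same range; afterwards,
# remove the dead node's token.
RING_SIZE = 2 ** 127
dead_token = 2 ** 126  # example value only

replacement = (dead_token - 1) % RING_SIZE
print("initial_token for the new node: %d" % replacement)
# then: nodetool removetoken <dead_token>  (once the new node is up)
```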
Node Maintenance ● “Maintenance” On EC2? ● Usually not required (just replace the node) ● If it is, just stop C*; consistency level + hinted handoff, repair, and read repair will fix it ● Same as physical HW ● https://issues.apache.org/jira/browse/CASSANDRA-2034 ● Stop Trying To Decom Nodes Just To Replace a Disk!!!
Backups ● C* snapshots and push to S3 ● Directory Watcher that pushes new files to S3 ● SimpleGeo: https://github.com/simplegeo/tablesnap ● Netflix: http://slidesha.re/NFOnCassBkup ● Keep a log of all incoming writes ● Not specific to S3 ● Can be coupled with snapshots / S3 ● Useful for other reasons as well ● Compression in transit to S3 (or wherever) can be done on a separate EC2 instance to avoid burning CPU ● Usually not worth the extra complexity / cost
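For flavor, a stripped-down sketch of the directory-watcher approach, in the spirit of tablesnap (which watches with inotify; this one just polls). The bucket name, data path, and polling interval are assumptions, and it assumes boto is installed with AWS credentials configured.

```python
# Sketch: poll the Cassandra data directory and push any file we
# haven't seen yet to S3. tablesnap does this properly with inotify;
# polling keeps the sketch dependency-free.
import os
import time
import boto  # assumes boto is installed and credentials are configured

BUCKET = "my-cassandra-backups"       # assumed bucket name
DATA_DIR = "/var/lib/cassandra/data"  # assumed data directory

bucket = boto.connect_s3().get_bucket(BUCKET)
seen = set()

while True:
    for root, _dirs, files in os.walk(DATA_DIR):
        for name in files:
            path = os.path.join(root, name)
            if path in seen:
                continue
            key = bucket.new_key(path.lstrip("/"))  # keyed by local path
            key.set_contents_from_filename(path)
            seen.add(path)
    time.sleep(30)  # assumed polling interval
```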
Changing Node Sizes ● Start a new instance ● rsync data from original node to new node ● Shutdown C* on original node ● rsync again to catch writes since the first pass ● Start C* on new node ● Shutdown original instance ● NB: Assumes same token, region, zone, etc
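A sketch of that procedure as a script run from the original node; the hostname, paths, and init command are assumptions.

```python
# Sketch: move data to a resized node with two rsync passes -- one
# while live for the bulk, one after stopping C* to catch the delta.
import subprocess

NEW_HOST = "new-node.example.com"  # assumed hostname
DATA = "/var/lib/cassandra/"       # assumed data root

rsync = ["rsync", "-a", "--delete", DATA, "%s:%s" % (NEW_HOST, DATA)]

subprocess.check_call(rsync)  # pass 1: bulk copy while C* is still live
subprocess.check_call(["service", "cassandra", "stop"])  # assumed init style
subprocess.check_call(rsync)  # pass 2: catch writes since pass 1
# now start C* on the new node, verify, then shut this instance down
```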
Elastic Load Balancers ● They're awesome, use them ● Could be more awesome (e.g. better integration with Route 53) ● What I really want is TCP anycast for ELB across regions (AWS could make it work) ● Balance across regions with GeoIP / GeoDNS ● Zerigo, TZOHA, Neustar, “homegrown”, etc ● Route 53? You wish (though Route 53 itself is run over anycast) – “in the future we plan for Route 53 to also give you greater control over … the route your users take to reach an endpoint” --Werner Vogels ● Put them in front of your app servers, not your C* instances ● Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky sessions required)
AMIs versus Scripted Setup ● DataStax publishes C* AMIs ● Chef Recipes as well ● Or roll your own … ● Whatever you do, just make sure it's automated and repeatable ● *personally* I prefer scripting the setup remotely, but this is … “less than ideal” ● PSSH is, in general, awesome
WTF?! ● Your zone X is not the same as my zone X ● Consistent within an EC2 account ● Problematic across accounts ● Does not apply to regions (i.e. your region X is my region X) ● EIPs resolve to private IPs from within AWS ● EBS volumes sometimes just “freeze” ● AWS: “yeah, that happens sometimes under load” ● steal% sometimes 20% or more (1%-3% is “normal”) ● This is AWS literally stealing your money ● Thankfully not all that common, but watch out for it
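Since steal% is worth watching for, here's a quick sketch that samples /proc/stat and reports steal time over a short window; the 5% alert threshold is an assumption (the slide's numbers: 1-3% is normal, 20%+ is trouble).

```python
# Sketch: measure steal% from /proc/stat over a short window.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # aggregate "cpu" line: user nice system idle iowait irq softirq steal ...
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_times()
time.sleep(5)
after = cpu_times()
delta = [b - a for a, b in zip(before, after)]
steal_pct = 100.0 * delta[7] / sum(delta)  # 8th field is steal
print("steal%%: %.1f" % steal_pct)
if steal_pct > 5:  # assumed threshold; 1-3% is normal, 20%+ is trouble
    print("sustained high steal -- consider replacing this instance")
```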
Missing AWS Features ● ELB over anycast ● Probably doable by AWS, but not others ... ● GeoDNS from Route53 ● No really, WTF doesn't Route53 do GeoDNS?!?! ● Multi-Region VPC ● Local SSDs
We're Hiring ! ● Developers ● QA ● Community Manager ● Sales / SE ● Interns – Dev – Support – QA ● Smart People Interested In Cassandra
Cassandra On EC2 Q? (yes, I'll post the slides on slideshare)