. . . just say no to EBS classic. Use SSD instances, EBS with PIOPS, or ephemeral storage instead:
SSD (hi1.4xlarge) • 8 cores • 60 GB RAM • 2 × 1 TB SSD drives • 120k random reads/sec • 85k random writes/sec • expensive! ~$2300/mo on demand
PIOPS • Up to 2000 IOPS/volume • Up to 1024 GB/volume • Variability of < 0.1% • Costs double regular EBS • Supports snapshots • RAID together multiple volumes for more storage/performance
estimating PIOPS • estimate how many IOPS to provision with the “tps” column of sar -d 1 • multiply that by 2-3x depending on your spikiness
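A quick way to get that number, sketched under the assumption that the data volume shows up as xvdf (substitute your own device name):

```bash
# Sample per-device activity once a second for a minute; the "tps"
# column approximates the IOPS the device is actually doing
# (-p prints friendly device names instead of dev-major-minor).
sar -d -p 1 60 | grep xvdf

# Then provision 2-3x the steady-state tps you see, e.g. a volume
# averaging 600 tps -> roughly 1200-1800 PIOPS.
```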
Ephemeral Storage • Cheap • Fast • No network latency • No snapshot capability • Data is lost forever if you stop or resize the instance
filesystem • Use ext4 • Raise file descriptor limits • Raise connection limits • Mount with noatime and nodiratime • Consider putting the journal on a separate volume
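A minimal sketch of those settings; the device, mountpoint, and limit values are illustrative:

```bash
# Mount the ext4 data volume without access-time updates on files
# or directories:
mount -o noatime,nodiratime /dev/xvdf /var/lib/mongodb

# Raise file descriptor limits before starting mongod; to make this
# permanent, add matching "nofile" lines to /etc/security/limits.conf:
ulimit -n 64000
```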
blockdev • Your default blockdev readahead is probably wrong • Too large? you will underuse memory • Too small? you will hit the disk too much • Experiment.
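For example, to inspect and shrink readahead on the data volume (device name and value are illustrative; measure before settling on one):

```bash
# Current readahead, in 512-byte sectors:
blockdev --getra /dev/xvdf

# Try something small (32 sectors = 16 KB); a large default like 256
# fills RAM with pages you never read:
blockdev --setra 32 /dev/xvdf
```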
infrastructure is code • Chef • Puppet • CloudFormation • Scripts (e.g. MongoLab’s mongoctl)
highlights of mongo chef cookbook • Configures EBS RAID for you • Supports PIOPS • Handles multiple clusters, sharding, arbiters • Built-in snapshot support • Provisions new nodes automagically from the latest completed RAID snapshot set for the cluster
provisioning from snapshot • Fast and easy • Takes < 5 minutes using knife-ec2 • Will not reset padding factors
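A hedged sketch of the invocation; the AMI, flavor, role, and node names here are all hypothetical:

```bash
# Launch a node whose run-list includes the mongo cookbook, which then
# restores its data volumes from the latest completed snapshot set:
knife ec2 server create \
  --image ami-12345678 \
  --flavor m2.4xlarge \
  --run-list 'role[mongo-replica]' \
  --node-name mongo-replica-03
```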
provisioning with initial sync • Compacts and repairs your collections and databases • Hard on your primary; does a full table scan of all data • On 2.2+ you can sync from a secondary by button-mashing rs.syncFrom() on startup • Or use iptables to block the secondary from viewing the primary (all versions) • Resets all padding factors to 1
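A sketch of both approaches; the hostname and IP are illustrative:

```bash
# 2.2+: keep pointing the syncing node at a secondary until it sticks
# (it may fall back to the primary early in the sync, hence the mashing):
mongo --eval 'rs.syncFrom("secondary2.example.com:27017")'

# Any version: block the primary so the node can only sync from a
# secondary; remove the rule once the initial sync finishes:
iptables -A OUTPUT -d 10.0.0.1 -p tcp --dport 27017 -j DROP
iptables -D OUTPUT -d 10.0.0.1 -p tcp --dport 27017 -j DROP
```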
fragmentation is terrible
fragmentation • Your RAM gets fragmented too! • Leads to underuse of memory • Deletes are not the only source of fragmentation • Repair, compact, or resync regularly • Or consider using powerof2 padding factor
3 ways to fix fragmentation:
• Re-sync a secondary from scratch: resets your padding factors; hard on your primary, so rs.syncFrom() a secondary instead
• Repair a secondary: resets your padding factors; may take longer than your oplog age
• Run continuous compaction on your snapshot node: won't reset padding factors, but it also won't reclaim disk space
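The compaction and padding commands look roughly like this (database and collection names are illustrative):

```bash
# Compact one collection on a secondary/snapshot node (blocks that node):
mongo mydb --eval 'db.runCommand({ compact: "mycollection" })'

# Or switch a collection to power-of-2 record allocation (2.2+) so
# future deletes and document moves fragment less:
mongo mydb --eval 'db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true })'
```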
Finding bad queries • db.currentOp() • mongodb.log • profiling collection
db.currentOp() • Check the queue size • Any indexes building? • Sort by num_seconds • Sort by num_yields, locktype • Consider adding comments to your queries • Run explain() on queries that are long-running
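A sketch of pulling the worst offenders out of currentOp (2.x field names; the 5-second threshold is illustrative):

```bash
# Print queries running longer than 5 seconds, longest first:
mongo --quiet --eval '
  db.currentOp().inprog
    .filter(function (op) { return op.op === "query" && op.secs_running > 5; })
    .sort(function (a, b) { return b.secs_running - a.secs_running; })
    .forEach(function (op) {
      print(op.secs_running + "s  " + op.ns + "  " + tojson(op.query));
    });'
```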
mongodb.log • Configure output with --slowms • Look for high execution time, nscanned, nreturned • See which queries are holding long locks • Match connection ids to IPs
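For example, a rough grep over a 2.x-style log (the path is illustrative):

```bash
# Slow operations are logged once they exceed --slowms; eyeball the
# trailing execution time and the nscanned/nreturned ratio:
grep -E 'nscanned:[0-9]+' /var/log/mongodb/mongodb.log | tail -20
```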
system.profile collection • Enable profiling with db.setProfilingLevel() • Does not persist through restarts • Like mongodb.log, but queryable • Writes to this collection incur some cost • Use db.system.profile.find() to get slow queries for a certain collection, time range, execution time, etc.
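A minimal sketch, assuming a database mydb and collection mycollection:

```bash
# Profile level 1 logs operations slower than 100 ms to system.profile:
mongo mydb --eval 'db.setProfilingLevel(1, 100)'

# Recent slow operations against one collection, newest first:
mongo mydb --eval '
  db.system.profile.find({ ns: "mydb.mycollection", millis: { $gt: 100 } })
    .sort({ ts: -1 }).limit(10).forEach(printjson);'
```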
. . . when queries pile up . . . • Know what your tipping point looks like • Don't elect a new primary or restart • Do kill queries before the tipping point • Write your kill script before you need it • Don't kill internal mongo operations, only queries.
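A sketch of such a kill script; the 15-second threshold is illustrative, and the filters are the important part:

```bash
# Kill long-running client queries only; skip getmores, commands, and
# anything in the local database (oplog tailers live there):
mongo --quiet --eval '
  db.currentOp().inprog.forEach(function (op) {
    if (op.op === "query" &&
        op.secs_running > 15 &&
        op.ns.indexOf("local.") !== 0) {
      print("killing op " + op.opid + " on " + op.ns);
      db.killOp(op.opid);
    }
  });'
```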
can’t elect a primary? • Never run with an even number of votes (max 7) • You need > 50% of votes to elect a primary • Set your priority levels explicitly if you need warmup • Consider delegating voting to arbiters • Set snapshot nodes to be nonvoting if possible. • Check your mongo log. Is something vetoing? Do they have an inconsistent view of the cluster state?
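A sketch of wiring that into the replica set config (member indexes are illustrative):

```bash
mongo --eval '
  var cfg = rs.conf();
  cfg.members[0].priority = 2;   // preferred primary after warmup
  cfg.members[2].votes = 0;      // snapshot node: no vote...
  cfg.members[2].priority = 0;   // ...and never electable
  rs.reconfig(cfg);'
```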
secondaries crashing? • Some rare mongo bugs will cause all secondaries to crash unrecoverably • Never kill oplog tailers or other internal database operations; this can also trash secondaries • Arbiters are more stable than secondaries; consider using them to form a quorum with your primary
replication stops? • Other rare bugs will stop replication or cause secondaries to exit on a corrupt op • The correct way to fix this is to re-snapshot off the primary and rebuild your secondaries • However, you can sometimes *dangerously* repair a secondary: 1. stop mongo 2. bring it back up in standalone mode 3. repair the offending collection 4. restart mongo again as part of the replica set
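A sketch of those four steps; the paths, port, and collection name are illustrative, and again, this is the dangerous path:

```bash
service mongodb stop                              # 1. stop mongo

# 2. bring it up standalone (no --replSet) on a side port:
mongod --dbpath /var/lib/mongodb --port 27018 \
       --fork --logpath /tmp/mongo-repair.log

# 3. repair the offending collection:
mongo --port 27018 mydb --eval '
  db.mycollection.validate(true);
  db.runCommand({ compact: "mycollection" });'

# 4. shut the standalone down and rejoin the replica set:
mongo --port 27018 admin --eval 'db.shutdownServer()'
service mongodb start
```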