An Introduction to NoSQL Brad Anderson - DevNexus March 21, 2011
Agenda NoSQL is BULLSHIT You Don’t Need It You Can’t Query It
The Name Play on MySQL (Eric Evans, Rackspace) Not Only SQL (Emil Eifrem) Broad Umbrella Shitty Marketing Term and we’re stuck with it
Why do you need NoSQL? YOU DON’T!
Seriously, you don’t... Vastly different performance characteristics Immature APIs and tools / ecosystems Bugs, most are actively being developed Your situation doesn’t warrant it
Why do they exist? Every one of these new data storage systems came from a particular pain someone was having. Each system was created specifically to solve the pain point its authors were experiencing. This pain usually involves a metric shit-tonne of data, so distributed processing is required. Schema-free
Examples Google - index the Internet (MapReduce/Bigtable) Yahoo - keep up with Google (Hadoop) Amazon - shopping cart (Dynamo) Facebook - inbox search (Cassandra) Lotus - Notes legacy restrictions (CouchDB) Cloudant - physics research (BigCouch) Basho - CRM product (Riak) Neo - graph traversal (Neo4J)
Pain of Scaling Scale Reads with master-slave replication Scale Writes with master-master replication Partitioning Vertically (by functional groups) Partitioning Horizontally (by key, e.g. ‘date’) Caching works, kinda
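To make the "partition horizontally by key" bullet concrete, here is a minimal sketch in Python. It assumes hash-based routing on a date key; the helper names `shard_for` and `moved_keys` are hypothetical and do not come from any of the systems named in this deck.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index by hashing it (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_count, new_count):
    """Count keys whose shard changes when the number of shards changes."""
    return sum(
        1 for k in keys
        if shard_for(k, old_count) != shard_for(k, new_count)
    )


if __name__ == "__main__":
    # Assumed example keys: one per day of March 2011.
    keys = ["2011-03-%02d" % day for day in range(1, 32)]
    print("2011-03-21 lives on shard", shard_for("2011-03-21", 4))
    print(moved_keys(keys, 4, 5), "of", len(keys),
          "keys change shard when a 5th shard is added")
```

With plain modulo routing like this, adding a shard usually forces many keys to move, which is part of the pain and one motivation for the ring-based placement used by Dynamo-style systems.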
What to do? Distribute both data and processing (horizontal scaling) Organize data differently Use appropriate on-disk storage
Sorting Hat Says... Distribution Model Data Model Disk Data Structure
Distribution Model Embedded (no distribution) Replication / Sharding Chord - peer to peer Dynamo (consistent hashing, vnodes, vector clocks)
No Distribution BDB Neo4J
Replication / Sharding Distribution MongoDB CouchDB Redis
Dynamo Distribution BigCouch Riak Voldemort Cassandra (no vnodes, no vector clocks) Hibari (?)
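Vector clocks come up for Dynamo-style systems, so here is a rough sketch of the idea in Python. It assumes a clock is a plain dict of per-node counters; the function names are hypothetical and this is not the implementation used by BigCouch, Riak, or Voldemort.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (a conflict)."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


if __name__ == "__main__":
    v1 = increment({}, "node1")    # first write
    v2a = increment(v1, "node2")   # update accepted by node2
    v2b = increment(v1, "node3")   # competing update accepted by node3
    print(concurrent(v2a, v2b))    # the two updates conflict
    print(merge(v2a, v2b))         # clock for the resolved value
```

The point of the sketch: two replicas can accept writes independently, and comparing clocks tells you whether one version supersedes the other or whether they conflict and need resolving.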
Dynamo - how does it work? [Diagram: a Load Balancer in front of a ring of nodes, Node 1 through Node 26, each holding a run of consecutive partitions from A to Z] PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 R=2 hash(blah) = E
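The request flow on this slide can be sketched roughly as follows in Python. Everything here is an assumption made for illustration: the hash function, the rule that partition letter i is owned by Node i+1 with replicas on the next nodes around the ring, and the helper names. The real placement in BigCouch or any other system may differ, and this hash will not necessarily map "blah" to E as the slide does.

```python
import hashlib
import string

PARTITIONS = list(string.ascii_uppercase)  # partitions A..Z, as on the slide
N, W, R = 3, 2, 2                          # replica count and quorum sizes


def partition_for(doc_id):
    """Hash a document id onto one of the 26 partitions (assumed hash)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return PARTITIONS[int.from_bytes(digest[:8], "big") % len(PARTITIONS)]


def preference_list(partition, n=N):
    """Nodes holding a partition: its owner plus the next n-1 on the ring."""
    start = PARTITIONS.index(partition)
    return ["Node %d" % ((start + i) % len(PARTITIONS) + 1) for i in range(n)]


def write_succeeds(acks, w=W):
    """A write is acknowledged to the client once w replicas confirm it."""
    return acks >= w


def quorum_overlaps(n=N, w=W, r=R):
    """R + W > N means every read quorum shares a node with every write quorum."""
    return r + w > n


if __name__ == "__main__":
    part = partition_for("blah")
    print("blah ->", part, "->", preference_list(part))
    print("slide example: E ->", preference_list("E"))
    print("write ok with 2 acks:", write_succeeds(2))
    print("quorums overlap:", quorum_overlaps())
```

With N=3, W=2, R=2 the write returns after two of the three replicas respond, and any later read of two replicas is guaranteed to touch at least one node that saw the write.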