Why Modeling Matters • NoSQL => no joins • What replaces joins? • Hierarchy • Duplication of data • Different models for querying, indexing • Your optimal data model is (probably) very different from a relational one • Simpler • More like the way you develop
Stop Thinking Like This! • Endless layers of abstraction (and misery)
Hierarchy before NoSQL • Simple User Model
Hierarchy before NoSQL • Tuned Queries • Write some brittle SQL: • “select user.id, … inner join settings on …” • Pick out the fields and construct object hierarchy (this gets nasty, fast) • (outer joins for optional values?) • Object fetching • Queries follow object graph, PK/FK • 5 queries to fetch object in this example
Hierarchy with NoSQL • JSON structure mapped to objects • Fetch JSON from MongoDB** • Unmarshal into objects/tuples • Use it • Using JSON4S
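The fetch, unmarshal, use flow can be sketched in a few lines. This is a minimal illustration, not the deck's JSON4S code: it uses Python's standard `json` module and dataclasses as a stand-in, and the `User`, `Settings`, and `Address` shapes and the sample document are assumptions made up for the example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Settings:
    theme: str
    newsletter: bool


@dataclass
class User:
    username: str
    settings: Settings
    addresses: List[Address]


def user_from_json(raw: str) -> User:
    """Unmarshal one JSON document into a nested User object."""
    doc = json.loads(raw)
    return User(
        username=doc["username"],
        settings=Settings(**doc["settings"]),
        addresses=[Address(**a) for a in doc.get("addresses", [])],
    )


# A document as it might come back from the database (hypothetical data).
raw_doc = """
{
  "username": "jdoe",
  "settings": {"theme": "dark", "newsletter": false},
  "addresses": [{"city": "Berlin", "country": "DE"}]
}
"""

user = user_from_json(raw_doc)
print(user.settings.theme, user.addresses[0].city)
```

One read returns the whole hierarchy, so there is no join and no per-level query to stitch together.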
Hierarchy with NoSQL • Focus on your software, not the DB layer!
Hierarchy with NoSQL • Write operations • Atomic upsert (create, update or fail) • Saves all levels of object atomically • Reduces need for transactions • Convenience, not magic • All or nothing
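To make "create, update or fail" concrete, here is a toy model of upsert semantics. It is an in-memory stand-in, not a MongoDB client: `InMemoryCollection` and its methods are hypothetical names, and a lock plays the role of the database's per-document atomicity.

```python
import copy
import threading


class InMemoryCollection:
    """Toy stand-in for a document collection, keyed by _id."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def upsert(self, doc: dict) -> str:
        """Create or replace the whole document; raise if it has no _id."""
        if "_id" not in doc:
            raise ValueError("document needs an _id")
        snapshot = copy.deepcopy(doc)  # all nested levels are captured together
        with self._lock:  # readers never observe a half-written document
            existed = doc["_id"] in self._docs
            self._docs[doc["_id"]] = snapshot
        return "updated" if existed else "created"

    def find_one(self, _id):
        """Return a copy of the stored document, or None if absent."""
        with self._lock:
            return copy.deepcopy(self._docs.get(_id))


users = InMemoryCollection()
first = users.upsert({"_id": "jdoe", "settings": {"theme": "dark"}})
second = users.upsert(
    {"_id": "jdoe", "settings": {"theme": "light"}, "tags": ["admin"]}
)
print(first, second, users.find_one("jdoe"))
```

The point of the sketch is the shape of the operation: one call, one document, every nested level written or none of them.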
Unique Identifiers in your Data • Relational design => PK/FK • Often not “meaningful” identifiers for data • User Data Model • Unique by username
Unique Identifiers in your Data • Words • Ensured to be constant
Data Duplication • Without joins, what about SQL lookup tables? • …Can move logic to app • Duplication of data in NoSQL is required • Trade storage for speed
Data Duplication • Many fields don’t change, ever • But… many do • How often does this change? • New decisions for the developer! • Often background updates
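A small sketch of the trade-off, using plain Python structures in place of collections. The `countries` lookup, the `users` documents, and `rename_country` are all hypothetical; the loop stands in for what would typically be a background job.

```python
# Hypothetical data: a small "lookup table" and documents that embed its values.
countries = {"DE": "Germany", "FR": "France"}

users = [
    {"_id": "jdoe", "country_code": "DE", "country_name": "Germany"},
    {"_id": "asmith", "country_code": "FR", "country_name": "France"},
    {"_id": "mmuster", "country_code": "DE", "country_name": "Germany"},
]


def rename_country(code: str, new_name: str) -> int:
    """Update the source of truth, then refresh every duplicated copy.

    Returns the number of documents rewritten.
    """
    countries[code] = new_name
    touched = 0
    for user in users:
        if user["country_code"] == code and user["country_name"] != new_name:
            user["country_name"] = new_name
            touched += 1
    return touched


# Reads need no join: the name is already on the document.
print(users[0]["country_name"])

changed = rename_country("DE", "Deutschland")
print(changed, users[0]["country_name"])
```

Reads stay cheap because the value is duplicated; the cost moves to the rare write, which now fans out across every copy.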
Reaching into Objects • Incredible feature of MongoDB • Dot syntax safely** traverses the object graph
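The idea behind "safe" traversal can be sketched outside the database. The helper below is a hypothetical `get_path` function that mimics dot-path lookups on nested dicts and lists; it is a simplification and does not reproduce everything MongoDB's dot notation does (for example, matching a field across all elements of an array).

```python
from typing import Any


def get_path(doc: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dicts and lists.

    Returns `default` instead of raising when any step is missing.
    """
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif (
            isinstance(current, list)
            and part.isdigit()
            and int(part) < len(current)
        ):
            current = current[int(part)]
        else:
            return default
    return current


# Hypothetical document for illustration.
user = {
    "username": "jdoe",
    "settings": {"email": {"enabled": True}},
    "addresses": [{"city": "Berlin"}],
}

print(get_path(user, "settings.email.enabled"))
print(get_path(user, "settings.sms.enabled"))
print(get_path(user, "addresses.0.city"))
```

A missing branch yields the default rather than an error, which is what makes deep paths convenient to query.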
Inner Indexes • Convenience at a cost • No index => table scan • No value? => table scan • No child value? => table scan • Table scan with big collection? • Can’t index everything! • 96GB of indexes?
Inner Indexes • This will (and should) drive your data model • Sparse data test • Even with only 2000 non-empty values!
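Why sparseness matters for index size can be shown with a toy index. This is a sketch under stated assumptions: `build_index` and `entry_count` are hypothetical helpers, the 100,000-document collection is invented, and only the 2,000 non-empty values echo the slide.

```python
from collections import defaultdict


def build_index(docs, field, sparse=False):
    """Map each field value to the list of _ids that carry it.

    A non-sparse index records documents lacking the field under None;
    a sparse index leaves them out entirely.
    """
    index = defaultdict(list)
    for doc in docs:
        if field in doc:
            index[doc[field]].append(doc["_id"])
        elif not sparse:
            index[None].append(doc["_id"])
    return index


def entry_count(index):
    """Total number of (value, _id) entries held by the index."""
    return sum(len(ids) for ids in index.values())


# 100,000 hypothetical documents, only 2,000 of which have a "promo" value.
docs = [{"_id": i} for i in range(100_000)]
for i in range(0, 100_000, 50):
    docs[i]["promo"] = f"code-{i}"

full = build_index(docs, "promo")
sparse = build_index(docs, "promo", sparse=True)
print(entry_count(full), entry_count(sparse))
```

The non-sparse index carries an entry for every document, while the sparse one carries only the documents that actually have the field.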
Adding & Modifying • Append in mongo is blazing fast • “tail” of data is always in memory • Pre-allocated data files • Main expense is “index maintenance” • Some marshalling/unmarshalling cost** • Modifying? Object growth • Pre-allocation of space is built into collection design
Adding & Modifying • Each object has allocated space • Exceed that space, need to relocate object • Leaves “hole” in collection • Large increases to documents hurt your overall performance • Your data model should strive for equally-sized objects as much as possible
Retrieval • Many of the same rules apply as in relational • Indexes • complex/inner or not • Indexes in RAM? Yes • Cardinality matters • New(ish) considerations • Complex hierarchy not free • Marshalling/unmarshalling
Marshalling & Unmarshalling • [Chart: records/sec vs. object complexity]
Marshalling & Unmarshalling • All you can eat from your Data Model? • Techniques have tremendous impact • Development ease until it matters • 50% speed-up with manual mapping • Only demand what you can consume!
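"Only demand what you can consume" can be illustrated with a hand-written mapper that reads just the fields a caller needs. The names here (`UserSummary`, `to_summary`, `project`) and the sample document are assumptions for the sketch, and it makes no claim about the speed figure on the slide.

```python
from typing import NamedTuple


class UserSummary(NamedTuple):
    username: str
    theme: str


def to_summary(doc: dict) -> UserSummary:
    """Manual mapping: read two fields, ignore the rest of the document."""
    return UserSummary(doc["username"], doc["settings"]["theme"])


def project(doc: dict, fields: set) -> dict:
    """Keep only the requested top-level fields, like a query projection."""
    return {key: value for key, value in doc.items() if key in fields}


# Hypothetical document with far more data than a list view needs.
doc = {
    "username": "jdoe",
    "settings": {"theme": "dark", "newsletter": False},
    "addresses": [{"city": "Berlin"}, {"city": "Paris"}],
    "history": list(range(1000)),
}

summary = to_summary(doc)
slim = project(doc, {"username", "settings"})
print(summary, sorted(slim))
```

Asking the database for only the needed fields and mapping them by hand skips the work of building objects nobody reads.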
Making the most of _id • Indexes matter • Tailor your _id to be meaningful by access pattern • It’s your first defense when auto-sharding • Date-driven data? • Monotonically increasing _id value • Ensures recent data is “hot”
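One way to get a monotonically increasing, meaningful `_id` for date-driven data is to lead with a sortable timestamp. The `make_event_id` helper and its format are assumptions for illustration; two events from the same source within the same second would collide, so a real scheme would need a finer-grained suffix.

```python
from datetime import datetime, timezone


def make_event_id(when: datetime, source: str) -> str:
    """Build an _id whose lexical order matches chronological order."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{stamp}-{source}"


# Hypothetical events, deliberately listed out of order.
events = [
    make_event_id(datetime(2013, 4, 2, 9, 30, tzinfo=timezone.utc), "web"),
    make_event_id(datetime(2013, 3, 28, 17, 5, tzinfo=timezone.utc), "api"),
    make_event_id(datetime(2013, 4, 2, 9, 31, tzinfo=timezone.utc), "web"),
]

# Sorting the strings sorts the events by time; the newest is last.
ordered = sorted(events)
print(ordered)
```

Because new ids always sort after old ones, inserts land at the end of the `_id` index, which keeps the recently touched part of the index small.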
Making the most of _id • Other time-based data techniques • Flexibility in querying • Case-sensitive REGEX is your pal
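Assuming `_id` values that start with a timestamp, an anchored, case-sensitive prefix pattern selects a time range by string prefix. The sketch below filters a plain list with Python's `re` module; the ids and the `ids_matching` helper are hypothetical. In MongoDB, a case-sensitive regex anchored with `^` is the form that can make use of an index on the field, which is presumably why the slide calls it a pal.

```python
import re

# Hypothetical ids of the form <UTC timestamp>-<source>.
event_ids = [
    "20130328T170500-api",
    "20130402T093000-web",
    "20130402T093100-web",
    "20130511T120000-api",
]


def ids_matching(pattern: str, ids):
    """Return the ids matched by an anchored, case-sensitive pattern."""
    regex = re.compile(pattern)
    return [value for value in ids if regex.match(value)]


april = ids_matching(r"^201304", event_ids)
one_day = ids_matching(r"^20130402", event_ids)
print(april, one_day)
```

The same key answers "this month", "this day", and "this hour" queries just by lengthening the prefix.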
Making the most of _id • Hot indexes are happy indexes • Access should strive for right bias • Random access with large indexes hits disk
Your Data Model • NoSQL gets you started faster • Many relational pain points are gone • New considerations (easier?) • Migration should be a real effort • Designed by access patterns over object structure • Don’t prematurely optimize, but know where the knobs are