API Design should be understandable, should push users in the right direction
Previous API Many series with metadata in the name like in Graphite region.us.data_center.1.host.serverA.network_in! region.us.data_center.1.host.serverA.network_out! 5m.mean.region.us.data_center.1.host.serverA.network_in! 5m.mean.region.us.data_center.1.host.serverA.network_out!
Understandable API Design: Retention Policies • Previously called shard spaces • Users tell the server which shard space to read/ write data into based on a regex to match against the series name
Too hard to use, couldn’t tell where data was going
Users could accidentally hide a bunch of data that was previously available
Solution: have the user be explicit at write or query time
Pushing users in the right direction: Tags SELECT mean(value) FROM cpu! Users wanted this: WHERE host = 'serverA'!
Columns • Wasted space • Add Indexes? • User still has to create them
Everyone was expecting tags… Indexed metadata and series
Structural Design Problems Tell users to have many series: LIST SERIES /.*region\.uswest.*/! SELECT mean(value)! FROM merge(/.*region\.uswest.*/)! WHERE time > now() - 2h! GROUP BY time(5m)!
No way to make that perform well with > 100k series
Solution: measurements and tags break up the namespace
Structural Design Problems: Query Engine • Pipes raw data over the network • Need to redesign to get data locality • MapReduce framework