Design Decisions
● Use familiar tools: Erlang, Riak, HTTP
● Not a critical service but ...
● ... Avoid SPOF
● Write performance >> read performance
● Centralized reference clock
● Integer only
● Avoid 2i (Riak secondary indexes) if possible
● When in doubt, leave it to Riak
In Theory... [Diagram: three Clients, three Metyr nodes, one Riak cluster]
Storing metrics in Riak: No SQL, no schemas, no indices (?), no aggregate operations
Attempt 1: The naïve way just never works...
Make each sample an object: A bucket per metric; index by Epoch time
The Good™: Atomicity, write-once, fast range queries
The Bad: Slow, large overhead, requires 2i
Attempt 2: Combine samples into chunks by time
Key Points
● One bucket per metric as before
● Split into hour-sized chunks (configurable)
● Chunk key: Epoch time
● Chunk value: List of samples
● To read: Fetch chunks within interval
● To write: Fetch chunk, add sample, write back
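The key points above can be sketched roughly as follows. This is a minimal illustration, not Metyr's actual code: the in-memory `store` dict stands in for Riak, and the names `chunk_key`, `write_sample`, and `read_range` are hypothetical. It also assumes the chunk key is the Epoch time at which the chunk starts.

```python
from collections import defaultdict

CHUNK_SECONDS = 3600  # hour-sized chunks; configurable per the slides

# Hypothetical stand-in for Riak: bucket (metric) -> {chunk key -> list of samples}
store = defaultdict(dict)

def chunk_key(epoch):
    """Epoch time at which the chunk containing `epoch` starts (assumed key scheme)."""
    return epoch - (epoch % CHUNK_SECONDS)

def write_sample(metric, epoch, value):
    """To write: fetch chunk, add sample, write back."""
    key = chunk_key(epoch)
    chunk = list(store[metric].get(key, []))  # fetch (copied, like a remote read)
    chunk.append((epoch, value))              # add sample
    store[metric][key] = chunk                # write back

def read_range(metric, start, end):
    """To read: fetch every chunk overlapping [start, end], then filter samples."""
    samples = []
    for key in range(chunk_key(start), chunk_key(end) + 1, CHUNK_SECONDS):
        for t, v in store[metric].get(key, []):
            if start <= t <= end:
                samples.append((t, v))
    return sorted(samples)
```

Because chunk keys are computable from the interval alone, a range read needs no index lookup, only one fetch per chunk. The fetch-then-write-back step in `write_sample` is not atomic, which is where the write race can arise.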
Chunk Anatomy: [Diagram: a chunk holds samples 0 to N; one sample consists of Time, Value, and Tags; two fields are marked 64 bits, presumably Time and Value]
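One way such a chunk could be laid out in bytes is sketched below. Only the Time, Value, and Tags fields and the two 64-bit widths come from the slide. Everything else is an assumption made for illustration: big-endian byte order, an unsigned time, a signed value, and tags stored as a UTF-8 string with a 16-bit length prefix. The names `encode_sample` and `decode_chunk` are hypothetical.

```python
import struct

# Time: unsigned 64-bit, Value: signed 64-bit (integer only).
# The 16-bit tag length prefix is an assumption, not taken from the slides.
HEADER = struct.Struct(">QqH")

def encode_sample(epoch, value, tags=""):
    """Pack one sample as header + raw tag bytes."""
    raw = tags.encode("utf-8")
    return HEADER.pack(epoch, value, len(raw)) + raw

def decode_chunk(blob):
    """Walk a chunk (concatenated samples) and return (time, value, tags) tuples."""
    samples, offset = [], 0
    while offset < len(blob):
        epoch, value, tag_len = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        tags = blob[offset:offset + tag_len].decode("utf-8")
        offset += tag_len
        samples.append((epoch, value, tags))
    return samples
```

With this layout, adding a sample to a chunk is a plain byte append: `chunk += encode_sample(...)`.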
Writing just got harder: Slower since we must fetch a chunk first; potential race conditions, ...
(Arbitrary) Goal: Write 1K samples/sec. Tests showed that the solution described so far was inadequate.
Buffer them writes: Keep per-metric write buffers, flushed every 10 seconds or so
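A per-metric write buffer along these lines might look like the sketch below. The class name `WriteBuffer`, the `flush_fn` callback, and the injectable `clock` are all hypothetical. For simplicity it checks the flush interval only when a sample arrives; a real service would more likely flush from a timer.

```python
import time
from collections import defaultdict

FLUSH_INTERVAL = 10.0  # seconds; "10 seconds or so" per the slide

class WriteBuffer:
    def __init__(self, flush_fn, clock=time.monotonic):
        self.flush_fn = flush_fn          # called as flush_fn(metric, samples)
        self.clock = clock                # injectable so the sketch is testable
        self.buffers = defaultdict(list)  # metric -> pending samples
        self.last_flush = clock()

    def add(self, metric, epoch, value):
        """Buffer a sample; flush everything once the interval has elapsed."""
        self.buffers[metric].append((epoch, value))
        if self.clock() - self.last_flush >= FLUSH_INTERVAL:
            self.flush()

    def flush(self):
        """Hand each metric's pending samples to flush_fn in one batch."""
        pending, self.buffers = self.buffers, defaultdict(list)
        for metric, samples in pending.items():
            self.flush_fn(metric, samples)
        self.last_flush = self.clock()
```

The point of batching is that each flush costs one fetch and one write per affected chunk instead of one per sample, at the price of losing up to one interval of buffered samples if the node dies.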
Some Remaining Issues
● Race condition on write
● Storage requirements
● Downsampling of old data