One Basic Pattern
• Acquire data
• Transform and/or transport data
• Output data
One Multi-Tool?
What would it be like to build a tool to tackle this in the general case?
Wins:
• Fewer processes to manage
• Increased client / configuration consistency
• Processing shared across domains
One Multi-Tool?
Requirements:
• Lightweight
• Flexible and configurable
• Easily extended
I know, I know...
BUT! Replacing even two services on each box is a net ops win. SCIENCE!
How Heka Is Put Together
Inputs
• Listen or fetch
• Only concerned with the low-level transport
Splitters
• Slice Inputs' raw data streams into discrete events
• Text or binary protocols
• Decouple protocols from their transports
Decoders
• Parse event data to populate a metadata envelope for all event types
• Extract structure from unstructured data...
• ... or just wrap a blob
• Sandbox-able (Lua)
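As a rough illustration of "extract structure, or just wrap a blob", the Go sketch below decodes a made-up log line format of `severity logger message`. The `Envelope` type, the `decode` function, and the line format are all assumptions for this example; they are not Heka's message schema or decoder API (and real Heka decoders of this kind would typically be written as Lua sandboxes).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Envelope is a hypothetical, simplified metadata envelope.
type Envelope struct {
	Type     string
	Logger   string
	Severity int
	Payload  string
}

// decode tries to parse "<severity> <logger> <message>". If the line does
// not fit that assumed format, it wraps the raw text as an opaque blob.
func decode(raw string) Envelope {
	parts := strings.SplitN(raw, " ", 3)
	if len(parts) == 3 {
		if sev, err := strconv.Atoi(parts[0]); err == nil {
			return Envelope{Type: "applog", Logger: parts[1], Severity: sev, Payload: parts[2]}
		}
	}
	return Envelope{Type: "raw", Payload: raw}
}

func main() {
	fmt.Printf("%+v\n", decode("7 marketplace disk full"))
	fmt.Printf("%+v\n", decode("unstructured text"))
}
```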
Router
Simple, efficient grammar for matching messages:
Type == "counter" && Payload == "1"
Type == "applog" && Logger == "marketplace"
Type == "alert" && (Severity == 7 || Payload == "emergency")
Type == "myapp.metric" && Fields[name] =~ /.*\.stat/
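The sketch below shows, in plain Go, what two of the matcher expressions above mean when evaluated against a message. It does not parse the matcher grammar and is not Heka's router: `Message`, `Route`, `defaultRoutes`, and `route` are hypothetical names, and each expression is hand-translated into a Go predicate.

```go
package main

import (
	"fmt"
	"regexp"
)

// Message is a simplified, hypothetical stand-in for a routed message.
type Message struct {
	Type     string
	Logger   string
	Payload  string
	Severity int
	Fields   map[string]string
}

// Route pairs a destination name with a predicate over messages.
type Route struct {
	Name  string
	Match func(Message) bool
}

// statName corresponds to the /.*\.stat/ pattern in the last expression.
var statName = regexp.MustCompile(`.*\.stat`)

var defaultRoutes = []Route{
	// Type == "alert" && (Severity == 7 || Payload == "emergency")
	{Name: "alerts", Match: func(m Message) bool {
		return m.Type == "alert" && (m.Severity == 7 || m.Payload == "emergency")
	}},
	// Type == "myapp.metric" && Fields[name] =~ /.*\.stat/
	{Name: "stats", Match: func(m Message) bool {
		return m.Type == "myapp.metric" && statName.MatchString(m.Fields["name"])
	}},
}

// route returns the names of every route whose predicate accepts m.
func route(m Message, routes []Route) []string {
	var matched []string
	for _, r := range routes {
		if r.Match(m) {
			matched = append(matched, r.Name)
		}
	}
	return matched
}

func main() {
	msgs := []Message{
		{Type: "alert", Payload: "emergency"},
		{Type: "myapp.metric", Fields: map[string]string{"name": "db.stat"}},
		{Type: "applog", Logger: "marketplace"},
	}
	for _, m := range msgs {
		fmt.Println(m.Type, "->", route(m, defaultRoutes))
	}
}
```

A message can match more than one route, so `route` returns a list rather than stopping at the first hit.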
Filters
• Watch flowing data
• Generate output messages
• Sandbox-able (Lua)
Outputs
• Deliver to external service...
• ... and/or to upstream Heka...
• ... and/or directly to Heka Dashboard UI
• Configurable reconnect
Sandboxes Are Fun!
• Dynamically added to a running Heka with no config changes and no restart
• CPU cycles and RAM usage monitored
• Misbehaving plugins are shut off
Sandboxes Are Fun!
• LPeg (parsing expression grammar) & JSON libraries for data parsing
• Circular buffer library for time series data
Sandboxes Are Fun!
• Circular buffers auto-generate dashboard graphs
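For readers unfamiliar with circular buffers, the following is a minimal Go sketch of the underlying data structure: a fixed-size ring that keeps the most recent values and overwrites the oldest. It is an illustration only and does not reproduce the API of the circular buffer library available in Heka's Lua sandbox; the `Ring` type and its methods are hypothetical.

```go
package main

import "fmt"

// Ring is a hypothetical fixed-size circular buffer of samples.
type Ring struct {
	data  []float64
	next  int // index where the next value will be written
	count int // number of valid values, at most len(data)
}

// NewRing allocates a ring holding up to size values; size must be positive.
func NewRing(size int) *Ring {
	return &Ring{data: make([]float64, size)}
}

// Push stores v, overwriting the oldest value once the ring is full.
func (r *Ring) Push(v float64) {
	r.data[r.next] = v
	r.next = (r.next + 1) % len(r.data)
	if r.count < len(r.data) {
		r.count++
	}
}

// Values returns the stored values ordered from oldest to newest.
func (r *Ring) Values() []float64 {
	out := make([]float64, 0, r.count)
	start := (r.next - r.count + len(r.data)) % len(r.data)
	for i := 0; i < r.count; i++ {
		out = append(out, r.data[(start+i)%len(r.data)])
	}
	return out
}

func main() {
	r := NewRing(3)
	for _, v := range []float64{1, 2, 3, 4, 5} {
		r.Push(v)
	}
	fmt.Println(r.Values()) // prints [3 4 5]
}
```

Because memory use is bounded by the ring size, a structure like this suits time series that only need a recent window, such as the data behind a dashboard graph.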