How to inspect logs
  Retrospection (reactive search)
    - Store data, then search
  Prospection (proactive search)
    - Define what should be processed, and store the data
What logs are inspected
  Schema-full data:
    - strict schema: predefined fields with types (or reject)
    - schema on read: try to read known fields (or ignore)
  Schema-less data:
    - any fields (or ignore), any types (implicit/explicit conversion)
    - fit for services in development (all internet services!)
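A minimal sketch of the three handling modes above, for illustration only. The schema, the field names, and the three reader functions are hypothetical and not taken from any particular product.

```python
# Hypothetical schema: field name -> expected type.
SCHEMA = {"path": str, "status": int}

def read_strict(record):
    """Strict schema: reject records whose fields or types do not match."""
    if set(record) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for name, typ in SCHEMA.items():
        if not isinstance(record[name], typ):
            raise ValueError(f"field {name!r} is not {typ.__name__}")
    return dict(record)

def read_schema_on_read(record):
    """Schema on read: pick up known fields, silently ignore the rest."""
    return {name: record[name] for name in SCHEMA if name in record}

def read_schemaless(record):
    """Schema-less: accept any fields and any types as they come."""
    return dict(record)
```

With a record such as `{"path": "/", "status": 200, "agent": "curl"}`, the strict reader raises, the schema-on-read reader drops `agent`, and the schema-less reader keeps everything.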
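How / What

  How \ What  | Schema-full                                  | Schema-less
  ------------+----------------------------------------------+------------------------------------------------
  Retrospect  | RDBMS, Hive, BigQuery, Cassandra, HBase, ... | MongoDB, Hive(SerDe), TD, plain text file, ...
  Prospect    | Esper, many stream CEPs, ...                 | Norikra, ...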
Data size: schema & index
  Logs: size is always important (xTB - xPB)
  Schema:
    - size optimization
    - access optimization on memory/disk
  Index:
    - access optimization on memory/disk
    - more memory/disk required
    - hard to distribute
Query response improvements of retrospection
  Schema-full + indexed (RDBMS)
    - Query plan optimization
  Schema on read
    - I/O and task size optimization & scale out
  Schema-less + indexed (Mongo)
    - mmap-ed index & data (!)
Query response improvements of prospection
  - Time window + incremental calculation
  - Stream processing engines
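To make "time window + incremental calculation" concrete, here is a small sketch of a sliding-window aggregate that is updated per event instead of rescanning stored data. The class name `SlidingWindowSum` and its methods are hypothetical, and the sketch assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque

class SlidingWindowSum:
    """Keep a running sum and count over the last `window_sec` seconds."""

    def __init__(self, window_sec):
        self.window_sec = window_sec
        self.events = deque()  # (timestamp, value), oldest first
        self.total = 0

    def add(self, timestamp, value):
        # Incremental update: add the new value without rescanning old events.
        self.events.append((timestamp, value))
        self.total += value
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that fell out of the window and subtract their values.
        while self.events and self.events[0][0] <= now - self.window_sec:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def result(self):
        return {"count": len(self.events), "sum": self.total}
```

Each event costs a constant amount of work on arrival plus one removal when it expires, so the query result is always ready without a search over stored logs.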
Stream processing and data size
  No disks:
    - fewer failure points
  Less memory:
    - only processing and I/O buffers
    - aggregation results
  Easy to distribute:
    - stream duplication
    - stream splitting by aggregation key
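One possible way to split a stream by aggregation key is stable hash partitioning, sketched below. The function names and the choice of CRC32 as the hash are assumptions made for this example, not something the slides prescribe.

```python
import zlib

def partition_for(key, num_workers):
    """Map an aggregation key to a worker index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_workers

def split_stream(events, key_field, num_workers):
    """Route each event to one of `num_workers` sub-streams by its key."""
    partitions = [[] for _ in range(num_workers)]
    for event in events:
        index = partition_for(str(event[key_field]), num_workers)
        partitions[index].append(event)
    return partitions
```

Because every event with the same key lands on the same worker, each worker can compute its per-key aggregates independently and no shared state is needed.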
Stream processing and schema
  Stream processing: query -> data
  Prospective schema by queries:
    - Queries know the required fields and their types
    - Unused fields can be ignored
    - Implicit type conversion available
  Schema-less data + schema-full queries
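The idea of a schema derived from the query can be sketched as follows. `REQUIRED` stands in for what a query declares, and `project` is a hypothetical helper. Skipping events that lack a field or cannot be converted is an assumption of this sketch, not a statement about how any specific engine behaves.

```python
# Hypothetical "query": the fields it needs and the types it expects.
REQUIRED = {"status": int, "response_time": float}

def project(event, required=REQUIRED):
    """Return the fields a query needs, converted to its types, or None to skip."""
    row = {}
    for name, typ in required.items():
        if name not in event:
            return None  # event does not carry a required field
        try:
            row[name] = typ(event[name])  # implicit conversion, e.g. "200" -> 200
        except (TypeError, ValueError):
            return None  # value cannot be converted; skip this event
    return row  # fields the query does not mention are dropped
```

The data stays schema-less (any dict is accepted as input), while the query side sees typed rows containing only the fields it asked for.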
My goal: Schema-less data stream + schema-full queries
  It’s Norikra!