Apache Apex - Stream Processing ● YARN - Native - Uses Hadoop YARN framework for resource negotiation ● Highly Scalable - Scales statically as well as dynamically ● Highly Performant - Can reach single digit millisecond end-to-end latency ● Fault Tolerant - Automatically recovers from failures - without manual intervention ● Stateful - Guarantees that no state will be lost ● Easily Operable - Exposes an easy API for developing Operators (part of an application) and Applications
Project History ● Project development started in 2012 at DataTorrent ● Open-sourced in July 2015 ● Apache Apex started incubation in August 2015 ● 50+ committers from Apple, GE, Capital One, DirecTV, Silver Spring Networks, Barclays, Ampool and DataTorrent ● Mentors from Class Software, MapR and Hortonworks ● Soon to be a top level Apache project
Apex Platform Overview
An Apex Application is a DAG (Directed Acyclic Graph) ● A DAG is composed of vertices (Operators) and edges (Streams). ● A Stream is a sequence of data tuples which connects operators at end-points called Ports ● An Operator takes one or more input streams, performs computations & emits one or more output streams ● Each operator is USER’s business logic, or built-in operator from our open source library ● Operator may have multiple instances that run in parallel
Hadoop 1.0 vs 2.0 - YARN
Apex as a YARN Application ● YARN (Hadoop 2.0) replaces MapReduce with a more generic Resource Management Framework. ● Apex uses YARN for resource management and HDFS for storing any persistent storage
Support for Windowing ● Apex splits incoming tuples into finite time slices - Streaming Windows ○ Transparent to the user ○ Apex Default = 500 ms ● Checkpointing and book-keeping done at Streaming window boundary ● Applications may need to perform computations in windows - Application Windows ○ Specified as a multiple of Streaming Window size ○ Call backs to user operator logic ■ beginWindow(long windowId) ■ endWindow() ○ Example - An application which identifies some aggregates and emits them every minute. Here application window size = 60 secs = 30 Streaming Windows ● Sliding and Tumbling Application windows are supported natively
Buffer Server ● Staging area for outgoing tuples ● Downstream operators connect to upstream Buffer Server to subscribe for tuples ● Plays a role in recovery by replaying data to the downstream operator from a particular checkpoint ● Spooling to disk is also supported
Fault Tolerance - Checkpointing ● During checkpointing all operator state is written to HDFS asynchronously ● This is decentralized and happens independently for each operator ● If all operators in the DAG have checkpointed a particular window, then that window is said to be committed and all previous checkpoints are purged O1 O2 O3 O4 Checkpoint Window = 60 Streaming Windows Checkpoint Window # ---> 180 180 180 120 Committed Window # = 120 Checkpoint # ---> 3 3 3 2 Committed Checkpoint # = 2
Recovery ● Apex Application Master detects the failure of an operator based on the missing heart beats from the operators or if windows are not progressing ● All downstream operators from the failed operator are restarted from the last committed checkpoint to recover from their states. ● Data is replayed from the same checkpoint by the Buffer Server ● Recovery is automatic and does not require manual intervention.
Scalability - Partitioning ● Operators can be “replicated” (partitioned) into multiple instances to cope up with high speed input streams. ● Can be specified at Application launch time ● User can control the distribution of tuples to downstream partitions. ● Automatic Unifier to unify the tuples
Scalability - Dynamic scaling ● Auto scaling is also supported. Number of partitions may automatically increase or decrease based on the incoming load. Can be customized by the user ● User has to define the trigger for auto scaling: ○ Example - Increase partitions if latency goes above 100 ms.
Apex Processing Semantics ● AT_LEAST_ONCE (default): Windows are processed at least once ● AT_MOST_ONCE: Windows are processed at most once ○ During recovery, all downstream operators are fast-forwarded to the window of latest checkpoint ● EXACTLY_ONCE: Windows are processed exactly once ○ Checkpoint every window ○ Checkpointing becomes blocking
Apex Guarantees ● Apex guarantees No loss of data and computational state - Checkpointed periodically ● Automatic recovery ensures that processing resumes from where it left off ● Order of incoming data is guaranteed to be maintained ○ Not applicable in case of partitioning of operators ● Events in a window are always replayed in the same window in case of failures
Decision Making in < 2ms 1. Performance requirements a. A system which can provide a very very low latency for decision making (40 ms) b. Ability to handle large volumes of data and ever changing rules (1,000 events per 20 ms burst) c. 99.5% uptime. Which is about 1.5 days downtime in an year ➔ Apex achieved: ◆ 2 ms latency against the requirement of 40ms ◆ Was able to handle 2,000 events burst against requirement of 1,000 events burst at a net rate of 70,000 events/s. ◆ 99.9995% uptime against requirement of and 99.5% uptime and 2. Relevant Roadmap 3. Enterprise grade 4. Have a healthy and diverse community and committers, i.e. not controlled by one vendor Talk Slides: http://www.slideshare.net/ilganeli/nextgen-decision-making-in-under-2ms DataTorrent Blog: https://www.datatorrent.com/blog/next-gen-decision-making-in-under- 2-milliseconds/
Decision making in < 2ms contd.. ● Comparison finally boiled down to ○ Apache Storm ○ Apache Flink ○ Apache Apex ● Some problems in Storm and Flink among others ○ Nimbus is a single point of failure ○ Bolts / Spouts / Operators share a JVM. Hard to debug ○ No dynamic topologies ○ Restarting entire topologies in case of failures