Netflix is a data-driven organization that places emphasis on data quality, data availability, and the agility to capture and process data. Some of our recommendation algorithms are computed in real time as events happen. Such streaming applications are long-running tasks that need to be resilient, especially in a cloud deployment, where resources are ephemeral. In this talk, we will cover the What, the Why, and the How of our resiliency exercise with Spark Streaming in an AWS cloud deployment. To simulate failures, we employed a Netflix ChaosMonkey-based approach that randomly terminated instances or processes. We hope this exercise will help build confidence in the resiliency of Spark Streaming in similar contexts.