Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale

by Sriram Krishnan

Uploaded on 2015/02/12

The Data Platform at Twitter supports engineers and data scientists who run batch jobs on Hadoop clusters of several thousand nodes and real-time jobs on systems such as Storm. In this presentation, I discuss the overall Data Platform stack at Twitter. In particular, I describe how we enable real-time and batch analytics at scale with Scalding, a Scala DSL for writing batch jobs on MapReduce; Summingbird, a framework for combined real-time and batch processing; and Tsar, a framework for real-time time-series aggregation.
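The idea of combining real-time and batch processing, and of aggregating time series, can be illustrated with a minimal plain-Scala sketch. It assumes a common pattern for hybrid systems: the same aggregation runs over historical events (batch) and over recent events (real-time), and the two partial results are merged with an associative "plus". This is not the Summingbird, Scalding, or Tsar API. The names HybridCountSketch, Event, Monoid, CountMapMonoid, bucketOf, aggregate, and the one-minute bucket width are all hypothetical.

```scala
// Hypothetical sketch: illustrates merging batch and real-time partial counts.
// It does NOT use or reproduce the Summingbird, Scalding, or Tsar APIs.
object HybridCountSketch {
  // An event with a key (e.g. a tweet ID) and a timestamp in milliseconds.
  final case class Event(key: String, timestampMs: Long)

  // Minimal monoid: an associative combine operation with an identity element.
  trait Monoid[T] {
    def zero: T
    def plus(a: T, b: T): T
  }

  // Counts per (key, time bucket) form a monoid under pointwise addition.
  object CountMapMonoid extends Monoid[Map[(String, Long), Long]] {
    def zero: Map[(String, Long), Long] = Map.empty
    def plus(a: Map[(String, Long), Long], b: Map[(String, Long), Long]): Map[(String, Long), Long] =
      b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }
  }

  // Assign a timestamp to a fixed-width time bucket, identified by its start time.
  def bucketOf(timestampMs: Long, bucketMs: Long): Long =
    timestampMs - (timestampMs % bucketMs)

  // One aggregation function, reused unchanged by the batch and real-time paths.
  def aggregate(events: Seq[Event], bucketMs: Long): Map[(String, Long), Long] =
    events
      .map(e => Map((e.key, bucketOf(e.timestampMs, bucketMs)) -> 1L))
      .foldLeft(CountMapMonoid.zero)((acc, m) => CountMapMonoid.plus(acc, m))

  def main(args: Array[String]): Unit = {
    val bucketMs = 60000L // one-minute buckets (assumed for illustration)
    // Events already covered by the last batch run.
    val batchEvents = Seq(Event("tweet1", 0L), Event("tweet1", 30000L), Event("tweet2", 61000L))
    // Events that arrived after the last batch run, seen only by the real-time path.
    val realtimeEvents = Seq(Event("tweet1", 45000L), Event("tweet2", 90000L))

    val batch = aggregate(batchEvents, bucketMs)
    val realtime = aggregate(realtimeEvents, bucketMs)
    // Because plus is associative, merging the partial results gives the same
    // answer as aggregating every event in a single pass.
    val merged = CountMapMonoid.plus(batch, realtime)
    println(merged)
  }
}
```

The design point the sketch tries to capture is that associativity lets the results from the two paths be combined in any grouping, so queries can read the batch total plus whatever the real-time path has counted since.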

Source: http://www.slideshare.net/krishflix/data-platform-at-twitter-enabling-realtime-batch-analytics-at-scale