Disclaimer The following content results from research, use case analysis, industry observations, plus personal perspectives and opinions – presented by a speaker who is an independent author/consultant. The following content does not in any way represent the opinions or official messaging for any clients of Liber 118, Apache Foundation, United Nations, Area 51, S.P.E.C.T.R.E., etc. Except, perhaps, for the smarter ones who nurture an ample sense of humor, which unfortunately may disqualify much of Silicon Valley…
From Business Use Cases To Bare Metal Paradigm shifts can be observed at three levels of the tech stack for cluster computing. Each implies orders of magnitude in cost savings over prior best results, based on substantive changes in software engineering practices… Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal In other words, now that we have Mesos, Docker, and Spark, why do we need Hadoop legacy software? Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 3… hard problems? • latency • aggregation • parallelism • data rates Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 3… hard problems => solutions • applicative systems • leveraging semigroup structure • lazy evaluation aka combinator graph reduction • probabilistic data structures Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 2… hard problems? • process, data, and metadata in silos • BI + data modeling legacy culture • CAP theorem vs. ACID • accidental complexity • propagating schema and lineage • learning curve inertia • managing risk vs. innovation Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 2… hard problems => solutions • interdisciplinary teams • generalize across batch + real-time + etc. • separation of concerns • pattern language • compiler => query planner Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 1… hard problems? • commodity hardware failure rates • sched. batch is simple; sched. services is expensive • no getting around it: building a distrib system • static partitioning => cost of cluster computing • monolithic controllers vs. shared state • low util rates => upside down in power availability Functional Programming Data Workflow Abstractions Datacenter Computing
From Business Use Cases To Bare Metal Countdown: Augury and Omens Aside, Part 1… hard problems => solutions • isolation • containerization • mixed workloads • data locality • service+framework architecture • predictive scheduling Functional Programming Data Workflow Abstractions Datacenter Computing
IoT Data Rates: Tools and techniques that served well for ad-tech will not necessarily apply for “Industrial Internet” data rates … we must retool; power requirements alone would boil the oceans technologyreview.com/...
Some History, Part 3
Narrative Arc: Lambda Somethingorother Theory, Eight Decades Ago: Haskell Curry, known for seminal work on combinatory logic (1927) Alonzo Church, known for lambda calculus (1936) and much more! Both sought formal answers to the question, “What can be computed?” Photos: Alonzo Church (wikipedia.org), Haskell Curry (haskell.org)
Narrative Arc: Lambda Somethingorother Praxis, Four Decades Ago: Leveraging lambda calculus, combinators, etc., to increase parallelism of apps as applicative systems “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs” John Backus ACM Turing Award (1977) acm.org stanford.edu/class/cs242/readings/backus.pdf “A new implementation technique for applicative languages” Turner, D. A. (1979) Softw: Pract. Exper., 9: 31–49. doi: 10.1002/spe.4380090105 David Turner wikipedia.org
Narrative Arc: Lambda Somethingorother Today: “Add ALL the Things: Abstract Algebra Meets Analytics” Avi Bryant (@avibryant), Strange Loop (2013) infoq.com/presentations/abstract-algebra-analytics • grouping doesn’t matter (associativity) • ordering doesn’t matter (commutativity) • zeros get ignored In other words, while partitioning data at scale is quite difficult, you can let the math allow your code to be flexible at scale
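The three properties on this slide (associativity, commutativity, a zero that gets ignored) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Monoid` class, its `fold` method, and the sample data are hypothetical names invented here, not code from the talk or from any library.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an associative `plus` together with its `zero`."""
    zero: T
    plus: Callable[[T, T], T]

    def fold(self, items: Iterable[T]) -> T:
        # Starting from zero means an empty partition contributes nothing.
        return reduce(self.plus, items, self.zero)


# Two commutative monoids: integer addition and set union.
int_sum = Monoid(0, lambda a, b: a + b)
set_union = Monoid(frozenset(), lambda a, b: a | b)

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Grouping doesn't matter: split the data into arbitrary partitions.
partitions = [data[:3], data[3:5], data[5:], []]
partials = [int_sum.fold(p) for p in partitions]

# Ordering doesn't matter: combine the partial results in reverse order.
total_partitioned = int_sum.fold(reversed(partials))
total_sequential = int_sum.fold(data)

# The same fold works unchanged for a different monoid.
distinct = set_union.fold(frozenset(p) for p in partitions)
```

Both totals come out the same, which is the point: the partitioning scheme can change without touching the aggregation code.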
Narrative Arc: Lambda Somethingorother “Algebra for Analytics” Oscar Boykin (@posco), Strata SC (2014) speakerdeck.com/johnynek/algebra-for-analytics • “Associativity allows parallelism in reducing” by letting you put the () where you want • “Lack of associativity increases latency exponentially” [Figure: the sum A + B + C + D + E + F + G + H + I + J + K + L + M + N + O + P, regrouped into pairwise sums (A + B) (C + D) (E + F) (G + H) (I + J) (K + L) (M + N) (O + P) that combine as a tree]
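To make the latency claim concrete, here is a rough Python sketch that counts dependent steps for the 16-term sum on this slide. The function names `sequential_reduce` and `tree_reduce` are made up for illustration, and the "rounds" counted assume every pairwise sum within a round could run in parallel.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequential_reduce(items: Sequence[T], plus: Callable[[T, T], T]) -> Tuple[T, int]:
    """Left-to-right fold; every step depends on the one before it."""
    if not items:
        raise ValueError("need at least one item")
    acc = items[0]
    steps = 0
    for x in items[1:]:
        acc = plus(acc, x)
        steps += 1
    return acc, steps


def tree_reduce(items: Sequence[T], plus: Callable[[T, T], T]) -> Tuple[T, int]:
    """Pairwise reduction; valid only when `plus` is associative."""
    if not items:
        raise ValueError("need at least one item")
    level: List[T] = list(items)
    rounds = 0
    while len(level) > 1:
        # Combine neighbours; the pairs in one round are independent of each other.
        nxt = [plus(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element carries over to the next round
        level = nxt
        rounds += 1
    return level[0], rounds


letters = list("ABCDEFGHIJKLMNOP")


def concat(a: str, b: str) -> str:
    # String concatenation is associative but not commutative.
    return a + b


seq_result, seq_steps = sequential_reduce(letters, concat)
tree_result, tree_rounds = tree_reduce(letters, concat)
```

For 16 items the sequential fold needs 15 dependent steps, while the tree needs 4 rounds, and both give the same result. The gap between n - 1 and log2(n) is the "exponential" difference in latency that the quote refers to.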
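Narrative Arc: Lambda Somethingorother That, plus oh so much more math fun in store! [Figures: a graph on nodes u, v, w, x; Bayesian updating, with The Posterior (current decision), The Evidence (the data), and The Prior (past decisions); a singular value decomposition M = U Σ V^H with dimensions m, n, r; a network with input, hidden, and output layers; a block-matrix equation in A, I, x, x', b, z, and c^T]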
Narrative Arc: Data Workflow Abstractions Q3 1997 inflection point: four independent teams working toward horizontal scale-out of workflows based on commodity hardware This effort prepared the way for huge Internet successes during the 1997 holiday season… AMZN, EBAY, Inktomi (YHOO Search), then GOOG MapReduce on clusters of commodity hardware and the Apache Hadoop open source stack emerged from this context
Narrative Arc: Data Workflow Abstractions General Batch Processing: MapReduce Specialized Systems (iterative, interactive, streaming, graph, etc.): Pregel, Giraph, Dremel, Drill, Tez, Impala, GraphLab, Storm, S4
Narrative Arc: Data Workflow Abstractions How about a generalized engine for distributed, applicative systems – apps sharing code across multiple use cases: batch, iterative, streaming, etc. “The State of Spark, and Where We're Going Next” Matei Zaharia, Spark Summit (2013) youtu.be/nU6vO2EJAb4 [Timeline: 2002 MapReduce @ Google; 2004 MapReduce paper; 2006 Hadoop @ Yahoo!; 2008 Hadoop Summit; 2010 Spark paper; 2014 Apache Spark top-level] [Diagram: RDD → transformations → RDD → … → action → value]
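The RDD diagram on this slide (transformations chain lazily, and only an action produces a value) can be imitated in plain Python. The sketch below is not the Spark API: `LazyDataset` and its methods are hypothetical stand-ins, single-process, and meant only to show that nothing is computed until the action runs.

```python
from typing import Any, Callable, Iterable, List


class LazyDataset:
    """Hypothetical stand-in for an RDD: holds a recipe, not materialized data."""

    def __init__(self, source: Callable[[], Iterable[Any]]):
        self._source = source

    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: returns a new recipe, computes nothing yet.
        return LazyDataset(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also deferred.
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    def reduce(self, plus: Callable[[Any, Any], Any], zero: Any) -> Any:
        # Action: pulls data through the whole chain and returns a value.
        acc = zero
        for x in self._source():
            acc = plus(acc, x)
        return acc


calls: List[int] = []


def traced_square(x: int) -> int:
    calls.append(x)  # record when the function actually runs
    return x * x


ds = LazyDataset(lambda: range(1, 6))
pipeline = ds.map(traced_square).filter(lambda x: x % 2 == 1)

calls_before_action = len(calls)            # still zero: nothing has executed
value = pipeline.reduce(lambda a, b: a + b, 0)
```

Deferring the work this way is what lets an engine see the whole chain before running it, which is where the "compiler => query planner" idea from the earlier slide comes in.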
Some History, Part 1
Lessons from Google
Datacenter Computing Google has been doing datacenter computing for years, to address the complexities of large-scale data workflows: • leveraging the modern kernel: isolation in lieu of VMs • “most (>80%) jobs are batch jobs, but the majority of resources (55–80%) are allocated to service jobs” • mixed workloads, multi-tenancy • relatively high utilization rates • JVM FTW? not so much… • reality: scheduling batch is simple; scheduling services is hard/expensive
Beyond Hadoop Hadoop – an open source solution for fault-tolerant parallel processing of batch jobs at scale, based on commodity hardware… however, other priorities have emerged for the analytics lifecycle: • apps require integration beyond Hadoop • multiple topologies, mixed workloads, multi-tenancy • significant disruptions in h/w cost/performance curves • higher utilization • lower latency • highly-available, long running services • more than “Just JVM” – e.g., Py adoption, etc.
Just No Getting Around It “There's Just No Getting Around It: You're Building a Distributed System” Mark Cavage ACM Queue (2013-05-03) queue.acm.org/detail.cfm?id=2482856 key takeaways on architecture: • decompose the business application into discrete services on the boundaries of fault domains, scaling, and data workload • make as many things as possible stateless • when dealing with state, deeply understand CAP, latency, throughput, and durability requirements “Without practical experience working on successful—and failed—systems, most engineers take a "hopefully it works" approach and attempt to string together off-the-shelf software, whether open source or commercial, and often are unsuccessful at building a resilient, performant system. In reality, building a distributed system requires a methodical approach to requirements along the boundaries of failure domains, latency, throughput, durability, consistency, and desired SLAs for the business application at all aspects of the application.”
Mesos – open source datacenter computing a common substrate for cluster computing mesos.apache.org heterogeneous assets in your datacenter or cloud made available as a homogeneous set of resources • top-level Apache project • scalability to 10,000s of nodes • obviates the need for virtual machines • isolation (pluggable) for CPU, RAM, I/O, FS, etc. • fault-tolerant leader election based on ZooKeeper • APIs in C++, Java/Scala, Python, Go, Erlang, Haskell • web UI for inspecting cluster state • available for Linux, OpenSolaris, Mac OS X
What are the costs of Single Tenancy? [Figure: three panels of CPU load (0% to 100%) over time t for Rails, Memcached, and Hadoop, plus a fourth panel of combined CPU load (Rails, Memcached, Hadoop) stacked on one chart]
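One way to put numbers on the question is a back-of-the-envelope calculation. The load values below are invented purely for illustration (the figure's actual data is not available), and the model assumes each workload either gets its own cluster of equal capacity or all three share one.

```python
# Hypothetical CPU load samples, as percent of ONE cluster's capacity,
# taken at four points in time. These numbers are made up for illustration.
workloads = {
    "rails":     [30, 50, 40, 20],
    "memcached": [20, 25, 30, 20],
    "hadoop":    [10,  5, 20, 50],
}

CLUSTER_CAPACITY = 100  # percent

# Combined load at each time step if all three share a single cluster.
combined = [sum(loads) for loads in zip(*workloads.values())]
peak_combined = max(combined)
mean_combined = sum(combined) / len(combined)

# Single tenancy: three clusters are provisioned, one per workload.
dedicated_util = mean_combined / (CLUSTER_CAPACITY * len(workloads))

# Shared pool: the same work runs on one cluster, provided the peak fits.
fits_on_one_cluster = peak_combined <= CLUSTER_CAPACITY
shared_util = mean_combined / CLUSTER_CAPACITY
```

With these made-up numbers the combined peak is 90%, so one cluster is enough, and average utilization goes from roughly 27% across three dedicated clusters to 80% on a shared one. Real workloads will not line up this neatly, but the shape of the argument is the same.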
Arguments for Datacenter Computing rather than running several specialized clusters, each at relatively low utilization rates, instead run many mixed workloads obvious benefits are realized in terms of: • scalability, elasticity, fault tolerance, performance, utilization • reduced equipment capex, Ops overhead, etc. • reduced licensing, eliminating need for VMs or potential vendor lock-in subtle benefits – arguably, more important for Enterprise IT: • reduced time for engineers to ramp up new services at scale • reduced latency between batch and services, enabling new high ROI use cases • enables Dev/Test apps to run safely on a Production cluster
Analogies and Architecture
Prior Practice: Dedicated Servers DATACENTER • low utilization rates • longer time to ramp up new services
Prior Practice: Virtualization DATACENTER PROVISIONED VMS • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs
Prior Practice: Static Partitioning DATACENTER STATIC PARTITIONING • even more machines to manage • substantial performance decrease due to virtualization • VM licensing costs • failures make static partitioning more complex to manage
Mesos: One Large Pool of Resources DATACENTER MESOS “We wanted people to be able to program for the datacenter just like they program for their laptop.” Ben Hindman
Fault-tolerant distributed systems… …written in 100-300 lines of C++, Java/Scala, Python, Go, etc. …building blocks, if you will Q: required lines of network code? A: probably none
Mesos – architecture apps: HA services, web apps, batch jobs, scripts, etc. frameworks: Spark, Storm, MPI, Jenkins, etc. task schedulers: Chronos, etc. meta-frameworks: Aurora, Marathon APIs: C++, JVM, Py, Go Mesos, distrib kernel HDFS, distrib file system Linux: libcgroup, libprocess, libev, etc.
Quasar+Mesos @ Stanford, Twitter, etc.… Improving Resource Efficiency with Apache Mesos Christina Delimitrou youtu.be/YpmElyi94AA
Quasar+Mesos @ Stanford, Twitter, etc.… Consider that for datacenter computing at scale, a surge in workloads implies: • large cap-ex investment, long lead-time to build • utilities cannot supply the power requirements Even for large players that achieve 2x beyond typical industry DC util rates, those factors become show-stoppers. Even so, high rates of over-provisioning are typical, so there’s much room to improve. Experiences with Quasar+Mesos showed: • 88% apps get >95% performance • ~10% overprovisioning instead of 500% • up to 70% cluster util at steady state • 23% shorter scenario completion
Because… Use Cases
Production Deployments (public)
Opposite Ends of the Spectrum, One Common Substrate Solaris Zones Built-in / Hypervisors bare metal Linux CGroups
Opposite Ends of the Spectrum, One Common Substrate Request / Response Batch
• key services run in production: analytics, typeahead, ads • Twitter engineers rely on Mesos to build all new services • instead of thinking about static machines, engineers think about resources like CPU, memory and disk • allows services to scale and leverage a shared pool of servers across datacenters efficiently • reduces the time between prototyping and launching
Case Study: Airbnb (fungible cloud infrastructure) “We think we might be pushing data science in the field of travel more so than anyone has ever done before… a smaller number of engineers can have higher impact through automation on Mesos.” Mike Curtis, VP Engineering gigaom.com/2013/07/29/airbnb-is-engineering-itself-into-a-data... • improves resource management and efficiency • helps advance engineering strategy of building small teams that can move fast • key to letting engineers make the most of AWS-based infrastructure beyond just Hadoop • allowed company to migrate off Elastic MapReduce • enables use of Hadoop along with Chronos, Spark, Storm, etc.
Case Study: eBay (continuous integration) eBay PaaS Team ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/ • cluster management (PaaS core framework services) for CI • integration of: OpenStack, Jenkins, ZooKeeper, Mesos, Marathon, Ansible “In eBay’s existing CI model, each developer gets a personal CI/Jenkins Master instance. This Jenkins instance runs within a dedicated VM, and over time the result has been VM sprawl and poor resource utilization. We started looking at solutions to maximize our resource utilization and reduce the VM footprint while still preserving the individual CI instance model. After much deliberation, we chose Apache Mesos for a POC. This post shares the journey of how we approached this challenge and accomplished our goal.”
Summary Question: Given the points about Part 3, Part 2, Part 1… Given the history from Church and Curry to BDAS and Twitter OSS… Given the needs, e.g., IoT preferably not boiling the oceans… Why do we still see proto-legacy systems like Tez? Or, for that matter, why do we find notable experts stating that “Hadoop is an OS”? It’s time to set the legacy of YHOO circa 2009 aside, to step up to contemporary challenges with better understanding of the underlying math and CS theory => solving business use cases at scale To paraphrase author William Gibson, the future is already here – it’s just not very evenly distributed, nor is it google-able