@tw1tt3rart TW1TT3Rart, 6 Oct:
┈┈┈┈┈◢◤┈┈┈┈┈┈┈┈
┈┈◢▇▇▇▇▇◣┈┈┈┈┈┈
┈┈▇▇▇▇▇◤┈┈THANK┈┈
┈┈▇▇▇▇▇┈┈┈┈YOU┈┈
┈┈◥▇▇▇▇◣┈┈┈STEVE┈
┈┈┈◥▇◤◥▇◤┈┈┈┈┈┈
#ThankYouSteve #TwitterArt
“Creativity comes from constraint”
“Brevity is the soul of wit”
What is the scale of Twitter?
500,000,000 Tweets / Day; 3,500,000,000 (3.5B) Tweets / Week
≈ 5,800 Tweets / Second on average (roughly 6,000 at steady state). However, there are peaks!
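The per-second figure is just the daily volume spread over the 86,400 seconds in a day. A quick Python check of that arithmetic (averages only; the slide gives no peak numbers, so none are assumed here):

```python
# Back-of-the-envelope rates derived from the slide's daily figure.
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_week = TWEETS_PER_DAY * 7
avg_tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{tweets_per_week:,} Tweets / week")              # 3,500,000,000 Tweets / week
print(f"{avg_tweets_per_second:,.0f} Tweets / second")   # 5,787 Tweets / second
```

The average lands just under 6,000; because traffic spikes well above the average, capacity has to be planned for the peaks rather than for this number.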
Open Source Craft (operating principles): Use Open, Assume Open, Define Secret Sauce, Measure Everything, Default to GitHub, Default to Permissive, Acquire and Open, Pay it Forward
Use Open: Use and benchmark open source software by default. When starting a new initiative, always evaluate open source options before reinventing the wheel (e.g., if Redis doesn’t work for you, you had better have solid evidence).
Twitter Runs on Open Source
Define Secret Sauce: Don’t open source anything that represents a core business value. Define your secret sauce so there’s a shared understanding that can guide decisions. Embed this secret sauce within your culture and company via training.
Secret Sauce: what is it? What’s yours?
If you know your secret sauce...
Assume Open: Assume that what you are developing will be opened in the future. Pretend the whole world will be watching. Use third-party dependencies with reasonable licenses to prevent pain down the road (we mostly use Apache’s Third Party Guidelines as a starting point).
Default to GitHub: The GitHub community is the largest open source community, with over three million users. You would be stupid to ignore that fact. Embrace social coding tools to lower the barrier to contribution and participation.
Foundations are Good*: We just prefer not to default to them. We view them as a place for stable projects that have grown into maturity, not as a place to incubate new projects. Our goal is to gain traction as fast as possible; if a project doesn’t, we fail fast and carry on.
Default to Permissive
Be Permissive: For outbound open source software, we default to OSI-approved permissive licenses (the ALv2 in the majority of cases). We do this to maximize adoption and participation, which we favor over control.
Notes from Antirez (BSD): “First of all, open source for me is not a way to contribute to the free software movement, but to contribute to humanity. This means a lot of things, for instance I don't care about what people do with my code, nor if they'll release back their modifications. I simply want people to use my code in one way or the other. Especially I want people to have fun, learn new stuff, and make money with my code. For me other people making money out of something I wrote is not something that I lost, it is something that I gained.” See http://antirez.com/news/48
Acquire and Open*: Include open sourcing software in M&A discussions, especially if you’re mainly acquiring talent or shelving the product. There’s no need for software to go to waste.
Measure Everything: If you can’t measure what you’re doing, you have no idea what you’re doing. We measure everything inside of Twitter (with an internal system affectionately called birdbrain) and make the results accessible to everyone.
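As a minimal sketch of what "measure everything" can look like in code, assuming nothing about birdbrain itself (which is internal to Twitter), the hypothetical `Stats` class below keeps counters and timing samples in process and flattens them into one snapshot that a dashboard could poll. The metric names are made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Stats:
    """Hypothetical in-process metrics registry: named counters plus timing samples."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings_ms = defaultdict(list)

    def incr(self, name, delta=1):
        self.counters[name] += delta

    @contextmanager
    def timer(self, name):
        # Record how long the wrapped block took, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings_ms[name].append((time.perf_counter() - start) * 1000.0)

    def snapshot(self):
        """Flatten counters and timing summaries into one dict a dashboard could poll."""
        out = dict(self.counters)
        for name, samples in self.timings_ms.items():
            ordered = sorted(samples)
            out[f"{name}.count"] = len(ordered)
            # Median (the upper-middle sample when the count is even).
            out[f"{name}.p50_ms"] = ordered[len(ordered) // 2]
            out[f"{name}.max_ms"] = ordered[-1]
        return out


stats = Stats()
for _ in range(3):
    stats.incr("tweets.created")
    with stats.timer("tweets.store"):
        sum(range(10_000))  # stand-in for real work

snap = stats.snapshot()
print(snap["tweets.created"], snap["tweets.store.count"])  # 3 3
```

Making instrumentation a one-line wrapper is what keeps "measure everything" cheap enough to actually do everywhere.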
Pay it Forward: Support open source organizations and projects important to your business; it’s the right and smart thing to do. This can mean financial support or simply staffing projects that are strategic to you.
Open Source Craft*: Use Open, Assume Open, Define Secret Sauce, Measure Everything, Default to GitHub, Default to Permissive, Acquire and Open, Pay it Forward. Note: This fits in a tweet
Twistory: Evolving the Twitter Stack
2006: A simple idea...
[Diagram: Routing, Presentation and Logic all handled by the Monorail (Ruby on Rails); Storage: MySQL]
2008: Growing Pains
[Diagram: Routing, Presentation and Logic handled by the Monorail (Ruby on Rails); Storage: MySQL, Tweet Store, Flock; Cache: Memcache, Redis]
What was wrong? A fragile monolithic Rails code base: one code base did everything from managing raw database and memcache connections to rendering the site and presenting the public APIs. Throwing machines at the problem instead of engineering solutions. Trapped in an optimization corner: trading off readability and flexibility for performance.
Whale Hunting Expeditions: We organized archeology digs and whale hunting expeditions to understand large-scale failures (the “Fail Whale” was the error page Twitter showed when it was over capacity).
Re-envision the system? We wanted big infrastructure wins in performance, reliability and efficiency (reduce the number of machines needed to run Twitter by 10x). Failure is inevitable in distributed systems, so we wanted to isolate failures across our infrastructure. We wanted cleaner boundaries, with related logic in one place: a loosely coupled, service-oriented model at the systems level.
Ruby VM Reflection: We started to evaluate our front-end server tier on CPU, RAM and network. Rails machines were being pushed to the limit: CPU and RAM were maxed out, but network was not (200-300 requests/sec per host). Twitter’s usage was growing, and it was going to take a lot of machines to keep up with the growth curve.
The JVM Solution: We had a level of trust in the JVM from previous experience. The JVM is a mature, world-class platform with a huge, mature ecosystem of libraries and polyglot possibilities (Java, Scala, Clojure, etc.).
Decomposing the Monolith: We created services based on our core nouns: Tweet service, User service, Timeline service, DM service, Social Graph service, ...
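A rough Python sketch of what splitting along nouns buys: each service owns one noun behind a narrow interface, and a service that needs another noun depends on that interface rather than on shared tables. `TweetService`, `InMemoryTweetService` and `TimelineService` are hypothetical names; the real services spoke Thrift, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    text: str


class TweetService(Protocol):
    """The only thing other services may know about tweets."""

    def get_tweets(self, tweet_ids: List[int]) -> List[Tweet]: ...


class InMemoryTweetService:
    """Stand-in for a Tweet service backed by its own store."""

    def __init__(self, tweets: List[Tweet]):
        self._by_id = {t.tweet_id: t for t in tweets}

    def get_tweets(self, tweet_ids: List[int]) -> List[Tweet]:
        # Silently skip ids this service does not know about.
        return [self._by_id[i] for i in tweet_ids if i in self._by_id]


class TimelineService:
    """Owns timelines (ordered tweet ids per user); hydrating tweets is delegated."""

    def __init__(self, tweet_service: TweetService):
        self._tweets = tweet_service
        self._timelines: Dict[int, List[int]] = {}

    def append(self, user_id: int, tweet_id: int) -> None:
        # Newest first.
        self._timelines.setdefault(user_id, []).insert(0, tweet_id)

    def home_timeline(self, user_id: int, limit: int = 20) -> List[Tweet]:
        ids = self._timelines.get(user_id, [])[:limit]
        return self._tweets.get_tweets(ids)


tweet_service = InMemoryTweetService([Tweet(10, 1, "first tweet"), Tweet(11, 1, "second tweet")])
timeline = TimelineService(tweet_service)
timeline.append(1, 10)
timeline.append(1, 11)
print([t.text for t in timeline.home_timeline(1)])  # ['second tweet', 'first tweet']
```

Because `TimelineService` only sees the `TweetService` interface, the in-memory stand-in could be swapped for a remote client without touching timeline logic.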
[Diagram: Routing: TFE (reverse proxy), over HTTP to Presentation: Monorail, API, Web, Search, Feature X, Feature Y; over Thrift to Logic: Tweet Service, User Service, Timeline Service, SocialGraph Service, DM Service; over Thrift* to Storage: MySQL, Tweet Store, User Store, Flock; Cache: Memcached, Redis]
Twitter Stack: A peek at some of our technology: Finagle, Zipkin, Scalding and Mesos
Services: Concurrency is Hard. When we decomposed the monolith, each team took a slightly different approach to concurrency, which led to different failure semantics across teams and no consistent back-pressure mechanism. These failure domains showed us the importance of a unified client/server library to handle failure strategies and load balancing.
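To make the idea of a unified client/server library concrete, here is a hypothetical Python `ServiceClient` that gives every caller the same round-robin load balancing, the same fail-fast back pressure (reject when too many requests are already in flight) and the same retry-on-connection-error policy. It only illustrates the concepts; it is not Finagle's API, and real libraries add timeouts, health-aware balancing and asynchronous futures.

```python
import itertools
import threading


class Overloaded(Exception):
    """Raised instead of queueing when too many requests are already in flight."""


class ServiceClient:
    """Hypothetical shared client: one load-balancing, back-pressure and retry
    policy for every team, instead of each service rolling its own."""

    def __init__(self, hosts, max_in_flight=100, max_attempts=2):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._hosts = itertools.cycle(hosts)  # round-robin load balancing
        self._host_lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._max_attempts = max_attempts

    def _next_host(self):
        with self._host_lock:
            return next(self._hosts)

    def call(self, request):
        # Back pressure: fail fast instead of piling up unbounded work.
        if not self._slots.acquire(blocking=False):
            raise Overloaded("too many requests in flight")
        try:
            last_error = None
            for _ in range(self._max_attempts):
                host = self._next_host()
                try:
                    return host(request)
                except ConnectionError as err:  # retry only connection-level failures
                    last_error = err
            raise last_error
        finally:
            self._slots.release()


def flaky_host(request):
    raise ConnectionError("host down")


def healthy_host(request):
    return f"ok:{request}"


client = ServiceClient([flaky_host, healthy_host])
print(client.call("tweet:1"))  # ok:tweet:1 (the failed first attempt moves on to the next host)

busy = ServiceClient([healthy_host], max_in_flight=0)  # zero capacity, to show fail-fast
try:
    busy.call("tweet:2")
    rejected = False
except Overloaded:
    rejected = True
print(rejected)  # True
```

Rejecting immediately when the in-flight limit is hit pushes load back to the caller instead of letting queues grow silently, which is the consistent back-pressure behavior the slide says was missing.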
Tracing with Zipkin: Zipkin hooks into the transmission logic of Finagle and times each service operation; it gives you a visual representation of where most of the time to fulfil a request went. https://github.com/twitter/zipkin
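The core of the tracing idea can be sketched in a few lines: every timed operation becomes a span carrying a shared trace id, its own span id and its parent's span id, so the spans can be reassembled into a tree and the slow branch spotted. This hypothetical `Tracer` records spans in memory for a single process; Zipkin's actual data model and its Finagle instrumentation propagate these ids across service boundaries, which the sketch does not attempt.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Hypothetical single-process tracer. Each span records a shared trace id,
    its own span id, its parent's span id, a name and a duration."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex[:16]
        self.spans = []
        self._stack = []  # span ids of the currently open spans

    @contextmanager
    def span(self, name):
        span_id = uuid.uuid4().hex[:16]
        parent_id = self._stack[-1] if self._stack else None
        self._stack.append(span_id)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.spans.append({
                "trace_id": self.trace_id,
                "span_id": span_id,
                "parent_id": parent_id,
                "name": name,
                "duration_ms": (time.perf_counter() - start) * 1000.0,
            })


tracer = Tracer()
with tracer.span("GET /home_timeline"):
    with tracer.span("timeline_service.get"):
        time.sleep(0.005)  # simulated downstream call
    with tracer.span("tweet_service.get_tweets"):
        time.sleep(0.05)   # simulated slower downstream call

# Spans are recorded as they finish, so the root comes last.
root, children = tracer.spans[-1], tracer.spans[:-1]
for sp in sorted(children, key=lambda sp: sp["duration_ms"], reverse=True):
    share = sp["duration_ms"] / root["duration_ms"]
    print(f'{sp["name"]}: {sp["duration_ms"]:.1f} ms ({share:.0%} of request)')
```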
Hadoop with Scalding: Services receive a ton of traffic and generate a ton of log and debugging entries. @Scalding is an open source Scala library that makes it easy to specify MapReduce jobs with the benefits of functional programming! https://github.com/twitter/scalding
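Scalding jobs are written in Scala, but the computation they describe follows the classic MapReduce pipeline. As a hedged Python illustration of that pipeline only (not of Scalding's API), the sketch below counts requests per (endpoint, status) pair from log lines in a made-up `<timestamp> <endpoint> <status>` format, with explicit map, shuffle (group by key) and reduce phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit ((endpoint, status), 1) for one log line.
    Assumed (hypothetical) format: '<timestamp> <endpoint> <status>'."""
    _, endpoint, status = line.split()
    yield (endpoint, status), 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


logs = [
    "1700000000 /home 200",
    "1700000001 /home 200",
    "1700000002 /search 500",
    "1700000003 /home 500",
]
mapped = chain.from_iterable(map_phase(line) for line in logs)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
# {('/home', '200'): 2, ('/search', '500'): 1, ('/home', '500'): 1}
```

On Hadoop the map and reduce functions run in parallel across many machines and the shuffle moves data over the network; keeping each phase a pure function is what makes that distribution possible.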
Mesos, Linux and cgroups: Apache Mesos is the kernel of the data center. It obviates the need for virtual machines*, provides isolation via Linux cgroups (CPU, RAM, network, FS), reshapes clusters dynamically based on resources, and supports multiple frameworks, with scalability to 10,000s of nodes.
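Mesos splits scheduling into two levels: the master offers the free resources on each agent to a framework, and the framework decides which of its tasks to launch on those offers. The Python sketch below shows only the framework side of that exchange, with a simple first-fit policy; `Offer`, `Task` and `accept_offers` are hypothetical names, and cgroup isolation, declined offers and failures are left out.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def accept_offers(offers, pending):
    """Framework-side, first-fit placement of pending tasks onto offered resources.
    Returns (placements, unplaced); whatever is left over would be declined."""
    remaining = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    placements, unplaced = [], []
    for task in pending:
        for agent, free in remaining.items():
            if task.cpus <= free[0] and task.mem_mb <= free[1]:
                free[0] -= task.cpus
                free[1] -= task.mem_mb
                placements.append((task.name, agent))
                break
        else:  # no offer had room for this task
            unplaced.append(task.name)
    return placements, unplaced


offers = [Offer("agent-1", cpus=4.0, mem_mb=8192), Offer("agent-2", cpus=2.0, mem_mb=4096)]
tasks = [
    Task("tweet-svc-0", 2.0, 4096),
    Task("tweet-svc-1", 2.0, 4096),
    Task("hadoop-map-0", 2.0, 2048),
    Task("big-job", 8.0, 1024),
]
placements, unplaced = accept_offers(offers, tasks)
print(placements)  # [('tweet-svc-0', 'agent-1'), ('tweet-svc-1', 'agent-1'), ('hadoop-map-0', 'agent-2')]
print(unplaced)    # ['big-job']
```

Packing a long-running service and batch work onto the same agents is the kind of sharing across multiple frameworks that the next slide credits for better utilization.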
Data Center Computing: Reduce CapEx/OpEx via efficient utilization of HW (http://mesos.apache.org). [Chart: cluster utilization percentages; callouts: “reduces CapEx and OpEx!”, “reduces latency!”]
How did it all turn out? Not bad... not bad at all... Where did the fail whale go?
Site Success Rate Today :) [Chart: site success rate from 2006 to 2014, y-axis from 99._% to 100%; annotations: “not a lot of traffic”, “off the monorail”, “World Cup”]
Performance Today :)
Growth Continues Today: 2,500+ employees worldwide; 50% of employees are engineers; 255M+ active users; 500M+ Tweets per day; 35+ languages supported; 76% of active users are on mobile; 100+ open source projects.
Concluding Thoughts: Lessons Learned
Lesson #1: Embrace open source. Best-of-breed solutions are open these days. Learn from your peers’ code and university research. Don’t only consume; give back to enrich the ecosystem: http://opensource.twitter.com
Lesson #2: Incremental change always wins. Increase your chance of success by making small changes. Small changes add up with minimized risk. Loosely coupled microservices work.