Enterprise Architecture with Ruby (and Rails) Building amazing products, companies and technology using Ruby on Rails and friends. An opinionated overview for MagmaRails.MX by Konstantin Gredeskoul, CTO, Wanelo, Inc. Twitter: @kig, GitHub: github.com/kigster
My Background CTO @ Wanelo.com — “Pinterest for shopping” Principal @ ModCloth.com — one of the largest independent e-commerce Rails sites Principal @ Blurb.com — print-on-demand bookstore and a large e-commerce web site Professionally building enterprise software since 1995 Converted from Java/Perl/C to Ruby in 2006
What is Enterprise? It’s an organization with many people, services, technologies Enterprise architecture is an ongoing business function that helps an 'enterprise' figure out how to best execute the strategies that drive its development [ref: wikipedia]
From Start-Up To Enterprise Many modern enterprises started small, as tiny start-ups Many start-ups choose RoR for productivity As the start-up grows, so does the technology, applications, and the stack.
Teams using RoR can be very productive Productivity is super important for unproven young companies trying things out “Build quickly, iterate, avoid building features users don’t need” — Lean Start-Up Movement Do not optimize “prematurely”, but think about tomorrow’s scalability when building today.
Productivity vs Scale: The Dilemma! To move fast - we use Ruby (dynamic languages), a framework (Rails), cloud, a familiar database, and keep the team small To truly scale an application - need multiple languages (Java, C/C++, Scala), custom or no frameworks, datacenter, large team
But does everyone need mega scale? The majority of Rails projects are OK without mega-scale (only a tiny fraction is like Twitter or Facebook). Ruby/Rails can happily grow into a large application without major rewrites. The best assurance that an application will grow well with its use is to follow best practices.
So what is this talk about? How to start small, but move fast. How to evolve a Rails app, but keep it scalable. How to split things up when the app gets large, and keep everyone sane.
Part 1: How to start small, but move fast
Get a great team together Keep team size small, 4-6 developers is ideal Have at least 2-3 ruby/rails/front-end experts on the team Do automated testing (and TDD) from the beginning. Hard to add later.
Process matters Pair Programming is amazing. Level the field, transfer knowledge, build trust within the team, move faster. Morning stand-ups, weekly sprint planning, technical discussions as needed, retrospectives. Dedicated graphic designer/UX researcher, and a Product Manager.
Everyday tools matter RubyMine IDE is very powerful, but $69. Other tools also work: Vim, TextMate. When pairing, using a consistent toolset is very important. Pick it and stick to it. If everyone has their own laptop, create a common OS account and use it to pair.
Communication is key Continuous Integration server runs all automated tests (Jenkins is great!). Everyone knows when tests break! The Pivotal CI Monitor open-source app pulls results from Jenkins.
Communication is key Use chat (e.g., Campfire) to notify the team about check-ins, deploys, or failed builds. Review others’ commits (e.g., on GitHub) to learn as much of the code as possible. Take care of your teammates, and do care about the project. Success depends on it.
A few more awesome tools* iTerm2 - free, mega-awesome Terminal replacement (Cmd-D/Cmd-Shift-D) SizeUp - align windows on the screen right/left/up/down/middle iStat Menus - view CPU, network I/O, disk in the Mac OS X menu bar CCMenu - view CI results in your menu bar
Choice of libraries matters MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing. Devise for authentication and user management. Twitter Bootstrap for early UI is amazing, although we prefer SCSS instead of LESS. HAML for views, RABL for APIs.
Data matters the most Relational Databases: PostgreSQL, MySQL High consistency, reliability, decades of research, great performance, gets tricky at mega-scale BigTable based: MongoDB, HBase Eventual consistency, recent, have indexes, almost table-like. Also tricky at mega scale. Amazon Dynamo like: RIAK, Voldemort Distributed hash-table, tricky from the very beginning.
What to choose? Without a strong reason otherwise, choose a relational database. I prefer PostgreSQL. Instagram scaled on PostgreSQL very well If under pressure and in doubt, it’s OK to choose whatever you are familiar with.
Part 2: How to evolve a Rails App
New Rails Project: Day 1
rails new my-awesome-app
cd my-awesome-app
rake db:migrate
(ruby 1.9.3-p125, Rails 3.2.3, MacBook Air 1.8GHz)
1. Starting Up One app server, one DB, 10 unicorns per app server. nginx for static assets. PostgreSQL for data. Always put your DB on a separate server. All running in the cloud.
1. Starting Up Simple, but no app server redundancy, limited throughput 10 unicorns = 10 concurrent requests at any one time
2. Growing Up Split into multiple App Servers HAProxy to distribute load nginx for static files found on local file system, proxy requests otherwise
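The static-vs-proxy split above can be sketched in an nginx server block (ports and paths here are hypothetical): serve the file from the local file system if it exists, otherwise hand the request to the load balancer in front of the unicorns.

```nginx
# Hypothetical nginx config: static assets from disk, everything else proxied.
upstream app_cluster {
    server 127.0.0.1:8080;  # HAProxy (or a unicorn master) listens here
}

server {
    listen 80;
    root /var/www/my-awesome-app/current/public;

    # Serve the file if it exists on the local file system...
    try_files $uri @app;

    # ...otherwise proxy the request to the app servers.
    location @app {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://app_cluster;
    }
}
```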
2. Growing Up Site usage grows. Responses get slow. Started at 150ms, then 400ms, then 700ms....
3. Scaling Up Add MemCached (1Gb+) Use Redis (or cookies) for sessions(reduce db load) Add action caching(even short TTL helps, i.e. 1min) Use AJAX to personalize pagesto make them cacheable*
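The short-TTL advice above can be sketched with a tiny TTL cache in plain Ruby (stdlib only, names hypothetical); in a Rails 3.x app the equivalent is `caches_action :show, :expires_in => 1.minute` backed by memcached.

```ruby
# Minimal sketch of TTL-based caching. Even a 60-second TTL means a hot
# page is rendered once a minute instead of on every request.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @store = {}
  end

  # Return a fresh cached value, or run the block, cache its result
  # for ttl seconds, and return it.
  def fetch(key, ttl)
    entry = @store[key]
    return entry.value if entry && Time.now < entry.expires_at
    value = yield
    @store[key] = Entry.new(value, Time.now + ttl)
    value
  end
end

cache   = TtlCache.new
renders = 0

3.times do
  cache.fetch("views/products/42", 60) do
    renders += 1
    "<html>product page</html>"
  end
end

renders # => 1 -- two of the three requests never touched the "renderer"
```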
Personalization with AJAX - A brief detour A logged-in (or not) user requests a page... The page is served from the cache without any personalization (no “Hi John!”, “Logout”, etc). On document.ready, AJAX hits the server and gets tiny JSON data for the current user (or “not logged in”). JS modifies the DOM to show the user’s logged-in state and any other personalization, or “Log In”.
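A sketch of the tiny per-user JSON the document.ready AJAX call would fetch (field names are hypothetical); the point is how small this response is compared to a fully personalized HTML page.

```ruby
require "json"

# Hypothetical current-user payload for the AJAX call. A nil user means
# "not logged in"; otherwise only the few fields the page needs to
# personalize itself are returned.
def current_user_json(user)
  return { logged_in: false }.to_json if user.nil?
  {
    logged_in:  true,
    name:       user[:name],
    cart_count: user[:cart_count]
  }.to_json
end

current_user_json(nil)
# => '{"logged_in":false}'
current_user_json(name: "John", cart_count: 3)
# => '{"logged_in":true,"name":"John","cart_count":3}'
```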
Personalization with AJAX - How?
Personalization with AJAX - Where?
Personalization with AJAX - Why? Because entire page can be served from the cache (often 50Kb+ per request) No ActiveRecord and no rendering makes it really fast! Recent rough test using Rails 3.2.3, ruby 1.9.3-p194, memcached: 4ms latency!!!
Why not page caching? Because unlike action caching, page caching is file-system based. Because it’s more difficult to expire Because it’s more difficult to share across many servers
4. Scaling Images We are serving lots of images. Nginx is getting slammed. Should we add more balancers? Write our own? HELLZ NO!
4. Scaling Images Don’t wait to use a CDN to SERVE images, especially user-uploaded images. S3 is a popular choice to STORE images. But it’s smart to keep a local backup copy =)
5. Deployments and Downtime Our site is popular! And our users hate downtime. They really really do.
5. Deployments and Downtime We want to be able to deploy the code while the site is running. So users are happy. There are several ways to do that. This solution uses DNS round robin with two balancers, and two public IP addresses.
Two Cluster Solution = Almost Zero Downtime
Temporary Redirect Rule
Two Clusters are cool! Cluster 1 runs old code and is live Cluster 2 gets new code Old and new run in parallel, but only one is serving live traffic
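One way to implement the temporary-redirect piece of the two-cluster dance is a small Rack-style middleware (the flag path and target URL below are hypothetical): touch a flag file on the cluster being deployed, and it answers every request with a 307 pointing at the live cluster.

```ruby
# Hypothetical Rack-style middleware for the two-cluster deploy: while a
# cluster is being switched over, touch a flag file there and every request
# gets a temporary (307) redirect to the live cluster instead of an error.
class TemporaryRedirect
  def initialize(app, flag_path, target_url)
    @app, @flag_path, @target_url = app, flag_path, target_url
  end

  def call(env)
    if File.exist?(@flag_path)
      [307, { "Location" => @target_url }, ["deploy in progress, redirecting"]]
    else
      @app.call(env)
    end
  end
end

# The downstream app (normally your Rails stack) as a minimal Rack lambda.
app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["hello"]] }
stack = TemporaryRedirect.new(app, "/tmp/deploying.flag", "http://cluster2.example.com/")
status, = stack.call({})
# status is 200 unless /tmp/deploying.flag happens to exist on this machine
```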
Migrations with Zero Downtime? Almost possible on a live system, if: we are not removing or renaming columns or tables in active use, and migrations do not lock tables (for too long). Column/table renames and deletes can be done in two deployments instead of one, without downtime.
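A sketch of the two-deployment rename mentioned above, using a hypothetical users.login to users.username rename. The Migration class here is a tiny stand-in for ActiveRecord::Migration so the sketch runs anywhere; the deploy choreography in the comments is the actual technique.

```ruby
# Two-deployment rename sketch for a hypothetical users.login -> users.username.
class Migration
  STEPS = [] # records schema operations instead of executing SQL

  def self.add_column(table, column, _type)
    STEPS << "add #{table}.#{column}"
  end

  def self.remove_column(table, column)
    STEPS << "remove #{table}.#{column}"
  end
end

# Deployment 1: add the new column. The model writes to BOTH columns but
# still reads the old one; old rows are backfilled in small batches so
# no single statement holds a lock for long.
class AddUsernameToUsers < Migration
  def self.up
    add_column :users, :username, :string
    # then, outside the migration: batched UPDATE users SET username = login
  end
end

# Deployment 2: reads now use users.username and no running code touches
# users.login, so dropping it is safe.
class RemoveLoginFromUsers < Migration
  def self.up
    remove_column :users, :login
  end
end

AddUsernameToUsers.up
RemoveLoginFromUsers.up
Migration::STEPS # => ["add users.username", "remove users.login"]
```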
So the app is now faster, and we can deploy without downtime. What about email and other long-running tasks? Don’t forget SPF records.
Background Jobs with Resque But monitor its queues. Must restart on reboot. resque-cleaner is awesome!
Different queues for different types of jobs. Relatively easy to implement priorities for jobs (order queues by priority). Group jobs by execution times to avoid delays.
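A sketch of what queue-based priorities look like with Resque (class and queue names are hypothetical): each job class pins itself to a queue via @queue, and the worker's queue order becomes the priority order.

```ruby
# Hypothetical Resque job classes. Workers started as
# `QUEUE=critical,high,low rake resque:work` drain queues in that order,
# which acts as job priority.
class SendReceiptEmail
  @queue = :critical # user-facing, must go out fast

  def self.perform(order_id)
    # In a real app: Mailer.receipt(order_id).deliver
    "sent receipt for order #{order_id}"
  end
end

class RecomputeRecommendations
  @queue = :low # slow batch work, fine to wait

  def self.perform(user_id)
    "recomputed recommendations for user #{user_id}"
  end
end

# In the app you'd enqueue with: Resque.enqueue(SendReceiptEmail, order.id)
SendReceiptEmail.perform(42) # => "sent receipt for order 42"
```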
DB Usage and complexity grows. We are doing big joins with many tables, and they are taking their sweet time.
Solr to the Resque Use Solr instead of doing complex joins. Solr reads are < 10ms. The Sunspot gem by default writes to Solr from each Ruby VM (i.e. each unicorn)! Serialize writes with Resque! One Solr master for writes, read replicas on each app server.
Putting it together
At this size... Automate everything: Chef or Puppet is awesome. Monitor everything: tolerate reboots, restarts, partial failures. Use the OS services layer to start/stop everything: ensures recovery after reboot. Capistrano tends to get “complex”; you can also deploy with Chef.
Choose Vendors Wisely You can pick your own, but here is my list: Clouds - JOYENT, EngineYard (fastest I/O cloud, but on a Solaris derivative). Automation - Chef + OPSCODE. Caching/CDN - FASTLY.COM (Varnish-based CDN, very fast, full power of VCL configuration). Metrics and Performance - NewRelic (turnkey solution, getting better every day).
In Development Use foreman to start dependent services (Solr, Redis, Resque) from a Procfile Do “Just enough” testing with Solr - it’s slow! Deploy to single-box demo servers often using the same Capistrano scripts used for production
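A minimal Procfile sketch for foreman (the exact commands will vary with your setup; the sunspot-solr and redis-server lines are assumptions about how those services are installed):

```
web:    bundle exec unicorn -c config/unicorn.rb
solr:   bundle exec sunspot-solr run
redis:  redis-server ./config/redis.conf
worker: bundle exec rake resque:work QUEUE=*
```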
Mature App: Day 700
cd my-awesome-app
rake rspec:models rspec:controllers rspec:libs
rake cucumber:webrat cucumber:selenium jasmine
How Big Exactly? 200+ models 200K+ lines of RUBY source code without gems 100K+ lines of ERB, HTML and HAML templates 100+ gem dependencies this is a real world application that’s in production today.
Is that too big? This cat’s name is Lenin
Here is why I think it is. 1.5+ hours for the full test suite to complete: 10 mins of db seeds, 30 minutes for unit tests alone, etc. Merges often result in integration tests going RED. 20-second boot-up time for the Rails env (rails console, etc.)! 500Mb of RSS RAM for one single-threaded web process. It’s a difficult undertaking to upgrade dependencies and Rails.
It was much nicer when it was a bit smaller..
Let’s zoom in... Is PERFORMANCE of the app an issue? NO! 150ms per request avg Is SCALABILITY of the app an issue? NO! 8000+ concurrent users Is RELIABILITY of the app an issue? NO! barely any downtime in over one year
Then WTF is the Problem? Is PRODUCTIVITY of developing the app an issue? YES! Lots of waiting all the time. Is MERGING source code between parallel projects difficult? YES! 30+ people sharing a large codebase. Is KEEPING THE TEST SUITE GREEN challenging? YES! Tests are brittle and long-running.
But wait, there’s more! What about DEPLOYMENT of a large app? Takes a long time, and small tweaks require full deploys. What about HOSTING COSTS? Necessary to provide enough RAM for the app to be scalable.
RAM? Latency matters... 1 request = 200ms average latency. 5 reqs/second on a single-threaded Ruby VM process. 30,000 RPM = 500 reqs/sec = 100 processes = 50Gb of RAM @ 200ms latency (at 500Mb per process). If average latency is 600ms, you need 150Gb of RAM !!!
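The slide's arithmetic, written out as a function. It assumes a single-threaded process serves one request at a time and uses 500Mb of RSS (per the earlier slide), and uses 1000Mb = 1Gb to match the slide's round numbers.

```ruby
# Capacity math: how many processes, and how much RAM, a single-threaded
# app needs at a given average latency and request rate.
def capacity(latency_sec:, rpm:, mb_per_process: 500)
  reqs_per_sec_per_process = 1.0 / latency_sec   # 200ms -> 5 req/s
  processes = (rpm / 60.0 / reqs_per_sec_per_process).ceil
  { processes: processes, ram_gb: processes * mb_per_process / 1000.0 }
end

capacity(latency_sec: 0.2, rpm: 30_000) # => {:processes=>100, :ram_gb=>50.0}
capacity(latency_sec: 0.6, rpm: 30_000) # => {:processes=>300, :ram_gb=>150.0}
```

Shaving average latency is a direct hosting-cost win: three times the latency means three times the processes, and three times the RAM.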
Smaller is actually better.
So how do we solve this?
Part 3: How to split things up
A couple of main themes Break up into smaller applications (vertical). Extract services and create APIs. Extract libraries (gems).
Smaller Applications Contain web GUI, logic, and data May combine with other apps May rely on common libraries May rely on services Typically run in their own Ruby VM
Consider a Typical E-Commerce Store Users must be able to register, login, logout (profiles). Users must be able to browse and search products, view them, and add to cart. Users must be able to check out. Probably many other stories, such as admin, but we’ll ignore them for now.
One idea... Application 1: Marketing, Product Catalog Browser, Search + Product Detail Page. Application 2: Checkout, Payment, Order History, Returns, Fulfillment. Very clear user-flow transfer and data separation.
Some things can be shared Service: Single Sign-on, User profiles, Login/Registration[devise?, rest-ful authentication?] Service:Product Catalog data, Inventory Data Service:Comments, Votes, Ratings, Reviews
Services Technologies Rack/Sinatra/Rails are popular, and are often an entirely sufficient choice Goliath is awesome if performance is important, and if the service is mostly I/O bound node.js is also a popular choice Implementation may change in the future, as long as the API stays consistent
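A minimal Rack-compatible service sketch (the endpoint and data are hypothetical). Any Rack server can run an object like this; Sinatra or Rails would add routing and helpers on top of the same call(env) contract, which is what lets the implementation change later as long as the API stays consistent.

```ruby
require "json"

# Hypothetical product-catalog service as a bare Rack app.
class ProductService
  PRODUCTS = {
    "1" => { "id" => "1", "name" => "Red Scarf", "price_cents" => 1999 }
  }

  # Rack contract: call(env) -> [status, headers, body]
  def call(env)
    id = env["PATH_INFO"].split("/").last
    if (product = PRODUCTS[id])
      [200, { "Content-Type" => "application/json" }, [product.to_json]]
    else
      [404, { "Content-Type" => "application/json" }, [{ "error" => "not found" }.to_json]]
    end
  end
end

status, _headers, body = ProductService.new.call("PATH_INFO" => "/products/1")
status     # => 200
body.first # JSON for the Red Scarf product
```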
Extract look and feel (CSS/UI) into a gem to share across apps Create client API wrapper gems for consumers Create a single shared “base” client gem library
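A sketch of what a client API wrapper gem might expose (all names here are hypothetical). The HTTP transport is injected, so the shared “base” client gem can own connections, retries, and auth, while consumers and their tests stay decoupled from it.

```ruby
require "json"

# Hypothetical client wrapper gem for the product-catalog service.
module CatalogClient
  class Client
    # transport: anything that responds to get(path) and returns a JSON string
    def initialize(transport:)
      @transport = transport
    end

    def product(id)
      JSON.parse(@transport.get("/products/#{id}"))
    end
  end
end

# A fake transport standing in for the real one (which would wrap Net::HTTP).
fake_transport = Object.new
def fake_transport.get(path)
  { "id" => path.split("/").last, "name" => "Red Scarf" }.to_json
end

client = CatalogClient::Client.new(transport: fake_transport)
client.product("1") # => {"id"=>"1", "name"=>"Red Scarf"}
```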
Rails App with < 30 Models Can run tests pretty quickly, hopefully under 5 minutes. Is often large enough to describe typical “clusters of functionality”, i.e. a mini-app. Ruby VM might actually stay under 100Mb of RSS RAM. Is more comprehensible and can be effectively maintained by a small dev team.
More Apps? Application 3:Administrative Interface, Catalog Management,Customer Support Application 4:System Status, Monitoring, Deployment[client of product service] Application 5:Analytics, Predictions, Models, Intelligence, DW[client of product service]
3rd Party Integrations
Ecosystem of Applications Is inevitable in large companies. Scales better from a team perspective. Offers decoupling and implementation hiding. Can be individually optimized and scaled.
But then...Must every app know about every other app?
API Proxy / Router
Example: Order Placed Warehouse Management System needs to be updated Analytics Engine needs to be notified Financials needs to be updated Question: which component is responsible for updating each application?
Part 3: Event Driven Architectures
1995 Was Great GoF Design Patterns: Observer. “...One-to-Many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically...”
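The Observer pattern quoted above ships in Ruby's standard library; here is the in-process version (class names hypothetical) that the following slides then distribute across machines with messaging middleware.

```ruby
require "observer" # GoF Observer, straight from the Ruby standard library

class Order
  include Observable

  def place!
    changed                                # mark state as dirty
    notify_observers(:order_placed, self)  # push to every subscriber
  end
end

class AnalyticsEngine
  attr_reader :events

  def initialize
    @events = []
  end

  # Called by Observable with whatever notify_observers passed.
  def update(event, _source)
    @events << event
  end
end

order     = Order.new
analytics = AnalyticsEngine.new
order.add_observer(analytics)
order.place!
analytics.events # => [:order_placed]
```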
Distributed Version Publish/Subscribe and Point-to-Point Asynchronous Middleware
RabbitMQ is Great
Some Options for Pub/Sub RabbitMQ, with the ruby-amqp gem as the interface. EventMachine::Channel.
Other Distributed Options DRb - distributed Ruby (also Rinda, Starfish, beanstalkd, etc). DCell - actor-based, built on 0MQ: http://www.unlimitednovelty.com/2012/04/introducing-dcell-actor-based.html All of them are a bit too low-level for sharing and consuming business events.
I would love a library for publishing business events, built on top of one of these.
Future Library Hides complexities of queues and exchanges Consumers declare interest in events they care about, define persistence and retry policy Publishers fire! events and forget about it
Future Library, ctd. Once registered, consumers get messages even after being offline. When a publisher submits an event to the queue, its job is done. A library of business events becomes a complement to the set of business APIs.
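A sketch of the API such a library might expose (all names are hypothetical, since the library doesn't exist yet). An in-memory bus stands in for RabbitMQ here; persistence, offline delivery, and retry policy are exactly the parts the real library would add.

```ruby
# Hypothetical fire-and-forget business-events API.
module BusinessEvents
  @subscribers = Hash.new { |hash, key| hash[key] = [] }

  # Consumers declare interest in the events they care about.
  def self.subscribe(event, &handler)
    @subscribers[event] << handler
  end

  # Publishers fire! events and forget about them; delivery guarantees
  # would be the library's problem, not the publisher's.
  def self.fire!(event, payload = {})
    @subscribers[event].each { |handler| handler.call(payload) }
    true
  end
end

notified = []
BusinessEvents.subscribe(:order_placed) { |event| notified << event[:order_id] }
BusinessEvents.fire!(:order_placed, order_id: 123)
notified # => [123]
```

With this shape, the "order placed" question from the earlier slide answers itself: the checkout app fires one event, and warehouse, analytics, and financials each subscribe independently.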
I don’t think this library exists yet, but I would like to write one soon =)