Agenda
● Docker Journey with SalesforceIQ
● Lessons Learned
● PaaS/CaaS
Docker Journey with SalesforceIQ
Two years in production...
What is production?
● Production != test or dev
● Isolation, Security, Performance, Monitoring, Logging…
● Scale, templates, automation…
What is successful?
● >99% uptime or low # of outages?
● Fast code deployment?
● 0 Security Incidents?
100% of our web infrastructure running with Docker. Boom.
SalesforceIQ journey into production
[Timeline figure: 2013Q4, 2014Q1, 2014Q2, 2014Q4, 2015, 2015+, Now. Milestone labels: Dev/Ops Environment, Dev Environment, Continuous Deployment, Full Stack Container in TeamCity, Integrations, Batch Jobs, Azkaban, Kafka, DockerMe CLI, Web Zero Downtime Deployments, Craft CMS Main Website, Beanstalk, Devenv 2.0, Mesos, PaaS]
What we’ve put in containers
[Chart: items plotted by Dependencies and Rate of Change]
● Dev or Ops Environment
● Web Server
● Api Server
● Database
● Batch Jobs
● CI/CD Server
● Integrations
What we’ve put in containers
[Chart: the same items, with regions labeled Stateless, Stateful, Long-Life, Short-Life]
● Dev or Ops Environment
● Web Server
● Api Server
● Database
● Batch Jobs
● CI/CD Server
● Integrations
Zoom in a little
[Diagram: lifecycle stages Create, Deploy, Run, Operate; legend: Fully / Somewhat / Not Dockerized]
● Dev Environment
● CI / CD
● Web
● Middleware / Integrations / Internal Tools / Scripts / Jobs
● Batch & Stream processing
● Ops Environment
● Persistent Storage
● Monitoring
● Logging
● Security
Lessons Learned
A lot...
Lots of tidbits
● Docker is prod ready but many surrounding solutions are not (alpha and beta)
  o Caution with the new toys is required
● Don’t go straight towards a PaaS if you're just starting out
  o Kubernetes, Mesos, CoreOS, Swarm, ECS
● Keep it simple
  o Know what works and what doesn’t
  o Know how to scale what you're doing
● Learn from others, tons of people are in production now
  o Read the whole internet
● You can secure running containers
  o Twistlock, Conjur, BanyanOps
● Get creative
  o Docker is golden and mobile
● Old tools still work great, and I’ll show you how
● You're going to have to roll your own at some point (orchestration)
  o As of version 1.5.11, HAProxy does not support zero downtime restarts or reloads of configuration.
You can docker with Chef, Ansible, SaltStack...
• You can use the tools you have today if you're not dockerized already
• What…
• But those are the tools I’m already using...
• Yes, they still work, and work great
Our current prod web server
● Worked with all our existing tools!
  ○ Chef, Monitoring, Logging
● Security didn’t change
  ○ Security keys
  ○ Firewall
● Super easy to scale
  ○ Could pack with Packer to create AMI
  ○ Shell script was super easy
● Zero downtime
● Rollbacks
[Diagram: Amazon AMI set up with Chef, running a Hipache/Redis container in front of Web Container v1 and Web Container v2; a cron job runs a shell script to orchestrate the containers]
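The zero-downtime swap behind this slide can be sketched roughly as follows. This is a minimal illustration and not the actual SalesforceIQ script: the function name, container names, ports, and health-check URL are all hypothetical, and it assumes Hipache's Redis layout in which each frontend is a list stored under `frontend:<domain>` with backend URLs appended to it. The function only builds the ordered commands so the plan can be inspected before anything runs.

```python
from typing import List


def plan_swap(domain: str, old_backend: str, new_backend: str,
              old_container: str, new_container: str,
              image: str, host_port: int) -> List[List[str]]:
    """Return the ordered commands for a zero-downtime container swap.

    Nothing is executed here. A cron-driven wrapper could run each
    command with subprocess.run(cmd, check=True) and stop at the first
    failure, which leaves the old container serving traffic.
    """
    key = f"frontend:{domain}"  # assumed Hipache key for this frontend
    return [
        # 1. Start the new version next to the old one
        #    (assumes the web process listens on port 80 in the container).
        ["docker", "run", "-d", "--name", new_container,
         "-p", f"{host_port}:80", image],
        # 2. Check that the new container answers before routing to it.
        ["curl", "--fail", "--silent", f"http://127.0.0.1:{host_port}/"],
        # 3. Add the new backend to the Hipache frontend list.
        ["redis-cli", "rpush", key, new_backend],
        # 4. Remove the old backend so no new requests reach it.
        ["redis-cli", "lrem", key, "0", old_backend],
        # 5. Stop and remove the old container.
        ["docker", "stop", old_container],
        ["docker", "rm", old_container],
    ]
```

A rollback under the same assumptions is the same plan with the old and new arguments exchanged, provided the previous image is still available on the host.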
Demo
It’s time
#1 thing we found!!!!
You WILL have disk/file system issues...
File system...
What we used over time
1. Started with AUFS - hit 42 layer limit
2. Then moved to device mapper
   a. Device/Volume not found
   b. NNOOOOOOOOOO
3. Back using AUFS again after bug fixes and 42 layer limit removal
   a. Continue to fight layer issues, mount issues
4. Back to device mapper with Docker 1.7 dynamic binaries!
Issues we hit
• Volumes not unmounting
• Long deletion times on device mapper: --storage-opt dm.blkdiscard=false
• Kernel version matters!
• Great visual deep dive: http://merrigrove.blogspot.com/2015/10/visualizing-docker-containers-and-images.html?m=1
What we’ve landed on
• Ubuntu = AUFS
• Amazon Linux = Device mapper
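The "what we've landed on" mapping can be written down as a small helper that picks daemon storage flags per host distro. This is a sketch under assumptions: the function name and distro keys are hypothetical, and it simply pairs the slide's choices with Docker's `--storage-driver` and `--storage-opt` daemon flags. How those flags reach the daemon (init script, defaults file) depends on the host and is not shown.

```python
from typing import List

# Mapping taken from the slide: Ubuntu = AUFS, Amazon Linux = device mapper.
# The dictionary keys are illustrative labels, not values Docker reads.
STORAGE_DRIVER_BY_DISTRO = {
    "ubuntu": "aufs",
    "amazon-linux": "devicemapper",
}


def daemon_storage_args(distro: str) -> List[str]:
    """Build the storage-related arguments for the Docker daemon."""
    driver = STORAGE_DRIVER_BY_DISTRO[distro]  # KeyError for unknown distros
    args = ["--storage-driver", driver]
    if driver == "devicemapper":
        # The slide's workaround for long deletion times on device mapper.
        args += ["--storage-opt", "dm.blkdiscard=false"]
    return args
```

Keeping the choice in one place makes it easier to change again, which the history above suggests may happen more than once.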
Get a good registry
1. We started with a private registry
   a. went insane with buggy releases, failed pulls/pushes
2. Went to quay.io
   a. happy but slow, and costs $$
3. Back to private registry 0.9 release… now stable
4. Scaled it and working great
5. Now working on upgrading to Docker Registry 2.1
Great options
• Hub.docker.com
• Quay.io
• Trusted registry
• Google
• Azure
• AWS
• S3.. no registry… save/load
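The last option, skipping a registry and moving images through S3 with save/load, might look like the sketch below. It assumes the `docker save`/`docker load` commands and the AWS CLI's `aws s3 cp`; the function name, bucket, key, and tarball path are hypothetical placeholders. As with any plan-style helper, it returns the commands instead of running them.

```python
from typing import List, Tuple

Command = List[str]


def ship_image_commands(image: str, bucket: str, key: str,
                        tarball: str = "/tmp/image.tar"
                        ) -> Tuple[List[Command], List[Command]]:
    """Return (publish, fetch) command lists for registry-less distribution."""
    s3_url = f"s3://{bucket}/{key}"
    publish = [
        ["docker", "save", "-o", tarball, image],  # export image to a tar file
        ["aws", "s3", "cp", tarball, s3_url],      # upload the tar file
    ]
    fetch = [
        ["aws", "s3", "cp", s3_url, tarball],      # download on the target host
        ["docker", "load", "-i", tarball],         # import into the local daemon
    ]
    return publish, fetch
```

The trade-off is that every transfer moves the whole image tarball, since layer-level caching is something a registry provides.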
Scaling our registry
• 100% AWS
• Beanstalk
  -Autoscale
• Redis Cache
  -Had issues when a node failed
• S3 Backend Storage
  -Unlimited
  -Cheap
  -Had huge issues on layer corruption
[Diagram: ELB in front of a Beanstalk auto scaling group running the Docker Registry web service, with an Elasticache (Redis) cache and S3 storage]
Isolation is your friend
Low container to host ratio
• Compute: Spikey processing… no problem
• Storage: Out of disk… no problem
• Networking: Shared bandwidth… no problem
• RAM: Swapping issue… no problem
• Security Groups: Least privilege… no problem
[Diagram: Amazon AMI set up with Chef, running a Hipache/Redis container in front of Web Container v1 and Web Container v2; a cron job runs a shell script to orchestrate the containers]
Beanstalk architecture
• Run 50+ services on Beanstalk today
• Automagically built web container per branch of code
• Corp site/Help site
• 100% automated!!
• Great for web services SOA
• You will have disk issues
[Diagram: Beanstalk (CloudFormation) providing DNS service discovery, Load balancer, SSL Termination ELB, EC2 Server, Autoscaling, Container Isolation, Security Groups, Environment Variables, Storage, Easy to spin up RDS]
One year ago
• CoreOS... so cool
• Mesos… cool with scale
• Beanstalk… with docker support
• Swarm… beta
• Deis… oooo saas
• ECS… ok now we're getting somewhere
• Kubernetes… where did that come from… looks cool too
Now…..
• Kubernetes on top of DCOS, on top of Mesos, on top of CoreOS… facepalm
PaaS/CaaS Overview
[Comparison table. Columns: CoreOS, DCOS, Kubernetes, ECS. Rows: Orchestration, Scheduler, Resource Allocation, Service Discovery, More than Containers, Health Check, Storage clustering..., Live Migration..., Affinity rules...]
Being successful with a PaaS/CaaS
Our DCOS Architecture
• Built an edge router
• Built a Brain router
• Infra CLI
• Can be internal as well
• This will run all of our stateless services
[Diagram components: DNS, SSL Termination ELB, Mesos Master, Marathon, Mesos Public Slave (Auto Scaling) with two Edge Routers, Service Discovery, Public <> Private DNS, Health Check, API Change Event Bus, Mesos Private Slave (Auto Scaling) with Services and Health Checks, InfraIQ Intelligence, Storage (DB1, DB2, DB3)]
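To make the edge router idea concrete, here is a toy sketch of the routing decision such a component has to make: keep a table of backends per service, track health-check results, and hand out only healthy backends. This is not the router described on the slide. The class, method names, and round-robin policy are illustrative assumptions, and a real router would learn backends from service discovery instead of manual registration.

```python
from itertools import count
from typing import Dict, List


class EdgeRouter:
    """Toy routing table: service name -> healthy backends, round robin."""

    def __init__(self) -> None:
        self._backends: Dict[str, List[str]] = {}
        self._healthy: Dict[str, bool] = {}
        self._counter = count()  # shared counter driving round robin

    def register(self, service: str, backend: str) -> None:
        """Add a backend for a service; it starts out marked healthy."""
        self._backends.setdefault(service, []).append(backend)
        self._healthy[backend] = True

    def report_health(self, backend: str, ok: bool) -> None:
        """Record the latest health-check result for a backend."""
        self._healthy[backend] = ok

    def pick(self, service: str) -> str:
        """Return the next healthy backend, or raise if none are left."""
        live = [b for b in self._backends.get(service, [])
                if self._healthy.get(b, False)]
        if not live:
            raise LookupError(f"no healthy backend for {service!r}")
        return live[next(self._counter) % len(live)]
```

Because unhealthy backends are filtered at pick time, a backend that recovers rejoins the rotation as soon as its next health report arrives.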
Summary
• Starting out? Just use the same tools you have
• You’ll need to roll up your sleeves
• Security is not hard but you need to think about it
• Many vendors are entering the container space
• Build towards a PaaS
• Many solutions to PaaS
• Know what you're trying to solve
• Have fun!
Thank you! John Fiedler @johnfiedler email@example.com