Accelerating an Ad-Tech Service with OpenStack Cloud
@OpenStack Summit 2015 Vancouver
CyberAgent, Inc.
Makoto Hasegawa, Ryo Tagami
Hello Vancouver! We came from Tokyo!
Agenda
• About CyberAgent
• What is Ad-Tech in CyberAgent
• What is required for infrastructure of Ad-Tech
• Why we chose OpenStack
• OpenStack in CyberAgent
• Big picture of our private cloud
• Deployment / Operation / Monitoring
• Future of our private cloud
Who are you?
About us: Makoto Hasegawa
I am a cloud architect and lead the system admin team of the CyberAgent Ad-Tech Business Division. I have been managing OpenStack for about one year. Besides OpenStack, I also use cloud platforms such as AWS and GCP. Recently I have been most interested in how to automate cloud management using a variety of tools.
@makocchi and … https://www.facebook.com/makocchi0923
About us: Ryo Tagami
I’m an Infrastructure Engineer for the Ad-Tech Business Division at CyberAgent Inc., where I design and deploy the OpenStack-based private cloud that hosts multiple advertisement-related services CyberAgent Inc. offers. Keeping it neat.
“To create the 21st century's leading company” CyberAgent is expanding its business in the field of Internet, a leading industry of the 21st Century.
About CyberAgent We provide a variety of services such as on-line games, on-line communities and Ad-Tech.
And we have been running our Ad-Tech services on our OpenStack Cloud ... and more!
The infrastructure of Ad-Tech has to be … Flexible, Agile and Stable
• More flexible! Our services run on many different platforms, and the infrastructure under them has to work with all of them.
• More agile! Infrastructure has to be prepared quickly and deleted when it is no longer necessary.
• More stable! One small problem or system outage has a huge impact.
So we need a flexible, agile and stable cloud platform.
Why we chose OpenStack
There are 3 reasons why
1. When we were evaluating several options, there was strong momentum behind OpenStack becoming the next mainstream cloud management system.
2. OpenStack is open source. CyberAgent has a strong culture of leveraging open-source technologies. Using OSS helps us keep up with new technologies and also cut costs.
3. I thought the technical skills and motivation of our engineers would improve by learning, deploying and operating OpenStack, which contains a lot of different technologies.
OpenStack in CyberAgent
2015.04 Kilo: We will start testing it very soon.
2015.03 We provided some Ad-Tech services on OpenStack Juno (100+ compute nodes / 3 engineers).
2014.10 Juno
2014.06 We provided some Ad-Tech services on OpenStack Icehouse (40+ compute nodes / 3 engineers).
2014.04 Icehouse
2013.10 Havana
2013.10 Ad-Tech Division started in CyberAgent.
2013.04 Grizzly: We provided over 10 services on OpenStack Grizzly (10+ compute nodes / 10 engineers).
2012.09 Folsom: We used OpenStack only for PoC.
Big picture of our private cloud
Big picture of our private cloud: Upgrade or new deployment?
• We run multiple OpenStack deployments.
• Why? It is difficult to upgrade (safely).
• So we deploy a new cluster and abandon the old one.
Big picture of our private cloud
Codename: GAIA
• minerva: Icehouse, Personal Development
• nevera: Icehouse, Production
• diana: Juno, Production
• eiskeller: Icehouse, Personal Development (Sandbox)
• galadeira: Icehouse, Production (Sandbox)
• venus: Juno, Personal Development (Future)
• vesta: Kilo, Production (Future)
Big picture of our private cloud: Specs of diana
• Number of computes: 200+
• Number of CPU cores: 5,000 cores / 10,000 threads
• Number of VM instances: 1,000+
• Network: dual 10G from server to ToR, dual 40G from ToR to EoR
Big picture of our private cloud: Specs of diana
OpenStack components:
• Keystone
• Glance
• Nova
• Neutron
• Swift Proxy
• Cinder
• Heat
• Ceilometer
Ceph is backing:
• Glance
• Swift Proxy
• Cinder
Big picture of our private cloud: Redundancy
Hardware
• RAID5 SSDs
• Redundant power supply
• Redundant network connection
Software
• Load-balanced API endpoints
• MariaDB Galera Cluster
• RabbitMQ HA Cluster
Deployment Operation Monitoring
Deployment / Operation / Monitoring: Tools?
• Ansible
• In-house playbooks and roles
• We may release them on github.com
Deployment / Operation / Monitoring: Logs
• All logs are sent to a central log server with rsyslog
• Troubleshooting is done with grep on that server
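As a rough illustration of why central logs plus a grep-style search is enough for troubleshooting, here is a minimal Python sketch that follows one request across services. The log lines, host names and the request ID are made up for the example; only the idea of searching aggregated lines for a shared pattern comes from the slide.

```python
import re


def grep(lines, pattern):
    """Return the lines that match the regular expression, like grep on a log file."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical lines as they might look once collected on the central log server.
central_log = [
    "May 18 10:00:01 api01 nova-api: req-1234 POST /servers accepted",
    "May 18 10:00:02 ctl01 nova-scheduler: req-1234 selected host cmp07",
    "May 18 10:00:02 api02 glance-api: req-9999 GET /images",
    "May 18 10:00:05 cmp07 nova-compute: req-1234 ERROR spawn failed",
]

# Follow a single request through every service that logged it.
for line in grep(central_log, r"req-1234"):
    print(line)
```

Because every service writes to the same place, one pattern is enough to reconstruct the path of a request from the API node to the compute node.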
Deployment / Operation / Monitoring: Monitoring
Basic Service Monitoring
• Process Existence
• TCP Port State
API Monitoring
• Response Time
• High Level Function Testing
Standard Monitoring
• Hardware Health
• OS Resources
Templates and scripts may be released
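The TCP port state and response time checks could be sketched roughly as follows. This is not our released script: the function names, the status strings and the thresholds are assumptions for illustration, and the "response time" here is only the TCP connect time, standing in for a real API call.

```python
import socket
import time


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    return result, time.monotonic() - start


def check_endpoint(host, port, max_seconds=1.0):
    """Combine the port-state check with a response-time threshold."""
    is_open, elapsed = timed_call(tcp_port_open, host, port)
    if not is_open:
        return "CRITICAL: port closed"
    if elapsed > max_seconds:
        return "WARNING: slow response"
    return "OK"
```

A monitoring system would call `check_endpoint` for each API endpoint on a schedule and alert on anything other than "OK". High level function testing would replace the connect with a real request, such as listing images or booting a test instance.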
Compromises
In-house tools had to be developed.
Load balancer management
• Reason: We are not using Neutron LBaaS at the moment. Our network design differs from the reference implementation, and the LBaaS driver is still in development for Juno.
• Solution: Manage load balancers outside Neutron (with some level of multi-tenancy).
Instance tagging & filtering
• Reason: High demand from users migrating from AWS, but not implemented in Juno.
• Solution: Use instance ‘metadata’ as tag field.
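The tagging workaround can be pictured with a small Python sketch. It does not call any OpenStack API: instances are plain dictionaries with a "metadata" key-value field, and the instance names, keys and values are hypothetical. It only shows the idea of treating metadata entries as tags and filtering on them.

```python
def matches_tags(instance, required):
    """Return True if every required key/value pair is present in the instance metadata."""
    metadata = instance.get("metadata", {})
    return all(metadata.get(key) == value for key, value in required.items())


def filter_instances(instances, **required):
    """Return the instances whose metadata carries all of the requested tags."""
    return [inst for inst in instances if matches_tags(inst, required)]


# Hypothetical instances; the metadata keys act as AWS-style tags.
instances = [
    {"name": "web01", "metadata": {"service": "dsp", "role": "web"}},
    {"name": "db01", "metadata": {"service": "dsp", "role": "db"}},
    {"name": "web02", "metadata": {"service": "ssp", "role": "web"}},
    {"name": "tmp01"},  # no metadata at all, so it never matches a tag filter
]

web_of_dsp = filter_instances(instances, service="dsp", role="web")
print([inst["name"] for inst in web_of_dsp])
```

Filtering on the client side like this keeps the workaround independent of the release, at the cost of fetching the full instance list first.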
Future of our private cloud
Bucket List
Software Defined Network
• Get rid of conventional VLAN-based networking and introduce IP-based overlay networking
• Experimenting with “MidoNet” by Midokura
Bucket List
Software Defined Storage
• Decouple storage from compute
End-to-End 40G Networking
• SDS, in-memory KVS
Ironic (Baremetal)
• Virtual machines do not cover every use case
Kilo and Liberty
• New is always better.