[at scale] OpenStack Benchmarking Boris Pavlovic Mirantis, 2013
Agenda ● Benchmarking OpenStack at scale ○ What? Why? How? ● Rally ○ What is Rally? ○ Vision ○ Examples and results
Benchmarking OpenStack ● How to ensure that OpenStack works at scale? ● How to detect performance issues quickly and improve OpenStack scalability?
A straightforward way to benchmark OpenStack ● Generate load from concurrent users ● Capture key metrics--avg/max time, failure rate ○ VM provisioning ○ Floating IP allocation ○ Snapshot creation ● Verify that the cloud works fine ... ● PROFIT!!! … but what if it breaks apart?
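The "straightforward way" above can be sketched in a few lines of Python. This is a stand-in, not Rally or Nova code: `boot_vm` is a hypothetical placeholder for a real API call, and the load generator just fans the action out over concurrent workers and collects avg/max time and failure rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def boot_vm():
    """Placeholder for a real cloud API call (e.g. a Nova server create)."""
    time.sleep(0.01)  # simulate provisioning latency
    return True

def run_load(action, concurrency, iterations):
    """Run `action` from `concurrency` simulated users; collect key metrics."""
    timings = []
    failures = [0]  # mutable so the nested function can update it

    def timed_call(_):
        start = time.time()
        try:
            action()
        except Exception:
            failures[0] += 1
            return None
        return time.time() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for elapsed in pool.map(timed_call, range(iterations)):
            if elapsed is not None:
                timings.append(elapsed)

    return {
        "avg": sum(timings) / len(timings) if timings else None,
        "max": max(timings) if timings else None,
        "failure_rate": failures[0] / iterations,
    }

metrics = run_load(boot_vm, concurrency=5, iterations=20)
```

This works until something breaks: a naive script gives you the failure rate but no clue *why* it failed, which is exactly the problem the following slides pick up.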
Incorrect deployment setup?
Bug in the code?
RTFM Did you take enough time to educate yourself? ;)
Read the docs… (after an hour)
There should be an easier way…
Improve OS cloud performance and scalability ● 3 common approaches: ○ Use better hardware ○ Deploy better ○ Make the code better ● But first we need data points ○ Which part of the code is the bottleneck? ○ Which hardware limits are hit, if any? ○ How does deployment topology influence performance?
Shine a light in the darkness RALLY
What is Rally? ● Rally is a community-based project that allows OpenStack developers and operators to get relevant and repeatable benchmarking data of how their cloud operates at scale. ● Wiki https://wiki.openstack.org/wiki/Rally
Relevant to both devs and operators ● Different types of user-defined workloads ○ For developers: synthetic tests, stress tests ○ For operators: real-life cloud usage patterns ● Flexible reporting ○ For developers: low-level profiling data, bottlenecks ○ For operators: high-level data about cloud performance, highlights of bottlenecks within their use case
How Rally works ● Rally deploys an OpenStack cloud, runs the specified scenarios, and collects the results ● Deploy engines: DevStack, Fuel, … ● Server providers: Virsh, LXC, Dummy, Amazon, … ● Parameters: number of users, number of tenants, concurrency, type of workload, duration ● Results: execution time breakdown, failure rates, graphics, profiling data
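A Rally run is driven by a declarative task definition that pairs a scenario with exactly these parameters. The sketch below mirrors that shape as a Python dict; the scenario name, runner type, and all values are illustrative assumptions (following the later JSON task format, not necessarily the 2013 one):

```python
# Illustrative Rally-style task definition. Scenario name, runner and
# context values are assumptions for the sketch, not taken from the talk.
task = {
    "NovaServers.boot_and_delete_server": [{
        "args": {
            "flavor": {"name": "m1.tiny"},   # hypothetical flavor
            "image": {"name": "cirros"},     # hypothetical image
        },
        "runner": {
            "type": "constant",    # run a fixed number of iterations
            "times": 100,          # total scenario runs
            "concurrency": 10,     # simulated concurrent users
        },
        "context": {
            "users": {
                "tenants": 2,           # number of tenants
                "users_per_tenant": 3,  # users created per tenant
            }
        },
    }]
}
```

The point of the declarative form is that the same workload definition can be replayed unchanged against different deployments, which is what makes the results repeatable.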
Benchmarking scenarios ● Workloads are run against the OpenStack cloud to produce results ● Synthetic workloads → data for developers: low-level profiling, Tomograph results, graphs ● Real-life workloads → data for stakeholders: historical data, SLAs, bottlenecks
Synthetic tests for developers ● Stress-test various OpenStack components ○ Large number of provisioned VMs per second ○ Large number of provisioned volumes per second ○ Large number of uploaded images per second ○ Large number of active resources (VMs/images/volumes) ● Expose bottlenecks and uncover design issues in OpenStack ● Create a gold standard for everyone in the community to validate against
How did we deploy OpenStack? ● Using Fuel ● On real hardware ● 3 physical controllers ● 500+ physical compute nodes ● In HA deployment mode with Galera, HAProxy, Corosync, Pacemaker
Large number of active VMs A large number of active VMs shouldn’t affect provisioning of new VMs
Large number of concurrent users Average time of booting and deleting VMs with different numbers of concurrent users
Profiling with Tomograph and Zipkin Highlights: ● Launch 3 VMs ○ 336 DB queries ○ 74 RPC calls ● Delete 3 VMs under high load ○ 1 minute global DB lock on quotas table
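Counts like "336 DB queries to launch 3 VMs" come from tracing at the SQL layer (Tomograph hooks the database and RPC libraries and ships spans to Zipkin). A minimal stand-in for that idea, using only stdlib `sqlite3` rather than Rally's actual tooling, is to register a trace callback that records every statement the driver executes:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so no
# implicit BEGIN statements pollute the trace.
conn = sqlite3.connect(":memory:", isolation_level=None)

queries = []
conn.set_trace_callback(queries.append)  # record every SQL statement issued

# A toy workload standing in for "launch some VMs" (table/rows are made up).
conn.execute("CREATE TABLE vms (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO vms VALUES (1)")
conn.execute("SELECT * FROM vms").fetchall()

db_calls = len(queries)  # per-operation query count, Tomograph-style
```

Per-operation query counts are what surface pathologies like the one on this slide: a single logical delete holding a global lock on the quotas table for a minute.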
Why real workloads in addition to synthetic? ● Rationale ○ In the real world, scenarios are more complicated than an immediate boot-destroy ○ Workloads rarely change, while OpenStack and its topology/configuration change often ○ Usage profiles are specific to each business ● Expected outcome ○ Let companies specify their existing workload and benchmark the cloud against it ○ Let companies share these workloads
What to benchmark ● For each lifecycle stage (provision VMs, use VMs, destroy VMs): 1. How long (on average)? 2. How long (maximum)? 3. Success rate?
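Computing those three numbers per lifecycle stage is straightforward; the timings below are made up for illustration:

```python
def stage_stats(durations, attempts):
    """Average, maximum and success rate for one lifecycle stage.

    `durations` holds elapsed times (seconds) of successful operations;
    `attempts` is the total number tried (successes + failures).
    """
    return {
        "avg": sum(durations) / len(durations) if durations else None,
        "max": max(durations) if durations else None,
        "success_rate": len(durations) / attempts if attempts else None,
    }

# Hypothetical timings for each stage of the VM lifecycle.
report = {
    "provision": stage_stats([12.1, 13.4, 11.8], attempts=4),  # 1 failure
    "use":       stage_stats([60.0, 61.5, 59.9, 60.4], attempts=4),
    "destroy":   stage_stats([3.2, 2.9, 3.1, 3.0], attempts=4),
}
```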
Another workload representation What it shows ● Areas of biggest concern ● A baseline for all future changes (OpenStack version, deployment topology, Neutron plugin)
What we ultimately want to achieve ● Provide a mechanism to easily define workloads ● Let users benchmark their cloud against a specified workload ● Provide historical data on all applied optimizations, to see whether they actually lead to better performance
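Tracking optimizations over time reduces to diffing metric sets between two benchmark runs of the same workload; the metric names and values here are hypothetical:

```python
def compare_runs(baseline, current):
    """Relative change per metric between two benchmark runs.

    For time-based metrics, negative values mean the current run is
    faster, i.e. an improvement. Metric names are illustrative.
    """
    return {name: (current[name] - baseline[name]) / baseline[name]
            for name in baseline}

baseline = {"boot_avg_s": 14.0, "boot_max_s": 30.0}  # e.g. before a change
current  = {"boot_avg_s": 10.5, "boot_max_s": 33.0}  # e.g. after a change
delta = compare_runs(baseline, current)
# boot_avg_s improved by 25%, but boot_max_s regressed by 10% -- both
# facts matter, which is why both avg and max are tracked per stage.
```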
Roadmap ● Greatly improve profiling capabilities to quickly pinpoint problem locations ● Extend workload definitions to support richer and more realistic tests; combine workloads ● Support historical data and provide means of comparison and analytics ● Better correlate business KPIs with reporting