Code profiling gives a rich, detailed view of runtime performance. However, it's difficult to ach...
Code profiling gives a rich, detailed view of runtime performance. However, it's difficult to achieve in production: for even a small fraction of web requests, huge challenges in scalability, access, and ease of use appear. Despite this, Yelp profiles a nontrivial fraction of its traffic by combining Amazon EC2, Amazon EMR, and Amazon S3. Developers can search, sort, filter, and combine interesting profiles; during a site slowdown or page failure, this allows a fast diagnosis and speedy recovery. Some of our analyses run nightly, while others run in real-time via Storm topologies. This session includes our use cases for code profiling, its benefits, and the implementation of its handlers and analysis flows. We include both performance results and implementation challenges of our MapReduce and Storm jobs, including code overviews. We also touch on issues such as concurrent logging, cross-data center replication, job scheduling, and API definitions.