Today's topic • What is a job queue? • Introduction to Barbeque: Cookpad's new job queue system • Why did we need a new job queue system? • How is Barbeque designed?
What is a job queue? • What is a job queue? • Introduction to Barbeque • Why did we need a new job queue system? • How is Barbeque designed?
A job queue is... • In the context of a web application, it's a system used to perform tasks asynchronously • i.e. perform slow tasks later instead of during the request to the app server
A job queue is... • Without a job queue, users have to wait until the app server completes a slow task. • (Diagram: user sends a request to app; app performs the slow task; app then returns the response)
A job queue is... • With a job queue, users don't have to wait for the slow task because it is performed asynchronously. • (Diagram: user sends a request to app; app enqueues a task in the queue and returns the response; a worker dequeues the task and performs the slow task)
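The flow in the diagram can be sketched in a single Ruby process. This is only an illustration, not how resque, sidekiq or Barbeque work internally: Ruby's built-in `Queue` stands in for the job queue, a background thread stands in for the worker, and `handle_request` is a hypothetical request handler.

```ruby
# In-process illustration only: Queue stands in for the job queue,
# and a background thread stands in for the worker.
queue = Queue.new
results = Queue.new

worker = Thread.new do
  # Dequeue tasks until the :stop sentinel arrives.
  while (task = queue.pop) != :stop
    sleep 0.1                   # pretend this is the slow task
    results << "done: #{task}"
  end
end

# Hypothetical request handler: enqueue the task and respond immediately.
def handle_request(queue, task)
  queue << task
  "accepted: #{task}"           # response is built before the slow task runs
end

response = handle_request(queue, "send_email")
queue << :stop
worker.join
finished = results.pop
```

The response is produced without waiting for `sleep`, which is the whole point of moving the slow task behind a queue.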
How to use a job queue system in Ruby • Define a job using a RubyGem for a job queue system • resque, sidekiq, ... • Call a method to enqueue a job • Monitor tasks and workers with a web console
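Define a job (example of resque.gem)

    require "resque"

    class TestJob
      # Name of the queue this job is pushed to
      @queue = :default

      # Called by a Resque worker with the enqueued arguments
      def self.perform(user_id)
        # do something
      end
    end

    # Enqueue TestJob with user_id = 100
    Resque.enqueue(TestJob, 100)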
Methods to enqueue a job • Use a method in the gem for a job queue system • Resque.enqueue • Sidekiq::Worker.perform_async • Delayed::Job.enqueue • Shoryuken::Worker.perform_async • In Rails, you can use an abstraction layer • ActiveJob::Enqueuing.perform_later
Components in barbeque.gem • Worker • Built with ServerEngine • Executes a job using Docker • Slack notification • Queueing interface (Web API) • Web console • Manage registered applications and notifications • View job logs stored in S3
Queueing API • POST /v2/job_executions • Enqueue a message to SQS • Parameters • application: string • job: string • queue: string • message: any (object, array, string, ...)
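As a rough sketch, a client could build the enqueue request with Ruby's standard library as below. The path and the four parameter names come from the slide; the host `barbeque.example.com`, the JSON request body, and the helper name `build_enqueue_request` are assumptions made for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical host; the path and parameter names come from the slide.
BARBEQUE_ENDPOINT = URI("https://barbeque.example.com/v2/job_executions")

# Build (but do not send) the enqueue request.
# Assumption: the API accepts a JSON body.
def build_enqueue_request(application:, job:, queue:, message:)
  request = Net::HTTP::Post.new(BARBEQUE_ENDPOINT.path, "Content-Type" => "application/json")
  request.body = JSON.generate(
    application: application, job: job, queue: queue, message: message
  )
  request
end

request = build_enqueue_request(
  application: "cookpad", job: "TestJob", queue: "default", message: { "user_id" => 100 }
)

# To actually send it (requires a reachable Barbeque API):
# Net::HTTP.start(BARBEQUE_ENDPOINT.host, BARBEQUE_ENDPOINT.port, use_ssl: true) do |http|
#   http.request(request)
# end
```

Because `message` can be any JSON value, the same helper works for objects, arrays and plain strings.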
Why didn't we use a job queue very often? • We have "kuroko2", which is a useful scheduled batch system • Feature-rich web console • Supports job execution with Docker • Many developers love kuroko2 and use it to perform tasks asynchronously • But users have to wait for the next scheduled batch execution... • We should use a job queue system for users' convenience
kuroko2 (It'll be open-sourced a little later)
What did we want for Cookpad's job queue system? • Centralized management of workers • We have so many applications and don't want to manage infrastructure for each application • Easy job deployment • kuroko2-like web console
How to deploy a job? • We're already using Docker to deploy most of our applications • 80+ deployment configurations • We use ECS to run Docker containers in a cluster • ECS: Amazon EC2 Container Service • Scale-in/out can be done by adjusting the desired capacity of the Auto Scaling group
How is Barbeque designed? • What is a job queue? • Introduction to Barbeque • Why did we need a new job queue system? • How is Barbeque designed?
Data store for a queue: Amazon SQS • Managed message queue • Pros • Fast, scalable and reliable • Easy to integrate with other AWS components like Amazon SNS • Cons • May deliver duplicate messages (QoS is at-least-once) • Message delay duration is limited to at most 900 seconds (15 minutes)
QoS: at-least-once, at-most-once or exactly-once? • Quality of Service is determined by the queue's data store and the implementation of the worker • 3 QoS levels: • at-most-once: Unreliable (messages can be lost) but no duplication • at-least-once: Reliable but duplicate delivery can occur • exactly-once: Most reliable but hard to make the system scalable
Why did we choose Amazon SQS? • It should be scalable and reliable because it's used by all applications • Duplicate delivery is acceptable in most of our use cases
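For the jobs where duplicate delivery is not acceptable, one common approach under at-least-once delivery is to make the handler idempotent. The sketch below is not Barbeque's implementation; the class `DeduplicatingHandler` and the message IDs are hypothetical, and an in-memory `Set` stands in for what would have to be a shared store in a real deployment.

```ruby
require "set"

# Hypothetical worker-side guard: remembers processed message IDs so that a
# duplicate delivery (possible with at-least-once QoS) runs the job only once.
class DeduplicatingHandler
  attr_reader :performed

  def initialize
    @seen_ids = Set.new   # assumption: a real system would use a shared store
    @performed = []
  end

  # Returns true if the job ran, false if the message was a duplicate.
  def handle(message_id, payload)
    # Set#add? returns nil when the ID is already present.
    return false unless @seen_ids.add?(message_id)

    @performed << payload
    true
  end
end

handler = DeduplicatingHandler.new
first  = handler.handle("msg-1", { "user_id" => 100 })
second = handler.handle("msg-1", { "user_id" => 100 })  # duplicate delivery
```

Here the second delivery of `msg-1` is ignored, so the job body runs once even though the message arrived twice.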
Worker to perform a job • Barbeque worker is implemented in Ruby • Since jobs run on a job queue system tend to be slow tasks, the overhead of the worker layer is negligible • To perform a job, Barbeque executes a command with some special environment variables using Docker • They include a serialized message and a job name
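The idea of passing the job name and the serialized message through environment variables can be sketched as follows. The variable names `JOB_NAME` and `JOB_MESSAGE`, the image name, the rake task and the helper `docker_command` are all hypothetical; the slide does not state the names Barbeque actually uses.

```ruby
require "json"

# Builds an argv array for `docker run`. The environment variable names
# JOB_NAME and JOB_MESSAGE are hypothetical, as are the image and command.
def docker_command(image:, command:, job:, message:)
  env = { "JOB_NAME" => job, "JOB_MESSAGE" => JSON.generate(message) }
  env_flags = env.flat_map { |key, value| ["--env", "#{key}=#{value}"] }
  ["docker", "run", "--rm", *env_flags, image, *command]
end

argv = docker_command(
  image: "example/app:latest",
  command: ["bundle", "exec", "rake", "job:execute"],
  job: "TestJob",
  message: { "user_id" => 100 }
)

# An argv array avoids shell quoting problems with the JSON message;
# it could be executed with system(*argv) on a host that has Docker.
```

Inside the container, the job runner would read the two variables and parse the message with `JSON.parse` before calling the job.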
Job execution on ECS • To perform a job, the Barbeque worker uses "hako", a deploy tool for Docker • hako: https://github.com/eagletmt/hako • We're already using this to deploy many applications to ECS • While hako is open-sourced, the hako adapter for Barbeque is still closed-source 😇
Barbeque worker workflow • (Diagram, built up over five slides) Components: app, barbeque API, Amazon SQS, barbeque worker, ECS batch cluster, RDS (MySQL), Amazon S3 • A request reaches the app • The app enqueues a job through the barbeque API • The barbeque API enqueues a message to Amazon SQS • The barbeque worker dequeues the message from Amazon SQS • The barbeque worker executes a command (the app) on the ECS batch cluster • Status is stored in RDS (MySQL) • stdout and stderr are stored in Amazon S3
Auto scaling • We use an Auto Scaling group to scale the ECS cluster • Scale-in/out is triggered by the resource usage of the ECS cluster • Since each job is one task on ECS, we can estimate precisely the resources needed to execute jobs • The resources to run an ECS task are defined beforehand • We can reduce cost when jobs are not performed frequently!
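Because the resources of each ECS task are defined beforehand, the capacity needed for a given number of running jobs is simple arithmetic. The sketch below only illustrates that estimate and is not the scaling rule the cluster actually uses; the `Resources` struct, the helper `instances_needed` and all numbers are made up.

```ruby
# All numbers are made up; units follow ECS conventions (CPU units, MiB).
Resources = Struct.new(:cpu, :memory)

# Number of instances needed to run `running_jobs` tasks concurrently,
# limited by whichever resource (CPU or memory) runs out first.
def instances_needed(running_jobs, task, instance)
  per_instance = [instance.cpu / task.cpu, instance.memory / task.memory].min
  raise ArgumentError, "task does not fit on one instance" if per_instance.zero?

  (running_jobs.to_f / per_instance).ceil
end

task     = Resources.new(256, 512)     # resources defined for one job's ECS task
instance = Resources.new(2048, 8192)   # resources of one hypothetical instance

busy = instances_needed(20, task, instance)  # many jobs running
idle = instances_needed(0, task, instance)   # no jobs running
```

With these made-up sizes one instance fits 8 tasks (CPU is the limit), so 20 concurrent jobs need 3 instances and an idle queue needs none, which is where the cost saving comes from.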
Conclusion • We designed a new job queue system to: • Centralize management of job queue systems with Docker • Automate scaling with ECS and an Auto Scaling group • The queueing API, the web console and the core of the worker are open-sourced • https://github.com/cookpad/barbeque