Cache strategies for web apps
Glen Campbell (@glenc)
Yes, I picked the dullest title ever
“A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.” –Wikipedia
What is the most common type of web cache?
REST
• Client-server
• Stateless
• Cacheable
• Layered system
• Code on demand (optional)
• Uniform interface
Example: local
    HTTP/1.1 200 OK
    Date: Wed, 29 Oct 2014 15:04:20 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:04:20 GMT
    Vary: Accept-Encoding
    Connection: close
    Content-Type: text/html; charset=UTF-8
Example: hotel
    HTTP/1.0 200 OK
    Date: Wed, 29 Oct 2014 15:05:32 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:05:32 GMT
    Vary: Accept-Encoding
    Content-Type: text/html; charset=UTF-8
    X-Cache: MISS from localhost
    X-Cache-Lookup: MISS from localhost:3128
    Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    Connection: close
What changed?
    $ diff local hotel
    1,2c1,2
    < HTTP/1.1 200 OK
    < Date: Wed, 29 Oct 2014 15:04:20 GMT
    ---
    > HTTP/1.0 200 OK
    > Date: Wed, 29 Oct 2014 15:05:32 GMT
    8c8
    < Expires: Thu, 29 Oct 2015 15:04:20 GMT
    ---
    > Expires: Thu, 29 Oct 2015 15:05:32 GMT
    10d9
    < Connection: close
    11a11,14
    > X-Cache: MISS from localhost
    > X-Cache-Lookup: MISS from localhost:3128
    > Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    > Connection: close
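The headers added in the hotel response are how a client can tell that an intermediary handled the request. A minimal Python sketch, assuming the response headers are already available as a dict. `describe_proxy` is a hypothetical helper, not part of any library, and X-Cache is a non-standard header whose "MISS from host" format is taken from the Squid output in the example.

```python
def describe_proxy(headers):
    """Summarise proxy evidence found in a dict of HTTP response headers."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    via = lowered.get("via")
    x_cache = lowered.get("x-cache", "")
    # The first word of X-Cache is the hit/miss status in Squid's format.
    status = x_cache.split(" ", 1)[0].upper() if x_cache else None
    return {"proxied": via is not None, "via": via, "cache_status": status}


# Headers copied from the hotel example.
hotel = {
    "Via": "1.1 localhost:3128 (squid/2.7.STABLE3)",
    "X-Cache": "MISS from localhost",
}
print(describe_proxy(hotel))
```

A response with no Via header reports `proxied` as False, although a proxy that does not announce itself would go unnoticed by this check.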
Adding headers in PHP
    void header ( string $string [, bool $replace = true [, int $http_response_code ]] )
Cache-Control: in PHP
    header('Cache-Control: no-cache');
    header('Cache-Control: max-age=600');
Expires:
• Indicates when the resource becomes stale.
• Specifies an absolute date/time rather than delta seconds (Cache-Control: max-age=S).
• Mostly used for compatibility with HTTP/1.0; Cache-Control: is more semantically rich, and its max-age overrides Expires when both are present.
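The relationship between the two headers is simple arithmetic: Expires is the response Date plus max-age seconds. A short Python sketch using the values from the "local" example; `expires_from_max_age` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_from_max_age(response_date, max_age_seconds):
    """Return the Expires value equivalent to Date + max-age."""
    expiry = response_date + timedelta(seconds=max_age_seconds)
    # usegmt=True produces the HTTP date format ending in "GMT";
    # it requires a timezone-aware UTC datetime.
    return format_datetime(expiry, usegmt=True)


# Date and max-age taken from the "local" example response.
date = datetime(2014, 10, 29, 15, 4, 20, tzinfo=timezone.utc)
print(expires_from_max_age(date, 31536000))  # Thu, 29 Oct 2015 15:04:20 GMT
```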
Is data cacheable?
• Highly cacheable data: news stories, blog posts, aggregated data such as ratings or reviews (“likes”).
• Uncacheable: secure, private, personal data such as user login information, credit card info, etc.
• Also uncacheable: data that changes rapidly, such as stock quotes or readings from health monitoring systems.
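One way to act on this classification is to pick the Cache-Control value by kind of data. The sketch below is illustrative only: the category names and lifetimes are assumptions, not recommendations from the talk. The directives themselves (public, max-age, no-store, no-cache) are standard.

```python
# Hypothetical content categories mapped to Cache-Control values.
# The lifetimes are arbitrary placeholders.
POLICIES = {
    "news_story": "public, max-age=600",  # highly cacheable
    "ratings": "public, max-age=60",      # aggregated, tolerates slight staleness
    "account": "no-store",                # private data: never store
    "stock_quote": "no-cache",            # must be revalidated before each reuse
}


def cache_control_for(category):
    """Return a Cache-Control value, defaulting to the safest choice."""
    return POLICIES.get(category, "no-store")


print(cache_control_for("news_story"))
print(cache_control_for("something_unknown"))
```

Defaulting unknown categories to no-store means a forgotten classification costs performance, not privacy.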
Example 1. No cache
[Diagram: Web Server, Service]
Example 2. Shared Cache
[Diagram: two Web Servers, one Cache (Proxy), Service]
Example 3. Distributed Cache
[Diagram: four Web Servers, two Cache (Proxy) nodes labeled ICP, Service]
Example 4. Local+Remote Cache
[Diagram: four Web Servers, each with a local cache, two Cache (Proxy) nodes labeled ICP, Service]
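The lookup order implied by Example 4 can be sketched in a few lines of Python: check the local cache, then the shared cache, then the service. This is a sketch under stated assumptions: `TieredCache` is a hypothetical class, plain dicts stand in for both the per-server cache and the shared proxy, and expiry is left out entirely.

```python
class TieredCache:
    """Local cache in front of a shared cache in front of the service."""

    def __init__(self, shared, fetch_from_service):
        self.local = {}                    # private to this web server
        self.shared = shared               # stand-in for the shared proxy cache
        self.fetch_from_service = fetch_from_service

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            # Miss at both levels: go to the service and fill the shared cache.
            value = self.fetch_from_service(key)
            self.shared[key] = value
        self.local[key] = value
        return value


calls = []

def fetch_from_service(key):
    calls.append(key)
    return f"value for {key}"

shared = {}
server_a = TieredCache(shared, fetch_from_service)
server_b = TieredCache(shared, fetch_from_service)
server_a.get("story:1")
server_b.get("story:1")   # served from the shared cache
print(len(calls))         # 1
```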
Squid
• Old, venerable; often regarded as a reference implementation of HTTP caching
• Single-threaded
• Can be tricky to configure (a multitude of options) but very high-performance
• Implements ICP (Internet Cache Protocol) for distributed and hierarchical caches
Varnish
• More modern implementation than Squid; relies on virtual memory and multi-threaded access
• Easier to set up and configure than Squid
• Does not support ICP or cache hierarchies
nginx
• Reverse proxy and web server in one; does not need a separate web server process
• Great for static content, according to users
• Uses asynchronous sockets; one worker process per core
DIY caching
• Tools let you build your own cache system.
• Not transparent by default, but you can build a transparent layer on top.
• Most are simple key/value stores.
• Requires writing code.
DIY cache example
• Object retrieval interface fetches data from the service.
• Internal methods query the data store (memcached, Redis) first and use stored data if possible.
• If data is not in the cache, fetch it from the backend service and store it in the cache.
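The three steps above can be sketched as follows. This is a minimal illustration, not a production design: `CachedRepository` and `fetch_story` are hypothetical names, and a plain dict stands in for memcached or Redis so the example runs without a server. The 600-second lifetime is an arbitrary assumption.

```python
import time


class CachedRepository:
    """Object retrieval interface that checks a key/value store first."""

    def __init__(self, store, fetch_from_service, ttl_seconds=600, clock=time.time):
        self.store = store                  # dict stand-in for memcached/Redis
        self.fetch_from_service = fetch_from_service
        self.ttl_seconds = ttl_seconds
        self.clock = clock                  # injectable so expiry can be tested

    def get(self, key):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                 # cache hit, still fresh
        # Miss or expired: fetch from the backend and store with an expiry time.
        value = self.fetch_from_service(key)
        self.store[key] = (value, now + self.ttl_seconds)
        return value


calls = []

def fetch_story(key):
    calls.append(key)
    return {"id": key, "title": "Example story"}

repo = CachedRepository({}, fetch_story)
repo.get("story:1")
repo.get("story:1")   # second call is answered from the store
print(len(calls))     # 1
```

Callers only ever see `get`, which is what makes the cache transparent at the application level.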
Upsides for DIY caching
• Provides a very clean programmatic interface (transparent at the application level)
• Can be tailored to specific solutions where you understand the data.
• Often very high performance
Downsides to DIY caching
• Requires code to be written, tested, etc.
• Requires code maintenance if the underlying data model is changed.
• No standard way, as HTTP has, to specify the age and freshness of data (i.e., not a generic solution, but a custom one)
How does a CDN work?
• Primary site (www.example.com) serves the HTML page.
• <script>, <link>, <img>, and similar tags reference static content on the CDN.
• The user’s browser loads (and often stores) the static content locally, because it’s served with a Cache-Control: max-age=32767 header.
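Whether the browser reuses its stored copy comes down to comparing the copy's age with max-age. A simplified Python sketch of that check; the helper names are hypothetical, and real browsers also account for the Age header, revalidation, and other directives that are ignored here.

```python
import re


def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control value, or None if absent."""
    match = re.search(r"\bmax-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """A stored copy is fresh while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


header = "max-age=32767"            # value from the slide, about nine hours
print(is_fresh(header, 3600))       # True: one hour old
print(is_fresh(header, 40000))      # False: older than max-age
```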