Cache-Control is an important mechanism that lets developers dictate how resources are cached when a user browses the internet. Without it, browser caching and the resulting user experience will be sub-optimal.
What is Cache-Control?
When a user browses the internet, the communication follows the Hypertext Transfer Protocol (HTTP), the protocol that sets the standard for communication on the web. After the release of HTTP/1.1 in 1997, there were few changes to the protocol until HTTP/2 was released in 2015.
Both HTTP/1.1 and HTTP/2 include a number of elements intended to make caching work as well as possible. The user is the client, who sends a request to a web server (say, in the form of a URL), and the web server responds (with a web page). HTTP headers are the parameters in these messages that carry additional information to make the HTTP transaction go smoothly.
Cache-Control is an HTTP cache header that contains a set of parameters defining the caching policy in client requests and server responses. When a client makes a request to the server, the browser can cache, or store, copies of the returned resources for faster access and lower latency. This means that when the browser needs these files again, it doesn’t have to request them from the web server again. Cache-Control specifies whether and how a response should be cached and for how long.
What is Browser Caching?
Browser caching is the process by which a web browser saves website resources in order to load them quickly during the next client request. You can see it in action when you load a web page with a background image, for example. The first time you load the page, the image gets saved in your browser cache. The next time you visit the page, you will notice that the page loads faster and latency is reduced. This is because the browser is not requesting the image again from the web server. Instead, it is loading the image from your local files.
The browser cache does not store the files for an indefinite period of time, though. There is a set time frame, known as Time to Live (TTL), beyond which the cached resource expires from the local files. If you load the page after the TTL has expired, the browser will have to place another request to the web server and receive a fresh copy of the resource. The TTL for each resource is specified in the HTTP headers.
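The TTL check described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser implements it; the function name `is_fresh` and its parameters are hypothetical.

```python
import time
from typing import Optional


def is_fresh(stored_at: float, ttl_seconds: int, now: Optional[float] = None) -> bool:
    """Return True while a cached copy is still younger than its TTL.

    stored_at and now are Unix timestamps in seconds (hypothetical inputs).
    """
    if now is None:
        now = time.time()
    age = now - stored_at
    # Once the age reaches the TTL, the copy is considered expired
    # and a fresh copy has to be requested from the server.
    return age < ttl_seconds


# A resource cached at t=1000 with a 60-second TTL:
still_usable = is_fresh(stored_at=1000.0, ttl_seconds=60, now=1030.0)
expired = not is_fresh(stored_at=1000.0, ttl_seconds=60, now=1061.0)
```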
HTTP headers are a set of parameters that contain additional information about the communication between a client and a server. The World Wide Web operates on the Hypertext Transfer Protocol, which outlines the syntax for all communications between clients and servers.
There are a number of headers for specifying various types of information in the client-server communications.
For requests, the header usually contains information on the resource being requested, the client’s browser and data formats that the client will accept. For responses, the information is usually about whether or not the request was successfully fulfilled and the language and format of any resources in the body.
Broadly speaking, HTTP headers can be categorized into general headers, request headers, response headers and entity headers.
General headers can be used in both request and response messages but don’t apply to the content of the message. Cache-Control is one such header. Others include Date, which specifies the date and time of the message, and Connection, which specifies if the network connection stays open after the transaction.
Request headers are used in the HTTP request. They contain more information about the resource being fetched, or about the client making the request.
Examples include Accept, which advertises which content or media types the client can accept, and Cookie, which contains the stored HTTP cookies previously sent by the server.
Response headers include additional information about the HTTP response. Examples include Age, which specifies how long the object has been in a proxy cache, and Location, which indicates the URL to redirect a page to.
Unlike the others, entity headers contain information about the content and body of the message. They can be used in HTTP request or HTTP response messages. Examples include Content-Length, which specifies the size of the entity-body in bytes, and Content-Language, which describes the language intended for the audience.
Cache-Control Headers Explained
Cache-control headers include information on everything to do with caching: how to cache, when to cache, when not to and more. Each header is essentially a key-value pair separated by a colon. The ‘key’ is what appears to the left of the colon and in this case is always “cache-control”. The value appears to the right of the colon and consists of one or more comma-separated directives. For example, “cache-control: max-age=3600” carries the max-age directive, which sets the TTL to 3600 seconds.
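To make the key-value structure concrete, here is a minimal Python sketch that splits the value of a cache-control header into its directives. The function name `parse_cache_control` is hypothetical, and the sketch assumes simple values: it does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        name, sep, argument = part.partition("=")
        # Directive names are case-insensitive, so normalise to lowercase.
        directives[name.strip().lower()] = argument.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("public, max-age=3600")
# parsed == {"public": None, "max-age": "3600"}
```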
Cache-control directives are considered request directives if they are used by the client in an HTTP request and response directives if they are used by the server in an HTTP response.
Here are some of the most common cache-control directives:
The no-cache directive tells caches that a resource may not be reused for subsequent requests to the same URL without first checking with the origin server whether the resource has changed. In other words, it is an instruction to the browser that it must revalidate with the server every time before using a cached version of the URL. This is useful to ensure that authentication is respected, among other benefits. Revalidation typically relies on a validator such as the ETag header field: the cache makes a roundtrip to the server to confirm that the response has not changed. If there has been no change, no download is required.
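The revalidation roundtrip can be sketched from the server's side as follows. This is a simplified illustration under stated assumptions: the helper names `make_etag` and `revalidate` are hypothetical, the ETag is derived from a hash of the body (one common choice, not the only one), and the sketch compares a single strong validator rather than the full list syntax that the If-None-Match request header allows.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def revalidate(current_body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status code, body) for a conditional request."""
    if if_none_match is not None and if_none_match == make_etag(current_body):
        # Unchanged: 304 Not Modified with an empty body, so nothing is downloaded.
        return 304, b""
    # Changed, or the client has no cached copy: send the full response.
    return 200, current_body


cached_etag = make_etag(b"<html>v1</html>")
unchanged = revalidate(b"<html>v1</html>", cached_etag)
changed = revalidate(b"<html>v2</html>", cached_etag)
```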
no-store is similar to no-cache but stricter. With this directive, the HTTP response cannot be stored in any cache or re-used. Instead, the resource has to be requested and a full response downloaded from the origin server each time. This is especially relevant when dealing with private/personal information or banking data.
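The difference between no-cache and no-store is easy to mix up, so here is a small Python sketch of the decision a cache makes. The enum and function names are hypothetical, and the input is assumed to be a set of directive names already extracted from the header.

```python
from enum import Enum
from typing import Set


class CachePolicy(Enum):
    DO_NOT_STORE = "never store the response"
    STORE_AND_REVALIDATE = "store, but revalidate before every reuse"
    STORE_AND_REUSE = "store and reuse while fresh"


def storage_policy(directive_names: Set[str]) -> CachePolicy:
    """Map directive names (lowercase) to a simplified caching decision."""
    if "no-store" in directive_names:
        # no-store wins: nothing may be kept, so there is nothing to revalidate.
        return CachePolicy.DO_NOT_STORE
    if "no-cache" in directive_names:
        # A copy may be kept, but it must be checked with the server first.
        return CachePolicy.STORE_AND_REVALIDATE
    return CachePolicy.STORE_AND_REUSE
```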
s-maxage is similar to the max-age directive, but the “s” stands for shared, as in shared cache. This is relevant to Content Delivery Networks (CDNs) and other intermediary caches. For shared caches, it overrides the max-age directive and the Expires header field when present.
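The override rule can be sketched as below. The function name `freshness_lifetime` is hypothetical, the directives are assumed to be already parsed into a dictionary of strings, and the Expires fallback is left out to keep the sketch short.

```python
from typing import Dict, Optional


def freshness_lifetime(directives: Dict[str, Optional[str]], shared_cache: bool) -> Optional[int]:
    """Return the TTL in seconds that applies to this kind of cache, if any."""
    s_maxage = directives.get("s-maxage")
    if shared_cache and s_maxage is not None:
        # Shared caches (CDNs, proxies) prefer s-maxage over max-age.
        return int(s_maxage)
    max_age = directives.get("max-age")
    if max_age is not None:
        return int(max_age)
    return None  # no explicit lifetime given


policy = {"max-age": "60", "s-maxage": "600"}
cdn_ttl = freshness_lifetime(policy, shared_cache=True)       # 600
browser_ttl = freshness_lifetime(policy, shared_cache=False)  # 60
```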
When resources are stored in the cache server, intermediate proxies can sometimes make modifications to these assets. For example, they could change the format of images and files in order to save space and improve performance. This can cause problems if the asset is to remain identical to the original entity-body. The no-transform directive tells the intermediate caches or proxies not to make any such modifications. For example, they cannot edit the response body, Content-Encoding, Content-Range, or Content-Type.
Benefits of Using a CDN for Cache-Control
Caching can be thought of as moving resources from a server closer to a local drive for faster access and reduced latency. The same idea applies to Content Delivery Networks (CDNs), which move your website content to proxies for accelerated content distribution and bandwidth optimization. Proxy servers are intermediate servers that cache resources themselves, instead of everything being stored on each website visitor’s local drive.
CDNs provide numerous benefits for Cache-Control.
- They simplify cache policy management
It can be overwhelming for web developers to manually tag file types, tweak and manage all the different cache headers. CDNs help them simplify cache policy management using user-friendly dashboards. Administrators can override cache header directives as and when needed and at a granular level to control specific files and file types.
- They augment browser caching with proxies
Browser caching by itself does the job of downloading a website’s resources to your local drive after your first visit. CDNs complement this by also caching those resources on proxy servers.
This helps bring content closer to the site visitors and makes sure that a single cached copy is served to multiple visitors. It also allows for quick delivery of resources even to first-time visitors whose browsers may not have cached the site content yet.
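The idea of one cached copy serving many visitors can be illustrated with a toy shared cache in Python. The class `SharedCache` and the stand-in origin function are hypothetical, and the sketch ignores expiry and cache-control directives entirely; it only shows that the origin is contacted once.

```python
from typing import Callable, Dict


class SharedCache:
    """Toy proxy cache: fetches each URL from the origin at most once."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_fetches = 0  # counts roundtrips to the origin server

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self._store[url] = self._fetch_from_origin(url)
            self.origin_fetches += 1
        return self._store[url]


# Stand-in for the origin server (assumption for the example).
proxy = SharedCache(lambda url: b"<html>home</html>")
first_visitor = proxy.get("/index.html")   # fetched from the origin
second_visitor = proxy.get("/index.html")  # served from the shared copy
```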
- They can help automate caching using machine learning
Some of the more advanced CDNs are capable of automating cache control using machine learning (ML). ML algorithms can track content usage patterns and cache dynamically generated content and resources.
For example, an HTML file that has not changed much over time can be labelled static and classified as cacheable. It can be served directly from the CDN servers for faster page load and responsiveness. The algorithm can continue to track the status of the page and classify it as dynamic as soon as there is a change. This optimizes your storage and caching policies and improves content delivery speed.