Caching in Helix

Helix aims to maximize site speed by using caching efficiently. Given that the serverless actions that constitute Helix’s backend have considerably higher latency than a traditional web server (but can scale much more easily), this is essential to creating a great experience.

Helix uses a number of techniques to optimize cache efficiency, but there are also scenarios where no cache will be available, and there are best practices developers can follow to increase cache efficiency. Helix’s caching can also help increase the availability of your site when the underlying service (Adobe I/O Runtime) is unavailable. All of this is explained in this guide.

Note: In practice, there are two caches that need consideration: the private browser cache and the shared CDN cache. This guide only looks at the shared CDN cache.

What gets cached by Helix

Helix caches:

  1. HTML and other dynamic content generated by helix-pipeline or other OpenWhisk actions
  2. Images and other media assets served from the content repository
  3. JavaScript, CSS, and other static content served from the code repository

How dynamic content gets cached

Dynamic content, i.e. everything that is rendered by a customer-defined OpenWhisk action, is cached for one year, or for as long as the action specifies (see below for how to change this).

The one-year timeout does not assume that content will remain unchanged for a year; it simply assumes that Helix will proactively flush the cache whenever a new version of the content is created.

This means that, most of the time, the CDN cache treats the content served by Helix as static, except when updates to either content or code mandate an on-demand re-rendering.
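
For illustration, you can check the effective cache lifetime of a rendered page by inspecting its Cache-Control response header. A minimal sketch, assuming Node 18+ for the built-in fetch and a placeholder URL (the exact header value you see may differ):

(async () => {
    const res = await fetch('https://example.com/index.html');
    // for a rendered Helix page, expect a long shared-cache lifetime,
    // e.g. something along the lines of "s-maxage=31536000" (about one year)
    console.log(res.headers.get('cache-control'));
})();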

How media assets get cached

Media assets are cached in the same way as dynamic content, although there is no way for a developer to enforce lower cache timeouts.

How static content gets cached

Static content, i.e. the images, CSS, fonts, and JavaScript files that make up the client-side web application, is cached in one of two ways, depending on the code reference in helix-config.yaml.

When the code reference points to a branch name like master or develop, a cache timeout of five minutes is applied. This means that changes to your code base will be reflected for all visitors within a short time frame, but also that there will be more invocations of the helix--static action that serves static assets than there are for the dynamic content actions.

When the code reference points to a commit SHA like 8c3c2ed845d3c85ebaf51d4e95f03859a9291d90, however, all static content is treated as immutable, and a one-year cache timeout is applied.
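
To illustrate the rule (this is a sketch, not the actual Helix implementation), the decision boils down to whether the code reference can ever move:

// sketch only: derive a cache lifetime from the code reference in helix-config.yaml
function staticCacheControl(codeRef) {
    // a 40-character hexadecimal string is a full commit SHA and can never change
    const isCommitSha = /^[0-9a-f]{40}$/i.test(codeRef);
    return isCommitSha
        ? 's-maxage=31536000' // about one year: content behind a SHA is immutable
        : 's-maxage=300';     // five minutes: a branch like "master" keeps moving
}

console.log(staticCacheControl('master'));                                   // s-maxage=300
console.log(staticCacheControl('8c3c2ed845d3c85ebaf51d4e95f03859a9291d90')); // s-maxage=31536000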

Either way, all static content is served through the same OpenWhisk action, so there is a very low likelihood of a cold start and acceptable throughput even in cases where the content needs to be retrieved from origin.

When does the cache get emptied?

There are three main reasons why the cache gets emptied:

  1. You’ve asked for it
  2. You’ve asked for it, specifically
  3. The CDN had better things to do

You’ve asked for it

Whenever you publish a new version of your site (meaning that the code that renders the site has changed, that the static assets making up the web application have changed, and potentially that the content has changed as well), Helix will tell Fastly, on your behalf, to purge the entire cache.

This means that right after a site code update, the cache will be rebuilt as visitors come to your site and request resources that are no longer in the freshly purged cache. This results in a lot of concurrent activations, but after a few minutes (or a few hours if you have a low-traffic site), everything worth caching will be back in the cache.

You’ve asked for it, specifically

Note: this is not implemented yet, so for the time being, use the method described above.

When you make a content update to the site, the helix-bot that observes your content repository will tell Fastly to purge the cache for all pages that depend on the files that have just been modified.

This means that all other pages on the site will stay cached.

The CDN had better things to do

Because the Fastly CDN is a shared cache, resources that are accessed infrequently will get evicted from the cache after a certain period of inactivity.

In our testing, we’ve found this “period of inactivity” that triggers the automatic cache clearing to be between one day and one week.

There are two things to note about this:

  1. This cache clearing is highly selective, so it will only affect the pages and resources of your site that see almost no traffic
  2. We use a two-level caching strategy in Fastly, which means that for every request there are two caches: one close to the edge (or the visitor) and one close to the origin (the Adobe I/O Runtime data centers). Because each edge node sees less traffic than the origin cache, objects can be evicted from the edge cache while they still remain in the origin cache, from where they can be retrieved very quickly.

What happens when Adobe I/O Runtime is unavailable

Because more than 99% of all requests get handled by the CDN cache, availability of the underlying Adobe I/O Runtime is only a concern for the remaining 1% of requests.

So when Adobe I/O Runtime is unavailable, in the vast majority of cases, visitors will not notice anything, as the site will get served from the cache.

In cases where selective content updates have been made and select objects have been cleared from the cache, Fastly will apply a health check to Adobe I/O Runtime and can deliver “stale” content, i.e. pages that have been marked as outdated in the cache, instead of serving an error page.

Note: just like selective purging in general, this has not yet been implemented.
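
At the HTTP level, this kind of behavior is commonly expressed with the stale-if-error extension of the Cache-Control header (RFC 5861), which Fastly supports. Purely as an illustration of the mechanism, and not of what Helix ships today, a pre.js could request it like this:

module.exports.pre = function(context, action) {
    // keep serving an expired copy for up to one day if the origin errors out;
    // the concrete values here are illustrative, not Helix defaults
    context.response.headers['Cache-Control'] = 's-maxage=31536000, stale-if-error=86400';
}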

What developers can do

Increase the cache timeout

As mentioned above, developers have control over the cache timeout, which means they can raise the default timeout in their pre.js like this:


module.exports.pre = function(context, action) {
    // increase the shared cache timeout to two years (63113904 seconds)
    context.response.headers['Cache-Control'] = 's-maxage=63113904';
}

You can also lower the cache timeout, but as this negatively affects performance, it should only be done when advised by a Project Helix developer.

Use ESI to keep pages limited to one resource

When you have to combine multiple resources in a single rendered page, it is a good idea to split the page in two: a master page that depends on one resource, e.g. a Markdown document, and a fragment page that depends on another resource, e.g. an RSS feed.

The master page should then use ESI to include the fragment page, so that each element can be cached independently and the lifetime of each page can be bound to the lifetime of its underlying resource: the master page gets flushed when the Markdown content changes, while the fragment page gets flushed when the RSS feed changes.
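
Concretely, the markup rendered for the master page would contain an ESI include tag that Fastly resolves at the edge. A sketch, using a made-up fragment path:

<!-- rendered master page (sketch); /feed.fragment.html is a hypothetical path -->
<esi:include src="/feed.fragment.html"/>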

Tell Helix what resources your page is using

By telling Helix which resources your page is using, you can enable more selective purging. To do so, add them to the context.content.sources array in your pre.js:

module.exports.pre = function(context, action) {
    // register the RSS feed as an additional content source of this page
    context.content.sources.push('https://example.com/rss.xml');
}

Note: you can already tell Helix about the sources, but selective purging is not yet implemented.

Note: the default pipeline already sets the content.sources property to include the resource URL of the Markdown file being rendered, so there is no need to add it yourself.