For those of you who landed here directly and missed my implausible intro, this isn’t a game. It’s something even better: an article about caching!
I’ll describe the techniques used on my sites jasonthorsness.com and hn.unlurker.com. The latter is open-source so you can peruse the implementation.
If you are looking to defend web-exposed side projects on a meager budget from an onslaught of web traffic, you might find these approaches helpful.
And if you are part of the problem, a user among countless others in your horde, driven en masse in wave after wave against my defenses — read on to scout my setup and find out whether or not it’s up to the challenge.
The article progresses through three difficulty levels:
| Difficulty Level | Description |
| --- | --- |
| Easy | Mostly-static sites |
| Medium | Data-driven dynamic sites |
| Hard | Authenticated per-user sites |
Static means the content is the same for every user and doesn’t change as a function of time. For example, due to my sluggish pace of writing, jasonthorsness.com remains unchanged for weeks in between updates. Even then, my older articles like this one don’t change unless I adjust a common layout. For this kind of site, applying content-hashed resources, using a CDN, and keeping dynamic bits client-side is the dominant industry combination.
To improve the handling of supporting resources (CSS, JS, images, etc.), a universal practice in modern web frameworks is to add content hashes to the resource names. By deriving the name of a file from its contents, you can treat a file with a given name as unchanging. For example, on this site, one of the fonts is (at the time of writing) served as `/_next/static/media/bd734242e06bd6ad-s.p.woff2` with the cache-control header `public,max-age=31536000,immutable`. The `bd734242e06bd6ad` part of the name is a hash of the file contents. The CDN and the user's browser can cache this file for as long as they want without worrying about it becoming stale. If I ever change the font, the file will have a different name, so all caches will miss and fetch the new font from the origin.
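On this site, Next.js and Vercel generate the hashed filenames and set this header automatically, but the policy itself is easy to apply anywhere. Here is a minimal Go sketch of the same idea, assuming hashed assets live in a local `./static` directory (the directory and port are made up for illustration):

```go
package main

import "net/http"

// Hashed assets never change for a given name, so browsers and CDNs may cache
// them "forever". The hash in the filename is the cache buster: new content
// gets a new name, so stale copies are simply never requested again.
func main() {
	fs := http.FileServer(http.Dir("./static"))
	http.Handle("/static/", http.StripPrefix("/static/",
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
			fs.ServeHTTP(w, r)
		})))
	http.ListenAndServe(":8080", nil)
}
```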
If you open the network tab in your browser’s dev tools and refresh this page, you’ll see most resources are served in this manner and arrive in 0-1 ms from the memory or disk cache. This is simultaneously the least costly and lowest-latency sort of caching — nothing does any work except the user’s browser.
🏰 In Tower Defense lingo: content-hashed static files in the browser cache let you defeat waves of requests right at the spawn point.
Resources referenced from outside the site itself, like the path you see in the URL bar, can't have hashes appended, because the links would break whenever the content changes. Instead, the server delivers the response with an `ETag` header that identifies the current version of the content. When the browser requests the resource again, it includes the last ETag value in an `If-None-Match` header. Whenever the current ETag on the server matches the If-None-Match value from the browser, the server is allowed to respond with `304 Not Modified` rather than the actual content.
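Again, the framework and CDN handle this automatically for this site, but the handshake is easy to reproduce by hand. A minimal Go sketch, assuming a strong ETag derived from a truncated SHA-256 of the body (the hash choice is mine, not anything this site actually does):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"net/http"
)

// serveWithETag derives an ETag from the body and answers repeat requests with
// 304 Not Modified when the client's If-None-Match header still matches.
func serveWithETag(w http.ResponseWriter, r *http.Request, body []byte) {
	sum := sha256.Sum256(body)
	etag := `"` + hex.EncodeToString(sum[:8]) + `"`
	w.Header().Set("ETag", etag)
	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified) // no body; the browser reuses its cached copy
		return
	}
	w.Write(body)
}

func main() {
	page := []byte("<html><body>hello</body></html>")
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		serveWithETag(w, r, page)
	})
	http.ListenAndServe(":8080", nil)
}
```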
If you open the network tab in your browser's dev tools once more and refresh this page, you'll see `/26` is served with a `304` response. You should also see less than 2 KiB transferred for the entire page! If you see more than that, it's likely browser extensions injecting things. Try it again in Incognito Mode or with a guest profile.
A problem with `304 Not Modified` is that there's still a round-trip to the server. But is it really a problem? If you look carefully at that network tab, you might see `/26` served to you in under ~60 milliseconds. The single origin server for this page is somewhere in the eastern US, but the reported latency stays low worldwide. This is because rather than serving from the origin, the resources are cached and served from a network of delivery points around the globe, commonly described as a CDN (content-delivery network). This reduces load on the origin and ensures snappy performance for users all over the world. Low latency worldwide is important if you respect your users — even this irrelevant blog regularly gets traffic from the USA, Europe, and Asia.
This site uses Vercel's CDN, which at the time of this writing has 119 locations serving content. Beyond static resources, you can also run code in these locations to implement custom low-latency functionality.
🏰 CDN locations are the towers in-between the spawn point and your base. When things are working correctly they take care of most of the waves.
Even a mostly-static site might have dynamic components. For example, beyond the city search edge function mentioned above, I have a VPS monitoring chart, some DIY analytics, some LLM silliness, and a page that requires users to sign in to compile code changes. All of these require additional dynamic content that varies by time or user input.
To ensure most content can remain optimized for static delivery, the dynamic parts are all handled via client-side JavaScript that makes separate API calls to dedicated dynamic API endpoints. This enables a clean separation of the static and dynamic parts of the site, and also helps prevent unnecessary triggering of dynamic functionality from crawlers and other bots.
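As a sketch of that separation, a dynamic endpoint can be a small standalone handler that the client-side JavaScript fetches on its own. The `/api/status` path and payload below are invented for illustration, not endpoints from my sites:

```go
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

// A dedicated dynamic endpoint: the surrounding page stays fully cacheable,
// while the browser fetches this small JSON payload separately.
func main() {
	http.HandleFunc("/api/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "no-store") // always computed fresh, never cached
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{
			"serverTime": time.Now().UTC().Format(time.RFC3339),
		})
	})
	http.ListenAndServe(":8080", nil)
}
```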
🏰 Requests for dynamic resources typically must reach the origin. Fortunately in the Tower Defense metaphor the origin itself is not the end-goal of the creeps. They are after precious resources inside: CPU cycles and upstream APIs. Continue to the next difficulty level for some “inside the server” strategies for protection.
A data-driven site has most of its content change automatically over time. This blog is not such a site, so for this section we'll look at my recent project hn.unlurker.com. Unlurker falls fully into the dynamic category: it always shows only the latest activity from Hacker News. The content becomes stale quickly, so caching is a challenge. The Unlurker site uses two layers of caching: short-lived CDN caching driven by cache-control headers, and caching inside the backend itself.
Even if you have a data-driven dynamic site, often the content can be treated as static for a short time. For Unlurker, new comments and stories only appear a few times a minute, so I can apply the following cache-control header:
`public, max-age=15, s-maxage=15, stale-while-revalidate=15`
If you look at the headers from hn.unlurker.com you'll only see `public, max-age=15`, because the CDN strips the rest and handles them internally. To see the effect, toggle options in the drop-downs back and forth every few seconds. You will see the latency stay under ~60 ms indefinitely. The CDN does the expensive refresh from the origin asynchronously in the background thanks to `stale-while-revalidate=15`. You can inspect the `X-Vercel-Cache` header to see whether the content was `HIT` (fresh), `STALE` (still served, but triggering an asynchronous refresh), or `MISS` (fully stale and fetched synchronously from the origin). For me this corresponds to a latency of 1-2 ms for the local cache, ~60 ms for a hit or stale from the CDN, and likely ~800 ms for a miss as it goes all the way to my poor VPS and probably the HN API.
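Setting the header at the origin is a one-liner. A minimal Go sketch of a middleware that applies it to a handler (the wiring is illustrative, not Unlurker's actual code):

```go
package main

import (
	"fmt"
	"net/http"
)

// shortCache marks responses as cacheable for 15 seconds, plus 15 more during
// which the CDN may serve the stale copy while it refreshes in the background.
func shortCache(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control",
			"public, max-age=15, s-maxage=15, stale-while-revalidate=15")
		next.ServeHTTP(w, r)
	})
}

func main() {
	latest := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "latest activity goes here") // placeholder payload
	})
	http.ListenAndServe(":8080", shortCache(latest))
}
```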
`stale-while-revalidate` is a relatively recent cache-control option. It keeps latency low for all users; nobody "pays the price" for being the first to request after the cache expires. Control over this header is why I couldn't use NextJS for Unlurker — NextJS doesn't seem to support it for dynamic pages, and NextJS ISR has a major limitation compared to `stale-while-revalidate` in that it doesn't support a maximum staleness. For low-traffic sites, users can see hours-old content, which is unacceptable. I switched to react-router on Vercel, which doesn't mess with the headers.
With the options available on hn.unlurker.com, there are only 10 * 12 * 8 * 2 = 1,920 possible combinations, each refreshed at most once every 15 seconds, so this technique caps the front-end request rate at 128 requests per second regardless of the incoming request rate from users' browsers.
The front-end function in this case applies no further caching and each request initiates a single fetch to the backend API for data.
🏰 Is the Tower Defense metaphor breaking down yet? Short-term cache control headers are like the browser-cache and CDN towers we’ve discussed so far, clearing nearly all the creeps, but they periodically spawn a creep of their own that heads for the origin. Could this be a novel gameplay mechanic? You read it here first.
The Unlurker backend runs on a shared 2 vCPU VPS. The program running there is the last chance to protect the CPU cycles and the upstream HN APIs. For dynamic sites like this one, caching and efficient request handling within the web server is just as important as leveraging browser caching and CDNs.
The Unlurker backend uses memory caching to protect the CPU cycles, then single-instancing and disk caching to improve performance and protect the HN API.
Anything expensive to compute that might be requested multiple times is worth keeping in a memory cache. Unlurker maintains a cache of normalized comment text for each item. The comments and stories themselves are stored in a memory cache as well for 60 seconds. This makes the cost of a request with no cache misses just a few hash lookups plus the response serialization.
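A minimal sketch of such a cache in Go, in the spirit of the 60-second item cache described above (the real implementation certainly differs):

```go
package itemcache

import (
	"sync"
	"time"
)

// ttlCache is a tiny in-memory cache with per-entry expiry.
type ttlCache[V any] struct {
	mu      sync.Mutex
	entries map[string]entry[V]
}

type entry[V any] struct {
	value   V
	expires time.Time
}

func newTTLCache[V any]() *ttlCache[V] {
	return &ttlCache[V]{entries: map[string]entry[V]{}}
}

// Get returns the cached value if it exists and has not expired.
func (c *ttlCache[V]) Get(key string) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Now().After(e.expires) {
		delete(c.entries, key) // drop expired entries lazily
		var zero V
		return zero, false
	}
	return e.value, true
}

// Set stores a value that stays fresh for ttl.
func (c *ttlCache[V]) Set(key string, value V, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = entry[V]{value: value, expires: time.Now().Add(ttl)}
}
```

A request handler would call `Get` first and fall through to the disk cache or HN API only on a miss, calling `Set` with a 60-second TTL on the way back.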
Upon a memory-cache miss, the backend needs to fetch the story or comment from the disk cache or maybe even the HN API. These are relatively expensive operations. To reduce the cost, concurrent requests for the same resource are combined into a single request. This is easy in Go, typically using the singleflight package, but in this case (for good integration with the memory cache) it uses a custom implementation. No matter how many requests come in for the same item concurrently, only one check will be made against the disk cache and only one request made to the HN API. Unlurker's overall load on the HN API is likely lost in the noise (especially if I've created load by convincing enough people to try downloading the whole thing).
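The standard way to get this coalescing in Go is the `golang.org/x/sync/singleflight` package mentioned above; Unlurker uses a custom variant for tighter memory-cache integration, but a sketch with the stock package looks like this (the `Item` type and `fetch` callback are placeholders):

```go
package itemfetch

import (
	"context"

	"golang.org/x/sync/singleflight"
)

// Item stands in for a Hacker News story or comment.
type Item struct {
	ID   int
	Text string
}

var group singleflight.Group

// getItem coalesces concurrent requests for the same id: only one goroutine
// runs the expensive fetch (disk cache, then HN API); the rest share its result.
func getItem(ctx context.Context, id string, fetch func(context.Context, string) (*Item, error)) (*Item, error) {
	v, err, _ := group.Do(id, func() (any, error) {
		return fetch(ctx, id)
	})
	if err != nil {
		return nil, err
	}
	return v.(*Item), nil
}
```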
Memory caching alone has a couple of issues: space in RAM is limited and process restarts clear the entire cache. To address these, Unlurker also keeps stories and comments in a disk cache in the form of a SQLite database. This is a bit slower than memory, but it effectively has no size limit and survives process restarts.
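A sketch of such a disk cache as a single SQLite key/value table (the schema and the choice of the pure-Go `modernc.org/sqlite` driver are mine, not Unlurker's actual layout):

```go
package diskcache

import (
	"database/sql"
	"time"

	_ "modernc.org/sqlite" // pure-Go SQLite driver; any SQLite driver works
)

// Store is a tiny key/value disk cache backed by SQLite.
type Store struct{ db *sql.DB }

func Open(path string) (*Store, error) {
	db, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS cache (
		key     TEXT PRIMARY KEY,
		value   BLOB NOT NULL,
		expires INTEGER NOT NULL  -- unix seconds
	)`)
	if err != nil {
		return nil, err
	}
	return &Store{db: db}, nil
}

// Get returns the cached bytes for key, reporting a miss if absent or expired.
func (s *Store) Get(key string) ([]byte, bool, error) {
	var value []byte
	var expires int64
	err := s.db.QueryRow(`SELECT value, expires FROM cache WHERE key = ?`, key).
		Scan(&value, &expires)
	if err == sql.ErrNoRows {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	if time.Now().Unix() > expires {
		return nil, false, nil // present but stale
	}
	return value, true, nil
}

// Put stores value under key, fresh for ttl.
func (s *Store) Put(key string, value []byte, ttl time.Duration) error {
	_, err := s.db.Exec(
		`INSERT OR REPLACE INTO cache (key, value, expires) VALUES (?, ?, ?)`,
		key, value, time.Now().Add(ttl).Unix())
	return err
}
```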
Rather than expire items after a fixed 60 seconds, the disk cache uses a “staleness” function based on the age of the story or comment. It slowly climbs from ~60 seconds for new items to ~30 minutes for items a few days old, then more rapidly increases until items more than a couple weeks old are considered immutable.
Requests for items are made in batches, so deriving the expiration from the creation time of items also helps eliminate clusters of expirations and spreads the requests made to the HN API out over time.
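The exact curve is Unlurker's own; the sketch below only captures the rough shape described above (about a minute for new items, roughly half an hour at a few days old, effectively immutable past a couple of weeks), with the intermediate steps guessed:

```go
package staleness

import "time"

// ttlForAge maps an item's age to how long a cached copy is considered fresh.
// The thresholds are illustrative, not Unlurker's actual function.
func ttlForAge(age time.Duration) time.Duration {
	const day = 24 * time.Hour
	switch {
	case age > 14*day:
		return 100 * 365 * day // a couple of weeks old: treat as immutable
	case age > 3*day:
		return 6 * time.Hour // a few days old: refresh rarely
	case age > day:
		return 30 * time.Minute
	case age > time.Hour:
		return 5 * time.Minute
	default:
		return time.Minute // brand new: comments are still arriving and being edited
	}
}
```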
🏰 The memory cache, single-instancing, and disk caches are clusters of powerful towers right around the base taking care of almost all of the remaining creeps. If they are well-chosen, they can handle an incredible load, and the eventual number that pass to consume significant CPU and initiate requests to the HN API will be less than your health points. You’ve won!
I have a single VPS, so I can get by with a simple SQLite database. If I had many instances of my API on separate servers, I might at some point want to replace the disk cache with a Redis instance and might consider using Redis for cross-server single-instancing. But for my site (and probably most other sites) it’s way beyond what the situation requires.
Unfortunately for this article, I’ve only recently begun to add some authenticated features to my side projects. The article LLM not LLVM requires users to authenticate before they can use an LLM to “recompile” the examples on the page, but it’s solely a client-side function.
For per-user sites, the first step is always to identify and isolate the non-per-user pieces and serve them with the same techniques as for static and dynamic sites.
Beyond that, for the truly per-user pieces, caching at the edge becomes much more challenging — data is often too sensitive to cache in the CDN, and even if you could, it’s per-user anyway so cache hits are rare. The caching solution becomes a partnership between the user’s browser and the origin server, which understands the authentication scheme and can cache per-user responses using the same memory and disk and single-instancing schemes already mentioned.
Strategies that download data to the user’s browser and handle requests locally can help. This way many functions over slowly-changing data can be computed on the client, and only deltas need to be synced from the server.
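One hypothetical shape for that sync: the browser keeps a local copy of the user's data and periodically asks a small endpoint for everything that changed since its last sync. The endpoint, header name, and types below are all invented for illustration:

```go
package main

import (
	"encoding/json"
	"net/http"
	"strconv"
	"time"
)

// Change is a placeholder for whatever per-user record the client mirrors.
type Change struct {
	ID        int       `json:"id"`
	UpdatedAt time.Time `json:"updatedAt"`
	Body      string    `json:"body"`
}

// changesSince would query the per-user store; stubbed out here.
func changesSince(userID string, since time.Time) []Change { return nil }

// handleChanges returns only records modified after the client's last sync,
// so most work stays client-side and responses remain small.
func handleChanges(w http.ResponseWriter, r *http.Request) {
	userID := r.Header.Get("X-User-ID") // stand-in for real authentication
	sinceUnix, _ := strconv.ParseInt(r.URL.Query().Get("since"), 10, 64)
	w.Header().Set("Cache-Control", "private, no-store") // per-user: never cache at the CDN
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(changesSince(userID, time.Unix(sinceUnix, 0)))
}

func main() {
	http.HandleFunc("/api/changes", handleChanges)
	http.ListenAndServe(":8080", nil)
}
```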
There are many more approaches — maybe my next project should require authentication so I can explore this difficulty level a bit more.
Caching has always been critically important for site performance, and as more and more sites depend on metered APIs (like OpenAI's LLMs) and serverless hosting providers (like Vercel), it's become just as important for cost management. Get the cache architecture right and you'll be surprised at how far you can stretch a tiny budget with only a few vCPUs and a few gigabytes of RAM. Keep in mind, the sites of the past ran on servers with a minuscule fraction of the resources given to sites today, and in many cases they probably handled the load far better, thanks to more careful caching and planning.
Thanks for reading! If you have any questions or comments, please reach out to me on X. If anyone cares to develop Tower Defense: Cache Control, go right ahead!