Surprise! Your cache doesn't.

A few days ago, I wrote a post about a peculiar piece of code that a friend of mine had sent me. Since it was an interesting bit of code, I thought Hacker News would enjoy it, so I posted it there. To my great pleasure, the post shot up to first place within a few minutes and stayed there for a full day, bringing just over 50,000 visitors to this blog in total.

I was very happy that people were enjoying and discussing the post (and the discussion was very interesting in its own right), but I noticed that AppEngine, where this blog is hosted, was struggling to serve it. I had to spin up new instances because the average latency was about ten seconds(!), even though this blog is pretty much only text and static media, and I use Django’s per-site cache to cache every single page.

Ten seconds per request made no sense when all you’re doing is fetching the page from memcached, but that’s what I was seeing. What’s worse, the site loaded moderately fast for me, so I had no idea what might be going on. After reading the Django documentation, I was none the wiser; as far as I could tell, everything should have been working properly.

However, after looking up some settings, I came upon this gem:

CacheMiddleware used to provide a way to cache requests only if they weren’t made by a logged-in user. This mechanism was largely ineffective because the middleware correctly takes into account the Vary: Cookie HTTP header, and this header is being set on a variety of occasions.

This means that, whenever a user had a cookie (such as one from Google Analytics, which I use), the pages weren’t fetched from the cache, effectively disabling Django’s caching middleware! Since Vary: Cookie makes the Cookie header part of the cache key, and Analytics cookies differ from visitor to visitor, almost every request mapped to a key that had never been cached. You can imagine what a disaster that was. I found and used a snippet to remove Google Analytics cookies before the cache is consulted, and my request times dropped from 10 s to 5 ms (that’s about two thousand times faster).
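To see why a single Analytics cookie defeats the cache, here is a simplified model of a Vary-aware cache key. It is a sketch, not Django’s actual code: the function name `cache_key` and the hashing details are assumptions, and the cookie values are made up. The principle it illustrates is the one from the quote above: the value of every request header listed in Vary becomes part of the key.

```python
import hashlib


def cache_key(url, request_headers, vary_headers):
    """Hypothetical, simplified Vary-aware cache key (not Django's real code).

    The value of every request header named in Vary is hashed into the key,
    so two requests for the same URL share a cache entry only when those
    header values are identical.
    """
    digest = hashlib.sha256()
    for header in vary_headers:
        # A missing header is treated the same as an empty one.
        digest.update(request_headers.get(header, "").encode("utf-8"))
    return "page:%s:%s" % (url, digest.hexdigest())


vary = ["Cookie"]

# Two visitors with made-up, Analytics-style cookies: the values differ,
# so each visitor gets a separate cache entry (and a miss on first visit).
alice = cache_key("/post/", {"Cookie": "__utma=1.111111111.1300000000.1"}, vary)
bob = cache_key("/post/", {"Cookie": "__utma=1.222222222.1300000000.1"}, vary)
print(alice == bob)  # False

# Once the foreign cookies are stripped, both requests map to the same key.
print(cache_key("/post/", {"Cookie": ""}, vary) == cache_key("/post/", {}, vary))  # True
```

Without Vary: Cookie the cookie would never enter the key; it is that header which turns every distinct cookie string into its own cache entry.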

The entire site now feels very responsive, as it’s pretty much all static (except when I change something, in which case the entire cache is cleared).

I should modify that piece of code to strip all cookies except Django’s own, but it works fairly well for now, and the site is more responsive than ever. If you’re using Django’s per-site cache, be careful: it may not be caching your entire site at all.

UPDATE: I’ve modified the snippet to strip all non-Django cookies. Here it is:

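import re

from django.conf import settings
from django.middleware.cache import UpdateCacheMiddleware


class SmartUpdateCacheMiddleware(UpdateCacheMiddleware):
    """UpdateCacheMiddleware that drops every cookie except Django's own.

    It must be first in the middleware list, so this process_request runs
    before FetchFromCacheMiddleware computes the cache key.
    """

    def process_request(self, request):
        cookies = request.META.get('HTTP_COOKIE', '')
        # Cookie names Django itself reads (configurable in settings).
        allowed = (settings.SESSION_COOKIE_NAME, settings.CSRF_COOKIE_NAME)

        # Strip all non-Django cookies.
        new_cookies = []
        for cookie in re.split(r";\s*", cookies):
            # Skip empty or malformed fragments before unpacking.
            if "=" not in cookie:
                continue
            # Split on the first "=" only, since values may contain "=".
            key, value = cookie.split("=", 1)
            if key.strip() in allowed:
                new_cookies.append("%s=%s" % (key.strip(), value))
        request.META['HTTP_COOKIE'] = "; ".join(new_cookies)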
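One way to enable the middleware is to put it in place of the stock UpdateCacheMiddleware at the top of the middleware list. The fragment below is a sketch assuming a recent Django, which uses the MIDDLEWARE setting rather than the older MIDDLEWARE_CLASSES, and a hypothetical module path, blog.middleware; adjust both to your project.

```python
# settings.py (fragment). "blog.middleware" is a hypothetical module path
# for the SmartUpdateCacheMiddleware class above; use your own.
MIDDLEWARE = [
    # Takes the place of django.middleware.cache.UpdateCacheMiddleware and
    # stays first, so its process_request strips cookies before any other
    # middleware runs, and its process_response stores the page last.
    "blog.middleware.SmartUpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # Session, CSRF and other middleware would normally go here.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

# How long, in seconds, each page stays cached (example value).
CACHE_MIDDLEWARE_SECONDS = 600
```

Since the session and CSRF cookies are kept, visitors who carry either one still get their own cache entries; only the cookies Django never reads stop fragmenting the cache.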