Dispatcher Caching in AEM - CMO & CTO (An AI Generated Experiment to the past)

Cache is the feature you only notice when it fails. When it works, nobody calls at 3 AM.

Dispatcher caching in AEM (the product formerly known as CQ) is not fancy. It is a brick wall that takes the hit before your publish boxes do. We just came out of a launch week where the new iPhone and late-night promos spiked traffic hard, and the site held thanks to Dispatcher. So I am putting down the lessons while they are fresh. Some choices are specific to CQ 5 and Apache httpd, but the core ideas age well.

The night the phones lit up

We had a clean publish farm. Two nodes, a healthy author, replication humming. Then the campaign hit prime time and traffic multiplied. Requests for content pages were flooding the farm. CPU on publish climbed. Latency crept up. We turned on a stricter Dispatcher cache ruleset, flushed once, and the graph changed shape in seconds. Publish threads dropped. Apache started serving cached HTML straight out of disk. The business folks never saw the cliff we avoided. That is the job.

Since then, my mental model is simple. Author creates. Publish renders. Dispatcher protects. It sits in front, decides what to allow, what to cache, when to evict, and what to pass through. When it is strict and predictable, life is good. When it is lax, you bleed in small, hard to trace ways.

Think of Dispatcher like a club bouncer

The bouncer has a list. Allowed methods. Allowed paths. What becomes a file on disk. What gets tossed back. It does not need to be clever. It needs to be consistent. Cache everything you can that is not personalized or behind a login. Use publish for the rest. Tie cache invalidation to authoring. Keep surprises low.

With that, here are the bits that save time over and over. Three deep dives, then a short wrap.

Deep dive 1: A readable dispatcher.any

If the dispatcher.any cannot be read in one sitting, it is doing too much. Split farms by host if needed, but keep each farm tidy. The basics below work for most public sites with two publish renderers.

# dispatcher.any snippet
/farms {
  /website {
    # Request headers that are passed through to publish
    /clientheaders {
      "Referer"
      "User-Agent"
      "Accept-Language"
      "X-Forwarded-For"
      "X-Forwarded-Proto"
      "Cookie"
      "Authorization"
    }

    /virtualhosts {
      "*.example.com"
      "example.com"
    }

    /renders {
      /rend01 { /hostname "publish1.internal" /port "4503" }
      /rend02 { /hostname "publish2.internal" /port "4503" }
    }

    # The last matching rule wins, so deny everything first and then open up
    /filter {
      /0001 { /type "deny"  /url "*" }
      /0002 { /type "allow" /method "GET"  /url "*" }
      /0003 { /type "allow" /method "HEAD" /url "*" }
      /0004 { /type "deny"  /url "/content/*/jcr:*.json" }
      /0005 { /type "deny"  /url "/content/*/jcr:*.xml" }
      /0006 { /type "deny"  /url "/system/*" }
    }

    /cache {
      /docroot "/var/www/html"
      /statfileslevel "2"

      /rules {
        /0000 { /type "allow" /glob "*.html" }
        /0001 { /type "allow" /glob "*.css" }
        /0002 { /type "allow" /glob "*.js" }
        /0003 { /type "allow" /glob "*.png" }
        /0004 { /type "allow" /glob "*.jpg" }
        /0099 { /type "deny"  /glob "/libs/*" }
      }

      # Files that go stale when a .stat file is touched
      /invalidate {
        /0000 { /type "deny"  /glob "*" }
        /0001 { /type "allow" /glob "*.html" }
      }

      # "allow" means "ignore this parameter". The last matching rule wins,
      # so the catch-all goes first and the exceptions follow.
      /ignoreUrlParams {
        /0001 { /glob "*"        /type "allow" }
        # search changes content
        /0002 { /glob "q"        /type "deny" }
        /0003 { /glob "id"       /type "deny" }
        # authoring mode
        /0004 { /glob "wcmmode"  /type "deny" }
      }
    }

    /stickyConnections {
      /paths {
        "/apps"
        "/libs"
        "/system"
        "/content/secure"
      }
    }
  }
}

A few highlights. Sections in dispatcher.any use curly braces, and each renderer needs its own name. Filter early to stop odd verbs and internal paths. The last matching filter rule wins, so start from a deny-all and open up from there. Keep client headers tight so publish only sees the headers it needs. The sticky setting pins a visitor to one renderer for the session areas. And read the ignoreUrlParams block as an allow list of what to ignore: ignore everything by default, then deny the few params that change output, so those requests go to publish instead of being answered from a cached file.
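To make the ignoreUrlParams idea concrete, here is a minimal shell sketch of the decision it encodes. It is only a model for reasoning about your own rules, not Dispatcher code: the function name `cache_decision` and the `SIGNIFICANT_PARAMS` variable are made up for this example, and the list simply mirrors the params denied in the snippet above.

```shell
#!/bin/sh
# Hypothetical model of /ignoreUrlParams, not part of Dispatcher itself.
# These are the parameters that change page output (the "deny" entries).
SIGNIFICANT_PARAMS="q id wcmmode"

# cache_decision QUERY_STRING
# Prints "cache" when every parameter can be ignored, "pass" when at least
# one significant parameter is present and the request must reach publish.
cache_decision() {
  result=cache
  old_ifs=$IFS
  IFS='&'
  set -f                      # do not glob-expand values such as "*"
  for pair in $1; do
    name=${pair%%=*}          # text before the first "="
    case " $SIGNIFICANT_PARAMS " in
      *" $name "*) result=pass ;;
    esac
  done
  set +f
  IFS=$old_ifs
  echo "$result"
}

cache_decision "utm_source=mail&gclid=abc"
cache_decision "utm_source=mail&q=phones"
```

The first call prints cache because both params are tracking noise. The second prints pass because q changes the output. Running your real campaign URLs through something like this before launch is a cheap way to spot a param you forgot to deny.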

Deep dive 2: Invalidation that authors can trust

The fastest site is a static site. AEM gives us a way to get close while still editing in a nice console. The trick is wiring authoring to cache flush agents so authors do not need to ping ops for every change.

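On each publish node, set a Dispatcher flush agent in the publish agents folder. Point it to your web tier. When content is activated, the publish node will call Dispatcher to evict the right files. The flush request is a plain POST that carries the action and the path in headers, one handle per request and no wildcards. Access is controlled by the /allowedClients list in the /cache section, not by a user and password. You can test the flush endpoint by hand from an allowed host:

curl -X POST \
  -H "CQ-Action: Activate" \
  -H "CQ-Handle: /content/site/en" \
  -H "Content-Length: 0" \
  -H "Content-Type: application/octet-stream" \
  http://web01.example.com/dispatcher/invalidate.cache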

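The magic player here is statfileslevel. It controls how deep into the docroot Dispatcher keeps .stat files, the small marker files that signal freshness for a folder. The docroot itself is level 0. When a path is activated, Dispatcher touches the .stat files from the docroot down to the folder of the activated page or to the configured level, whichever is shallower. On the next request, any auto-invalidated file (whatever the /invalidate rules cover, typically HTML) that is older than its .stat file is fetched again from publish. Pick a level that matches your content tree. Two or three folders deep is a common sweet spot for large sites.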
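If the levels are hard to picture, a small shell sketch helps. This is a simplified model of the documented behavior, not something Dispatcher ships: the function name `stat_files` is hypothetical, and it assumes the docroot counts as level 0 and that the activated handle is a page sitting inside its parent folder.

```shell
#!/bin/sh
# Hypothetical helper: list the .stat files that an activation would touch.
# stat_files DOCROOT HANDLE LEVEL
stat_files() {
  docroot=$1
  handle=$2
  level=$3
  dir=${handle%/*}            # folder that contains the activated page
  path=$docroot
  depth=0
  echo "$path/.stat"          # level 0 is the docroot itself
  old_ifs=$IFS
  IFS=/
  set -f
  for part in $dir; do
    [ -z "$part" ] && continue            # skip the empty field before the first "/"
    [ "$depth" -ge "$level" ] && break    # never go deeper than statfileslevel
    path="$path/$part"
    depth=$((depth + 1))
    echo "$path/.stat"
  done
  set +f
  IFS=$old_ifs
}

stat_files /var/www/html /content/site/en/page 2
```

With level 2 this prints the .stat files in the docroot, in content, and in content/site. Nothing deeper has its own marker, so activating any page under /content/site marks every cached HTML file under that site as stale. Raise the level and the blast radius shrinks to the language branch or the section.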

You can also add a manual nuke for panic cases. A safe pattern is a protected endpoint that touches every .stat file under the docroot, which marks the auto-invalidated files, typically just the HTML, as stale without deleting anything. Keep it behind basic auth and your office IPs.

# Apache httpd snippet to protect a flush helper
<Location "/admin/flush-all">
  AuthType Basic
  AuthName "Admin"
  AuthUserFile "/etc/httpd/.htpasswd"
  Require valid-user
  Order allow,deny
  Allow from 10.0.0.0/8
  Satisfy all
</Location>

#!/bin/sh
# Helper script: touch every .stat file under the docroot
find /var/www/html -type f -name ".stat" -exec touch {} \; -print

Last point on trust. Do not cache authoring modes. The wcmmode param should bypass cache or be denied in ignore rules. Editors will think the site is broken if they see stale or mixed chrome.

Deep dive 3: Headers, TTLs, and not shooting yourself in the foot

Dispatcher writes files. Apache serves them. That means Apache headers decide what browsers and CDNs do next. Set strong headers for assets, lighter headers for HTML, and be strict with cookies that would split the cache.

# Apache headers for assets
<FilesMatch "\.(css|js|png|jpg|gif|woff|ttf)$">
  # 30 days (httpd does not allow a trailing comment on a directive line)
  Header set Cache-Control "public, max-age=2592000"
  ExpiresActive On
  ExpiresDefault "access plus 30 days"
</FilesMatch>

# Apache headers for HTML
<FilesMatch "\.html$">
  # 5 minutes
  Header set Cache-Control "public, max-age=300"
</FilesMatch>

# Avoid cache splits by removing noisy cookies
Header unset Set-Cookie
RequestHeader unset Cookie

The cookie lines above are blunt. Use them with care. A better approach is to only pass through the cookies that matter to publish. Many default cookies are for authoring or analytics, and anything downstream that varies on them, such as a CDN keyed on the Cookie header, will explode your cache if they change on every request. Keep a short cookie allow list.

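# Keep only a short set of cookies on the way to publish
# Everything else gets stripped at the edge
SetEnvIf Cookie "(^|;\s*)(sessionid|remember_me)=" KEEP_COOKIES

# No allow-listed cookie at all: drop the whole request header
RequestHeader unset Cookie env=!KEEP_COOKIES
# Otherwise strip every cookie that is not on the allow list
RequestHeader edit* Cookie "(^|;\s*)(?!sessionid=|remember_me=)[^;=\s][^;=]*=[^;]*" "" env=KEEP_COOKIES
# Tidy the separator left behind when the first cookie was stripped
RequestHeader edit Cookie "^;\s*" "" env=KEEP_COOKIES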
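
Regex-driven header edits are easy to get subtly wrong, so it helps to have a plain reference for what should survive. The sketch below is an assumption-laden test oracle, not Apache code: `filter_cookies` and `ALLOWED_COOKIES` are hypothetical names, and the two cookie names are the same ones used in the allow list above.

```shell
#!/bin/sh
# Hypothetical reference filter for a cookie allow list.
ALLOWED_COOKIES="sessionid remember_me"

# filter_cookies COOKIE_HEADER_VALUE
# Prints only the allow-listed cookies, joined with "; ".
filter_cookies() {
  out=""
  old_ifs=$IFS
  IFS=';'
  set -f
  for pair in $1; do
    pair=${pair# }            # drop the single space that follows each ";"
    name=${pair%%=*}          # cookie name is the text before the first "="
    case " $ALLOWED_COOKIES " in
      *" $name "*) out="${out:+$out; }$pair" ;;
    esac
  done
  set +f
  IFS=$old_ifs
  echo "$out"
}

filter_cookies "s_vi=abc; sessionid=42; _ga=GA1.2; remember_me=1"
```

The call prints sessionid=42; remember_me=1. Compare that with the Cookie header publish actually receives for the same request, and you know whether the web tier rules do what you meant.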

Two gotchas we keep seeing. First, never cache the login area. Block it in rules or mount it under a clear secure path and mark it as no cache. Second, query strings that look harmless often are not. Kill tracking params in ignoreUrlParams so they do not create duplicate files. Keep the few that change output.

Reflective close: simple, fast, repeatable

AEM gives us a rich authoring story. Dispatcher gives us a simple way to make that story fast on the open web. The setup that stays healthy looks the same every time. Clear filters. Tight headers. Honest cache rules. Reliable invalidation. A short list of exceptions. A bit of command line that you trust at three in the morning.

If you are launching a new CQ site this month, do a dry run with the cache empty, then warm it and watch the difference in logs. Tail Apache access, not just AEM error. Count the misses. Make a small dashboard that tells you the ratio of cached HTML to passed through requests. And keep a copy of your dispatcher.any in version control with comments that you would be proud to hand to the next person.
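
For the ratio itself, a rough first cut needs nothing more than two access logs. The sketch below assumes both the Apache log and the publish access log use Common Log Format, with the request line as the first quoted field, and covers the same time window. The function names `count_html` and `cache_ratio` are made up for this example.

```shell
#!/bin/sh
# count_html FILE
# Counts GET requests for .html paths, assuming Common Log Format where the
# request line is quoted: "GET /content/site/en.html HTTP/1.1"
count_html() {
  awk -F'"' '$2 ~ /^GET [^ ?]*\.html([? ]|$)/ { n++ } END { print n + 0 }' "$1"
}

# cache_ratio WEB_LOG PUBLISH_LOG
# Prints the percentage of HTML requests that never reached publish.
cache_ratio() {
  web=$(count_html "$1")
  pub=$(count_html "$2")
  if [ "$web" -eq 0 ]; then
    echo "n/a"                # no HTML traffic in the window
    return 0
  fi
  awk -v w="$web" -v p="$pub" 'BEGIN { printf "%.1f\n", (w - p) * 100 / w }'
}
```

Call it as cache_ratio with the web tier log first and the publish log second. The number is only as good as the assumption that every publish HTML request came through Dispatcher, so health checks and replication traffic that hit publish directly will drag it down a little.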

The web keeps getting faster hardware and bigger spikes. The tactic is still the same. Render once, serve many. Let authoring trigger smart evictions. Treat Dispatcher as your first feature. When it is boring, everything else can shine.