Is your AEM content model ready for five new sites, ten languages, and a redesign by next quarter? Do your authors fight the tool when moving a component from a product page into a blog article? Do your queries slow down the moment you add real content instead of lorem ipsum? These questions are hitting teams moving from CQ to AEM and from Classic UI to Touch UI. The good news is that the shape of your content model decides most of your future pain. If you get the structure right now, you can scale traffic, regions, and features without springing leaks everywhere.
The goal is simple. Clean content, stable paths, predictable types.
What breaks first when AEM scales
The first cracks usually come from mixing layout with meaning. A page becomes a scrapbook of components that only make sense side by side. Authors then copy pages to reuse content. Search stops working because your product name sits in a random text node. MSM gets sticky because your live copies carry page-specific tweaks that should have been data. Then the redesign arrives and half the site needs manual fixes because HTML is baked into the content. AEM is happiest when the JCR stores meaning and your components render it. When you hold that line, templates can change and the data rolls with them.
So we start by separating content from presentation.
Model pages and content separately
A page is a container for routes, SEO, and assembly. A content piece is the stuff you reuse. Keep them apart. Store structured content under a neutral section and reference it from pages with a picker. Authors get clear rules. Developers get clear queries. Search gets real fields. You can keep a familiar authoring flow using Touch UI dialogs and Sling Models, while your data lives in tidy nodes that do not care which template renders them. A simple example helps.
Pages route. Content lives.
{
  "content": {
    "site": {
      "articles": {
        "2026": {},
        "2025": {},
        "2024": {}
      },
      "data": {
        "article": {
          "a123": {
            "jcr:primaryType": "nt:unstructured",
            "title": "Scaling Models in AEM",
            "author": "Alex",
            "summary": "Why structure wins",
            "body": "Long text here...",
            "tags": ["tech/aem", "content/model"]
          }
        },
        "product": {
          "p007": {
            "name": "Camera Max",
            "sku": "CAM007",
            "price": "799.00"
          }
        }
      },
      "en": {
        "articles": {
          "scaling-models": {
            "jcr:content": {
              "sling:resourceType": "site/components/page/article",
              "dataRef": "/content/site/data/article/a123",
              "seoTitle": "Content Models that Scale in AEM"
            }
          }
        }
      }
    }
  }
}
Pages reference data with a simple path. Components read it and render.
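To make the page/data split concrete outside AEM, here is a plain-Java sketch that needs only the JDK. `ContentStore` and `ArticleRenderers` are hypothetical names. The maps stand in for JCR nodes; real code would read them through Sling Resources. The sketch shows two templates rendering the same data node, which is the point of keeping meaning out of the page.

```java
import java.util.Map;

// Hypothetical stand-in for the JCR: node path -> properties of that node.
final class ContentStore {
    private final Map<String, Map<String, String>> nodes;

    ContentStore(Map<String, Map<String, String>> nodes) {
        this.nodes = nodes;
    }

    // Returns the node's properties, or an empty map for a null or unknown path.
    Map<String, String> get(String path) {
        if (path == null) {
            return Map.of();
        }
        return nodes.getOrDefault(path, Map.of());
    }
}

final class ArticleRenderers {
    // Original template: heading plus summary, read from the referenced data node.
    static String renderV1(ContentStore store, Map<String, String> pageContent) {
        Map<String, String> data = store.get(pageContent.get("dataRef"));
        return "<h1>" + data.get("title") + "</h1><p>" + data.get("summary") + "</p>";
    }

    // Redesigned template: new markup over the same data, no content migration.
    static String renderV2(ContentStore store, Map<String, String> pageContent) {
        Map<String, String> data = store.get(pageContent.get("dataRef"));
        return "<article><header>" + data.get("title") + "</header></article>";
    }
}
```

The page only holds `dataRef`, so swapping `renderV1` for `renderV2` is a code change with no content edits.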
Normalize content just enough
Full database-style normalization slows authors down. Full copy-paste makes search useless. The sweet spot is to normalize entities that repeat across sections or channels: products, authors, offices, event locations, legal text. Keep references where you need a single source of truth and allow local fields for small tweaks. For example, an article references an author profile node for name and photo, but keeps its own display title and teaser. If your model needs more than one level of references, stop and ask why. Each extra jump adds code, queries, and author confusion. One reference jump is fine; two is often a smell.
When in doubt, prefer tags for classification over extra nodes.
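The "one reference jump" rule is easy to check mechanically. The plain-Java sketch below models nodes as path-to-properties maps. It assumes a hypothetical naming convention in which properties ending in `Ref` hold a path to another data node. `ReferenceLint` is an illustrative name and not an AEM API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lint for the "one reference jump" rule. Nodes are modeled as
// path -> properties; by assumption, properties ending in "Ref" hold a path.
final class ReferenceLint {
    static boolean isReference(String propertyName) {
        return propertyName.endsWith("Ref");
    }

    // Reports "source -> target" whenever a referenced node itself holds a
    // reference, meaning rendering the source would need two jumps.
    static List<String> findDoubleJumps(Map<String, Map<String, String>> nodes) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
            for (Map.Entry<String, String> prop : node.getValue().entrySet()) {
                if (!isReference(prop.getKey())) {
                    continue;
                }
                Map<String, String> target = nodes.getOrDefault(prop.getValue(), Map.of());
                if (target.keySet().stream().anyMatch(ReferenceLint::isReference)) {
                    problems.add(node.getKey() + " -> " + prop.getValue());
                }
            }
        }
        return problems;
    }
}
```

Consider an article with a local `displayTitle` and an `authorRef` pointing at a profile that holds only `name` and `photo`. That article passes. If the profile later grows an `officeRef`, the article gets flagged.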
Design for MSM and translation
MSM works well when your model is clear. Use a clean language tree such as /content/site/us/en and /content/site/mx/es. Keep localized content inside each language branch and keep shared data outside it, under /content/site/data. That way a live copy can roll out template and structure changes while your core data stays put. Use the i18n dictionary for microcopy, not for article bodies. Use page properties for locale-specific SEO. Avoid path renames that break references. A small rule that saves time: store only content IDs in page properties and resolve them at render time, so a rollout does not lock your content to a path that later changes.
Keep rollouts for structure, keep translation for content.
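The "store IDs, resolve at render time" rule can be sketched in plain Java. `ContentIdRegistry` is hypothetical. In AEM the lookup might instead be a query on an indexed ID property, which is an assumption on our part and not something prescribed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry: pages keep a stable ID such as "a123", and the current
// path is looked up at render time. Moving a data node updates one mapping
// instead of every page that points at it.
final class ContentIdRegistry {
    private final Map<String, String> pathsById = new HashMap<>();

    // Records (or updates) where the content with this ID currently lives.
    void register(String id, String path) {
        pathsById.put(id, path);
    }

    // Empty when the ID is unknown, so the component can render a fallback.
    Optional<String> resolve(String id) {
        return Optional.ofNullable(pathsById.get(id));
    }
}
```

If `a123` moves from `/content/site/data/article/a123` to a dated subfolder, only `register` is called again. Every live copy that stored `a123` keeps working.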
Query with intent
Queries break at scale when they are vague. Plan your indexing and keep node names predictable. Prefer QueryBuilder or JCR-SQL2 with clear constraints. Store core fields under jcr:content only when they are truly page metadata. For reusable content, query the data section directly. Avoid wildcards and filter by path, type, and tag. Here is a QueryBuilder call that finds articles by tag, newest first, without crawling the world. Ordering by jcr:created only works if the data nodes actually carry a jcr:created value, for example through the mix:created mixin.
curl -u admin:admin "http://localhost:4502/bin/querybuilder.json?\
path=/content/site/data/article&\
type=nt:unstructured&\
1_property=tags&\
1_property.value=tech/aem&\
2_property=title&\
2_property.operation=exists&\
p.limit=20&\
orderby=@jcr:created&\
orderby.sort=desc"
If your query needs a filter you cannot express, fix the model, not the query.
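If you build QueryBuilder requests from Java, for example in a feed endpoint, a small builder keeps each constraint explicit and numbered. `QueryParams` is a hypothetical helper and not part of the QueryBuilder API. It produces the same kind of parameter string as the curl call, URL-encoded with the JDK's `URLEncoder`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical builder for QueryBuilder parameters. Property predicates are
// numbered (1_property, 2_property, ...) so every constraint is spelled out.
final class QueryParams {
    private final Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
    private int propertyCount = 0;

    QueryParams path(String path) { params.put("path", path); return this; }

    QueryParams type(String nodeType) { params.put("type", nodeType); return this; }

    // Exact match on a property value.
    QueryParams propertyEquals(String name, String value) {
        String p = (++propertyCount) + "_property";
        params.put(p, name);
        params.put(p + ".value", value);
        return this;
    }

    // Property must exist, whatever its value.
    QueryParams propertyExists(String name) {
        String p = (++propertyCount) + "_property";
        params.put(p, name);
        params.put(p + ".operation", "exists");
        return this;
    }

    QueryParams limit(int limit) { params.put("p.limit", Integer.toString(limit)); return this; }

    // URL-encodes every key and value and joins them with '&'.
    String toQueryString() {
        return params.entrySet().stream()
            .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Because the builder has no method for a wildcard or full-text predicate, vague queries are hard to write by accident.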
Sling Models and HTL glue
Use HTL to keep logic out of markup and Sling Models to read content. The model should handle both inline fields and referenced data. Keep it small and testable. Here is a tiny example that reads a dataRef path and exposes the fields to a component.
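// Lives in the OSGi core bundle (e.g. core/src/main/java/site/components/article/ArticleModel.java), not under /apps;
// the package must be registered as a Sling Models package, for example via the Sling-Model-Packages header.
package site.components.article;

import javax.annotation.PostConstruct;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ValueMap;
import org.apache.sling.models.annotations.DefaultInjectionStrategy;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.SlingObject;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

@Model(adaptables = Resource.class, defaultInjectionStrategy = DefaultInjectionStrategy.OPTIONAL)
public class ArticleModel {

    // Path to the shared data node, read from the current resource's properties.
    @ValueMapValue
    private String dataRef;

    @SlingObject
    private Resource resource;

    @SlingObject
    private ResourceResolver resolver;

    // Resolved once in init() instead of on every getter call.
    private ValueMap data;

    @PostConstruct
    protected void init() {
        Resource ref = dataRef != null ? resolver.getResource(dataRef) : null;
        // No valid reference: fall back to inline fields on the current resource.
        data = ref != null ? ref.getValueMap() : resource.getValueMap();
    }

    public String getTitle() {
        return data.get("title", "");
    }

    public String getSummary() {
        return data.get("summary", "");
    }
}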
<sly data-sly-use.m="site.components.article.ArticleModel">
  <article class="c-article">
    <h1>${m.title}</h1>
    <p class="summary">${m.summary}</p>
  </article>
</sly>
The template stays tidy and your model does the lifting.
Version components without breaking content
You will change components. Do not rename old resource types. Create a new resource type that extends the old one through sling:resourceSuperType, then migrate content gradually. Authors can keep working while you roll changes out in slices. A small Groovy Console script can bump resource types for selected nodes. Test on a subtree, snapshot a package, then roll out.
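// AEM Groovy Console sample (resourceResolver is one of the console's bindings)
import org.apache.sling.api.resource.ModifiableValueMap

def root = "/content/site/en/articles"
def oldType = "site/components/page/article"
def newType = "site/components/page/article_v2"
def dryRun = true // set to false after checking the output on a test subtree

def res = resourceResolver.getResource(root)
assert res != null : "Root not found: " + root

// Only direct children of root are visited; nested pages are left alone.
res.listChildren().each { page ->
    def jcr = page.getChild("jcr:content")
    if (jcr && jcr.getValueMap().get("sling:resourceType", String) == oldType) {
        if (!dryRun) {
            jcr.adaptTo(ModifiableValueMap).put("sling:resourceType", newType)
        }
        println((dryRun ? "Would update " : "Updated ") + jcr.path)
    }
}

if (!dryRun) {
    resourceResolver.commit()
}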
Version the code, keep the content steady.
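Extending through sling:resourceSuperType lets you migrate in slices. The new type only needs the scripts that actually changed, and everything else falls back to the old type. Below is a deliberately simplified plain-Java model of that fallback. `ResourceTypes` and the script names are hypothetical. Real Sling script resolution also weighs selectors, extensions, request methods, and search paths, all of which are left out here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified, hypothetical model of the sling:resourceSuperType fallback.
final class ResourceTypes {
    private final Map<String, String> superTypes;   // type -> its super type
    private final Map<String, Set<String>> scripts; // type -> script names it defines

    ResourceTypes(Map<String, String> superTypes, Map<String, Set<String>> scripts) {
        this.superTypes = superTypes;
        this.scripts = scripts;
    }

    // Walks up the super type chain until some type defines the script.
    // The seen set guards against accidental cycles in the chain.
    Optional<String> findProvider(String type, String script) {
        Set<String> seen = new HashSet<>();
        String current = type;
        while (current != null && seen.add(current)) {
            if (scripts.getOrDefault(current, Set.of()).contains(script)) {
                return Optional.of(current);
            }
            current = superTypes.get(current);
        }
        return Optional.empty();
    }
}
```

With `article_v2` overriding only `body.html`, pages switched to `article_v2` get the new body but still render `head.html` from the old `article` type.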
Tags are your friend
Put classification in tags, not in folder names. Authors can retag without moving nodes. Queries become simple. Navigation can read tags and build pages without hard-coded lists. You get natural filters for search and feed endpoints. Teach your team a short tag taxonomy and stick to it. The tree matters less than the names. Keep tags human. Keep them short.
If a folder name carries meaning, you need a tag for it.
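Filtering by tag rather than by folder can be sketched in a few lines. This is a hypothetical in-memory filter over tag IDs shaped like the article's example (`tech/aem`), and it ignores tag namespaces. `Tagged` and `TagFilter` are illustrative names, not the AEM TagManager API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical content item: an ID plus the tag IDs it carries.
final class Tagged {
    final String id;
    final List<String> tags;

    Tagged(String id, List<String> tags) {
        this.id = id;
        this.tags = tags;
    }
}

final class TagFilter {
    // True when the item has the tag itself or any tag below it in the tree.
    // Matching on filterTag + "/" keeps "tech" from matching "technology".
    static boolean matches(List<String> itemTags, String filterTag) {
        return itemTags.stream()
            .anyMatch(t -> t.equals(filterTag) || t.startsWith(filterTag + "/"));
    }

    // Returns the IDs of matching items, preserving input order.
    static List<String> filter(List<Tagged> items, String filterTag) {
        return items.stream()
            .filter(item -> matches(item.tags, filterTag))
            .map(item -> item.id)
            .collect(Collectors.toList());
    }
}
```

Retagging an item changes what this returns without moving any node, which is the advantage over encoding the category in a folder path.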
Keep URLs stable
URLs are contracts with the world and with your analytics. Choose a simple pattern early and hold it. Prefer lowercase, simple words, and no dates unless you will keep that date forever. Map vanity URLs with Apache or Dispatcher and leave content paths alone. Store canonical URLs in page properties and let templates output them once. When you rename titles, keep the page name unless you have a redirect ready. Stable paths cut SEO pain.
Changing URLs is like changing your phone number.
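Two habits keep URLs stable: derive a simple lowercase page name once, and record every move so old links can be redirected. The plain-Java sketch below illustrates both. `Urls` and `RedirectMap` are hypothetical. In production the redirects would live in Apache or Dispatcher rewrite rules, as described above, and not in application code.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class Urls {
    // Lowercase, ASCII, hyphen-separated page name derived from a title.
    static String slugify(String title) {
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
            .replaceAll("\\p{M}", "");        // strip combining marks left by NFD
        return ascii.toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", "-")     // each run of other characters becomes one hyphen
            .replaceAll("(^-|-$)", "");        // trim a leading or trailing hyphen
    }
}

// Hypothetical record of page moves: old path -> new path.
final class RedirectMap {
    private final Map<String, String> targets = new HashMap<>();

    void moved(String oldPath, String newPath) {
        targets.put(oldPath, newPath);
    }

    // Follows chained moves (a -> b -> c), capped at 10 hops to avoid loops.
    String resolve(String path) {
        String current = path;
        int hops = 0;
        while (targets.containsKey(current) && hops++ < 10) {
            current = targets.get(current);
        }
        return current;
    }
}
```

Following chains means a page renamed twice still sends its oldest links to the current URL in one hop, instead of through a chain of redirects.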
A tiny checklist you can print
– One content type per folder under data with clear names
– Pages reference data by path or ID not by copy
– One level of references at most
– Tags for classification not folders
– Predictable queries with type, path, and tag filters
– New resource type for breaking changes
– MSM for structure, translation for copy
– URLs are contracts, redirects are coverage
Small habits make big sites feel light.
Design your AEM content model like a library where meaning lives in the catalog and the shelves can move any time.