CI for Performance Tests: Treating Speed as a Feature

Posted on January 3, 2014 By Luis Fernandez

We shipped on a Friday night. Big promo. Fresh build. Buzzing chat room. Traffic started to roll in and then everything felt like wading through cold syrup. Spinners everywhere. Our APM graphs drew a perfect ski slope. Engineers scrambled, product looked worried, and the chat turned quiet. The code worked, but it was slow. We treated speed like a nice-to-have. That night it was the only thing anyone cared about.


That was the last time I agreed to ship without a speed safety net. Since then I have been busy wiring CI for performance tests. Nothing fancy, just JMeter running in the same loop as unit tests and deploys. The lesson is simple. Speed is a feature. If you do not test it on every change, it will drift until it bites you at the worst time.


Why put performance in CI at all


We already measure speed in staging and we run a big load test once in a while. Still, the ugly regressions sneak in between those moments. CI is where change lives. That is where we catch things early.


Here is the promise. Every push gives you a small signal about speed. Every night gives you a bigger one. Every week you run a longer soak that tells you if memory, GC, or caches go sideways. Stack those together and you get confidence without waiting for a fire drill.


Also, people respond to a scoreboard. A green build that shows p95 under target is a quiet win. A red build that says p95 jumped by 30 percent triggers a quick talk long before a customer tweets at you.


The tool belt right now


Today I lean on Jenkins or TeamCity for CI, Apache JMeter for the load bit, and either the Jenkins Performance Plugin or Graphite for trends. JMeter runs in headless mode just fine. We pass in test data and environment settings as properties so the same plan works on a laptop, CI, or a throwaway server in the cloud. If you love Maven, there is a JMeter plugin for that route too.


A JMeter plan CI can run without babysitting


Make your plan boring and repeatable. Keep it short for the per commit step. Something like two minutes tops. Focus on a small slice that hits your most common path. Login, search, product page, add to cart, checkout. Save the heavy tests for night runs.


Use properties everywhere. That keeps the plan reusable across branches and jobs.

// In JMeter, reference properties like:
${__P(baseUrl,https://staging.example.com)}
${__P(users,20)}
${__P(ramp,30)}
${__P(duration,120)}

// Add a Constant Throughput Timer only if you really need it
// Add a Duration Assertion on key samplers

Run JMeter headless in your CI job


Here is a Jenkins shell step I keep around. It runs JMeter, stores the raw results, and prints a tiny summary. No clicky bits. No GUI.

#!/usr/bin/env bash
set -euo pipefail

export JMETER_HOME=/opt/apache-jmeter-2.10
export PATH="$JMETER_HOME/bin:$PATH"

BUILD_TS=$(date +%s)
RESULTS="results-${BUILD_NUMBER:-local}-${BUILD_TS}.jtl"
LOGFILE="jmeter-${BUILD_NUMBER:-local}.log"

jmeter -n \
  -t checkout.jmx \
  -JbaseUrl="https://staging.example.com" \
  -Jusers="${USERS:-30}" \
  -Jramp="${RAMP:-30}" \
  -Jduration="${DURATION:-120}" \
  -Jbuild="${BUILD_NUMBER:-local}" \
  -l "$RESULTS" \
  -j "$LOGFILE"

echo "Raw results saved to $RESULTS"
# Quick summary: count, avg, 95th percentile, errors.
# Needs gawk for asort(). Assumes CSV results with a header row where
# column 2 is elapsed ms and column 8 is the success flag (default JMeter CSV config).
gawk -F, 'NR>1 {count++; sum+=$2; a[count]=$2; if($8=="false") err++} END {
  n=count
  if(n==0){print "No samples"; exit 1}
  asort(a)
  p=int(0.95*n); if(p<1) p=1
  printf("Samples=%d Avg=%.0fms P95=%dms Errors=%d\n", n, sum/n, a[p], err+0)
}' "$RESULTS"

The awk is rough but it gets you a quick readout. For better reports, I use the JMeter Plugins command line tool to render trend charts on night runs.
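On the night run, the invocation looks roughly like the snippet below. Treat it as a sketch: it assumes the jmeter-plugins command line tool is installed and on the PATH, the JTL file name is a placeholder, and the exact flags and chart types depend on your plugin version.

# Render a response-times-over-time chart from the night run JTL (placeholder name).
JMeterPluginsCMD.sh --generate-png response-times-${BUILD_NUMBER:-local}.png \
  --input-jtl results-night.jtl \
  --plugin-type ResponseTimesOverTime \
  --width 1024 --height 600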


Hard gates with thresholds


Soft warnings become wallpaper. Put in hard gates. You can do it inside JMeter with a Duration Assertion or outside by parsing the JTL and failing the build if numbers are out of range.

#!/usr/bin/env bash
# fail_on_threshold.sh <results.jtl> [max_p95_ms]
set -euo pipefail
JTL="$1"
MAX_P95_MS="${2:-500}"
# Needs gawk for asort(); column 2 is elapsed ms in the JTL CSV.
P95=$(gawk -F, 'NR>1 {a[++n]=$2} END {asort(a); p=int(0.95*n); if(p<1)p=1; print a[p]+0}' "$JTL")
echo "P95=${P95}ms Threshold=${MAX_P95_MS}ms"
if [[ "$P95" -gt "$MAX_P95_MS" ]]; then
  echo "Failing build on P95 threshold"
  exit 2
fi

Wire that as a post step. If p95 slips, the build turns red. No debate.


Store and chart the trend


Red or green is not the whole story. Trends matter. For that I like Graphite. It takes plain text. You can push p95, error rate, and throughput per build. Then hang a simple dashboard on a big screen.

#!/usr/bin/env bash
# push_to_graphite.sh <results.jtl>
set -euo pipefail

METRIC_PREFIX="ci.perf.checkout"
GRAPHITE_HOST="graphite.mycompany"
GRAPHITE_PORT=2003
TS=$(date +%s)
JTL="$1"

# Needs gawk for asort(). Assumes the default JMeter CSV layout:
# column 2 is elapsed ms, column 8 is the success flag.
read SAMPLES AVGMS P95MS ERRORS < <(gawk -F, 'NR>1 {count++; sum+=$2; a[count]=$2; if($8=="false") err++} END {
  n=count; if(n==0){print 0,0,0,0; exit}
  asort(a); p=int(0.95*n); if(p<1)p=1
  printf("%d %.0f %d %d\n", n, sum/n, a[p], err+0)
}' "$JTL")

{
  echo "${METRIC_PREFIX}.samples ${SAMPLES} ${TS}"
  echo "${METRIC_PREFIX}.avg_ms ${AVGMS} ${TS}"
  echo "${METRIC_PREFIX}.p95_ms ${P95MS} ${TS}"
  echo "${METRIC_PREFIX}.errors ${ERRORS} ${TS}"
} | nc -w 3 "$GRAPHITE_HOST" "$GRAPHITE_PORT" || true

If you do not run Graphite, the Jenkins Performance Plugin can read JTL files and draw charts per build. It is not fancy, but it does the job.


Fast feedback tiers


Do not try to cram a full stress test into every commit. A simple split keeps people happy and keeps the queue moving.

  • Per commit smoke perf. One to two minutes. Low user count. Hits the golden path. Gates on p95 and errors.
  • Night baseline. Five to ten minutes. Medium user count. Includes a few more flows. Stores trend metrics.
  • Weekly soak. Sixty minutes or more. Steady load. Watches memory and GC. Catches slow leaks and hot spots.
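One way to keep the tiers honest is to drive them all from the same scripts and only vary the knobs per job. Here is a rough sketch, where TIER is a hypothetical job parameter and run_jmeter.sh is the headless step from above saved as a script:

#!/usr/bin/env bash
# run_tier.sh - pick load settings per CI job (TIER is a hypothetical parameter)
set -euo pipefail
TIER="${TIER:-smoke}"

case "$TIER" in
  smoke) export USERS=10 RAMP=10  DURATION=120  ;;  # per commit, gentle
  night) export USERS=60 RAMP=60  DURATION=600  ;;  # nightly baseline
  soak)  export USERS=40 RAMP=120 DURATION=3600 ;;  # weekly, steady load
  *) echo "Unknown tier: $TIER" >&2; exit 1 ;;
esac

./run_jmeter.sh                              # the headless JMeter step from above
./fail_on_threshold.sh results-*.jtl 500     # hard gate on p95
[ "$TIER" != "smoke" ] && ./push_to_graphite.sh results-*.jtl || true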

Keep a small warm up before you measure. Most apps need caches to settle. Thirty to sixty seconds is enough for the short runs.
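If you build the summary from the raw JTL like the scripts above, you can also drop the warm up samples before taking percentiles. A minimal sketch, assuming the default CSV layout where column 1 is the epoch timestamp in milliseconds and column 2 is the elapsed time:

#!/usr/bin/env bash
# summarize_after_warmup.sh <results.jtl> [warmup_seconds]
# Skips samples recorded during the warm up window before computing avg and p95.
set -euo pipefail
JTL="$1"
WARMUP_S="${2:-30}"

gawk -F, -v warm="$WARMUP_S" '
  NR==2 {start=$1}                                 # first data row sets the clock
  NR>1 && $1 >= start + warm*1000 {a[++n]=$2; sum+=$2}
  END {
    if(n==0){print "No samples after warm up"; exit 1}
    asort(a); p=int(0.95*n); if(p<1)p=1
    printf("Measured samples=%d Avg=%.0fms P95=%dms\n", n, sum/n, a[p])
  }' "$JTL"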


Repeatability and data


Repeatable tests need repeatable data. Random emails with a fixed seed. Known products. A predictable cart. If your app writes a lot, reset state or use throwaway tenants. If you rely on a third party, add stubs for the hot calls so your test is not blocked by someone else's rate limits.
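As one concrete example, here is a tiny generator for a deterministic user file that a CSV Data Set Config in the plan can read. The file name and columns are just an assumption for illustration:

#!/usr/bin/env bash
# make_test_users.sh - deterministic test data for a JMeter CSV Data Set Config.
# Same input every run, so nightly numbers compare like for like.
set -euo pipefail
COUNT="${1:-100}"
OUT="${2:-users.csv}"

echo "email,password" > "$OUT"
for i in $(seq -w 1 "$COUNT"); do
  echo "perfuser${i}@example.com,Test1234" >> "$OUT"
done
echo "Wrote $COUNT users to $OUT"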


Noise and environments


If you can, test on a reserved box. Shared staging is fine for the smoke step, but your night baseline will be all over the place if other teams hammer the same server. Keep the VM size and JVM flags stable. Watch CPU steal time if you are on shared hosts. For Java, keep the same heap size across runs. For Node or Python, lock versions for the run.
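A low-tech way to keep the runtime stable is to pin those settings in the job itself instead of trusting whatever the box happens to have. The values below are placeholders, and the sketch assumes your app server picks up JAVA_OPTS:

# Keep the perf environment boring: same JMeter, same JVM settings, every run.
export JMETER_HOME=/opt/apache-jmeter-2.10           # pin the JMeter version
export JAVA_OPTS="-Xms2g -Xmx2g -XX:+UseParallelGC"  # same heap and collector each run
java -version 2>&1 | head -1                         # log the JVM so drift shows up in the console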


Front end speed is part of the story


JMeter checks your server. Users feel the page. I like to pair JMeter with a WebPageTest call or a headless YSlow run on the key page. One quick synthetic page timing in CI tells you if a new sprite sheet or a blocking script slowed the first paint.

# very small PhantomJS YSlow sample
phantomjs yslow.js "https://staging.example.com/" --info basic --format plain | tee yslow.txt
grep -E "overall|requests|size" yslow.txt

Keep that light so you do not slow the build. One page is enough for a red flag.


What managers and PMs care about


Speed moves business numbers. Every study I have seen points the same way. One hundred ms can move conversion. People bounce when pages stall. Search engines reward fast pages. None of this is new, but teams forget until a big day hurts.


Two things make this stick.

  • Definition of Done includes speed. For each story, define a target like p95 under 500 ms for the key call it touches. Tie that to the CI gate. When the build is red, the story is not done.
  • A simple scoreboard. Show p95, error rate, and throughput per build. Keep a weekly email with a tiny sparkline and a short note. No long PDF. One glance should tell people if we are getting faster or slower.

There is also a cost angle. Throwing bigger boxes at the problem later is expensive and it hides real issues. A small fix early saves a pile of money and support time. The build gate forces that small fix at the right moment.


Finally, pick targets that match user expectations. Do not chase a random number. If your app is a dashboard that loads a lot of data, maybe a one second p95 is fine. If it is a single input form, aim for snappy. Write down the target per flow. Use the same words in planning and in CI. That is how you keep people aligned without long meetings.
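To make the per flow targets concrete in CI, one option is a small thresholds file checked in next to the test plans and fed to the gate script from earlier. The file name and columns here are hypothetical:

# thresholds.csv, kept in source control next to the JMX files:
#   flow,jtl,p95_ms
#   checkout,results-checkout.jtl,500
#   dashboard,results-dashboard.jtl,1000
while IFS=, read -r flow jtl limit; do
  [ "$flow" = "flow" ] && continue         # skip the header row
  echo "Gating ${flow} at ${limit}ms"
  ./fail_on_threshold.sh "$jtl" "$limit"   # the gate script from above
done < thresholds.csv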


Common pitfalls and quick fixes

  • Chasing averages. Averages lie. Use p95 or p99. That is what users feel when the system is under load.
  • Running too hot. If you push past capacity on each commit, you will get noisy results and angry teammates. Keep the smoke run gentle.
  • Ignoring errors. A fast 500 is not a win. Gate on both speed and errors.
  • Mixing concerns. Separate API tests from browser tests so you can see where the time goes.
  • Unstable data. If test data grows without bounds, your nightly numbers will drift. Reset or seed.

Putting it together in one CI job

# Jenkins freestyle outline
# 1. Checkout
# 2. Build and deploy to a throwaway env or a staging slot
# 3. Run perf smoke

export USERS=25
export RAMP=20
export DURATION=120

./run_jmeter.sh
./fail_on_threshold.sh results-*.jtl 500
./push_to_graphite.sh results-*.jtl

# Archive artifacts: *.jtl, jmeter-*.log
# Publish JMeter report via Jenkins Performance Plugin if you use it
# Mark build unstable if p95 within 10% of the threshold to get attention

This is not a moonshot. It is a few scripts and a plan that fits in source control. The hard part is the habit. Once it is there, you will not want to fly without it.


Your turn this week


If you made it this far, you probably had a slow release at some point too. Here is a small challenge. Pick one path. Wire one perf smoke in CI. Ship it. Share the number.

  • Create a tiny JMeter plan that logs in and hits one hot endpoint.
  • Run it headless on your laptop with two to five users for two minutes.
  • Add a Duration Assertion or the bash threshold script so the run fails when p95 passes your target. Start with a friendly target like 500 ms.
  • Put it in your CI job after deploy to staging. Archive the JTL.
  • Show the number in standup. Make it part of your plan for the next story.

That is it. No giant framework. No special team. Just a small, steady drumbeat that says speed is a feature and we protect it with the same care we give to tests and code review. When the next big promo lands, you will be glad this is in place.


And if you want a template, ping me and I will share a starter JMX and the scripts from this post. Keep shipping fast stuff.
