CMO & CTO

Closing the Bridge Between Marketing and Technology, By Luis Fernandez


LLMs and Legacy Systems: A Delicate Balance

Posted on October 20, 2025 By Luis Fernandez

Everyone wants large language models to connect to the old stuff without breaking the business, and that is the game this season.

When people say "plug an LLM into the monolith," they picture a friendly bot with superpowers. The picture that actually ships looks more like a careful set of adapters, narrow scopes, and boring guardrails that respect the shape of the stack you already have. The rule of thumb that has held up in big shops is simple: start read only, ground every answer in sources you can show, and never let the model improvise a write you cannot reverse. That means the LLM is a coworker, not a root user, with tasks that sit at the edges: tickets, drafts, reports, test data, and later very small writes that pass through checks you already trust.

On mainframes and older app servers, the play is to point the model at searchable snapshots and event feeds, not at the live core, then let a tiny service translate any approved action into the calls your world speaks: BAPIs for SAP, stored procedures for Oracle, CICS transactions for the big iron, MQ or Kafka if that is your lane.

Your vector store needs a map back to the truth. Keep a catalog of which tables or documents feed which embeddings, add document IDs, versions, and owners, and flush stale entries when the source of record changes.

Prompts need a plan too. Write them like API contracts, with a schema the model must follow, and log every request and response with a request ID, the model name, the grounding documents, and the final action taken, so that an auditor or a stressed-out engineer can replay the case.

Security gets real the moment data leaves the building. Draw a bright line around PII, secrets, and trade data, and add redaction and hashing before anything hits the model. In many shops the safe call is on-prem inference for sensitive tasks and cloud calls for public content, with a broker that picks the route based on a policy you can read.
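Here is a rough sketch of the "prompts as API contracts" idea in Python. The schema fields, the audit record shape, and `model_fn` are all made up for illustration; the point is that a response only counts if it parses against a contract, and every call leaves a replayable log entry.

```python
import json
import uuid
from dataclasses import dataclass, field

# Hypothetical contract: the model must return JSON with exactly these typed fields.
REQUIRED_FIELDS = {"summary": str, "source_doc_ids": list, "proposed_action": str}

@dataclass
class AuditRecord:
    """One replayable log entry per model call."""
    model_name: str
    prompt: str
    grounding_doc_ids: list
    raw_response: str
    accepted: bool
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def validate_response(raw: str):
    """Accept the response only if it parses and matches the contract, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None
    return data

def call_and_log(model_name, prompt, grounding_doc_ids, model_fn, log):
    """model_fn is a stand-in for your real LLM client call."""
    raw = model_fn(prompt)
    parsed = validate_response(raw)
    log.append(AuditRecord(model_name, prompt, grounding_doc_ids, raw, parsed is not None))
    return parsed
```

A rejected response still gets logged with `accepted=False`, which is exactly what the auditor or the on-call engineer needs to replay the case later.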
Latency matters too. A model that takes ten seconds to answer a customer is a model that will get turned off by the call center lead, so set SLOs for tail latency and add a fast path with caching for the common cases.

When the model must write, make it write to a queue, not to the core tables, and have a pre-existing service validate the intent, check access, and commit. That way rollback and idempotency are already part of the story.

For marketers and growth teams this pattern is gold: the model can draft copy and product names, propose segments, and build journeys, while a review step and your CDP run the final checks for consent, tone, and budget. The same story repeats in dev teams, where a chat window can summarize logs and tickets from Jira and GitHub, point to the error that matches a spike in traces, and suggest a patch that goes into a branch, not to main, with tests attached. You keep the human in the loop where it matters, you let the machine speed up the boring parts, and you keep your blast radius small.

Companies that do well with this treat RAG as the default, not fine-tuning, because the data changes daily and because the legal team smiles when they see that every word came from a source they know. Fine-tunes still have a place, for tone, for domain terms, for code style, but you train them on approved corpora and you track model versions like you track releases.

Speaking of tracking, turn on OpenTelemetry, tag spans with user IDs and model calls, and feed that into your SLO board so that you can see when a prompt causes a spike, not just when a host dies.

Now about prompt injection and jailbreaks: you will not stop every trick, so you put a content filter and a policy checker after the model, you strip system prompts from the outputs, you drop links and commands that target internal endpoints, and you lock down credentials so that the model can never fetch more than its role allows.
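The write-through-a-queue shape can be sketched in a few lines. Everything here is illustrative: the action whitelist, the role table, and the in-memory queue stand in for whatever broker and authorization service your shop already runs. The model only ever enqueues an intent; the validator decides what commits, and the idempotency key makes replays safe.

```python
import queue

# Illustrative whitelist: the only writes the model is ever allowed to propose.
ALLOWED_ACTIONS = {"update_product_copy", "add_search_synonym"}

class IntentValidator:
    """Stand-in for the pre-existing service that guards the core tables."""
    def __init__(self, user_roles):
        self.user_roles = user_roles   # user -> set of actions that user may take
        self.seen_keys = set()         # idempotency: each intent commits at most once

    def commit(self, intent):
        action = intent.get("action")
        user = intent.get("user")
        key = intent.get("idempotency_key")
        if action not in ALLOWED_ACTIONS:
            return "rejected: unknown action"
        if action not in self.user_roles.get(user, set()):
            return "rejected: access denied"
        if key in self.seen_keys:
            return "skipped: duplicate"
        self.seen_keys.add(key)
        return "committed"             # the real service would write and record rollback info here

def drain(q, validator):
    """Consume queued intents in order; the model never touches the core directly."""
    results = []
    while not q.empty():
        results.append(validator.commit(q.get()))
    return results
```

Because the queue is the only path in, turning the whole thing off is one consumer you stop, which is the kill switch you want anyway.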
On the cost side, early wins come from small local models for frequent tasks, with a call-out to a larger model for rare hard cases, picked by a router that uses a cheap heuristic like token count or a classifier you trained once.

Do not skip a playbook for outages: a kill switch that routes traffic back to the old path, a banner in the tools that tells users what changed, and a rotation with names on it so that people know who to ping at three in the morning.

If you run SAP or Oracle, fight the urge to let the bot post journal entries on day one. Start with suggestions and comments, then let the finance app post based on a small set of rules that you can rehearse. If you run retail on an old stack, let the model write product enrichments and search synonyms, not inventory writes, then measure click-through and return rate and adjust.

A big part of success is evaluation: not just how smart the model sounds, but task success, time to answer, fallout rate, and the share of work that moved from people to the machine without new risk. Build a golden set of tickets, emails, forms, and code diffs that represent your reality, and run every new prompt or model against that set before you ship.

Keep your legal and risk partners close: bring them into the channel, show them logs and samples, and agree on a clear line between assist and automate. The move from pilot to real use happens when teams trust the boring parts, access, data lineage, backing services, and the two or three tasks that save an hour a day in a way that feels safe.
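The cheap-heuristic router fits in a dozen lines. The model names and the threshold below are placeholders, and whitespace splitting is a crude stand-in for a real tokenizer; the point is only that a one-line heuristic decides before any expensive call happens.

```python
# Hypothetical model names; tune the threshold against your own golden set.
SMALL_MODEL = "local-7b"
LARGE_MODEL = "hosted-frontier"
TOKEN_THRESHOLD = 400

def rough_token_count(text: str) -> int:
    # Whitespace split is a crude approximation of a real tokenizer,
    # which is fine here: the router only needs a cheap order-of-magnitude signal.
    return len(text.split())

def route(prompt: str) -> str:
    """Send short, common requests to the small local model; escalate long ones."""
    if rough_token_count(prompt) <= TOKEN_THRESHOLD:
        return SMALL_MODEL
    return LARGE_MODEL
```

Swapping the heuristic for a small trained classifier later does not change the shape; the router stays a pure function you can replay against the golden set.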

Pick one workflow that hurts. Write down the inputs, outputs, data owners, and failure modes, then ship a narrow assist that sits beside the process and reports everything it does. Define three risk tiers for your stack, public content, internal only, and sensitive, attach model choices and routes to each tier, and make the safe route the default. Measure with real numbers, task success, time to answer, rework rate, and user trust, tell the story in a weekly note, and retire a bot that loses ground three weeks straight. Most of all, resist the rewrite fantasy: add LLM superpowers at the edges, keep your receipts, and move only when the logs and the people doing the work say the change paid off.
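The three-tier policy is small enough to write as a table. The tier names come from the list above; the route and model values are hypothetical placeholders. The one rule that matters is the default: anything unlabeled falls through to the most restrictive tier.

```python
# Illustrative policy table: tiers from the post, routes and models are placeholders.
POLICY = {
    "public":    {"route": "cloud",   "model": "hosted-general"},
    "internal":  {"route": "on_prem", "model": "local-7b"},
    "sensitive": {"route": "on_prem", "model": "local-7b", "redact": True},
}

def pick_route(data_tier: str) -> dict:
    """Unknown or unlabeled data falls through to the most restrictive tier."""
    return POLICY.get(data_tier, POLICY["sensitive"])
```

Because the broker reads a table instead of code, legal and risk partners can review the policy line by line, which is the "policy you can read" from earlier in the post.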

Digital Experience · api-design, architecture-patterns, Customer Data Platform

