The system nobody was allowed to touch
Every engineering team I have worked with has one.
It sits at the center of the business. It processes the money, the orders, or the compliance records. It was built by engineers who have long since retired. Nobody fully understands it. The documentation, if it ever existed, has not been updated since 2009. And the unspoken rule that every new engineer learns in their first month is always the same: do not touch it.
I spent six weeks on a project in 2024 trying to add a single new payment method to a system like this. The codebase was 400,000 lines of Java written across twelve years. No tests. No architecture documentation. Business logic scattered across helper classes with names like UtilProcessorHelperManager2.java. I made the change. It passed QA. It broke a tax calculation that nobody knew was related to payment method selection, in a module three layers away from anything I had touched.
That experience is why I pay close attention to what AI agents are now doing in the legacy modernization space. Because the scale of the problem has finally found a tool capable of matching it.

The scale of the problem nobody talks about honestly
Legacy modernization lives in the “important but not urgent” category at most companies until suddenly it does not.
Pegasystems research puts the average annual technical debt waste at more than $370 million per enterprise. That number includes failed modernization projects, the ongoing cost of keeping legacy systems alive, and the opportunity cost of capabilities that could not be built because the foundation would not support them.
Deloitte finds that nearly 60 percent of AI leaders view legacy system integration as the primary barrier to agentic AI adoption. This is the compounding trap. You cannot deploy modern AI workflows because your infrastructure is too old. You cannot modernize your infrastructure because it is too risky to touch. So you stay stuck.
The U.S. government alone spends roughly 80 percent of its IT budgets maintaining legacy systems instead of building competitive capabilities. This is not a startup problem. It is one of the most expensive operational drags in enterprise computing, and it has been accelerating as the engineers who built these systems approach retirement age.
The average COBOL programmer is 55 years old. Hiring new COBOL talent is nearly impossible in 2026. The developer who built your system in 1985 is retiring this year. When that person leaves, the institutional knowledge about why the system behaves the way it does goes with them. The code becomes the only specification, and nobody who remains knows how to read it fluently.
That convergence of factors is what makes 2026 a genuine inflection point. The urgency has never been higher. And for the first time, the tooling is capable of helping.
What AI agents can actually do with legacy code
Before getting into the how, I want to be precise about what AI agents can and cannot do here. The gap between the marketing claims and the production reality is significant, and teams that go into a modernization program with false expectations spend the first three months disillusioned.
AI agents are genuinely excellent at the analysis phase. Morgan Stanley’s DevGen.AI initiative reviewed millions of lines of legacy code and saved 280,000 developer hours by translating outdated code into plain-English specifications for modernization. Reading code and explaining it in natural language is a task that plays directly to what large language models do best.
AI agents are genuinely useful for mechanical translation tasks. Some researchers show that GenAI handled 69 to 75 percent of code edits during large-scale migrations, cutting project duration by around half. Fujitsu reports proof-of-concept trials where GenAI reduced modernization timelines by about 20 percent, and agentic AI cut them by up to 50 percent.
AI agents are a multiplier, not a replacement: an efficient colleague rather than an autonomous one. When integrated into a structured workflow with proper prompt engineering, code reviews, and testing, AI-assisted modernization can significantly shorten the path from “we should modernize” to working software.
Where AI agents consistently fail is in the business logic embedded in decades of undocumented behavior. A COBOL program that applies a 0.003 percent fee adjustment in one specific transaction type under three specific conditions is not doing that arbitrarily. It is encoding a contract clause or a regulatory requirement that someone negotiated in 1998. An AI agent will translate the code correctly. It will not tell you why that behavior exists or whether eliminating it would violate a current obligation.
That distinction matters for how you structure the work.
The three-phase pipeline that actually works
Teams that get strong results from AI-assisted legacy modernization are not throwing agents at the entire codebase and hoping for the best. AI-assisted legacy code modernization delivers the most value across three phases: codebase analysis, incremental translation, and equivalence validation.
Each phase requires different tools, different human roles, and different acceptance criteria.
Phase 1: codebase analysis and knowledge extraction
You cannot modernize what you cannot understand. The first job of the AI agents is to build a map of what the legacy system actually does, documented clearly enough that your current engineers can reason about it.
This phase involves feeding the codebase to agents specialized in static analysis, dependency mapping, and natural language explanation. The agents trace call graphs, identify dead code paths, extract business rules from logic branches, and produce plain-English summaries of module behavior. They flag circular dependencies, identify shared mutable state, and surface the parts of the codebase where a change in one place breaks something three layers away.
// Example: Prompting a Claude Code agent for legacy module analysis
// Run this against each major module before planning migration scope
const moduleAnalysisPrompt = `
You are analyzing a legacy Java service module for modernization planning.

Read all files in src/payments/core/ and produce:

1. BEHAVIOR SUMMARY
   - What this module does in plain English (no jargon)
   - Inputs it accepts, outputs it produces
   - External systems it calls or depends on

2. BUSINESS LOGIC INVENTORY
   - Every conditional branch that encodes a business rule
   - Magic numbers or constants with non-obvious significance
   - Error handling paths and what they represent

3. RISK ASSESSMENT
   - Circular dependencies
   - Shared mutable state that could cause concurrency issues
   - Code paths with no test coverage
   - Logic that appears to encode regulatory or contractual behavior

4. MODERNIZATION BLOCKERS
   - Things that will require a human domain expert to explain
   - Logic that cannot safely be translated without business sign-off

Format each section clearly. Flag any behavior you cannot confidently explain
with [REQUIRES HUMAN REVIEW] so the team can schedule a domain expert session.
`;
The output from this phase is the modernization map: a structured document that tells your engineers what the system does, what is safe to change, and what requires a conversation with someone from the business before touching.
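Some entries in that map can be computed mechanically instead of being asked of an agent. As a minimal sketch, assuming the module dependencies have already been extracted into a plain adjacency list, a depth-first search can flag circular dependencies. The module names in `exampleGraph` are hypothetical, and the function reports the cycles it walks into; it does not enumerate every possible cycle.

```typescript
// Hypothetical dependency graph: module name -> modules it depends on.
type DependencyGraph = Record<string, string[]>;

// Depth-first search that records each cycle it encounters as a path
// whose first and last entries are the same module.
function findCycles(graph: DependencyGraph): string[][] {
  const cycles: string[][] = [];
  const explored = new Set<string>(); // modules whose dependencies are fully walked
  const path: string[] = [];          // modules on the current search path

  const visit = (moduleName: string): void => {
    const seenAt = path.indexOf(moduleName);
    if (seenAt !== -1) {
      // The module is already on the current path, so the path loops back to it.
      cycles.push(path.slice(seenAt).concat(moduleName));
      return;
    }
    if (explored.has(moduleName)) return;
    path.push(moduleName);
    for (const dependency of graph[moduleName] ?? []) visit(dependency);
    path.pop();
    explored.add(moduleName);
  };

  for (const moduleName of Object.keys(graph)) visit(moduleName);
  return cycles;
}

// Made-up example: payment method selection loops back into the calculator.
const exampleGraph: DependencyGraph = {
  PaymentCalculator: ["TaxRules", "FeeTable"],
  TaxRules: ["PaymentMethodSelector"],
  PaymentMethodSelector: ["PaymentCalculator"],
  FeeTable: [],
};

console.log(findCycles(exampleGraph).map((cycle) => cycle.join(" -> ")));
```

A cycle like this one is the kind of finding worth recording in the map before any translation work is scoped.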
Teams that skip this phase and go straight to translation produce systems that work technically and fail commercially. The new system passes all the tests and misses a regulatory requirement that the old system handled silently for fifteen years.
Phase 2: incremental translation with the strangler fig pattern
The right architecture for AI-assisted modernization is almost always the strangler fig pattern. It offers a safe way to modernize mainframe applications in three stages: transform, coexist, and eliminate. Organizations gradually replace monolithic applications with microservices while keeping the original application running.
The name comes from a tree that grows around an existing tree, gradually replacing it without the original ever collapsing. Your new system grows around your legacy system. Traffic is incrementally rerouted. The legacy system shrinks. Eventually, it is retired.
AI agents accelerate the translation phase by generating the first draft of each extracted module in the target language and framework. AI automatically analyzes system interfaces and generates 60 percent of REST and gRPC API wrappers, allowing the legacy system to function as a backend behind modern APIs while creating comprehensive test harnesses that verify new implementations behave identically to replaced legacy modules.
// Example: Multi-agent strangler fig pipeline for a payment calculation module
// Agent 1 extracts the interface, Agent 2 translates, Agent 3 writes equivalence tests

// STEP 1: Interface extraction agent prompt
const extractInterfacePrompt = `
Read src/legacy/PaymentCalculatorBean.java.

Extract and document:
- The public interface (method signatures, parameter types, return types)
- All business rules implemented in the calculation logic
- Edge cases explicitly handled (null checks, boundary conditions, special rates)
- External dependencies (database queries, config lookups, service calls)

Output a TypeScript interface definition and a specification document
that a developer could use to reimplement this module from scratch.
`;

// STEP 2: Translation agent prompt
const translateModulePrompt = `
Using the specification from Step 1, implement PaymentCalculator in TypeScript.

Requirements:
- Match the public interface exactly (inputs, outputs, error cases)
- Implement every business rule documented in the specification
- Replace the JDBC database calls with the new PostgreSQL repository
  (src/db/PaymentRepository.ts)
- Preserve all edge case handling including the 0.003% fee adjustment
  for WIRE_INTERNATIONAL transactions above $50,000
  (business rule from legacy line 847)

Do NOT optimize or refactor the business logic. Translate it faithfully first.
Optimization is a separate step after equivalence is confirmed.
`;

// STEP 3: Equivalence test agent prompt
const writeEquivalenceTestsPrompt = `
Write a characterization test suite that proves the new TypeScript
PaymentCalculator produces identical outputs to the legacy
PaymentCalculatorBean for all known inputs.

Include:
- All happy path cases from the legacy test suite
- Every edge case identified in the specification (including WIRE_INTERNATIONAL fee)
- Boundary conditions: zero amounts, maximum amounts, null optional fields
- All documented error paths

The test suite must pass against BOTH the legacy Java implementation
(via the comparison harness) and the new TypeScript implementation simultaneously.
A passing test confirms equivalence. A failing test means the translation
missed something.
`;
The parallel execution of the legacy system and the new system, with output comparison at every step, is what makes the strangler fig approach safe at production scale. You do not flip a switch. You route one percent of traffic, compare outputs, fix divergences, route five percent, repeat.
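The routing step can be sketched in a few lines. This is an illustration under stated assumptions, not a production router: the `FeeCalculator` signature, the transaction ID format, and the choice of a simple polynomial hash are all hypothetical, and running both paths for the routed slice only makes sense if the calculation has no side effects.

```typescript
interface FeeResult {
  fee: number;
  totalAmount: number;
}

// Hypothetical shape shared by the legacy and new calculators.
type FeeCalculator = (amount: number) => FeeResult;

// Maps a transaction ID to a stable bucket in 0..99 using a simple
// polynomial rolling hash, so the same ID is always routed the same way.
function bucketFor(transactionId: string): number {
  let hash = 0;
  for (let i = 0; i < transactionId.length; i++) {
    hash = (hash * 31 + transactionId.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// True when this transaction falls inside the current rollout percentage.
function routeToNewSystem(transactionId: string, rolloutPercent: number): boolean {
  return bucketFor(transactionId) < rolloutPercent;
}

// Runs the legacy path for every transaction. For the routed slice it also
// runs the new path and compares; on any mismatch the legacy result wins
// and the divergence is reported for follow-up.
function calculateWithRollout(
  transactionId: string,
  amount: number,
  rolloutPercent: number,
  legacy: FeeCalculator,
  modern: FeeCalculator,
  onDivergence: (transactionId: string, legacyResult: FeeResult, modernResult: FeeResult) => void
): FeeResult {
  const legacyResult = legacy(amount);
  if (!routeToNewSystem(transactionId, rolloutPercent)) return legacyResult;

  const modernResult = modern(amount);
  const matches =
    modernResult.fee === legacyResult.fee &&
    modernResult.totalAmount === legacyResult.totalAmount;
  if (!matches) {
    onDivergence(transactionId, legacyResult, modernResult);
    return legacyResult;
  }
  return modernResult;
}
```

Because the bucket depends only on the ID, raising `rolloutPercent` from 1 to 5 adds transactions to the routed slice without moving any that were already in it, which keeps divergence reports comparable between steps.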

Phase 3: equivalence validation
Equivalence validation is the phase that determines whether your modernization actually worked or just appears to have worked.
The goal is to prove that for every input the legacy system has ever processed, the new system produces identical output. Not similar output. Not better output. Identical output, until you have confirmed whether the old behavior was correct and deliberately chosen to change it.
AI-powered migration achieves 99.5 percent logic preservation through parallel testing. Every transaction that runs on COBOL gets tested against the new Java implementation. That 0.5 percent divergence rate is where the domain expert conversations happen. Each divergence is either a translation error to fix or a deliberate business logic change to document and sign off on.
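The same comparison can be run offline against recorded inputs. A minimal sketch, assuming inputs have been captured from the legacy system and both implementations can be called as plain functions. `legacyFee` and `modernFee` are stand-ins with made-up fee rules (amounts in cents), not the real calculators.

```typescript
interface Divergence<I, O> {
  input: I;
  legacyOutput: O;
  modernOutput: O;
}

interface ReplayReport<I, O> {
  total: number;
  divergences: Divergence<I, O>[];
  divergenceRate: number;
}

// Replays every recorded input through both implementations and collects
// each case where the outputs differ.
function replayAndCompare<I, O>(
  recordedInputs: I[],
  legacy: (input: I) => O,
  modern: (input: I) => O
): ReplayReport<I, O> {
  const divergences: Divergence<I, O>[] = [];
  for (const input of recordedInputs) {
    const legacyOutput = legacy(input);
    const modernOutput = modern(input);
    // JSON comparison is adequate for plain data objects built with the same key order.
    if (JSON.stringify(legacyOutput) !== JSON.stringify(modernOutput)) {
      divergences.push({ input, legacyOutput, modernOutput });
    }
  }
  const total = recordedInputs.length;
  return { total, divergences, divergenceRate: total === 0 ? 0 : divergences.length / total };
}

interface Wire {
  type: "WIRE_DOMESTIC" | "WIRE_INTERNATIONAL";
  amountCents: number;
}

interface Fee {
  feeCents: number;
  adjustmentCents: number;
}

// Stand-in for the legacy rule: 0.003% adjustment on international wires above $50,000.
const legacyFee = (wire: Wire): Fee => ({
  feeCents: 2500,
  adjustmentCents:
    wire.type === "WIRE_INTERNATIONAL" && wire.amountCents > 5000000
      ? Math.round(wire.amountCents * 0.00003)
      : 0,
});

// Stand-in for a translation that silently dropped the adjustment.
const modernFee = (_wire: Wire): Fee => ({ feeCents: 2500, adjustmentCents: 0 });

const recordedWires: Wire[] = [
  { type: "WIRE_DOMESTIC", amountCents: 1000000 },
  { type: "WIRE_DOMESTIC", amountCents: 0 },
  { type: "WIRE_INTERNATIONAL", amountCents: 2000000 },
  { type: "WIRE_INTERNATIONAL", amountCents: 7500000 },
];

const replayReport = replayAndCompare(recordedWires, legacyFee, modernFee);
console.log(replayReport.divergenceRate, replayReport.divergences);
```

Each entry in `divergences` carries the input and both outputs, which is the material a domain expert needs to decide whether it is a translation error or an intended change.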
Characterization tests are the technical foundation of this phase. A characterization test does not test what the code should do. It tests what the code currently does, capturing the actual behavior as the specification. Refactoring without tests is rearranging dynamite. Every existing behavior must have tests before you touch anything.
// Characterization test pattern: capture legacy behavior as the spec
// Run against both systems to confirm equivalence
// Assumes `newCalculator` is the TypeScript PaymentCalculator instance created in test setup (not shown)
describe('PaymentCalculator equivalence tests', () => {
  // These tests define what the legacy system DOES, not what it SHOULD do.
  // A failing test means the new implementation diverges from legacy behavior.
  // Divergences require explicit business sign-off before proceeding.

  test('standard domestic wire: applies base rate', async () => {
    const result = await newCalculator.calculate({
      type: 'WIRE_DOMESTIC',
      amount: 10000,
      currency: 'USD'
    });
    // Value captured from legacy system via comparison harness
    expect(result.fee).toBe(25.00);
    expect(result.totalAmount).toBe(10025.00);
  });

  test('international wire above $50k: applies 0.003% adjustment', async () => {
    const result = await newCalculator.calculate({
      type: 'WIRE_INTERNATIONAL',
      amount: 75000,
      currency: 'USD'
    });
    // This edge case exists in legacy line 847: reason unknown, business sign-off needed
    // Legacy output captured: fee = 225.00, adjustment = 2.25
    expect(result.fee).toBe(225.00);
    expect(result.internationalAdjustment).toBe(2.25); // [REQUIRES BUSINESS SIGN-OFF]
    expect(result.totalAmount).toBe(75227.25);
  });

  test('zero amount transaction: returns zero fee without error', async () => {
    const result = await newCalculator.calculate({
      type: 'WIRE_DOMESTIC',
      amount: 0,
      currency: 'USD'
    });
    expect(result.fee).toBe(0);
    expect(result.totalAmount).toBe(0);
  });
});

What multi-agent pipelines look like in practice
The most advanced modernization programs in 2026 are not using a single AI agent. They are orchestrating specialized agents in sequence, each focused on a different part of the work.
Teams at BCG X, for example, orchestrate specialized agents for code analysis, code conversion, and testing so that complex migrations proceed in a coordinated way.
A typical pipeline looks like this in practice:
The analysis agent reads the legacy module, extracts the interface contract, documents business rules, and produces the specification. It flags anything it cannot confidently explain and marks it for human review.
The translation agent takes the specification and produces the target-language implementation. It does not optimize. It translates faithfully. Optimization comes after equivalence is confirmed.
The test generation agent reads both the legacy code and the new implementation, then writes characterization tests that capture every behavior it identified. It runs the tests against both systems and flags any divergence.
The review agent reads the complete diff, the specification, and the test results, then posts a structured review of the translation quality. It identifies patterns that look correct but warrant a domain expert’s attention before the module goes to production.
The human engineer does the one thing agents cannot: they review the flagged items, run domain expert sessions for the business logic questions, make explicit decisions about what to preserve and what to intentionally change, and sign off on each module before it enters the traffic routing phase.
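The hand-offs described above can be sketched as a sequence of stages that pass one work item along. The stage bodies here are placeholders, not real agent calls: an actual pipeline would invoke a model in each `run` function and would almost certainly be asynchronous. All names are hypothetical.

```typescript
interface ModuleWorkItem {
  moduleName: string;
  completedStages: string[];
  reviewFlags: string[]; // items an agent could not confidently explain
}

interface Stage {
  name: string;
  run: (item: ModuleWorkItem) => ModuleWorkItem;
}

interface PipelineResult {
  item: ModuleWorkItem;
  needsDomainExpert: boolean;
}

// Runs the stages in order. The pipeline never approves a module itself;
// it only reports whether flagged items need a domain expert session.
function runPipeline(moduleName: string, stages: Stage[]): PipelineResult {
  let item: ModuleWorkItem = { moduleName, completedStages: [], reviewFlags: [] };
  for (const stage of stages) {
    const next = stage.run(item);
    item = { ...next, completedStages: [...next.completedStages, stage.name] };
  }
  return { item, needsDomainExpert: item.reviewFlags.length > 0 };
}

// Placeholder stages standing in for the analysis, translation,
// test generation, and review agents.
const paymentStages: Stage[] = [
  {
    name: "analysis",
    run: (item) => ({
      ...item,
      reviewFlags: [...item.reviewFlags, "[REQUIRES HUMAN REVIEW] fee adjustment at legacy line 847"],
    }),
  },
  { name: "translation", run: (item) => item },
  { name: "test-generation", run: (item) => item },
  { name: "review", run: (item) => item },
];

const pipelineResult = runPipeline("PaymentCalculator", paymentStages);
console.log(pipelineResult.item.completedStages, pipelineResult.needsDomainExpert);
```

Keeping sign-off out of `PipelineResult` is a deliberate choice in this sketch: approval is recorded by a person after the flagged items have been resolved.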

Real results from 2026: what the numbers actually show
The production results from AI-assisted legacy modernization programs are now substantial enough to discuss in specifics.
McKinsey found that AI-augmented modernization can accelerate timelines by 40 to 50 percent. The firm also estimates that using generative AI for modernization can lead to a 40 percent cut in technical debt-related costs while improving output quality.
The FinTech case study from McKinsey is one of the most concrete data points available. A FinTech company needed to modernize 20,000 lines of code, which it estimated would take 700 to 800 hours to migrate properly. After deploying generative AI agents, the business successfully cut that number by 40 percent. That is 280 to 320 hours saved on a single module. Scaled across an enterprise codebase, the economics change the calculus on whether modernization is financially viable.
The insurance industry case study is equally concrete. A top 15 global insurer improved its code modernization efficiency and testing by over 50 percent, while also seeing a greater than 50 percent acceleration of coding tasks.
The ceiling number is striking. Early case studies indicate some tasks can be finished over 100 times faster with generative AI agents than via traditional methods. That number applies to the mechanical translation tasks, not the full modernization program end-to-end. But for the right task type, the order of magnitude improvement is real.
In March 2026, Anthropic put institutional weight behind this opportunity. Anthropic launched a Code Modernization starter kit as part of its $100 million Claude Partner Network, identifying legacy code modernization as one of the highest-demand enterprise workloads.
The failure patterns that end modernization programs early
Most legacy modernization programs fail. The failure rate has been embarrassingly high for thirty years. AI agents do not automatically fix the structural reasons why programs fail. They can amplify them.
Legacy modernization fails repeatedly due to three patterns: attempting to modernize the entire system within a single program, losing embedded business logic during translation, and a skills gap that no single team can bridge.
The big-bang rewrite is the most tempting and most dangerous approach. Big-bang rewrites often become multi-year programs with unclear milestones. Business teams wait too long for value, engineering teams spend months rebuilding existing logic, and leadership loses confidence before the transformation reaches production. AI agents make this approach more tempting because the translation speed creates an illusion that you can do the whole thing at once. You cannot. The equivalence validation and domain expert review steps do not compress just because the translation was fast.
Losing business logic during translation is the failure mode that is hardest to detect. The most expensive bugs come from removing behavior that looked redundant but encoded a tax rule or contract clause from 2014. The 0.003 percent international wire adjustment I used in the code examples earlier is exactly this. Remove it during modernization, and you have a financial services compliance issue. The agents will not catch this unless the test suite captures the behavior and the equivalence check surfaces the divergence.
AI amplifies good processes and bad processes equally. A team without tests, search, or observability gets worse output, not better, when they layer agents on top. This is the warning I give every team that approaches legacy modernization with the expectation that AI will compensate for missing foundations. It will not. The characterization tests, the static analysis tooling, and the domain expert availability need to be in place before the agents start working.
The tools doing this work in 2026
| Tool / Platform | Primary role in modernization | Best for | Notable capability |
|---|---|---|---|
| Claude Code (Anthropic) | Multi-agent orchestration, code analysis, translation | Complex multi-file refactors and codebase audits | 80.8% SWE-bench score; dynamic workflows for parallel module processing |
| GitHub Copilot Modernize | Legacy .NET analysis and upgrade path generation | Microsoft ecosystem, .NET modernization | Launched March 2026; generates full upgrade strategy and code changes across the codebase |
| Capgemini CAALM | Enterprise mainframe modernization at scale | COBOL mainframe migrations in regulated industries | Extracts business rules, automates migration steps, dedicated mainframe gen AI offering |
| Azure Legacy Modernization Agents | COBOL to Java/C# pipeline with multi-agent architecture | COBOL to Java migrations on Azure | Open source; splits large COBOL files at semantic boundaries; three-tier complexity scoring |
| AWS Transform | Mainframe application modernization on AWS | Organizations moving mainframe workloads to the cloud | Strangler fig implementation support; progressive modernization without downtime |
| Hexaview Legacy Insights | AI documentation generation for legacy code | Teams that need to document before they modernize | 94% accuracy on LegacyCodeBench; 20 points better than GPT-4o on enterprise COBOL |
| Sourcegraph Cody | Codebase intelligence and search for large codebases | Understanding where things live before touching them | Semantic code search that finds relevant files the agent did not know to open |
The decision your team needs to make before starting
The first strategic decision in any modernization program is not which AI tools to use. It is the modernization approach that fits your situation.
A full rewrite makes sense when the system architecture is fundamentally broken, the codebase is genuinely unmaintainable, or the business logic no longer reflects how the company operates. AI agents accelerate a rewrite significantly. They do not change the risk profile. A big-bang rewrite that goes wrong with AI assistance goes wrong just as badly as one without it.
The strangler fig approach is right for the majority of situations. The legacy system keeps running. New modules are built alongside it. Traffic is incrementally rerouted as equivalence is confirmed. The business sees value incrementally rather than waiting for a multi-year rewrite to complete. AI agents shine in this model because each module extraction and translation is a discrete, well-scoped task.
An AI agent wrapper makes sense when the priority is modern capabilities now, not a full architectural overhaul. An AI agent wrapper allows enterprises to keep the legacy system running while creating modern interfaces around it. The business gets capabilities like intelligent search, workflow automation, predictive analytics, and modern UI without immediately rewriting the entire core. This is the fastest path to business value. It is not the same as modernization, and teams should be clear with themselves about which goal they are pursuing.
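As a rough illustration of the wrapper idea, the sketch below puts a typed function in front of a legacy entry point. Everything about the legacy side is an assumption made for the example: the `LegacyCall` signature and the pipe-delimited request and response formats are hypothetical stand-ins for whatever interface the real system exposes.

```typescript
// Hypothetical legacy entry point: takes and returns pipe-delimited strings.
type LegacyCall = (request: string) => string;

interface FeeQuote {
  fee: number;
  totalAmount: number;
}

// Wraps the legacy call in a typed function. The legacy system keeps doing
// the calculation; the wrapper only translates formats and validates the reply.
function createFeeQuoteWrapper(legacyCall: LegacyCall): (type: string, amount: number) => FeeQuote {
  return (type, amount) => {
    const response = legacyCall(`${type}|${amount.toFixed(2)}`);
    const parts = response.split("|");
    if (parts.length !== 3 || parts[0] !== "OK") {
      throw new Error(`Unexpected legacy response: ${response}`);
    }
    const fee = Number(parts[1]);
    const totalAmount = Number(parts[2]);
    if (isNaN(fee) || isNaN(totalAmount)) {
      throw new Error(`Non-numeric values in legacy response: ${response}`);
    }
    return { fee, totalAmount };
  };
}

// Fake legacy system used only to exercise the wrapper.
const fakeLegacyCall: LegacyCall = (request) =>
  request === "WIRE_DOMESTIC|10000.00" ? "OK|25.00|10025.00" : "ERR|unknown request";

const quoteFee = createFeeQuoteWrapper(fakeLegacyCall);
console.log(quoteFee("WIRE_DOMESTIC", 10000));
```

Nothing behind the wrapper has changed, which is why this buys modern interfaces quickly without reducing the risk inside the legacy code itself.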
What to do in the first thirty days
The teams that make real progress on legacy modernization in 2026 do not spend the first thirty days procuring AI tools. They spend them answering four questions that no tool can answer for them.
First: What does this system actually do? Run the analysis agents. Produce the plain-English specification. Identify every module and its dependencies. Map the call graph. Find the dark corners where nobody has looked in five years.
Second: Which parts of this system are actively being changed? Those are the highest-priority modernization targets because they carry the highest risk as long as they stay legacy. The code that nobody touches is expensive to maintain but not actively dangerous. The code that gets modified quarterly by engineers who do not fully understand it is where the incidents come from.
Third: Who in the business understands why the system behaves the way it does? These people are your domain experts. They are not optional in an AI-assisted modernization program. They are the ones who can tell you whether the 0.003 percent fee adjustment is a bug or a contract requirement. Identify them now, before the agents start flagging things for review, so you have a process for getting those questions answered.
Fourth: What is your test coverage today? If the answer is close to zero, your first sprint is not modernization. It is characterization testing. You write tests that capture the current behavior of each module you plan to touch. Those tests become the equivalence contract that makes everything else safe.
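The second question, which parts are actively being changed, can usually be answered from version control history. As a sketch, assuming the repository is in Git and that the output of `git log --since="12 months ago" --name-only --pretty=format:` has been captured as a string, the following counts how many commits touched each file. The sample log and its file paths are made up.

```typescript
// Counts how many commits touched each file, given the captured output of:
//   git log --since="12 months ago" --name-only --pretty=format:
// With an empty format string, the output is file paths separated by blank lines.
function countChangesPerFile(gitLogOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of gitLogOutput.split("\n")) {
    const file = line.trim();
    if (file === "") continue;
    counts.set(file, (counts.get(file) ?? 0) + 1);
  }
  return counts;
}

// Returns the most frequently changed files, ties broken alphabetically.
function mostChanged(counts: Map<string, number>, limit: number): Array<[string, number]> {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up sample standing in for real git output (three commits).
const sampleLog = [
  "src/payments/core/PaymentCalculatorBean.java",
  "src/tax/TaxRules.java",
  "",
  "src/payments/core/PaymentCalculatorBean.java",
  "",
  "src/payments/core/PaymentCalculatorBean.java",
  "src/util/UtilProcessorHelperManager2.java",
].join("\n");

console.log(mostChanged(countChangesPerFile(sampleLog), 2));
```

Files at the top of this list that also have no tests are reasonable candidates for the first characterization-testing sprint.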
Strategy and AI matter, but the thread running through every successful program is the same: you cannot modernize what you cannot find or understand.
Further reading
- Legacy Code Modernization Playbook 2026: Sourcegraph’s Complete Practical Guide
- Azure Legacy Modernization Agents: Open Source COBOL to Java Multi-Agent Framework
- McKinsey: AI for IT Modernization — Faster, Cheaper, Better
The code that keeps the lights on deserves better than neglect
That six-week project I described at the start cost the company three times what it should have. The broken tax calculation made it into production, was caught by a finance reconciliation report three weeks later, and required an emergency patch and a manual correction run on affected transactions.
The cost was not the engineering time. The cost was the downstream consequence of a change made inside a system that nobody fully understood, with no tests to catch the breakage before it shipped.
The systems at the center of most enterprise operations are exactly like that one. They are not broken. They are working. They are just working in ways that are increasingly opaque, increasingly expensive to operate, and increasingly dangerous to modify as the people who built them move on.
AI agents do not make legacy modernization easy. Sustainable software modernization still requires clear guardrails, architectural expertise, and proven techniques such as characterization testing and seams. What they make possible is doing that work at a pace and cost that makes the decision to actually do it financially viable for the first time.
The $370 million per year that enterprises lose to legacy technical debt is not an unavoidable cost. In 2026, for the first time, it is genuinely a choice.

