Fix the AI-generated app before you rebuild it

The AI-generated app has problems. Something's not working. The instinct is to rebuild from scratch, this time with better prompts, a clearer specification, more careful review. The instinct is almost always wrong. Most rebuilds are unnecessary, and most of the ones that happen anyway reproduce the same problems in the same architectural shape, because the operator never changed whatever would have prevented those problems the first time.

I want to walk through the four categories that "broken AI-generated app" actually falls into, the triage sequence that distinguishes them, and the rare cases where rebuilding genuinely is the right answer. The discipline is to find out which category you're in before committing to a path, because each category has a different correct response.

Category one: stale-state breakage

The code is mostly right. The runtime isn't running the right code. The build process has cached an old version, the server is serving from a stale location, the development environment and the deployed environment diverged at some point and now disagree about what the actual app is.

This category looks like "the agent fixed it but it's still broken" or "it works locally but not deployed" or "the change went in but the behavior didn't update." The cause isn't the code; it's the gap between what the code says and what the runtime is doing.

The fix is a rebuild-and-restart of the runtime, often combined with cache clearing. Not a rebuild of the application. A rebuild of the artifacts and the running process. This is usually minutes to hours of work, not days. The diagnostic is the stale-server check covered earlier in this cluster.
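One way to make the gap between "what the code says" and "what the runtime is doing" visible is to compare a fresh build against whatever the server is actually serving. The sketch below is a minimal illustration, not necessarily the stale-server check referenced above. It assumes both sides are plain directories of files, and the function names `tree_digest` and `stale_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def stale_files(fresh_build: Path, served: Path) -> list[str]:
    """List relative paths that differ between a fresh build and what is served.

    Covers changed files, files missing from the served copy, and leftover
    files that the fresh build no longer produces.
    """
    fresh, live = tree_digest(fresh_build), tree_digest(served)
    return sorted(
        name
        for name in fresh.keys() | live.keys()
        if fresh.get(name) != live.get(name)
    )
```

An empty list means the served files match the fresh build byte for byte. Anything else is a reason to suspect category one before touching application code.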

Throwing the whole app away because of stale-state breakage means redoing all the work the existing app represents to solve a problem that was never in that work. The cost-benefit weighs heavily against a rebuild here.

Category two: surface-vs-semantic mismatch

The visible output looks right. The underlying behavior is wrong. The page shows the right number; the calculation that produced it ignores an important input. The button reads "submit" and changes state to "submitted"; the actual submission isn't happening.

This category is dangerous because it hides itself. Surface inspection passes. Tests that only check rendered output pass. The failure shows up downstream, often as data corruption or incorrect business outcomes that emerge slowly.

The fix is to identify where surface and semantic diverged, then correct the semantic layer to actually match the surface. This requires reading both layers and tracing the gap. It's diagnostic work more than rebuild work. Once the gap is found, the fix is usually surgical: change the specific function that calculates wrong, or add the actual submission code that the button was supposed to invoke.

Rebuild from scratch in this category usually reproduces the same surface-vs-semantic gap because the same prompts produce the same kinds of code. The discipline that catches it (explicit semantic checks alongside surface checks) is the same whether you're fixing or rebuilding. Better to install the discipline against the existing code than start over without it.
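To make "explicit semantic checks alongside surface checks" concrete, here is a minimal sketch built on the submit-button example above. Everything in it is hypothetical: `FakeStore` stands in for whatever the real system of record is, and `SubmitForm` stands in for the real UI component. The point is only that the check reports the surface result and the semantic result separately, so one cannot hide behind the other.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical stand-in for the system of record (database, API, queue)."""
    rows: list[dict] = field(default_factory=list)


@dataclass
class SubmitForm:
    """Hypothetical form: the label is the surface, the store is the semantics."""
    store: FakeStore
    label: str = "submit"

    def click_broken(self, payload: dict) -> None:
        # The surface changes but nothing is persisted: the category-two bug.
        self.label = "submitted"

    def click_fixed(self, payload: dict) -> None:
        # Persist first, then update the surface to reflect what happened.
        self.store.rows.append(payload)
        self.label = "submitted"


def check_submission(form: SubmitForm, click, payload: dict) -> dict[str, bool]:
    """Run one click and report the surface and semantic results separately."""
    before = len(form.store.rows)
    click(payload)
    return {
        "surface_ok": form.label == "submitted",
        "semantic_ok": len(form.store.rows) == before + 1
        and form.store.rows[-1] == payload,
    }
```

Run against `click_broken`, the surface check passes and the semantic check fails, which is exactly the signature of this category. The same check works unchanged against a rebuilt app.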

The pattern is covered in detail in "two bugs one symptom" and "the gate that caught us", both of which document specific surface-vs-semantic failures and how they were caught.

Category three: structural drift

The code has accumulated changes that no longer add up to a coherent architecture. Each individual change made sense at the time. The cumulative effect is a codebase that doesn't reflect any single architectural opinion. Patterns mix. Conventions diverge. The system works in spots and breaks in spots without clear reasons.

This category is harder to fix than the previous two because the fix isn't a single change. It requires either restoring architectural coherence (which is multiple coordinated changes) or accepting the drift and bounding it (commit to one approach going forward, mark the existing inconsistency as legacy).

The fix engagement looks like: read the codebase, capture the architectural opinion that should hold, identify the largest drift points, fix the highest-impact ones, document the rest as known debt. This is days to weeks of focused work for a meaningful app. It's also much less than a rebuild, and produces a codebase that's now coherent enough to extend.
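Finding the largest drift points is mostly a reading exercise, but a crude scan can point the reading in the right direction. The sketch below assumes a Python codebase and flags places where more than one library from the same family is in use, one common symptom of mixed patterns. The `FAMILIES` table is a hypothetical example to be replaced with the choices that matter in the stack at hand, and an import count is only a signal, not a measure of architectural coherence.

```python
import ast
from collections import defaultdict

# Hypothetical families of interchangeable libraries; adjust to the stack in question.
FAMILIES = {
    "http_client": {"requests", "httpx", "aiohttp"},
    "orm": {"sqlalchemy", "peewee", "tortoise"},
}


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported anywhere in one source file."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def drift_report(sources: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """For each family with more than one library in use, map library to files."""
    usage: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for path, text in sorted(sources.items()):
        mods = imported_modules(text)
        for family, libs in FAMILIES.items():
            for lib in sorted(libs & mods):
                usage[family][lib].append(path)
    # Keep only families where competing libraries coexist.
    return {
        family: dict(by_lib)
        for family, by_lib in usage.items()
        if len(by_lib) > 1
    }
```

`sources` maps file paths to file contents, so the caller decides which part of the repository to scan. Families that come back in the report are candidates for the "commit to one approach going forward" decision.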

Rebuilding in this category can be the right answer if the drift is severe enough that the fix work exceeds the rebuild work. The threshold is usually higher than operators think; even badly drifted code is often salvageable.

Category four: fundamental mismatch

The app does the wrong thing. Not in details. In purpose. The original specification was wrong, or the validation showed the product needed to change in ways the architecture can't accommodate, or the business pivoted into territory the existing system can't reach.

This category is the genuine case for rebuilding. The existing code isn't fixable because it's the wrong code; fixing it would produce a different version of the wrong thing. The right response is to use what you learned from the existing build to specify what the new build needs to do, then build the new thing deliberately.

The honest test for category four: if you could fix all the bugs and clean up all the drift in the current app, would the result be the app you actually need? If yes, you're not in category four; the fix is real. If no, the app is the wrong shape for what you need and rebuild is appropriate.

This category is rarer than operators usually assume. Most "I need to rebuild" intuitions are actually category one, two, or three mislabeled as category four.

The triage sequence

To find which category you're in:

First, run the stale-state check. Rebuild the artifacts. Restart the runtime. Verify what the actual current state is versus what you assumed it was. Many "broken" apps are stale-state issues that disappear when the runtime is genuinely current.

Second, run surface-vs-semantic checks on the apparent failures. For each thing that's not working right, check both the visible behavior and the underlying state. Identify gaps. If most failures are surface-vs-semantic mismatches, you're in category two and the fix is targeted semantic repairs.

Third, assess structural coherence. Read through the codebase. Does it follow one architectural pattern or many? Are conventions consistent? Are the major components reflecting one set of decisions or many? If there's significant drift across major components, you're in category three and the fix is architectural restoration.

Fourth, ask the category-four question. If the previous steps produced fixes, would the resulting app be what you actually need? If yes, finish the fixes. If no, you might be in category four and rebuilding is genuinely on the table.

Most apps are in category one or two. Many are in category three. Few are in category four. The triage keeps you from labeling a category-one problem as category four and committing to a rebuild that was never necessary.
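The four steps can be sketched as a small decision function. This is an illustration under stated assumptions rather than a definitive procedure: the field names are hypothetical, "most failures" is read as a strict majority, and the category-four question is evaluated right after the stale-state check because a "no" there makes the choice between the two fixable categories moot.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    STALE_STATE = 1
    SURFACE_VS_SEMANTIC = 2
    STRUCTURAL_DRIFT = 3
    FUNDAMENTAL_MISMATCH = 4


@dataclass
class Findings:
    """Answers gathered while running the four triage steps (hypothetical names)."""
    broken_after_fresh_restart: bool  # step 1: still failing once the runtime is current?
    semantic_gaps: int                # step 2: failures where surface and state disagree
    total_failures: int               # step 2: all failures examined
    significant_drift: bool           # step 3: drift across major components?
    fixed_app_would_fit: bool         # step 4: would the fully fixed app be what you need?


def triage(f: Findings) -> Optional[Category]:
    """Return the category the findings point to, or None if they are inconclusive."""
    # Step 1: a fresh build and restart cleared the failures, so stop here.
    if not f.broken_after_fresh_restart:
        return Category.STALE_STATE
    # Step 4: if even a fully fixed app would not fit, fixes are not the path.
    if not f.fixed_app_would_fit:
        return Category.FUNDAMENTAL_MISMATCH
    # Step 2: a strict majority of failures are surface-vs-semantic gaps (assumption).
    if f.total_failures and f.semantic_gaps * 2 > f.total_failures:
        return Category.SURFACE_VS_SEMANTIC
    # Step 3: significant drift across major components.
    if f.significant_drift:
        return Category.STRUCTURAL_DRIFT
    # No signal dominates: keep investigating rather than forcing a label.
    return None
```

For example, `triage(Findings(True, 3, 4, False, True))` returns `Category.SURFACE_VS_SEMANTIC`. A `None` result means the diagnostics have not yet explained the failures, which is itself a reason not to commit to a rebuild.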

The fix engagement structure

When the triage identifies fixable categories, the engagement that produces the fix has a specific shape:

Assessment phase. Read the codebase. Run the diagnostics. Document the actual state. Identify the specific fixes needed.

Stabilization phase. Apply the targeted fixes. Verify with the surface-vs-semantic discipline that each fix actually addresses the underlying problem. Don't expand scope beyond what was identified in assessment.

Documentation phase. Capture what was done, what assumptions the existing code makes, what known debt remains. The next team (or the next AI session) needs this context to extend without breaking.

This is the same shape as the reduced promotion gates pattern I run on my own builds. The number of gates is small enough to move through deliberately. The discipline is enforced. The output is verified at each step.

A typical fix engagement for a meaningfully broken app runs days to a couple of weeks of focused work. Much less than a rebuild. Much more than "ask the agent to try again." Right-sized for the actual problem.

When rebuild really is the answer

Even after the triage, sometimes rebuild is genuinely right. The category-four cases are real. There are also cases where:

The existing build was so opaque (no documentation, no tests, no readable code) that fixing it would take longer than rebuilding.

The original spec was so different from current needs that even a fixed version wouldn't fit.

The cost of operating the existing system, even fixed, is higher than the cost of operating a new one.

When one of these applies, rebuild is the right answer. The discipline is to verify that one of them actually applies before committing, rather than defaulting to a rebuild. The rebuild should also do the things that prevent the same outcome from recurring: specifications captured, decisions logged, surface-vs-semantic discipline installed from the start.

What this means for the next step

If you're looking at a broken AI-generated app and considering a rebuild, run the triage first. The triage is hours, not days. The outcome tells you whether fixing or rebuilding is actually the right path. Most of the time the answer will be fix, which is significantly less work than rebuild. Sometimes the answer will be rebuild, in which case at least you'll know why and what to do differently.

The triage is the highest-leverage move available when an AI-generated app is broken and the next step is unclear.

If you've got an AI-generated app that's broken and you're trying to decide between fixing it and rebuilding, send the repo state, what's not working, and what the app is supposed to do. VibeKoded can scope a rescue diagnostic, stabilization sprint, or rebuild plan.