

Refactoring old code sounds easy until you have to deal with a codebase that’s 10 years old, lacks tests, has inconsistent naming, and scatters business logic across files no one really understands. AI can genuinely help, but the quality of the results depends heavily on how you frame the problem. Bad prompts produce rewrites that are too generic and break things in subtle ways.

Give the Model More Than Just the Code

It’s not very useful to paste a function and say, “Refactor this.” The model doesn’t know what the function is supposed to do, what it’s connected to, or why it was written the way it was. Before you write the prompt, gather the relevant context: the function signature, its callers, any known limitations, and a short description of what the module does.

A prompt that begins with “This is a payment processing module from a Rails 4 app. The method below handles charge retries, logging, and status updates for certain customers. I need to separate those concerns without changing the external interface” will get you something much closer to usable than pasting the code alone.
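Put together, a context-rich prompt might look something like this (the module description and constraints here are illustrative, matching the hypothetical Rails example above):

```
This is a payment processing module from a Rails 4 app. The method below
handles charge retries, logs each attempt, and updates order status for
certain customers. I need to separate those concerns without changing the
external interface.

Constraints:
- Keep the public method name and arguments as they are.
- Assume there is no test coverage.

[paste the method, its signature, and its main callers here]
```

The bracketed placeholder is where the gathered context goes; the point is that the description and constraints come first, before any code.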

Plan the Task Carefully

One of the most common mistakes people make when using AI for code refactoring is treating it as one big job. A command like “refactor the entire service layer” is too broad. The output is either overconfident, rewriting things it shouldn’t, or so cautious that it makes only minimal changes.

Slice the work into pieces:

  • Single-responsibility problems: Ask it to identify where a class or function is doing too much, and then work on each one separately.
  • Naming and readability: Keep these separate from changes to the logic so you can review each independently.
  • Duplication: Give it two or three similar code blocks and ask it to identify a shared abstraction. You make the final decision, though.
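To make the duplication case concrete, here is a minimal sketch of the kind of near-duplicate pair you might hand to the model, and a shared abstraction it might propose (the method names and the formatting logic are hypothetical, invented for illustration):

```ruby
# Before: two near-identical formatting methods -- the duplicated pattern
# you would show the model side by side.
def format_price(cents)
  "$%.2f" % (cents / 100.0)
end

def format_refund(cents)
  "-$%.2f" % (cents / 100.0)
end

# After: the shared abstraction the model might suggest. Whether the extra
# indirection is worth it is your call, not the model's.
def format_amount(cents, sign: "")
  "#{sign}$%.2f" % (cents / 100.0)
end

puts format_amount(1999)            # prints $19.99
puts format_amount(500, sign: "-")  # prints -$5.00
```

A two-method duplication like this is borderline; the technique pays off when the repeated block is longer or appears in more places.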

Keeping the scope tight also makes the output easier to review. A 40-line diff is reviewable. A 400-line rewrite of legacy code without tests is a risk most teams shouldn’t take. Tracking your refactor rate across sprints helps here, too. If the ratio of refactored code to new output keeps climbing without a corresponding drop in code smells, the prompting strategy probably needs adjusting.

Be Clear About Limits

Legacy systems have constraints that aren’t obvious from the code itself. You might not be able to rename a database column because an external API depends on it. A method might have to stay synchronous because it’s wired into a job queue. The model knows none of this unless you tell it.

Make it a habit to include a constraints section in your prompts: “Don’t rename the process_order method. The return value must still be a hash with status and message keys. Assume there is no test coverage.” Constraints like these meaningfully change what the model optimizes for.
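As a sketch of what a constraint-respecting refactor looks like, here is a hypothetical `process_order` (the class and its internals are invented for illustration) where concerns have been pulled into private helpers while the public name and the status/message hash shape stay untouched:

```ruby
# Hypothetical result of a constrained refactor: validation is extracted
# into a private helper, but the public method name and the
# { status:, message: } return shape are preserved exactly.
class OrderProcessor
  def process_order(order)
    validate!(order)
    { status: "ok", message: "Order #{order[:id]} processed" }
  rescue ArgumentError => e
    { status: "error", message: e.message }
  end

  private

  # A concern the original (imagined) method mixed in inline.
  def validate!(order)
    raise ArgumentError, "missing id" unless order[:id]
  end
end

result = OrderProcessor.new.process_order(id: 42)
puts result[:status]  # prints ok
```

Callers that depend on the method name and the hash keys keep working, which is exactly what the constraints in the prompt were protecting.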

Ask for an Explanation Along with the Code

When you use AI to refactor code that someone else wrote, getting only the refactored output is half the value. Ask the model to explain what it changed and why. This does a few things: it exposes the model’s reasoning, it helps you catch places where it misread the original intent, and it gives you material for the commit message or code review notes.

A simple addition like “After the refactored code, explain each change and flag anything you’re uncertain about” makes the output significantly more useful in a real review workflow. This matters especially when you’re working in a red-green refactor cycle and need to verify that behavior hasn’t shifted before tests go green again.

Iterate in Small Passes

Don’t try to get the best result all at once. Treat the first answer as a rough draft: review it, push back on specific parts, and ask follow-up questions. If an abstraction looks wrong, say so. If the model produced a pattern that doesn’t fit your stack, send it back.

This step-by-step method is more like how you would work with a junior engineer than running a script. The model responds well to specific corrections, and over the course of a session, it becomes increasingly aligned with the conventions of your actual codebase.

Final Thoughts

Well-structured prompting for refactoring legacy code is mostly about giving the model enough information so that it doesn’t have to guess. The more you treat it like a helpful but clueless coworker, the better the results tend to be. It won’t make the decisions you have to make about risk and architecture for you, but with the right prompts, it can take a lot of the mechanical work off your plate.
