
Managing AI‑Generated Code with Agent‑Based Evaluation: Refactoring 310K Lines of Code

When over 90% of a codebase is produced by AI, system quality depends less on how fast AI writes code than on how well it is constrained. This article details how a team used an agent‑based evaluation framework, unified standards, and incremental refactoring to turn 310,000 lines of AI‑written code into a maintainable, low‑debt system.

Meituan Technology Team

May 7, 2026


Background

The Agent evaluation platform supports multiple core business scenarios, handling data production, workflow orchestration, quality control, and collaborative development. Its complexity comes from three dimensions: vague, exploratory business requirements; rapid growth from under 50K to 310K lines of code while absorbing roughly 16 new requirements per month; and a "Cartesian product" of multimodal evaluation tasks that generates thousands of distinct business flows every day.

Why Refactor?

The pressure of fast iteration and trial and error exposed three pain points:

Business models outgrew the legacy architecture, leading to siloed feature development.

Technical debt accumulated in a "spaghetti" codebase, causing any change to ripple across the system.

The team became more diverse, and with more than 90% of the code generated by AI and no unified constraints, inconsistent coding styles accelerated system decay.

Thus, a large‑scale refactor was required, not only to improve architecture but also to embed AI‑friendly development standards that prevent new debt.

Refactor Timeline and Execution Path

Phase 1 – Define the Problem and Use AI to Surface Technical Debt (Feb 2026)

Human developers identified high‑risk areas, then delegated exhaustive scanning to AI. This revealed P0/P1 technical debt such as business‑model flaws, database query performance issues, state‑management problems, and index inefficiencies. AI excelled at providing a global view, while humans prioritized which problems to fix.

Engineers quickly pinpointed ten deeply hidden performance hazards that would have been nearly impossible to find manually.
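To make the division of labor concrete, here is a minimal Python sketch of the triage step: humans name the high‑risk zones and the priorities worth acting on, and the exhaustive AI scan output is filtered and ranked against them. It assumes the scan emits findings with a path, a category, and a P0/P1/P2 priority; the `Finding` type, the `triage` function, and the sample paths are hypothetical, since the article does not describe its scan output format.

```python
from dataclasses import dataclass

# Hypothetical shape of one AI scan finding; field names are illustrative only.
@dataclass(frozen=True)
class Finding:
    path: str       # file or module the finding points at
    category: str   # e.g. "query-performance", "state-management"
    priority: str   # "P0", "P1", "P2", ...

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}

def triage(findings, high_risk_zones, keep=("P0", "P1")):
    """Keep findings inside human-defined high-risk zones, most severe first."""
    in_zone = [
        f for f in findings
        if f.priority in keep
        and any(f.path.startswith(zone) for zone in high_risk_zones)
    ]
    # Sort by severity, then by path so the output order is stable.
    return sorted(in_zone, key=lambda f: (PRIORITY_ORDER.get(f.priority, 99), f.path))

# Example: the AI scans everything, but only findings in "task/" at P0/P1 survive.
findings = [
    Finding("task/query/TaskQuery.java", "query-performance", "P1"),
    Finding("task/state/TaskState.java", "state-management", "P0"),
    Finding("ui/Button.ts", "style", "P0"),
    Finding("task/model/Task.java", "naming", "P2"),
]
ranked = triage(findings, high_risk_zones=["task/"])
for f in ranked:
    print(f.priority, f.path)
```

The scan stays exhaustive; only the filter encodes human judgment, so changing priorities means editing `high_risk_zones` or `keep`, not re‑running the scan.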

This reshaped what "experience" means: AI supplies the ability to see the whole system, while human expertise shifts to judging what matters.

Phase 2 – Research and Establish AI‑Friendly Development Standards (Late Feb 2026)

With technical debt mapped, the team asked how to propagate the insights of a few AI‑savvy engineers across the whole group. The answer was a two‑step alignment:

Standard Alignment (Everyone Aligned): A role with strong decision‑making authority aligns the evaluation standards of product, operations, algorithm, and QA.

Human‑Machine Alignment: Once the standards are aligned, AI models are constrained by rules and skills, and the AI's evaluations are trusted only after AI‑human agreement reaches a threshold (e.g., 90%).
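The agreement threshold can be read as a simple gate on a labelled sample set. This sketch assumes each sample is judged independently by a human and by the AI evaluator and uses raw percent agreement; the function names and labels are hypothetical, and only the 90% figure comes from the article.

```python
def agreement_rate(ai_labels, human_labels):
    """Fraction of samples on which the AI judge and the human reviewer agree."""
    if len(ai_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not ai_labels:
        raise ValueError("need at least one labelled sample")
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)

def ai_judge_trusted(ai_labels, human_labels, threshold=0.9):
    """Trust the AI evaluation only once agreement reaches the threshold."""
    return agreement_rate(ai_labels, human_labels) >= threshold

# Example: 20 human-labelled samples.
human = ["pass"] * 10 + ["fail"] * 10
ai = human.copy()
ai[0] = "fail"                 # the AI disagrees on one sample: 19/20 = 95%
weak_ai = ["pass"] * 20        # a judge that always says "pass": 10/20 = 50%

print(ai_judge_trusted(ai, human))       # True
print(ai_judge_trusted(weak_ai, human))  # False
```

Raw agreement is the simplest reading of the article's threshold; with heavily skewed labels, a judge that always predicts the majority class can score well, so the sample set should contain both outcomes.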

Key actions included:

Defining engineering layering, domain‑model contracts, and repository conventions.

Embedding these standards as always‑loaded AI Rules in a pre‑PR step, so AI checks code before submission.

Clarifying responsibility boundaries (e.g., orchestration vs. capability classes) and codifying them as incremental Skills.
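The orchestration/capability boundary is easier to encode as a Skill once it is concrete. The sketch below is an illustrative interpretation, not the team's code: capability classes each do one job and know nothing about the flow, while the orchestration class only decides the order of calls. `DatasetLoader`, `Scorer`, and `EvaluationFlow` are hypothetical names.

```python
from dataclasses import dataclass, field

# Capability class: one well-defined job, no knowledge of the surrounding flow.
class DatasetLoader:
    def load(self, dataset_id: str) -> list:
        # Stand-in for a real data source.
        return [f"{dataset_id}-sample-{i}" for i in range(3)]

# Capability class: scoring only.
class Scorer:
    def score(self, sample: str) -> float:
        # Stand-in rule: score is the sample's length.
        return float(len(sample))

# Orchestration class: composes capabilities in order and holds no loading or scoring logic.
@dataclass
class EvaluationFlow:
    loader: DatasetLoader = field(default_factory=DatasetLoader)
    scorer: Scorer = field(default_factory=Scorer)

    def run(self, dataset_id: str) -> dict:
        samples = self.loader.load(dataset_id)
        return {s: self.scorer.score(s) for s in samples}

result = EvaluationFlow().run("ds1")
print(result)
```

Because the orchestrator receives its capabilities as fields, a rule such as "orchestration classes must not contain scoring logic" becomes something an AI reviewer can check mechanically.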

Phase 3 – Build SOPs and Incrementally Refactor While Delivering Business Features (Mar–Apr 2026)

Action 1: AI‑Driven Engineering Layer Refactoring – Migrated from a monolithic "build‑by‑demand" structure to a four‑layer architecture (Starter / Application / Infrastructure / Common) and domain‑driven package layout. The migration focused on eliminating deep coupling of PO objects across the call chain.
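A layering rule only helps if something enforces it. The following sketch checks module dependencies against the four layers and flags PO (persistent object) classes referenced outside Infrastructure. The allowed‑dependency table is an assumption (the article names the layers but not their dependency directions), and modules are assumed to be named `<layer>.<domain>.<Class>` with PO classes ending in `PO`.

```python
# ASSUMED dependency directions between the four layers; not taken from the article.
ALLOWED = {
    "starter": {"application", "infrastructure", "common"},
    "application": {"common"},
    "infrastructure": {"application", "common"},
    "common": set(),
}

def layer_of(module: str) -> str:
    """Assumes modules are named '<layer>.<domain>.<Class>'."""
    return module.split(".", 1)[0]

def check_dependencies(edges):
    """edges: iterable of (importer, imported) module names; returns violation messages."""
    violations = []
    for src, dst in edges:
        src_layer, dst_layer = layer_of(src), layer_of(dst)
        if src_layer != dst_layer and dst_layer not in ALLOWED.get(src_layer, set()):
            violations.append(f"{src} -> {dst}: {src_layer} may not depend on {dst_layer}")
        elif dst.endswith("PO") and src_layer != "infrastructure":
            # PO objects leaking up the call chain is the coupling the migration removed.
            violations.append(f"{src} -> {dst}: PO objects must stay in infrastructure")
    return violations

edges = [
    ("starter.web.TaskController", "application.task.TaskService"),
    ("application.task.TaskService", "infrastructure.task.TaskPO"),
    ("starter.web.TaskController", "infrastructure.task.TaskPO"),
    ("infrastructure.task.TaskRepositoryImpl", "infrastructure.task.TaskPO"),
]
for v in check_dependencies(edges):
    print(v)
```

The first and last edges pass; the second breaks the layer rule and the third leaks a PO into the Starter layer.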

Action 2: Zero‑Schedule Refactoring – Treated technical debt as side tasks attached to regular business tickets. By folding debt remediation into high‑priority feature work, the team avoided dedicated refactor sprints while still upgrading core data models.
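One way to picture "debt as a side task" is a matching step that attaches each debt item to a feature ticket that already touches the same code. This sketch assumes debt items and tickets are both tagged with the modules they touch, and it attaches each debt item to the highest‑priority ticket covering all of its modules. The types, ticket IDs, and matching policy are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DebtItem:
    debt_id: str
    modules: frozenset   # modules the fix would touch

@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    priority: int        # lower number = higher priority
    modules: frozenset   # modules the feature work already touches

def attach_debt(tickets, backlog):
    """Attach each debt item to the highest-priority ticket that covers all its modules."""
    plan = {t.ticket_id: [] for t in tickets}
    unassigned = []
    for debt in backlog:
        candidates = [t for t in tickets if debt.modules <= t.modules]
        if candidates:
            best = min(candidates, key=lambda t: t.priority)
            plan[best.ticket_id].append(debt.debt_id)
        else:
            unassigned.append(debt.debt_id)  # waits for a ticket that touches this code
    return plan, unassigned

tickets = [
    Ticket("FEAT-1", 1, frozenset({"task", "dataset"})),
    Ticket("FEAT-2", 2, frozenset({"task"})),
]
backlog = [
    DebtItem("DEBT-a", frozenset({"task"})),
    DebtItem("DEBT-b", frozenset({"dataset"})),
    DebtItem("DEBT-c", frozenset({"billing"})),
]
plan, unassigned = attach_debt(tickets, backlog)
print(plan, unassigned)
```

A real plan would probably also cap how much debt one ticket absorbs, so a single feature does not carry the whole backlog.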

Action 3: Refactor Quality Assurance – Introduced an AI‑assisted pre‑PR mechanism that automatically filters out rule violations, bugs, and performance issues before human review. This shifted manual code‑review focus from "is the code correct?" to "are we solving the right problem under the right constraints?"

Additional QA workflow:

Engineers run AI‑based self‑checks and fix all reported issues.

Submit a PR with a concise impact summary generated by AI.

Reviewers receive a pre‑filtered, high‑quality diff and concentrate on business semantics.

Model‑to‑model audits, in which top‑tier models from different vendors review the same code, further broadened coverage.
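The gate described above can be sketched as a function that runs several reviewers over a diff, merges duplicate findings, and blocks the PR while any blocking issue remains. Here the reviewers are plain Python stand‑ins for models from different vendors; the `Issue` type, the `pre_pr_gate` function, and the rules each reviewer checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Issue:
    rule: str       # e.g. "layering", "bug", "performance"
    location: str   # "file:line"
    blocking: bool

Reviewer = Callable[[str], Iterable[Issue]]

def pre_pr_gate(diff: str, reviewers: list) -> tuple:
    """Run every reviewer, merge duplicates, and fail if any blocking issue remains."""
    merged = {}
    for review in reviewers:
        for issue in review(diff):
            key = (issue.rule, issue.location)
            previous = merged.get(key)
            # One entry per (rule, location); if any reviewer says blocking, it stays blocking.
            if previous is None or (issue.blocking and not previous.blocking):
                merged[key] = issue
    issues = sorted(merged.values(), key=lambda i: (not i.blocking, i.location))
    passed = not any(i.blocking for i in issues)
    return passed, issues

# Stand-ins for reviewers backed by models from different vendors.
def reviewer_a(diff):
    if "SELECT *" in diff:
        yield Issue("performance", "TaskQuery.java:12", blocking=True)

def reviewer_b(diff):
    if "SELECT *" in diff:
        yield Issue("performance", "TaskQuery.java:12", blocking=False)
    if "TODO" in diff:
        yield Issue("style", "TaskQuery.java:30", blocking=False)

passed, issues = pre_pr_gate("SELECT * FROM task -- TODO paginate", [reviewer_a, reviewer_b])
print(passed, [(i.rule, i.blocking) for i in issues])
```

Both reviewers flag the same query, but it is reported once and stays blocking because one of them marked it so; the engineer fixes it and re‑runs the gate before a human ever sees the diff.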

Key Takeaways

Use Agent‑Based Evaluation to Govern AI Coding – Align the team first, then encode the consensus as AI‑executable constraints; otherwise rules remain ineffective.

AI Redefines the Value of Experience – AI gives every engineer a global view, moving the human advantage from "seeing everything" to "judging importance".

Technical Debt Can Be Consumed Like Business Requirements – By breaking debt into side‑tasks attached to regular tickets, refactoring proceeds without dedicated time blocks.

Engineers’ Role Shifts – When AI writes most code, engineers focus on designing and maintaining an environment that reliably guides AI output.

Action Guide for Teams

Step 1 – Identify Technical Debt: Let core developers define high‑risk zones; let AI scan exhaustively.

Step 2 – Codify Standards as AI Rules and Skills: Align on layering, modeling, and dependency boundaries, then embed them into the AI toolchain.

Step 3 – Create Reusable Migration SOPs: Have a senior engineer prototype a full migration, capture the process as an AI‑executable SOP, and roll it out to the team.

Step 4 – Establish a Pre‑PR Gate: Require an AI self‑audit before any PR; human reviewers then focus on business logic.
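For Step 3, an "AI‑executable SOP" can be modeled as an ordered list of steps, each pairing an instruction for the coding agent with a check that must pass before the next step starts. This is a minimal sketch under that assumption; `SopStep`, `run_sop`, the state keys, and `fake_agent` (a stand‑in for a real agent) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SopStep:
    name: str
    instruction: str                   # text handed to the coding agent
    verify: Callable[[dict], bool]     # check run after the agent finishes the step

def run_sop(steps, state, apply_step):
    """Execute steps in order; stop at the first step whose verification fails."""
    completed = []
    for step in steps:
        apply_step(step, state)
        if not step.verify(state):
            return completed, step.name
        completed.append(step.name)
    return completed, None

steps = [
    SopStep("move-packages",
            "Move classes into starter/application/infrastructure/common packages.",
            lambda s: s.get("layers_ok", False)),
    SopStep("confine-po",
            "Replace PO usages outside infrastructure with domain models.",
            lambda s: s.get("po_leaks", 1) == 0),
]

def fake_agent(step, state):
    # Stand-in for an agent executing the instruction; it only finishes the first step.
    if step.name == "move-packages":
        state["layers_ok"] = True

done, failed_at = run_sop(steps, {}, fake_agent)
print(done, failed_at)
```

Stopping at the first failed check keeps a half‑finished migration from silently moving on, which is what makes the SOP safe to hand to other engineers and their agents.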

By following these steps, teams can prevent AI‑accelerated code decay and turn AI coding into a sustainable productivity boost.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

R&D Management AI coding Technical debt AI governance Software refactoring Agent Evaluation

Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
