
AI-Powered Code Review System: Design, Implementation, and Lessons Learned

The team built a low-cost AI-powered code-review assistant that injects line-level comments into GitLab merge requests, with LLM calls orchestrated through Feishu. Iterating quickly through MVP and optimization phases, the system reached 64 integrated applications and 150+ daily comments, with feedback-driven prompt refinement. It demonstrates high ROI for small-to-medium teams, and future work targets IDE integration and rule-based extensions.

Youzan Coder

At the beginning of the year, the rapid rise of DeepSeek sparked interest in leveraging large language models (LLMs) for innovative applications. Our team decided to explore how AI could assist our existing code-review (CR) workflow while keeping costs low and focusing on rapid iteration.

Core Principles

Quick action, continuous validation: turn ideas into practice as soon as possible.

Focus on ROI: small-to-medium teams should avoid heavy investment in low-level platform building.

Result-oriented, process-aware: keep learning and documenting the exploration process.

Guided by these principles, we identified the code-review process as a pain point: inconsistent standards, shallow reviews, and limited reviewer bandwidth. We therefore set out to build an AI CR solution that improves efficiency and quality while accumulating AI-application experience.

Project Progress and Results

Overall Workflow

The diagram below (originally an image) illustrates the end‑to‑end flow: MR event → webhook notification → diff extraction → prompt construction → LLM analysis → structured comment generation → comment injection.
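The flow above can be sketched as a single handler with each stage injected as a dependency. All helper names here (extractDiff, buildPrompt, callLlm, postComments) are illustrative placeholders, not the production code:

```javascript
// Illustrative sketch of the end-to-end pipeline for one merge-request event.
// Each stage is injected via `deps` so it can be stubbed and tested.
async function handleMergeRequestEvent(event, deps) {
  const { extractDiff, buildPrompt, callLlm, postComments } = deps;
  // 1. Pull and structure the diff for the MR referenced by the webhook payload.
  const diff = await extractDiff(event.project_id, event.merge_request_iid);
  // 2. Compose the review prompt from the structured diff.
  const prompt = buildPrompt(diff);
  // 3. Ask the LLM for structured findings.
  const findings = await callLlm(prompt);
  // 4. Write each finding back as a line-level MR comment.
  return postComments(event.project_id, event.merge_request_iid, findings);
}
```

Keeping the stages decoupled like this made it easy for us to swap the LLM backend or the comment writer independently.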

Development Timeline

Basic functionality (usable stage): implemented the AI CR → comment feedback pipeline (3 person-days).

Process optimization (improvement stage): enhanced user experience and system response (5-7 person-days).

Prompt-engineering refinement (deepening stage): improving AI analysis accuracy; currently in progress.

Key Metrics

| Metric | Value |
| --- | --- |
| Integrated applications | 64 |
| Average daily comments | 150+ |
| Positive feedback rate | 2% (low but acceptable) |
| Negative feedback rate | 4% |

Product Evolution

We documented the evolution of the project, focusing on problem solving and process thinking.

Version 0.1 – MVP Exploration

We evaluated three integration schemes:

| Solution | Description | Evaluation |
| --- | --- | --- |
| Standalone platform | Build a dedicated platform for all CR operations. | ❌ Too costly; conflicts with the low-investment goal. |
| Report output | Generate an AI audit report after each MR. | ⚠️ Weak usability, low relevance. |
| Inline line-level comments | Provide AI comments directly on the MR diff view. | ✅ Best fit: user-friendly, high integration, but context-limited. |

We chose the line‑level comment approach because it offers fine‑grained feedback while preserving developers' familiar workflow.

Basic process: user submits an MR → the system calls the LLM → comments are written back to the MR.

Key Component Breakdown

MR Event Capture and Diff Parsing

We rely on GitLab's API and webhook mechanisms:

MR event notification: a webhook configured for the Merge Request Event.

MR diff retrieval: /projects/${id}/merge_requests/${mrId}/changes

After obtaining the diff, we transform it into a structured format (file metadata, change type, line numbers, content, and surrounding context) so the LLM can understand the exact location of modifications.

{
  "file_meta": {
    "path": "current file path",
    "old_path": "original path if renamed",
    "lines_changed": "number of changed lines"
  },
  "changes": [
    {
      "type": "add/delete",
      "old_line": "line number in old file (null if added)",
      "new_line": "line number in new file (null if deleted)",
      "content": "changed line content",
      "context": {"old": "old context", "new": "new context"}
    }
  ]
}
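A minimal sketch of that transformation, assuming the unified-diff text comes from the `diff` field of each entry returned by GitLab's changes endpoint. This simplified version handles a single hunk per file and skips the context capture described above; parseChange is a hypothetical helper name:

```javascript
// Hypothetical helper: turn one GitLab change entry (unified-diff text) into
// the structured format above. Simplified: no surrounding-context capture.
function parseChange(change) {
  const changes = [];
  let oldLine = 0;
  let newLine = 0;
  for (const line of change.diff.split('\n')) {
    // Hunk header, e.g. "@@ -1,2 +1,2 @@", resets the line counters.
    const hunk = line.match(/^@@ -(\d+),?\d* \+(\d+),?\d* @@/);
    if (hunk) {
      oldLine = parseInt(hunk[1], 10);
      newLine = parseInt(hunk[2], 10);
    } else if (line.startsWith('+') && !line.startsWith('+++')) {
      changes.push({ type: 'add', old_line: null, new_line: newLine++, content: line.slice(1) });
    } else if (line.startsWith('-') && !line.startsWith('---')) {
      changes.push({ type: 'delete', old_line: oldLine++, new_line: null, content: line.slice(1) });
    } else {
      // Unchanged context line: advances both counters.
      oldLine++;
      newLine++;
    }
  }
  return {
    file_meta: { path: change.new_path, old_path: change.old_path, lines_changed: changes.length },
    changes,
  };
}
```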

LLM Invocation and Comment Writing

The LLM call is orchestrated through Feishu's Aily platform, which provides prompt composition, knowledge handling, and result tuning. The generated comment is then posted back to GitLab via /projects/${id}/merge_requests/${mrId}/discussions .
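Posting one finding back to that discussions endpoint might look like the following sketch (assumes Node 18+ global fetch; host, token, and the injectable fetchFn are illustrative, and the diff refs come from the MR's diff_refs):

```javascript
// Sketch: post one line-level comment to a GitLab MR diff via the
// discussions endpoint. `fetchFn` is injectable so the call can be stubbed.
async function postLineComment(id, mrId, refs, finding, { host, token, fetchFn = fetch }) {
  // GitLab anchors a diff comment with a "position" object built from the
  // MR's diff refs plus the file path and new line number.
  const position = {
    position_type: 'text',
    base_sha: refs.base_sha,
    start_sha: refs.start_sha,
    head_sha: refs.head_sha,
    new_path: finding.file,
    new_line: finding.lines.new,
  };
  const res = await fetchFn(
    `${host}/api/v4/projects/${id}/merge_requests/${mrId}/discussions`,
    {
      method: 'POST',
      headers: { 'PRIVATE-TOKEN': token, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        body: `[AI CR][${finding.severity}] ${finding.suggestion}`,
        position,
      }),
    },
  );
  return res.json();
}
```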

Prompt‑Engineering Guidelines

Clearly define the review scope.

Specify input data structure.

Standardize output format.

Example prompt skeleton (simplified):

# Role
You are a professional review expert.
# Review dimensions and criteria (ordered by priority)
...
# Input format
{
  "file_meta": {...},
  "changes": [{...}]
}
# Output format
[{
  "file": "path",
  "lines": {"old": null, "new": 12},
  "category": "issue type",
  "severity": "critical/high/medium/low",
  "analysis": "brief technical analysis",
  "suggestion": "actionable fix with code example"
}]

Version 0.2 – Iterative Optimization

After the MVP launch, we identified several problems:

| Problem | Description |
| --- | --- |
| Over-commenting | Too many AI comments cause noise. |
| Insufficient quality | Comments lack depth and actionable insight. |
| No feedback loop | Missing mechanism to evaluate AI comment quality. |
| Single rule set | Inconsistent business standards across teams. |

Solutions

Process only the initial MR diff (full change) and treat subsequent commits as incremental CR updates.

// Action enum – handle only "open" and "update" events with actual commits
const actionEnum = ['open', 'update', 'close', 'reopen', 'merge', 'unmerge', 'approved', 'unapproved'];

// Capture the correct diff refs for comment insertion
{
  base_sha: mrChangeInfo.data.diff_refs.base_sha,
  start_sha: mrChangeInfo.data.diff_refs.start_sha,
  head_sha: mrChangeInfo.data.diff_refs.head_sha,
}

Filter comments to output only High severity or above.

# Severity standards
1. Critical – system crash / data loss
2. High – functional defect / security issue
3. Medium – potential risk / code smell
4. Low – style issue (non-functional)

Introduce a feedback button on AI comments to collect user evaluations for future prompt tuning.

Expose business‑specific review rules via Feishu multi‑dimensional tables, allowing per‑application customization.
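The first two solutions amount to a small gate in front of the LLM pipeline: react only to relevant webhook actions, then drop findings below the severity threshold. A sketch under those assumptions (HANDLED_ACTIONS, shouldReview, and filterFindings are illustrative names):

```javascript
// Sketch of solutions 1 and 2: only react to "open"/"update" webhook actions,
// and only surface findings of High severity or above.
const HANDLED_ACTIONS = ['open', 'update'];
const SEVERITY_RANK = { critical: 0, high: 1, medium: 2, low: 3 };

function shouldReview(event) {
  // GitLab MR webhooks carry the action under object_attributes.
  return HANDLED_ACTIONS.includes(event.object_attributes.action);
}

function filterFindings(findings, threshold = 'high') {
  const cutoff = SEVERITY_RANK[threshold];
  // Keep only findings at or above the threshold (lower rank = more severe).
  return findings.filter((f) => SEVERITY_RANK[f.severity] <= cutoff);
}
```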

Feedback Mechanism

We added a one‑click feedback script that records both the comment content and user rating, feeding the data back into the AI model for continuous improvement.

Business Customization Capability

By linking Feishu documents and tables to the Aily workflow, each application can define its own review standards, which are dynamically injected into the prompt at runtime.
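The runtime injection step can be sketched as a simple prompt splice, with the Feishu table stubbed as a Map keyed by application ID (buildPromptWithRules is an illustrative name, not the Aily API):

```javascript
// Sketch of per-application rule injection: look up team-specific review
// rules (stubbed here as a Map) and splice them into the base prompt.
function buildPromptWithRules(basePrompt, appId, ruleTable) {
  const rules = ruleTable.get(appId) || [];
  if (rules.length === 0) return basePrompt;
  const ruleSection = [
    '# Business-specific rules',
    ...rules.map((rule, i) => `${i + 1}. ${rule}`),
  ].join('\n');
  return `${basePrompt}\n${ruleSection}`;
}
```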

Outlook and Summary

Two rapid iterations delivered the original goal: low‑cost optimization of the code‑review process. Highlights include:

High ROI: ~10 person-days of work produced integration with 64 applications, averaging 150+ daily AI comments and uncovering over 50 actionable issues.

Seamless integration: inline line-level comments fit naturally into existing GitLab MR workflows.

Fast iteration: from problem discovery to solution deployment within weeks.

Continuous refinement: feedback loops and prompt engineering keep the system improving.

We view AI‑assisted CR as an assistant rather than a full replacement for human reviewers, given current model limitations. The pragmatic approach—leveraging existing APIs (GitLab, Feishu) and focusing on prompt engineering—delivers the best cost‑benefit for small‑to‑medium teams.

Future directions include maintaining flexibility to adopt emerging LLM capabilities, deeper integration with IDE plugins, and expanding rule‑based custom checks for specific business domains.

Tags: AI, Automation, LLM, prompt engineering, software development, Code Review, GitLab
Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
