AI-Powered Code Review System: Design, Implementation, and Lessons Learned
The team built a low-cost AI-powered code-review assistant that posts line-level comments on GitLab merge requests, with LLM calls orchestrated through Feishu. Iterating quickly through an MVP and an optimization phase, the system reached 64 integrated applications and 150+ daily comments, refined by a feedback-driven prompt loop. The result demonstrates high ROI for small-to-medium teams, with IDE plugins and rule-based extensions planned next.
At the beginning of the year, the rapid rise of DeepSeek sparked interest in leveraging large language models (LLMs) for innovative applications. Our team decided to explore how AI could assist the existing code-review (CR) workflow while keeping costs low and focusing on rapid iteration.
Core Principles
Quick action, continuous validation: turn ideas into practice as soon as possible.
Focus on ROI: small-to-medium teams should avoid heavy investment in low-level platform building.
Result-oriented, process-aware: keep learning from and documenting the exploration process.
Guided by these principles, we identified code review as a pain point: issues with standardization, depth, and reviewer manpower. We therefore set out to build an AI CR solution that improves efficiency and quality while building up AI-application experience.
Project Progress and Results
Overall Workflow
The diagram below (originally an image) illustrates the end‑to‑end flow: MR event → webhook notification → diff extraction → prompt construction → LLM analysis → structured comment generation → comment injection.
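That flow can be sketched as a single handler with each stage injected, which also keeps the pipeline easy to stub and test. Everything below is illustrative: the function and dependency names are ours, not GitLab's or Feishu's.

```typescript
// Hypothetical pipeline sketch: each stage is injected so it can be stubbed.
type MrEvent = { projectId: number; mrId: number; action: string };

interface PipelineDeps {
  fetchDiff: (projectId: number, mrId: number) => Promise<string>;
  buildPrompt: (diff: string) => string;
  callLlm: (prompt: string) => Promise<string>;
  postComments: (projectId: number, mrId: number, comments: string) => Promise<void>;
}

async function handleMrEvent(event: MrEvent, deps: PipelineDeps): Promise<boolean> {
  // Only events that carry real code changes trigger a review.
  if (event.action !== 'open' && event.action !== 'update') return false;
  const diff = await deps.fetchDiff(event.projectId, event.mrId); // diff extraction
  const prompt = deps.buildPrompt(diff);                          // prompt construction
  const comments = await deps.callLlm(prompt);                    // LLM analysis
  await deps.postComments(event.projectId, event.mrId, comments); // comment injection
  return true;
}
```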
Development Timeline
Basic functionality (usable stage): implemented the AI CR → comment feedback pipeline (3 person-days).
Process optimization (improvement stage): enhanced user experience and system responsiveness (5–7 person-days).
Prompt-engineering refinement (deepening stage): improving AI analysis accuracy; currently in progress.
Key Metrics
| Metric | Value |
| --- | --- |
| Integrated applications | 64 |
| Average daily comments | 150+ |
| Positive feedback rate | 2% (low but acceptable) |
| Negative feedback rate | 4% |

Product Evolution
We documented the evolution of the project, focusing on problem solving and process thinking.
Version 0.1 – MVP Exploration
We evaluated three integration schemes:
| Solution | Description | Evaluation |
| --- | --- | --- |
| Standalone platform | Build a dedicated platform for all CR operations. | ❌ Too costly; conflicts with the low-investment goal. |
| Report output | Generate an AI audit report after each MR. | ⚠️ Weak usability, low relevance. |
| Inline line-level comments | Provide AI comments directly on the MR diff view. | ✅ Best fit: user-friendly, high integration, but context-limited. |
We chose the line‑level comment approach because it offers fine‑grained feedback while preserving developers' familiar workflow.
Basic process: user submits an MR → call the LLM → write comments back to the MR.

Key Component Breakdown
MR Event Capture and Diff Parsing
We rely on GitLab's API and webhook mechanisms:
MR event notification: a webhook configured for the Merge Request Event.
MR diff retrieval: `/projects/${id}/merge_requests/${mrId}/changes`.
After obtaining the diff, we transform it into a structured format (file metadata, change type, line numbers, content, and surrounding context) so the LLM can understand the exact location of modifications.
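The core of that transformation is walking the unified-diff hunks GitLab returns and assigning old/new line numbers to each changed line. A minimal sketch (our own helper, producing entries shaped like the `changes` array shown next):

```typescript
// Sketch: parse one GitLab unified-diff string into per-line change records.
interface StructuredChange {
  type: 'add' | 'delete';
  old_line: number | null;
  new_line: number | null;
  content: string;
}

function parseUnifiedDiff(diff: string): StructuredChange[] {
  const changes: StructuredChange[] = [];
  let oldLine = 0;
  let newLine = 0;
  for (const line of diff.split('\n')) {
    if (line.startsWith('+++') || line.startsWith('---')) continue; // file headers
    const hunk = line.match(/^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (hunk) {
      oldLine = Number(hunk[1]); // a hunk header resets both counters
      newLine = Number(hunk[2]);
      continue;
    }
    if (line.startsWith('+')) {
      changes.push({ type: 'add', old_line: null, new_line: newLine, content: line.slice(1) });
      newLine++;
    } else if (line.startsWith('-')) {
      changes.push({ type: 'delete', old_line: oldLine, new_line: null, content: line.slice(1) });
      oldLine++;
    } else {
      oldLine++; // unchanged context line advances both sides
      newLine++;
    }
  }
  return changes;
}
```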
```json
{
  "file_meta": {
    "path": "current file path",
    "old_path": "original path if renamed",
    "lines_changed": "number of changed lines"
  },
  "changes": [
    {
      "type": "add/delete",
      "old_line": "line number in old file (null if added)",
      "new_line": "line number in new file (null if deleted)",
      "content": "changed line content",
      "context": {"old": "old context", "new": "new context"}
    }
  ]
}
```

LLM Invocation and Comment Writing
The LLM call is orchestrated through Feishu's Aily platform, which provides prompt composition, knowledge handling, and result tuning. The generated comments are then posted back to GitLab via `/projects/${id}/merge_requests/${mrId}/discussions`.
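Anchoring a comment to a diff line requires a `position` object whose SHAs come from the MR's `diff_refs`. A sketch of the payload builder (the helper name is ours; the field names mirror GitLab's discussions API):

```typescript
// Sketch: build the request payload for GitLab's MR discussions endpoint.
interface DiffRefs { base_sha: string; start_sha: string; head_sha: string }

function buildDiscussionPayload(
  projectId: number,
  mrId: number,
  diffRefs: DiffRefs,
  comment: { path: string; newLine: number; body: string }
) {
  return {
    url: `/projects/${projectId}/merge_requests/${mrId}/discussions`,
    body: {
      body: comment.body,
      position: {
        position_type: 'text',      // anchors the note to a diff line
        base_sha: diffRefs.base_sha,
        start_sha: diffRefs.start_sha,
        head_sha: diffRefs.head_sha,
        new_path: comment.path,
        new_line: comment.newLine,  // line number in the new version of the file
      },
    },
  };
}
```

Posting is then a plain authenticated POST of `body` to the returned URL.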
Prompt‑Engineering Guidelines
Clearly define the review scope.
Specify input data structure.
Standardize output format.
Example prompt skeleton (simplified):
```
# Role
You are a professional review expert.

# Review dimensions and criteria (ordered by priority)
...

# Input format
{
  "file_meta": {...},
  "changes": [{...}]
}

# Output format
[{
  "file": "path",
  "lines": {"old": null, "new": 12},
  "category": "issue type",
  "severity": "critical/high/medium/low",
  "analysis": "brief technical analysis",
  "suggestion": "actionable fix with code example"
}]
```

Version 0.2 – Iterative Optimization
After the MVP launch, we identified several problems:
| Problem | Description |
| --- | --- |
| Over-commenting | Too many AI comments cause noise. |
| Insufficient quality | Comments lack depth and actionable insight. |
| No feedback loop | No mechanism to evaluate AI comment quality. |
| Single rule set | Inconsistent business standards across teams. |
Solutions
Process only the initial MR diff (full change) and treat subsequent commits as incremental CR updates.

```javascript
// All MR webhook actions GitLab can send; we review only 'open' and
// 'update' events that carry actual commits.
const actionEnum = ['open', 'update', 'close', 'reopen', 'merge', 'unmerge', 'approved', 'unapproved'];

// Diff refs needed to anchor comments to the correct version of the diff
const position = {
  base_sha: mrChangeInfo.data.diff_refs.base_sha,
  start_sha: mrChangeInfo.data.diff_refs.start_sha,
  head_sha: mrChangeInfo.data.diff_refs.head_sha,
};
```
Filter comments to output only High severity or above.

```
# Severity standards
1. Critical – system crash / data loss
2. High – functional defect / security issue
3. Medium – potential risk / code smell
4. Low – style issue (non-functional)
```
Introduce a feedback button on AI comments to collect user evaluations for future prompt tuning.
Expose business‑specific review rules via Feishu multi‑dimensional tables, allowing per‑application customization.
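The severity filter from the list above reduces to a rank comparison before comments are posted; a minimal sketch with illustrative types:

```typescript
// Sketch: keep only AI comments at or above a severity threshold.
type Severity = 'critical' | 'high' | 'medium' | 'low';

// Lower rank = more severe, matching the ordering in the severity standards.
const rank: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

interface AiComment { severity: Severity; file: string; analysis: string }

function filterBySeverity(comments: AiComment[], threshold: Severity = 'high'): AiComment[] {
  return comments.filter((c) => rank[c.severity] <= rank[threshold]);
}
```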
Feedback Mechanism
We added a one‑click feedback script that records both the comment content and user rating, feeding the data back into the AI model for continuous improvement.
Business Customization Capability
By linking Feishu documents and tables to the Aily workflow, each application can define its own review standards, which are dynamically injected into the prompt at runtime.
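At prompt-build time, the per-application rules are appended as an extra section. A sketch with the Feishu table replaced by a plain lookup (in the real workflow, the rules are fetched from the multi-dimensional table at runtime):

```typescript
// Illustrative stand-in for the Feishu table: app name -> custom review rules.
const appRules: Record<string, string[]> = {
  'order-service': ['All money amounts must use integer cents, never floats.'],
};

// Inject application-specific rules into the base prompt, if any exist.
function buildPromptWithRules(basePrompt: string, appName: string): string {
  const rules = appRules[appName] ?? [];
  if (rules.length === 0) return basePrompt;
  const section = rules.map((r, i) => `${i + 1}. ${r}`).join('\n');
  return `${basePrompt}\n# Application-specific rules\n${section}`;
}
```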
Outlook and Summary
Two rapid iterations delivered the original goal: low‑cost optimization of the code‑review process. Highlights include:
High ROI : ~10 person‑days produced integration with 64 applications, averaging 150+ daily AI comments and uncovering over 50 actionable issues.
Seamless integration : Inline line‑level comments fit naturally into existing GitLab MR workflows.
Fast iteration : From problem discovery to solution deployment within weeks.
Continuous refinement : Feedback loops and prompt‑engineering keep the system improving.
We view AI‑assisted CR as an assistant rather than a full replacement for human reviewers, given current model limitations. The pragmatic approach—leveraging existing APIs (GitLab, Feishu) and focusing on prompt engineering—delivers the best cost‑benefit for small‑to‑medium teams.
Future directions include maintaining flexibility to adopt emerging LLM capabilities, deeper integration with IDE plugins, and expanding rule‑based custom checks for specific business domains.
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.