
Mastering AI Agent Reflection: The Generate‑Reflect‑Refine Loop

This article explains the Reflection design pattern for AI agents and shows how a three‑step generate‑reflect‑refine cycle can iteratively improve outputs. It provides both a simple loop‑based implementation and a structured class‑based version, then shares practical tips and references to the original research.

Qborfy AI

Mar 29, 2026

Reflection is a design pattern that enables an AI agent to "self‑review" its output, identify shortcomings, and iteratively improve the result.

What is Reflection?

The core idea is analogous to a programmer writing code, running it, encountering errors, and then fixing the code. In the AI context the steps are:

Generate: the model produces an initial answer.

Reflect: the model examines its own answer, checking completeness, accuracy, clarity, and other issues.

Refine: based on the reflection, the model generates a better answer.

This loop can repeat until a quality threshold is reached or an iteration cap is hit.

Simple two‑call implementation

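# Assumes `llm` is a chat client object exposing chat(prompt) -> str.
def reflection_agent(query, max_iterations=3):
    # Generate: produce the initial answer
    current_response = llm.chat(f"Please answer: {query}")
    for _ in range(max_iterations):
        # Reflect: ask the model to critique its own answer
        reflection_prompt = f"""
        Question: {query}
        Your previous answer was: {current_response}
        Reflect on this answer from the following angles:
        1. Does it fully answer the question?
        2. Is any important information missing?
        3. Is the logic clear?
        4. Is there anything that could be improved?
        If there is room for improvement, point out the specific problems.
        If the answer is already good, reply "No improvement needed".
        """
        reflection = llm.chat(reflection_prompt)
        # Case-insensitive check for the stop phrase requested in the prompt
        if "no improvement needed" in reflection.lower():
            break
        # Refine: regenerate the answer using the critique
        refine_prompt = f"""
        Question: {query}
        Previous answer: {current_response}
        Reflection: {reflection}
        Based on the reflection, write a better answer.
        """
        current_response = llm.chat(refine_prompt)
    return current_response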

The loop follows the generate → reflect → refine cycle until the model reports no further improvements or max_iterations is reached.
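To watch the loop run without calling a real model, it can help to drive it with canned replies. The sketch below is only an illustration: `ScriptedLLM` and `reflect_loop` are hypothetical names, the stub simply replays a fixed list of replies, and the loop is a compact variant of the function above that takes the client as a parameter.

```python
class ScriptedLLM:
    """Hypothetical stand-in for a chat client: replays canned replies in order."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.prompts = []  # every prompt received, kept for inspection

    def chat(self, prompt):
        self.prompts.append(prompt)
        return self.replies.pop(0)


def reflect_loop(llm, query, max_iterations=3):
    """Compact generate -> reflect -> refine loop (assumed variant of the one above)."""
    answer = llm.chat(f"Please answer: {query}")
    for _ in range(max_iterations):
        critique = llm.chat(
            f"Question: {query}\nYour previous answer was: {answer}\n"
            "Point out concrete problems, or reply 'No improvement needed'."
        )
        if "no improvement needed" in critique.lower():
            break
        answer = llm.chat(
            f"Question: {query}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer


llm = ScriptedLLM([
    "Paris.",                                    # generate
    "Too terse; state the country explicitly.",  # reflect, round 1
    "Paris is the capital of France.",           # refine, round 1
    "No improvement needed",                     # reflect, round 2: stop
])
result = reflect_loop(llm, "What is the capital of France?")
print(result)            # Paris is the capital of France.
print(len(llm.prompts))  # 4
```

One refinement round costs two extra calls (reflect plus refine), so this run makes four calls in total: one to generate, two for the first round, and a final reflection that ends the loop.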

Structured reflection with a class

import json


class ReflectionAgent:
    def __init__(self, llm):
        # `llm` is any client object exposing chat(prompt) -> str
        self.llm = llm

    def generate(self, task):
        """First step: generate the initial result."""
        prompt = f"Please complete the following task:\n{task}"
        return self.llm.chat(prompt)

    def reflect(self, task, output):
        """Second step: structured reflection, returned as a dict."""
        reflection_prompt = f"""
        Task: {task}
        Your output: {output}
        Reflect along the dimensions below and reply with JSON only:
        {{
            "completeness": <integer score 1-5>,
            "accuracy": <integer score 1-5>,
            "clarity": <integer score 1-5>,
            "issues": ["issue 1", "issue 2"],
            "suggestions": ["suggestion 1", "suggestion 2"],
            "needs_improvement": true or false
        }}
        """
        response = self.llm.chat(reflection_prompt)
        # Raises json.JSONDecodeError if the reply is not pure JSON
        return json.loads(response)

    def refine(self, task, output, reflection):
        """Third step: improve the output based on the reflection."""
        refine_prompt = f"""
        Task: {task}
        Current output: {output}
        Reflection:
        - Completeness score: {reflection['completeness']}
        - Accuracy score: {reflection['accuracy']}
        - Clarity score: {reflection['clarity']}
        - Issues found: {', '.join(reflection['issues'])}
        - Suggestions: {', '.join(reflection['suggestions'])}
        Write a better answer.
        """
        return self.llm.chat(refine_prompt)

    def run(self, task, max_iterations=3, quality_threshold=4):
        """Execute the full generate‑reflect‑refine pipeline."""
        output = self.generate(task)
        for i in range(max_iterations):
            reflection = self.reflect(task, output)
            # Coerce to float in case the model returns scores as strings
            scores = [float(reflection[key]) for key in ("completeness", "accuracy", "clarity")]
            avg_score = sum(scores) / len(scores)
            if not reflection['needs_improvement'] or avg_score >= quality_threshold:
                print(f"Quality target met, stopping. Average score: {avg_score:.2f}")
                break
            print(f"Refinement round {i + 1}, current average score: {avg_score:.2f}")
            output = self.refine(task, output, reflection)
        return output

This version adds a quantitative scoring system, allowing the loop to stop automatically when a predefined quality threshold is met, thus avoiding wasteful token consumption.
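The `reflect` method above hands the model's reply straight to `json.loads`, which fails as soon as the model wraps the JSON in extra prose or returns a score such as "4/5". A more forgiving parser might look like the sketch below. `parse_reflection` and `SCORE_KEYS` are hypothetical names, and the normalization rules are assumptions rather than part of the original implementation.

```python
import json
import re

SCORE_KEYS = ("completeness", "accuracy", "clarity")


def parse_reflection(raw):
    """Extract the first-to-last-brace JSON object from a reply and normalize it.

    Raises ValueError if no JSON object is found, the JSON is invalid,
    or a score is missing or not numeric.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reflection reply")
    data = json.loads(match.group(0))  # JSONDecodeError is a ValueError subclass
    for key in SCORE_KEYS:
        # Accept 4, 4.0, "4", or "4/5"; keep only the leading number.
        number = re.match(r"\s*(\d+(?:\.\d+)?)", str(data.get(key)))
        if number is None:
            raise ValueError(f"score for {key!r} is missing or not numeric")
        data[key] = float(number.group(1))
    data["issues"] = list(data.get("issues", []))
    data["suggestions"] = list(data.get("suggestions", []))
    # Default to True so a missing flag never silently ends the loop
    data["needs_improvement"] = bool(data.get("needs_improvement", True))
    return data


raw = (
    "Sure, here is my review:\n"
    '{"completeness": "4/5", "accuracy": 5, "clarity": "3", '
    '"issues": ["no example"], "suggestions": ["add one"], '
    '"needs_improvement": true}\n'
    "Hope that helps."
)
parsed = parse_reflection(raw)
print(parsed["completeness"], parsed["accuracy"], parsed["clarity"])  # 4.0 5.0 3.0
```

Because every score comes back as a float, the average in `run` can be computed without further type checks.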

Practical tips

Make reflection prompts specific: ask the model to evaluate completeness, accuracy, and clarity rather than a vague "check it".

Set stopping conditions: define a quality threshold so the agent stops iterating once the score is sufficient.

Keep intermediate outputs: printing each round’s result helps debug and observe improvement.
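The last two tips can be combined by recording each round in a small trace object instead of only printing it. This is a minimal sketch under stated assumptions: `Round`, `Trace`, and `run_with_trace` are hypothetical names, and the three callables stand in for whatever generate, scoring, and refine steps the agent uses.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    index: int
    output: str
    avg_score: float


@dataclass
class Trace:
    rounds: list = field(default_factory=list)
    stop_reason: str = ""


def run_with_trace(generate, score, refine, task, max_iterations=3, threshold=4.0):
    """Run the loop, recording every scored output and why the loop stopped.

    generate(task) -> str, score(task, output) -> float and
    refine(task, output) -> str are caller-supplied callables.
    """
    trace = Trace()
    output = generate(task)
    for i in range(max_iterations):
        avg = score(task, output)
        trace.rounds.append(Round(i, output, avg))
        if avg >= threshold:
            trace.stop_reason = "threshold reached"
            return output, trace
        output = refine(task, output)
    # The final refined output is returned unscored, as in the class above
    trace.stop_reason = "max iterations reached"
    return output, trace


# Toy callables: each refinement appends "+" and raises the score by one.
def toy_generate(task):
    return "draft"


def toy_score(task, output):
    return 2.0 + output.count("+")


def toy_refine(task, output):
    return output + "+"


final, trace = run_with_trace(toy_generate, toy_score, toy_refine, "demo")
print(final)                                # draft++
print([r.avg_score for r in trace.rounds])  # [2.0, 3.0, 4.0]
print(trace.stop_reason)                    # threshold reached
```

Keeping the stop reason alongside the per-round scores makes it easy to tell a run that converged from one that simply ran out of iterations.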

Lesser‑known facts

Reflection is not new: learning from evaluative feedback has long been central to reinforcement learning; the pattern carries that idea over to large language models, with the feedback expressed in natural language.

Original paper: Shinn et al. (2023) described the approach in "Reflexion: Language Agents with Verbal Reinforcement Learning".

LangGraph support: the langgraph library can model the generate‑reflect‑refine loop as a directed graph with a cycle, removing the need to write explicit loops.

Different from Self‑Consistency: Self‑Consistency samples multiple answers and keeps the most common one by majority vote (breadth), whereas Reflection repeatedly polishes a single answer (depth).

Model strength matters: GPT‑4 tends to yield far better reflection results than GPT‑3.5; weaker models may even produce worse answers when forced to reflect.
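The breadth‑versus‑depth contrast with Self‑Consistency can be made concrete with two small functions. This is an illustrative sketch only: the function names and the stubbed callables are hypothetical, and real implementations would call a model where the stubs return fixed values.

```python
from collections import Counter


def self_consistency(sample, question, n=5):
    """Breadth: draw n independent answers and return the most common one."""
    answers = [sample(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def reflection(generate, critique, refine, question, max_iterations=3):
    """Depth: keep revising a single answer until the critic has no objection."""
    answer = generate(question)
    for _ in range(max_iterations):
        feedback = critique(question, answer)
        if feedback is None:  # None means "nothing to improve"
            break
        answer = refine(question, answer, feedback)
    return answer


# Stubbed "model" for Self-Consistency: five sampled answers, three agree.
samples = iter(["42", "41", "42", "42", "40"])
voted = self_consistency(lambda q: next(samples), "6 * 7 = ?")

# Stubbed "model" for Reflection: one wrong draft, corrected after critique.
polished = reflection(
    generate=lambda q: "41",
    critique=lambda q, a: None if a == "42" else "off by one",
    refine=lambda q, a, fb: "42",
    question="6 * 7 = ?",
)
print(voted, polished)  # 42 42
```

Self‑Consistency spends its calls on independent samples, while Reflection spends them on successive revisions of one answer.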

References

Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023) – https://arxiv.org/abs/2303.11366

LangChain Reflection documentation – https://python.langchain.com/docs/use_cases/code_understanding

Andrew Ng, "AI Agentic Design Patterns" – https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
