Artificial Intelligence 14 min read

Execution → Observation → Reflection → Improvement: How Hermes Closes the Skill Loop

The article dissects Hermes' background review mechanism, showing how a silent daemon thread performs post‑conversation reflection, writes valuable insights to a skill or memory store, shares prompt designs, fork‑agent isolation, priority update rules, and common pitfalls for building continuously learning LLM agents.

James' Growth Diary

May 24, 2026

Execution → Observation → Reflection → Improvement: How Hermes Closes the Skill Loop

Why "Finish Execution" Is Wasteful

Traditional ReAct loops stop after Reasoning → Action → Observation, so any learned tricks or user corrections disappear when the session ends. Hermes adds a fourth step, Review , which runs asynchronously after the dialogue finishes, allowing the agent to persist knowledge across sessions without blocking the user.

Traditional ReAct : Reasoning → Action → Observation – no cross‑session learning.

LangGraph : adds a State update but still limited to a single session.

Hermes : Reasoning → Action → Observation → Background Review – persists updates to a skill store.

Three Review Prompts: Memory vs Skill

The file background_review.py defines three prompts that target different review dimensions.

# _MEMORY_REVIEW_PROMPT – save user persona, preferences, expectations
_MEMORY_REVIEW_PROMPT = (
    "Review the conversation above and consider saving to memory if appropriate.

"
    "Focus on:
"
    "1. Has the user revealed personal details worth remembering?
"
    "2. Has the user expressed expectations about how you should behave?

"
    "If something stands out, save it using the memory tool."
    "If nothing is worth saving, just say 'Nothing to save.' and stop."
)

# _SKILL_REVIEW_PROMPT – capture new techniques, corrections, outdated skills
_SKILL_REVIEW_PROMPT = (
    "Be ACTIVE — most sessions produce at least one skill update, even if small. A pass that does nothing is a missed learning opportunity...

"
    "Signals to look for:
"
    "  • User corrected your style, tone, format, or verbosity.
"
    "  • Non‑trivial technique, fix, workaround, or debugging path emerged.
"
    "  • A skill that was loaded turned out to be wrong or outdated."
)

# _COMBINED_REVIEW_PROMPT – default, merges both aspects
_COMBINED_REVIEW_PROMPT = (
    "**Memory**: who the user is — persona, preferences, personal details...
"
    "**Skills**: how to do this class of task — be ACTIVE, capture corrections."
)

The skill prompt’s opening sentence deliberately biases the LLM toward action: "Be ACTIVE — most sessions produce at least one skill update…" This counteracts the model’s default conservatism.

Non‑Blocking Daemon Thread Design

The entry point spawn_background_review_thread creates a daemon thread so the main conversation can return immediately.

# agent/background_review.py (simplified)
import threading

def spawn_background_review_thread(
    agent,
    messages_snapshot: list,
    *,
    review_memory: bool = True,
    review_skills: bool = False,
) -> threading.Thread | None:
    if review_memory and review_skills:
        prompt = _COMBINED_REVIEW_PROMPT
    elif review_skills:
        prompt = _SKILL_REVIEW_PROMPT
    else:
        prompt = _MEMORY_REVIEW_PROMPT

    t = threading.Thread(
        target=_run_review_in_thread,
        args=(agent, messages_snapshot, prompt),
        daemon=True,   # auto‑cleanup on process exit, no user wait
        name="bg-review",
    )
    t.start()
    return t

Setting daemon=True gives priority to user experience: if the user aborts, the review is discarded. The article contrasts this with a Node.js implementation that uses worker.unref() for equivalent behaviour.

Forked Review Agent: Inheritance vs Isolation

The review agent is a lightweight clone of the main AIAgent. It inherits the provider, model, and API key to share the prefix cache, but isolates session state, iteration limits, and tool whitelist.

# Fork agent core logic
review_agent = AIAgent(
    model=agent.model,               # inherit cache
    provider=agent.provider,         # same provider
    api_key=agent.api_key,           # same credentials
    max_iterations=16,              # limit to avoid runaway cost
    quiet_mode=True,                # suppress output
    parent_session_id=agent.session_id,
    skip_memory=True,               # do not trigger external memory plugins
)
# Bind parent memory store manually
review_agent._memory_write_origin = "background_review"
review_agent._memory_store = agent._memory_store
review_agent._memory_nudge_interval = 0   # prevent recursive review

provider / model / api_key : inherited – shares prefix cache, saves tokens.

session_id : new – avoids contaminating main session history.

max_iterations : set to 16 – prevents costly runaway review.

external memory plugins : skipped – prevents harness prompt from polluting user memory.

tool whitelist : only memory and skill_manage – disallows shell commands or dangerous actions.

memory_nudge_interval : set to 0 – avoids recursive review triggers.

Because the forked agent shares the same provider/model, Anthropic’s KV cache can be hit, reducing the incremental cost of a review to only a few hundred output tokens.

Four‑Level Skill Update Priority Chain

When the review agent decides to write a skill, it follows a strict priority order extracted from _SKILL_REVIEW_PROMPT:

SKILL_UPDATE_PRIORITY = [
    "UPDATE_CURRENTLY_LOADED_SKILL",   # 1. Update skill loaded in this session
    "UPDATE_EXISTING_UMBRELLA",      # 2. Update an existing umbrella skill
    "ADD_SUPPORT_FILE_UNDER_UMBRELLA",# 3. Add support files (references, templates, scripts)
    "CREATE_NEW_CLASS_LEVEL_SKILL",   # 4. Create a brand‑new class‑level skill
]

A TypeScript helper isValidNewSkillName enforces naming conventions, rejecting patterns like fix‑, debug‑, dates, or PR numbers to keep the skill library tidy.

function isValidNewSkillName(name: string): boolean {
  const ANTI_PATTERNS = [
    /^fix-/i,
    /^debug-/i,
    /^audit-.*-today$/i,
    /^PR-\d+/i,
    /-\d{4}-\d{2}-\d{2}$/ // date suffix
  ];
  // "git-workflow" → valid | "fix-login-bug-2026-05-23" → invalid
  return !ANTI_PATTERNS.some(p => p.test(name));
}

This priority chain prevents skill‑library fragmentation. Unrestricted creation would quickly fill the repository with noisy files such as fix‑xxx‑2026‑05‑23, making reuse impossible.

Summarizing Review Results for the User

After the background review finishes, Hermes calls summarize_background_review_actions to extract a readable list of successful writes, filtering out any tool calls that already appeared in the main conversation.

# agent/background_review.py
def summarize_background_review_actions(review_messages: list, prior_snapshot: list) -> list[str]:
    """Iterate review agent tool results and collect successful write summaries."""
    existing_ids = {
        m.get("tool_call_id")
        for m in prior_snapshot
        if m.get("role") == "tool" and m.get("tool_call_id")
    }
    actions = []
    for msg in review_messages:
        if msg.get("role") != "tool":
            continue
        if msg.get("tool_call_id") in existing_ids:
            continue
        data = json.loads(msg.get("content", "{}"))
        if not data.get("success"):
            continue
        message = data.get("message", "")
        if "created" in message.lower() or "updated" in message.lower():
            actions.append(message)
    return actions

The deduplication ensures the user sees only new actions, e.g., "Skill 'git‑workflow' updated" or "Memory updated".

Common Pitfalls

Trigger interval too low : Review incurs LLM inference cost. A sensible default is every 5‑10 rounds.

Inheriting the full tool list : The review agent should only have memory and skill_manage. A broader list could execute unsafe commands.

Recording environment errors as skills : Failures like pip install xxx are runtime issues, not reusable knowledge; storing them makes the agent incorrectly avoid those tools later.

Assuming review blocks the response : Because it runs in a daemon thread, the main dialogue returns immediately; aborting the process simply cancels the review.

Conclusion

Hermes achieves a persistent skill loop not by memorising the dialogue but by launching an independent background review agent after each session. The review agent inherits the parent’s provider/model to share the prefix cache, keeping incremental token cost low. Skill writes follow a four‑level priority chain that prevents library fragmentation, and the deliberately aggressive "Be ACTIVE" prompt forces the LLM to treat inaction as a failure, overcoming its default conservatism.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

ReAct Hermes LLM agents Daemon Thread prefix cache skill management Background Review

Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.