Information Security 7 min read

Toxic Agent Flow: Exploiting GitHub MCP to Leak Private Repositories via Prompt Injection

A newly disclosed vulnerability in GitHub's Model‑Centric Programming (MCP) enables attackers to hijack AI agents through crafted GitHub Issues, injecting malicious prompts that cause the assistant to retrieve and expose private repository data, while the article also outlines mitigation strategies and defensive code examples.

Architecture Digest

Jun 4, 2025

Toxic Agent Flow: Exploiting GitHub MCP to Leak Private Repositories via Prompt Injection

Recent reports reveal a critical vulnerability in GitHub's Model‑Centric Programming (MCP) integration that allows attackers to hijack AI agents such as Claude Desktop through crafted GitHub Issues.

The attack, dubbed “Toxic Agent Flow”, does not compromise the MCP server or the underlying large model; instead it injects malicious prompts into a public repository issue, causing the AI assistant to retrieve and expose data from a private repository.

Attackers can place a malicious issue in a public repo (e.g., username/public-repo) and, when the user queries the issue list, the AI agent automatically calls the MCP tool, follows the injected prompt, accesses the private repo (e.g., username/private-repo) and leaks project names, migration plans, salaries, etc., via a new pull request in the public repo.

The article provides a step‑by‑step demonstration, screenshots of the chat flow, and a list of leaked information extracted during testing.

To mitigate the risk, the authors propose two defensive strategies: (1) data‑flow permission controls using tools like Invariant Guardrails, illustrated with a policy snippet that restricts each session to a single repository; and (2) continuous security monitoring with the Invariant MCP‑scan scanner, which audits agent tool calls in real time.

raise Violation("You can access only one repo per session.")
if:
    (call_before: ToolCall) -> (call_after: ToolCall)
    call_before.function.name in (...repo 操作集)
    call_after.function.name in (...repo 操作集)
    call_before.arguments["repo"] != call_after.arguments["repo"] or
    call_before.arguments["owner"] != call_after.arguments["owner"]

They also emphasize that model alignment alone cannot prevent such indirect attacks and that system‑level safeguards—access control, flow isolation, and real‑time monitoring—are essential.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MCP GitHub prompt injection AI security Agent Defense Toxic Agent Flow

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.