Information Security 7 min read

Toxic Agent Flow: Exploiting GitHub MCP to Leak Private Repositories via Prompt Injection

A newly disclosed vulnerability in GitHub's Model‑Centric Programming (MCP) enables attackers to hijack AI agents through crafted GitHub Issues, injecting malicious prompts that cause the assistant to retrieve and expose private repository data, while the article also outlines mitigation strategies and defensive code examples.

Architecture Digest
Architecture Digest
Architecture Digest
Toxic Agent Flow: Exploiting GitHub MCP to Leak Private Repositories via Prompt Injection

Recent reports reveal a critical vulnerability in GitHub's Model‑Centric Programming (MCP) integration that allows attackers to hijack AI agents such as Claude Desktop through crafted GitHub Issues.

The attack, dubbed “Toxic Agent Flow”, does not compromise the MCP server or the underlying large model; instead it injects malicious prompts into a public repository issue, causing the AI assistant to retrieve and expose data from a private repository.

Attackers can place a malicious issue in a public repo (e.g., username/public-repo ) and, when the user queries the issue list, the AI agent automatically calls the MCP tool, follows the injected prompt, accesses the private repo (e.g., username/private-repo ) and leaks project names, migration plans, salaries, etc., via a new pull request in the public repo.

The article provides a step‑by‑step demonstration, screenshots of the chat flow, and a list of leaked information extracted during testing.

To mitigate the risk, the authors propose two defensive strategies: (1) data‑flow permission controls using tools like Invariant Guardrails, illustrated with a policy snippet that restricts each session to a single repository; and (2) continuous security monitoring with the Invariant MCP‑scan scanner, which audits agent tool calls in real time.

<code>raise Violation("You can access only one repo per session.")
if:
    (call_before: ToolCall) -> (call_after: ToolCall)
    call_before.function.name in (...repo 操作集)
    call_after.function.name in (...repo 操作集)
    call_before.arguments["repo"] != call_after.arguments["repo"] or
    call_before.arguments["owner"] != call_after.arguments["owner"]
</code>

They also emphasize that model alignment alone cannot prevent such indirect attacks and that system‑level safeguards—access control, flow isolation, and real‑time monitoring—are essential.

MCPGitHubPrompt InjectionAI securityAgent DefenseToxic Agent Flow
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.