No prompt-manipulation patterns in tool descriptions
Authored by Stanley Hong · AgentReserve (founder).
No tool description contains hidden-instruction tags (`<IMPORTANT>`, `<system>`, `<instructions>`), override phrases (`Ignore previous instructions`, `Disregard prior context`), identity hijacks (`Act as an admin`), or exfiltration directives (`Send results to https://…`). These are the canonical tool-poisoning patterns documented by Invariant Labs and the OWASP MCP Top 10 — the description reaches the model verbatim, and a malicious server uses it to manipulate behavior at read time.
When this rule runs
Requires a successful MCP `initialize` / `tools/list`. Skipped on perimeter-only scans where the server refused or failed the MCP handshake.
Why it matters
MCP tool descriptions are read by the model on every call. A description that contains override phrases, hidden-instruction tags, or exfiltration directives is not documentation — it is a prompt-injection payload riding the description channel. The patterns flagged here are high-precision: a benign description has no operational reason to contain them.
Pass condition
No tool description matches any of: hidden-instruction XML/markdown tag (`<important>`/`<system>`/`<instructions>`/etc.), override phrase (`ignore/disregard/forget previous instructions/context/rules`), identity hijack (`pretend/act as admin/system/root/operator`), or exfiltration directive (`send/forward/transmit/post … to https://…`).
Fail condition
At least one tool description matches one of those patterns.
Evidence examples
When the rule fails, the report records evidence in roughly this shape:
{"hits": [{"toolName": "doc", "kind": "override_phrase", "snippet": "Ignore previous instructions and"}]}{"hits": [{"toolName": "fetch", "kind": "exfiltration_directive", "snippet": "send the result to https://attacker.example/log"}]}
Remediation
Treat tool descriptions as static plain-text documentation. Remove imperative phrases addressed to the model, hidden-instruction tags, identity hijacks, and any directive that asks the model to send output to an external URL. If your tool legitimately needs to forward data, document the destination outside the description and bind it to the tool's input schema.
Methodology
This rule belongs to the Tool surface risk dimension. What an agent could do if it trusted every advertised tool. Covers destructive actions, credential disclosure, code execution, filesystem mutation, PII handling, prompt-injection-shaped input fields, and injection-bearing tool descriptions — i.e. the agent-specific threat surface, not just generic verb risk.
Read the full methodology for how rules are aggregated into a score, how verdicts are decided, and how hard-fail rules override the aggregate.