
Tool annotations are consistent with the surface

`tool_annotations_consistent` · high · weight 6 · Post-handshake

Authored by Stanley Hong · AgentReserve (founder).

For every tool that declares spec-defined annotations (`readOnlyHint`, `destructiveHint`, etc.), the hints do not contradict each other and do not contradict the capability the scanner inferred from the tool's name, description, and schema. Misdeclared annotations are the canonical rug-pull camouflage: a tool that calls itself read-only but deletes records.

When this rule runs

Requires a successful MCP `initialize` / `tools/list`. Skipped on perimeter-only scans where the server refused or failed the MCP handshake.
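
As a rough illustration of the precondition, the sketch below builds the two JSON-RPC requests and decides whether the rule has anything to evaluate. The client name, the protocol revision string, and the helper `tools_for_rule` are assumptions made for this example, not the scanner's actual code.

```python
# Hypothetical client identity and protocol revision; adjust to your scanner.
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-scanner", "version": "0.0.1"},
    },
}
TOOLS_LIST = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}


def tools_for_rule(tools_list_response):
    """Return the tools to evaluate, or None if the rule should be skipped."""
    # No response or a JSON-RPC error means the handshake was refused or
    # failed, so only a perimeter scan is possible and the rule is skipped.
    if tools_list_response is None or "error" in tools_list_response:
        return None
    return tools_list_response.get("result", {}).get("tools", [])
```

Returning `None` rather than an empty list keeps "skipped" distinct from "ran and found no tools".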

Why it matters

Tool annotations exist so a client (or a reviewing operator) can decide whether to allow a call without invoking it. A tool that lies — `readOnlyHint:true` on a `delete_record`, or `destructiveHint:false` on a tool whose name and schema say `purge` — defeats that contract. The MCP spec calls hints advisory, but explicit hints that contradict the surface are misdeclaration, not absence.

Pass condition

No tool combines `readOnlyHint:true` with `destructiveHint:true`; no tool with `readOnlyHint:true` is classified as destructive (delete/write/financial); no tool with `destructiveHint:false` is classified as destructive.
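
The three conditions can be sketched as a single pass over the tool list. This is a minimal sketch under stated assumptions: `classify` is a naive name-only stand-in for the scanner's classifier (which also inspects description and schema), the verb table is invented for illustration, and only the `readonly_but_destructive_capability` kind and the `delete_data` capability come from the evidence example on this page; the other kind and capability names are hypothetical.

```python
# Assumed verb table; the real classifier is not described on this page.
DESTRUCTIVE_VERBS = {
    "delete": "delete_data",
    "purge": "delete_data",
    "write": "write_data",      # hypothetical capability name
    "transfer": "financial",    # hypothetical capability name
}


def classify(tool):
    """Return a capability kind if the tool name looks destructive, else None."""
    name = tool.get("name", "").lower()
    for verb, kind in DESTRUCTIVE_VERBS.items():
        if verb in name:
            return kind
    return None


def find_hits(tools):
    """Collect every annotation contradiction; an empty list means pass."""
    hits = []
    for tool in tools:
        ann = tool.get("annotations") or {}
        read_only = ann.get("readOnlyHint")
        destructive = ann.get("destructiveHint")
        kind = classify(tool)
        # 1. The two hints contradict each other (hypothetical kind name).
        if read_only is True and destructive is True:
            hits.append({
                "toolName": tool["name"],
                "kind": "readonly_and_destructive",
                "annotation": {"readOnlyHint": True, "destructiveHint": True},
            })
        # 2. Declared read-only, but the surface is classified as destructive.
        if read_only is True and kind is not None:
            hits.append({
                "toolName": tool["name"],
                "kind": "readonly_but_destructive_capability",
                "annotation": {"readOnlyHint": True},
                "capabilityKind": kind,
            })
        # 3. Explicitly non-destructive, but classified as destructive
        #    (hypothetical kind name).
        if destructive is False and kind is not None:
            hits.append({
                "toolName": tool["name"],
                "kind": "nondestructive_but_destructive_capability",
                "annotation": {"destructiveHint": False},
                "capabilityKind": kind,
            })
    return hits
```

A tool with no annotations at all produces no hits here, which matches the distinction between an absent hint and a misdeclared one.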

Fail condition

At least one tool's annotations contradict each other or contradict the scanner's classification of the tool's surface.

Evidence examples

When the rule fails, the report records evidence in roughly this shape:

{"hits": [{"toolName": "delete_record", "kind": "readonly_but_destructive_capability", "annotation": {"readOnlyHint": true}, "capabilityKind": "delete_data"}]}

Remediation

Make tool annotations match the tool's actual surface. If a tool declares `readOnlyHint:true`, its name and schema must not contain a destructive verb. If a tool can delete, set `destructiveHint:true` and remove any `readOnlyHint:true` claim.
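
For a tool that really does delete, the fix amounts to a small change in its declaration. The sketch below uses a hypothetical helper, `remediate`, and the `delete_record` tool from the evidence example; how your server actually registers annotations depends on the SDK you use.

```python
# Misdeclared tool entry, as it might appear in a tools/list result.
before = {"name": "delete_record", "annotations": {"readOnlyHint": True}}


def remediate(tool):
    """Return a copy whose annotations declare the tool as destructive."""
    ann = dict(tool.get("annotations") or {})
    ann.pop("readOnlyHint", None)   # drop the false read-only claim
    ann["destructiveHint"] = True   # state the destructive capability
    return {**tool, "annotations": ann}


after = remediate(before)
```

The helper copies the annotations rather than mutating them, so the original declaration stays available for comparison.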

Methodology

This rule belongs to the Tool surface risk dimension, which measures what an agent could do if it trusted every advertised tool. The dimension covers destructive actions, credential disclosure, code execution, filesystem mutation, PII handling, prompt-injection-shaped input fields, and injection-bearing tool descriptions: the agent-specific threat surface, not just generic verb risk.

Read the full methodology for how rules are aggregated into a score, how verdicts are decided, and how hard-fail rules override the aggregate.
