
Destructive tools positively declare destructiveHint

destructive_tools_declare_destructive_hint · medium · weight 4 · Post-handshake

Authored by Stanley Hong · AgentReserve (founder).

Every tool the capability classifier flags as destructive (delete/write/financial verb in name, description, or schema) sets `annotations.destructiveHint: true`. Companion rule to `tool_annotations_consistent` — that rule fires only on contradictions (`destructiveHint: false` on a destructive tool). This rule fires on silence: a destructive-classified tool whose annotations omit `destructiveHint` entirely. Silence is exactly the camouflage pattern an operator needs flagged.

When this rule runs

Requires a successful MCP `initialize` / `tools/list`. Skipped on perimeter-only scans where the server refused or failed the MCP handshake.

Why it matters

MCP annotations exist so a client can decide whether to allow a call without invoking it. A tool that the classifier marks as destructive but that advertises no `destructiveHint` leaves the trust contract one-sided: the client must either trust the verb in the name (which the operator can rename next week) or invoke the tool to find out, and neither option is acceptable. A positive `destructiveHint: true` is the spec-compliant way for the server to confirm what the surface implies.

Pass condition

Every destructive-classified tool sets `annotations.destructiveHint: true`, or is already covered by an explicit failure of the consistency rule (`tool_annotations_consistent`).

Fail condition

At least one destructive-classified tool advertises annotations without `destructiveHint`, or advertises no annotations at all.
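
The pass and fail conditions can be sketched as a small check over a `tools/list` result. This is a minimal illustration, not the scanner's actual implementation: `findSilentDestructiveTools`, the `Classifier` type, and `toyClassifier` are hypothetical names, and the regex is a stand-in for the real capability classifier. Only the `toolName` and `capabilityKind` fields and the `delete_data` value come from the evidence shape in this rule.

```typescript
// Minimal shape of a tool entry as returned by MCP `tools/list`.
interface ToolAnnotations {
  destructiveHint?: boolean;
  readOnlyHint?: boolean;
}

interface Tool {
  name: string;
  description?: string;
  annotations?: ToolAnnotations;
}

interface Hit {
  toolName: string;
  capabilityKind: string;
}

// Hypothetical classifier contract: returns a capability kind for
// destructive-classified tools, or null for everything else.
type Classifier = (tool: Tool) => string | null;

function findSilentDestructiveTools(
  tools: Tool[],
  classify: Classifier,
): { hits: Hit[] } {
  const hits: Hit[] = [];
  for (const tool of tools) {
    const capabilityKind = classify(tool);
    if (capabilityKind === null) continue; // not destructive-classified

    // Silence: no annotations at all, or annotations without destructiveHint.
    // An explicit `false` is a contradiction, left to tool_annotations_consistent.
    if (tool.annotations?.destructiveHint === undefined) {
      hits.push({ toolName: tool.name, capabilityKind });
    }
  }
  return { hits };
}

// Toy classifier (assumption): matches a few delete-style verbs in the name only.
const toyClassifier: Classifier = (tool) =>
  /^(delete|drop|purge)_/.test(tool.name) ? "delete_data" : null;

const report = findSilentDestructiveTools(
  [
    { name: "delete_record" }, // no annotations: hit
    { name: "purge_cache", annotations: { readOnlyHint: false } }, // hint omitted: hit
    { name: "drop_table", annotations: { destructiveHint: true } }, // declared: pass
    { name: "delete_user", annotations: { destructiveHint: false } }, // contradiction: other rule
    { name: "list_records" }, // not destructive
  ],
  toyClassifier,
);
```

In this sketch the rule fails whenever `report.hits` is non-empty. Here it would contain `delete_record` and `purge_cache`, while `delete_user` is deliberately left to the consistency rule.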

Evidence examples

When the rule fails, the report records evidence in roughly this shape:

{"hits": [{"toolName": "delete_record", "capabilityKind": "delete_data"}]}

Remediation

Set `annotations: { destructiveHint: true }` on every tool whose verb (delete/drop/purge/wipe/truncate/write/transfer/charge/refund/...) implies state mutation. Treat the annotation as a positive contract, not a default.
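
As an illustration of the remediation, a tool definition that declares the hint might look like the sketch below. The field names (`name`, `description`, `inputSchema`, `annotations`) follow the MCP tool definition; the `ToolDefinition` interface, the description text, and the `id` parameter are assumptions made up for this example.

```typescript
// Simplified, hypothetical typing of an MCP tool definition for this example.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
  annotations?: { destructiveHint?: boolean; readOnlyHint?: boolean };
}

const deleteRecordTool: ToolDefinition = {
  name: "delete_record",
  description: "Permanently delete a record by ID.", // illustrative text
  inputSchema: {
    type: "object",
    properties: { id: { type: "string" } }, // illustrative parameter
    required: ["id"],
  },
  // Positive contract: the server states outright that this call mutates state.
  annotations: { destructiveHint: true, readOnlyHint: false },
};

// Serialized form, as the entry would appear inside a tools/list result.
const serialized = JSON.stringify(deleteRecordTool);
```

Stating the hint explicitly, rather than relying on a client's default, is what keeps the tool out of this rule's evidence list.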

Methodology

This rule belongs to the Tool surface risk dimension, which measures what an agent could do if it trusted every advertised tool. The dimension covers destructive actions, credential disclosure, code execution, filesystem mutation, PII handling, prompt-injection-shaped input fields, and injection-bearing tool descriptions: the agent-specific threat surface, not just generic verb risk.

Read the full methodology for how rules are aggregated into a score, how verdicts are decided, and how hard-fail rules override the aggregate.
