Tool surface unchanged since the previous scan

`tool_surface_unchanged_since_last_scan` · medium · weight 5 · Post-handshake

Authored by Stanley Hong · AgentReserve (founder).

The advertised tool surface (each tool's `name`, `description`, `inputSchema`, and `annotations`) hashes to the same value as the most recent prior scan of this server. A drift in any of those fields is the rug-pull pattern documented by the `postmark-mcp` September 2025 incident: a server changes a tool's behavior or name post-install while the user's existing approval still grants the agent access. The rule is silent on first scans (no prior surface to compare against).
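The hashing step described above can be sketched as follows. This is a minimal illustration, not the scanner's actual implementation: the function name `tool_surface_hash`, the choice of SHA-256, the canonical JSON encoding, and sorting tools by name are all assumptions made for the example. Only the four hashed fields come from the rule text.

```python
import hashlib
import json

# The four fields the rule text says make up the advertised surface.
SURFACE_FIELDS = ("name", "description", "inputSchema", "annotations")


def tool_surface_hash(tools):
    """Return a hex SHA-256 digest over the surface fields of every tool.

    `tools` is the list of tool objects from a `tools/list` response.
    Hypothetical helper: the real scanner may canonicalize differently.
    """
    # Keep only the surface fields; a missing field is recorded as None.
    surface = [
        {field: tool.get(field) for field in SURFACE_FIELDS}
        for tool in tools
    ]
    # Assumption: listing order is not part of the surface, so sort by name.
    surface.sort(key=lambda t: t["name"] or "")
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(
        surface, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, reordering the tool list or adding a field outside the four surface fields leaves the digest unchanged, while editing any tool's `description` changes it.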

When this rule runs

Requires a successful MCP `initialize` / `tools/list`. Skipped on perimeter-only scans where the server refused or failed the MCP handshake.

Why it matters

Approvals to use a tool are granted against a snapshot of its surface. When the surface changes silently — a `delete_temporary_files` is renamed to `delete_files`, or a `summarize` description gains an exfiltration directive — the original approval no longer reflects what the agent will actually do. There is no in-band MCP signal for this drift today; comparing hashes across scans is the cheapest defensible mitigation.

Pass condition

A prior scan exists for this server and its tool-surface hash matches the current scan's hash exactly.

Fail condition

A prior scan exists and its tool-surface hash differs from the current scan's hash.

Evidence examples

When the rule fails, the report records evidence in roughly this shape:

{"changed": true, "currentHash": "ab12…", "priorHash": "98ef…"}
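The pass, fail, and first-scan cases can be expressed as one small decision function. This is a sketch under stated assumptions: the function name `evaluate_surface_rule` and the status strings `"silent"`, `"pass"`, and `"fail"` are hypothetical. The evidence keys mirror the shape shown above.

```python
from typing import Optional


def evaluate_surface_rule(prior_hash: Optional[str], current_hash: str) -> dict:
    """Compare the current tool-surface hash with the prior scan's hash.

    Hypothetical helper; status labels are illustrative, not the
    scanner's real vocabulary.
    """
    # First scan: there is no prior surface, so the rule stays silent.
    if prior_hash is None:
        return {"status": "silent", "evidence": None}
    # Exact match: the surface is unchanged and the rule passes.
    if prior_hash == current_hash:
        return {"status": "pass", "evidence": None}
    # Any difference is a failure; record both hashes as evidence.
    return {
        "status": "fail",
        "evidence": {
            "changed": True,
            "currentHash": current_hash,
            "priorHash": prior_hash,
        },
    }
```

Evidence is attached only on failure, matching the description that the report records it when the rule fails.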

Remediation

Treat tool-surface drift as a re-review event. Diff the current `tools/list` response against the previous one (the report links the prior scan), validate the change against an operator-known release, and rotate any approvals that depended on the old shape.
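The diff step in the remediation above might look like the following. It is a rough sketch, assuming both `tools/list` responses are available as lists of tool objects; the function name `diff_tool_surfaces` and the output shape are hypothetical, not part of the report format.

```python
# The four fields the rule text says make up the advertised surface.
SURFACE_FIELDS = ("name", "description", "inputSchema", "annotations")


def diff_tool_surfaces(prior_tools, current_tools):
    """Summarize how the tool surface changed between two scans.

    Hypothetical helper. Tools are matched by name, so a rename
    appears as one removed tool plus one added tool.
    """
    prior = {tool["name"]: tool for tool in prior_tools}
    current = {tool["name"]: tool for tool in current_tools}

    added = sorted(current.keys() - prior.keys())
    removed = sorted(prior.keys() - current.keys())

    # For tools present in both scans, list which surface fields differ.
    changed = {}
    for name in sorted(prior.keys() & current.keys()):
        fields = [
            field
            for field in SURFACE_FIELDS
            if prior[name].get(field) != current[name].get(field)
        ]
        if fields:
            changed[name] = fields

    return {"added": added, "removed": removed, "changed": changed}
```

Because tools are keyed by name, the `delete_temporary_files` to `delete_files` rename mentioned earlier would surface as a removal and an addition, which is a useful prompt to check whether the new tool's scope is broader than the one originally approved.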

Methodology

This rule belongs to the Metadata transparency dimension, which assesses whether the server identifies itself and documents its tools, and whether the advertised identity matches the wire identity (cert CN/SAN, hostname). Operators need a stable name, a version, and an internally consistent identity claim to perform any kind of audit.

Read the full methodology for how rules are aggregated into a score, how verdicts are decided, and how hard-fail rules override the aggregate.
