How we score MCP servers.
Passive observation. Public rules. Deterministic math. The score reflects the surface area an agent inherits by trusting a server — not the probability that it’s malicious.
Current methodology: v2026-05-06-rebalance-and-false-positive-disclosure
Two questions, two side-probes.
We ask every server two questions, then leave.
- Who are you? The server’s name, version, and what it claims to support. (initialize)
- What tools do you offer? The menu, with descriptions and input schemas. (tools/list)
We read the menu. We never order. Pointing AgentReserve at a server is as safe as visiting its homepage — no emails sent, no rows written, no APIs charged.
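As a rough illustration, the two questions map onto two JSON-RPC 2.0 request bodies. This is a minimal sketch, not AgentReserve's actual client: the helper names, the `clientInfo` values, and the choice of protocol version string are assumptions made for the example; only the method names `initialize` and `tools/list` come from the text above.

```typescript
// Sketch of the two JSON-RPC 2.0 request bodies a passive probe would send.
// buildInitialize / buildToolsList and the clientInfo values are hypothetical.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function buildInitialize(id: number, protocolVersion: string): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion,
      capabilities: {}, // the probe advertises no client capabilities
      clientInfo: { name: "example-probe", version: "0.0.0" }, // placeholder identity
    },
  };
}

function buildToolsList(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// The whole request vocabulary of this sketch: there is no builder for tools/call.
const probeRequests: JsonRpcRequest[] = [
  buildInitialize(1, "2025-11-25"),
  buildToolsList(2),
];
```

The MCP lifecycle also has a client send a `notifications/initialized` notification once the initialize response arrives; it is left out here because it is not one of the two questions.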
In parallel, we run two passive side-probes that never call MCP methods.
- Transport observation. A plain GET captures TLS state (version, cert validity, days to expiry), HSTS, status, Content-Type, the final URL, and the redirect chain that got there.
- Auth discovery. On a 401/403 we follow RFC 9728 protected-resource metadata, then RFC 8414 / OpenID Connect Discovery for each named authorization server. Status and Content-Type are recorded for every well-known fetch.
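The auth-discovery chain boils down to fetching a handful of well-known URLs. The sketch below derives them from a resource URL and an issuer URL using the path rules in RFC 9728, RFC 8414, and OpenID Connect Discovery. The function names are hypothetical, query strings are ignored, and a real probe would prefer any `resource_metadata` hint carried in the `WWW-Authenticate` challenge.

```typescript
// Sketch: derive the well-known metadata URLs an auth-discovery probe would fetch.
// All function names here are illustrative, not the scanner's own.
function stripTrailingSlash(path: string): string {
  return path.endsWith("/") ? path.slice(0, -1) : path;
}

// RFC 9728 and RFC 8414 insert the well-known segment between host and path.
function insertWellKnown(base: string, suffix: string): string {
  const u = new URL(base);
  return `${u.origin}/.well-known/${suffix}${stripTrailingSlash(u.pathname)}`;
}

function protectedResourceMetadataUrl(resource: string): string {
  return insertWellKnown(resource, "oauth-protected-resource");
}

function authorizationServerMetadataUrls(issuer: string): string[] {
  const u = new URL(issuer);
  return [
    insertWellKnown(issuer, "oauth-authorization-server"),
    // OpenID Connect Discovery appends the segment to the issuer instead.
    `${u.origin}${stripTrailingSlash(u.pathname)}/.well-known/openid-configuration`,
  ];
}
```

For example, `protectedResourceMetadataUrl("https://mcp.example.com/mcp")` yields `https://mcp.example.com/.well-known/oauth-protected-resource/mcp`.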
We never call tools/call. There is no code path in the scanner that accepts a tool name and arguments.
Bounded by construction.
- 10s: Hard timeout per probe
- 1 MB: Max response size
- 3: Max redirect hops
- https / http: Only schemes accepted
- SSRF guard: Private, link-local, CGNAT, multicast rejected
- Redactor: Bearer tokens & JWTs stripped before persistence
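To make the SSRF guard concrete, here is a minimal IPv4-only sketch that rejects the four address classes listed above. `parseIPv4` and `isBlockedIPv4` are hypothetical helpers; an actual guard would also need IPv6 handling and would have to check the address a hostname resolves to, not only literal IPs.

```typescript
// Sketch of an IPv4-only SSRF guard for the classes named in the list above.
type Octets = [number, number, number, number];

function parseIPv4(ip: string): Octets | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (!octets.every((o) => o >= 0 && o <= 255)) return null;
  return octets as Octets;
}

function isBlockedIPv4(ip: string): boolean {
  const parsed = parseIPv4(ip);
  if (parsed === null) return true; // fail closed on anything unparseable
  const [a, b] = parsed;
  if (a === 10) return true; // private 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // private 172.16.0.0/12
  if (a === 192 && b === 168) return true; // private 192.168.0.0/16
  if (a === 169 && b === 254) return true; // link-local 169.254.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT 100.64.0.0/10
  if (a >= 224 && a <= 239) return true; // multicast 224.0.0.0/4
  return false;
}
```

Failing closed on unparseable input is a design choice of this sketch: a guard that cannot classify an address should not let the probe proceed.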
Wire to surface, in eight categories.
Every rule in the rule catalog belongs to exactly one of the eight categories below: wire, URL, handshake, tool surface, schema discipline, documentation, exposed breadth, and auth discovery.
- 01
Transport security
How the server is reached on the wire. Covers TLS and protocol-level confidentiality of probe traffic.
- 02
Endpoint hygiene
Properties of the URL itself: whether the host is intended for public use, whether secrets appear in the URL, and other observable URL-level signals.
- 03
MCP discovery posture
Whether the server cooperates with the MCP handshake — protocol version negotiation, capability flags, and other discovery signals clients depend on.
- 04
Tool surface risk
What an agent could do if it trusted every advertised tool. Covers destructive actions, credential disclosure, code execution, filesystem mutation, PII handling, prompt-injection-shaped input fields, and injection-bearing tool descriptions — i.e. the agent-specific threat surface, not just generic verb risk.
- 05
Schema quality
Whether the tool surface is reviewable without invoking it. Tools without input schemas force agents to guess argument shapes; tool names that aren't plain ASCII identifiers confuse logging and allow-listing.
- 06
Metadata transparency
Whether the server identifies itself and documents its tools — and whether the advertised identity matches the wire identity (cert CN/SAN, hostname). Operators need a stable name, a version, and an internally consistent identity claim to perform any kind of audit.
- 07
Exposure minimization
Whether the server keeps its surface small. Large, sprawling tool sets expand the agent's blast radius and are harder to review.
- 08
Auth discovery posture
When authorization is required, whether the server cooperates with the standards-based discovery chain — RFC 9728 protected resource metadata, RFC 8414 authorization server metadata, validated issuers, and safe grant types.
Pass / fail. Weighted. Rounded.
- A: ≥ 90
- B: ≥ 80
- C: ≥ 70
- D: ≥ 60
- F: < 60
A rule either passes or fails — no partial credit. Rule changes ship as new rule ids, so historical scores never change retroactively without an explicit re-scan.
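The page does not spell out the aggregation formula, so the following is only a sketch under one plausible reading of "pass / fail, weighted, rounded": the score is the passed weight divided by the applicable weight, times 100, rounded to the nearest integer. The type names, `computeScore`, `gradeFor`, and the zero-weight fallback are assumptions; the grade thresholds are the ones listed above.

```typescript
// Sketch of weighted pass/fail scoring. The formula is an assumption, not the
// published one; only the grade bands are taken from the page.
type RuleOutcome = "pass" | "fail" | "not-applicable";

interface RuleResult {
  id: string;
  weight: number;
  outcome: RuleOutcome;
}

function computeScore(results: RuleResult[]): number {
  const scored = results.filter((r) => r.outcome !== "not-applicable");
  const total = scored.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return 0; // assumption: nothing applicable scores zero
  const passed = scored
    .filter((r) => r.outcome === "pass")
    .reduce((sum, r) => sum + r.weight, 0);
  return Math.round((100 * passed) / total);
}

function gradeFor(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

Under this reading, a scan that passes rules weighing 8 and 3 and fails one weighing 6 scores round(100 × 11 / 17) = 65, which is a D.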
Deterministic from rules and score.
- block: Any CRITICAL hard-fail rule failed.
- review: Score < 80, or any HIGH (or non-hard-fail CRITICAL) rule failed.
- allow: Score ≥ 80 and no HIGH or CRITICAL rule failed.
- unknown: Coverage was none/minimal and no useful tools or initialize metadata surfaced.
When coverage is low, verdicts skew conservative. A scan with thin signal resolves to review or unknown rather than allow, because there isn’t enough evidence to allow.
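The verdict rules above can be sketched as a small decision function. The type and field names are hypothetical, and two points are assumptions rather than documented behavior: the order in which the checks run (block is tested first), and treating only `none` and `minimal` as "low" coverage.

```typescript
// Sketch of the verdict decision. Check ordering is an assumption.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFO";
type CoverageLevel = "full" | "partial" | "minimal" | "none";
type Verdict = "block" | "review" | "allow" | "unknown";

interface FailedRule {
  severity: Severity;
  hardFail: boolean;
}

interface ScanSummary {
  score: number;
  failedRules: FailedRule[];
  coverage: CoverageLevel;
  hasUsefulSignal: boolean; // tools or initialize metadata surfaced
}

function decideVerdict(s: ScanSummary): Verdict {
  // Any failed CRITICAL hard-fail rule overrides the aggregate.
  if (s.failedRules.some((r) => r.severity === "CRITICAL" && r.hardFail)) return "block";
  // No usable signal always resolves to unknown.
  if (s.coverage === "none") return "unknown";
  const thin = s.coverage === "minimal";
  if (thin && !s.hasUsefulSignal) return "unknown";
  const serious = s.failedRules.some(
    (r) => r.severity === "HIGH" || r.severity === "CRITICAL",
  );
  // Thin coverage never resolves to allow, whatever the score.
  if (s.score < 80 || serious || thin) return "review";
  return "allow";
}
```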
Tier and level travel with every score.
Most production MCP servers are auth-protected. Refusing to score them would penalize operators for doing security right. Every report carries two coverage axes — a tier (how authenticated the probe was) and a level (how much MCP signal it observed). A grade letter without both is meaningless.
- perimeter
Handshake didn't complete, but the endpoint emitted MCP-shaped evidence (Bearer challenge, RFC 9728 / 8414 metadata). Capped at 87 / B; verdict floored at review.
- public_handshake
Both probe questions answered without credentials. Standard scoring, no cap.
- authenticated
Probe completed under DCR-assisted or operator-supplied credentials. Standard scoring, no cap.
- full
Protocol version + serverInfo.name returned; tools carry schemas and descriptions.
- partial
Some signals retrieved, but parts of the surface were missing.
- minimal
Very little protocol surface visible.
- none
No usable signal. Always forces an unknown verdict.
Proof-of-MCP gate. Before publishing at any tier, the scanner must observe at least one of: a parseable initialize result; a Bearer WWW-Authenticate challenge; RFC 9728 protected-resource metadata; or RFC 8414 / OpenID authorization-server metadata. A bare 500-returning route is not an MCP server.
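A short sketch of how the gate and the perimeter cap might compose. The interface, field names, and function names are hypothetical; the four evidence types, the cap of 87, and the review floor are taken from the text above.

```typescript
// Sketch: proof-of-MCP gate plus the perimeter-tier cap and verdict floor.
type Tier = "perimeter" | "public_handshake" | "authenticated";
type Verdict = "block" | "review" | "allow" | "unknown";

interface ProofEvidence {
  parsedInitializeResult: boolean;
  bearerChallenge: boolean;
  protectedResourceMetadata: boolean; // RFC 9728
  authorizationServerMetadata: boolean; // RFC 8414 / OpenID
}

// At least one piece of MCP-shaped evidence is needed before publishing.
function passesProofOfMcpGate(e: ProofEvidence): boolean {
  return (
    e.parsedInitializeResult ||
    e.bearerChallenge ||
    e.protectedResourceMetadata ||
    e.authorizationServerMetadata
  );
}

const PERIMETER_SCORE_CAP = 87;

function applyTierCap(tier: Tier, rawScore: number): number {
  return tier === "perimeter" ? Math.min(rawScore, PERIMETER_SCORE_CAP) : rawScore;
}

// "Floored at review": a perimeter scan can never resolve to allow.
function applyTierFloor(tier: Tier, verdict: Verdict): Verdict {
  return tier === "perimeter" && verdict === "allow" ? "review" : verdict;
}
```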
Some failures override the aggregate.
A hard-fail rule at CRITICAL severity forces the verdict to block, regardless of how many other rules passed. They cover capabilities that have no business in a public tools/list — arbitrary code execution, credential disclosure, uncontrolled filesystem mutation, admin control, and identification as a known-vulnerable build.
- no_public_credential_access_tools: No credential-access tools in the public surface
- no_public_code_execution_tools: No code-execution tools in the public surface
- no_public_filesystem_write_tools: No filesystem-write tools in the public surface
- no_public_admin_control_tools: No admin-control tools in the public surface
- serverInfo_not_in_known_advisory_list: Server identity does not match a known security advisory
Three depths. All passive.
- 01
Public passive scan
The default. No credentials.
Probes the transport, walks the OAuth metadata chain (RFC 9728 → RFC 8414 / OpenID), and calls initialize + tools/list anonymously when the endpoint is open.
- 02
Limited public scan
When auth is required.
On 401 / 403, the report carries no score. It surfaces TLS state, the auth challenge, parsed OAuth metadata, and a follow-up: try DCR if the AS supports it. Auth-required is a missing measurement, not a verdict.
- 03
DCR-assisted authenticated scan
User-triggered only.
When DCR was discovered passively and the AS advertises client_credentials, AgentReserve attempts RFC 7591 registration, requests a short-lived token, and repeats initialize + tools/list authenticated. Token and any client secret live in memory for one scan, then drop.
DCR-assisted reports are marked Authenticated passive scan via DCR; user-supplied API key reports are marked Authenticated passive scan via API key. Treat them as higher-coverage but still passive — AgentReserve still does not invoke tools/call, and never will. AgentReserve never stores, persists, or logs a user-supplied bearer token or API key.
By name and description, never by call.
Verb heuristics produce the coarse class. Destructive verbs (delete, drop, transfer, send, …) outrank write verbs (create, update, …) which outrank read verbs (list, get, search, …).
Independently, sensitive-capability detectors flag tools whose name or description references credentials, code execution, filesystem mutation, or PII. These flags drive the hard-fail rules and appear on each tool row even when the verb classification looks benign.
Heuristics are imperfect. The classification is a starting point for review, not a verdict.
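The verb precedence can be illustrated with a small name-based classifier. This is a sketch, not the scanner's classifier: it uses only the verbs named above (the real lists are longer, as the ellipses indicate), looks at tool names only, and the `unknown` fallback and tokenization rules are assumptions.

```typescript
// Sketch of verb-precedence classification: destructive > write > read.
type VerbClass = "destructive" | "write" | "read" | "unknown";

// Ordered from highest to lowest precedence; verb lists are deliberately partial.
const VERB_CLASSES: Array<[VerbClass, string[]]> = [
  ["destructive", ["delete", "drop", "transfer", "send"]],
  ["write", ["create", "update"]],
  ["read", ["list", "get", "search"]],
];

function tokenize(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase boundaries
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

function classifyTool(name: string): VerbClass {
  const tokens = tokenize(name);
  for (const [cls, verbs] of VERB_CLASSES) {
    if (tokens.some((t) => verbs.includes(t))) return cls;
  }
  return "unknown";
}
```

Because the classes are checked in precedence order, a name such as `get_and_delete_record` is classified as destructive even though it also contains a read verb.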
What this score is not.
Trust model
- 01
The server controls every input we score.
initialize metadata, tool names, descriptions, and schemas are attacker-controllable. A clean rubric can be satisfied without changing actual behavior.
- 02
We can't detect cloaking.
A server is free to advertise one tools/list to AgentReserve and a different one to authenticated agents. Surface divergence by client identity is invisible to a passive scan.
- 03
It's a point-in-time snapshot.
Tool surface, descriptions, and serverInfo can change between the scan and your agent's next call. A score is not a continuous attestation.
- 04
Operator identity is not verified.
No domain-ownership check, brand/typosquat detection, signed-publisher attestation, or supply-chain provenance.
What we don't test
- 01
Tool behavior vs. tool description.
Classification is heuristics on names and descriptions. A list_users tool can do anything internally. Instruction-poisoning hidden in descriptions is not deeply analyzed.
- 02
Authentication strength.
We observe that auth exists (Bearer challenge, RFC 9728 / 8414 metadata). We do not exercise token validation, scope and audience enforcement, JWT signing algorithm, expiry, replay, or DCR-endpoint hardening.
- 03
Schema enforcement.
Presence of an inputSchema does not mean the server rejects violating input. We never send the request that would prove it.
- 04
Resources, prompts, and sampling.
Methodology enumerates the tool surface only. MCP also exposes resources, prompts, and sampling — none are scored today.
- 05
Tool outputs and indirect prompt injection.
Because we never call tools/call, injection that lives in tool responses — currently the most common MCP attack class — is outside the scoring envelope.
- 06
Runtime, hosting, and dependencies.
No CVE check, SBOM, container provenance, or infrastructure signal. The score says nothing about how the server is built or run.
- 07
Single vantage point.
One geography, one network, one moment. Geo-fenced auth, GeoDNS, or transient upstream flakiness can produce a different result elsewhere.
Methodology versions, in order.
Each entry below maps to a METHODOLOGY_VERSION stamped into report coverage JSON. Older scores keep their original version, so a re-scan is required for them to reflect the current rules.
v2026-05-06-rebalance-and-false-positive-disclosure (current)
Tool-surface weight rebalance + hard-fail false-positive disclosure
Closes the consistency gaps surfaced in the staff-security-engineer review of the rule catalog. (1) no_public_send_message_tools and no_public_network_access_tools rebalanced from HIGH/4 to HIGH/6, matching peer HIGH-severity tool-surface rules (no_public_destructive_tools/8, no_public_financial_action_tools/6); both rules describe blast radii on par with their peers and the 4-vs-6/8 gap was unjustified. (2) no_pii_keywords_in_tool_surface dropped from MEDIUM/5 to LOW/3 with reworded copy that calls out its governance-prompt-not-vuln intent; legitimate CRM/HR/support servers were being penalised for handling PII by design. (3) all_tools_have_descriptions dropped from LOW/4 to LOW/3, normalising it against peer LOW-severity rules (every other LOW is weight 2-3). (4) The four hard-fail capability rules (no_public_credential_access_tools, no_public_code_execution_tools, no_public_filesystem_write_tools, no_public_admin_control_tools) gain explicit false-positive disclosure in their description: each is a keyword scan over name/description/schema, so benignly-named tools (`validate_api_key_format`, sandboxed `eval_expression`, content-store `upload_file`, status `get_subscription_grant`) will trip the rule; the hard-fail forces a conscious operator override rather than silent ranking. No rule IDs added or removed; no probe behaviour change. Server-identity-consistency rule wording was clarified separately on the same day (algorithm spelled out, multi-tenant CDN false-positive case acknowledged) without affecting weight or severity.
v2026-05-03-final-active-provenance-feedback
Final hardening: opt-in active probe + manifest provenance + feedback loop
Closes the three remaining items from the security review. (1) active_probe_read_tools_honor_their_schema (MEDIUM/4, handshake): opt-in active probe lane that calls tools/call against READ-classified tools with strictly-typed arguments synthesized from each tool's own inputSchema. Doubly-gated by the env var AGENTRESERVE_ENABLE_DEEP_SCAN AND the per-request flag deepScan; either alone is a no-op (T3.13). The carve-out lives in src/lib/mcp/active-transport.ts so the existing no-tools/call grep test stays a pure invariant on the default code path. SECURITY_INVARIANTS.md gains I9 documenting the gating, hard scope (3 calls/scan, READ-only, primitive args, sensitive-name skip), and tests. (2) serverInfo_advertises_current_published_version (MEDIUM/4, handshake): manifest provenance lookup against an in-tree allow-list of known-published servers (initially the @modelcontextprotocol/* namespace). Routes call lookupManifestProvenance which fetches the registry's `latest` dist-tag through safeFetch with a 1-hour LRU cache. Stays not-applicable when the server is outside the allow-list, so private deployments aren't penalised (T2.6). (3) RuleResultFeedback table + POST /api/v1/feedback/rule-result/[ruleResultId] endpoint (T3.15): operators can submit false-positive / false-negative / confirmed feedback per rule result. Append-only; the future calibration job consumes this data. UI wiring deferred — operators can submit via API today. With these three the original tier-2 / tier-3 review is fully closed; the remaining caveats are operational (sandboxing, recalibration job, full external advisory feed) and live on their own tickets.
v2026-05-03-tier-final-resource-content-correlation-advisories
Tier-final hardening: resource content sampling + cross-server correlation + advisory list
Adds three rules and one new MCP method to the probe surface. (1) resource_contents_no_injection (MEDIUM/5, handshake): the probe now calls resources/read on up to 3 https resources (8 KB each, post-redact) and scans the body for the same dangerous-markdown / HTML / non-https URL patterns we already apply to descriptions. Closes the gap that a server can ship clean metadata while the body carries an injection payload. The new method is gated by load-bearing caps documented in SECURITY_INVARIANTS.md I8 — it does NOT widen the no-tools/call invariant (I1). (2) tool_descriptions_unique_across_servers (MEDIUM/4, handshake): per-tool description fingerprints are persisted in a new ToolDescriptionFingerprint table; at score time the API route asks the persistence layer 'does any other server advertise this hash?' and fails the rule when so. Catches both lazy plagiarism and coordinated-campaign rug-pulls. Schema migration ships in this commit (forward-only). (3) serverInfo_not_in_known_advisory_list (CRITICAL/12, handshake, hardFail): tier-3 / T2.7 / T3.14 lite — an in-tree advisory catalog (initially empty; entries land via PR with a public reference) hard-fails the verdict when serverInfo.name+version matches a documented vulnerable build. Companion to the unrolled external-feed integration (still deferred). Genuinely deferred items (T3.13 active fuzzing, T3.15 calibration loop, T2.6 manifest provenance via npm/PyPI calls) all require a separate change window: respectively a sandbox + UI consent flow that breaks the no-tools/call invariant, an operational feedback API, and outbound calls to third-party registries. The remaining catalog will document them as deferred rather than fail to ship them silently.
v2026-05-03-tier3-positive-declarations
Tier-3 hardening: positive destructive-hint declaration + duplicate-description detector
Adds two rules. (1) destructive_tools_declare_destructive_hint (MEDIUM/4, handshake): destructive-classified tools (delete/write/financial verb) MUST set annotations.destructiveHint:true. Companion to tool_annotations_consistent — that rule fires only on contradictions, this one fires on silence (no annotation at all on a destructive tool). The under-declared case is the canonical camouflage pattern. (2) tool_surface_has_no_duplicate_descriptions (LOW/2, handshake): within the advertised tools/list, no two tools share a byte-identical non-empty description. Catches both lazy autogenerated catalogs and the camouflage where a malicious tool inherits a benign sibling's description verbatim. Other tier-3 items (active fuzzing in a sandboxed lane, external threat-intel feeds, cross-server DB correlation, calibration loop) are deliberately deferred — each requires either a load-bearing-invariant change (the no-tools/call / no-resources/read type contract in transport.ts), a database schema change, or external service integration. Each belongs in its own change window with explicit owner sign-off, not a quiet edit in this batch.
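Both rules in this entry reduce to simple checks over the advertised tool list, sketched below. The `Tool` shape and function names are hypothetical, and the destructive classifier is passed in as a callback because its details are not part of this entry.

```typescript
// Sketch of the two tier-3 checks: silent destructive tools and duplicate descriptions.
interface Tool {
  name: string;
  description?: string;
  annotations?: { destructiveHint?: boolean; readOnlyHint?: boolean };
}

// Names of destructive-classified tools that do not declare destructiveHint: true.
function missingDestructiveHint(
  tools: Tool[],
  isDestructive: (tool: Tool) => boolean,
): string[] {
  return tools
    .filter((t) => isDestructive(t) && t.annotations?.destructiveHint !== true)
    .map((t) => t.name);
}

// Groups of tool names that share an identical, non-empty description.
function findDuplicateDescriptions(tools: Tool[]): string[][] {
  const byDescription = new Map<string, string[]>();
  for (const t of tools) {
    const description = t.description ?? "";
    if (description.length === 0) continue; // empty descriptions are not compared
    const names = byDescription.get(description) ?? [];
    names.push(t.name);
    byDescription.set(description, names);
  }
  return Array.from(byDescription.values()).filter((names) => names.length > 1);
}
```

Descriptions are compared with plain string equality and no normalization, so differences in whitespace or case count as different descriptions.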
v2026-05-03-tier2-pagination-and-anomaly
Tier-2 hardening: pagination walking + size/time anomaly rules
Adds three rules and one probe behavior. (1) probe_walked_full_tool_surface (MEDIUM/4, handshake): the probe driver now follows `nextCursor` on tools/list, resources/list, and prompts/list with hard caps (5 pages / 500 items). The rule fails when the cap stops the walk before the server signals the final page — the surface this report scored is incomplete and the verdict must be re-validated against the full server. Closes the most-cited tier-2 evasion: hiding destructive tools beyond page 1. (2) tool_descriptions_within_size_bound (LOW/2, handshake): each tool description ≤ 4096 UTF-8 bytes. Caps the per-call token cost an MCP description can drive up and forecloses context-window-eviction by oversized descriptions. (3) probe_completed_within_time_bound (INFO/1, handshake): aggregate probe wall-clock under 15 s. Surfaces tarpit-style slow-roll evasion using existing telemetry. ProbeOutcome gains an optional `pagination` summary; ToolsListResult / ResourcesListResult / PromptsListResult gain optional `nextCursor`. Tier-2 items requiring external feeds (manifest/CVE matching) and additional upstream calls (resources/read content sampling) are deliberately deferred — they change the probe's trust model and ship behind a different change window.
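The capped pagination walk might look roughly like the sketch below. `walkPages`, `Page`, and `WalkResult` are hypothetical names, and the page fetcher is injected so the example stays free of network code; the caps of 5 pages and 500 items come from this entry. A `complete: false` result is the condition under which probe_walked_full_tool_surface would fail.

```typescript
// Sketch of a cursor walk with hard caps on pages and items.
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

interface WalkResult<T> {
  items: T[];
  pagesFetched: number;
  complete: boolean; // false when a cap stopped the walk early
}

const MAX_PAGES = 5;
const MAX_ITEMS = 500;

async function walkPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): Promise<WalkResult<T>> {
  const items: T[] = [];
  let cursor: string | undefined;
  let pagesFetched = 0;
  while (pagesFetched < MAX_PAGES && items.length < MAX_ITEMS) {
    const page = await fetchPage(cursor);
    pagesFetched += 1;
    const room = MAX_ITEMS - items.length;
    items.push(...page.items.slice(0, room));
    const truncated = page.items.length > room;
    if (page.nextCursor === undefined) {
      // The server signalled the final page; complete unless items were cut.
      return { items, pagesFetched, complete: !truncated };
    }
    cursor = page.nextCursor;
  }
  return { items, pagesFetched, complete: false };
}
```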
v2026-05-03-tier1-hardening
Tier-1 hardening: PRM scoring, schema integrity, resource SSRF, tool diff, partial-success
Adds five rules and changes one probe behavior. (1) auth_advertises_protected_resource_metadata (MEDIUM/4, auth-required, perimeter-eligible): scores RFC 9728 PRM advertisement (resource_metadata= hint or fetched well-known doc) — closes the documented-but-unscored MCP Authorization §discovery requirement. (2) tool_input_schemas_well_formed (MEDIUM/4, handshake): flags inputSchemas that are oversized (>64 KB), too deeply nested (>32), contain non-local or unresolvable $ref, or contain $ref cycles. (3) no_dangerous_resource_uris (HIGH/6, handshake): flags resources/list URIs using file://, plaintext http://, cloud-metadata hosts, loopback / RFC 1918 / link-local / IPv6 ULA / private-TLD addresses, or uncommon schemes — the MCP analogue of SSRF. (4) no_new_tools_since_last_scan (MEDIUM/4, handshake): names tool names added since the prior scan (capability creep). (5) tool_descriptions_unchanged_since_last_scan (MEDIUM/4, handshake): names tools whose description was rewritten between scans (rug-pull camouflage). The probe driver now absorbs tools/list failures: a successful initialize followed by a failing tools/list returns a ProbeOutcome with toolsListError set, instead of collapsing the whole probe into a perimeter-tier failure that lost initialize-derived signal. mcp_tools_list_succeeded fails on toolsListError; tool-surface rules become not-applicable on the empty surface.
v2026-04-30-rugpull-detection
Rug-pull detection (tool-surface hash diff)
Adds tool_surface_unchanged_since_last_scan (MEDIUM/5, handshake): hashes the canonicalized tool surface (each tool's name, description, inputSchema, annotations) and compares against the most recent prior scan for the same server. Silent on first scans. Closes the rug-pull pattern documented by the postmark-mcp Sept 2025 incident. Wired into the public web and v1 API scan routes via the new getPreviousToolSurface helper.
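One way to picture the hash diff: serialize the four listed fields deterministically, digest the result, and compare with the stored digest. The sketch below is illustrative only. The canonical form (sorted object keys, tools sorted by name), the choice of SHA-256, and every function name are assumptions, since the entry does not specify them.

```typescript
import { createHash } from "node:crypto";

// Sketch of a tool-surface fingerprint; canonicalization details are assumed.
interface ToolEntry {
  name: string;
  description?: string;
  inputSchema?: unknown;
  annotations?: unknown;
}

// Serialize with object keys sorted at every depth so key order cannot change the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .filter((k) => obj[k] !== undefined) // absent and undefined hash the same
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function hashToolSurface(tools: ToolEntry[]): string {
  const picked = tools
    .slice()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.inputSchema,
      annotations: t.annotations,
    }));
  return createHash("sha256").update(canonicalize(picked)).digest("hex");
}

// Silent on first scans: with no prior hash there is nothing to compare.
function surfaceChanged(previousHash: string | null, tools: ToolEntry[]): boolean {
  return previousHash !== null && previousHash !== hashToolSurface(tools);
}
```

Sorting tools by name means a server that merely reorders its list does not trip the rule; whether the real rule treats reordering as a change is not stated here.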
v2026-04-30-resource-prompt-scan
Resources / prompts injection scan
Probe-client now calls `resources/list` and `prompts/list` after `tools/list` (best-effort, swallows method-not-found). Adds no_resource_or_prompt_injection (MEDIUM/5, handshake) which scans resource and prompt entries for the same dangerous markdown / HTML / non-HTTPS URL patterns covered for tool descriptions.
v2026-04-30-origin-rebinding-probe
DNS-rebinding Origin probe
Adds transport_validates_origin (HIGH/8, perimeter): sends a cross-origin POST initialize and asserts the server returns 403. Closes the DNS-rebinding class documented in the MCP 2025-11-25 security best practices (CVE-2025-10625, CVE-2026-23744). Extends TransportObservation with `originProbe`.
v2026-04-30-tool-poisoning-patterns
Tool-description manipulation rule
Adds no_tool_description_manipulation (HIGH/6, handshake): flags tool descriptions that contain hidden-instruction tags (`<IMPORTANT>`, `<system>`), override phrases (`Ignore previous instructions`), identity hijacks (`Act as admin`), or exfiltration directives (`Send results to https://…`). Complement to the existing no_description_injection rule which covers technical injection (markdown links, HTML tags, non-HTTPS URLs).
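The four pattern families can be approximated with a few regular expressions. These patterns are illustrative guesses built from the examples quoted in this entry, not the rule's actual pattern set, and `findManipulation` is a hypothetical name.

```typescript
// Sketch: flag manipulation-shaped phrases in a tool description.
const MANIPULATION_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "hidden-instruction tag", pattern: /<\s*(important|system)\s*>/i },
  { label: "override phrase", pattern: /ignore\s+(all\s+)?previous\s+instructions/i },
  { label: "identity hijack", pattern: /\bact\s+as\s+(an?\s+)?admin\b/i },
  { label: "exfiltration directive", pattern: /\bsend\s+(the\s+)?results?\s+to\s+https?:\/\//i },
];

// Returns the label of every pattern family the description matches.
function findManipulation(description: string): string[] {
  return MANIPULATION_PATTERNS
    .filter((p) => p.pattern.test(description))
    .map((p) => p.label);
}
```

Keyword patterns like these produce false positives on descriptions that merely discuss such phrases, so a match is a prompt for review rather than proof of malice.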
v2026-04-30-tool-annotations
Tool annotations consistency rule
Adds tool_annotations_consistent (HIGH/6, handshake): flags tools whose spec-defined `readOnlyHint` / `destructiveHint` annotations contradict each other or contradict the capability classifier (rug-pull camouflage). Extends McpUpstreamTool with the optional `annotations` field defined by the MCP 2025-03-26+ spec.
v2026-04-30-mcp-spec-2025-11
Auth posture + initialize.instructions hardening
Adds three rules aligned with the MCP 2025-11-25 spec and OAuth 2.1 / RFC 9700: auth_supports_pkce_s256 (HIGH/6), auth_metadata_urls_https_public (HIGH/6), and initialize_instructions_no_injection (MEDIUM/5). The two auth rules also run at perimeter tier so auth-walled servers can be evaluated without a successful MCP handshake.
v2026-04-26-agent-threats
Agent-specific threat rules
Adds four agent-specific threat rules (prompt-injection input vectors, description-borne injection, server identity consistency, ASCII tool-name hygiene) and drops no_broad_schema_risk weight from 3 to 2 to avoid double-counting prompt-injection-shaped fields. Methodology page and rule-category copy updated to describe the new surface.