Overview

MCPSpec is "Postman for MCP" — a comprehensive testing, debugging, and documentation platform for Model Context Protocol servers. It enables developers to test MCP servers interactively, create reusable test collections, generate documentation, run security audits, and measure performance.

Key capabilities:

Installation

MCPSpec requires Node.js 22+ (LTS).

npm install -g mcpspec

Verify the installation:

mcpspec --version

You can also use npx mcpspec without a global install.

Quick Start

No configuration needed. Run a pre-built community collection to see MCPSpec in action:

# Install
npm install -g mcpspec

# Run a community test suite (filesystem server)
mcpspec test examples/collections/servers/filesystem.yaml

# Or explore a server interactively
mcpspec inspect "npx -y @modelcontextprotocol/server-filesystem /tmp"

Expected output:

MCPSpec running Filesystem Server Tests (12 tests)
  ✓ List allowed directories (52ms)
  ✓ List /tmp directory contents (45ms)
  ✓ Write a test file (67ms)
  ✓ Read the test file back (38ms)
  ... 8 more

  Tests: 12 passed (12 total)
  Time:  1.23s

Your First Collection

Create a file called mcpspec.yaml in your project root:

name: My First Tests
server: npx -y @modelcontextprotocol/server-filesystem /tmp

tests:
  - name: List allowed directories
    call: list_allowed_directories
    expect:
      - exists: $.content

  - name: Read a test file
    call: read_file
    with:
      path: /tmp/mcpspec-test.txt
    expect:
      - exists: $.content

  - name: Handle missing file
    call: read_file
    with:
      path: /tmp/nonexistent-file-12345.txt
    expectError: true

Run it:

mcpspec test

The server field is a command string. MCPSpec spawns the process, connects via stdio, runs the tests, and shuts it down. The expect shorthand lets you write assertions without verbose syntax.
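Under the hood, the stdio transport exchanges newline-delimited JSON-RPC 2.0 messages. The sketch below shows how a test's call/with pair maps onto a tools/call request; frameToolCall is a hypothetical helper for illustration, not MCPSpec's actual client (which is built on @modelcontextprotocol/sdk):

```javascript
// Hypothetical sketch of the JSON-RPC 2.0 framing used over stdio:
// one JSON object per line.
function frameToolCall(id, tool, args) {
  const message = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  return JSON.stringify(message) + "\n"; // newline-delimited framing
}

const wire = frameToolCall(1, "read_file", { path: "/tmp/mcpspec-test.txt" });
// The spawned server reads this line from stdin and writes a JSON-RPC
// response (or error) to stdout, which becomes the test's response.
```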

Simple Format

The simple format is ideal for most test suites. Use call/with/expect for concise test definitions:

name: Simple Filesystem Tests
server: npx -y @modelcontextprotocol/server-filesystem /tmp

tests:
  - name: List allowed directories
    call: list_allowed_directories
    expect:
      - exists: $.content

  - name: Read a test file
    call: read_file
    with:
      path: /tmp/mcpspec-test.txt
    expect:
      - exists: $.content

  - name: Handle missing file
    call: read_file
    with:
      path: /tmp/nonexistent-file-12345.txt
    expectError: true

Key fields:

Advanced Format

The advanced format adds environments, tags, typed assertions, variable extraction, and more:

schemaVersion: "1.0"
name: Environment-Aware Tests
description: Tests that use environment variables

server:
  transport: stdio
  command: npx
  args: ["-y", "@modelcontextprotocol/server-filesystem", "{{baseDir}}"]

environments:
  dev:
    variables:
      baseDir: /tmp
  staging:
    variables:
      baseDir: /var/tmp

defaultEnvironment: dev

tests:
  - id: test-list
    name: List directories
    tags: [smoke]
    call: list_allowed_directories
    assertions:
      - type: schema
      - type: latency
        maxMs: 10000

  - id: test-read
    name: Read test file
    tags: [integration]
    call: read_file
    with:
      path: "{{baseDir}}/mcpspec-test.txt"
    assertions:
      - type: exists
        path: $.content

Additional fields:

Server Configuration

Three ways to configure the server:

String shorthand (stdio)

server: npx -y @modelcontextprotocol/server-filesystem /tmp

Stdio object

server:
  transport: stdio
  command: npx
  args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  env:
    NODE_ENV: test
  timeouts:
    connect: 10000
    call: 30000

SSE / HTTP

# Server-Sent Events
server:
  transport: sse
  url: http://localhost:3000/sse

# Streamable HTTP
server:
  transport: streamable-http
  url: http://localhost:3000/mcp

Environments & Variables

Define named environments with variable sets. Use {{variable}} syntax in server config, tool inputs, and assertions:

environments:
  dev:
    variables:
      baseDir: /tmp
      apiUrl: http://localhost:3000
  prod:
    variables:
      baseDir: /data
      apiUrl: https://api.example.com

defaultEnvironment: dev

Switch environments at runtime:

mcpspec test --env prod

Note: YAML is loaded with FAILSAFE_SCHEMA, meaning all values are strings. MCPSpec automatically coerces types (numbers, booleans) when needed.
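A minimal sketch of how {{variable}} interpolation could work over those all-string values; interpolate is a hypothetical helper, not MCPSpec's actual implementation:

```javascript
// Replace {{name}} placeholders from the active environment's variables.
// Unknown placeholders are left untouched (an assumption, for illustration).
function interpolate(template, variables) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in variables ? String(variables[name]) : match
  );
}

const vars = { baseDir: "/tmp", apiUrl: "http://localhost:3000" };
console.log(interpolate("{{baseDir}}/mcpspec-test.txt", vars)); // /tmp/mcpspec-test.txt
```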

Tags & Filtering

Add tags to tests and filter with --tag:

tests:
  - name: Quick smoke test
    tags: [smoke, api]
    call: health_check

# Run only smoke tests
mcpspec test --tag smoke

# The @ prefix is optional and stripped automatically
mcpspec test --tag @smoke

Multiple --tag flags run tests matching any of the specified tags.
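The OR semantics and @ stripping can be sketched as follows (filterByTags is a hypothetical helper, not mcpspec internals):

```javascript
// Keep tests whose tags intersect ANY requested tag; strip an optional
// leading "@" from each requested tag first. No tags requested = run all.
function filterByTags(tests, requested) {
  const wanted = requested.map((t) => t.replace(/^@/, ""));
  if (wanted.length === 0) return tests;
  return tests.filter((t) => (t.tags ?? []).some((tag) => wanted.includes(tag)));
}

const tests = [
  { name: "Quick smoke test", tags: ["smoke", "api"] },
  { name: "Slow integration test", tags: ["integration"] },
];
console.log(filterByTags(tests, ["@smoke"]).map((t) => t.name)); // ["Quick smoke test"]
```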

Retries & Timeouts

Configure per-test timeouts and retries:

tests:
  - name: Flaky network call
    call: fetch_data
    timeout: 15000    # 15s timeout for this test
    retries: 2        # Retry up to 2 times on error

Timeout hierarchy (inner timeouts must be less than outer):

| Level | Default | Purpose |
| --- | --- | --- |
| test | 30,000ms | Total test timeout |
| mcpCall | 25,000ms | Single tool call |
| transport | 20,000ms | HTTP/stdio response |
| assertion | 5,000ms | Expression evaluation |
| cleanup | 5,000ms | Post-test cleanup |

Note: Retries only trigger on thrown errors (e.g., connection failures), not on assertion failures. This prevents retrying tests that are simply wrong.
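The distinction can be sketched as follows: only a thrown error re-enters the retry loop, while a returned failure result is final. runWithRetries is a hypothetical, synchronous stand-in for the (async) runner:

```javascript
// Retry only on thrown errors; a returned { passed: false } (failed
// assertion) is NOT retried.
function runWithRetries(testFn, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return testFn(); // returning a failed result ends the loop
    } catch (err) {
      lastError = err; // thrown => transient => retry if attempts remain
    }
  }
  throw lastError; // retries exhausted
}

let attempts = 0;
const flaky = () => {
  attempts += 1;
  if (attempts < 3) throw new Error("ECONNREFUSED"); // first two calls fail
  return { passed: true };
};
console.log(runWithRetries(flaky, 2), attempts); // succeeds on the third attempt
```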

Assertions Overview

MCPSpec has 10 assertion types. You can use either the shorthand format (in expect) or the full format (in assertions).

Shorthand format

expect:
  - exists: $.content           # Path exists
  - equals: [$.id, 123]         # Exact match
  - contains: [$.tags, "active"] # Contains value
  - matches: [$.email, ".*@.*"] # Regex match

Full format

assertions:
  - type: exists
    path: $.content
  - type: equals
    path: $.id
    value: 123
  - type: latency
    maxMs: 1000

| Type | Purpose | Key Fields |
| --- | --- | --- |
| schema | Validate response structure | |
| equals | Exact deep match | path, value |
| contains | Array/string contains | path, value |
| exists | Path exists and not null | path |
| matches | Regex pattern match | path, pattern |
| type | Type check | path, expected |
| length | Array/string length | path, operator, value |
| latency | Response time check | maxMs |
| mimeType | Content type check | expected |
| expression | Safe expression eval | expr |
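One plausible way the shorthand desugars into full assertion objects; normalize is a hypothetical helper, and MCPSpec's internal mapping may differ:

```javascript
// Single-value shorthands carry just a path; pair shorthands carry
// [path, value] (or [path, pattern] for matches).
function normalize(shorthand) {
  const [type, arg] = Object.entries(shorthand)[0];
  if (type === "exists") return { type, path: arg };
  if (type === "matches") return { type, path: arg[0], pattern: arg[1] };
  return { type, path: arg[0], value: arg[1] }; // equals, contains
}

console.log(normalize({ equals: ["$.id", 123] }));
// { type: "equals", path: "$.id", value: 123 }
```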

schema

Validates that the response is a valid MCP tool result (object or array with expected structure).

# Shorthand — not available, use full format
assertions:
  - type: schema

No additional fields needed. Checks that the response has content with proper structure.

equals

Exact deep comparison using JSON.stringify.

# Shorthand
expect:
  - equals: [$.id, 123]

# Full
assertions:
  - type: equals
    path: $.id
    value: 123
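If the comparison is literally JSON.stringify equality, it is structural but key-order sensitive, which is worth knowing when asserting on objects. A sketch of that behavior (not MCPSpec source):

```javascript
// Structural equality via serialization: arrays and nested values
// compare as expected, but object key order matters.
const deepEquals = (a, b) => JSON.stringify(a) === JSON.stringify(b);

console.log(deepEquals({ id: 123, tags: ["a"] }, { id: 123, tags: ["a"] })); // true
console.log(deepEquals({ a: 1, b: 2 }, { b: 2, a: 1 }));                     // false
```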

contains

Checks if an array includes a value or a string contains a substring.

# Shorthand
expect:
  - contains: [$.tags, "active"]

# Full
assertions:
  - type: contains
    path: $.tags
    value: "active"

exists

Checks that a JSONPath resolves to a non-null value.

# Shorthand
expect:
  - exists: $.content

# Full
assertions:
  - type: exists
    path: $.content

matches

Matches a string value against a regular expression.

# Shorthand
expect:
  - matches: [$.email, ".*@.*\\.com"]

# Full
assertions:
  - type: matches
    path: $.email
    pattern: ".*@.*\\.com"

type

Checks the JavaScript type of a value at the given path.

assertions:
  - type: type
    path: $.count
    expected: number   # string, number, boolean, object, array

length

Checks the length of an array or string using a comparison operator.

assertions:
  - type: length
    path: $.items
    operator: gt    # eq, gt, gte, lt, lte
    value: 0

latency

Asserts that the tool call completed within the specified time.

assertions:
  - type: latency
    maxMs: 1000   # milliseconds

expression

Evaluates a safe expression using the expr-eval library. This is not arbitrary code — only comparisons, logical operators, math, property access, and array indexing are available.

assertions:
  - type: expression
    expr: "response.items.length > 0 and response.total == response.items.length"

Available: ==, !=, >, <, and, or, not, in, property access, array indexing, basic math.

NOT available: Function definitions, loops, require, file access, eval.

mcpspec test

Run a test collection.

mcpspec test [collection]

| Option | Description | Default |
| --- | --- | --- |
| [collection] | Path to collection YAML | mcpspec.yaml |
| --env <name> | Environment to use | defaultEnvironment |
| --tag <tags...> | Filter tests by tag (repeatable) | |
| --parallel <n> | Parallel test execution | 1 (sequential) |
| --reporter <type> | console, json, junit, html, tap | console |
| --output <path> | Output file for results | |
| --ci | CI mode (no colors, structured output) | false |
| --baseline <name> | Compare against named baseline | |
| --watch | Re-run on file changes (300ms debounce) | false |

# Run with tags and parallel
mcpspec test --tag smoke --parallel 4

# CI mode with JUnit output
mcpspec test --ci --reporter junit --output results.xml

# Compare against baseline
mcpspec test --baseline main

mcpspec inspect

Interactive REPL for exploring an MCP server.

mcpspec inspect <server>

REPL commands:

| Command | Description |
| --- | --- |
| .tools | List all available tools |
| .resources | List all available resources |
| .call <tool> <json> | Call a tool with JSON arguments |
| .schema <tool> | Display tool's input JSON Schema |
| .info | Show server info (name, version, capabilities) |
| .help | Show help |
| .exit | Disconnect and exit |

mcpspec inspect "npx -y @modelcontextprotocol/server-filesystem /tmp"
# > .tools
# > .call read_file {"path": "/tmp/test.txt"}
# > .schema read_file
# > .exit

mcpspec record

Record, list, replay, and delete inspector sessions.

mcpspec record start <server>           # Record a session
mcpspec record list                      # List saved recordings
mcpspec record replay <name> <server>   # Replay and diff
mcpspec record delete <name>             # Delete a recording

During a recording session, use the same REPL commands as inspect plus:

Recordings are stored in ~/.mcpspec/recordings/.

mcpspec mock

Start a mock MCP server from a recording, or generate a standalone file.

mcpspec mock <recording> [options]

| Option | Description | Default |
| --- | --- | --- |
| --mode <mode> | match or sequential | match |
| --latency <ms> | Response delay: 0, milliseconds, or original | 0 |
| --on-missing <behavior> | error or empty | error |
| --generate <path> | Generate standalone .js file | |

# Start mock server (stdin/stdout)
mcpspec mock my-api

# Tape/cassette style matching
mcpspec mock my-api --mode sequential

# Simulate original response timing
mcpspec mock my-api --latency original

# Generate standalone .js file (only requires @modelcontextprotocol/sdk)
mcpspec mock my-api --generate ./mocks/server.js
node ./mocks/server.js

mcpspec audit

Run a security scan against an MCP server.

mcpspec audit <server> [options]

| Option | Description | Default |
| --- | --- | --- |
| --mode <mode> | passive, active, or aggressive | passive |
| --acknowledge-risk | Skip confirmation for active/aggressive | false |
| --fail-on <severity> | Exit code 6 if findings at: info, low, medium, high, critical | |
| --rules <rules...> | Only run specific rules | all |
| --exclude-tools <tools...> | Skip specific tools | |
| --dry-run | Preview targets without scanning | false |

# Passive scan (safe for production)
mcpspec audit "npx my-server"

# Active scan with CI gate
mcpspec audit "npx my-server" --mode active --acknowledge-risk --fail-on medium

# Preview what would be scanned
mcpspec audit "npx my-server" --dry-run

mcpspec score

Calculate the MCP Score (0–100 quality rating) for a server.

mcpspec score <server> [options]

| Option | Description | Default |
| --- | --- | --- |
| --badge <path> | Output shields.io-style SVG badge | |
| --min-score <n> | Fail if below threshold | |

mcpspec score "npx my-server"
mcpspec score "npx my-server" --badge ./badge.svg --min-score 80

mcpspec bench

Run performance benchmarks against a server.

mcpspec bench <server> [options]

| Option | Description | Default |
| --- | --- | --- |
| --iterations <n> | Number of iterations | 100 |
| --tool <name> | Tool to benchmark | first available |
| --args <json> | JSON arguments for tool call | {} |
| --timeout <ms> | Timeout per call | 30000 |
| --warmup <n> | Warmup iterations (excluded from results) | 5 |

mcpspec bench "npx my-server" --iterations 200 --tool echo --args '{"message":"test"}'

Output includes: min, max, mean, median, P95, P99, standard deviation, and throughput (calls/sec).
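These statistics can all be derived from the raw per-call durations. The sketch below uses the nearest-rank percentile method and mean-based throughput; mcpspec's exact interpolation may differ:

```javascript
// Summarize raw call durations (in ms) into the reported statistics.
function stats(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, x) => s + x, 0) / n;
  // Nearest-rank percentile: smallest value with at least p% of samples below it.
  const pct = (p) => sorted[Math.min(n - 1, Math.ceil((p / 100) * n) - 1)];
  const variance = sorted.reduce((s, x) => s + (x - mean) ** 2, 0) / n;
  return {
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    median: pct(50),
    p95: pct(95),
    p99: pct(99),
    stdDev: Math.sqrt(variance),
    throughput: 1000 / mean, // calls per second
  };
}

console.log(stats([40, 42, 45, 50, 120]).p95); // 120
```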

mcpspec docs

Auto-generate documentation from server introspection.

mcpspec docs <server> [options]

| Option | Description | Default |
| --- | --- | --- |
| --format <type> | markdown or html | markdown |
| --output <dir> | Output directory | current dir |

mcpspec docs "npx my-server" --format html --output ./docs

mcpspec compare & baseline

Compare test runs and manage baselines for regression detection.

# Save current results as a baseline
mcpspec baseline save main

# List saved baselines
mcpspec baseline list

# Compare two runs
mcpspec compare <run1> <run2>

# Compare against a baseline
mcpspec compare --baseline main

mcpspec init

Scaffold a new MCPSpec project with an interactive wizard or template.

mcpspec init [directory] --template <minimal|standard|full>

Templates:

mcpspec ci-init

Generate CI/CD pipeline configuration.

mcpspec ci-init [options]

| Option | Description | Default |
| --- | --- | --- |
| --platform <type> | github, gitlab, or shell | auto-detect |
| --collection <path> | Path to collection file | |
| --server <command> | Server command | |
| --checks <list> | Comma-separated: test, audit, score, bench | test |
| --fail-on <severity> | Audit severity gate | |
| --min-score <n> | Minimum MCP Score threshold | |
| --force | Overwrite existing files | false |

# Interactive wizard
mcpspec ci-init

# Non-interactive
mcpspec ci-init --platform github --checks test,audit,score --fail-on medium --min-score 80

MCPSpec auto-detects the platform from the presence of .github/ or .gitlab-ci.yml. On GitLab, --force replaces only the mcpspec job block and leaves the rest of .gitlab-ci.yml untouched.

mcpspec ui

Launch the web dashboard.

mcpspec ui [options]

| Option | Description | Default |
| --- | --- | --- |
| -p, --port <port> | Port to listen on | 6274 |
| --host <host> | Host to bind to | 127.0.0.1 |
| --no-open | Do not auto-open browser | |

The UI includes 10 pages: Dashboard, Servers, Collections, Runs, Inspector, Recordings, Audit, Benchmark, Docs, and Score. It supports dark mode and real-time WebSocket updates.

Recording & Replay

Record an inspector session, save it, and replay it later against a new server version to catch regressions.

Full workflow

# 1. Start a recording session
mcpspec record start "npx my-server"

# 2. Interact with the server (REPL)
# > .call get_user {"id": "1"}
# > .call list_items {}
# > .call create_item {"name": "test"}

# 3. Save the recording
# > .save my-api

# 4. List recordings
mcpspec record list

# 5. Replay against a new version — diffs results step-by-step
mcpspec record replay my-api "npx my-server-v2"

Replay output

Replaying 3 steps...
  1/3 get_user (id=1)...         [OK] 42ms → {"name": "Alice"}
  2/3 list_items...              [CHANGED] 38ms
  3/3 create_item (name=test)... [OK] 51ms → {"id": "abc"}
  Summary: 2 matched, 1 changed, 0 added, 0 removed

Use cases

Mock Servers

Generate a mock MCP server from any recording. Mock servers communicate via stdio and are drop-in replacements for real servers in CI, offline development, and testing.

Match mode (default)

Looks up responses by exact input match first. If no exact match, falls back to the next queued response for that tool.

mcpspec mock my-api

Sequential mode

Tape/cassette style — responses are served in the order they were recorded, regardless of input.

mcpspec mock my-api --mode sequential
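The two strategies can be sketched as follows; makeMock is a hypothetical stand-in for illustration (the real mock server speaks MCP over stdio):

```javascript
// recording: array of { tool, input, response } steps.
// "match" keys responses by tool + exact input, falling back to the next
// queued response for that tool; "sequential" plays the tape in order.
function makeMock(recording, mode) {
  const steps = [...recording];
  return function respond(tool, input) {
    let i;
    if (mode === "sequential") {
      i = steps.length ? 0 : -1; // next step, regardless of input
    } else {
      const key = JSON.stringify(input);
      i = steps.findIndex((s) => s.tool === tool && JSON.stringify(s.input) === key);
      if (i === -1) i = steps.findIndex((s) => s.tool === tool); // queue fallback
    }
    if (i === -1) throw new Error("No recorded response (--on-missing error)");
    return steps.splice(i, 1)[0].response; // consume the step
  };
}

const recording = [
  { tool: "get_user", input: { id: "1" }, response: { name: "Alice" } },
  { tool: "get_user", input: { id: "2" }, response: { name: "Bob" } },
];
const mock = makeMock(recording, "match");
console.log(mock("get_user", { id: "2" })); // exact match wins: { name: "Bob" }
```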

Latency simulation

mcpspec mock my-api --latency original   # Simulate real timing
mcpspec mock my-api --latency 100        # Fixed 100ms delay

Standalone file generation

Generate a standalone .js file that can be committed to your repo. The generated file only requires @modelcontextprotocol/sdk as a runtime dependency.

mcpspec mock my-api --generate ./mocks/server.js
node ./mocks/server.js

Missing tool behavior

mcpspec mock my-api --on-missing error   # Return error (default)
mcpspec mock my-api --on-missing empty   # Return empty content

Security Audit

MCPSpec includes 8 security rules, including LLM-specific threats like Tool Poisoning and Excessive Agency.

Security rules

| Rule | Mode | Detects |
| --- | --- | --- |
| Tool Poisoning | passive | LLM prompt injection in descriptions, hidden Unicode (zero-width, bidirectional), cross-tool manipulation, embedded code, overly long descriptions |
| Excessive Agency | passive | Destructive tools without confirmation params, arbitrary code/command params, overly broad schemas, missing descriptions |
| Path Traversal | passive | ../../etc/passwd style directory escape attacks |
| Input Validation | passive | Missing constraints (enum, pattern, min/max) on tool inputs |
| Info Disclosure | passive | Leaked paths, stack traces, API keys in tool descriptions |
| Resource Exhaustion | active | Unbounded loops, large allocations |
| Auth Bypass | active | Missing auth checks, hardcoded credentials |
| Injection | active | SQL and command injection in tool inputs |

Scan modes

Confirmation flow

Active and aggressive modes display a warning and require explicit confirmation:

⚠️ SECURITY SCAN WARNING
This sends potentially harmful payloads.
NEVER run against production systems!
Is this a TEST environment? [y/N]

Use --acknowledge-risk to skip the prompt (for CI).

Each finding includes severity, evidence, and remediation advice.

MCP Score

The MCP Score rates server quality from 0 to 100 across 5 weighted categories:

| Category | Weight | What it measures |
| --- | --- | --- |
| Documentation | 25% | Tool descriptions, parameter docs, resource docs |
| Schema Quality | 25% | Input schema completeness and correctness |
| Error Handling | 20% | Graceful error responses, proper error codes |
| Responsiveness | 15% | Response latency for tool calls |
| Security | 15% | Results from passive security scan |

Schema quality sub-criteria

| Criterion | Weight |
| --- | --- |
| Structure (proper object type) | 20% |
| Property types defined | 20% |
| Property descriptions | 20% |
| Required fields specified | 15% |
| Constraints (enum, pattern, min/max) | 15% |
| Naming conventions (camelCase) | 10% |
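The overall score is a weighted sum of per-category scores (0–100 each). A sketch using the category weights from the table above (mcpScore and the category names are illustrative, not mcpspec's API):

```javascript
// Category weights from the MCP Score table (sum to 1.0).
const WEIGHTS = {
  documentation: 0.25,
  schemaQuality: 0.25,
  errorHandling: 0.2,
  responsiveness: 0.15,
  security: 0.15,
};

// Combine per-category scores (0-100) into the overall 0-100 rating.
function mcpScore(categories) {
  const total = Object.entries(WEIGHTS).reduce(
    (sum, [name, w]) => sum + w * (categories[name] ?? 0),
    0
  );
  return Math.round(total);
}

console.log(
  mcpScore({ documentation: 90, schemaQuality: 80, errorHandling: 70, responsiveness: 100, security: 60 })
); // 81
```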

Score ranges

Badge generation

mcpspec score "npx my-server" --badge ./mcp-score.svg

Generates a shields.io-style SVG badge for your README.

CI/CD Integration

Use mcpspec ci-init to generate pipeline configurations, or use these examples directly.

GitHub Actions

name: MCP Server Tests
on: [push, pull_request]

jobs:
  mcpspec:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm install -g mcpspec

      - name: Run tests
        run: mcpspec test --ci --reporter junit --output results.xml

      - name: Security audit
        run: mcpspec audit "npx my-server" --mode passive --fail-on high

      - name: Quality gate
        run: mcpspec score "npx my-server" --min-score 80

      - uses: mikepenz/action-junit-report@v4
        if: always()
        with:
          report_paths: results.xml

GitLab CI

mcpspec:
  image: node:22
  stage: test
  script:
    - npm install -g mcpspec
    - mcpspec test --ci --reporter junit --output results.xml
    - mcpspec audit "npx my-server" --mode passive --fail-on high
  artifacts:
    when: always
    paths:
      - results.xml
    reports:
      junit: results.xml
    expire_in: 1 week

Exit codes for pipeline gating

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Test failure |
| 2 | Runtime error |
| 3 | Config error |
| 4 | Connection error |
| 5 | Timeout |
| 6 | Security findings above threshold |
| 7 | Validation error |
| 130 | Interrupted (Ctrl+C) |

Pre-commit hook

#!/bin/bash
# .git/hooks/pre-commit
if git diff --cached --name-only | grep -q "collections/.*\.yaml$"; then
  mcpspec test $(git diff --cached --name-only | grep "collections/.*\.yaml$")
fi

Transports

MCPSpec supports three transport types for communicating with MCP servers:

stdio (default)

MCPSpec spawns the server as a child process and communicates via stdin/stdout. This is the most common transport and works with any server that reads from stdin and writes to stdout.

# Shorthand
server: npx my-server /tmp

# Explicit
server:
  transport: stdio
  command: npx
  args: ["my-server", "/tmp"]
  env:
    NODE_ENV: test

SSE (Server-Sent Events)

Connects to an already-running server via SSE. Useful for remote servers or servers started independently.

server:
  transport: sse
  url: http://localhost:3000/sse

Streamable HTTP

The newer HTTP-based transport for MCP servers.

server:
  transport: streamable-http
  url: http://localhost:3000/mcp

When to use each

| Transport | Use when |
| --- | --- |
| stdio | Testing local servers, CI pipelines, most common |
| sse | Server is already running, remote testing |
| streamable-http | Modern HTTP-based MCP servers |

Collection Schema

Complete field reference for collection YAML files.

# Top-level fields
schemaVersion: "1.0"            # Optional, currently "1.0"
name: string                    # Required — collection name
description: string             # Optional
server: string | ServerConfig   # Required — see Server Configuration
environments:                   # Optional — named variable sets
  <name>:
    variables:
      <key>: <value>
defaultEnvironment: string      # Optional
tests:                          # Required — at least 1 test
  - id: string                  # Optional — unique identifier
    name: string                # Required — display name
    tags: string[]              # Optional — for --tag filtering
    timeout: number             # Optional — ms, default 30000
    retries: number             # Optional — retry on error
    type: tool                  # Optional — default "tool"
    call: string                # Tool name to invoke
    tool: string                # Alias for call
    with: object                # Tool input arguments
    input: object               # Alias for with
    expect: Shorthand[]         # Shorthand assertions
    assertions: Assertion[]     # Full assertions
    expectError: boolean        # Expect error response
    extract:                    # Optional — extract variables
      - name: string
        path: string            # JSONPath

# ServerConfig object
server:
  name: string                  # Optional
  transport: stdio|sse|streamable-http  # Default: stdio
  command: string               # For stdio
  args: string[]                # For stdio
  url: string                   # For SSE/HTTP
  env: Record<string, string>  # Environment variables
  timeouts:
    connect: number             # Connection timeout (ms)
    call: number                # Call timeout (ms)

# Assertion object
assertions:
  - type: schema|equals|contains|exists|matches|type|length|latency|mimeType|expression
    path: string                # JSONPath (for value assertions)
    value: any                  # Expected value
    expected: any               # Expected type or mime type
    pattern: string             # Regex pattern
    maxMs: number               # Max latency
    operator: eq|gt|gte|lt|lte  # For length
    expr: string                # Safe expression

Exit Codes

| Code | Constant | Meaning |
| --- | --- | --- |
| 0 | SUCCESS | All tests passed / operation successful |
| 1 | TEST_FAILURE | One or more tests failed |
| 2 | ERROR | Runtime error (unexpected crash) |
| 3 | CONFIG_ERROR | Invalid collection YAML or configuration |
| 4 | CONNECTION_ERROR | Could not connect to server |
| 5 | TIMEOUT | Operation timed out |
| 6 | SECURITY_FINDINGS | Security findings above --fail-on threshold |
| 7 | VALIDATION_ERROR | Input validation failed |
| 130 | INTERRUPTED | Interrupted by user (Ctrl+C) |

Configuration

Timeout defaults

| Setting | Default |
| --- | --- |
| Test timeout | 30,000ms |
| MCP call timeout | 25,000ms |
| Transport timeout | 20,000ms |
| Assertion timeout | 5,000ms |
| Cleanup timeout | 5,000ms |

Rate limiting defaults

| Setting | Default |
| --- | --- |
| Max calls/second | 10 |
| Max concurrent | 5 |
| Backoff initial | 1,000ms |
| Backoff multiplier | 2x |
| Backoff max | 30,000ms |
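The schedule these defaults imply can be sketched as follows (backoffDelay is a hypothetical helper mirroring the initial/multiplier/max settings):

```javascript
// Exponential backoff: start at `initial`, multiply per retry, cap at `max`.
function backoffDelay(attempt, initial = 1000, multiplier = 2, max = 30000) {
  return Math.min(max, initial * multiplier ** attempt);
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffDelay(n)));
// [1000, 2000, 4000, 8000, 16000, 30000] — attempt 5 hits the 30s cap
```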

Environment variables

| Variable | Purpose |
| --- | --- |
| MCPSPEC_REMOTE_ACCESS | Set to true to allow non-localhost access to UI server |
| MCPSPEC_TOKEN | Authentication token when remote access is enabled |

Troubleshooting

Connection Timeout

If you see Connection Timed Out:

YAML Parse Errors

Process Cleanup

MCPSpec registers cleanup handlers for SIGINT, SIGTERM, and uncaught exceptions. If a server process gets stuck:

Common Mistakes

MCPSpec — MIT License — GitHub · npm