Token Optimizer

The token optimizer reduces the number of tokens consumed by MCP protocol overhead. Each technique is independent, configured per server, and disabled by default.

Configure optimizers in Settings under the Token Optimizer section. Each server gets its own set of toggles. Track savings in the Token Log.

Tool List Cache

Caches the full tool list (the tools/list response) per server so the AI client does not request it repeatedly.

  • TTL: how long the cache is valid (60 seconds to 24 hours, default 1 hour)
  • The cache invalidates automatically if the tool schema changes (tracked via SHA256 hash)
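A minimal sketch of the cache described above — the class and method names are hypothetical, but it shows the two invalidation paths (TTL expiry and a changed SHA256 schema hash):

```python
import hashlib
import json
import time

class ToolListCache:
    """Hypothetical per-server tools/list cache with TTL and hash invalidation."""

    def __init__(self, ttl_seconds=3600):  # default TTL: 1 hour
        self.ttl = ttl_seconds
        self.entries = {}  # server_id -> (schema_hash, cached_at, tool_list)

    @staticmethod
    def schema_hash(tool_list):
        # Canonical JSON so key order does not affect the hash
        blob = json.dumps(tool_list, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def put(self, server_id, tool_list):
        self.entries[server_id] = (self.schema_hash(tool_list), time.time(), tool_list)

    def get(self, server_id):
        entry = self.entries.get(server_id)
        if entry is None or time.time() - entry[1] > self.ttl:
            return None  # miss, or TTL expired
        return entry[2]

    def invalidate_if_changed(self, server_id, observed_tools):
        # Drop the entry when the observed schema hash no longer matches
        entry = self.entries.get(server_id)
        if entry and entry[0] != self.schema_hash(observed_tools):
            del self.entries[server_id]
```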

Session Deduplication

Skips resending the tool list if it has not changed within the same AI session. After the first tools/list response, subsequent requests in the same session return a lightweight sentinel instead of the full schema.
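The first-response-then-sentinel behavior can be sketched as follows; the sentinel's exact shape here is an assumption, not the real wire format:

```python
_seen_sessions = set()  # session IDs that have already received the full schema

def tools_list_response(session_id, full_schema):
    """Return the full schema on the first request per session, a sentinel afterwards."""
    if session_id in _seen_sessions:
        return {"tools_unchanged": True}  # lightweight sentinel (hypothetical shape)
    _seen_sessions.add(session_id)
    return full_schema
```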

Lazy Tool Loading

Instead of sending full tool schemas upfront, sends only tool names and a brief summary (first sentence of the description, up to 120 characters). The AI can request the full schema for any specific tool on demand.

  • Tool threshold: the minimum number of tools before lazy loading activates (default: 20). Also activates if the total schema exceeds 20KB.
  • Injects a synthetic mcpfw_get_tool_schema tool that the AI calls to fetch a specific tool’s full definition.
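The two pieces above — the one-sentence summary and the activation check — might look roughly like this (function names are illustrative, not the optimizer's internals):

```python
import json

def summarize(description, limit=120):
    """First sentence of a tool description, capped at `limit` characters."""
    first = description.split(". ")[0].strip()
    if not first.endswith("."):
        first += "."
    if len(first) > limit:
        first = first[:limit - 1].rstrip() + "…"
    return first

def lazy_loading_active(tools, tool_threshold=20, size_limit=20 * 1024):
    """Activate when there are enough tools or the combined schema is too large."""
    schema_bytes = len(json.dumps(tools).encode())
    return len(tools) >= tool_threshold or schema_bytes > size_limit
```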

Response Budget

Caps the size of tool responses to prevent oversized payloads from consuming your token budget.

  • Budget: maximum tokens per response (1,000 to 100,000, default 8,000)
  • Oversized responses are truncated, and the remainder is stored for pagination
  • Injects a synthetic mcpfw_read_more tool that the AI calls to fetch the next page

The AI sees a truncated response with a note that more data is available, then requests additional pages as needed.
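A toy version of the truncate-and-page flow. Tokens are approximated here as four characters each — an assumption for illustration, not the optimizer's real tokenizer — and the continuation store and ID format are hypothetical:

```python
import itertools

_continuations = {}        # continuation_id -> remaining text (hypothetical store)
_ids = itertools.count(1)  # hypothetical ID generator

def apply_budget(text, budget_tokens=8000, chars_per_token=4):
    """Truncate a response to a rough token budget; stash the rest for paging."""
    limit = budget_tokens * chars_per_token
    if len(text) <= limit:
        return text, None
    cont_id = f"cont-{next(_ids)}"
    _continuations[cont_id] = text[limit:]
    note = f"\n[truncated: call mcpfw_read_more with id {cont_id} for the next page]"
    return text[:limit] + note, cont_id

def read_more(cont_id, **kwargs):
    """Serve the next page, re-applying the budget to the remainder."""
    return apply_budget(_continuations.pop(cont_id, ""), **kwargs)
```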

Null Stripping

Removes null values, empty strings, empty arrays, and empty objects from JSON responses. Many tool responses contain fields like "metadata": null or "tags": [] that add tokens without carrying information.

  • Threshold: minimum response size before stripping activates (100 to 10,000 tokens, default 1,000)
  • Only applies if the response is large enough to benefit
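The stripping itself is a straightforward recursive filter; a minimal sketch (the size-threshold check would be applied by the caller):

```python
def strip_empty(value):
    """Recursively drop None, empty strings, empty lists, and empty dicts."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        cleaned = [strip_empty(v) for v in value]
        return [v for v in cleaned if v not in (None, "", [], {})]
    return value
```

Note that an object which becomes empty after stripping (such as a metadata dict whose only field was null) is itself removed.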

Description Truncation

Shortens tool descriptions in the schema to reduce bloat. Truncates at the nearest word boundary and appends "…" when shortened.

  • Character limit: maximum description length (50 to 2,000 characters, default 200)
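Word-boundary truncation can be sketched in a few lines (the function name is illustrative):

```python
def truncate_description(text, limit=200):
    """Shorten a description at the nearest word boundary, appending an ellipsis."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last complete word so we never split mid-word
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + "…"
```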

Result Cache

Caches the results of read-only tool calls. If the AI calls the same tool with the same arguments, the cached result is returned without hitting the upstream server.

  • TTL: how long cached results are valid (60 seconds to 24 hours, default 5 minutes)
  • Only applies to “safe” tools (those without write, create, update, delete, or similar words in their name)
  • Cache key is a SHA256 hash of the server ID, tool name, and sorted arguments
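The safety check and cache key might be built like this — the exact unsafe-word list is an assumption beyond the examples the text gives:

```python
import hashlib
import json

# Words the text names, plus a couple of plausible additions (assumption)
UNSAFE_WORDS = ("write", "create", "update", "delete", "remove")

def is_safe_tool(name):
    """A tool is cacheable only if its name contains no write-like words."""
    lowered = name.lower()
    return not any(word in lowered for word in UNSAFE_WORDS)

def cache_key(server_id, tool_name, arguments):
    """SHA256 over the server ID, tool name, and canonically sorted arguments."""
    blob = json.dumps([server_id, tool_name, arguments], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()
```

Sorting the arguments before hashing means two calls with the same arguments in a different key order hit the same cache entry.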

Compact Encoding

Compresses JSON responses into a format that LLMs can read efficiently. Uses indentation instead of braces, tabular format for uniform arrays, and bare strings where possible.

  • Savings of 20-60% on structured and tabular data
  • Only applies to JSON responses of 512 bytes or more
  • Falls back to regular JSON if encoding fails or produces a larger result

This is marked as experimental. It works well with Claude models but may not be understood by all AI clients.
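To illustrate just the tabular case: a uniform array of flat objects can be rewritten as one header row plus one row per object, so each key is spelled out once instead of once per element. This toy encoder is a simplification of whatever the real format does, and returns None for non-uniform input so the caller can fall back to plain JSON:

```python
def encode_table(rows):
    """Encode a uniform array of flat objects as a header row plus value rows."""
    if not rows:
        return None
    keys = list(rows[0].keys())
    if any(list(r.keys()) != keys for r in rows):
        return None  # not uniform; caller falls back to regular JSON
    lines = [",".join(keys)]
    for r in rows:
        lines.append(",".join(str(r[k]) for k in keys))
    return "\n".join(lines)
```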

Monitoring

The Token Log on the Monitor page shows:

  • Total tokens routed and saved
  • Savings rate as a percentage
  • Estimated cost avoided (based on your selected model pricing)
  • Breakdown by optimizer technique
  • Per-server savings with sparkline trends

Usage data is recorded in hourly buckets, rolled up to daily and monthly aggregates. You can view 24-hour, 7-day, and 30-day ranges.
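The bucket-and-rollup scheme can be sketched as follows; the "YYYY-MM-DDTHH:00" key format is an assumption for illustration:

```python
from collections import defaultdict
from datetime import datetime, timezone

def bucket_hour(ts):
    """Floor a Unix timestamp to its hourly bucket key (hypothetical key format)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:00")

def roll_up_daily(hourly):
    """Sum hourly token savings into daily aggregates."""
    daily = defaultdict(int)
    for key, saved in hourly.items():
        daily[key[:10]] += saved  # "YYYY-MM-DD" prefix of the hour key
    return dict(daily)
```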