Token Optimizer
The token optimizer reduces the number of tokens consumed by MCP protocol overhead. Each technique is independent, configured per server, and disabled by default.
Configure optimizers in Settings under the Token Optimizer section. Each server gets its own set of toggles. Track savings in the Token Log.
Schema Cache
Caches the full tool list (tools/list response) per server so the AI client does not request it repeatedly.
- TTL: how long the cache is valid (60 seconds to 24 hours, default 1 hour)
- The cache invalidates automatically if the tool schema changes (tracked via SHA256 hash)
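The TTL-plus-hash behavior above can be sketched as follows. This is an illustrative Python sketch, not the actual implementation; the class and method names are invented here:

```python
import hashlib
import json
import time

class SchemaCache:
    """Per-server cache of tools/list responses with a TTL and
    hash-based invalidation (sketch; names are illustrative)."""

    def __init__(self, ttl_seconds=3600):  # default TTL: 1 hour
        self.ttl = ttl_seconds
        self._entries = {}  # server_id -> (schema, sha256 digest, cached_at)

    @staticmethod
    def _digest(schema):
        # Canonical JSON so key ordering does not change the hash
        blob = json.dumps(schema, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def put(self, server_id, schema):
        self._entries[server_id] = (schema, self._digest(schema), time.time())

    def get(self, server_id, current_schema=None):
        entry = self._entries.get(server_id)
        if entry is None:
            return None
        schema, digest, cached_at = entry
        if time.time() - cached_at > self.ttl:
            return None  # expired: TTL exceeded
        if current_schema is not None and self._digest(current_schema) != digest:
            return None  # upstream schema changed -> invalidate
        return schema
```

Hashing a canonical (sorted-keys) JSON serialization means two semantically identical schemas always produce the same digest, so only a real schema change invalidates the cache.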
Session Dedup
Skips resending the tool list if it has not changed within the same AI session. After the first tools/list response, subsequent requests in the same session return a lightweight sentinel instead of the full schema.
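A minimal sketch of the dedup logic, assuming a per-(session, server) hash of the last tool list sent (names and the sentinel shape are illustrative, not the real wire format):

```python
import hashlib
import json

class SessionDedup:
    """Returns the full tool list only when it differs from what
    this session has already seen (sketch)."""

    def __init__(self):
        self._seen = {}  # (session_id, server_id) -> schema digest

    def filter(self, session_id, server_id, tools):
        digest = hashlib.sha256(
            json.dumps(tools, sort_keys=True).encode("utf-8")
        ).hexdigest()
        key = (session_id, server_id)
        if self._seen.get(key) == digest:
            # Unchanged within this session: send a lightweight sentinel
            return {"unchanged": True}
        self._seen[key] = digest
        return {"tools": tools}
```

The first call in a session pays the full schema cost; repeats cost only the sentinel.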
Lazy Loading
Instead of sending full tool schemas upfront, sends only tool names and a brief summary (first sentence of the description, up to 120 characters). The AI can request the full schema for any specific tool on demand.
- Tool threshold: the minimum number of tools before lazy loading activates (default: 20). Also activates if the total schema exceeds 20KB.
- Injects a synthetic mcpfw_get_tool_schema tool that the AI calls to fetch a specific tool’s full definition.
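The summarization and activation rules above can be sketched like this (function names and the exact sentence-splitting heuristic are assumptions for illustration):

```python
import json
import re

def tool_summary(description: str, limit: int = 120) -> str:
    """First sentence of the description, capped at `limit` characters."""
    match = re.search(r"[.!?](\s|$)", description)
    first = description[: match.end()].strip() if match else description.strip()
    if len(first) <= limit:
        return first
    # Cut at a word boundary and mark the truncation
    return first[:limit].rsplit(" ", 1)[0] + "…"

def lazy_listing(tools, threshold=20, size_limit=20 * 1024):
    """Full schemas for small servers; names + summaries otherwise."""
    full_size = len(json.dumps(tools))
    if len(tools) < threshold and full_size <= size_limit:
        return tools  # below both limits: lazy loading stays inactive
    return [
        {"name": t["name"], "summary": tool_summary(t.get("description", ""))}
        for t in tools
    ]
```

Either trigger (20+ tools, or a total schema over 20KB) switches the listing to the compact form.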
Budget Enforcement
Caps the size of tool responses to prevent oversized payloads from consuming your token budget.
- Budget: maximum tokens per response (1,000 to 100,000, default 8,000)
- Oversized responses are truncated, and the remainder is stored for pagination
- Injects a synthetic mcpfw_read_more tool that the AI calls to fetch the next page
The AI sees a truncated response with a note that more data is available, then requests additional pages as needed.
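The truncate-and-page flow might look like this sketch. The ~4-characters-per-token estimate and the cursor/note format are assumptions, not the real implementation:

```python
import uuid

class BudgetEnforcer:
    """Truncates oversized responses and stores the remainder for paging
    (sketch; token counting here is a rough chars/4 heuristic)."""

    def __init__(self, budget_tokens=8000, chars_per_token=4):
        self.budget_chars = budget_tokens * chars_per_token
        self._continuations = {}  # cursor -> remaining text

    def enforce(self, response_text):
        """Returns (page, cursor); cursor is None when nothing remains."""
        if len(response_text) <= self.budget_chars:
            return response_text, None
        cursor = str(uuid.uuid4())
        self._continuations[cursor] = response_text[self.budget_chars:]
        head = response_text[: self.budget_chars]
        note = f"\n[truncated: more data available via mcpfw_read_more, cursor {cursor}]"
        return head + note, cursor

    def read_more(self, cursor):
        """Next page for a stored continuation; re-paginates if still oversized."""
        remainder = self._continuations.pop(cursor, "")
        return self.enforce(remainder) if remainder else ("", None)
```

Each page is itself subject to the budget, so a very large response is walked in budget-sized steps.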
Null/Empty Stripping
Removes null values, empty strings, empty arrays, and empty objects from JSON responses. Many tool responses contain fields like "metadata": null or "tags": [] that add tokens without information.
- Threshold: minimum response size before stripping activates (100 to 10,000 tokens, default 1,000)
- Only applies if the response is large enough to benefit
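A recursive strip with the size threshold can be sketched as follows (function names and the chars-per-token heuristic are illustrative assumptions):

```python
import json

_EMPTY = (None, "", [], {})

def strip_empty(value):
    """Recursively drop null, empty-string, empty-array, and
    empty-object values from a JSON-like structure."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in _EMPTY}
    if isinstance(value, list):
        cleaned = [strip_empty(v) for v in value]
        return [v for v in cleaned if v not in _EMPTY]
    return value

def maybe_strip(response, threshold_tokens=1000, chars_per_token=4):
    """Only strip when the response is large enough to benefit."""
    if len(json.dumps(response)) < threshold_tokens * chars_per_token:
        return response  # below threshold: leave untouched
    return strip_empty(response)
```

Note that falsy-but-meaningful values like 0 and false survive; only the four empty forms are removed.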
Description Truncation
Shortens tool descriptions in the schema to reduce bloat. Truncates at the nearest word boundary and appends "…" when shortened.
- Character limit: maximum description length (50 to 2,000 characters, default 200)
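The word-boundary truncation can be sketched in a few lines (the function name is illustrative):

```python
def truncate_description(text: str, limit: int = 200) -> str:
    """Cap a description at `limit` characters, cutting at the
    nearest word boundary and appending an ellipsis when shortened."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop the partial trailing word
    return cut.rstrip() + "…"
```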
Result Caching
Caches the results of read-only tool calls. If the AI calls the same tool with the same arguments, the cached result is returned without hitting the upstream server.
- TTL: how long cached results are valid (60 seconds to 24 hours, default 5 minutes)
- Only applies to “safe” tools (those without write, create, update, delete, or similar words in their name)
- Cache key is a SHA256 hash of the server ID, tool name, and sorted arguments
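The safety check and cache key can be sketched as below. The exact word list for "unsafe" tools is an assumption; the source only names write, create, update, and delete:

```python
import hashlib
import json

# Assumed deny-list; the real list may include other mutating words
UNSAFE_WORDS = ("write", "create", "update", "delete", "remove", "insert", "drop")

def is_safe_tool(name: str) -> bool:
    """A tool is cacheable only if its name suggests no side effects."""
    lowered = name.lower()
    return not any(word in lowered for word in UNSAFE_WORDS)

def cache_key(server_id: str, tool_name: str, arguments: dict) -> str:
    """SHA256 over server ID, tool name, and arguments with sorted keys,
    so argument ordering does not fragment the cache."""
    blob = json.dumps([server_id, tool_name, arguments], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Entries keyed this way can then live in a TTL map like the schema cache above, expiring after the configured window (default 5 minutes).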
TOON Encoding (Experimental)
Compresses JSON responses into a format that LLMs can read efficiently. Uses indentation instead of braces, tabular format for uniform arrays, and bare strings where possible.
- Savings of 20-60% on structured and tabular data
- Only applies to JSON responses of 512 bytes or more
- Falls back to regular JSON if encoding fails or produces a larger result
This is marked as experimental. It works well with Claude models but may not be understood by all AI clients.
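To illustrate only the tabular idea, here is a heavily simplified sketch. The real TOON grammar has quoting, nesting, and escaping rules that this does not implement, and the header syntax here is an approximation:

```python
import json

def encode_tabular(rows):
    """TOON-style sketch: a uniform array of flat objects becomes one
    header line plus one comma-joined row per element. Returns None
    when the input is not a uniform array of objects (no quoting or
    nesting support in this sketch)."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return None
    keys = list(rows[0].keys())
    if any(list(r.keys()) != keys for r in rows):
        return None  # not uniform -> cannot tabulate
    lines = ["[%d]{%s}:" % (len(rows), ",".join(keys))]
    for r in rows:
        lines.append("  " + ",".join(str(r[k]) for k in keys))
    return "\n".join(lines)

def maybe_toon(rows):
    """Fall back to plain JSON if encoding fails or is not smaller."""
    toon = encode_tabular(rows)
    plain = json.dumps(rows)
    if toon is None or len(toon) >= len(plain):
        return plain
    return toon
```

On a uniform two-row array the tabular form states each field name once instead of once per row, which is where the savings on structured data come from.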
Measuring savings
The Token Log on the Monitor page shows:
- Total tokens routed and saved
- Savings rate as a percentage
- Estimated cost avoided (based on your selected model pricing)
- Breakdown by optimizer technique
- Per-server savings with sparkline trends
Usage data is recorded in hourly buckets, rolled up to daily and monthly aggregates. You can view 24-hour, 7-day, and 30-day ranges.
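The hourly-to-daily/monthly rollup can be sketched like this (bucket key format and function names are assumptions):

```python
from collections import defaultdict
from datetime import datetime

def bucket_hour(ts: datetime) -> str:
    """Key for the hourly bucket a timestamp falls into."""
    return ts.strftime("%Y-%m-%dT%H:00")

def rollup(hourly: dict, fmt: str = "%Y-%m-%d") -> dict:
    """Aggregate hourly savings buckets into daily totals
    (pass fmt='%Y-%m' for monthly)."""
    totals = defaultdict(int)
    for key, saved in hourly.items():
        period = datetime.strptime(key, "%Y-%m-%dT%H:%M").strftime(fmt)
        totals[period] += saved
    return dict(totals)
```

The 24-hour view reads hourly buckets directly; the 7-day and 30-day views read the rolled-up aggregates.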