OmniMCP
OmniMCP is a semantic routing tool that centrally manages multiple MCP servers through a single interface. It solves the context bloat problem caused by traditional MCP tool definitions, enables on-demand dynamic loading of tools, and significantly reduces token consumption.
2.5 points · 6.5K

What is OmniMCP?

OmniMCP is an intelligent semantic routing system designed to address the issues caused by an excessive number of tools in the MCP (Model Context Protocol) ecosystem. It centrally manages multiple MCP servers through a single interface, enabling AI assistants to intelligently discover and execute tools without loading all tool definitions, thereby saving a significant amount of context space.

How to use OmniMCP?

Using OmniMCP is straightforward: First, configure your list of MCP servers. Then, interact with all servers via a unified `sematic_router` tool. AI assistants only need to learn this one tool to access all connected services such as file systems, GitHub, databases, and Slack.
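As an illustration, a server configuration file might look like the sketch below. The `mcpServers` key follows the common MCP client convention, but the file name, field names, and commands here are assumptions for illustration, not OmniMCP's actual schema:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "<your-token>" }
    }
  }
}
```

Once the servers are configured and indexed, the AI assistant interacts with all of them through the single router tool rather than loading each server's tool definitions.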

Use Cases

OmniMCP is particularly suitable for the following scenarios:
1. Complex workflows that require simultaneous use of multiple MCP servers
2. Situations where an excessive number of tools slows down the AI assistant's responses
3. Automated processes that require cross-server task coordination
4. Scenarios where reducing the AI assistant's context consumption is desired to improve performance

Key Features

Semantic Search Tool
Search for all indexed tools using natural language descriptions without having to remember specific tool names. The system uses AI embedding technology to understand your intent and returns the most relevant tools.
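As a rough sketch of the idea (not OmniMCP's implementation, which uses AI embedding models), the following Python ranks a hypothetical tool index by bag-of-words cosine similarity to a natural-language query; all tool names and descriptions are invented:

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts: a toy stand-in for AI embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search_tools(query, tools, top_k=3):
    """Return the top_k tool names whose descriptions best match the query."""
    q = vectorize(query)
    ranked = sorted(tools, key=lambda t: cosine(q, vectorize(t["description"])),
                    reverse=True)
    return [t["name"] for t in ranked[:top_k]]

tools = [  # hypothetical tool index
    {"name": "fs.read_file", "description": "read a file from the local file system"},
    {"name": "github.create_pr", "description": "open a pull request on a github repository"},
    {"name": "slack.send_message", "description": "send a message to a slack channel"},
]
print(search_tools("read the contents of a file", tools, top_k=1))  # → ['fs.read_file']
```

A production system would replace `vectorize` with embedding vectors stored in a vector database, but the ranking step is conceptually the same.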
Single Unified Interface
Learn just one `sematic_router` tool to access all functions of MCP servers. The AI assistant's context only needs to store the definition of this one tool instead of dozens of them.
Lazy-Load Servers
MCP servers are started only when needed and automatically shut down after tasks are completed, saving system resources. Tool schemas are loaded only just before execution and do not occupy the initial context.
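The lazy-load pattern can be sketched with a context manager: start the server only when a call needs it, and shut it down when the task finishes. `LazyServer` and its methods are illustrative placeholders, not OmniMCP's real classes:

```python
from contextlib import contextmanager

class LazyServer:
    """Placeholder for an MCP server process manager.

    In a real system, start/stop would launch and terminate a
    subprocess speaking the MCP stdio protocol."""
    def __init__(self, name):
        self.name = name
        self.running = False

    def start(self):
        self.running = True   # e.g. subprocess.Popen(...) in a real manager

    def stop(self):
        self.running = False  # e.g. process.terminate()

    def call(self, tool, **kwargs):
        assert self.running, "server not started"
        return f"{self.name}.{tool} executed"

@contextmanager
def on_demand(server):
    """Lazy-load pattern: start only when needed, shut down when done."""
    server.start()
    try:
        yield server
    finally:
        server.stop()

fs = LazyServer("filesystem")
with on_demand(fs) as srv:
    result = srv.call("read_file", path="/tmp/notes.txt")
print(result, "| running after use:", fs.running)
```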
Intelligent Content Management
Automatically handles large results: text chunking, image description, and audio referencing, preventing large results from occupying excessive context space.
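The text-chunking part of this can be sketched in a few lines; splitting on words is a simplification (real systems typically count model tokens rather than words):

```python
def chunk_text(text, max_words=100):
    """Split a large tool result into word-bounded chunks so that only
    the chunk actually needed is placed into the model's context."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

big_result = "word " * 250  # stand-in for a large tool result
chunks = chunk_text(big_result, max_words=100)
print(len(chunks), [len(c.split()) for c in chunks])  # → 3 [100, 100, 50]
```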
Background Execution Mode
Long-running tasks can be executed in the background without blocking the conversation. Multiple tasks can run in parallel to improve efficiency.
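In Python, this kind of non-blocking parallel execution is commonly done with a thread pool; the sketch below uses `time.sleep` as a stand-in for long-running tool calls:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def long_task(name, seconds):
    """Stand-in for a long-running tool call (e.g. video generation)."""
    time.sleep(seconds)
    return f"{name} done"

# Submit tasks to run in the background; the main (conversation) thread
# is free to do other work until the results are actually needed.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(long_task, "video-render", 0.2),
               pool.submit(long_task, "news-search", 0.1)]
    results = [f.result() for f in futures]  # collect when needed
print(results)  # → ['video-render done', 'news-search done']
```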
Cross-Server Coordination
Easily coordinate multiple servers to complete complex workflows, such as: search for news → generate a video → convert the format → send a notification.
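The workflow above can be sketched as a chain where each step's output feeds the next; `route` here is a hypothetical stand-in for OmniMCP's unified routing call, not its real API:

```python
def route(server, tool, payload):
    """Hypothetical stand-in for a unified cross-server routing call."""
    return {"server": server, "tool": tool, "output": f"{tool}({payload})"}

# Chain steps across different servers, piping each output into the next
step1 = route("news", "search", "latest AI news")
step2 = route("video", "generate", step1["output"])
step3 = route("media", "to_gif", step2["output"])
step4 = route("slack", "notify", step3["output"])
print(step4["server"], "received:", step4["output"])
```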
Advantages
Context savings: Reduces context from over 60K tokens to approximately 500 tokens, saving over 99% of context space.
Reduced hallucinations: AI assistants select from 3-5 relevant tools instead of over 50 similar ones, resulting in more accurate selections.
Cache maintenance: The single tool's definition never changes, so the prompt cache stays valid, improving response speed.
Intelligent discovery: Find the correct tool through semantic search without having to remember complex names.
Resource optimization: Servers are started on demand, saving memory and CPU resources.
Easy scalability: Adding a new server only requires configuration without modifying the AI assistant's prompt.
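The headline context-savings figure is easy to verify from the numbers given above:

```python
# Sanity-check the claimed context saving (figures taken from the text above)
before, after = 60_000, 500          # tokens: all tool definitions vs. one router tool
saving = (before - after) / before   # fraction of context freed
print(f"{saving:.1%}")               # → 99.2%
```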
Limitations
Requires additional setup: Compared to directly using MCP servers, OmniMCP needs to be configured.
Depends on external services: It requires the OpenAI API for semantic search and content description.
Initial indexing time: Indexing multiple servers for the first time may take some time.
Learning curve: It is necessary to understand new workflows and operation methods.

How to Use

Install OmniMCP
Install OmniMCP using a package manager such as uv or pip.
Configure Environment Variables
Set the necessary API keys and storage paths.
Create a Server Configuration File
Create a JSON configuration file to define the MCP servers to connect to.
Index Server Tools
Run the indexing command to let OmniMCP know all available tools.
Start the OmniMCP Server
Start the HTTP server or stdio interface.
Configure the AI Assistant
Add OmniMCP to your AI assistant's MCP configuration.

Usage Examples

Cross-Server News Video Generation
Use multiple MCP servers to collaborate in generating a news video: search for the latest news in parallel, generate a video using AI, and convert the format to GIF.
Data Analysis Workflow
Read a CSV file, analyze the data, generate a report, and send it to Slack.
Code Review Automation
Automatically retrieve a GitHub PR, analyze code changes, and generate review comments.

Frequently Asked Questions

What's the difference between OmniMCP and directly using MCP servers?
Do I need to configure each MCP server separately?
Which MCP servers does OmniMCP support?
How accurate is the semantic search?
How are large files or images handled?
How to solve the problem of the 'uvx' command not being found?
Does OmniMCP have a performance overhead?
Can it run in Docker?

Related Resources

GitHub Repository
OmniMCP source code, issue tracking, and contribution guidelines.
PyPI Package
Python package distribution page to view versions and installation statistics.
MCP Official Documentation
Official documentation and specifications of the Model Context Protocol.
ScaleMCP Research Paper
Academic research on the scalability of MCP tools, similar in concept to OmniMCP.
Anthropic Tool Usage Guide
Anthropic's engineering practices for efficient tool usage.
uv Installation Guide
Installation and usage guide for the 'uv' package manager.
Qdrant Vector Database
The vector database used by OmniMCP for semantic search.

Installation

Copy the configuration command into your MCP client.
Note: your API key is sensitive information; do not share it with anyone.

Alternatives

Vestige (Rust · 4.5K · 4.5 points)
Vestige is an AI memory engine based on cognitive science. By implementing 29 neuroscience modules such as prediction-error gating, FSRS-6 spaced repetition, and memory dreaming, it provides long-term memory capabilities for AI. It includes a 3D visualization dashboard and 21 MCP tools, runs completely locally, and does not require the cloud.

Better Icons (TypeScript · 6.7K · 4.5 points)
An MCP server and CLI tool that provides search and retrieval of over 200,000 icons, supports more than 150 icon libraries, and helps AI assistants and developers quickly obtain and use icons.

Assistant UI (TypeScript · 7.3K · 5 points)
assistant-ui is an open-source TypeScript/React library for quickly building production-grade AI chat interfaces, providing composable UI components, streaming responses, accessibility, and more, and supporting multiple AI backends and models.

Apify MCP Server (TypeScript · 7.5K · 5 points)
The Apify MCP Server is a tool based on the Model Context Protocol (MCP) that allows AI assistants to extract data from websites such as social media, search engines, and e-commerce sites through thousands of ready-to-use crawlers, scrapers, and automation tools (Apify Actors). It supports OAuth and Skyfire proxy payment and can be integrated into MCP clients such as Claude and VS Code through HTTPS endpoints or local stdio.

Rsdoctor (TypeScript · 10.4K · 5 points)
Rsdoctor is a build analysis tool specifically designed for the Rspack ecosystem, fully compatible with webpack. It provides visual build analysis, multi-dimensional performance diagnosis, and intelligent optimization suggestions to help developers improve build efficiency and engineering quality.

Next Devtools MCP (TypeScript · 9.7K · 5 points)
The Next.js development tools MCP server provides Next.js development tools and utilities for AI programming assistants such as Claude and Cursor, including runtime diagnostics, development automation, and documentation access.

Testkube (Go · 6.5K · 5 points)
Testkube is a test orchestration and execution framework for cloud-native applications, providing a unified platform to define, run, and analyze tests. It supports existing testing tools and Kubernetes infrastructure.

MCP Windbg (Python · 10.5K · 5 points)
An MCP server that integrates AI models with WinDbg/CDB for analyzing Windows crash dump files and remote debugging, supporting natural language interaction to execute debugging commands.

Gitlab MCP Server (TypeScript · 24.4K · 4.3 points · Certified)
The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.

Notion Api MCP (Python · 20.4K · 4.5 points · Certified)
A Python-based MCP server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.

Duckduckgo MCP Server (Python · 71.7K · 4.3 points · Certified)
The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.

Markdownify MCP (TypeScript · 35.3K · 5 points)
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown.

Unity (C# · 31.1K · 5 points · Certified)
UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real-time state monitoring, remote command execution, and logging.

Figma Context MCP (TypeScript · 64.4K · 4.5 points)
Framelink Figma MCP Server provides access to Figma design data for AI programming tools (such as Cursor). By simplifying the Figma API response, it helps AI more accurately achieve one-click conversion from design to code.

Gmail MCP Server (TypeScript · 21.0K · 4.5 points)
A Gmail auto-authentication MCP server designed for Claude Desktop, supporting Gmail management through natural language interaction, including sending emails, label management, and batch operations.

Context7 (TypeScript · 96.8K · 4.7 points)
Context7 MCP is a service that provides real-time, version-specific documentation and code examples for AI programming assistants. It is integrated directly into prompts through the Model Context Protocol to solve the problem of LLMs using outdated information.
AIBase
© 2026 AIBase