🚀 MCP Code Execution Server: Zero-Context Discovery for 100+ MCP Tools
This bridge implements Anthropic's discovery pattern with rootless security. It proxies any stdio MCP server while cutting per-query MCP context from roughly 30,000 tokens to about 200.

🚀 Quick Start
1. Prerequisites (macOS or Linux)
- Check the Python version:
python3 --version
- If needed, install Python 3.14 via your package manager or from python.org.
- Install a container runtime. On macOS, run brew install podman or brew install --cask docker. On Ubuntu/Debian, run sudo apt-get install -y podman or curl -fsSL https://get.docker.com | sh.
- Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Pull the sandbox image with whichever runtime you installed:
podman pull python:3.14-slim
docker pull python:3.14-slim
Note on Pydantic compatibility (Python 3.14):
If you use Python 3.14, make sure you have a modern Pydantic release installed (e.g., pydantic >= 2.12.0). Some older Pydantic versions or environments that install a separate typing package from PyPI may raise errors like:
TypeError: _eval_type() got an unexpected keyword argument 'prefer_fwd_module'
If you encounter this error, run:
pip install -U pydantic
pip uninstall typing
Then re-run the project setup (e.g., remove .venv/ and run uv sync).
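To confirm the synced environment picked up a compatible release, a quick check (Pydantic v2 exposes its version string as pydantic.VERSION):
import pydantic
print(pydantic.VERSION)  # expect 2.12.0 or newer when running on Python 3.14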
2. Install Dependencies
Use uv to sync the project environment:
uv sync
3. Launch Bridge
uvx --from git+https://github.com/elusznik/mcp-server-code-execution-mode mcp-server-code-execution-mode run
If you prefer to run from a local checkout, use the equivalent command:
uv run python mcp_server_code_execution_mode.py
4. Register with Your Agent
Add the following server configuration to your agent's MCP settings file (e.g., mcp_config.json, claude_desktop_config.json, etc.):
{
  "mcpServers": {
    "mcp-server-code-execution-mode": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/elusznik/mcp-server-code-execution-mode",
        "mcp-server-code-execution-mode",
        "run"
      ],
      "env": {
        "MCP_BRIDGE_RUNTIME": "podman"
      }
    }
  }
}
5. Execute Code
Inside run_python, the agent calls the generated proxies directly:
result = await mcp_filesystem.read_file(path='/tmp/test.txt')
data = await mcp_search.search(query="TODO")
await mcp_github.create_issue(repo='owner/repo', title=data.title)
Load Servers Explicitly
run_python only loads the MCP servers you request. Pass them via the servers array when you invoke the tool so proxies such as mcp_serena or mcp_filesystem become available inside the sandbox:
{
  "code": "print(await mcp_serena.search(query='latest AI papers'))",
  "servers": ["serena", "filesystem"]
}
If you omit the list, the discovery helpers still enumerate everything, but any RPC call that targets an unloaded server returns Server '<name>' is not available.
Note: The servers array only controls which proxies are generated for a sandbox invocation; it does not set server configuration fields such as cwd. Server configurations can include an optional cwd property, and when present the bridge starts that host MCP server process in the given working directory. Agents should call runtime.describe_server(name) or inspect runtime.list_loaded_server_metadata() to discover the configured cwd before making assumptions about a server's working directory.
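A minimal sketch of that check from inside the sandbox; whether describe_server() exposes a cwd field is an assumption about the returned metadata:
from mcp import runtime

# Only servers passed in the "servers" array are generated as proxies.
loaded = {entry["name"] for entry in runtime.list_loaded_server_metadata()}
if "filesystem" in loaded:
    info = runtime.describe_server("filesystem")
    # A configured working directory is assumed to surface as a "cwd" key.
    print("filesystem cwd:", info.get("cwd", "<not configured>"))
else:
    print("filesystem was not loaded for this invocation")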
✨ Features
🛡️ Robustness & Reliability
- Lazy Runtime Detection: Starts up instantly even if Podman/Docker isn't ready. Checks for runtime only when code execution is requested.
- Self-Reference Prevention: Automatically detects and skips configurations that would launch the bridge recursively.
- Noise Filtering: Ignores benign JSON parse errors (like blank lines) from chatty MCP clients.
- Smart Volume Sharing: Probes Podman VMs to ensure volume sharing works, even on older versions.
🔒 Security First
- Rootless containers: No privileged helpers required.
- Network isolation: No network access.
- Read-only filesystem: Immutable root.
- Dropped capabilities: No system access.
- Unprivileged user: Runs as UID 65534.
- Resource limits: Memory, PIDs, CPU, time.
- Auto-cleanup: Temporary IPC directories.
⚡ Performance
- Persistent sessions: Variables and state retained across calls.
- Persistent clients: MCP servers stay warm.
- Context efficiency: 95%+ reduction vs traditional MCP.
- Async execution: Proper resource management.
- Single tool: Only
run_python in Claude's context.
🔧 Developer Experience
Response Formats
- Default (compact): Responses render as plain text plus a minimal
structuredContent payload containing only non-empty fields. stdout/stderr lines stay intact, so prompts remain lean without sacrificing content.
- Optional TOON: Set
MCP_BRIDGE_OUTPUT_MODE=toon to emit Token-Oriented Object Notation blocks. We still drop empty fields and mirror the same structure in structuredContent; TOON is handy when you want deterministic tokenisation for downstream prompts.
- Fallback JSON: If the TOON encoder is unavailable, we automatically fall back to pretty JSON blocks while preserving the trimmed payload.
Discovery Workflow
SANDBOX_HELPERS_SUMMARY in the tool schema only advertises the discovery helpers (discovered_servers(), list_servers(), query_tool_docs(), search_tool_docs(), etc.). It never includes individual server or tool documentation.
- On first use, the LLM typically calls
discovered_servers() (or list_servers_sync() for the cached list) to enumerate MCP servers, then query_tool_docs(server) / query_tool_docs_sync(server) or search_tool_docs("keyword") / search_tool_docs_sync("keyword") to fetch the relevant subset of documentation.
- Tool metadata is streamed on demand, keeping the system prompt at roughly 200 tokens regardless of how many servers or tools are installed.
- Once the LLM has the docs it needs, it writes Python that uses the generated
mcp_<alias> proxies or mcp.runtime helpers to invoke tools.
Need a short description without probing the helpers? Call runtime.capability_summary() to print a one-paragraph overview suitable for replying to questions such as “what can the code-execution MCP do?”
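Put together, the discovery loop above fits in a few lines of sandbox code. A minimal sketch using only the helpers named above (the "read file" keyword is just an example query):
from mcp import runtime

# 1. Enumerate servers without pulling any tool schemas into context.
servers = runtime.discovered_servers()
print("Available:", servers)

# 2. Hydrate documentation only for what matters right now.
if servers:
    docs = await runtime.query_tool_docs(servers[0])
    print("Loaded", len(docs), "tool summaries for", servers[0])

# 3. Or search across every server by keyword instead of by name.
for hit in await runtime.search_tool_docs("read file", limit=3):
    print(hit["server"], hit["tool"], hit.get("description", ""))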
📦 Installation
Prerequisites
- macOS or Linux
- Python 3.14+
- Podman or Docker
Steps
- Follow the steps in the Quick Start section to install the necessary dependencies and launch the bridge.
💻 Usage Examples
File Processing
files = await mcp_filesystem.list_directory(path='/tmp')
for file in files:
    content = await mcp_filesystem.read_file(path=file)
    if 'TODO' in content:
        print(f"TODO in {file}")
Data Pipeline
transcript = await mcp_google_drive.get_document(documentId='abc123')
summary = transcript[:500] + "..."
await mcp_salesforce.update_record(
    objectType='SalesMeeting',
    recordId='00Q5f000001abcXYZ',
    data={'Notes': summary}
)
Multi-System Workflow
issues = await mcp_jira.search_issues(project='API', status='Open')
for issue in issues:
    details = await mcp_jira.get_issue(id=issue.id)
    if 'bug' in details.description.lower():
        await mcp_github.create_issue(
            repo='owner/repo',
            title=f"Bug: {issue.title}",
            body=details.description
        )
Inspect Available Servers
from mcp import runtime
print("Discovered:", runtime.discovered_servers())
print("Cached servers:", runtime.list_servers_sync())
print("Loaded metadata:", runtime.list_loaded_server_metadata())
print("Selectable via RPC:", await runtime.list_servers())
loaded = runtime.list_loaded_server_metadata()
if loaded:
    first = runtime.describe_server(loaded[0]["name"])
    for tool in first["tools"]:
        print(tool["alias"], "→", tool.get("description", ""))
if loaded:
    summaries = await runtime.query_tool_docs(loaded[0]["name"])
    detailed = await runtime.query_tool_docs(
        loaded[0]["name"],
        tool=summaries[0]["toolAlias"],
        detail="full",
    )
    print("Summaries:", summaries)
    print("Cached tools:", runtime.list_tools_sync(loaded[0]["name"]))
    print("Detailed doc:", detailed)
results = await runtime.search_tool_docs("calendar events", limit=3)
for result in results:
    print(result["server"], result["tool"], result.get("description", ""))
print("Capability summary:", runtime.capability_summary())
print("Docs from cache:", runtime.query_tool_docs_sync(loaded[0]["name"]) if loaded else [])
print("Search from cache:", runtime.search_tool_docs_sync("calendar"))
Example output seen by the LLM when running the snippet above with the stub server:
Discovered: ('stub',)
Loaded metadata: ({'name': 'stub', 'alias': 'stub', 'tools': [{'name': 'echo', 'alias': 'echo', 'description': 'Echo the provided message', 'input_schema': {...}}]},)
Selectable via RPC: ('stub',)
Clients that prefer listMcpResources can skip executing the helper snippet and instead request the resource://mcp-server-code-execution-mode/capabilities resource. The server advertises it via resources/list, and reading it returns the same helper summary plus a short checklist for loading servers explicitly.
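A host-side client can do this with the reference MCP Python SDK. The sketch below is an assumption about your client setup (SDK import paths and session calls); it reuses the uvx launch arguments from the registration example above:
import asyncio
from pydantic import AnyUrl
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(
        command="uvx",
        args=[
            "--from",
            "git+https://github.com/elusznik/mcp-server-code-execution-mode",
            "mcp-server-code-execution-mode",
            "run",
        ],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List advertised resources, then read the capability summary.
            print(await session.list_resources())
            result = await session.read_resource(
                AnyUrl("resource://mcp-server-code-execution-mode/capabilities")
            )
            print(result)

asyncio.run(main())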
📚 Documentation
- README.md: This file; a quick-start guide.
- GUIDE.md: A comprehensive user guide.
- ARCHITECTURE.md: A technical deep dive into the project's architecture.
- HISTORY.md: Details the evolution and lessons learned.
- STATUS.md: Shows the current state and roadmap of the project.
🔧 Technical Details
Overview
This bridge implements the "Code Execution with MCP" pattern, which combines ideas from industry leaders:
- Apple's CodeAct: "Your LLM Agent Acts Better when Generating Code."
- Anthropic's Code execution with MCP: "Building more efficient agents."
- Cloudflare's Code Mode: "LLMs are better at writing code to call MCP, than at calling MCP directly."
- Docker's Dynamic MCPs: "Stop Hardcoding Your Agents’ World."
- Terminal Bench's Terminus: "A realistic terminal environment for evaluating LLM agents."
Instead of exposing hundreds of individual tools to the LLM (which consumes massive context and confuses the model), this bridge exposes one tool: run_python. The LLM writes Python code to discover, call, and compose other tools.
Why This vs. JS "Code Mode"?
While there are JavaScript-based alternatives (like universal-tool-calling-protocol/code-mode), this project is built for Data Science and Security:
| Feature | This Project (Python) | JS Code Mode (Node.js) |
| --- | --- | --- |
| Native Language | Python (the language of AI/ML) | TypeScript/JavaScript |
| Data Science | Native (pandas, numpy, scikit-learn) | Impossible / hacky |
| Isolation | Hard (Podman/Docker containers) | Soft (Node.js VM) |
| Security | Enterprise (rootless, no network, read-only) | Process-level |
| Philosophy | Infrastructure (standalone bridge) | Library (embeddable) |
Choose this if: You want your agent to analyze data, generate charts, use scientific libraries, or if you require strict container-based isolation for running untrusted code.
What This Solves (That Others Don't)
The Pain: MCP Token Bankruptcy
Connecting Claude to 11 MCP servers with ~100 tools requires 30,000 tokens of tool schemas loaded into every prompt. That's $0.09 per query before you ask a single question. Scaling to 50 servers will break your context window.
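The arithmetic behind that figure, as a sketch (the per-token price is an assumption of roughly $3 per million input tokens):
# Rough cost of preloading all tool schemas on every query.
schema_tokens = 30_000          # tokens of tool schemas per prompt
price_per_million = 3.00        # assumed input price in USD per 1M tokens
cost_per_query = schema_tokens / 1_000_000 * price_per_million
print(f"${cost_per_query:.2f} per query")  # -> $0.09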
Why Existing "Solutions" Fail
- Docker MCP Gateway: Manages containers well, but still streams all tool schemas into Claude's context. No token optimization.
- Cloudflare Code Mode: V8 isolates are fast, but you can't proxy your existing MCP servers (Serena, Wolfram, custom tools). Platform lock-in.
- Academic Papers: Describe Anthropic's discovery pattern, but provide no hardened implementation.
- Proofs of Concept: Skip security (no rootless), skip persistence (cold starts), skip proxying edge cases.
The Fix: Discovery-First Architecture
- Constant 200-token overhead regardless of server count.
- Proxy any stdio MCP server into rootless containers.
- Fuzzy search across servers without preloading schemas.
- Production-hardened with capability dropping and security isolation.
Architecture: How It Differs
Traditional MCP (Context-Bound)
┌─────────────────────────────┐
│ LLM Context (30K tokens) │
│ - serverA.tool1: {...} │
│ - serverA.tool2: {...} │
│ - serverB.tool1: {...} │
│ - … (dozens more) │
└─────────────────────────────┘
↓
LLM picks tool
↓
Tool executes
This Bridge (Discovery-First)
┌─────────────────────────────┐
│ LLM Context (≈200 tokens) │
│ “Use discovered_servers(), │
│ query_tool_docs(), │
│ search_tool_docs()” │
└─────────────────────────────┘
↓
LLM discovers servers
↓
LLM hydrates schemas
↓
LLM writes Python
↓
Bridge proxies execution
The result is a constant overhead. Whether you manage 10 or 1000 tools, the system prompt stays right-sized and schemas flow only when requested.
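To make that scaling concrete, a back-of-the-envelope sketch (the ~300-token average per tool schema is an assumption chosen to match the 30K figure above for ~100 tools):
# Context overhead as the tool catalog grows.
TOKENS_PER_SCHEMA = 300      # assumed average schema size per tool
BRIDGE_OVERHEAD = 200        # constant discovery-helper prompt
for n_tools in (10, 100, 1000):
    traditional = n_tools * TOKENS_PER_SCHEMA
    print(f"{n_tools:>5} tools: traditional ≈ {traditional:,} tokens, bridge ≈ {BRIDGE_OVERHEAD} tokens")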
Comparison At A Glance
| Capability | Docker MCP Gateway | Cloudflare Code Mode | Research Patterns | This Bridge |
| --- | --- | --- | --- | --- |
| Solves token bloat | ❌ Manual preload | ❌ Fixed catalog | ❌ Theory only | ✅ Discovery runtime |
| Universal MCP proxying | ✅ Containers | ⚠️ Platform-specific | ❌ Not provided | ✅ Any stdio server |
| Rootless security | ⚠️ Optional | ✅ V8 isolate | ❌ Not addressed | ✅ Cap-dropped sandbox |
| Auto-discovery | ⚠️ Catalog-bound | ❌ N/A | ❌ Not implemented | ✅ 9 config paths |
| Tool doc search | ❌ | ❌ | ⚠️ Conceptual | ✅ search_tool_docs() |
| Production hardening | ⚠️ Depends on you | ✅ Managed service | ❌ Prototype | ✅ Tested bridge |
Vs. Dynamic Toolsets (Speakeasy)
Speakeasy's Dynamic Toolsets use a 3-step flow: search_tools → describe_tools → execute_tool. While this saves tokens, it forces the agent into a "chatty" loop:
- Search: "Find tools for GitHub issues"
- Describe: "Get schema for
create_issue"
- Execute: "Call
create_issue"
This Bridge (Code-First) collapses that loop:
- Code: "Import
mcp_github, search for 'issues', and create one if missing."
The agent writes a single Python script that performs discovery, logic, and execution in one round-trip. It's faster, cheaper (fewer intermediate LLM calls), and can handle complex logic (loops, retries) that a simple "execute" tool cannot.
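A sketch of what that single round-trip can look like inside run_python (the search keyword, repository name, and issue fields are illustrative, and the github server is assumed to be loaded via the servers array):
from mcp import runtime

# Discovery, logic, and execution in one script instead of three separate tool calls.
hits = await runtime.search_tool_docs("github issues", limit=5)
print("Matched tools:", [(h["server"], h["tool"]) for h in hits])

# Call the proxy directly once the right tool is known.
if any(h["tool"] == "create_issue" for h in hits):
    await mcp_github.create_issue(
        repo='owner/repo',
        title='Bug: sandbox demo',
        body='Opened from a single run_python round-trip.',
    )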
Vs. OneMCP (Gentoro)
OneMCP provides a "Handbook" chat interface where you ask questions and it plans execution. This is great for simple queries but turns the execution into a black box.
This Bridge gives the agent raw, sandboxed control. The agent isn't asking a black box to "do it"; the agent is the programmer, writing the exact code to interact with the API. This allows for precise edge-case handling and complex data processing that a natural language planner might miss.
Unique Features
- Two-stage discovery:
discovered_servers() reveals what exists; query_tool_docs(name) loads only the schemas you need.
- Fuzzy search across servers: Let the model find tools without memorising catalog names:
from mcp import runtime
matches = await runtime.search_tool_docs("calendar events", limit=5)
for hit in matches:
    print(hit["server"], hit["tool"], hit.get("description", ""))
- Zero-copy proxying: Every tool call stays within the sandbox, mirrored over stdio with strict timeouts.
- Rootless by default: Podman/Docker containers run with
--cap-drop=ALL, read-only root, no-new-privileges, and explicit memory/PID caps.
- Compact + TOON output: Minimal plain-text responses for most runs, with deterministic TOON blocks available via
MCP_BRIDGE_OUTPUT_MODE=toon.
Who This Helps
- Teams juggling double-digit MCP servers who cannot afford context bloat.
- Agents that orchestrate loops, retries, and conditionals rather than single tool invocations.
- Security-conscious operators who need rootless isolation for LLM-generated code.
- Practitioners who want to reuse existing MCP catalogs without hand-curating manifests.
Philosophy: The "No-MCP" Approach
This server aligns with the philosophy that you might not need MCP at all for every little tool. Instead of building rigid MCP servers for simple tasks, you can use this server to give your agent raw, sandboxed access to Bash and Python.
- Ad-Hoc Tools: Need a script to scrape a site or parse a file? Just write it and run it. No need to deploy a new MCP server (see the sketch after this list).
- Composability: Pipe outputs between commands, save intermediate results to files, and use standard Unix tools.
- Safety: Unlike giving an agent raw shell access to your machine, this server runs everything in a secure, rootless container. You get the power of "Bash/Code" without the risk.
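For instance, a throwaway data-munging task can run directly in the sandbox; the sketch below uses made-up inline data purely for illustration:
import csv
import io

# Hypothetical ad-hoc task: summarise a CSV snippet without standing up a new MCP server.
raw = """service,errors
api,3
worker,0
scheduler,7
"""

rows = list(csv.DictReader(io.StringIO(raw)))
worst = max(rows, key=lambda row: int(row["errors"]))
print(f"{len(rows)} services checked; most errors: {worst['service']} ({worst['errors']})")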
Configuration
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| MCP_BRIDGE_RUNTIME | auto | Container runtime (podman/docker) |
| MCP_BRIDGE_IMAGE | python:3.14-slim | Container image |
| MCP_BRIDGE_TIMEOUT | 30s | Default timeout |
| MCP_BRIDGE_MAX_TIMEOUT | 120s | Max timeout |
| MCP_BRIDGE_MEMORY | 512m | Memory limit |
| MCP_BRIDGE_PIDS | 128 | Process limit |
| MCP_BRIDGE_CPUS | - | CPU limit |
| MCP_BRIDGE_CONTAINER_USER | 65534:65534 | Run as UID:GID |
| MCP_BRIDGE_RUNTIME_IDLE_TIMEOUT | 300s | Shutdown delay |
| MCP_BRIDGE_STATE_DIR | ~/MCPs | Host directory for IPC sockets and temp state |
| MCP_BRIDGE_OUTPUT_MODE | compact | Response text format (compact or toon) |
| MCP_BRIDGE_LOG_LEVEL | INFO | Bridge logging verbosity |
Server Discovery
Primary Location:
~/MCPs/*.json (Recommended)
Note: Support for scanning individual agent configuration files (e.g., .claude.json, .vscode/mcp.json) is currently postponed. Please place all of your MCP server definition .json files in the ~/MCPs directory to ensure they are discovered.
Example Server (~/MCPs/filesystem.json):
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}
Note: To prevent recursive launches, the bridge automatically skips any config entry that appears to start mcp-server-code-execution-mode again (including uvx … mcp-server-code-execution-mode run). Set MCP_BRIDGE_ALLOW_SELF_SERVER=1 if you intentionally need to expose the bridge as a nested MCP server.
Docker MCP Gateway Integration
When you rely on docker mcp gateway run to expose third-party MCP servers, the bridge simply executes the gateway binary. The gateway is responsible for pulling tool images and wiring stdio transports, so make sure the host environment is ready:
- Run
docker login for every registry referenced in the gateway catalog (e.g., Docker Hub mcp/* images, ghcr.io/github/github-mcp-server). Without cached credentials, the pull step fails before any tools come online.
- Provide required secrets for those servers—
github-official needs github.personal_access_token, others may expect API keys or auth tokens. Use docker mcp secret set <name> (or whichever mechanism your gateway is configured with) so the container sees the values at start-up.
- Mirror any volume mounts or environment variables that the catalog expects (filesystem paths, storage volumes, etc.). Missing mounts or credentials commonly surface as
failed to connect: calling "initialize": EOF during the stdio handshake.
- If
list_tools only returns the internal management helpers (mcp-add, code-mode, …), the gateway never finished initializing the external servers—check the gateway logs for missing secrets or registry access errors.
State Directory & Volume Sharing
- Runtime artifacts (including the generated
/ipc/entrypoint.py and related handshake metadata) live under ~/MCPs/ by default. Set MCP_BRIDGE_STATE_DIR to relocate them.
- When the selected runtime is Podman, the bridge automatically issues
podman machine set --rootful --now --volume <state_dir>:<state_dir> so the VM can mount the directory. On older podman machine builds that do not support --volume, the bridge now probes the VM with podman machine ssh test -d <state_dir> and proceeds if the share is already available.
- Docker Desktop does not expose a CLI for file sharing; ensure the chosen state directory is marked as shared in Docker Desktop → Settings → Resources → File Sharing before running the bridge.
- To verify a share manually, run
docker run --rm -v ~/MCPs:/ipc alpine ls /ipc (or the Podman equivalent) and confirm the files are visible.
📄 License
This project is licensed under the GNU General Public License v3.0 (GPLv3).
Support
For issues or questions, refer to the documentation or file an issue.