🚀 CodeGraph CLI MCP Server
A high-performance CLI tool for managing MCP servers and indexing codebases with advanced architectural analysis capabilities.

🚀 Quick Start
1. Initialize a New Project
codegraph init
codegraph init --name my-project
2. Index Your Codebase
codegraph index .
codegraph index . --languages rust,python,typescript
RUST_LOG=info,codegraph_vector=debug codegraph index . --workers 10 --batch-size 256 --max-seq-len 512 --force
codegraph index . --watch
3. Start MCP Server
codegraph start stdio
codegraph start http --port 3000
codegraph start dual --port 3000
(Optional) Start with Local Embeddings
export CODEGRAPH_EMBEDDING_PROVIDER=local
export CODEGRAPH_LOCAL_MODEL=sentence-transformers/all-MiniLM-L6-v2
cargo run -p codegraph-api --features codegraph-vector/local-embeddings
4. Search Your Code
codegraph search "authentication handler"
codegraph search "fn authenticate" --search-type exact
codegraph search "function with async keyword" --search-type ast
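Search results can also be consumed programmatically. A minimal sketch, assuming codegraph is on PATH and the project is already indexed (the JSON field layout of --format json output is not documented here, so the loop just prints raw hits):
import json
import subprocess

# Run a semantic search and request machine-readable output
proc = subprocess.run(
    ["codegraph", "search", "authentication handler", "--format", "json", "--limit", "5"],
    capture_output=True, text=True, check=True,
)
for hit in json.loads(proc.stdout):
    print(hit)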
✨ Features
Core Features
- Project Indexing
  - Multi-language support (Rust, Python, JavaScript, TypeScript, Go, Java, C++)
  - Incremental indexing with file watching
  - Parallel processing with configurable workers
  - Smart caching for improved performance
- MCP Server Management
  - STDIO transport for direct communication
  - HTTP streaming with SSE support
  - Dual transport mode for maximum flexibility
  - Background daemon mode with PID management
- Code Search
  - Semantic search using embeddings
  - Exact match and fuzzy search
  - Regex and AST-based queries
  - Configurable similarity thresholds
- Architecture Analysis
  - Component relationship mapping
  - Dependency analysis
  - Code pattern detection
  - Architecture visualization support
📦 Installation
Method 1: Install from Source
git clone https://github.com/jakedismo/codegraph-cli-mcp.git
cd codegraph-cli-mcp
cargo build --release
cargo install --path crates/codegraph-mcp
codegraph --version
Enabling Local Embeddings (Optional)
If you want to use a local embedding model (Hugging Face) instead of remote providers:
- Build with the local embeddings feature for crates that use vector search (the API and/or CLI server):
cargo build -p codegraph-api --features codegraph-vector/local-embeddings
cargo build -p core-rag-mcp-server --features codegraph-vector/local-embeddings
- Set environment variables to switch the provider at runtime:
export CODEGRAPH_EMBEDDING_PROVIDER=local
export CODEGRAPH_LOCAL_MODEL=sentence-transformers/all-MiniLM-L6-v2
- Run as usual (the first run will download model files from Hugging Face and cache them locally):
cargo run -p codegraph-api --features codegraph-vector/local-embeddings
Model cache locations:
- Default Hugging Face cache: ~/.cache/huggingface (or $HF_HOME) via hf-hub
- You can pre-populate this cache to run offline after the first download
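One way to pre-populate the cache for offline use is the huggingface_hub Python package; this is just an assumption that any tool filling the standard HF cache works, and is not part of CodeGraph itself:
from huggingface_hub import snapshot_download

# Download the model into the standard Hugging Face cache (~/.cache/huggingface or $HF_HOME)
snapshot_download(repo_id="sentence-transformers/all-MiniLM-L6-v2")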
Method 2: Install Pre-built Binary
curl -L https://github.com/jakedismo/codegraph-cli-mcp/releases/latest/download/codegraph-$(uname -s)-$(uname -m).tar.gz | tar xz
sudo mv codegraph /usr/local/bin/
codegraph --version
Method 3: Using Cargo
cargo install codegraph-mcp
codegraph --version
💻 Usage Examples
Basic Usage
codegraph init --name my-project
codegraph index .
codegraph start stdio
codegraph search "authentication handler"
Advanced Usage
RUST_LOG=info,codegraph_vector=debug codegraph index . --workers 10 --batch-size 256 --max-seq-len 512 --force --languages rust,python,typescript
codegraph start http --port 3000
codegraph search "function with async keyword" --search-type ast
📚 Documentation
Overview
CodeGraph is a powerful CLI tool that combines MCP (Model Context Protocol) server management with sophisticated code analysis capabilities. It provides a unified interface for indexing projects, managing embeddings, and running MCP servers with multiple transport options. All you need now is an agent (or several) to build your very own deep code and project knowledge synthesizer system!
Key Capabilities
- 🔍 Advanced Code Analysis: Parse and analyze code across multiple languages using Tree-sitter
- 🚄 Dual Transport Support: Run MCP servers with STDIO, HTTP, or both simultaneously
- 🎯 Vector Search: Semantic code search using FAISS-powered vector embeddings
- 📊 Graph-Based Architecture: Navigate code relationships with RocksDB-backed graph storage
- ⚡ High Performance: Optimized for large codebases with parallel processing and batched embeddings
- 🔧 Flexible Configuration: Extensive configuration options for embedding models and performance tuning
RAW PERFORMANCE ✨✨✨
170K lines of Rust code parsed in 0.49 s and 21,024 embeddings generated in 3:24 min, on an M3 Pro (32 GB) running Qdrant/all-MiniLM-L6-v2-onnx on CPU with no Metal acceleration!
Parsing completed: 353/353 files, 169397 lines in 0.49s (714.5 files/s, 342852 lines/s)
[00:03:24] [########################################] 21024/21024 Embeddings complete
Architecture
CodeGraph System Architecture
┌─────────────────────────────────────────────────────┐
│                    CLI Interface                     │
│                   (codegraph CLI)                    │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                     Core Engine                      │
│ ┌─────────────┐  ┌──────────────┐  ┌────────────┐   │
│ │   Parser    │  │ Graph Store  │  │   Vector   │   │
│ │(Tree-sitter)│  │  (RocksDB)   │  │   Search   │   │
│ └─────────────┘  └──────────────┘  │  (FAISS)   │   │
│                                    └────────────┘   │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                  MCP Server Layer                    │
│ ┌─────────────┐  ┌──────────────┐  ┌────────────┐   │
│ │    STDIO    │  │     HTTP     │  │    Dual    │   │
│ │  Transport  │  │  Transport   │  │    Mode    │   │
│ └─────────────┘  └──────────────┘  └────────────┘   │
└─────────────────────────────────────────────────────┘
Embeddings with ONNX Runtime (macOS)
- Default provider: CPU EP. Works immediately with Homebrew onnxruntime.
- Optional CoreML EP: set CODEGRAPH_ONNX_EP=coreml to prefer CoreML when using an ONNX Runtime build that includes CoreML.
- Fallback: if CoreML EP initialization fails, CodeGraph logs a warning and falls back to CPU.
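To check whether an ONNX Runtime build exposes the CoreML execution provider, a quick check with the Python onnxruntime package works (assuming the installed package corresponds to the runtime build you intend to link against):
import onnxruntime as ort

# Lists execution providers compiled into this ONNX Runtime build,
# e.g. ['CoreMLExecutionProvider', 'CPUExecutionProvider']
print(ort.get_available_providers())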
How to use ONNX embeddings
CPU execution provider:
export CODEGRAPH_EMBEDDING_PROVIDER=onnx
export CODEGRAPH_ONNX_EP=cpu
export CODEGRAPH_LOCAL_MODEL=/path/to/model.onnx
CoreML execution provider:
export CODEGRAPH_EMBEDDING_PROVIDER=onnx
export CODEGRAPH_ONNX_EP=coreml
export CODEGRAPH_LOCAL_MODEL=/path/to/model.onnx
Install the CLI with ONNX support:
cargo install --path crates/codegraph-mcp --features "embeddings,codegraph-vector/onnx,faiss"
Notes
- ONNX Runtime on Apple platforms accelerates via CoreML, not Metal. If you need GPU acceleration on Apple Silicon, use CoreML where supported.
- Some models/operators may still run on CPU if CoreML doesn’t support them.
Enabling CoreML feature at build time
- The CoreML registration path is gated by the Cargo feature onnx-coreml in codegraph-vector.
- Build with:
cargo build -p codegraph-vector --features "onnx,onnx-coreml"
- In a full workspace build, enable it via your consuming crate's features or by adding --features codegraph-vector/onnx,codegraph-vector/onnx-coreml.
- You still need an ONNX Runtime library that was compiled with CoreML support; the feature only enables the registration call in our code.
Prerequisites
System Requirements
- Operating System: Linux, macOS, or Windows
- Rust: 1.75 or higher
- Memory: Minimum 4GB RAM (8GB recommended for large codebases)
- Disk Space: 1GB for installation + space for indexed data
Required Dependencies
macOS (Homebrew):
brew install cmake clang
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install cmake clang libssl-dev pkg-config
Fedora/RHEL:
sudo dnf install cmake clang openssl-devel
Optional Dependencies
- FAISS (for vector search acceleration)
- Local Embeddings (Hugging Face + Candle, or ONNX Runtime with CoreML; runs on macOS Metal, CUDA, or CPU)
- Enables on-device embedding generation (no external API calls)
- Downloads models from HuggingFace Hub on first run and caches them locally
- Internet access required for the initial model download (or pre-populate cache)
- Default runs on CPU; advanced GPU backends (CUDA/Metal) require appropriate hardware and drivers
- CUDA (for GPU-accelerated embeddings)
- Git (for repository integration)
Performance Benchmarks
Run repeatable, end-to-end benchmarks that measure indexing speed (with local embeddings + FAISS), vector search latency, and graph traversal throughput.
Build with performance features
Pick one of the local embedding backends and enable FAISS:
ONNX backend (CoreML/CPU):
cargo install --path crates/codegraph-mcp --features "embeddings,codegraph-vector/onnx,faiss"
Local HF + Candle backend (CPU/Metal/CUDA):
cargo install --path crates/codegraph-mcp --features "embeddings-local,faiss"
Configure local embedding backend
ONNX (CoreML/CPU):
export CODEGRAPH_EMBEDDING_PROVIDER=onnx
export CODEGRAPH_ONNX_EP=coreml
export CODEGRAPH_LOCAL_MODEL=/path/to/model.onnx
Local HF + Candle (CPU/Metal/CUDA):
export CODEGRAPH_EMBEDDING_PROVIDER=local
export CODEGRAPH_LOCAL_MODEL=sentence-transformers/all-MiniLM-L6-v2
Run the benchmark
codegraph perf . \
--langs rust,ts,go \
--warmup 3 --trials 20 \
--batch-size 128 --device metal \
--clean --format json
What it measures
- Indexing: total time to parse -> embed -> build FAISS (global + shards)
- Embedding throughput: embeddings per second
- Vector search: latency (avg/p50/p95) across repeated queries
- Graph traversal: BFS depth=2 micro-benchmark
Sample output (numbers will vary by machine and codebase)
{
  "env": {
    "embedding_provider": "local",
    "device": "metal",
    "features": { "faiss": true, "embeddings": true }
  },
  "dataset": {
    "path": "/repo/large-project",
    "languages": ["rust", "ts", "go"],
    "files": 18234,
    "lines": 2583190
  },
  "indexing": {
    "total_seconds": 186.4,
    "embeddings": 53421,
    "throughput_embeddings_per_sec": 286.6
  },
  "vector_search": {
    "queries": 100,
    "latency_ms": { "avg": 18.7, "p50": 12.3, "p95": 32.9 }
  },
  "graph": {
    "bfs_depth": 2,
    "visited_nodes": 1000,
    "elapsed_ms": 41.8
  }
}
Tips for reproducibility
- Use --clean for cold-start numbers, and run a second time for warm-cache numbers.
- Close background processes that may compete for CPU/GPU.
- Pin versions: rustc --version, FAISS build, and the embedding model.
- Record the host: CPU/GPU, RAM, storage, OS version.
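A small automation sketch for the cold/warm comparison, assuming codegraph perf prints JSON on stdout with --format json; the keys read below mirror the sample output above and may differ across versions:
import json
import subprocess

def run_perf(clean: bool) -> dict:
    cmd = ["codegraph", "perf", ".", "--langs", "rust", "--format", "json"]
    if clean:
        cmd.append("--clean")  # cold-start run
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

cold = run_perf(clean=True)
warm = run_perf(clean=False)

# Key names follow the sample output above; adjust if your version differs
for label, report in [("cold", cold), ("warm", warm)]:
    idx = report["indexing"]
    latency = report["vector_search"]["latency_ms"]
    print(f"{label}: {idx['total_seconds']} s indexing, "
          f"{idx['throughput_embeddings_per_sec']} emb/s, p95 search {latency['p95']} ms")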
CLI Commands
Global Options
codegraph [OPTIONS] <COMMAND>
Options:
-v, --verbose Enable verbose logging
--config <PATH> Configuration file path
-h, --help Print help
-V, --version Print version
Command Reference
init - Initialize CodeGraph Project
codegraph init [OPTIONS] [PATH]
Arguments:
[PATH] Project directory (default: current directory)
Options:
--name <NAME> Project name
--non-interactive Skip interactive setup
start - Start MCP Server
codegraph start <TRANSPORT> [OPTIONS]
Transports:
stdio STDIO transport (default)
http HTTP streaming transport
dual Both STDIO and HTTP
Options:
--config <PATH> Server configuration file
--daemon Run in background
--pid-file <PATH> PID file location
HTTP Options:
-h, --host <HOST> Host to bind (default: 127.0.0.1)
-p, --port <PORT> Port to bind (default: 3000)
--tls Enable TLS/HTTPS
--cert <PATH> TLS certificate file
--key <PATH> TLS key file
--cors Enable CORS
stop - Stop MCP Server
codegraph stop [OPTIONS]
Options:
--pid-file <PATH> PID file location
-f, --force Force stop without graceful shutdown
status - Check Server Status
codegraph status [OPTIONS]
Options:
--pid-file <PATH> PID file location
-d, --detailed Show detailed status information
index - Index a Project
codegraph index <PATH> [OPTIONS]
Arguments:
<PATH> Path to project directory
Options:
-l, --languages <LANGS> Languages to index (comma-separated)
--exclude <PATTERNS> Exclude patterns (gitignore format)
--include <PATTERNS> Include only these patterns
-r, --recursive Recursively index subdirectories
--force Force reindex
--watch Watch for changes
--workers <N> Number of parallel workers (default: 4)
search - Search Indexed Code
codegraph search <QUERY> [OPTIONS]
Arguments:
<QUERY> Search query
Options:
-t, --search-type <TYPE> Search type (semantic|exact|fuzzy|regex|ast)
-l, --limit <N> Maximum results (default: 10)
--threshold <FLOAT> Similarity threshold 0.0-1.0 (default: 0.7)
-f, --format <FORMAT> Output format (human|json|yaml|table)
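The --threshold value is a cosine-similarity cutoff on embedding vectors. A self-contained sketch of the scoring idea using the sentence-transformers package and the all-MiniLM-L6-v2 model mentioned above (this illustrates the principle, not CodeGraph's internal pipeline; the snippets are made up):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "authentication handler"
snippets = [
    "fn authenticate(user: &User, password: &str) -> Result<Session, AuthError>",
    "fn render_dashboard(ctx: &Context) -> Html",
]

# Normalized embeddings make the dot product equal to cosine similarity
q, *docs = model.encode([query] + snippets, normalize_embeddings=True)

threshold = 0.7  # corresponds to --threshold 0.7
for snippet, doc in zip(snippets, docs):
    score = float(q @ doc)
    verdict = "kept" if score >= threshold else "filtered out"
    print(f"{score:.2f} {verdict}: {snippet}")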
config - Manage Configuration
codegraph config <ACTION> [OPTIONS]
Actions:
show Show current configuration
set <KEY> <VALUE> Set configuration value
get <KEY> Get configuration value
reset Reset to defaults
validate Validate configuration
Options:
--json Output as JSON (for 'show')
-y, --yes Skip confirmation (for 'reset')
stats - Show Statistics
codegraph stats [OPTIONS]
Options:
--index Show index statistics
--server Show server statistics
--performance Show performance metrics
-f, --format <FMT> Output format (table|json|yaml|human)
clean - Clean Resources
codegraph clean [OPTIONS]
Options:
--index Clean index database
--vectors Clean vector embeddings
--cache Clean cache files
--all Clean all resources
-y, --yes Skip confirmation prompt
Configuration
Configuration File Structure
Create a .codegraph/config.toml file:
[general]
project_name = "my-project"
version = "1.0.0"
log_level = "info"
[indexing]
languages = ["rust", "python", "typescript"]
exclude_patterns = ["**/node_modules/**", "**/target/**", "**/.git/**"]
include_patterns = ["src/**", "lib/**"]
recursive = true
workers = 4
watch_enabled = false
incremental = true
[embedding]
model = "openai"
dimension = 1536
batch_size = 100
cache_enabled = true
cache_size_mb = 500
[vector]
index_type = "flat"
nprobe = 10
similarity_metric = "cosine"
[database]
path = "~/.codegraph/db"
cache_size_mb = 128
compression = true
write_buffer_size_mb = 64
[server]
default_transport = "stdio"
http_host = "127.0.0.1"
http_port = 3000
enable_tls = false
cors_enabled = true
max_connections = 100
[performance]
max_file_size_kb = 1024
parallel_threads = 8
memory_limit_mb = 2048
optimization_level = "balanced"
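The index_type and nprobe settings in [vector] map onto standard FAISS concepts: a flat index scans every vector exactly, while an IVF index clusters vectors and probes only nprobe clusters per query. A minimal FAISS sketch of that trade-off (illustrative only; CodeGraph's own index layout may differ):
import faiss
import numpy as np

d = 1536                                              # embedding dimension, as in [embedding]
xb = np.random.rand(10_000, d).astype("float32")      # stand-in for code embeddings
xq = np.random.rand(5, d).astype("float32")           # stand-in for query embeddings

# index_type = "flat": exact search over all vectors
flat = faiss.IndexFlatIP(d)
flat.add(xb)

# index_type = "ivf": cluster into nlist buckets, probe only nprobe of them per query
nlist = 100
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 10                                       # matches nprobe = 10 in the config

for name, index in [("flat", flat), ("ivf", ivf)]:
    distances, ids = index.search(xq, 5)
    print(name, ids[0])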
Environment Variables
export CODEGRAPH_LOG_LEVEL=debug
export CODEGRAPH_DB_PATH=/custom/path/db
export CODEGRAPH_EMBEDDING_MODEL=local
export CODEGRAPH_HTTP_PORT=8080
Embedding Model Configuration
[embedding.openai]
api_key = "${OPENAI_API_KEY}"
model = "text-embedding-3-large"
dimension = 3072
[embedding.local]
model_path = "~/.codegraph/models/codestral.gguf"
device = "cpu"
context_length = 8192
User Workflows
Workflow 1: Complete Project Setup and Analysis
codegraph init --name my-awesome-project
codegraph config set embedding.model local
codegraph config set performance.optimization_level speed
codegraph index . --languages rust,python --recursive
codegraph start http --port 3000 --daemon
codegraph search "database connection" --limit 20
codegraph stats --index --performance
Workflow 2: Continuous Development with Watch Mode
codegraph index . --watch --workers 8 &
codegraph start dual --daemon
codegraph status --detailed
codegraph search "TODO" --search-type exact
Workflow 3: Integration with AI Tools
codegraph start stdio
cat > ~/.codegraph/mcp-config.json << EOF
{
  "name": "codegraph-server",
  "version": "1.0.0",
  "tools": [
    {
      "name": "analyze_architecture",
      "description": "Analyze codebase architecture"
    },
    {
      "name": "find_patterns",
      "description": "Find code patterns and anti-patterns"
    }
  ]
}
EOF
Workflow 4: Large Codebase Optimization
codegraph config set performance.memory_limit_mb 8192
codegraph config set vector.index_type ivf
codegraph config set database.compression true
codegraph index /path/to/large/project \
--workers 16 \
--exclude "**/test/**,**/vendor/**"
codegraph search "class.*Controller" --search-type regex --limit 100
Integration Guide
Integrating with Claude Desktop
- Add to Claude Desktop configuration:
{
  "mcpServers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["start", "stdio"],
      "env": {
        "CODEGRAPH_CONFIG": "~/.codegraph/config.toml"
      }
    }
  }
}
- Restart Claude Desktop to load the MCP server
Integrating with VS Code
- Install the MCP extension for VS Code
- Add to VS Code settings:
{
  "mcp.servers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["start", "stdio"],
      "rootPath": "${workspaceFolder}"
    }
  }
}
API Integration
import requests

base_url = "http://localhost:3000"

# Index a project over the HTTP transport
response = requests.post(f"{base_url}/index", json={
    "path": "/path/to/project",
    "languages": ["python", "javascript"]
})

# Run a semantic search against the index
response = requests.post(f"{base_url}/search", json={
    "query": "async function",
    "limit": 10
})
results = response.json()
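The endpoints above assume codegraph start http --port 3000 is already running. A small retry wrapper avoids races while the server is still starting (the retry policy is illustrative):
import time
import requests

def post_with_retry(url, payload, attempts=5, delay=1.0):
    # Retry only on connection errors while the server is coming up
    for attempt in range(attempts):
        try:
            response = requests.post(url, json=payload, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

results = post_with_retry("http://localhost:3000/search", {"query": "async function", "limit": 10})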
Using with CI/CD
name: CodeGraph Analysis
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install CodeGraph
        run: |
          cargo install codegraph-mcp
      - name: Index Codebase
        run: |
          codegraph init --non-interactive
          codegraph index . --languages rust,python
      - name: Run Analysis
        run: |
          codegraph stats --index --format json > analysis.json
      - name: Upload Results
        uses: actions/upload-artifact@v2
        with:
          name: codegraph-analysis
          path: analysis.json
Troubleshooting
Common Issues and Solutions
- Issue: Server fails to start
lsof -i :3000
codegraph stop --force
codegraph start http --port 3001
- Issue: Indexing is slow
codegraph index . --workers 16
codegraph index . --exclude "**/node_modules/**,**/dist/**"
codegraph config set indexing.incremental true
- Issue: Out of memory during indexing
codegraph config set embedding.batch_size 50
codegraph config set performance.memory_limit_mb 1024
codegraph index . --streaming
- Issue: Vector search returns poor results
codegraph search "query" --threshold 0.5
codegraph config set embedding.model openai
codegraph index . --force
codegraph search "query" --search-type fuzzy
- Issue: Hugging Face model fails to download
export CODEGRAPH_LOCAL_MODEL=sentence-transformers/all-MiniLM-L6-v2
export HF_TOKEN=your_hf_access_token
ls -lah ~/.cache/huggingface
- Issue: Local embeddings are slow
Debug Mode
Enable debug logging for troubleshooting:
export RUST_LOG=debug
codegraph --verbose index .
tail -f ~/.codegraph/logs/codegraph.log
Health Checks
codegraph status --detailed
codegraph config validate
codegraph test db
codegraph test embeddings
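These checks can also be scripted, for example as a pre-flight step in automation. A sketch that runs each documented check and stops at the first failure (it assumes the commands return a non-zero exit code on failure):
import subprocess
import sys

CHECKS = [
    ["codegraph", "status", "--detailed"],
    ["codegraph", "config", "validate"],
    ["codegraph", "test", "db"],
    ["codegraph", "test", "embeddings"],
]

for cmd in CHECKS:
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"FAILED: {' '.join(cmd)}\n{result.stderr}", file=sys.stderr)
        sys.exit(1)
    print(f"ok: {' '.join(cmd)}")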
🔧 Technical Details
Architecture
The CodeGraph system consists of a CLI interface, a core engine, and an MCP server layer. The core engine includes a parser, a graph store, and a vector search module. The MCP server layer supports STDIO, HTTP, or dual transport modes.
Performance Optimization
- Parallel processing and batched embeddings are used to optimize performance for large codebases.
- Smart caching is implemented to improve indexing speed.
- The system is optimized for multi-language support using Tree-sitter.
Embeddings
- Embeddings can be generated locally via ONNX Runtime or a Hugging Face (Candle) model; on macOS, ONNX Runtime supports the CPU and CoreML execution providers.
- FAISS is used for vector search, providing semantic code search capabilities.
📄 License
This project is dual-licensed under MIT and Apache 2.0 licenses. See LICENSE-MIT and LICENSE-APACHE for details.
🙏 Acknowledgments
Made with ❤️ by the CodeGraph Team