🚀 MCP Conceal
MCP Conceal is an MCP proxy that pseudo-anonymizes Personally Identifiable Information (PII) before data is sent to external AI providers such as Claude, ChatGPT, or Gemini. This helps protect sensitive information while maintaining the necessary data for AI analysis.
```mermaid
sequenceDiagram
    participant C as AI Client (Claude)
    participant P as MCP Conceal
    participant S as Your MCP Server

    C->>P: Request
    P->>S: Request
    S->>P: Response with PII
    P->>P: PII Detection
    P->>P: Pseudo-Anonymization
    P->>P: Consistent Mapping
    P->>C: Sanitized Response
```
MCP Conceal performs pseudo-anonymization instead of redaction to preserve the semantic meaning and data relationships required for AI analysis. For example, `john.smith@acme.com` becomes `mike.wilson@techcorp.com`, protecting sensitive information while maintaining the data structure.
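Concretely, a tool response containing PII might be rewritten as follows before it reaches the AI client (the fake values shown are illustrative; actual output depends on your configured locale and seed):

Before:

```json
{ "customer": "John Smith", "email": "john.smith@acme.com", "phone": "555-123-4567" }
```

After:

```json
{ "customer": "Mike Wilson", "email": "mike.wilson@techcorp.com", "phone": "555-987-6543" }
```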
🚀 Quick Start
Prerequisites
Install Ollama for LLM-based PII detection:
- Install Ollama: [ollama.ai](https://ollama.ai)
- Pull the model:
  ```bash
  ollama pull llama3.2:3b
  ```
- Verify that the Ollama API is reachable:
  ```bash
  curl http://localhost:11434/api/version
  ```
Basic Usage
Create a minimal `mcp-server-conceal.toml`:

```toml
[detection]
mode = "regex_llm"

[llm]
model = "llama3.2:3b"
endpoint = "http://localhost:11434"
```
See the Configuration section for all available options.
Run as a proxy:

```bash
mcp-server-conceal \
  --target-command python3 \
  --target-args "my-mcp-server.py" \
  --config mcp-server-conceal.toml
```
✨ Features
- Pseudo-anonymizes PII before data reaches external AI providers.
- Preserves semantic meaning and data relationships for AI analysis.
- Offers multiple detection modes to balance speed and accuracy.
- Allows for customization of detection prompts and configuration settings.
📦 Installation
Download Pre-built Binary
- Visit the [Releases page](https://github.com/gbrigandi/mcp-server-conceal/releases)
- Download the binary for your platform:

| Platform | Binary |
|----------|--------|
| Linux x64 | `mcp-server-conceal-linux-amd64` |
| macOS Intel | `mcp-server-conceal-macos-amd64` |
| macOS Apple Silicon | `mcp-server-conceal-macos-aarch64` |
| Windows x64 | `mcp-server-conceal-windows-amd64.exe` |
- Make it executable (Linux/macOS):
  ```bash
  chmod +x mcp-server-conceal-*
  ```
- Add it to your PATH:
  - Linux/macOS:
    ```bash
    mv mcp-server-conceal-* /usr/local/bin/mcp-server-conceal
    ```
  - Windows: move the binary to a directory already on your PATH, or add its directory to PATH
Building from Source
```bash
git clone https://github.com/gbrigandi/mcp-server-conceal
cd mcp-server-conceal
cargo build --release
```

The binary is produced at `target/release/mcp-server-conceal`.
💻 Usage Examples
Basic Usage
```bash
mcp-server-conceal \
  --target-command python3 \
  --target-args "my-mcp-server.py" \
  --config mcp-server-conceal.toml
```
Advanced Usage
Claude Desktop Integration
Configure Claude Desktop to proxy MCP servers:
```json
{
  "mcpServers": {
    "database": {
      "command": "mcp-server-conceal",
      "args": [
        "--target-command", "python3",
        "--target-args", "database-server.py --host localhost",
        "--config", "/path/to/mcp-server-conceal.toml"
      ],
      "env": {
        "DATABASE_URL": "postgresql://localhost/mydb"
      }
    }
  }
}
```
Custom LLM Prompts
Customize detection prompts for specific domains:
Template locations:
- Linux: `~/.local/share/mcp-server-conceal/prompts/`
- macOS: `~/Library/Application Support/com.mcp-server-conceal.mcp-server-conceal/prompts/`
- Windows: `%LOCALAPPDATA%\com\mcp-server-conceal\mcp-server-conceal\data\prompts\`
Usage:
- Run MCP Conceal once to auto-generate `default.md` in the prompts directory:
  ```bash
  mcp-server-conceal --target-command echo --target-args "test" --config mcp-server-conceal.toml
  ```
- Copy the default template:
  ```bash
  cp default.md healthcare.md
  ```
- Edit the template for domain-specific PII patterns (see the hypothetical excerpt after this list)
- Configure the template name:
  ```toml
  prompt_template = "healthcare"
  ```
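A domain-specific template might extend the default instructions with additional entity types. The excerpt below is hypothetical; use the auto-generated `default.md` as the starting point for the actual structure:

```markdown
<!-- healthcare.md (hypothetical excerpt) -->
Identify all personally identifiable information in the text below.
In addition to names, emails, and phone numbers, treat the following as PII:
- Medical record numbers (MRNs)
- Insurance member and group IDs
- Provider NPI numbers
- Dates of birth and admission dates
```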
Environment Variables
Pass environment variables to the target process:

```bash
mcp-server-conceal \
  --target-command node \
  --target-args "server.js" \
  --target-cwd "/path/to/server" \
  --target-env "DATABASE_URL=postgresql://localhost/mydb" \
  --target-env "API_KEY=secret123" \
  --config mcp-server-conceal.toml
```
📚 Documentation
Configuration
Complete configuration reference:
```toml
[detection]
mode = "regex_llm"
enabled = true
confidence_threshold = 0.8

[detection.patterns]
email = "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
phone = "\\b(?:\\+?1[-\\.\\s]?)?(?:\\(?[0-9]{3}\\)?[-\\.\\s]?)?[0-9]{3}[-\\.\\s]?[0-9]{4}\\b"
ssn = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
credit_card = "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b"
ip_address = "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b"
url = "https?://[^\\s/$.?#].[^\\s]*"

[faker]
locale = "en_US"
seed = 12345
consistency = true

[mapping]
database_path = "mappings.db"
retention_days = 90

[llm]
model = "llama3.2:3b"
endpoint = "http://localhost:11434"
timeout_seconds = 180
prompt_template = "default"

[llm_cache]
enabled = true
database_path = "llm_cache.db"
max_text_length = 2000
```
Configuration Guidance
Detection Settings:
- `confidence_threshold`: Lower values (e.g. 0.6) catch more PII but increase false positives; higher values (e.g. 0.9) are more precise but may miss some PII.
- `mode`: Choose based on your latency vs. accuracy requirements (see Detection Modes below). Custom regex patterns can be added under `[detection.patterns]`, as sketched below.
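For example, a deployment could add a pattern for an internal identifier format alongside the built-in ones (the `employee_id` key and regex are hypothetical):

```toml
[detection.patterns]
# Hypothetical custom pattern: internal employee IDs such as EMP-12345
employee_id = "\\bEMP-\\d{5}\\b"
```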
Faker Settings:
- `locale`: Use `"en_US"` for American names and addresses, `"en_GB"` for British, and so on; this affects the realism of the generated fake data.
- `seed`: Keep the seed consistent across deployments so the same real data maps to the same fake data.
- `consistency`: Leave set to `true` to maintain data relationships.
Mapping Settings:
- `retention_days`: Balances data consistency against storage. Shorter periods (e.g. 30 days) reduce storage but may cause inconsistent anonymization for recurring data.
- `database_path`: Use an absolute path in production to avoid database location issues. An illustrative production configuration follows.
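For example, a production deployment might pin these settings explicitly (the path and seed shown are illustrative):

```toml
[faker]
locale = "en_US"
seed = 12345        # fixed seed so mappings are reproducible across deployments
consistency = true

[mapping]
database_path = "/var/lib/mcp-server-conceal/mappings.db"  # absolute path, illustrative
retention_days = 90
```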
Detection Modes
Choose the detection strategy based on your performance requirements and data complexity:
RegexLlm (Default)
Best for production environments, combining speed and accuracy; see the illustration after this list:
- Phase 1: fast regex matching catches common patterns (emails, phones, SSNs)
- Phase 2: the LLM analyzes the remaining text for complex PII
- Use when: you need comprehensive detection with reasonable performance
- Performance: ~100-500 ms per request, depending on text size
- Configure:
  ```toml
  mode = "regex_llm"
  ```
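To make the two phases concrete, consider this illustrative input (the entity labels are examples, not actual tool output):

```text
Input:   "Contact John Smith at john.smith@acme.com about invoice INV-2024-001."
Phase 1: regex flags "john.smith@acme.com" via the built-in email pattern
Phase 2: the LLM flags "John Smith" as a person name that no regex pattern covers
```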
Regex Only
Best for high-volume, latency-sensitive applications:
- Uses pattern matching only, with no AI analysis
- Use when: you have well-defined PII patterns and need <10 ms responses
- Trade-off: may miss contextual PII such as "my account number is ABC123"
- Configure:
  ```toml
  mode = "regex"
  ```
LLM Only
Best for complex, unstructured data:
- AI-powered detection catches nuanced PII patterns
- Use when: accuracy is more important than speed
- Performance: ~200-1000 ms per request
- Configure:
  ```toml
  mode = "llm"
  ```
🔧 Technical Details
MCP Conceal combines regular expressions and large language models (LLMs) to detect and pseudo-anonymize PII: regexes quickly identify common PII patterns, while the LLM analyzes the remaining text for more complex PII. Real-to-fake mappings are stored in a SQLite database so that anonymization stays consistent across restarts. The sketch below illustrates the mapping idea.
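For intuition, here is a minimal Python sketch of the consistent-mapping approach. It is not the actual implementation (which is written in Rust); the table schema and function names are hypothetical:

```python
import sqlite3

# Hypothetical sketch of consistent real-to-fake mapping backed by SQLite.
# Schema and names are illustrative, not MCP Conceal's actual internals.

def get_fake_value(db: sqlite3.Connection, entity_type: str, real: str, generate) -> str:
    """Return the fake value for `real`, creating and persisting one on first sight."""
    row = db.execute(
        "SELECT fake_value FROM mappings WHERE entity_type = ? AND real_value = ?",
        (entity_type, real),
    ).fetchone()
    if row:
        return row[0]  # already seen: reuse the same pseudonym
    fake = generate(entity_type)  # e.g. a Faker-style generator seeded from config
    db.execute(
        "INSERT INTO mappings (entity_type, real_value, fake_value) VALUES (?, ?, ?)",
        (entity_type, real, fake),
    )
    db.commit()
    return fake

db = sqlite3.connect("mappings.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS mappings ("
    " entity_type TEXT, real_value TEXT, fake_value TEXT,"
    " PRIMARY KEY (entity_type, real_value))"
)

# The same input always yields the same pseudonym, even across restarts:
fake1 = get_fake_value(db, "email", "john.smith@acme.com", lambda t: "mike.wilson@techcorp.com")
fake2 = get_fake_value(db, "email", "john.smith@acme.com", lambda t: "someone.else@example.com")
assert fake1 == fake2
```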
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.