🚀 MCP Conceal
MCP Conceal is an MCP proxy that pseudo-anonymizes Personally Identifiable Information (PII) before data is sent to external AI providers such as Claude, ChatGPT, or Gemini. This helps protect sensitive information while maintaining the necessary data for AI analysis.
```mermaid
sequenceDiagram
    participant C as AI Client (Claude)
    participant P as MCP Conceal
    participant S as Your MCP Server

    C->>P: Request
    P->>S: Request
    S->>P: Response with PII
    P->>P: PII Detection
    P->>P: Pseudo-Anonymization
    P->>P: Consistent Mapping
    P->>C: Sanitized Response
```
MCP Conceal performs pseudo-anonymization instead of redaction to preserve the semantic meaning and data relationships required for AI analysis. For example, `john.smith@acme.com` becomes `mike.wilson@techcorp.com`, protecting sensitive information while maintaining the data structure.
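Concretely, a tool response containing PII might be rewritten as follows before it reaches the AI client (the fake values shown are illustrative; actual output depends on your configured locale and seed):

Before:

```json
{ "customer": "John Smith", "email": "john.smith@acme.com", "phone": "555-123-4567" }
```

After:

```json
{ "customer": "Mike Wilson", "email": "mike.wilson@techcorp.com", "phone": "555-987-6543" }
```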
🚀 Quick Start
Prerequisites
Install Ollama for LLM-based PII detection:
- Install Ollama: [ollama.ai](https://ollama.ai)
- Pull the model:
  ```bash
  ollama pull llama3.2:3b
  ```
- Verify that the Ollama API is reachable:
  ```bash
  curl http://localhost:11434/api/version
  ```
Basic Usage
Create a minimal `mcp-server-conceal.toml`:

```toml
[detection]
mode = "regex_llm"

[llm]
model = "llama3.2:3b"
endpoint = "http://localhost:11434"
```
See the Configuration section for all available options.
Run as a proxy:

```bash
mcp-server-conceal \
  --target-command python3 \
  --target-args "my-mcp-server.py" \
  --config mcp-server-conceal.toml
```
✨ Features
- Pseudo-anonymizes PII before data reaches external AI providers.
- Preserves semantic meaning and data relationships for AI analysis.
- Offers multiple detection modes to balance speed and accuracy.
- Allows for customization of detection prompts and configuration settings.
📦 Installation
Download Pre-built Binary
- Visit the [Releases page](https://github.com/gbrigandi/mcp-server-conceal/releases)
- Download the binary for your platform:

| Platform | Binary |
|----------|--------|
| Linux x64 | `mcp-server-conceal-linux-amd64` |
| macOS Intel | `mcp-server-conceal-macos-amd64` |
| macOS Apple Silicon | `mcp-server-conceal-macos-aarch64` |
| Windows x64 | `mcp-server-conceal-windows-amd64.exe` |
- Make it executable (Linux/macOS):
  ```bash
  chmod +x mcp-server-conceal-*
  ```
- Add it to your PATH:
  - Linux/macOS:
    ```bash
    mv mcp-server-conceal-* /usr/local/bin/mcp-server-conceal
    ```
  - Windows: move the binary to a directory already on your PATH, or add its directory to PATH
Building from Source
```bash
git clone https://github.com/gbrigandi/mcp-server-conceal
cd mcp-server-conceal
cargo build --release
```

The binary is produced at `target/release/mcp-server-conceal`.
💻 Usage Examples
Basic Usage
```bash
mcp-server-conceal \
  --target-command python3 \
  --target-args "my-mcp-server.py" \
  --config mcp-server-conceal.toml
```
Advanced Usage
Claude Desktop Integration
Configure Claude Desktop to proxy MCP servers:
```json
{
  "mcpServers": {
    "database": {
      "command": "mcp-server-conceal",
      "args": [
        "--target-command", "python3",
        "--target-args", "database-server.py --host localhost",
        "--config", "/path/to/mcp-server-conceal.toml"
      ],
      "env": {
        "DATABASE_URL": "postgresql://localhost/mydb"
      }
    }
  }
}
```
Custom LLM Prompts
Customize detection prompts for specific domains:
Template locations:
- Linux: `~/.local/share/mcp-server-conceal/prompts/`
- macOS: `~/Library/Application Support/com.mcp-server-conceal.mcp-server-conceal/prompts/`
- Windows: `%LOCALAPPDATA%\com\mcp-server-conceal\mcp-server-conceal\data\prompts\`
Usage:
- Run MCP Conceal once to auto-generate `default.md` in the prompts directory:
  ```bash
  mcp-server-conceal --target-command echo --target-args "test" --config mcp-server-conceal.toml
  ```
- Copy the default template:
  ```bash
  cp default.md healthcare.md
  ```
- Edit the template for domain-specific PII patterns (see the hypothetical excerpt after this list)
- Configure the template name:
  ```toml
  prompt_template = "healthcare"
  ```
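A domain-specific template might extend the default instructions with additional entity types. The excerpt below is hypothetical; use the auto-generated `default.md` as the starting point for the actual structure:

```markdown
<!-- healthcare.md (hypothetical excerpt) -->
Identify all personally identifiable information in the text below.
In addition to names, emails, and phone numbers, treat the following as PII:
- Medical record numbers (MRNs)
- Insurance member and group IDs
- Provider NPI numbers
- Dates of birth and admission dates
```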
Environment Variables
Pass environment variables to the target process:

```bash
mcp-server-conceal \
  --target-command node \
  --target-args "server.js" \
  --target-cwd "/path/to/server" \
  --target-env "DATABASE_URL=postgresql://localhost/mydb" \
  --target-env "API_KEY=secret123" \
  --config mcp-server-conceal.toml
```
📚 Documentation
Configuration
Complete configuration reference:
```toml
[detection]
mode = "regex_llm"
enabled = true
confidence_threshold = 0.8

[detection.patterns]
email = "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
phone = "\\b(?:\\+?1[-\\.\\s]?)?(?:\\(?[0-9]{3}\\)?[-\\.\\s]?)?[0-9]{3}[-\\.\\s]?[0-9]{4}\\b"
ssn = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
credit_card = "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b"
ip_address = "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b"
url = "https?://[^\\s/$.?#].[^\\s]*"

[faker]
locale = "en_US"
seed = 12345
consistency = true

[mapping]
database_path = "mappings.db"
retention_days = 90

[llm]
model = "llama3.2:3b"
endpoint = "http://localhost:11434"
timeout_seconds = 180
prompt_template = "default"

[llm_cache]
enabled = true
database_path = "llm_cache.db"
max_text_length = 2000
```
Configuration Guidance
Detection Settings:
- `confidence_threshold`: Lower values (e.g. 0.6) catch more PII but increase false positives; higher values (e.g. 0.9) are more precise but may miss some PII.
- `mode`: Choose based on your latency vs. accuracy requirements (see Detection Modes below). Custom regex patterns can be added under `[detection.patterns]`, as sketched below.
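For example, a deployment could add a pattern for an internal identifier format alongside the built-in ones (the `employee_id` key and regex are hypothetical):

```toml
[detection.patterns]
# Hypothetical custom pattern: internal employee IDs such as EMP-12345
employee_id = "\\bEMP-\\d{5}\\b"
```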
Faker Settings:
- `locale`: Use `"en_US"` for American names and addresses, `"en_GB"` for British, and so on; this affects the realism of the generated fake data.
- `seed`: Keep the seed consistent across deployments so the same real data maps to the same fake data.
- `consistency`: Leave set to `true` to maintain data relationships.
Mapping Settings:
- `retention_days`: Balances data consistency against storage. Shorter periods (e.g. 30 days) reduce storage but may cause inconsistent anonymization for recurring data.
- `database_path`: Use an absolute path in production to avoid database location issues. An illustrative production configuration follows.
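For example, a production deployment might pin these settings explicitly (the path and seed shown are illustrative):

```toml
[faker]
locale = "en_US"
seed = 12345        # fixed seed so mappings are reproducible across deployments
consistency = true

[mapping]
database_path = "/var/lib/mcp-server-conceal/mappings.db"  # absolute path, illustrative
retention_days = 90
```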
Detection Modes
Choose the detection strategy based on your performance requirements and data complexity:
RegexLlm (Default)
Best for production environments, combining speed and accuracy; see the illustration after this list:
- Phase 1: fast regex matching catches common patterns (emails, phones, SSNs)
- Phase 2: the LLM analyzes the remaining text for complex PII
- Use when: you need comprehensive detection with reasonable performance
- Performance: ~100-500 ms per request, depending on text size
- Configure:
  ```toml
  mode = "regex_llm"
  ```
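To make the two phases concrete, consider this illustrative input (the entity labels are examples, not actual tool output):

```text
Input:   "Contact John Smith at john.smith@acme.com about invoice INV-2024-001."
Phase 1: regex flags "john.smith@acme.com" via the built-in email pattern
Phase 2: the LLM flags "John Smith" as a person name that no regex pattern covers
```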
Regex Only
Best for high-volume, latency-sensitive applications:
- Uses pattern matching only, with no AI analysis
- Use when: you have well-defined PII patterns and need <10 ms responses
- Trade-off: may miss contextual PII such as "my account number is ABC123"
- Configure:
  ```toml
  mode = "regex"
  ```
LLM Only
Best for complex, unstructured data:
- AI-powered detection catches nuanced PII patterns
- Use when: accuracy is more important than speed
- Performance: ~200-1000 ms per request
- Configure:
  ```toml
  mode = "llm"
  ```
🔧 Technical Details
MCP Conceal combines regular expressions and large language models (LLMs) to detect and pseudo-anonymize PII: regexes quickly identify common PII patterns, while the LLM analyzes the remaining text for more complex PII. Real-to-fake mappings are stored in a SQLite database so that anonymization stays consistent across restarts. The sketch below illustrates the mapping idea.
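For intuition, here is a minimal Python sketch of the consistent-mapping approach. It is not the actual implementation (which is written in Rust); the table schema and function names are hypothetical:

```python
import sqlite3

# Hypothetical sketch of consistent real-to-fake mapping backed by SQLite.
# Schema and names are illustrative, not MCP Conceal's actual internals.

def get_fake_value(db: sqlite3.Connection, entity_type: str, real: str, generate) -> str:
    """Return the fake value for `real`, creating and persisting one on first sight."""
    row = db.execute(
        "SELECT fake_value FROM mappings WHERE entity_type = ? AND real_value = ?",
        (entity_type, real),
    ).fetchone()
    if row:
        return row[0]  # already seen: reuse the same pseudonym
    fake = generate(entity_type)  # e.g. a Faker-style generator seeded from config
    db.execute(
        "INSERT INTO mappings (entity_type, real_value, fake_value) VALUES (?, ?, ?)",
        (entity_type, real, fake),
    )
    db.commit()
    return fake

db = sqlite3.connect("mappings.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS mappings ("
    " entity_type TEXT, real_value TEXT, fake_value TEXT,"
    " PRIMARY KEY (entity_type, real_value))"
)

# The same input always yields the same pseudonym, even across restarts:
fake1 = get_fake_value(db, "email", "john.smith@acme.com", lambda t: "mike.wilson@techcorp.com")
fake2 = get_fake_value(db, "email", "john.smith@acme.com", lambda t: "someone.else@example.com")
assert fake1 == fake2
```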
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.