Behavioral MCP Server: AI Coding Assistant Verification for Secure & High

MCP As A Judge

MCP as a Judge is a behavioral MCP server that acts as a validation layer between AI coding assistants and LLMs. By enforcing evidence-based research, code quality reviews, and human intervention at decision points, it helps produce safer, higher-quality code.

Developer tools Artificial intelligence chatbots #Code Review #Quality Gating #AI Verification #Secure Coding .Python

rating : 2.5 points

downloads : 7.9K

update time : 2025-12-29

What is MCP as a Judge?

MCP as a Judge is a behavioral Model Context Protocol (MCP) server that acts as a validation layer between AI coding assistants and Large Language Models (LLMs). It improves code quality by enforcing explicit LLM evaluations, so that AI assistants research thoroughly, form a reasonable plan before writing code, and pass rigorous reviews after code changes and test implementations.

How to use MCP as a Judge?

You can configure MCP as a Judge in AI coding assistants that support MCP (such as GitHub Copilot, Cursor, and Claude Code). Once configured, the AI assistant uses the Judge's tools during coding tasks, either automatically or when you instruct it to, to evaluate plans, code changes, and test implementations so that each stage meets quality standards.

Applicable Scenarios

MCP as a Judge is most suitable for software development projects that require high-quality and secure code. It is particularly applicable to:
- Teams that want to ensure that AI-generated code meets engineering standards
- Individual developers who want to avoid common mistakes of AI assistants (such as using outdated information or reinventing the wheel)
- Projects that need to enforce security best practices
- Scenarios where human decision points need to be integrated into AI-assisted workflows

Main Features

Intelligent Code Evaluation

Evaluate code through MCP sampling, enforce software engineering standards, and flag security, performance, and maintainability risks.

Comprehensive Plan/Design Review

Verify architectural design, research depth, requirement matching, and implementation methods to ensure that the plan is reasonable and feasible.

User-Driven Decision-Making

Clarify requirements and resolve obstacles through MCP elicitation, keep decision-making transparent, and ensure human participation in key decisions.

Security Verification

Verify security best practices in system design and code changes, and identify potential attack vectors and permission issues.

Task Workflow Management

Provide a complete task management tool with clear verification points at each stage from task setup to final completion.

Advantages

Improve code quality: Enforce engineering standards and best practices

Reduce errors: Avoid AI using outdated information or reinventing the wheel

Enhance security: Identify security risks and enforce defensive programming

Transparent decision-making: Keep human participation in key decisions and avoid one-sided AI decisions

Cross-platform compatibility: Support multiple AI coding assistants and development environments

Privacy protection: Process data locally without collecting user code and conversations

Limitations

Dependence on MCP support: Requires AI assistants to support the MCP protocol

Configuration complexity: Different assistants require different configuration methods

Performance overhead: Additional verification steps may increase development time

Learning curve: Requires time to adapt to new workflows and tools

Model dependence: The evaluation quality is affected by the capabilities of the LLM model used

How to Use

Choose an Installation Method

Choose the installation method that fits your development environment. Docker is recommended: it is the simplest method and the easiest to update.

Configure the MCP Client

Add the MCP server configuration to your AI coding assistant. The configuration methods for different assistants vary slightly.

Set the LLM API Key (Optional)

If your AI assistant does not support the full MCP sampling function, you need to set the LLM API key as a fallback.

Select the Sampling Model

In VS Code, configure the model used by the Judge server through the command palette.

Start Using

In coding tasks, use the Judge tool for evaluation through prompts or automatic triggers.

Usage Examples

New Feature Development

When developing a new API endpoint, use the Judge to ensure that each stage from planning to implementation meets the standards.

Code Refactoring

When refactoring existing code, use the Judge to ensure that the changes do not introduce regression issues and maintain code quality.

Test Coverage Improvement

When adding tests to existing code, use the Judge to verify the quality and effectiveness of the tests.

Technical Decision Support

When choosing a technology stack or architecture, use the Judge to obtain objective evaluations and suggestions.

Frequently Asked Questions

What is the difference between MCP as a Judge and IDE built-in rules/agents (such as GitHub Copilot custom instructions, Cursor rules)?

If the Judge is not used automatically, how can I force it to be used?

What is the relationship between the Judge workflow and the task list? Why do we need both?

Which AI coding assistants fully support MCP as a Judge?

How to choose a sampling model for the Judge?

Will using the Judge affect my privacy?

Related Resources

GitHub Repository

Source code and latest version of MCP as a Judge

Model Context Protocol Official Website

Official documentation and specifications of the MCP protocol

VS Code One-Click Installation Link

Quickly install MCP as a Judge in VS Code

LiteLLM Project

Unified LLM API integration library used by the Judge

Contribution Guide

How to contribute to the MCP as a Judge project

🚀 MCP as a Judge ⚖️

MCP as a Judge serves as a validation layer between AI coding assistants and LLMs, ensuring safer and higher-quality code.

🚀 Quick Start

Requirements & Recommendations

MCP Client Prerequisites

MCP as a Judge heavily relies on the MCP Sampling and MCP Elicitation features for its core functionality:

MCP Sampling: Required for AI-powered code evaluation and judgment.
MCP Elicitation: Required for interactive user decision prompts.

System Prerequisites

Docker Desktop / Python 3.13+: Required for running the MCP server.

Supported AI Assistants

AI Assistant	Platform	MCP Support	Status	Notes
GitHub Copilot	Visual Studio Code	✅ Full	Recommended	Complete MCP integration with sampling and elicitation
Claude Code	-	⚠️ Partial	Requires LLM API key	Sampling support: feature request; elicitation support: feature request
Cursor	-	⚠️ Partial	Requires LLM API key	MCP support available, but sampling/elicitation limited
Augment	-	⚠️ Partial	Requires LLM API key	MCP support available, but sampling/elicitation limited
Qodo	-	⚠️ Partial	Requires LLM API key	MCP support available, but sampling/elicitation limited

✅ Recommended setup: GitHub Copilot + VS Code — full MCP sampling; no API key needed.

⚠️ Important Note

For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set LLM_API_KEY. Without it, the server cannot evaluate plans or code. See LLM API Configuration.

💡 Usage Tip

Prefer large context models (≥ 1M tokens) for better analysis and judgments.

If the MCP server isn’t used automatically, see the FAQ section for troubleshooting.

✨ Features

MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations for:

Research, system design, and planning
Code changes, testing, and task-completion verification

It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.

If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.

Key problems with AI coding assistants and LLMs

Treat LLM output as ground truth; skip research and use outdated information
Reinvent the wheel instead of reusing libraries and existing code
Cut corners: code below engineering standards and weak tests
Make unilateral decisions when requirements are ambiguous or plans change
Security blind spots: missing input validation, injection risks/attack vectors, least-privilege violations, and weak defensive programming

What it enforces

Evidence-based research and reuse (best practices, libraries, existing code)
Plan-first delivery aligned to user requirements
Human-in-the-loop decisions for ambiguity and blockers
Quality gates on code and tests (security, performance, maintainability)

Key capabilities

Intelligent code evaluation via MCP sampling; enforces software-engineering standards and flags security/performance/maintainability risks
Comprehensive plan/design review: validates architecture, research depth, requirements fit, and implementation approach
User-driven decisions via MCP elicitation: clarifies requirements, resolves obstacles, and keeps choices transparent
Security validation in system design and code changes

Tools and how they help

Tool	What it solves
`set_coding_task`	Creates/updates task metadata; classifies task_size; returns next-step workflow guidance
`get_current_coding_task`	Recovers the latest task_id and metadata to resume work safely
`judge_coding_plan`	Validates plan/design; requires library selection and internal reuse maps; flags risks
`judge_code_change`	Reviews unified Git diffs for correctness, reuse, security, and code quality
`judge_testing_implementation`	Validates tests using real runner output and optional coverage
`judge_coding_task_completion`	Final gate ensuring plan, code, and tests approvals before completion
`raise_missing_requirements`	Elicits missing details and decisions to unblock progress
`raise_obstacle`	Engages the user on trade-offs, constraints, and enforced changes
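
To make the gate ordering concrete, here is a minimal sketch of a tracker that only allows approvals in the staged order described in this document (plan → code → test → completion). It is an illustration only, not the server's implementation: `TaskGates`, `STAGES`, and the method names are hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field

# Stage order taken from the document: plan -> code -> test -> completion.
STAGES = ("plan", "code", "test", "completion")


@dataclass
class TaskGates:
    """Hypothetical tracker recording which gates a task has passed."""

    approved: set = field(default_factory=set)

    def next_stage(self):
        """Return the first stage that is not yet approved, or None when done."""
        for stage in STAGES:
            if stage not in self.approved:
                return stage
        return None

    def approve(self, stage):
        """Approve a stage only if it is the next gate in the sequence."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"cannot approve {stage!r}; next gate is {expected!r}")
        self.approved.add(stage)


if __name__ == "__main__":
    gates = TaskGates()
    gates.approve("plan")
    print(gates.next_stage())  # the code gate comes after the plan gate
```

The point of the sketch is that completion cannot be approved until plan, code, and test have each been approved, which is the role the table assigns to `judge_coding_task_completion`.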

📦 Installation

Method 1: Using Docker (Recommended)

One-click install for VS Code (MCP)

Notes:

VS Code controls the sampling model; select it via “MCP: List Servers → mcp-as-a-judge → Configure Model Access”.

Configure MCP Settings: Add this to your MCP client configuration file:
```
{
  "command": "docker",
  "args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-4o-mini"
  }
}
```
📝 Configuration Options (All Optional):
- LLM_API_KEY: Optional for GitHub Copilot + VS Code (has built-in MCP sampling).
- LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults).
- The --pull=always flag pulls the latest image on every start, so updates are automatic.
To update manually instead (for example, if you omit --pull=always):
```
# Pull the latest version
docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
```

Method 2: Using uv

Install the package:
```
uv tool install mcp-as-a-judge
```
Configure MCP Settings: The MCP server may be automatically detected by your MCP-enabled client.
📝 Notes:
- No additional configuration needed for GitHub Copilot + VS Code (has built-in MCP sampling).
- LLM_API_KEY is optional and can be set via environment variable if needed.

To update to the latest version:

```
# Update MCP as a Judge to the latest version
uv tool upgrade mcp-as-a-judge
```

Select a sampling model in VS Code

Open Command Palette (Cmd/Ctrl + Shift + P) → “MCP: List Servers”.
Select the configured server “mcp-as-a-judge”.
Choose “Configure Model Access”.
Check your preferred model(s) to enable sampling.

📚 Documentation

🔑 LLM API Configuration (Optional)

For AI assistants without full MCP sampling support, you can configure an LLM API key as a fallback. This ensures MCP as a Judge works even when the client doesn't support MCP sampling.

Set LLM_API_KEY (unified key). The vendor is auto-detected; optionally set LLM_MODEL_NAME to override the default.

Supported LLM Providers

Rank	Provider	API Key Format	Default Model	Notes
1	OpenAI	`sk-...`	`gpt-4.1`	Fast and reliable model optimized for speed
2	Anthropic	`sk-ant-...`	`claude-sonnet-4-20250514`	High-performance with exceptional reasoning
3	Google	`AIza...`	`gemini-2.5-pro`	Most advanced model with built-in thinking
4	Azure OpenAI	`[a-f0-9]{32}`	`gpt-4.1`	Same as OpenAI but via Azure
5	AWS Bedrock	AWS credentials	`anthropic.claude-sonnet-4-20250514-v1:0`	Aligned with Anthropic
6	Vertex AI	Service Account JSON	`gemini-2.5-pro`	Enterprise Gemini via Google Cloud
7	Groq	`gsk_...`	`deepseek-r1`	Best reasoning model with speed advantage
8	OpenRouter	`sk-or-...`	`deepseek/deepseek-r1`	Best reasoning model available
9	xAI	`xai-...`	`grok-code-fast-1`	Latest coding-focused model (Aug 2025)
10	Mistral	`[a-f0-9]{64}`	`pixtral-large`	Most advanced model (124B params)
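
As a rough illustration of how vendor auto-detection from key patterns could work, the sketch below matches a key against the "API Key Format" column above. It is an assumption-laden sketch, not the server's actual logic: `KEY_PATTERNS` and `detect_provider` are hypothetical names, the matching order is my own choice, and AWS Bedrock and Vertex AI are left out because the table lists credentials rather than a key string for them.

```python
import re

# (provider, pattern) pairs built from the "API Key Format" column.
# More specific prefixes come first so "sk-ant-" and "sk-or-" are not
# swallowed by the generic OpenAI "sk-" prefix.
KEY_PATTERNS = [
    ("anthropic", re.compile(r"^sk-ant-")),
    ("openrouter", re.compile(r"^sk-or-")),
    ("openai", re.compile(r"^sk-")),
    ("google", re.compile(r"^AIza")),
    ("groq", re.compile(r"^gsk_")),
    ("xai", re.compile(r"^xai-")),
    # Hex-only formats are anchored at both ends so a 64-character key
    # cannot be mistaken for a 32-character one.
    ("mistral", re.compile(r"^[a-f0-9]{64}$")),
    ("azure", re.compile(r"^[a-f0-9]{32}$")),
]


def detect_provider(api_key):
    """Return the first provider whose pattern matches, or None if none do."""
    for provider, pattern in KEY_PATTERNS:
        if pattern.search(api_key):
            return provider
    return None


if __name__ == "__main__":
    print(detect_provider("sk-ant-example"))
```

Ordering matters here: three providers in the table share the `sk-` prefix, so a first-match scan has to test the longer prefixes before the shorter one.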

Client-Specific Setup

Cursor

Open Cursor Settings:
- Go to File → Preferences → Cursor Settings.
- Navigate to the MCP tab.
- Click + Add to add a new MCP server.

Add MCP Server Configuration:

```
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-4.1"
  }
}
```

📝 Configuration Options:

LLM_API_KEY: Required for Cursor (limited MCP sampling).
LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults).

Claude Code

Add MCP Server via CLI:

```
# Set environment variables first (optional model override)
export LLM_API_KEY="your_api_key_here"
export LLM_MODEL_NAME="claude-3-5-haiku"  # Optional: faster/cheaper model

# Add MCP server
claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
```

Alternative: Manual Configuration:
- Create or edit ~/.config/claude-code/mcp_servers.json
```
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-anthropic-api-key-here",
    "LLM_MODEL_NAME": "claude-3-5-haiku"
  }
}
```
📝 Configuration Options:
- LLM_API_KEY: Required for Claude Code (limited MCP sampling).
- LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults).

Other MCP Clients

For other MCP-compatible clients, use the standard MCP server configuration:

```
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-5"
  }
}
```

📝 Configuration Options:

LLM_API_KEY: Required for most MCP clients (except GitHub Copilot + VS Code).
LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults).
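
Since the same entry is repeated with small variations for each client, it can help to generate it. The sketch below builds the server entry as a Python dict that mirrors the JSON snippets in this document and prints it as JSON. `build_server_entry` is a hypothetical helper, not part of the project, and where the entry goes in a given client's configuration file still depends on that client.

```python
import json


def build_server_entry(method, api_key=None, model_name=None):
    """Return an MCP server entry for 'docker' or 'uv', mirroring the JSON above."""
    if method == "docker":
        entry = {
            "command": "docker",
            "args": ["run", "--rm", "-i", "--pull=always",
                     "ghcr.io/othervibes/mcp-as-a-judge:latest"],
        }
    elif method == "uv":
        entry = {"command": "uv", "args": ["tool", "run", "mcp-as-a-judge"]}
    else:
        raise ValueError(f"unknown install method: {method!r}")

    # Both variables are optional; include the "env" object only when needed.
    env = {}
    if api_key:
        env["LLM_API_KEY"] = api_key
    if model_name:
        env["LLM_MODEL_NAME"] = model_name
    if env:
        entry["env"] = env
    return entry


if __name__ == "__main__":
    print(json.dumps(build_server_entry("uv", api_key="your-api-key-here"), indent=2))
```

For GitHub Copilot + VS Code, calling the helper without `api_key` yields an entry with no `env` object, matching the note that the key is optional there.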

🔧 Technical Details

Privacy & Flexible AI Integration

🔑 MCP Sampling (Preferred) + LLM API Key Fallback

Primary Mode: MCP Sampling

All judgments are performed using the MCP Sampling capability.
No need to configure or pay for external LLM API services.
Works directly with your MCP-compatible client's existing AI model.
Currently supported by: GitHub Copilot + VS Code.

Fallback Mode: LLM API Key

When MCP sampling is not available, the server can use LLM API keys.
Supports multiple providers via LiteLLM: OpenAI, Anthropic, Google, Azure, Groq, Mistral, xAI.
Automatic vendor detection from API key patterns.
Default model selection per vendor when no model is specified.
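
The preference order described above (MCP sampling first, LLM API key as fallback, otherwise no evaluation is possible) can be sketched as a small decision function. This is a hedged illustration: `choose_judgment_mode`, the mode strings, and the `client_supports_sampling` flag are hypothetical and are not taken from the server's code. Only the `LLM_API_KEY` variable name comes from this document.

```python
import os


def choose_judgment_mode(client_supports_sampling, env=None):
    """Pick how judgments would run, following the documented preference order."""
    env = os.environ if env is None else env
    if client_supports_sampling:
        # Preferred: the client's own model does the evaluation.
        return "mcp_sampling"
    if env.get("LLM_API_KEY"):
        # Fallback: call the provider behind the configured key.
        return "llm_api_fallback"
    raise RuntimeError(
        "No MCP sampling and no LLM_API_KEY set; plans and code cannot be evaluated."
    )


if __name__ == "__main__":
    print(choose_judgment_mode(False, env={"LLM_API_KEY": "your-api-key-here"}))
```

Passing `env` explicitly keeps the function easy to test; by default it reads the process environment.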

🛡️ Your Privacy Matters

The server runs locally on your machine.
No data collection: your code and conversations stay private.
No external API calls when using MCP Sampling. If you set LLM_API_KEY for fallback, the server will call your chosen LLM provider only to perform judgments (plan/code/test) with the evaluation content you provide.
Complete control over your development workflow and sensitive information.

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

```
# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge

# Install dependencies with uv
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src
```

© Concepts and Methodology

© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.

Prior Art and Attribution

While “LLM-as-a-judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task-centric workflow enforcement (plan → code → test → completion), explicit LLM-based validations, and human-in-the-loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.

❓ FAQ

How is “MCP as a Judge” different from rules/subagents in IDE assistants (GitHub Copilot, Cursor, Claude Code)?

Feature	IDE Rules	Subagents	MCP as a Judge
Static behavior guidance	✓	✓	✗
Custom system prompts	✓	✓	✓
Project context integration	✓	✓	✓
Specialized task handling	✗	✓	✓
Active quality gates	✗	✗	✓
Evidence-based validation	✗	✗	✓
Approve/reject with feedback	✗	✗	✓
Workflow enforcement	✗	✗	✓
Cross-assistant compatibility	✗	✗	✓

References: GitHub Copilot Custom Instructions, Cursor Rules, Claude Code Subagents

How does the Judge workflow relate to the tasklist? Why do we need both?

Tasklist = planning/organization: tracks tasks, priorities, and status. It doesn’t guarantee engineering quality or readiness.
Judge workflow = quality gates: enforces approvals for plan/design, code diffs, tests, and final completion. It demands real evidence (e.g., unified Git diffs and raw test output) and returns structured approvals and required improvements.
Together: Use the tasklist to organize work; use the Judge to decide when each stage is actually ready to proceed. The server also emits next_tool guidance to keep progress moving through the gates.
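
For readers unfamiliar with the unified diff format mentioned above, the sketch below produces one from two versions of a file using Python's standard `difflib`. In practice the evidence would normally come from `git diff`; this stand-in only shows the shape of the text. `make_unified_diff` is a hypothetical helper, and the `a/` and `b/` path prefixes imitate Git's convention.

```python
import difflib


def make_unified_diff(before, after, path):
    """Return a unified diff between two versions of a file as one string."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


if __name__ == "__main__":
    old = "def area(r):\n    return 3.14 * r * r\n"
    new = "import math\n\ndef area(r):\n    return math.pi * r * r\n"
    print(make_unified_diff(old, new, "geometry.py"))
```

Lines starting with `-` were removed and lines starting with `+` were added, which is the kind of evidence a code review over diffs works from.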

If the Judge isn’t used automatically, how do I force it?

In your prompt: "use mcp-as-a-judge" or "Evaluate plan/code/test using the MCP server mcp-as-a-judge".
VS Code: Command Palette → "MCP: List Servers" → ensure "mcp-as-a-judge" is listed and enabled.
Ensure the MCP server is running and, in your client, the judge tools are enabled/approved.