MCP Evals

MCP Evals is a Node.js package and GitHub Action for evaluating MCP tool implementations. It uses LLM-based scoring to help ensure that MCP server tools work correctly and perform well.

Last updated: 2025-04-29

What is MCP Evals?

MCP Evals is an evaluation tool that helps developers test and verify the functionality and performance of their Model Context Protocol (MCP) server tools. It uses large language models (LLMs) to score tool responses automatically, helping ensure the tools work as expected.

How to use MCP Evals?

You can use MCP Evals in two ways: as a Node.js package or as a GitHub Action. Create an evaluation configuration file, then run the evaluation to get a detailed scoring report.

Applicable scenarios

It suits teams developing MCP tools that need to continuously verify tool quality or automatically check tool performance as part of a CI/CD pipeline.
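If you want to turn scores into a pass/fail check in CI, a minimal sketch follows. It assumes results shaped like the EvalResult structure described under Advanced Usage; the threshold and the `failingDimensions` and `passesGate` helpers are hypothetical and not part of mcp-evals.

```typescript
// Mirrors the EvalResult structure from "Advanced Usage".
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

// The five scored dimensions (everything except the free-text comments).
type Dimension = Exclude<keyof EvalResult, "overall_comments">;

const DIMENSIONS: Dimension[] = ["accuracy", "completeness", "relevance", "clarity", "reasoning"];

// Lists the dimensions of one result that score below minScore (hypothetical threshold).
function failingDimensions(result: EvalResult, minScore: number): Dimension[] {
  return DIMENSIONS.filter((dimension) => result[dimension] < minScore);
}

// Hypothetical gate: passes only if every result meets minScore in every dimension.
function passesGate(results: EvalResult[], minScore = 3): boolean {
  return results.every((result) => failingDimensions(result, minScore).length === 0);
}
```

Treat the default minimum of 3 as a placeholder; a sensible threshold depends on how consistently the grading model scores your tools.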

Main features

Automatic LLM scoring: Automatically evaluates the quality of tool responses using large language models such as GPT-4

Multi-dimensional evaluation: Provides scores in five dimensions: accuracy, completeness, relevance, clarity, and reasoning

GitHub integration: Runs automatically as a GitHub Action and posts the results as a comment on the pull request

Advantages and limitations

Advantages

Automates the evaluation process, saving manual testing time

Provides detailed scores and feedback to help improve your tools

Integrates seamlessly with CI/CD pipelines

Open-source projects can claim free daily OpenAI token quotas

Limitations

Depends on the OpenAI API and requires an internet connection

Evaluation results may be affected by the subjectivity of the grading LLM

Requires some configuration work
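One way to reduce the effect of LLM subjectivity is to grade the same eval several times and take the per-dimension median. This is a general variance-reduction technique, not a feature of mcp-evals as described here; the sketch assumes each run yields an EvalResult-shaped object, and `median` and `medianResult` are hypothetical helpers.

```typescript
// Mirrors the EvalResult structure from "Advanced Usage".
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

type Dimension = Exclude<keyof EvalResult, "overall_comments">;

// Median of a non-empty array; averages the two middle values for even lengths.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Combines repeated gradings of the same eval: median per dimension, comments concatenated.
function medianResult(runs: EvalResult[]): EvalResult {
  if (runs.length === 0) throw new Error("medianResult needs at least one run");
  const pick = (dimension: Dimension) => median(runs.map((run) => run[dimension]));
  return {
    accuracy: pick("accuracy"),
    completeness: pick("completeness"),
    relevance: pick("relevance"),
    clarity: pick("clarity"),
    reasoning: pick("reasoning"),
    overall_comments: runs.map((run) => run.overall_comments).join("\n---\n"),
  };
}
```

The median is less sensitive than the mean to a single outlier grading, at the cost of one grading call per run.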

How to use

Installation

Install it as a Node.js package or add it to a workflow as a GitHub Action

Create an evaluation file

Create a TypeScript file to define your evaluation configuration

Run the evaluation

Run the evaluation through the CLI or GitHub Action

Usage examples

Weather tool evaluation: Evaluates the accuracy and completeness of the information returned by a weather query tool

Knowledge retrieval evaluation: Evaluates the accuracy and relevance of the information returned by a knowledge retrieval tool

Frequently Asked Questions

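Do I need an OpenAI API key?

Yes. The OPENAI_API_KEY environment variable is required; in the GitHub Action, pass it through the openai_api_key input.

What model is used for evaluation?

The GitHub Action defaults to gpt-4, which you can override with the model input. With the Node.js package, the model is set in your EvalConfig (for example, openai("gpt-4")).

How do I interpret the scoring results?

Each evaluation scores accuracy, completeness, relevance, clarity, and reasoning on a scale from 1 to 5, and includes overall_comments summarizing strengths and weaknesses.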

Related resources

GitHub repository: project source code and issue tracking

OpenAI API documentation: OpenAI API usage guide

🚀 MCP Evaluation Tool

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure that your MCP server tools run correctly and efficiently.

🚀 Quick Start

This tool can be used as a Node.js package or a GitHub Action to evaluate MCP tool implementations.

✨ Features

Evaluates MCP tool implementations using LLM-based scoring.
Provides detailed evaluation results covering accuracy, completeness, relevance, clarity, and reasoning.
Integrates into your development workflow as a Node.js package or a GitHub Action.

📦 Installation

As a Node.js Package

npm install mcp-evals

As a GitHub Action

Add the following to your workflow file:

name: Run MCP Evaluation
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          
      - name: Install dependencies
        run: npm install
        
      - name: Run MCP Evaluation
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4'  # Optional, default is gpt-4

💻 Usage Examples

Basic Usage

1. Create Your Evaluation File

Create a file (e.g., evals.ts) and export your evaluation configuration:

import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

// A single evaluation: a name, a description, and an async run() that returns the scores.
const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather information retrieval",
  run: async () => {
    // grade() returns the evaluation as a JSON string, so parse it before returning.
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  },
};

// The evaluation configuration: the grading model plus the evaluations to run.
const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval],
};

export default config;

export const evals = [
  weatherEval,
  // Add other evaluations here
];
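The `// Add other evaluations here` comment implies one eval object per prompt. If you have several prompts, such as the knowledge retrieval example listed earlier, a small factory avoids repeating the boilerplate. This is a hypothetical sketch: `PromptEval`, `Grader`, `PromptSpec`, and `makePromptEvals` are local stand-ins, and it assumes only what the weather example shows, namely that grading a prompt resolves to a JSON string.

```typescript
// Local stand-in for a grading call: takes a prompt, resolves to a JSON string.
type Grader = (prompt: string) => Promise<string>;

// Local stand-in mirroring the fields used by weatherEval (not the real EvalFunction type).
interface PromptEval {
  name: string;
  description: string;
  run: () => Promise<unknown>;
}

interface PromptSpec {
  name: string;
  description: string;
  prompt: string;
}

// Builds one eval per spec; each run() grades its prompt and parses the JSON result.
function makePromptEvals(specs: PromptSpec[], gradePrompt: Grader): PromptEval[] {
  return specs.map((spec) => ({
    name: spec.name,
    description: spec.description,
    run: async () => JSON.parse(await gradePrompt(spec.prompt)),
  }));
}
```

In a real evals.ts you might pass `(prompt) => grade(openai("gpt-4"), prompt)` as the grader, then check the resulting objects against the library's actual `EvalFunction` type.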

2. Run the Evaluation

As a Node.js Package

You can run the evaluation using the CLI:

npx mcp-eval path/to/your/evals.ts path/to/your/server.ts

As a GitHub Action

The action will automatically perform the following steps:

Run your evaluations.
Post a comment with the results on the PR.
Update that comment whenever the PR is updated.
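The action's implementation isn't shown here. A common way to get "post once, then update" behavior is to tag the comment with a hidden marker and search for it on later runs. A minimal sketch of that decision logic, with the GitHub API calls left out; `MARKER`, `PrComment`, and `planComment` are hypothetical names, not the action's code.

```typescript
// Hypothetical hidden marker embedded in the comment so later runs can find it.
const MARKER = "<!-- mcp-evals-report -->";

interface PrComment {
  id: number;
  body: string;
}

type CommentAction =
  | { kind: "create"; body: string }
  | { kind: "update"; id: number; body: string };

// Decides whether to post a new comment or edit the one a previous run left.
function planComment(existing: PrComment[], report: string): CommentAction {
  const body = `${MARKER}\n${report}`;
  const previous = existing.find((comment) => comment.body.includes(MARKER));
  return previous ? { kind: "update", id: previous.id, body } : { kind: "create", body };
}
```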

Advanced Usage

Evaluation results are returned as a structured object with the following shape:

interface EvalResult {
  accuracy: number;        // Score ranges from 1-5
  completeness: number;    // Score ranges from 1-5
  relevance: number;       // Score ranges from 1-5
  clarity: number;         // Score ranges from 1-5
  reasoning: number;       // Score ranges from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
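Since each eval's `run` returns whatever `JSON.parse` produces, it can help to validate that shape before relying on it. A minimal sketch, assuming scores are plain numbers in the documented 1 to 5 range; `isEvalResult` and `meanScore` are hypothetical helpers, and weighting all five dimensions equally is an assumption rather than something mcp-evals specifies.

```typescript
// Mirrors the EvalResult structure shown above.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

const SCORE_KEYS = ["accuracy", "completeness", "relevance", "clarity", "reasoning"] as const;

// Runtime check for a parsed value: every score must be a number from 1 to 5
// and overall_comments must be a string.
function isEvalResult(value: unknown): value is EvalResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const scoresInRange = SCORE_KEYS.every((key) => {
    const score = record[key];
    return typeof score === "number" && score >= 1 && score <= 5;
  });
  return scoresInRange && typeof record["overall_comments"] === "string";
}

// Unweighted mean of the five dimension scores.
function meanScore(result: EvalResult): number {
  const total = SCORE_KEYS.reduce((sum, key) => sum + result[key], 0);
  return total / SCORE_KEYS.length;
}
```

For example, a result scoring 5, 4, 5, 4, and 3 has a mean of 4.2.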

📚 Documentation

Configuration

Environment Variables

OPENAI_API_KEY: Your OpenAI API key (required)
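Because the key is required, a script that wraps the evaluation can fail fast when it is missing rather than partway through a run. A minimal sketch; `requireEnv` is a hypothetical helper that takes the environment as a parameter so it stays easy to test.

```typescript
// Returns the named variable, or throws a descriptive error if it is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In Node.js you would call it as `requireEnv(process.env, "OPENAI_API_KEY")` before starting any evaluations.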

⚠️ Important Note

If you use this GitHub Action for open-source software development, enable data sharing in your OpenAI billing dashboard to claim 2.5 million free GPT-4o mini tokens per day, making this action effectively free to use.