MCP Rag Server

mcp-rag-server is a service based on the Model Context Protocol (MCP) that supports Retrieval Augmented Generation (RAG) and can index documents and provide relevant context for large language models.

Knowledge management and memory Artificial intelligence chatbots #Document retrieval #Text embedding #Vector storage #MCP Protocol Local .TypeScript

rating : 2.5 points

downloads : 29

update time : 2025-04-29

What is the MCP RAG Server?

The MCP RAG Server is an intelligent document processing system that can automatically analyze the content of your documents, extract key information, and provide the most relevant context when a large language model needs it. It significantly improves the accuracy and relevance of AI conversations through advanced Retrieval Augmented Generation (RAG) technology.

How to use the MCP RAG Server?

Simply install the server and configure your document path, and the system will automatically process all documents. Then you can obtain the most relevant document fragments for any question through simple queries.

Use cases

It is particularly suitable for scenarios that require precise context support, such as knowledge base Q&A, intelligent document retrieval, and customer service system enhancement. Whether it's technical documents, product manuals, or customer information, it can effectively improve the understanding ability of AI.

Main features

Multi-format supportAutomatically process various document formats such as .txt, .md, .json, .jsonl, and .csv without additional conversion.

Smart chunkingConfigurable text chunk size to ensure information integrity while optimizing retrieval efficiency.

Local vector storageUse SQLite to store document vectors. The data is completely under your control and does not rely on cloud services.

Multi-embedding model supportCompatible with various embedding models such as OpenAI, Ollama, Granite, and Nomic to flexibly meet different needs.

Advantages and limitations

Advantages

Fully local operation to ensure data privacy and security

Lightweight design with low resource consumption

Simple installation and configuration process

Seamless integration with mainstream large language models

Limitations

Indexing a large number of documents for the first time may take a long time

Requires running Ollama or other embedding model services locally

Currently does not support real-time document update monitoring

How to use

Install the server

Install globally via npm or run directly using npx

Configure environment variables

Set necessary environment variables, such as the embedding model API address and vector storage path

Index documents

Specify the directory path containing documents to start the indexing process

Query documents

Obtain the most relevant document fragments for your question through queries

Usage examples

Technical document Q&ACreate an intelligent document assistant for the development team to quickly answer API usage questions

Product knowledge baseBuild a product knowledge base to help customer service staff quickly find product information

Frequently Asked Questions

How to check the document indexing progress?

Which embedding model is recommended?

Will the indexed documents be stored in the cloud?

Related resources

MCP Protocol Documentation

Learn more about the Model Context Protocol specification

Ollama Official Website

Get the recommended embedding model

LangChain Documentation

Understand the underlying vector storage technology used

🚀 mcp-rag-server

A server based on the Model Context Protocol (MCP) that supports Retrieval Augmented Generation (RAG). It can index documents and provide relevant information to large language models through the MCP protocol.

🚀 Quick Start

Step 1: Install Dependencies

npm install

Step 2: Start the Server

npm start

Step 3: Interact with Example Commands

node example.js

✨ Features

Document Indexing: Reads files, splits the text according to CHUNK_SIZE, and queues them for embedding processing.
Embedding Processing: Sends each chunk to the embedding API sequentially, and stores the results in an SQLite database.
Query Function: Embeds the query content, retrieves the most relevant text chunks from the vector store, and returns them to the client.

📦 Installation

npm install mcp-rag-server

💻 Usage Examples

General MCP Client Configuration

{
  "mcpServers": {
    "rag": {
      "command": "npx",
      "args": ["-y", "mcp-rag-server"],
      "env": {
        "BASE_LLM_API": "http://localhost:11434/v1",
        "EMBEDDING_MODEL": "nomic-embed-text",
        "VECTOR_STORE_PATH": "./vector_store",
        "CHUNK_SIZE": "500"
      }
    }
  }
}

Example Interaction

# Index documents
>> tool:embedding_documents {"path":"./docs"}

# Check status
>> resource:embedding-status

<< rag://embedding/status
Current completed count: 123
Failed count: 45
Total count: 678

Usage with MCP Tools

# Index documents
>> tool:embedding_documents {"path":"./docs"}

# Query document content
>> tool:query_document {"query":"Query content", "numberOfChunks":3}

# Get URIs of all documents
>> resource:rag://documents

# Get content of a single document
>> resource:rag://document/path/to/document.txt

Usage with MCP Resources

rag://documents: Lists URIs of all documents.
rag://document/{path}: Gets the content of the document at the specified path.
rag://query-document/{numberOfChunks}/{query}: Retrieves document content related to the query by quantity.
rag://embedding/status: Checks the current indexing status (completed, failed, total).

📚 Documentation

Configuration Options

Property	Details	Example Value
`BASE_LLM_API`	The base URL of the embedding API	`"http://localhost:11434/v1"`
`EMBEDDING_MODEL`	The name of the embedding model	`"nomic-embed-text"`
`VECTOR_STORE_PATH`	The path of the vector store	`"./vector_store"`
`CHUNK_SIZE`	The character limit for each chunk	`"500"`

Development

npm install
npm run build      # Compile TypeScript code
npm start          # Start the server
npm run watch      # Monitor file changes

🤝 Contribution

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

📄 License

MIT 2025 Quan Le

Featured MCP Services

Duckduckgo MCP Server

Certified

The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.

Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown format.

The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.

A Python-based MCP Server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.

Framelink Figma MCP Server is a server that provides access to Figma design data for AI programming tools (such as Cursor). By simplifying the Figma API response, it helps AI more accurately achieve one - click conversion from design to code.

UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real - time state monitoring, remote command execution, and log functions.

573

5 points

Context7

Context7 MCP is a service that provides real-time, version-specific documentation and code examples for AI programming assistants. It is directly integrated into prompts through the Model Context Protocol to solve the problem of LLMs using outdated information.

The MiniMax Model Context Protocol (MCP) is an official server that supports interaction with powerful text-to-speech, video/image generation APIs, and is suitable for various client tools such as Claude Desktop and Cursor.

Python

761

4.8 points

Zhiqi Future, Your AI Solution Think Tank

English 简体中文繁體中文にほんご