Rag-Vault: Local-First MCP Tool for AI to Access Private Docs Fast

Rag Vault

RAG Vault is a locally prioritized document retrieval augmented generation tool that provides AI assistants with quick access to private documents through the MCP protocol. It supports multiple file formats, performs indexing and searching locally, ensures data privacy, and offers hybrid search, a Web interface, and a remote server mode.

Knowledge management and memory Search tools #Local Retrieval #Document Indexing #Hybrid Search #Privacy First .TypeScript

rating : 2 points

downloads : 5.7K

update time : 2026-03-13

Open Site

What is RAG Vault?

RAG Vault is a locally prioritized document retrieval system designed specifically for AI assistants. It allows you to index private documents (such as API specifications, research papers, and internal documents) and then quickly retrieve relevant information through semantic search. All processing is done on your local machine, ensuring data privacy and security.

How to use RAG Vault?

You can start RAG Vault with just one command, and then configure your AI tools (such as Cursor, Claude Code, or Codex) to connect to it. You can upload documents, search for content, and manage your knowledge base through the AI assistant interface or the Web interface.

Use cases

RAG Vault is particularly suitable for scenarios that require handling sensitive or private documents, such as corporate internal documents, personal research materials, code library documents, API specifications, etc. It is also an ideal choice for users who do not want to upload their data to cloud services.

Main Features

Local First

All data processing is performed on your local machine, eliminating the need to upload documents to cloud servers. Content is only fetched from remote URLs when you explicitly request it.

Hybrid Search

Combines semantic search and keyword matching, enabling it to understand the intent of queries and precisely match technical terms and code snippets.

Easy Setup

You can start using it with just one npx command and a small amount of configuration, without the need to install complex dependencies such as Docker, Python, or databases.

Web Interface

Provides a full-fledged Web interface that supports drag-and-drop uploads, real-time search, document previews, and knowledge base management, eliminating the need to use the command line.

Remote Mode

Supports running as an HTTP server, allowing remote MCP clients (such as Claude.ai) to connect to your local knowledge base.

Multi-Format Support

Supports multiple document formats such as PDF, DOCX, Markdown, TXT, JSON, JSONL, NDJSON, and HTML.

Security Features

Provides production-level security features such as API key authentication, rate limiting, CORS control, and security headers.

AI Skill Packs

Optional skill pack installation to teach AI assistants how to better formulate queries, interpret results, and use the RAG Vault tool.

Advantages

Data is completely localized, ensuring privacy and security.

Free to use, with no query fees or subscription costs.

Easy to use, without the need for complex server infrastructure.

Supports offline use, no network connection is required after the model is cached.

Hybrid search provides better search result quality.

Supports integration with multiple AI tools (Cursor, Claude Code, Codex, etc.)

Limitations

Requires local storage space to store documents and vector databases.

The embedding model (approximately 90MB) needs to be downloaded on the first run.

GPU acceleration may require additional configuration on some systems.

Processing large files may be limited by local hardware performance.

How to Use

Environment Preparation

Install Node.js 20 or a higher version and choose a directory to store your documents.

Configure AI Tools

Edit the corresponding configuration file according to the AI tool you are using to add the RAG Vault server.

Restart AI Tools

After saving the configuration file, completely restart your AI tool for the configuration to take effect.

Start Using

Upload documents and perform searches through the AI assistant interface or the Web interface.

Usage Examples

Search Code Library Documents

Index all Markdown files in the project document directory into RAG Vault, and then search for specific technical questions.

Index Web Documents

Retrieve HTML content from an online API documentation website and index it into the local knowledge base.

Build a Personal Knowledge Base

Index PDF documents in a personal research paper folder into RAG Vault for academic research.

Search for Precise Technical Terms

Use the hybrid search function to find specific error codes or technical terms.

Frequently Asked Questions

Is my data really private?

Can RAG Vault be used offline?

How to enable GPU acceleration?

Can I change the embedding model?

How to back up my data?

Why is there no search result?

What file formats are supported?

Is there a file size limit?

Related Resources

GitHub Repository

The source code and latest version of RAG Vault

MCP Registry

The entry in the Model Context Protocol registry

Security Documentation

Detailed security configuration and best practices

Environment Variable Template

A complete environment variable configuration template

Model Context Protocol

The official documentation and specifications of the MCP protocol

🚀 RAG Vault

RAG Vault gives AI coding assistants fast access to your private documents such as API specs, research papers, and internal docs. Indexing and search run locally, and your data stays on your machine unless you explicitly ingest content from a remote URL.

🚀 Quick Start

RAG Vault offers a straightforward setup process. With just one command to run and minimal setup requirements, it provides privacy by default. Before adding MCP config, follow these steps:

Install Node.js 20 or newer.
Pick a documents directory and set BASE_DIR to that path.
Make sure your AI tool process can read BASE_DIR.
Restart your AI tool after editing config.

For Cursor

Add the following to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "/path/to/your/documents"
      }
    }
  }
}

Replace /path/to/your/documents with your real absolute path.

For Claude Code

Add to .mcp.json in your project directory:

{
  "mcpServers": {
    "local-rag": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "github:RobThePCGuy/rag-vault"],
      "env": {
        "BASE_DIR": "./documents",
        "DB_PATH": "./documents/.rag-db",
        "CACHE_DIR": "./.cache",
        "RAG_EMBEDDING_DEVICE": "cpu",
        "RAG_HYBRID_WEIGHT": "0.6",
        "RAG_GROUPING": "related"
      }
    }
  }
}

Or add inline via CLI:

claude mcp add local-rag --scope user --env BASE_DIR=/path/to/your/documents -- npx -y github:RobThePCGuy/rag-vault

For Codex

Add to ~/.codex/config.toml:

[mcp_servers.local-rag]
command = "npx"
args = ["-y", "github:RobThePCGuy/rag-vault"]

[mcp_servers.local-rag.env]
BASE_DIR = "/path/to/your/documents"

Install Skills (Optional)

For enhanced AI guidance on query formulation and result interpretation, install the RAG Vault skills:

# Claude Code (project-level - recommended for team projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code

# Claude Code (user-level - available in all projects)
npx github:RobThePCGuy/rag-vault skills install --claude-code --global

# Codex (user-level)
npx github:RobThePCGuy/rag-vault skills install --codex

# Custom location
npx github:RobThePCGuy/rag-vault skills install --path /your/custom/path

Skills teach Claude best practices for:

Query formulation and expansion strategies
Score interpretation (< 0.3 = good match, > 0.5 = skip)
When to use ingest_file vs ingest_data
HTML ingestion and URL handling

Restart your AI tool, and start talking:

You: "Ingest api-spec.pdf"
AI:  Successfully ingested api-spec.pdf (47 chunks)

You: "How does authentication work?"
AI:  Based on section 3.2, authentication uses OAuth 2.0 with JWT tokens...

✨ Features

Solving Common Pain Points

Pain Point	RAG Vault Solution
"I don't want my docs on someone else's server"	Everything stays local by default. No background cloud calls for indexing or search.
"Semantic search misses exact code terms"	Hybrid search: meaning + exact matches like `useEffect`
"Setup requires Docker, Python, databases..."	One `npx` command plus a small MCP config block.
"Cloud APIs charge per query"	Free forever. No subscriptions.

Security Features

RAG Vault includes security features for production deployment:

API Authentication: Optional API key via RAG_API_KEY
Rate Limiting: Configurable request throttling
CORS Control: Restrict allowed origins
Security Headers: Helmet.js protection

See SECURITY.md for complete documentation.

📦 Installation

The installation process is simple. You can use the following commands based on your needs:

For MCP mode: npx github:RobThePCGuy/rag-vault
For web mode: npx github:RobThePCGuy/rag-vault web

💻 Usage Examples

Search Your Codebase Documentation

You: "Ingest all the markdown files in /docs"
AI:  Ingested 23 files (847 chunks total)

You: "What's the retry policy for failed API calls?"
AI:  According to error-handling.md, failed requests retry 3 times
     with exponential backoff: 1s, 2s, 4s...

Index Web Documentation

You: "Fetch https://docs.example.com/api and ingest the HTML"
AI:  Ingested "docs.example.com/api" (156 chunks)

You: "What rate limits apply to the /users endpoint?"
AI:  The API limits /users to 100 requests per minute per API key...

Build a Personal Knowledge Base

You: "Ingest my research papers folder"
AI:  Ingested 12 PDFs (2,341 chunks)

You: "What do recent studies say about transformer attention mechanisms?"
AI:  Based on attention-mechanisms-2024.pdf, the key finding is...

Search Exact Technical Terms

RAG Vault's hybrid search catches both meaning and exact matches:

You: "Search for ERR_CONNECTION_REFUSED"
AI:  Found 3 results mentioning ERR_CONNECTION_REFUSED:
     1. troubleshooting.md - "When you see ERR_CONNECTION_REFUSED..."
     2. network-errors.pdf - "Common causes include..."

📚 Documentation

Web Interface

RAG Vault includes a full-featured web UI for managing your documents without the command line.

Launch the Web UI

npx github:RobThePCGuy/rag-vault web

Open http://localhost:3000 in your browser.

What You Can Do

Upload documents: Drag and drop PDF, DOCX, Markdown, TXT, JSON, JSONL, and NDJSON files
Search instantly: Type queries and see results with relevance scores
Preview content: Click any result to see the full chunk in context
Manage files: View all indexed documents and delete what you do not need
Switch databases: Create and switch between multiple knowledge bases
Monitor status: See document counts, memory usage, and search mode
Export/Import settings: Back up and restore your vault configuration
Theme preferences: Switch between light, dark, or system theme
Folder browser: Navigate directories to select documents

REST API

The web server exposes a REST API for programmatic access. Set RAG_API_KEY to require authentication:

# With authentication (when RAG_API_KEY is set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# Search documents (no auth required if RAG_API_KEY is not set)
curl -X POST "http://localhost:3000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "limit": 5}'

# List all files
curl "http://localhost:3000/api/v1/files"

# Upload a document
curl -X POST "http://localhost:3000/api/v1/files/upload" \
  -F "file=@spec.pdf"

# Delete a file
curl -X DELETE "http://localhost:3000/api/v1/files" \
  -H "Content-Type: application/json" \
  -d '{"filePath": "/path/to/spec.pdf"}'

# Get system status
curl "http://localhost:3000/api/v1/status"

# Health check (for load balancers)
curl "http://localhost:3000/api/v1/health"

Reader API Endpoints

For programmatic document reading and cross-document discovery:

# Get all chunks for a document (ordered by index)
curl "http://localhost:3000/api/v1/documents/chunks?filePath=/path/to/doc.pdf"

# Find related chunks for cross-document discovery
curl "http://localhost:3000/api/v1/chunks/related?filePath=/path/to/doc.pdf&chunkIndex=0&limit=5"

# Batch request for multiple chunks (efficient for UIs)
curl -X POST "http://localhost:3000/api/v1/chunks/batch-related" \
  -H "Content-Type: application/json" \
  -d '{"chunks": [{"filePath": "/path/to/doc.pdf", "chunkIndex": 0}], "limit": 3}'

Remote Mode

RAG Vault can also run as an HTTP server for remote MCP clients like Claude.ai, Claude Desktop, or any client supporting Streamable HTTP or SSE transports.

# Start remote server (default port 3001)
npx github:RobThePCGuy/rag-vault --remote

# Custom port
npx github:RobThePCGuy/rag-vault --remote --port 8080

Stdio mode is unchanged -- omit --remote and everything works as before with Cursor, Claude Code, and Codex.

Connecting from Claude Desktop

Add to your Claude Desktop config:

{
  "mcpServers": {
    "rag-vault-remote": {
      "type": "url",
      "url": "http://localhost:3001/mcp"
    }
  }
}

Or via Claude Code CLI:

claude mcp add --transport http rag-vault http://localhost:3001/mcp

Connecting from Claude.ai

For Claude.ai (Pro/Max/Team/Enterprise), add as a custom connector with URL https://your-host:3001/mcp. For local development, expose your server with a tunnel:

cloudflared tunnel --url http://localhost:3001

Set RAG_API_KEY for authentication when exposing remotely. The server supports both Streamable HTTP (/mcp) and legacy SSE (/sse) transports, plus a health check at /health.

🔧 Technical Details

How It Works

Document → Parse → Chunk by meaning → Embed locally → Store in LanceDB
                         ↓
Query → Embed → Vector search → Keyword boost → Quality filter → Results

Smart chunking: Splits by meaning, not character count. Keeps code blocks intact.
Hybrid search: Vector similarity finds related content. Keyword boost ranks exact matches higher.
Quality filtering: Groups results by relevance gaps instead of arbitrary top-K cutoffs.
Local by default: Embeddings via Transformers.js. Storage via LanceDB. Network is only needed for initial model download or if you explicitly ingest remote URLs.
MCP tools included: query_documents, ingest_file, ingest_data, delete_file, list_files, status, feedback_pin, feedback_dismiss, and feedback_stats.

Supported Formats

Format	Extension	Notes
PDF	`.pdf`	Full text extraction, header/footer filtering
Word	`.docx`	Tables, lists, formatting preserved
Markdown	`.md`	Code blocks kept intact
Text	`.txt`	Plain text
JSON	`.json`	Converted to searchable key-value text
JSONL / NDJSON	`.jsonl`, `.ndjson`	Parsed line-by-line for logs and structured records
HTML	via `ingest_data`	Auto-cleaned with Readability

Configuration

Environment Variables

Variable	Default	What it does
`BASE_DIR`	Current directory	Only files under this path can be accessed
`DB_PATH`	`./lancedb/`	Where vectors are stored
`CACHE_DIR`	`./models/`	Model cache directory
`MODEL_NAME`	`Xenova/all-MiniLM-L6-v2`	HuggingFace embedding model
`MAX_FILE_SIZE`	`104857600` (100 MB)	Maximum file size in bytes for ingestion
`RAG_EMBEDDING_DEVICE`	`auto`	Inference device: `auto`, `cpu`, `cuda`, `dml`, `webgpu`, `wasm`, `gpu`, `webnn`
`WEB_PORT`	`3000`	Port for web interface
`UPLOAD_DIR`	`./uploads/`	Temporary directory for web UI file uploads

Windows users: RAG_EMBEDDING_DEVICE=auto attempts GPU providers (DirectML) which can fail if ONNX Runtime GPU binaries are not available. If you see embedding initialization errors, set RAG_EMBEDDING_DEVICE=cpu in your MCP config for reliable operation. See the GPU acceleration FAQ for details.

One-command override (no .env edit):

# MCP mode
npx github:RobThePCGuy/rag-vault --embedding-device cpu

# Web mode
npx github:RobThePCGuy/rag-vault web --embedding-device dml

# Explicitly force auto detection
npx github:RobThePCGuy/rag-vault --gpu-auto

Search Tuning

Variable	Default	What it does
`RAG_HYBRID_WEIGHT`	`0.6`	Keyword boost strength. `0` = semantic-only, `1.0` = BM25-only, higher = stronger boost for exact keyword matches
`RAG_GROUPING`	unset	Quality filter grouping mode: `similar` = top group only, `related` = top 2 groups
`RAG_MAX_DISTANCE`	unset	Filter out results below this relevance threshold
`RAG_GROUPING_STD_MULTIPLIER`	`1.5`	Standard-deviation multiplier for detecting relevance gaps between result groups
`RAG_HYBRID_CANDIDATE_MULTIPLIER`	`2`	Multiplier for number of vector candidates to fetch before keyword reranking
`RAG_FTS_MAX_FAILURES`	`3`	Number of full-text search failures before temporarily disabling FTS
`RAG_FTS_COOLDOWN_MS`	`300000` (5 min)	Cooldown period before retrying FTS after max failures reached

Security (optional)

Variable	Default	What it does
`RAG_API_KEY`	unset	API key for authentication
`CORS_ORIGINS`	localhost	Allowed origins (comma-separated, or `*`)
`RATE_LIMIT_WINDOW_MS`	`60000`	Rate limit time window (ms)
`RATE_LIMIT_MAX_REQUESTS`	`100`	Max requests per window

Advanced

Variable	Default	What it does
`ALLOWED_SCAN_ROOTS`	Home directory	Directories allowed for database scanning
`JSON_BODY_LIMIT`	`5mb`	Max request body size
`REQUEST_TIMEOUT_MS`	`30000`	API request timeout
`REQUEST_LOGGING`	`false`	Enable request audit logging

Copy for a complete configuration template.

For code-heavy content, try:

"env": {
  "RAG_HYBRID_WEIGHT": "0.8",
  "RAG_GROUPING": "similar"
}

📄 License

MIT: free for personal and commercial use.

Acknowledgments

Built with Model Context Protocol, LanceDB, and Transformers.js.

Started as a fork of mcp-local-rag by Shinsuke Kagawa. Now it’s its own thing. Huge credit to upstream contributors for the foundation, I’ve been iterating hard from there. Local-first dev tools, all the way.

Frequently Asked Questions

Is my data really private?

For local files, yes. Indexing and search run on your machine after the embedding model downloads (~90MB). RAG Vault only uses network if you choose remote URL ingestion or need to download a model.

Does it work offline?

Yes, after the first run. The model caches locally.

What about GPU acceleration?

RAG Vault uses Transformers.js device auto-selection by default (RAG_EMBEDDING_DEVICE=auto). When GPU providers are properly configured, this can speed up embedding generation.

Important: On Windows, auto tries DirectML (dml) which requires ONNX Runtime GPU binaries. If those binaries are not installed or your GPU setup is incomplete, the server will fail to start entirely — it does not gracefully fall back to CPU. The same applies on Linux without CUDA binaries.

Recommendation: If you encounter embedding initialization errors, set RAG_EMBEDDING_DEVICE=cpu in your MCP config. CPU mode is reliable on all platforms and fast enough for most workloads (the default model is only ~90MB).

"env": {
  "RAG_EMBEDDING_DEVICE": "cpu"
}

Supported device values: auto, cpu, cuda, dml, gpu, wasm, webgpu, webnn, webnn-npu, webnn-gpu, webnn-cpu. The alias directml is also accepted and maps to dml.

Can I change the embedding model?

Yes. Set MODEL_NAME to any compatible HuggingFace model. You must delete DB_PATH and re-ingest because different models produce incompatible vectors.

Recommended upgrade: For better quality and multilingual support, use EmbeddingGemma:

"MODEL_NAME": "onnx-community/embeddinggemma-300m-ONNX"

This model is a strong option for multilingual and higher-quality retrieval use cases.

Other specialized models:

Scientific: sentence-transformers/allenai-specter
Code: jinaai/jina-embeddings-v2-base-code

How do I back up my data?

Copy the DB_PATH directory (default: ./lancedb/).

Troubleshooting

Problem	Solution
No results found	Documents must be ingested first. Run "List all ingested files" to check.
Model download failed	Check internet connection. Model is ~90MB from HuggingFace.
Embedding initialization fails	Set `RAG_EMBEDDING_DEVICE=cpu` in your MCP config. The `auto` default can fail on Windows without GPU binaries.
`Protobuf parsing failed`	Corrupted model cache. Delete `CACHE_DIR` (default: `./models/`) and restart. RAG Vault also auto-retries with an isolated recovery cache.
File too large	Default limit is 100MB. Set `MAX_FILE_SIZE` higher or split the file.
Path outside BASE_DIR	All file paths must be under `BASE_DIR`. Use absolute paths.
MCP tools not showing	Verify config syntax, restart your AI tool completely (Cmd+Q on Mac).
`mcp-publisher login github` fails with `slow_down`	Use token login instead: `mcp-publisher login github --token "$(gh auth token)"` (or pass a PAT).
401 Unauthorized	API key required. Set `RAG_API_KEY` or use correct header format.
429 Too Many Requests	Rate limited. Wait for reset or increase `RATE_LIMIT_MAX_REQUESTS`.
CORS errors	Add your origin to `CORS_ORIGINS` environment variable.

Development

git clone https://github.com/RobThePCGuy/rag-vault.git
cd rag-vault
pnpm install
pnpm --prefix web-ui install

# Install local git hooks (recommended, even for solo dev)
pnpm hooks:install

# Fast local quality gate (backend + web-ui type/lint/format, deps, unused, build, unit tests)
pnpm check:all

# Unit tests only (no model download required)
pnpm test:unit

# Integration/E2E tests (requires model download/network)
pnpm test:integration

# Build
pnpm build

# Run MCP server locally (stdio)
pnpm dev

# Run MCP server locally (remote HTTP + SSE)
pnpm dev:remote

# Run web server locally
pnpm web:dev

# Release to npm (local, guarded)
pnpm release          # patch
pnpm release:minor
pnpm release:major
pnpm release:dry

Test Tiers

pnpm test:unit: deterministic tests for local/CI quality checks, excluding model-download integration paths.
pnpm test:integration: full integration and E2E workflows, including embedding model initialization.

Use RUN_EMBEDDING_INTEGRATION=1 to explicitly opt into network/model-dependent suites.

Release Strategy

Releases are local and scripted via scripts/release-npm.sh.
Supported bumps: patch, minor, major.
The script runs dependency installs, pnpm check:all, and pnpm ui:build before touching version files.
package.json and server.json versions are updated only after checks pass, and auto-restored if any later step fails.
pnpm release:dry performs the full gate plus npm dry-run publish and always restores version files.

Project Structure

src/
├── bin/             # CLI subcommands (skills install)
├── chunker/         # Semantic text splitting
├── embedder/        # Transformers.js wrapper
├── errors/          # Error handling utilities
├── explainability/  # Keyword-based result explanations
├── flywheel/        # Feedback loop (pin/dismiss reranking)
├── parser/          # PDF, DOCX, HTML parsing
├── query/           # Advanced query syntax parser
├── server/          # MCP tool handlers + remote transport
├── utils/           # Config, file helpers, process handlers
├── vectordb/        # LanceDB + hybrid search
└── web/             # Express server + REST API

web-ui/              # React frontend (Vite + Tailwind)