Haiku RAG MCP Server: Integrates LanceDB for Multi-functional Intelligent Retrieval and Generation

Haiku.rag

Haiku RAG is an intelligent retrieval - augmented generation system built on LanceDB, Pydantic AI, and Docling. It supports hybrid search, re - ranking, Q&A agents, multi - agent research processes, and provides local - first document processing and MCP server integration.

Search tools Knowledge management and memory #Intelligent retrieval #Document Q&A #Local RAG #Multi - agent .Python

rating : 5 points

downloads : 5.7K

update time : 2025-12-29

Open Site

What is Haiku RAG?

Haiku RAG is an advanced document intelligent processing system that combines document retrieval, vector search, and artificial intelligence Q&A functions. You can add various documents (such as PDFs and web page content) to the system, and then obtain information from the documents by asking questions in natural language. The system will automatically find relevant content fragments and generate answers with references.

How to use Haiku RAG?

Using Haiku RAG is very simple: First, add your documents to the system, and then you can obtain information by searching for keywords or asking direct questions. The system supports multiple usage methods, including command-line tools, Python programming interfaces, and integration into AI assistants (such as Claude Desktop) for use as a tool.

Use cases

Haiku RAG is particularly suitable for the following scenarios: academic research (quickly find information in papers), enterprise knowledge base management (retrieve internal documents), legal document analysis, technical document query, and any scenario that requires quickly extracting information from a large number of documents.

Main features

Hybrid search

Simultaneously use vector search and full-text search technologies, combining the advantages of both methods to provide more accurate search results.

Intelligent Q&A

Not only can it search for keywords, but it can also understand questions and generate complete answers with references (page numbers, chapter titles).

Research assistant

Multi-step research process: planning, searching, evaluating, and synthesizing to help handle complex research questions.

Document structure awareness

Understand the complete structure of the document (titles, paragraphs, tables, etc.) and provide more accurate context information.

Visual grounding

Highlight the found content fragments on the original page image to visually display the information source.

Time travel

Query the state of the database at any historical time point, supporting version control and historical analysis.

Multi-service provider support

Supports multiple AI services and embedding models such as OpenAI, Ollama, and VoyageAI.

Local-first

It can run without a server, and all data is stored locally. Cloud storage options are also supported.

AI assistant integration

It can be integrated into AI assistants such as Claude Desktop as a tool for direct use in conversations.

File monitoring

Monitor directory changes and automatically index newly added or modified documents.

Advantages

Ready to use: Easy to install, user-friendly configuration, and quick to get started

Comprehensive functions: Covers everything from basic search to complex research analysis

Flexible deployment: Supports local operation and cloud services to meet different needs

Intelligent and efficient: AI-driven search and Q&A, saving manual search time

Accurate references: Provides precise page numbers and chapter references for easy verification

Highly scalable: Supports multiple document formats and AI models

Limitations

Technical requirements: Requires Python 3.12 or a later version

Hardware requirements: Sufficient memory is required when processing a large number of documents or using large models

Learning curve: Advanced functions (such as the research assistant) require some time to get familiar with

Model dependency: Some functions depend on the availability of external AI services

Document format: Support for non-standard format documents may be limited

How to use

Install Haiku RAG

Use the uv package manager to install the full version or the lightweight version. The full version includes all functions, and the lightweight version allows you to install components as needed.

Add documents

Add your PDFs, web pages, or other documents to the system. The system will automatically process the document content and build an index.

Search for content

Use keywords to search for relevant content in the documents. The system will return the most matching fragments.

Ask questions to get answers

Ask questions directly, and the system will search for relevant information from the documents and generate complete answers.

Use the research assistant

For complex questions, use the research assistant for multi-step analysis and synthesis.

Usage examples

Academic paper research

Researchers need to quickly understand the core content and method details of a long paper.

Technical document query

Developers need to find the usage of specific functions from multiple API documents.

Legal document analysis

Lawyers need to compare the changes in contract terms in different versions.

Enterprise knowledge base management

New employees need to quickly understand the company's policies and procedures.

Frequently Asked Questions

What types of documents does Haiku RAG support?

Do I need an internet connection to use it?

How to integrate it into Claude Desktop?

How much storage space is required to process a large number of documents?

Can I customize the AI models for search and Q&A?

How to ensure the accuracy of the search results?

Related resources

Official documentation

Complete installation, configuration, and usage guide

GitHub repository

Source code and issue tracking

Example projects

Contains practical examples such as Docker deployment and the research assistant

Pydantic AI

Documentation of the underlying AI framework

LanceDB

Technical documentation of the vector database

🚀 Haiku RAG

Agentic RAG built on LanceDB, Pydantic AI, and Docling. It offers a comprehensive solution for research and question - answering tasks, leveraging multiple advanced technologies.

✨ Features

Hybrid search — Vector + full - text with Reciprocal Rank Fusion.
Reranking — Options include MxBAI, Cohere, Zero Entropy, or vLLM.
Question answering — QA agents that provide citations (page numbers, section headings).
Research agents — Multi - agent workflows via pydantic - graph: plan, search, evaluate, synthesize.
Document structure — Stores full DoclingDocument, enabling structure - aware context expansion.
Visual grounding — View chunks highlighted on original page images.
Time travel — Query the database at any historical point with --before.
Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM. QA/Research: any model supported by Pydantic AI.
Local - first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud.
MCP server — Expose as tools for AI assistants (Claude Desktop, etc.).
File monitoring — Watch directories and auto - index on changes.
Inspector — TUI for browsing documents, chunks, and search results.
CLI & Python API — Full functionality available from the command line or code.

📦 Installation

Python 3.12 or newer required

Full Package (Recommended)

uv pip install haiku.rag

This package includes all features: document processing, all embedding providers, and rerankers.

Slim Package (Minimal Dependencies)

uv pip install haiku.rag - slim

Install only the extras you need. See the Installation documentation for available options.

🚀 Quick Start

# Index a PDF
haiku - rag add - src paper.pdf

# Search
haiku - rag search "attention mechanism"

# Ask questions with citations
haiku - rag ask "What datasets were used for evaluation?" --cite

# Deep QA — decomposes complex questions into sub - queries
haiku - rag ask "How does the proposed method compare to the baseline on MMLU?" --deep

# Research mode — iterative planning and search
haiku - rag research "What are the limitations of the approach?" --verbose

# Interactive research — human - in - the - loop with decision points
haiku - rag research "Compare the approaches discussed" --interactive

# Watch a directory for changes
haiku - rag serve --monitor

See Configuration for customization options.

💻 Usage Examples

Basic Usage

from haiku.rag.client import HaikuRAG

async with HaikuRAG("research.lancedb", create=True) as rag:
    # Index documents
    await rag.create_document_from_source("paper.pdf")
    await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

    # Search — returns chunks with provenance
    results = await rag.search("self - attention")
    for result in results:
        print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

    # QA with citations
    answer, citations = await rag.ask("What is the complexity of self - attention?")
    print(answer)
    for cite in citations:
        print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

For research agents and streaming with AG - UI, see the Agents docs.

📚 Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

Installation - Provider setup
Configuration - YAML configuration
CLI - Command reference
Python API - Complete API docs
Agents - QA agent and multi - agent research
Server - File monitoring, MCP, and AG - UI
MCP - Model Context Protocol integration
Inspector - Database browser TUI
Benchmarks - Performance benchmarks
Changelog - Version history

MCP Server

Use with AI assistants like Claude Desktop:

haiku - rag serve --mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku - rag": {
      "command": "haiku - rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}

This setup provides tools for document management, search, QA, and research directly in your AI assistant.

💻 Usage Examples

More Examples

See the examples directory for working examples:

[Interactive Research Assistant](examples/ag - ui - research/) - Full - stack research assistant with Pydantic AI and AG - UI featuring human - in - the - loop approval and real - time state synchronization.
Docker Setup - Complete Docker deployment with file monitoring and MCP server.
[A2A Server](examples/a2a - server/) - Self - contained A2A protocol server package with conversational agent interface.

mcp - name: io.github.ggozad/haiku - rag