mnemo MCP Server: Extend AI Assistant Memory with Gemini Cache for Low-cost, Low-latency Info Recall

Mnemo

Mnemo is an MCP service that provides extended memory for AI assistants. Through the context caching function of Gemini, it allows assistants to load large code libraries, documents, PDFs and other materials and conduct natural - language queries, achieving perfect information recall with low cost and low latency.

Knowledge management and memory Developer tools #AI Memory Extension #Context Caching #Code Library Query #Document Loading .TypeScript

rating : 2.5 points

downloads : 7.5K

update time : 2025-12-29

Open Site

What is Mnemo?

Mnemo (Greek: Memory) is an innovative AI assistant extension tool that leverages Google Gemini AI's 1 million token context window and context caching function to provide AI assistants like Claude with the ability to access large knowledge bases. Traditional AI assistants are usually limited by the finite context length and cannot handle large code libraries or document sets. Mnemo allows AI assistants to query and understand a large amount of information without a complex retrieval system by loading the entire knowledge base into Gemini's cache at once.

How to use Mnemo?

Using Mnemo is very simple: 1. Load your code library, documents or PDFs into Gemini's cache. 2. Set a memorable alias for the cache. 3. Query the cached content through natural language. 4. AI assistants (such as Claude) will use the information in the cache to answer your questions. Mnemo supports three deployment methods: local server, self - hosted Cloudflare Worker or hosted service to meet the needs of different users.

Use cases

Mnemo is particularly suitable for the following scenarios: • Developers need AI assistants to understand the entire code library. • Researchers need to analyze a large number of documents or academic papers. • Technical support teams need to access complete product documentation. • Any scenario where AI needs to process large information sets beyond the normal context limit.

Main Features

Multi - source data loading

Supports loading data from multiple sources: GitHub repositories (public and private), any URL (documents, articles), PDF documents, JSON APIs, local file directories, and multi - page crawling.

Perfect information recall

Through Gemini's context caching, a 100% information recall rate is achieved without the need for chunking or retrieval, ensuring that AI can access all the loaded content.

Cost optimization

By leveraging Gemini's caching function, the cost of cached tokens is 75 - 90% lower than that of regular input tokens, significantly reducing the usage cost.

Flexible deployment options

Provides three deployment methods: local development server (supports all functions), self - hosted Cloudflare Worker (suitable for use with Claude.ai), hosted service (for VIP customers).

MCP protocol integration

Fully compatible with the Model Context Protocol, it can be seamlessly integrated with AI assistants such as Claude Desktop and Claude.ai.

Intelligent page crawling

Supports intelligent crawling based on token targets, automatically controlling the crawling depth and scope to ensure the most relevant content is obtained within the token limit.

Advantages

Perfect recall: AI can access all the loaded content without chunking or retrieval.

Cost - effective: The cost of cached tokens is 75 - 90% lower than that of regular input.

Low latency: Cached content can be provided quickly, resulting in a shorter response time.

Easy to use: No complex vector databases or retrieval logic is required.

Flexible deployment: Supports multiple deployment methods including local, self - hosted and hosted.

Limitations

Depends on the Gemini API: A Google Gemini API key is required.

Cache time limit: The cache has a TTL (Time - To - Live) limit, with a default of 1 hour.

Token limit: Limited by Gemini's 1 million token context window.

Cloudflare Worker limit: The self - hosted version has a 40 - page crawling limit.

Requires network connection: A stable network connection is required to load remote resources.

How to Use

Get an API key

First, you need to obtain a Google Gemini API key, which can be created in Google AI Studio.

Choose a deployment method

Select a deployment method according to your needs: local development, self - hosted Cloudflare Worker or hosted service.

Configure the AI assistant

Configure the MCP server connection in Claude Desktop or Claude.ai.

Load data

Use the context_load tool to load data into the cache.

Query data

Query the data in the cache using natural language.

Usage Examples

Code library analysis

Developers need AI assistants to help understand the code structure of a large open - source project.

Technical documentation query

Technical support engineers need to quickly find specific function descriptions in product documentation.

Academic research

Researchers need to analyze the methods and results in multiple academic papers.

API integration

Developers need to know how to integrate third - party API services.

Frequently Asked Questions

What is the difference between Mnemo and traditional RAG (Retrieval - Augmented Generation)?

How much does it cost to use Mnemo?

Can I use Mnemo in Claude.ai?

How long will the cached data be saved?

How to load a private GitHub repository?

What file formats does Mnemo support?

Do I need programming knowledge to use Mnemo?

How is data security ensured?

Related Resources

Official GitHub Repository

The source code, latest version and issue tracking of Mnemo

Gemini API Documentation

The official documentation and guide for the Google Gemini API

Model Context Protocol

The official specification and documentation of the MCP protocol

Cloudflare Workers

The deployment and configuration guide for Cloudflare Workers

Claude MCP Configuration Guide

How to configure and use the MCP server in Claude

Logos Flux Official Website

The official website of the Mnemo development team

🚀 Mnemo

Mnemo extends the memory of AI assistants through Gemini context caching. It (Greek: memory) enables AI assistants like Claude to access large codebases, documentation sites, PDFs, and more by leveraging Gemini's 1M token context window and context caching features.

🚀 Quick Start

Mnemo gives AI assistants access to various resources. You can start using it by choosing one of the deployment options below.

✨ Features

Perfect recall: No chunking or retrieval means no lost context.
Lower latency: Cached context is served quickly.
Cost savings: Cached tokens cost 75 - 90% less than regular input tokens.
Simplicity: No vector databases, embeddings, or complex retrieval logic.

📦 Installation

Mnemo can be deployed in three ways depending on your needs.

Option 1: Local Server (Development & Full Features)

Best for development and when you need to load local files.

# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo
bun install

# Set your Gemini API key
export GEMINI_API_KEY=your_key_here

# Start the server
bun run dev

Claude Code MCP config:

{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}

Option 2: Self-Hosted Cloudflare Worker (Recommended for Claude.ai)

Deploy to your own Cloudflare account. You control your data and costs. Prerequisites:

Cloudflare account (free tier works)
Gemini API key

# Clone and install
git clone https://github.com/logos-flux/mnemo
cd mnemo/packages/cf-worker

# Configure secrets
bunx wrangler secret put GEMINI_API_KEY
bunx wrangler secret put MNEMO_AUTH_TOKEN  # Optional but recommended

# Create D1 database
bunx wrangler d1 create mnemo-cache

# Deploy
bunx wrangler deploy

Claude.ai MCP config:

{
  "mcpServers": {
    "mnemo": {
      "type": "http",
      "url": "https://mnemo.<your-subdomain>.workers.dev/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_AUTH_TOKEN"
      }
    }
  }
}

Why use this? Claude.ai can't connect to localhost. The Worker gives you an external endpoint that Claude.ai can reach.

Option 3: Managed Hosting (VIP)

Don't want to manage infrastructure? We offer fully managed Mnemo hosting for select clients. Includes:

Dedicated Worker deployment
Priority support
Custom domain
Usage monitoring Contact: lf@logosflux.io for pricing and availability.

💻 Usage Examples

Basic Usage

# Load a GitHub repo
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://github.com/honojs/hono", "alias": "hono"}'

# Load a documentation site (crawls up to token target)
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://hono.dev/docs", "alias": "hono-docs"}'

# Load a PDF
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://arxiv.org/pdf/2303.08774.pdf", "alias": "gpt4-paper"}'

# Load a private repo (with GitHub token)
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"source": "https://github.com/owner/private-repo", "alias": "private", "githubToken": "ghp_xxx"}'

# Load multiple sources into one cache
curl -X POST http://localhost:8080/tools/context_load \
  -H "Content-Type: application/json" \
  -d '{"sources": ["https://github.com/owner/repo", "https://docs.example.com"], "alias": "combined"}'

# Query the cache
curl -X POST http://localhost:8080/tools/context_query \
  -H "Content-Type: application/json" \
  -d '{"alias": "hono", "query": "How do I add middleware?"}'

# List active caches
curl -X POST http://localhost:8080/tools/context_list \
  -H "Content-Type: application/json" -d '{}'

# Get usage stats with cost tracking
curl -X POST http://localhost:8080/tools/context_stats \
  -H "Content-Type: application/json" -d '{}'

# Evict when done
curl -X POST http://localhost:8080/tools/context_evict \
  -H "Content-Type: application/json" \
  -d '{"alias": "hono"}'

CLI

# Start server
mnemo serve

# Start MCP stdio transport (for Claude Desktop)
mnemo stdio

# Load a project
mnemo load ./my-project my-proj

# Query
mnemo query my-proj "What's the main entry point?"

# List caches
mnemo list

# Remove cache
mnemo evict my-proj

📚 Documentation

MCP Tools

Tool	Description
`context_load`	Load GitHub repos, URLs, PDFs, or local dirs into Gemini cache
`context_query`	Query a cached context with natural language
`context_list`	List all active caches with token counts and expiry
`context_evict`	Remove a cache
`context_stats`	Get usage statistics with cost tracking
`context_refresh`	Reload a cache with fresh content

context_load Parameters

Parameter	Description
`source`	Single source: GitHub URL, any URL, or local path
`sources`	Multiple sources to combine into one cache
`alias`	Friendly name for this cache (1 - 64 chars)
`ttl`	Time to live in seconds (60 - 86400, default 3600)
`githubToken`	GitHub token for private repos
`systemInstruction`	Custom system prompt for queries

Configuration

Variable	Description	Default
`GEMINI_API_KEY`	Your Gemini API key	Required
`MNEMO_PORT`	Server port (local only)	8080
`MNEMO_DIR`	Data directory (local only)	~/.mnemo
`MNEMO_AUTH_TOKEN`	Auth token for protected endpoints	None

Authentication

When MNEMO_AUTH_TOKEN is configured, the /mcp and /tools/* endpoints require authentication:

# Set auth token (Workers)
bunx wrangler secret put MNEMO_AUTH_TOKEN

# Requests must include header:
Authorization: Bearer your-token-here

Public endpoints (no auth required):

GET /health - Health check
GET / - Service info
GET /tools - List available tools

What Can Mnemo Load?

Source	Local Server	Worker
GitHub repos (public)	✅	✅
GitHub repos (private)	✅	✅
Any URL (docs, articles)	✅	✅
PDF documents	✅	✅
JSON APIs	✅	✅
Local files/directories	✅	❌
Multi-page crawls	✅ unlimited	✅ 40 pages max

Costs

You always pay for Gemini API usage regardless of deployment option. Mnemo uses Gemini's context caching which is significantly cheaper than standard input:

Resource	Cost
Cache storage	~$4.50 per 1M tokens per hour
Cached input	75 - 90% discount vs regular input
Regular input	~$0.075 per 1M tokens (Flash)
Example: 100K token codebase cached for 1 hour with 10 queries ≈ $0.47
Cloudflare costs (self-hosted):

Workers: Free tier includes 100K requests/day
D1: Free tier includes 5M reads/day
Likely $0 for moderate usage

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Mnemo                                │
├─────────────────────────────────────────────────────────────┤
│  MCP Tools                                                   │
│  • context_load    - Load into Gemini cache                 │
│  • context_query   - Query cached context                   │
│  • context_list    - Show active caches                     │
│  • context_evict   - Remove cache                           │
│  • context_stats   - Token usage, costs                     │
│  • context_refresh - Reload cache                           │
├─────────────────────────────────────────────────────────────┤
│  Adapters (v0.2)                                             │
│  • GitHub repos (via API)                                   │
│  • URL loading (HTML, PDF, JSON, text)                      │
│  • Token-targeted crawling                                  │
│  • robots.txt compliance                                    │
├─────────────────────────────────────────────────────────────┤
│  Packages                                                    │
│  • @mnemo/core      - Gemini client, loaders, adapters      │
│  • @mnemo/mcp-server - MCP protocol handling                │
│  • @mnemo/cf-worker - Cloudflare Workers deployment         │
│  • @mnemo/local     - Bun-based local server                │
└─────────────────────────────────────────────────────────────┘