Rlm

The RLM MCP Server is a large-scale context processing tool based on the Recursive Language Model pattern. It allows Claude Code to process text of over 10 million tokens through external variables, avoiding directly inputting massive content into the prompt. Through the processes of loading, chunking, sub-querying, and aggregation, it supports automatic analysis and programmatic execution, and can connect to the Claude API or the local Ollama for free inference.

Developer tools Research and data #Large-scale Context Processing #Recursive Analysis #External Variable Management #MCP Service .Python

rating : 2.5 points

downloads : 4.4K

update time : 2026-03-13

Open Site

What is the RLM MCP Server?

The RLM MCP Server is an intelligent analysis tool specifically designed to handle ultra-large-scale text. Based on the Recursive Language Model (RLM) pattern, it can process huge documents of over 10 million tokens, such as encyclopedias, large code repositories, and log files. Different from traditional methods, RLM stores large text as an external variable and analyzes it through intelligent chunking, parallel querying, and result aggregation, greatly improving processing efficiency and accuracy.

How to use the RLM MCP Server?

Users do not need to directly call the RLM tool. Simply submit a request to Claude to analyze a large file (e.g., 'Analyze this 2MB log file'), and Claude will automatically use the RLM tool in the background for processing and finally return the analysis results. The entire process is transparent to the user, and the operation is simple and intuitive.

Use Cases

RLM is particularly suitable for the following scenarios: • Analyze large log files to find error patterns • Process large reference documents such as encyclopedias • Audit security vulnerabilities in large code repositories • Analyze research papers or technical documents • Process large-scale data exported from databases • Any text analysis task that exceeds the context limit of conventional LLMs

Main Features

Intelligent Chunking Processing

Automatically detect the text type (code, log, document, etc.) and select the optimal chunking strategy (by line, character, or paragraph) to ensure that each chunk is within the processing capacity of the LLM.

Parallel Sub-query

Support parallel analysis of multiple chunks simultaneously, significantly improving processing speed, especially suitable for handling ultra-large-scale documents.

Recursive Analysis Capability

Support multi-layer recursive analysis. Sub-LLMs can continue to use the RLM tool for deeper analysis, suitable for complex document structures.

Multi-model Support

Support Claude Haiku (cloud) and Ollama local models. Users can choose paid or free inference services according to their needs.

Automatic Analysis Mode

Provide the rlm_auto_analyze tool to automatically complete type detection, chunking, querying, and result aggregation, simplifying the operation process.

Python Code Execution

Support the execution of Python code in a sandbox environment for deterministic analysis (e.g., regular matching, data extraction), and provide more accurate results in combination with AI inference.

Advantages

Powerful processing capability: Can handle ultra-large documents of over 10 million tokens, far exceeding the limit of conventional LLMs

Cost-effective: Use lightweight models (such as Haiku) to process chunks, which is more economical than using large models to process the entire document

Flexible configuration: Support cloud and local inference. Users can choose according to their needs

High degree of automation: Users only need to submit a request, and Claude will automatically call the RLM tool for processing

Accurate results: Avoid information loss through chunk analysis and result aggregation

Recursive analysis: Support multi-layer analysis, suitable for complex document structures

Limitations

Learning curve: Need to understand the basic concepts of RLM to fully utilize all functions

Complex configuration: Additional steps are required for local Ollama configuration

Recursive cost: Using cloud models for deep recursion may increase costs

Context management: Need to manually manage the loaded context to avoid excessive memory usage

Dependence on Claude: Currently mainly integrated with Claude Code, with limited support in other environments

How to Use

Install RLM

Clone the RLM repository and install the dependency packages

Configure Claude Code

Add RLM to the MCP server configuration of Claude Code

Enable Automatic Detection

Copy the configuration file and hooks to let Claude automatically recognize when to use RLM

Start Using

Directly submit a request to analyze a large file in Claude Code

Usage Examples

Analyze a Large Log File

Analyze a 2MB server log file to identify error patterns and frequencies

Process an Encyclopedia

Analyze an 11MB encyclopedia to extract relevant articles on a specific topic

Code Security Audit

Check the security vulnerabilities in a large code repository

Research Paper Analysis

Analyze multiple research papers to extract key findings and methodologies

Frequently Asked Questions

What is the difference between RLM and directly using Claude?

Is additional payment required to use RLM?

How to choose between cloud models and local models?

What types of files can RLM handle?

How long does it take to process an 11MB encyclopedia?

How to install and configure Ollama?

What is the difference between rlm_exec and rlm_sub_query?

Where is the data stored? Is it secure?

Related Resources

RLM GitHub Repository

The source code and latest version of the RLM MCP Server

RLM Research Paper

The original research paper on the Recursive Language Model (RLM)

Ollama Official Website

Local LLM running environment, supporting free inference

Claude Code Documentation

The official documentation for using Claude Code

Project Gutenberg

Free e-book resources, including test data such as encyclopedias

MCP Protocol Documentation

The official specification of the Model Context Protocol

🚀 RLM Hooks

RLM Hooks is a set of tools designed to intercept Read tool calls and suggest using RLM tools when files exceed configurable thresholds, helping users avoid hitting token limits when working with large files.

✨ Features

Threshold-based Suggestion: Intercepts Read tool calls and suggests RLM tools when files exceed either size or token thresholds.
Configurable Thresholds: Allows users to customize size and token thresholds via environment variables.
Non-blocking: Suggests RLM tools without blocking the read operation.
JSON Output: Provides JSON output with decision: "approve" and an optional reason message.

📦 Installation

There is no specific installation process described in the original document. If you need to use these hooks, you can follow the configuration steps in the "Configuration" section.

💻 Usage Examples

Basic Usage

The Large File Suggester Hook is implemented in the check-file-size.sh script. It is configured in the .claude/settings.json file as follows:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/check-file-size.sh"
          }
        ]
      }
    ]
  }
}

Advanced Usage

Manual Testing

Create a large test file (>25KB):

python3 -c "print('x' * 150000)" > /tmp/large_file.txt

Test the hook directly:

# Test with large file (should show suggestion)
echo '{"tool_input": {"file_path": "/tmp/large_file.txt"}}' | .claude/hooks/check-file-size.sh

# Expected output:
# {
#   "decision": "approve",
#   "reason": "This file is 146.48 KB ..."
# }

Test with small file (should NOT show suggestion):

echo '{"tool_input": {"file_path": "/Users/richard/projects/fun/rlm/README.md"}}' | .claude/hooks/check-file-size.sh

# Expected output:
# {"decision": "approve"}

Test edge cases:

# Non-existent file
echo '{"tool_input": {"file_path": "/nonexistent.txt"}}' | .claude/hooks/check-file-size.sh
# Output: {"decision": "approve"}

# No file_path
echo '{"tool_input": {}}' | .claude/hooks/check-file-size.sh
# Output: {"decision": "approve"}

Customizing Thresholds

Size threshold (RLM_SIZE_THRESHOLD in bytes):

# Use 50KB size threshold
RLM_SIZE_THRESHOLD=51200 echo '{"tool_input": {"file_path": "somefile.txt"}}' | .claude/hooks/check-file-size.sh

# Use 1MB size threshold
RLM_SIZE_THRESHOLD=1048576 echo '{"tool_input": {"file_path": "somefile.txt"}}' | .claude/hooks/check-file-size.sh

Token threshold (RLM_TOKEN_THRESHOLD in tokens):

# Use 5K token threshold (more aggressive, catches smaller files)
RLM_TOKEN_THRESHOLD=5000 echo '{"tool_input": {"file_path": "somefile.txt"}}' | .claude/hooks/check-file-size.sh

# Use 20K token threshold (closer to hard limit)
RLM_TOKEN_THRESHOLD=20000 echo '{"tool_input": {"file_path": "somefile.txt"}}' | .claude/hooks/check-file-size.sh

Both thresholds together:

# Custom size (100KB) and token (15K) thresholds
RLM_SIZE_THRESHOLD=102400 RLM_TOKEN_THRESHOLD=15000 .claude/hooks/check-file-size.sh

Integration Testing with Claude Code

Start Claude Code in the RLM project directory.
Ask Claude to read a large file (>25KB).
Observe the suggestion message before the file is read.
The read will still proceed (not blocked).

📚 Documentation

Behavior

The hook triggers when EITHER threshold is exceeded:

Size threshold: 25KB (configurable via RLM_SIZE_THRESHOLD env var).
Token threshold: 10K tokens (configurable via RLM_TOKEN_THRESHOLD env var).

Token estimation uses ~4 characters per token (conservative estimate). The 10K token threshold warns early because Claude's Read tool hard-fails at 25K tokens.

The hook suggests RLM tools but does NOT block the read. It outputs JSON with decision: "approve" and an optional reason message.

File Structure

.claude/
  hooks/
    check-file-size.sh     # Hook script
    rlm-suggest.json       # Hook metadata (optional, for documentation)
    README.md              # This file
  settings.json            # Hook configuration