🚀 Massive Context MCP
Handle massive contexts (10M+ tokens) with chunking, sub-queries, and free local inference via Ollama.
🚀 Quick Start
Installation
Option 1: PyPI (Recommended)
# Run directly (no install) with uvx
uvx massive-context-mcp

# Or install into your environment
pip install massive-context-mcp
With Optional Extras:
pip install massive-context-mcp[firewall]
pip install massive-context-mcp[claude]
pip install massive-context-mcp[firewall,claude]
Option 2: Claude Desktop One-Click
Download the .mcpb from Releases and double-click to install.
Option 3: From Source
git clone https://github.com/egoughnour/massive-context-mcp.git
cd massive-context-mcp
uv sync
Wire to Claude Code / Claude Desktop
Add to ~/.claude/.mcp.json (Claude Code) or claude_desktop_config.json (Claude Desktop):
{
  "mcpServers": {
    "massive-context": {
      "command": "uvx",
      "args": ["massive-context-mcp"],
      "env": {
        "RLM_DATA_DIR": "~/.rlm-data",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}
✨ Features
Core Idea
Instead of feeding massive contexts directly into the LLM:
- Load context as external variable (stays out of prompt)
- Inspect structure programmatically
- Chunk strategically (lines, chars, or paragraphs)
- Sub-query recursively on chunks
- Aggregate results for final synthesis
💻 Usage Examples
Basic Pattern
# One-time setup -- Option A (Homebrew):
rlm_system_check()
rlm_setup_ollama(install=True, start_service=True, pull_model=True)
# ...or Option B (direct download, no sudo):
rlm_system_check()
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True)
# Confirm free local inference is available
rlm_ollama_status()

# Load the context as an external variable and inspect its structure
rlm_load_context(name="report", content=<large document>)
rlm_inspect_context(name="report", preview_chars=500)

# Chunk, then sub-query chunks in parallel
rlm_chunk_context(name="report", strategy="paragraphs", size=1)
rlm_sub_query_batch(
    query="What is the main topic? Reply in one sentence.",
    context_name="report",
    chunk_indices=[0, 1, 2, 3],
    concurrency=4
)

# Store and retrieve results for final synthesis
rlm_store_result(name="topics", result=<response>)
rlm_get_results(name="topics")
Processing a 2MB Document
Tested with H.R.1 Bill (2MB):
rlm_load_context(name="bill", content=<2MB XML>)
rlm_chunk_context(name="bill", strategy="chars", size=50000)
rlm_sub_query_batch(
query="What topics does this section cover?",
context_name="bill",
chunk_indices=[0, 5, 10, 15, 20, 25, 30, 35],
concurrency=4
)
Result: Comprehensive topic extraction at $0 cost (with Ollama) or ~$0.02 (with Claude).
Analyzing War and Peace (3.3MB)
Literary analysis of Tolstoy's epic novel from Project Gutenberg:
curl -o war_and_peace.txt https://www.gutenberg.org/files/2600/2600-0.txt
rlm_load_context(name="war_and_peace", content=open("war_and_peace.txt").read())
rlm_chunk_context(name="war_and_peace", strategy="lines", size=1000)
sample_indices = [0, 7, 14, 21, 28, 35, 42, 49, 56, 63]
rlm_sub_query_batch(
query="List major characters in this section with brief descriptions.",
context_name="war_and_peace",
chunk_indices=sample_indices,
provider="claude-sdk",
concurrency=8
)
Result: Complete character arc across the novel — Pierre's journey from idealist to prisoner to husband, Natásha's growth, Nikolái Rostóv's journey from soldier to landowner — all for ~$0.03.
| Metric | Value |
|--------|-------|
| File size | 3.35 MB |
| Lines | 66,033 |
| Chunks | 67 |
| Sampled | 10 (15%) |
| Cost | ~$0.03 |
📚 Documentation
Tools
Setup & Status Tools
| Tool | Purpose |
|------|---------|
| rlm_system_check | Check system requirements — verify macOS, Apple Silicon, 16GB+ RAM, Homebrew |
| rlm_setup_ollama | Install via Homebrew — managed service, auto-updates, requires Homebrew |
| rlm_setup_ollama_direct | Install via direct download — no sudo, fully headless, works on locked-down machines |
| rlm_ollama_status | Check Ollama availability — detect if free local inference is available |
Analysis Tools
| Tool | Purpose |
|------|---------|
| rlm_auto_analyze | One-step analysis — auto-detects type, chunks, and queries |
| rlm_load_context | Load context as external variable |
| rlm_inspect_context | Get structure info without loading into prompt |
| rlm_chunk_context | Chunk by lines/chars/paragraphs |
| rlm_get_chunk | Retrieve specific chunk |
| rlm_filter_context | Filter with regex (keep/remove matching lines) |
| rlm_exec | Execute Python code against loaded context (sandboxed) |
| rlm_sub_query | Make sub-LLM call on chunk |
| rlm_sub_query_batch | Process multiple chunks in parallel |
| rlm_store_result | Store sub-call result for aggregation |
| rlm_get_results | Retrieve stored results |
| rlm_list_contexts | List all loaded contexts |
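As a quick illustration of the retrieval tools above, a single chunk can be pulled back or a context pre-filtered before querying. The parameter names below (name, index, pattern, mode) are assumptions for illustration only; check src/rlm_mcp_server.py for the actual signatures.

# Hypothetical parameter names -- see src/rlm_mcp_server.py for the real signatures
rlm_get_chunk(name="report", index=2)
rlm_filter_context(name="server_log", pattern=r"ERROR|CRITICAL", mode="keep")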
Quick Analysis with rlm_auto_analyze
For most use cases, just use rlm_auto_analyze — it handles everything automatically:
rlm_auto_analyze(
    name="my_file",
    content=file_content,
    goal="find_bugs"
)
What it does automatically:
- Detects content type (Python, JSON, Markdown, logs, prose, code)
- Selects optimal chunking strategy
- Adapts the query for the content type
- Runs parallel sub-queries
- Returns aggregated results
Supported goals:
| Goal | Description |
|------|-------------|
| summarize | Summarize content purpose and key points |
| find_bugs | Identify errors, issues, potential problems |
| extract_structure | List functions, classes, schema, headings |
| security_audit | Find vulnerabilities and security issues |
| answer:<question> | Answer a custom question about the content |
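The answer:<question> goal takes a free-form question after the colon. A minimal sketch, where the context name, content variable, and question are made up for illustration:

rlm_auto_analyze(
    name="server_log",
    content=log_content,
    goal="answer:Which endpoints produced the most 5xx errors?"
)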
Programmatic Analysis with rlm_exec
For deterministic pattern matching and data extraction, use rlm_exec to run Python code directly against a loaded context. This is closer to the paper's REPL approach and provides full control over analysis logic.
Tool: rlm_exec
Purpose: Execute arbitrary Python code against a loaded context in a sandboxed subprocess.
Parameters:
- code (required): Python code to execute. Set the result variable to capture output.
- context_name (required): Name of a previously loaded context.
- timeout (optional, default 30): Maximum execution time in seconds.
Features:
- Context available as read-only context variable
- Pre-imported modules: re, json, collections
- Subprocess isolation (won't crash the server)
- Timeout enforcement
- Works on any system with Python (no Docker needed)
Example — Finding patterns in a loaded context:
rlm_exec(
    code="""
import re
amounts = re.findall(r'\$[\d,]+', context)
result = {'count': len(amounts), 'sample': amounts[:5]}
""",
    context_name="bill"
)
Example Response:
{
  "result": {
    "count": 1247,
    "sample": ["$500", "$1,000", "$250,000", "$100,000", "$50"]
  },
  "stdout": "",
  "stderr": "",
  "return_code": 0,
  "timed_out": false
}
Example — Extracting structured data:
rlm_exec(
    code="""
import re
from collections import Counter

# Find all email addresses
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', context)

# Count by domain
domains = [e.split('@')[1] for e in emails]
domain_counts = Counter(domains)

result = {
    'total_emails': len(emails),
    'unique_domains': len(domain_counts),
    'top_domains': domain_counts.most_common(5)
}
""",
    context_name="dataset",
    timeout=60
)
When to use rlm_exec vs rlm_sub_query:

| Use Case | Tool | Why |
|----------|------|-----|
| Extract all dates, IDs, amounts | rlm_exec | Regex is deterministic and fast |
| Find security vulnerabilities | rlm_sub_query | Requires reasoning and context |
| Parse JSON/XML structure | rlm_exec | Standard libraries work perfectly |
| Summarize themes or tone | rlm_sub_query | Natural language understanding needed |
| Count word frequencies | rlm_exec | Simple computation, no AI needed |
| Answer "Why did X happen?" | rlm_sub_query | Requires inference and reasoning |

Tip: For large contexts, combine both — use rlm_exec to filter/extract, then rlm_sub_query for semantic analysis of filtered results.
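One way that combination can look, using the 50,000-character chunks from the H.R.1 example above (the regex, the offset-to-chunk arithmetic, and the chunk indices are illustrative, not part of the tool API):

# 1. Deterministically locate regions of interest with rlm_exec
rlm_exec(
    code="""
import re
# Character offsets of every appropriation mention (illustrative pattern)
offsets = [m.start() for m in re.finditer(r'appropriat', context, re.IGNORECASE)]
# Map offsets to 50,000-char chunk indices, deduplicated and sorted
result = sorted({off // 50000 for off in offsets})
""",
    context_name="bill"
)

# 2. Run semantic analysis only on the chunks that matched
rlm_sub_query_batch(
    query="Summarize the appropriation provisions in this section.",
    context_name="bill",
    chunk_indices=[3, 12, 27],  # indices returned by the rlm_exec call above
    concurrency=4
)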
Code Firewall Integration (Optional)
For enhanced security, integrate [code-firewall-mcp](https://github.com/egoughnour/code-firewall-mcp) to filter dangerous code patterns before execution:
pip install massive-context-mcp[firewall]
When installed, rlm_exec can automatically check code against a blacklist of known dangerous patterns (e.g., os.system(), eval(), subprocess with shell=True). The firewall uses structural similarity matching — normalizing code to its skeleton and comparing against blacklisted patterns via embeddings.
How it works:
- Code is parsed to a syntax tree and normalized (identifiers → _, strings → "S")
- Normalized structure is embedded via Ollama
- Similarity is checked against blacklisted patterns in ChromaDB
- Code is blocked if similarity exceeds threshold (default: 0.85)
Configuration (environment variables):
RLM_FIREWALL_ENABLED=true — Enable firewall checks (auto-enabled when package installed)
RLM_FIREWALL_MODE=warn|block — Warn or block on matches (default: warn)
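For example, to run the firewall in blocking mode for this server, the variables could go in the same env block shown in Quick Start (a sketch; adjust to your own configuration):

"env": {
  "RLM_DATA_DIR": "~/.rlm-data",
  "OLLAMA_URL": "http://localhost:11434",
  "RLM_FIREWALL_ENABLED": "true",
  "RLM_FIREWALL_MODE": "block"
}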
Example blocked patterns:
os.system(user_input) — Command injection
eval(untrusted_data) — Code injection
subprocess.Popen(..., shell=True) — Shell injection
Use rlm_firewall_status to check firewall availability and configuration.
Providers & Auto-Detection
RLM automatically detects and uses the best available provider:
| Provider | Default Model | Cost | Use Case |
|----------|---------------|------|----------|
| auto | (best available) | $0 or ~$0.80/1M | Default — prefers Ollama if available |
| ollama | gemma3:12b | $0 | Local inference, requires Ollama |
| claude-sdk | claude-haiku-4-5 | ~$0.80/1M input | Cloud inference, always available |
How Auto-Detection Works
When you use provider="auto" (the default), RLM:
- Checks if Ollama is running at OLLAMA_URL (default: http://localhost:11434)
- Checks if gemma3:12b is available (or any gemma3 variant)
- Uses Ollama if available, otherwise falls back to Claude SDK
The status is cached for 60 seconds to avoid repeated network checks.
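Conceptually, the selection and caching behavior looks roughly like the sketch below. This is illustrative only, not the actual implementation in src/rlm_mcp_server.py; it assumes Ollama's standard /api/tags model-listing endpoint.

import time
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
_cache = {"checked_at": 0.0, "provider": None}

def pick_provider():
    # Reuse the last answer for 60 seconds to avoid repeated network checks
    if _cache["provider"] and time.time() - _cache["checked_at"] < 60:
        return _cache["provider"]
    provider = "claude-sdk"  # fallback when Ollama is unreachable
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=2) as resp:
            models = [m["name"] for m in json.load(resp).get("models", [])]
        # Prefer free local inference if any gemma3 variant is pulled
        if any(name.startswith("gemma3") for name in models):
            provider = "ollama"
    except OSError:
        pass
    _cache.update(checked_at=time.time(), provider=provider)
    return provider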
Check Ollama Status
Use rlm_ollama_status to see what's available:
rlm_ollama_status()
Response when Ollama is ready:
{
  "running": true,
  "models": ["gemma3:12b", "llama3:8b"],
  "default_model_available": true,
  "best_provider": "ollama",
  "recommendation": "Ollama is ready! Sub-queries will use free local inference by default."
}
Response when Ollama is not available:
{
  "running": false,
  "error": "connection_refused",
  "best_provider": "claude-sdk",
  "recommendation": "Ollama not available. Sub-queries will use Claude API. To enable free local inference, install Ollama and run: ollama serve"
}
Transparent Provider Selection
All sub-query responses include which provider was actually used:
{
  "provider": "ollama",
  "model": "gemma3:12b",
  "requested_provider": "auto",
  "response": "..."
}
Autonomous Usage
Enable Claude to use RLM tools automatically without manual invocation:
1. CLAUDE.md Integration
Copy CLAUDE.md.example content to your project's CLAUDE.md (or ~/.claude/CLAUDE.md for global) to teach Claude when to reach for RLM tools automatically.
2. Hook Installation
Copy the .claude/hooks/ directory to your project to auto-suggest RLM when reading files >10KB:
cp -r .claude/hooks/ /Users/your_username/your-project/.claude/hooks/
The hook provides guidance but doesn't block reads.
3. Skill Reference
Copy the .claude/skills/ directory for comprehensive RLM guidance:
cp -r .claude/skills/ /Users/your_username/your-project/.claude/skills/
With these in place, Claude will autonomously detect when to use RLM instead of reading large files directly into context.
Setting Up Ollama (Free Local Inference)
RLM can automatically install and configure Ollama on macOS with Apple Silicon. There are two installation methods with different trade-offs:
Choosing an Installation Method
| Aspect | rlm_setup_ollama (Homebrew) | rlm_setup_ollama_direct (Direct Download) |
|--------|-----------------------------|-------------------------------------------|
| Sudo required | Only if Homebrew not installed | ❌ Never |
| Homebrew required | ✅ Yes | ❌ No |
| Auto-updates | ✅ Yes (brew upgrade) | ❌ Manual |
| Service management | ✅ brew services (launchd) | ⚠️ ollama serve (foreground) |
| Install location | /opt/homebrew/ | ~/Applications/ |
| Locked-down machines | ⚠️ May fail | ✅ Works |
| Fully headless | ⚠️ May prompt for sudo | ✅ Yes |

Recommendation:
- Use Homebrew method if you have Homebrew and want managed updates
- Use Direct Download for automation, locked-down machines, or when you don't have admin access
Method 1: Homebrew Installation (Recommended if you have Homebrew)
rlm_system_check()
rlm_setup_ollama(install=True, start_service=True, pull_model=True)
What this does:
- Installs Ollama via Homebrew (brew install ollama)
- Starts Ollama as a managed background service (brew services start ollama)
- Pulls gemma3:12b model (~8GB download)
Requirements:
- macOS with Apple Silicon (M1/M2/M3/M4)
- 16GB+ RAM (gemma3:12b needs ~8GB to run)
- Homebrew installed
Method 2: Direct Download (Fully Headless, No Sudo)
rlm_system_check()
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True)
What this does:
- Downloads Ollama from https://ollama.com/download/Ollama-darwin.zip
- Extracts to ~/Applications/Ollama.app (user directory, no admin needed)
- Starts Ollama via ollama serve (background process)
- Pulls gemma3:12b model
Requirements:
- macOS with Apple Silicon (M1/M2/M3/M4)
- 16GB+ RAM
- No special permissions needed!
Note on PATH: After direct installation, the CLI is at:
~/Applications/Ollama.app/Contents/Resources/ollama
Add to your shell config if needed:
export PATH="$HOME/Applications/Ollama.app/Contents/Resources:$PATH"
For Systems with Less RAM
Use a smaller model on either installation method:
rlm_setup_ollama(install=True, start_service=True, pull_model=True, model="gemma3:4b")
rlm_setup_ollama_direct(install=True, start_service=True, pull_model=True, model="gemma3:4b")
Manual Setup
If you prefer manual installation or are on a different platform:
- Install Ollama from https://ollama.ai or via Homebrew: brew install ollama
- Start the service: brew services start ollama
- Pull the model: ollama pull gemma3:12b
- Verify it's working: rlm_ollama_status()
Provider Selection
RLM automatically uses Ollama when available. You can also force a specific provider:
rlm_sub_query(query="Summarize", context_name="doc")
rlm_sub_query(query="Summarize", context_name="doc", provider="ollama")
rlm_sub_query(query="Summarize", context_name="doc", provider="claude-sdk")
Data Storage
graph TD
A[("$RLM_DATA_DIR")] --> B["📁 contexts/"]
A --> C["📁 chunks/"]
A --> D["📁 results/"]
B --> B1[".txt files"]
B --> B2[".meta.json"]
C --> C1["by context name"]
D --> D1[".jsonl files"]
style A fill:#339af0,color:#fff
style B fill:#51cf66,color:#fff
style C fill:#51cf66,color:#fff
style D fill:#51cf66,color:#fff
Contexts persist across sessions. Chunked contexts are cached for reuse.
Learning Prompts
Use these prompts with Claude Code to explore the codebase and learn RLM patterns. The code is the single source of truth.
Understanding the Tools
Read src/rlm_mcp_server.py and list all RLM tools with their parameters and purpose.
Explain the chunking strategies available in rlm_chunk_context.
When would I use each one?
What's the difference between rlm_sub_query and rlm_sub_query_batch?
Show me the implementation.
Understanding the Architecture
Read src/rlm_mcp_server.py and explain how contexts are stored and persisted.
Where does the data live?
How does the claude-sdk provider extract text from responses?
Walk me through _call_claude_sdk.
What happens when I call rlm_load_context? Trace the full flow.
Hands-On Learning
Load the README as a context, chunk it by paragraphs,
and run a sub-query on the first chunk to summarize it.
Show me how to process a large file in parallel using rlm_sub_query_batch.
Use a real example.
I have a 1MB log file. Walk me through the RLM pattern to extract all errors.
Extending RLM
Read the test file and explain what scenarios are covered.
What edge cases should I be aware of?
How would I add a new chunking strategy (e.g., by regex delimiter)?
Show me where to modify the code.
How would I add a new provider (e.g., OpenAI)?
What functions need to change?
📄 License
MIT