Fast-Integrated MCP Tool for Web Content Extraction & Markdown Conversion

MCP Read Website Fast

An efficient webpage content extraction tool designed for AI agents, capable of converting webpages to concise Markdown format, with features such as quick startup, intelligent caching, and polite crawling.

Categories: Developer tools, Research and data. Tags: #Webpage Extraction #Markdown Conversion #Intelligent Caching #Efficient Crawling. Language: TypeScript

Rating: 2 points

Downloads: 9.2K

Last updated: 2025-07-24


What is read-website-fast?

read-website-fast is an MCP server that can quickly extract content from websites and convert it to clean Markdown format. It uses Mozilla Readability technology to identify the main content of webpages and converts HTML to Markdown through the Turndown library.

How to use read-website-fast?

You can use read-website-fast in several ways: install it in MCP clients such as Claude Code, VS Code, and Cursor, or invoke it directly from the command line. It can extract content from a single webpage, crawl multiple pages, and manage its cache.

Applicable Scenarios

Suitable for scenarios where you need to quickly obtain webpage content for analysis, such as AI assistants, knowledge graph construction, and content summary generation. It is particularly suitable for handling large amounts of webpage data while keeping token consumption low.

Main Features

Quick Startup

Use the official MCP SDK for quick startup and optimize performance with lazy loading.

Content Extraction

Extract the main content of webpages through Mozilla Readability technology and remove irrelevant information such as ads and navigation bars.

Markdown Conversion

Use the Turndown library to convert HTML content to Markdown format, supporting the GFM standard.

Intelligent Caching

Cache fetched pages under the SHA-256 hash of their URL so that repeated requests are served faster.
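As a rough illustration of URL-keyed caching, the sketch below hashes a URL with SHA-256 using Node's built-in crypto module. The helper name cacheKeyForUrl and the fragment-stripping step are assumptions made for this example, not the project's confirmed implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable cache key from a URL.
// Dropping the fragment is an assumption, not confirmed project behavior.
function cacheKeyForUrl(rawUrl: string): string {
  const normalized = new URL(rawUrl);
  normalized.hash = ""; // fragments do not change the fetched document
  return createHash("sha256").update(normalized.toString()).digest("hex");
}

const keyA = cacheKeyForUrl("https://example.com/article#intro");
const keyB = cacheKeyForUrl("https://example.com/article");
console.log(keyA === keyB); // true: same page, same key
console.log(keyA.length); // 64 (hex characters)
```

A fixed-length hex key like this is safe to use as a file name in a disk cache, whatever characters the original URL contains.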

Friendly Crawler

Follow the robots.txt rules and set rate limits to avoid burdening the target website.
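To make the idea of polite crawling concrete, here is a deliberately simplified sketch of a robots.txt check and a minimum-interval rate limit. The function names are hypothetical, and the parser only handles Disallow prefixes in the "User-agent: *" group. The project's actual rules and limits are not described in this document.

```typescript
// Simplified sketch: collects Disallow prefixes from the "User-agent: *" group only.
// Real robots.txt handling also covers Allow rules, wildcards, and agent-specific groups.
function parseDisallowRules(robotsTxt: string): string[] {
  const rules: string[] = [];
  let inWildcardGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = (rawLine.split("#")[0] ?? "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      inWildcardGroup = value === "*";
    } else if (field === "disallow" && inWildcardGroup && value !== "") {
      rules.push(value);
    }
  }
  return rules;
}

// A path is allowed when no Disallow prefix matches it.
function isPathAllowed(path: string, rules: string[]): boolean {
  return !rules.some((prefix) => path.startsWith(prefix));
}

// Milliseconds to wait so that requests are at least minIntervalMs apart.
function msUntilNextRequest(lastRequestAt: number, now: number, minIntervalMs: number): number {
  return Math.max(0, lastRequestAt + minIntervalMs - now);
}

const rules = parseDisallowRules("User-agent: *\nDisallow: /private/\n");
console.log(isPathAllowed("/private/data", rules)); // false
console.log(isPathAllowed("/blog/post", rules)); // true
console.log(msUntilNextRequest(1000, 1200, 500)); // 300
```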

Concurrent Requests

Fetch multiple pages concurrently with a configurable crawl depth to improve throughput.
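One way to combine a crawl depth with a cap on concurrent requests is a breadth-first crawl that processes each level through a small worker pool. The sketch below assumes an injected fetchLinks function and an in-memory link graph in place of real HTTP requests. All names are hypothetical and the project's own crawler may be structured differently.

```typescript
type LinkFetcher = (url: string) => Promise<string[]>;

// Run `task` over `items` with at most `limit` tasks in flight at once.
async function mapWithLimit<T, R>(items: T[], limit: number, task: (item: T) => Promise<R>): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Breadth-first crawl: depth 0 fetches only the start URL, depth 1 also fetches its links.
async function crawl(startUrl: string, depth: number, concurrency: number, fetchLinks: LinkFetcher): Promise<string[]> {
  const seen = new Set<string>([startUrl]);
  const fetched: string[] = [];
  let frontier = [startUrl];
  for (let level = 0; level <= depth && frontier.length > 0; level++) {
    const linkLists = await mapWithLimit(frontier, concurrency, fetchLinks);
    fetched.push(...frontier);
    const nextFrontier: string[] = [];
    for (const links of linkLists) {
      for (const link of links) {
        if (!seen.has(link)) {
          seen.add(link);
          nextFrontier.push(link);
        }
      }
    }
    frontier = nextFrontier;
  }
  return fetched;
}

// In-memory stand-in for real pages and their outgoing links.
const graph: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/c"],
  "https://example.com/b": [],
  "https://example.com/c": [],
};
const fakeFetch: LinkFetcher = async (url) => graph[url] ?? [];

// Logs the start page plus /a and /b; /c is two hops away and is skipped at depth 1.
crawl("https://example.com/", 1, 3, fakeFetch).then((pages) => console.log(pages));
```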

Stream Design

Adopt a streaming processing method to reduce memory usage, suitable for large-scale data processing.

Link Preservation

Preserve all links in the webpage for convenient subsequent knowledge graph construction.
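Because links survive the conversion, the Markdown output can be turned into graph edges afterwards. The following sketch pulls inline Markdown links out of a page and resolves them against the page URL. The LinkEdge shape and the function name are assumptions for illustration, and the regular expression only covers simple inline links of the form [text](url).

```typescript
interface LinkEdge {
  from: string;
  to: string;
  text: string;
}

// Extract inline Markdown links and resolve relative targets against the page URL.
function extractLinkEdges(markdown: string, pageUrl: string): LinkEdge[] {
  const pattern = /\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g;
  const edges: LinkEdge[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    // Skip images, which look like links preceded by "!".
    if (match.index > 0 && markdown[match.index - 1] === "!") continue;
    const text = match[1] ?? "";
    const href = match[2] ?? "";
    try {
      edges.push({ from: pageUrl, to: new URL(href, pageUrl).toString(), text });
    } catch {
      // Ignore targets that cannot be resolved to a URL.
    }
  }
  return edges;
}

const sample = "See the [docs](/docs/intro) and the [spec](https://spec.example.org/v1).";
const edges = extractLinkEdges(sample, "https://example.com/blog/post");
console.log(edges.map((edge) => edge.to));
// The relative link resolves to https://example.com/docs/intro
```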

Optional Chunking

Support chunking the extracted content for easy use in downstream tasks.
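The chunking strategy is not specified here, so the sketch below shows just one plausible approach: group paragraphs into chunks that stay under a character budget. The function name chunkMarkdown and the maxChars parameter are hypothetical, and a paragraph longer than the budget is kept whole as its own chunk.

```typescript
// Group blank-line-separated paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole rather than split.
function chunkMarkdown(markdown: string, maxChars: number): string[] {
  const paragraphs = markdown
    .split(/\n{2,}/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : `${current}\n\n${paragraph}`;
    if (current === "" || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const page = "# Title\n\nFirst paragraph.\n\nSecond paragraph.";
const chunks = chunkMarkdown(page, 30);
console.log(chunks.length); // 2
console.log(chunks[1]); // Second paragraph.
```

Splitting on paragraph boundaries keeps headings and sentences intact, which tends to matter more for downstream models than hitting an exact size.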

Advantages

Efficiently extract webpage content and reduce token consumption for AI models

Support integration with multiple IDEs for convenient use

Provide an intelligent caching mechanism to improve the speed of repeated requests

Follow web crawler specifications and respect website rules

Support multi-layer crawling to meet complex requirements

Limitations

Unable to process dynamic webpage content rendered by JavaScript

Some websites may block automated access

Requires a certain technical foundation for configuration and use

How to Use

Install the Service

Choose the installation method that matches your environment (for example Claude Code, VS Code, or Cursor).

Execute the Command

Call the read_website_fast tool from your MCP client, or run the fetch command in a terminal, passing the URL of the webpage to extract.

View the Results

The service will return the extracted Markdown content, which you can use directly or process further.

Usage Examples

Get News Article Content

Use read-website-fast to extract the main body of news articles from news websites for AI summary generation.

Crawl Product Information

Crawl the content of product detail pages from e-commerce websites for building a product database.

Build a Knowledge Graph

Extract text and links from multiple webpages for building a knowledge graph.

Frequently Asked Questions

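Why can't some webpages have their content extracted?

Pages whose content is rendered by JavaScript cannot be processed, and some websites block automated access.

How can I improve the crawling speed?

Raise the maximum number of concurrent requests with --concurrency (default: 3). Repeated requests for the same URL are also served faster from the cache.

Does it support HTTPS websites?

Yes. The url parameter accepts both HTTP and HTTPS URLs.

How do I clear the cache?

Run npm run dev clear-cache in a development checkout, or use the read-website-fast://clear-cache resource.

Does it support cross-domain crawling?

The CLI provides an --all-origins option that allows cross-origin crawling.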

Related Resources

GitHub Repository

Project source code and documentation

NPM Package Page

Details and version information of the npm package

Installation Guide

Detailed installation and usage instructions

Usage Tutorial

Tutorial on how to use read-website-fast in different IDEs

Community Support

A community for user communication and problem-solving

🚀 @just-every/mcp-read-website-fast

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

🚀 Quick Start

This MCP package @just-every/mcp-read-website-fast is designed to quickly and efficiently extract web content for AI agents, converting websites into clean Markdown. It addresses the issues of slow speed and high token consumption in existing MCP web crawlers.

✨ Features

Fast startup: uses the official MCP SDK with lazy loading.
Content extraction: uses Mozilla Readability (the same engine as Firefox Reader View).
HTML to Markdown: converts with Turndown, including GFM support.
Smart caching: caches pages on disk under SHA-256 hashes of their URLs.
Polite crawling: respects robots.txt and applies rate limiting.
Concurrent fetching: crawls to a configurable depth with concurrent requests.
Stream-first design: keeps memory usage low.
Link preservation: keeps links intact for building knowledge graphs.
Optional chunking: splits output into chunks for downstream processing.

📦 Installation

Claude Code

claude mcp add read-website-fast -s user -- npx -y @just-every/mcp-read-website-fast

VS Code

code --add-mcp '{"name":"read-website-fast","command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}'

Cursor

cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=
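The config parameter in the Cursor link above appears to be base64-encoded JSON. The sketch below decodes it with Node's Buffer to show what the link installs, which is a useful check before opening any deeplink. The variable names are illustrative only.

```typescript
const deeplink =
  "cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=";

// Read the query parameters, then decode the base64 payload into JSON.
const linkParams = new URL(deeplink).searchParams;
const encodedConfig = linkParams.get("config") ?? "";
const decodedConfig = JSON.parse(Buffer.from(encodedConfig, "base64").toString("utf8"));

console.log(linkParams.get("name")); // read-website-fast
console.log(decodedConfig["read-website-fast"].command); // npx
console.log(decodedConfig["read-website-fast"].args); // -y and @just-every/mcp-read-website-fast
```

The decoded payload matches the npx command used in the other installation methods.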

JetBrains IDEs

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add. Choose “As JSON” and paste:

{"command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}

Or, in the chat window, type /add and fill in the same JSON. Both paths add the server in a single step.

Raw JSON (works in any MCP client)

{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}

Drop this into your client’s mcp.json (e.g. .vscode/mcp.json, ~/.cursor/mcp.json, or .mcp.json for Claude).

💻 Usage Examples

Available Tools

read_website_fast - Fetches a webpage and converts it to clean markdown
- Parameters:
  - url (required): The HTTP/HTTPS URL to fetch
  - depth (optional): Crawl depth (0 = single page)
  - respectRobots (optional): Whether to respect robots.txt
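MCP clients normally build tool calls for you, but it can help to see the shape of the request. The sketch below assembles a JSON-RPC tools/call message for read_website_fast with the parameters listed above. The helper buildToolCall and its URL scheme check are assumptions for illustration, not part of the package.

```typescript
interface ReadWebsiteArgs {
  url: string;
  depth?: number;
  respectRobots?: boolean;
}

interface ToolCallRequest {
  jsonrpc: string;
  id: number;
  method: string;
  params: { name: string; arguments: ReadWebsiteArgs };
}

// Hypothetical helper: validate the URL scheme, then wrap the arguments in a tools/call request.
function buildToolCall(id: number, args: ReadWebsiteArgs): ToolCallRequest {
  const protocol = new URL(args.url).protocol;
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_website_fast", arguments: args },
  };
}

const request = buildToolCall(1, { url: "https://example.com/article", depth: 0 });
console.log(JSON.stringify(request, null, 2));
```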

Available Resources

read-website-fast://status - Get cache statistics
read-website-fast://clear-cache - Clear the cache directory

Development Usage

Install

npm install
npm run build

Single page fetch

npm run dev fetch https://example.com/article

Crawl with depth

npm run dev fetch https://example.com --depth 2 --concurrency 5

Output formats

# Markdown only (default)
npm run dev fetch https://example.com

# JSON output with metadata
npm run dev fetch https://example.com --output json

# Both URL and markdown
npm run dev fetch https://example.com --output both

CLI Options

-d, --depth <number> - Crawl depth (0 = single page, default: 0)
-c, --concurrency <number> - Max concurrent requests (default: 3)
--no-robots - Ignore robots.txt
--all-origins - Allow cross-origin crawling
-u, --user-agent <string> - Custom user agent
--cache-dir <path> - Cache directory (default: .cache)
-t, --timeout <ms> - Request timeout in milliseconds (default: 30000)
-o, --output <format> - Output format: json, markdown, or both (default: markdown)
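For scripts that wrap the CLI, the options above can be modeled as a typed object. The sketch below records the documented defaults and merges overrides on top of them. The interface, field names, and validation are assumptions, and the defaults for respectRobots and allOrigins are inferred from the --no-robots and --all-origins flags rather than stated outright.

```typescript
type OutputFormat = "json" | "markdown" | "both";

interface FetchOptions {
  depth: number;
  concurrency: number;
  respectRobots: boolean; // assumed true unless --no-robots is passed
  allOrigins: boolean; // assumed false unless --all-origins is passed
  userAgent?: string;
  cacheDir: string;
  timeoutMs: number;
  output: OutputFormat;
}

// Defaults taken from the option list above.
const defaultOptions: FetchOptions = {
  depth: 0,
  concurrency: 3,
  respectRobots: true,
  allOrigins: false,
  cacheDir: ".cache",
  timeoutMs: 30000,
  output: "markdown",
};

// Merge overrides onto the defaults and reject obviously invalid values.
function resolveOptions(overrides: Partial<FetchOptions>): FetchOptions {
  const merged: FetchOptions = { ...defaultOptions, ...overrides };
  if (!Number.isInteger(merged.depth) || merged.depth < 0) {
    throw new Error("depth must be a non-negative integer");
  }
  if (!Number.isInteger(merged.concurrency) || merged.concurrency < 1) {
    throw new Error("concurrency must be a positive integer");
  }
  return merged;
}

// Equivalent in spirit to: --depth 2 --concurrency 5
const crawlOptions = resolveOptions({ depth: 2, concurrency: 5 });
console.log(crawlOptions.depth, crawlOptions.concurrency, crawlOptions.output); // 2 5 markdown
```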

Clear cache

npm run dev clear-cache

🔧 Technical Details

Architecture

mcp/
├── src/
│   ├── crawler/        # URL fetching, queue management, robots.txt
│   ├── parser/         # DOM parsing, Readability, Turndown conversion
│   ├── cache/          # Disk-based caching with SHA-256 keys
│   ├── utils/          # Logger, chunker utilities
│   └── index.ts        # CLI entry point

Development

# Run in development mode
npm run dev fetch https://example.com

# Build for production
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Linting
npm run lint