Huoshui-fetch MCP Server: Web Content Scraping & Data Extraction Tools

Huoshui Fetch

An MCP server dedicated to web content scraping and conversion, providing a toolset for obtaining, converting, and extracting data from web pages


update time : 2025-12-29


What is huoshui-fetch?

huoshui-fetch is a tool server for fetching and processing web content. It extracts information from web pages and converts it into readable formats such as Markdown or plain text. Typical tasks include saving news articles, extracting page data, converting HTML, and analyzing page structure.

How to use huoshui-fetch?

huoshui-fetch is mainly used through AI assistant applications such as Claude Desktop. After you add the server settings to the configuration file, you can call its web processing tools directly in a conversation. No code is required: you describe the task in natural language and the assistant invokes the tools.

Applicable scenarios

huoshui-fetch is particularly suitable for the following scenarios:

1. Quickly obtain web content during research and study.
2. Save web articles in a clean Markdown format.
3. Extract links and images from web pages in batches.
4. Analyze web page structure and metadata.
5. Convert JSON data into an easy-to-read document format.
6. Obtain web content that requires login to access.

Main features

Web page acquisition tool

Fetches web page content from URLs with a customizable timeout, redirect handling, and user-agent. A separate tool fetches content with custom request headers, which is useful for pages that require authentication.
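As a rough illustration of what a fetch with custom headers involves, here is a minimal sketch using only the Python standard library. It is not the server's implementation; the function names, the default user-agent string, and the 10-second timeout are assumptions made for this example.

```python
import urllib.request

# Hypothetical default; the real server's user-agent is not documented here.
DEFAULT_USER_AGENT = "example-fetcher/0.1"


def build_request(url, headers=None, user_agent=DEFAULT_USER_AGENT):
    """Build a GET request carrying a User-Agent plus optional extra headers."""
    merged = {"User-Agent": user_agent}
    merged.update(headers or {})  # caller-supplied headers win on conflict
    return urllib.request.Request(url, headers=merged, method="GET")


def fetch(url, headers=None, timeout=10.0):
    """Fetch a URL and return (status, body text).

    urllib follows redirects by default; timeout is in seconds.
    """
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.status, body
```

For an authenticated page, a caller might pass something like `fetch(url, headers={"Authorization": "Bearer <token>"})`, where the token comes from the site in question.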

HTML conversion tool

Converts HTML content into a clean Markdown format or extracts plain text content. Supports HTML cleaning function, automatically removing irrelevant elements such as scripts and styles.
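The cleaning step can be pictured with a small standard-library sketch that drops script and style content and keeps the visible text. This is an illustration only, assuming a simple parser-based approach; the class and function names are hypothetical and the server may use different libraries.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text while skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```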

Article content extraction

Intelligently extracts the main article content from web pages, automatically filtering out irrelevant information such as navigation bars, advertisements, and sidebars, and focusing on the core content.

Metadata extraction

Extracts metadata information such as the title, description, and Open Graph tags of web pages, helping you quickly understand the overview of web pages.
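To make the idea concrete, the sketch below pulls the title, the description, and Open Graph tags out of an HTML string with the standard-library parser. The output shape (a dict with `title`, `description`, and `open_graph` keys) is an assumption for this example and may not match what the server returns.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <title>, meta description, and og:* properties."""

    def __init__(self):
        super().__init__()
        self.metadata = {"title": "", "description": "", "open_graph": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attributes.get("content") or ""
            name = (attributes.get("name") or "").lower()
            prop = (attributes.get("property") or "").lower()
            if name == "description":
                self.metadata["description"] = content
            elif prop.startswith("og:"):
                self.metadata["open_graph"][prop] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] += data


def extract_metadata(html):
    """Return a dict with the page title, description, and Open Graph tags."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata
```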

Link extraction

Extracts all links from web pages and supports filtering by domain name, type, etc., which is convenient for batch processing.
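A minimal sketch of link extraction with a domain filter might look like the following. The `domain` parameter and the exact-hostname match are assumptions for illustration; the server's filtering options may behave differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, domain=None):
    """Return absolute links; if domain is given, keep only exact hostname matches."""
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    if domain is None:
        return parser.links
    return [link for link in parser.links if urlparse(link).hostname == domain]
```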

Image extraction

Extracts image information from web pages, supports filtering by size, and obtains detailed information such as image URLs and alt texts.
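Size filtering can be sketched by reading the `width` attribute declared on each `img` tag. This is an assumption about what "size" means here; the server might instead inspect the image files themselves. The names below are hypothetical.

```python
from html.parser import HTMLParser


def _to_int(value):
    """Convert an attribute value to int, or None if missing or non-numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


class ImageParser(HTMLParser):
    """Collect src, alt, and declared dimensions from <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("src"):
            self.images.append({
                "src": attributes["src"],
                "alt": attributes.get("alt") or "",
                "width": _to_int(attributes.get("width")),
                "height": _to_int(attributes.get("height")),
            })


def extract_images(html, min_width=0):
    """Return image records; with min_width > 0, drop images narrower than it
    or without a declared width."""
    parser = ImageParser()
    parser.feed(html)
    parser.close()
    if min_width <= 0:
        return parser.images
    return [img for img in parser.images if (img["width"] or 0) >= min_width]
```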

Structured data extraction

Extracts structured information such as JSON-LD and microdata from web pages, which is particularly suitable for processing product information, article data, etc.
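JSON-LD is embedded in pages as `script` elements with the type `application/ld+json`, so extraction amounts to finding those elements and parsing their contents. The sketch below shows that idea with the standard library only; it skips microdata, and the names are hypothetical rather than taken from the server.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            raw = "".join(self._buffer)
            self._buffer = None
            try:
                self.items.append(json.loads(raw))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks instead of failing the whole page


def extract_json_ld(html):
    """Return a list of JSON-LD objects found in the HTML."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items
```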

JSON to Markdown

Converts JSON data into an easy-to-read Markdown format, which is convenient for viewing and analyzing structured data.
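One plausible way to render JSON as Markdown is a nested bullet list, with object keys in bold. The sketch below takes that approach; the layout is an assumption for illustration and the server's actual output format may differ.

```python
import json


def _scalar(value):
    """Render a JSON scalar: strings as-is, others in JSON notation (true, null, 3)."""
    return value if isinstance(value, str) else json.dumps(value)


def _render(value, indent):
    """Return Markdown lines for a JSON value at the given nesting depth."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}- **{key}**:")
                lines.extend(_render(item, indent + 1))
            else:
                lines.append(f"{pad}- **{key}**: {_scalar(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.extend(_render(item, indent + 1))
            else:
                lines.append(f"{pad}- {_scalar(item)}")
    else:
        lines.append(f"{pad}{_scalar(value)}")
    return lines


def json_to_markdown(text):
    """Parse a JSON string and return it as a nested Markdown bullet list."""
    return "\n".join(_render(json.loads(text), 0))
```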

Advantages

No programming knowledge is required, and it can be used through natural language.

Supports multiple web content processing functions, providing a one-stop solution.

Intelligently extracts the core content of articles and filters out irrelevant information.

Supports custom request headers and can access web pages that require authentication.

Has a variety of output formats to meet different usage needs.

Seamlessly integrates with AI assistants such as Claude Desktop.

Limitations

Requires a Python 3.11+ runtime environment.

Manual adjustment of extraction parameters may be required for some complex web pages.

Cannot handle dynamic content that requires JavaScript rendering.

Requires an internet connection to obtain web page content.

Some websites may have anti-scraping mechanisms that impose restrictions.

How to use

Install the Python environment

Ensure that Python 3.11 or a higher version is installed on your computer. You can download and install it from the official Python website.

Install the uv tool (recommended)

uv is a fast Python package manager and installer that can simplify the installation process.

Configure Claude Desktop

Add the huoshui-fetch server configuration to the Claude Desktop configuration file.

Restart Claude Desktop

After saving the configuration file, restart the Claude Desktop application for the configuration to take effect.

Start using

In the Claude conversation, you can now directly use various web processing functions.

Usage examples

Obtain and save news articles

When you see a valuable news article online and want to save it for later reading or organization, you can use huoshui-fetch to quickly obtain and convert it to a clean format.

Research data collection

When conducting academic research or project investigations, you need to collect relevant data from multiple web pages and organize it into a unified format.

Web page data analysis

When you need to analyze the structure, link relationships, or image resources of a website, you can use the extraction tool to quickly obtain relevant information.

Technical document conversion

Convert the JSON data of API documents or technical specifications into an easy-to-read Markdown format for team members to review.

Frequently Asked Questions

Is huoshui-fetch free?

Do I need programming knowledge to use it?

Can it handle websites that require login?

How accurate is the extracted content?

Does it support Chinese web pages?

How to handle dynamically loaded content?

Are there any usage restrictions or quotas?

How to get technical support?

Related resources

GitHub repository

Obtain the source code, submit issues, and participate in development

MCP protocol documentation

Understand the technical details of the Model Context Protocol

Claude Desktop

Download and install the Claude Desktop application

Python official website

Download the Python programming language

uv tool documentation

Understand how to use the uv package manager

Release guide

Detailed release and deployment instructions

🚀 huoshui-fetch

A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.

🚀 Quick Start

huoshui-fetch is an MCP server designed for web content fetching, conversion, and extraction. It offers a variety of tools to handle different web-related tasks.

✨ Features

Fetching Tools

fetch_url: Fetch content from URLs with customizable timeout, redirect handling, and user-agent.
fetch_with_headers: Fetch URLs with custom headers for authenticated requests.

Conversion Tools

html_to_markdown_tool: Convert HTML to clean Markdown format.
html_to_text_tool: Extract plain text from HTML.
clean_html_tool: Remove scripts/styles and sanitize HTML.
json_to_markdown_tool: Convert JSON data to readable Markdown.

Extraction Tools

extract_article_tool: Extract main article content using readability.
extract_links_tool: Extract all links with filtering options.
extract_metadata_tool: Extract page metadata (title, description, OG tags).
extract_images_tool: Extract images with size filtering.
extract_structured_data_tool: Extract JSON-LD and microdata.

📦 Installation

From MCP Registry (Recommended)

This server is available in the Model Context Protocol Registry. Install it using your MCP client.

mcp-name: io.github.huoshuiai42/huoshui-fetch

# Using uv (recommended)
uv sync

# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git

💻 Usage Examples

Run with uvx (recommended for one-time use)

# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch

Run directly

# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch

The server communicates via standard input/output, which makes it straightforward to integrate with Claude Desktop and other MCP-compatible clients.
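Over the stdio transport, MCP clients and servers exchange JSON-RPC 2.0 messages, one per line. The sketch below shows how a client might encode a `tools/call` request for `fetch_url`. It is illustrative only: the `url` argument name is an assumption about the tool's schema, and a real client performs an initialization handshake before calling tools.

```python
import json


def encode_tool_call(request_id, tool_name, arguments):
    """Encode a JSON-RPC 2.0 tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the line stays unbroken.
    return json.dumps(message, separators=(",", ":")) + "\n"


# "url" is a hypothetical argument name used for illustration.
line = encode_tool_call(1, "fetch_url", {"url": "https://example.com"})
```

A client would write `line` to the server process's standard input and read the matching response, identified by the same `id`, from its standard output.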

Configuration for Claude Desktop

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", ".", "huoshui-fetch"],
      "cwd": "/path/to/huoshui-fetch"
    }
  }
}

Or if installed from GitHub:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yourusername/huoshui-fetch.git",
        "huoshui-fetch"
      ]
    }
  }
}
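If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over the file. The following sketch shows one way to do that in Python. The helper names are hypothetical, and the caller must supply the path to the Claude Desktop configuration file, since its location varies by platform.

```python
import json
from pathlib import Path


def merge_server(config, name, command, args):
    """Return a copy of config with one server entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def add_server_to_file(config_path, name, command, args):
    """Read the config file (if present), merge the entry, and write it back."""
    path = Path(config_path)
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = merge_server(existing, name, command, args)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
    return updated
```

Calling `merge_server` with the name `huoshui-fetch`, the command `uvx`, and the argument list from the configuration above leaves any other entries under `mcpServers` untouched.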

Example Usage

Once configured, you can use the tools in Claude Desktop:

// Fetch a webpage
fetch_url("https://example.com")

// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")

// Extract article content
extract_article_tool(html_content, "https://example.com/article")

📚 Documentation

Requirements

Python 3.11+
Dependencies listed in pyproject.toml

Development & Publishing

This project includes comprehensive automation for building and publishing to PyPI.

Automated Publishing Workflow

# Complete automated workflow (TestPyPI + PyPI)
uv run python scripts/publish.py --include-pypi

# TestPyPI only (recommended for testing)
uv run python scripts/publish.py

# Bump version and publish
uv run python scripts/publish.py --version-bump patch --include-pypi

Individual Commands

# Version management
uv run python scripts/version_manager.py --check
uv run python scripts/version_manager.py --bump patch

# Setup PyPI credentials (first time)
uv run python scripts/credentials_setup.py

# Build package
uv run python scripts/build.py

# Run comprehensive tests
uv run python scripts/test.py

# Upload to PyPI
uv run python scripts/upload.py

Features

✅ Version Management: Automatic synchronization across all files.
✅ Quality Checks: Ruff linting and MyPy type checking.
✅ Build Automation: Clean builds with validation.
✅ Testing Suite: Comprehensive package and functionality tests.
✅ Publishing Workflow: TestPyPI → PyPI using uv publish (supports .pypirc files).
✅ Error Recovery: Built-in error handling and recovery options.