🚀 Scraper MCP
A context-optimized Model Context Protocol (MCP) server for efficient web scraping. It provides AI tools with pre-processed, filtered web content, reducing token usage by converting raw HTML to markdown/text and applying CSS selectors server-side, so LLMs only receive the data they actually need.
🚀 Quick Start
Option 1: Docker Run (Simplest)
Pull and run the pre-built image from Docker Hub or GitHub Container Registry:
# From Docker Hub
docker run -d -p 8000:8000 --name scraper-mcp cotdp/scraper-mcp:latest
# Or from GitHub Container Registry
docker run -d -p 8000:8000 --name scraper-mcp ghcr.io/cotdp/scraper-mcp:latest
# Follow the logs
docker logs -f scraper-mcp
# Stop and remove the container
docker stop scraper-mcp && docker rm scraper-mcp
The server will be available at:
- MCP Endpoint: http://localhost:8000/mcp (for AI clients)
- Dashboard: http://localhost:8000/ (web interface)
Option 2: Docker Compose (Recommended for Production)
For persistent storage, custom configuration, and easier management:
1. Create a docker-compose.yml file:
services:
  scraper-mcp:
    image: cotdp/scraper-mcp:latest
    container_name: scraper-mcp
    ports:
      - "8000:8000"
    environment:
      - TRANSPORT=streamable-http
      - HOST=0.0.0.0
      - PORT=8000
    volumes:
      - cache:/app/cache
    restart: unless-stopped

volumes:
  cache:
2. (Optional) Create a .env file for proxy or ScrapeOps configuration:
cp .env.example .env
3. Start the server:
# Start in detached mode
docker-compose up -d
# Follow the logs
docker-compose logs -f scraper-mcp
# Check container status
docker-compose ps
4. Stop the server:
docker-compose down
# Also remove the cache volume
docker-compose down -v
The server will be available at:
- MCP Endpoint: http://localhost:8000/mcp (for AI clients)
- Dashboard: http://localhost:8000/ (web interface)
✨ Features
Context Optimization
- CSS selector filtering: Extract only relevant content server-side (e.g., .article-content, #main) before sending to the LLM.
- Smart conversion: Transform HTML to markdown or plain text, eliminating markup noise.
- Link extraction: Return structured link objects instead of raw HTML anchor tags.
- Targeted scraping: Combine CSS selectors with strip_tags for precision filtering.
- Token efficiency: Reduce context window usage by 70-90% compared to raw HTML.
Scraping Tools & Infrastructure
- Multiple scraping modes: Raw HTML, markdown conversion, plain text extraction, and link extraction.
- Batch operations: Process multiple URLs concurrently with automatic retry logic.
- Intelligent caching: Three-tier cache system (realtime/default/static) to minimize redundant requests.
- Retry & resilience: Exponential backoff with configurable retries for transient failures.
- Provider architecture: Extensible design supporting multiple scraping backends.
Monitoring & Management
- Real-time dashboard: Monitor server health, request statistics, cache metrics, and recent errors.
- Interactive playground: Test scraping tools directly from your browser with live JSON responses.
- Runtime configuration: Adjust concurrency, timeouts, retries, cache TTL, and proxy settings without restarts.
- Docker support: One-command deployment with Docker Compose.
- HTTP/SSE transports: Supports both Streamable HTTP and SSE MCP transports.
📦 Installation
Instant Setup with Claude Code
Pull and run the pre-built image from Docker Hub:
# Run the server
docker run -d -p 8000:8000 --name scraper-mcp cotdp/scraper-mcp:latest
# Register it with Claude Code
claude mcp add --transport http scraper http://localhost:8000/mcp --scope user
# Follow the logs
docker logs -f scraper-mcp
# Stop and remove when finished
docker stop scraper-mcp && docker rm scraper-mcp
💻 Usage Examples
Try it out in Claude Code
> scrape https://cutler.sg/
~ scrapes the homepage (Claude typically picks the markdown conversion tool)
> scrape and filter <url> elements from https://cutler.sg/sitemap.xml
~ returns about 100 URLs from the sitemap
> scrape and filter all <title> elements from those urls
~ fetches only the titles from all ~100 URLs
Token Efficiency Comparison
Without Filtering (raw HTML):
❌ 45,000 tokens for a typical blog post
- 40,000 tokens: HTML markup, CSS, JavaScript, ads, navigation
- 5,000 tokens: actual article content
With Scraper MCP (CSS selector + markdown):
✅ 2,500 tokens for the same content
- 0 tokens: markup eliminated by markdown conversion
- 0 tokens: ads/navigation filtered by CSS selector
- 2,500 tokens: clean article text
Result: 95% token reduction, 18x more content in the same context window
Real-World Example
# Traditional approach: the entire page (markup, scripts, ads) enters the context
html = requests.get("https://blog.example.com/article").text

# With Scraper MCP: only the selected article content is returned, as markdown
scrape_url_markdown(
    "https://blog.example.com/article",
    css_selector="article.main-content"
)
When to Use Each Tool
- scrape_url_markdown: Articles, documentation, blog posts (best for LLM consumption).
- scrape_url_text: Plain text content, minimal formatting needed.
- scrape_extract_links: Navigation, link analysis, sitemap generation.
- scrape_url (raw HTML): When you need to preserve exact structure or extract meta tags.
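For illustration, here is how those choices might look as tool calls (the URLs and selectors below are placeholders):

# Articles and docs: markdown is the most LLM-friendly format
scrape_url_markdown("https://docs.example.com/guide", css_selector="main")

# Plain text when formatting does not matter
scrape_url_text("https://example.com/terms")

# Link analysis or sitemap-style crawling, scoped to the navigation
scrape_extract_links("https://example.com/", css_selector="nav")

# Raw HTML when exact structure or meta tags matter
scrape_url("https://example.com/", css_selector="meta")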
📚 Documentation
Dashboard Features
Access the monitoring dashboard at http://localhost:8000/ to monitor and manage your scraper in real-time.
Real-Time Monitoring Dashboard
Track server health, request statistics, retry metrics, and cache performance at a glance:
- Server Status: Health indicator, uptime, and start time.
- Request Statistics: Total requests, success rate, and failure count.
- Retry Analytics: Total retries and average per request.
- Cache Metrics: Entry count, size, hit rate with one-click cache clearing.
- Recent Requests: Last 10 requests with timestamps, status codes, and response times.
- Recent Errors: Last 10 failures with detailed error messages and attempt counts.
- Auto-refreshes every 9 seconds for real-time monitoring.
Interactive API Playground
Test all scraping tools without writing code:
- Test all four tools: scrape_url, scrape_url_markdown, scrape_url_text, scrape_extract_links.
- Configure parameters: URL, timeout, max retries, CSS selectors.
- View formatted JSON responses with syntax highlighting.
- One-click copy to clipboard.
- See execution time for performance testing.
Runtime Configuration
Adjust settings on-the-fly without restarting the server:
- Performance Tuning: Concurrency (1-50), timeout, max retries.
- Cache Control: Default, realtime, and static cache TTL settings.
- Proxy Settings: Enable/disable with HTTP/HTTPS/NO_PROXY configuration.
- Immediate Effect: Changes apply instantly without server restart.
- Non-Persistent: Settings reset on restart (use .env for permanent changes).
Configuration
Environment Setup
Create a .env file in the project root to configure the server. Copy from .env.example:
cp .env.example .env
Key Configuration Options
Standard Proxy (for corporate firewalls):
HTTP_PROXY=http://proxy.example.com:8080
HTTPS_PROXY=http://proxy.example.com:8080
NO_PROXY=localhost,127.0.0.1,.local
See the Proxy Configuration section for detailed setup instructions.
ScrapeOps Proxy (for JavaScript rendering, residential IPs, anti-bot):
SCRAPEOPS_API_KEY=your_api_key_here
SCRAPEOPS_RENDER_JS=true
SCRAPEOPS_RESIDENTIAL=true
SCRAPEOPS_COUNTRY=us
SCRAPEOPS_DEVICE=desktop
See the ScrapeOps Proxy Integration section for detailed setup, use cases, and cost optimization.
Server Settings (optional, defaults work for most cases):
TRANSPORT=streamable-http
HOST=0.0.0.0
PORT=8000
CACHE_DIR=/app/cache
ENABLE_CACHE_TOOLS=false
See .env.example for complete configuration reference with detailed comments.
Available Tools
1. scrape_url
Scrape raw HTML content from a URL.
Parameters:
- urls (string or list, required): Single URL or list of URLs to scrape (http:// or https://).
- timeout (integer, optional): Request timeout in seconds (default: 30).
- max_retries (integer, optional): Maximum retry attempts on failure (default: 3).
- css_selector (string, optional): CSS selector to filter HTML elements (e.g., "meta", "img, video", ".article-content").
Returns:
- url: Final URL after redirects.
- content: Raw HTML content (filtered if css_selector provided).
- status_code: HTTP status code.
- content_type: Content-Type header value.
- metadata: Additional metadata including:
  - headers: Response headers.
  - encoding: Content encoding.
  - elapsed_ms: Request duration in milliseconds.
  - attempts: Total number of attempts made.
  - retries: Number of retries performed.
  - css_selector_applied: CSS selector used (if provided).
  - elements_matched: Number of elements matched (if css_selector provided).
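Because urls accepts either a single URL or a list, batch scraping is a single call. A minimal sketch with placeholder URLs (each URL is assumed to yield one result with the fields above):

# One call scrapes all three pages concurrently, with retries handled server-side
scrape_url(
    [
        "https://example.com/page-1",
        "https://example.com/page-2",
        "https://example.com/page-3",
    ],
    timeout=30,
    max_retries=3,
    css_selector=".article-content",
)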
2. scrape_url_markdown
Scrape a URL and convert the content to markdown format.
Parameters:
- urls (string or list, required): Single URL or list of URLs to scrape (http:// or https://).
- timeout (integer, optional): Request timeout in seconds (default: 30).
- max_retries (integer, optional): Maximum retry attempts on failure (default: 3).
- strip_tags (array, optional): List of HTML tags to strip (e.g., ['script', 'style']).
- css_selector (string, optional): CSS selector to filter HTML before conversion (e.g., ".article-content", "article p").
Returns:
- Same as scrape_url but with markdown-formatted content.
- metadata.page_metadata: Extracted page metadata (title, description, etc.).
- metadata.attempts: Total number of attempts made.
- metadata.retries: Number of retries performed.
- metadata.css_selector_applied and metadata.elements_matched (if css_selector provided).
3. scrape_url_text
Scrape a URL and extract plain text content.
Parameters:
- urls (string or list, required): Single URL or list of URLs to scrape (http:// or https://).
- timeout (integer, optional): Request timeout in seconds (default: 30).
- max_retries (integer, optional): Maximum retry attempts on failure (default: 3).
- strip_tags (array, optional): HTML tags to strip (default: script, style, meta, link, noscript).
- css_selector (string, optional): CSS selector to filter HTML before text extraction (e.g., "#main-content", "article.post").
Returns:
- Same as scrape_url but with plain text content.
- metadata.page_metadata: Extracted page metadata.
- metadata.attempts: Total number of attempts made.
- metadata.retries: Number of retries performed.
- metadata.css_selector_applied and metadata.elements_matched (if css_selector provided).
4. scrape_extract_links
Scrape a URL and extract all links.
Parameters:
- urls (string or list, required): Single URL or list of URLs to scrape (http:// or https://).
- timeout (integer, optional): Request timeout in seconds (default: 30).
- max_retries (integer, optional): Maximum retry attempts on failure (default: 3).
- css_selector (string, optional): CSS selector to scope link extraction to specific sections (e.g., "nav", "article.main-content").
Returns:
- url: The URL that was scraped.
- links: Array of link objects with url, text, and title.
- count: Total number of links found.
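As an illustration, assuming the response arrives as a JSON-style dictionary with the fields above, the extracted links can be consumed like this:

result = scrape_extract_links("https://example.com/", css_selector="nav")

# Each link object carries url, text, and title
for link in result["links"]:
    print(link["url"], "->", link["text"])

print("total links:", result["count"])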
Local Development
Prerequisites
- Python 3.12+
- uv package manager
Setup
# Install with development dependencies
uv pip install -e ".[dev]"
# Run the server with defaults
python -m scraper_mcp
# Or specify transport, host, and port explicitly
python -m scraper_mcp streamable-http 0.0.0.0 8000
Development Commands
# Run the test suite
pytest
# Type checking
mypy src/
# Linting
ruff check .
# Formatting
ruff format .
Docker Images
Pre-Built Images (Recommended)
Multi-platform images are automatically built and published on every release:
Docker Hub:
docker pull cotdp/scraper-mcp:latest
GitHub Container Registry:
docker pull ghcr.io/cotdp/scraper-mcp:latest
Available tags:
- latest: Latest stable release.
- 0.1.0, 0.1, 0: Semantic version tags.
- main-<sha>: Latest main branch build.
Supported platforms: linux/amd64 and linux/arm64.
See the Quick Start section for usage instructions.
Building from Source
If you need to customize the image or build locally:
git clone https://github.com/cotdp/scraper-mcp.git
cd scraper-mcp
# Build the image
docker build -t scraper-mcp:custom .
# Run it directly...
docker run -p 8000:8000 scraper-mcp:custom
# ...or start it with Docker Compose
docker-compose up -d
Connecting from Claude Desktop
To use this server with Claude Desktop, add it to your MCP settings:
{
"mcpServers": {
"scraper": {
"url": "http://localhost:8000/mcp"
}
}
}
Once connected, Claude can use all four scraping tools. You can monitor requests in real time by opening the dashboard at http://localhost:8000/ in your browser.
🔧 Technical Details
Project Structure
scraper-mcp/
├── src/scraper_mcp/
│ ├── __init__.py
│ ├── __main__.py
│ ├── server.py # Main MCP server entry point
│ ├── admin/ # Admin API (config, stats, cache)
│ │ ├── router.py # HTTP endpoint handlers
│ │ └── service.py # Business logic
│ ├── dashboard/ # Web dashboard
│ │ ├── router.py # Dashboard routes
│ │ └── templates/
│ │ └── dashboard.html # Monitoring UI
│ ├── tools/ # MCP scraping tools
│ │ ├── router.py # Tool registration
│ │ └── service.py # Scraping implementations
│ ├── models/ # Pydantic data models
│ │ ├── scrape.py # Scrape request/response models
│ │ └── links.py # Link extraction models
│ ├── providers/ # Scraping backend providers
│ │ ├── base.py # Abstract provider interface
│ │ └── requests_provider.py # HTTP provider (requests library)
│ ├── core/
│ │ └── providers.py # Provider registry and selection
│ ├── cache.py # Request caching (disk-based)
│ ├── cache_manager.py # Cache lifecycle management
│ ├── metrics.py # Request/retry metrics tracking
│ └── utils.py # HTML processing utilities
├── tests/ # Pytest test suite
│ ├── test_server.py
│ ├── test_tools.py
│ └── test_utils.py
├── .github/workflows/
│ ├── ci.yml # CI/CD: tests, linting
│ └── docker-publish.yml # Docker image publishing
├── Dockerfile # Multi-stage production build
├── docker-compose.yml # Local development setup
├── pyproject.toml # Python dependencies (uv)
├── .env.example # Environment configuration template
└── README.md
Architecture
The server uses a provider architecture to support multiple scraping backends:
- ScraperProvider: Abstract interface for scraping implementations.
- RequestsProvider: Basic HTTP scraper using the requests library.
- Future providers: Can add support for Playwright, Selenium, Scrapy, etc.
The provider selection is automatic based on URL patterns, making it easy to add specialized providers for different types of websites.
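A minimal sketch of that pattern (class and method names here are illustrative; the real interface lives in providers/base.py and core/providers.py and may differ):

import asyncio
from abc import ABC, abstractmethod

import requests


class ScraperProvider(ABC):
    """Abstract interface for a scraping backend."""

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Return True if this provider should handle the given URL."""

    @abstractmethod
    async def scrape(self, url: str, timeout: int = 30) -> str:
        """Fetch the URL and return raw HTML."""


class RequestsProvider(ScraperProvider):
    """Basic HTTP scraper built on the requests library."""

    def can_handle(self, url: str) -> bool:
        return url.startswith(("http://", "https://"))

    async def scrape(self, url: str, timeout: int = 30) -> str:
        # requests is synchronous, so run it in a worker thread
        response = await asyncio.to_thread(requests.get, url, timeout=timeout)
        response.raise_for_status()
        return response.text


def select_provider(url: str, providers: list[ScraperProvider]) -> ScraperProvider:
    """Return the first registered provider whose URL pattern matches."""
    for provider in providers:
        if provider.can_handle(url):
            return provider
    raise ValueError(f"No provider registered for {url}")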
Retry Behavior & Error Handling
The scraper includes intelligent retry logic with exponential backoff to handle transient failures:
Retry Configuration
- Default max_retries: 3 attempts.
- Default timeout: 30 seconds.
- Retry delay: Exponential backoff starting at 1 second.
Retry Schedule
For the default configuration (max_retries = 3):
- First attempt: Immediate.
- Retry 1: Wait 1 second.
- Retry 2: Wait 2 seconds.
- Retry 3: Wait 4 seconds.
Total maximum wait time: ~7 seconds before final failure.
What Triggers Retries
The scraper automatically retries on:
- Network timeouts (requests.Timeout).
- Connection failures (requests.ConnectionError).
- HTTP errors (4xx, 5xx status codes).
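A minimal sketch of the retry loop described above, using the requests library directly (the server's actual implementation may differ in detail):

import time

import requests


def fetch_with_retries(url: str, timeout: int = 30, max_retries: int = 3) -> requests.Response:
    """Fetch a URL, retrying on timeouts, connection errors, and HTTP errors."""
    delay = 1.0  # exponential backoff: 1s, 2s, 4s, ...
    last_error: Exception | None = None

    for attempt in range(max_retries + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
            return response
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError) as exc:
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay)
                delay *= 2

    raise RuntimeError(f"All {max_retries + 1} attempts failed for {url}") from last_error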
Retry Metadata
All successful responses include retry information in metadata:
{
"attempts": 2,
"retries": 1,
"elapsed_ms": 234.5
}
Customizing Retry Behavior
# Disable retries (fail immediately)
result = await scrape_url("https://example.com", max_retries=0)
# More retries and a longer timeout for flaky or slow sites
result = await scrape_url("https://example.com", max_retries=5, timeout=60)
# Fail fast: one retry, short timeout
result = await scrape_url("https://example.com", max_retries=1, timeout=10)
CSS Selector Filtering
All scraping tools support optional CSS selector filtering to extract specific elements from HTML before processing. This allows you to focus on exactly the content you need.
Supported Selectors
The server uses BeautifulSoup4's .select() method (powered by Soup Sieve), supporting:
- Tag selectors: meta, img, a, div.
- Multiple selectors: img, video (comma-separated).
- Class selectors: .article-content, .main-text.
- ID selectors: #header, #main-content.
- Attribute selectors: a[href], meta[property="og:image"], img[src^="https://"].
- Descendant combinators: article p, div.content a.
- Pseudo-classes: p:nth-of-type(3), a:not([rel]).
Usage Examples
# Extract all meta tags
scrape_url("https://example.com", css_selector="meta")
# Convert only the main article to markdown
scrape_url_markdown("https://blog.com/article", css_selector="article.main-content")
# Plain text from the main content area
scrape_url_text("https://example.com", css_selector="#main-content")
# Match multiple selectors at once
scrape_url("https://shop.com/product", css_selector="img.product-image, img[data-product]")
# Only links inside the primary navigation
scrape_extract_links("https://example.com", css_selector="nav.primary")
# Open Graph meta tags via an attribute prefix selector
scrape_url("https://example.com", css_selector='meta[property^="og:"]')
# Combine a CSS selector with strip_tags for maximum control
scrape_url_markdown(
    "https://example.com",
    css_selector="article",
    strip_tags=["script", "style"]
)
How It Works
- Scrape: Fetch HTML from the URL.
- Filter (if css_selector provided): Apply the CSS selector to keep only matching elements.
- Process: Convert to markdown/text or extract links.
- Return: Include the elements_matched count in metadata.
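A conceptual sketch of that pipeline for the markdown case, assuming BeautifulSoup for filtering and the markdownify package for conversion (the server's actual utilities may differ):

import requests
from bs4 import BeautifulSoup
from markdownify import markdownify


def scrape_markdown_pipeline(url: str, css_selector: str | None = None) -> dict:
    # 1. Scrape: fetch the raw HTML
    html = requests.get(url, timeout=30).text

    # 2. Filter: keep only the elements matching the CSS selector
    matched = None
    if css_selector:
        elements = BeautifulSoup(html, "html.parser").select(css_selector)
        html = "".join(str(el) for el in elements)
        matched = len(elements)

    # 3. Process: convert the (filtered) HTML to markdown
    content = markdownify(html)

    # 4. Return: include the match count in metadata
    return {"url": url, "content": content, "metadata": {"elements_matched": matched}}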
CSS Selector Benefits
- Reduce noise: Extract only relevant content, ignoring ads, navigation, footers.
- Scoped extraction: Get links only from specific sections (e.g., main content, not sidebar).
- Efficient: Process less HTML, get cleaner results.
- Composable: Works alongside strip_tags for maximum control.
Environment Variables
When running with Docker, you can configure the server using environment variables:
- TRANSPORT: Transport type (streamable-http or sse, default: streamable-http).
- HOST: Host to bind to (default: 0.0.0.0).
- PORT: Port to bind to (default: 8000).
- ENABLE_CACHE_TOOLS: Enable cache management tools (true, 1, or yes to enable; default: false).
  - When enabled, exposes cache_stats, cache_clear_expired, and cache_clear_all tools.
  - Disabled by default for security and simplicity.
Proxy Configuration
The scraper supports HTTP/HTTPS proxies through standard environment variables. This is useful when running behind a corporate firewall or when you need to route traffic through a specific proxy.
Using Proxies with Docker Compose
Create a .env file in the project root (see .env.example for reference):
HTTP_PROXY=http://proxy.example.com:8080
http_proxy=http://proxy.example.com:8080
HTTPS_PROXY=http://proxy.example.com:8080
https_proxy=http://proxy.example.com:8080
NO_PROXY=localhost,127.0.0.1,.local
no_proxy=localhost,127.0.0.1,.local
Then start the service:
docker-compose up -d
Docker Compose automatically reads .env files and passes variables to the container at both build time (for package installation) and runtime (for HTTP requests).
Using Proxies with Docker Run
docker run -p 8000:8000 \
-e HTTP_PROXY=http://proxy.example.com:8080 \
-e HTTPS_PROXY=http://proxy.example.com:8080 \
-e NO_PROXY=localhost,127.0.0.1,.local \
scraper-mcp:latest
Proxy with Authentication
If your proxy requires authentication, include credentials in the URL:
HTTP_PROXY=http://username:password@proxy.example.com:8080
HTTPS_PROXY=http://username:password@proxy.example.com:8080
Build-Time vs Runtime Proxies
The proxy configuration works at two stages:
- Build time: Used when Docker installs packages (apt, uv, pip).
- Runtime: Used when the scraper makes HTTP requests.
Both uppercase and lowercase variable names are supported (e.g., HTTP_PROXY and http_proxy).
Verifying Proxy Configuration
Check the container logs to verify proxy settings are being used:
docker-compose logs scraper-mcp
The requests library automatically respects these environment variables and will route all HTTP/HTTPS traffic through the configured proxy.
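As a quick local sanity check, you can ask requests which proxies it derives from the environment for a given URL (the values below are illustrative):

import os

import requests

os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"
os.environ["NO_PROXY"] = "localhost,127.0.0.1,.local"

# requests resolves per-URL proxy settings from the environment at request time
print(requests.utils.get_environ_proxies("https://example.com"))
# e.g. {'https': 'http://proxy.example.com:8080', ...}

print(requests.utils.get_environ_proxies("http://localhost:8000"))
# {} -- bypassed via NO_PROXY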
ScrapeOps Proxy Integration
The scraper includes optional integration with ScrapeOps, a premium proxy service that helps bypass anti-bot measures, render JavaScript, and access geo-restricted content. ScrapeOps is enabled automatically when an API key is provided.
What is ScrapeOps?
ScrapeOps provides:
- JavaScript rendering: Scrape SPAs and dynamic content.
- Residential proxies: Less likely to be blocked.
- Geo-targeting: Access content from specific countries.
- Anti-bot bypass: Automatic header rotation and fingerprinting.
- High success rate: Smart retry and optimization.
Enabling ScrapeOps
Simply add your API key to the .env file:
SCRAPEOPS_API_KEY=your_api_key_here
That's it! All scraping requests will automatically route through ScrapeOps. No changes needed to your MCP tools or code.
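Conceptually, the integration forwards each target URL through the ScrapeOps proxy API. The sketch below shows the general idea only; the endpoint and parameter names are assumptions based on ScrapeOps' public proxy API, not this project's internal code, and the server performs this routing for you.

import os

import requests


def scrapeops_fetch(target_url: str) -> str:
    # Assumed ScrapeOps proxy API endpoint and parameters -- for illustration only;
    # the MCP server handles this routing internally when SCRAPEOPS_API_KEY is set.
    params = {
        "api_key": os.environ["SCRAPEOPS_API_KEY"],
        "url": target_url,
    }
    if os.environ.get("SCRAPEOPS_RENDER_JS") == "true":
        params["render_js"] = "true"
    if os.environ.get("SCRAPEOPS_RESIDENTIAL") == "true":
        params["residential"] = "true"
    if country := os.environ.get("SCRAPEOPS_COUNTRY"):
        params["country"] = country

    response = requests.get("https://proxy.scrapeops.io/v1/", params=params, timeout=60)
    response.raise_for_status()
    return response.text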
Configuration Options
Customize ScrapeOps behavior with environment variables (see .env.example for full reference):
# Render JavaScript (for SPAs and dynamic content)
SCRAPEOPS_RENDER_JS=true
# Use residential proxies (less likely to be blocked)
SCRAPEOPS_RESIDENTIAL=true
# Geo-target a specific country
SCRAPEOPS_COUNTRY=us
# Forward your own request headers
SCRAPEOPS_KEEP_HEADERS=true
# Emulate a device type
SCRAPEOPS_DEVICE=mobile
Full Example Configuration
SCRAPEOPS_API_KEY=your_api_key_here
SCRAPEOPS_RENDER_JS=true
SCRAPEOPS_RESIDENTIAL=true
SCRAPEOPS_COUNTRY=us
SCRAPEOPS_DEVICE=desktop
📄 License
This project is licensed under the MIT License.
Last updated: October 31, 2025