# Supadata MCP Server
A Model Context Protocol (MCP) server implementation that integrates with Supadata for video & web scraping capabilities.
## Quick Start
You can start exploring the Supadata MCP Server right away by playing around with it on Smithery or on MCP.so's playground.
## Features
- Video transcript extraction from YouTube, TikTok, Twitter, and file URLs
- Web scraping, crawling, and discovery
- Automatic retries and rate limiting
## Installation

### Running with npx

```bash
env SUPADATA_API_KEY=your-api-key npx -y supadata-mcp
```

### Manual Installation

```bash
npm install -g supadata-mcp
```
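After a global install, the server can be started directly (assuming the package exposes a `supadata-mcp` binary on your PATH):

```bash
env SUPADATA_API_KEY=your-api-key supadata-mcp
```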
### Running on Cursor

#### Configuring Cursor

Note: Requires Cursor version 0.45.6+

For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers (the Cursor MCP Server Configuration Guide).

To configure Supadata MCP in Cursor v0.48.6:
- Open Cursor Settings
- Go to Features > MCP Servers
- Click "+ Add new global MCP server"
- Enter the following code:
```json
{
  "mcpServers": {
    "supadata-mcp": {
      "command": "npx",
      "args": ["-y", "supadata-mcp"],
      "env": {
        "SUPADATA_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}
```
To configure Supadata MCP in Cursor v0.45.6:
- Open Cursor Settings
- Go to Features > MCP Servers
- Click "+ Add New MCP Server"
- Enter the following:
  - Name: "supadata-mcp" (or your preferred name)
  - Type: "command"
  - Command: `env SUPADATA_API_KEY=your-api-key npx -y supadata-mcp`

If you are using Windows and are running into issues, try `cmd /c "set SUPADATA_API_KEY=your-api-key && npx -y supadata-mcp"` instead.
Replace `your-api-key` with your Supadata API key. If you don't have one yet, you can create an account and get it from https://www.supadata.dev/app/api-keys.
After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Supadata MCP when appropriate, but you can explicitly request it by describing your web scraping needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.
### Running on Windsurf

Add this to your `./codeium/windsurf/model_config.json`:
```json
{
  "mcpServers": {
    "supadata-mcp": {
      "command": "npx",
      "args": ["-y", "supadata-mcp"],
      "env": {
        "SUPADATA_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```
### Installing via Smithery

To install Supadata for Claude Desktop automatically via Smithery:

```bash
npx -y @smithery/cli install @supadata-ai/mcp --client claude
```
### Running on VS Code
For manual installation, add the following JSON block to your User Settings (JSON) file in VS Code. You can do this by pressing `Ctrl + Shift + P` and typing `Preferences: Open User Settings (JSON)`.
```json
{
  "mcp": {
    "inputs": [
      {
        "type": "promptString",
        "id": "apiKey",
        "description": "Supadata API Key",
        "password": true
      }
    ],
    "servers": {
      "supadata": {
        "command": "npx",
        "args": ["-y", "supadata-mcp"],
        "env": {
          "SUPADATA_API_KEY": "${input:apiKey}"
        }
      }
    }
  }
}
```
Optionally, you can add it to a file called `.vscode/mcp.json` in your workspace. This will allow you to share the configuration with others:
```json
{
  "inputs": [
    {
      "type": "promptString",
      "id": "apiKey",
      "description": "Supadata API Key",
      "password": true
    }
  ],
  "servers": {
    "supadata": {
      "command": "npx",
      "args": ["-y", "supadata-mcp"],
      "env": {
        "SUPADATA_API_KEY": "${input:apiKey}"
      }
    }
  }
}
```
## Configuration

### Environment Variables

- `SUPADATA_API_KEY`: Your Supadata API key (required)
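For a quick local test, you can export the key in your shell before launching the server:

```bash
export SUPADATA_API_KEY=your-api-key
npx -y supadata-mcp
```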
### Usage with Claude Desktop

Add this to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "supadata-mcp": {
      "command": "npx",
      "args": ["-y", "supadata-mcp"],
      "env": {
        "SUPADATA_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
```
### System Configuration
The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:
```javascript
const CONFIG = {
  retry: {
    maxAttempts: 3, // total attempts before giving up
    initialDelay: 1000, // delay before the first retry (ms)
    maxDelay: 10000, // cap on the delay between retries (ms)
    backoffFactor: 2, // multiplier applied to the delay after each attempt
  },
};
```
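For illustration, with these defaults a failed request would be retried after roughly 1s, then 2s, with delays capped at `maxDelay`. Below is a minimal sketch of such a retry wrapper, not the server's actual implementation:

```typescript
// Sketch: retry an async operation with exponential backoff,
// using the CONFIG defaults shown above.
async function withRetry<T>(fn: () => Promise<T>): Promise<T> {
  const { maxAttempts, initialDelay, maxDelay, backoffFactor } = CONFIG.retry;
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // out of attempts
      // Delay grows geometrically: 1000ms, 2000ms, 4000ms, ... capped at maxDelay.
      const delay = Math.min(initialDelay * backoffFactor ** attempt, maxDelay);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```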
### Rate Limiting and Batch Processing
The server utilizes Supadata's built-in rate limiting and batch processing capabilities:
- Automatic rate limit handling with exponential backoff
- Efficient parallel processing for batch operations
- Smart request queuing and throttling
- Automatic retries for transient errors
## How to Choose a Tool
Use this guide to select the right tool for your task:
- If you need transcripts from video content: use `transcript`
- If you know the exact URL(s) you want:
  - For one: use `scrape`
  - For many: use `batch_scrape`
- If you need to discover URLs on a site: use `map`
- If you want to analyze a whole site or section: use `crawl` (with limits!)
### Quick Reference Table

| Tool | Best for | Returns |
| --- | --- | --- |
| transcript | Video transcript extraction | text/markdown |
| scrape | Single page content | markdown/html |
| map | Discovering URLs on a site | URL[] |
| crawl | Multi-page extraction (with limits) | markdown/html[] |
## Usage Examples

### 1. Transcript Tool (`supadata_transcript`)
Extract transcripts from supported video platforms and file URLs.
Best for:
- Video content analysis and transcript extraction from YouTube, TikTok, Twitter, and file URLs.
Not recommended for:
- Non-video content (use scrape for web pages)
Common mistakes:
- Using transcript for regular web pages (use scrape instead).
Prompt Example:
"Get the transcript from this YouTube video: https://youtube.com/watch?v=example"
Usage Example:
```json
{
  "name": "supadata_transcript",
  "arguments": {
    "url": "https://youtube.com/watch?v=example",
    "lang": "en",
    "text": false,
    "mode": "auto"
  }
}
```
Returns:
- Transcript content in text or formatted output
- For async processing: Job ID for status checking
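For reference, the same call can be made programmatically from any MCP client. Here is a minimal TypeScript sketch using the official MCP SDK; the client name, API key, and video URL are placeholders, and error handling is omitted:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server over stdio, passing the API key through the environment.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "supadata-mcp"],
  env: { SUPADATA_API_KEY: "your-api-key" },
});

const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: {} }
);
await client.connect(transport);

// Invoke the transcript tool with the same arguments as the JSON example above.
const result = await client.callTool({
  name: "supadata_transcript",
  arguments: { url: "https://youtube.com/watch?v=example", lang: "en" },
});
console.log(result.content);
```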
### 2. Check Transcript Status (`supadata_check_transcript_status`)
Check the status of a transcript job.
```json
{
  "name": "supadata_check_transcript_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```
Returns:
- Response includes the status of the transcript job with completion progress and results.
### 3. Scrape Tool (`supadata_scrape`)
Scrape content from a single URL with advanced options.
Best for:
- Single page content extraction, when you know exactly which page contains the information.
Not recommended for:
- Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
Common mistakes:
- Using scrape for a list of URLs (use batch_scrape instead).
Prompt Example:
"Get the content of the page at https://example.com."
Usage Example:
```json
{
  "name": "supadata_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "waitFor": 1000,
    "timeout": 30000,
    "mobile": false,
    "includeTags": ["article", "main"],
    "excludeTags": ["nav", "footer"],
    "skipTlsVerification": false
  }
}
```
Returns:
- Markdown, HTML, or other formats as specified.
### 4. Map Tool (`supadata_map`)
Map a website to discover all indexed URLs on the site.
Best for:
- Discovering URLs on a website before deciding what to scrape
- Finding specific sections of a website
Not recommended for:
- When you already know which specific URL you need (use scrape or batch_scrape)
- When you need the content of the pages (use scrape after mapping)
Common mistakes:
- Using crawl to discover URLs instead of map
Prompt Example:
"List all URLs on example.com."
Usage Example:
```json
{
  "name": "supadata_map",
  "arguments": {
    "url": "https://example.com"
  }
}
```
Returns:
- Array of URLs found on the site
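To illustrate the map-then-scrape pattern recommended above, here is a rough sketch reusing the `client` from the transcript example. The exact shape of the map result is an assumption (it is treated here as text content with one URL per line); adapt the parsing to what your client actually receives:

```typescript
// Sketch: discover URLs with supadata_map, then scrape a handful of them.
const mapResult = await client.callTool({
  name: "supadata_map",
  arguments: { url: "https://example.com" },
});

// Assumption: URLs arrive as text content, one per line.
const urls = (mapResult.content as Array<{ type: string; text?: string }>)
  .flatMap((item) => (item.text ?? "").split("\n"))
  .filter((line) => line.startsWith("http"))
  .slice(0, 5); // keep the batch small to limit token usage

for (const url of urls) {
  const page = await client.callTool({
    name: "supadata_scrape",
    arguments: { url, formats: ["markdown"], onlyMainContent: true },
  });
  console.log(page.content);
}
```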
### 5. Crawl Tool (`supadata_crawl`)

Start an asynchronous crawl job on a website and extract content from all pages.
Best for:
- Extracting content from multiple related pages, when you need comprehensive coverage.
Not recommended for:
- Extracting content from a single page (use scrape)
- When token limits are a concern (use map + batch_scrape)
- When you need fast results (crawling can be slow)
Warning: Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
Common mistakes:
- Setting limit or maxDepth too high (causes token overflow)
- Using crawl for a single page (use scrape instead)
Prompt Example:
"Get all blog posts from the first two levels of example.com/blog."
Usage Example:
```json
{
  "name": "supadata_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDepth": 2,
    "limit": 100,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true
  }
}
```
Returns:
- Response includes operation ID for status checking:
```json
{
  "content": [
    {
      "type": "text",
      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use supadata_check_crawl_status to check progress."
    }
  ],
  "isError": false
}
```
### 6. Check Crawl Status (`supadata_check_crawl_status`)
Check the status of a crawl job.
```json
{
  "name": "supadata_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```
Returns:
- Response includes the status of the crawl job with details on completion progress and results.
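In a client, checking an asynchronous job typically means polling this tool until it reports completion. A minimal sketch, again reusing the `client` from the examples above; the completion check is an assumption based on the status text and should be adjusted to the real response format:

```typescript
// Sketch: poll supadata_check_crawl_status until the job finishes.
async function waitForCrawl(jobId: string, intervalMs = 5000) {
  for (;;) {
    const status = await client.callTool({
      name: "supadata_check_crawl_status",
      arguments: { id: jobId },
    });
    const text = (status.content as Array<{ type: string; text?: string }>)
      .map((item) => item.text ?? "")
      .join("\n");
    // Assumption: the status text mentions completion when the job is done.
    if (status.isError || text.includes("completed")) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```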
## Logging System
The server includes comprehensive logging:
- Operation status and progress
- Performance metrics
- Credit usage monitoring
- Rate limit tracking
- Error conditions
Example log messages:
```
[INFO] Supadata MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Batch operation queued with ID: batch_1
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
```
## Error Handling
The server provides robust error handling:
- Automatic retries for transient errors
- Rate limit handling with backoff
- Detailed error messages
- Credit usage warnings
- Network resilience
Example error response:
```json
{
  "content": [
    {
      "type": "text",
      "text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
    }
  ],
  "isError": true
}
```
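Client code can check the `isError` flag on any tool result before using its content. A brief sketch, reusing the `client` from the usage examples:

```typescript
const result = await client.callTool({
  name: "supadata_scrape",
  arguments: { url: "https://example.com", formats: ["markdown"] },
});

if (result.isError) {
  // The error text (e.g. a rate-limit message) is in the content array.
  console.error("Tool call failed:", result.content);
} else {
  console.log(result.content);
}
```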
## Development

```bash
# Install dependencies
npm install

# Build the server
npm run build

# Run tests
npm test
```
### Contributing

- Fork the repository
- Create your feature branch
- Run tests: `npm test`
- Submit a pull request
## License
MIT License - see LICENSE file for details