# 🚀 Google Researcher MCP Server

Professional research tools for AI assistants: Google Search, web scraping, academic papers, patents, and more.
## 🚀 Quick Start
### Claude Desktop (macOS)

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### Claude Desktop (Windows)

Add to `%APPDATA%\Claude\claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### One-Click Install (MCPB)

Download the latest `.mcpb` bundle from GitHub Releases and double-click to install in Claude Desktop. You'll be prompted to enter your Google API credentials.
### Claude Code

Add to `~/.claude.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### Cline / Roo Code

Use the same JSON configuration above in your MCP settings.

**Need API keys?** See the API Setup Guide for step-by-step instructions to get your Google API credentials.
### Local Development

```bash
git clone https://github.com/zoharbabin/google-researcher-mcp.git && cd google-researcher-mcp
npm install && npx playwright install chromium
cp .env.example .env
npm run dev
```

**Note:** This starts the server in STDIO mode, which is all you need for local AI assistant integrations. HTTP transport with OAuth is only required for web-based or multi-client setups; see Choosing a Transport.
### Verify It Works

Once configured, ask your AI assistant:

> "Search for the latest news about AI regulations"

The assistant will use the `google_news_search` tool and return current articles. If you see search results, the server is working correctly.
## ✨ Features

### Core Capabilities

| Feature | Description |
| --- | --- |
| Web Scraping | Fast static HTML + automatic Playwright fallback for JavaScript-rendered pages |
| YouTube Transcripts | Robust extraction with retry logic and 10 classified error types |
| Document Parsing | Auto-detects and extracts text from PDF, DOCX, PPTX |
| Quality Scoring | Sources ranked by relevance (35%), freshness (20%), authority (25%), and content quality (20%) |
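The quality score amounts to a weighted sum of the four component scores above. A minimal sketch of that idea (the function name, component field names, and 0..1 score scale are illustrative assumptions, not the server's actual implementation):

```typescript
// Illustrative sketch of the weighted quality score described above.
// Component scores are assumed to be normalized to the 0..1 range.
interface QualityComponents {
  relevance: number;
  freshness: number;
  authority: number;
  contentQuality: number;
}

function qualityScore(c: QualityComponents): number {
  return (
    0.35 * c.relevance +      // relevance carries the most weight
    0.20 * c.freshness +
    0.25 * c.authority +
    0.20 * c.contentQuality
  );
}

// A source that is perfect on every component scores ~1.0
qualityScore({ relevance: 1, freshness: 1, authority: 1, contentQuality: 1 });
```

Because the weights sum to 1, the combined score stays on the same 0..1 scale as the components, which makes cross-source ranking straightforward.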
### MCP Protocol Support

| Feature | Description |
| --- | --- |
| Tools | 8 tools: `search_and_scrape`, `google_search`, `google_image_search`, `google_news_search`, `scrape_page`, `sequential_search`, `academic_search`, `patent_search` |
| Resources | Expose server state: `stats://tools` (per-tool metrics), `stats://cache`, `search://recent`, `config://server` |
| Prompts | Pre-built templates: `comprehensive-research`, `fact-check`, `summarize-url`, `news-briefing` |
| Annotations | Content tagged with audience, priority, and timestamps |
### Production Ready

| Feature | Description |
| --- | --- |
| Caching | Two-layer (memory + disk) with per-tool namespaces; reduces API costs |
| Dual Transport | STDIO for local clients, HTTP+SSE for web apps |
| Security | OAuth 2.1, SSRF protection, granular scopes |
| Resilience | Circuit breaker, timeouts, graceful degradation |
| Monitoring | Admin endpoints for cache stats, event store, health checks |
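The two-layer cache noted above can be pictured as a memory-first lookup that falls back to a slower persistent layer, with per-tool namespaces baked into the key. Everything here (class name, key format, the Map standing in for the disk layer) is illustrative, not the server's actual code:

```typescript
// Illustrative two-layer (memory + disk) cache with per-tool namespaces.
// The "disk" layer is modeled as a plain Map for simplicity.
class TwoLayerCache {
  private memory = new Map<string, string>();
  constructor(private disk: Map<string, string>) {}

  private key(tool: string, query: string): string {
    // Per-tool namespace prevents cross-tool key collisions
    return `${tool}:${query}`;
  }

  get(tool: string, query: string): string | undefined {
    const k = this.key(tool, query);
    const hit = this.memory.get(k);
    if (hit !== undefined) return hit;             // fast path: memory
    const fromDisk = this.disk.get(k);             // slow path: persistent layer
    if (fromDisk !== undefined) this.memory.set(k, fromDisk); // promote to memory
    return fromDisk;
  }

  set(tool: string, query: string, value: string): void {
    const k = this.key(tool, query);
    this.memory.set(k, value);  // write through both layers
    this.disk.set(k, value);
  }
}
```

The promotion step is what saves repeated API calls: after a restart, the first lookup is served from disk and every later lookup from memory.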
For detailed documentation: YouTube Transcripts · Architecture · Testing
## 📦 Installation Guide

### Prerequisites

- **Node.js** 20.0.0 or higher
- **Google API keys:** a Custom Search API key and Search Engine ID (see the API Setup Guide)
- **Chromium** (for JavaScript rendering): installed automatically via `npx playwright install chromium`
- **OAuth 2.1 provider** (HTTP transport only): an external authorization server (e.g., Auth0, Okta) to issue JWTs. Not needed for STDIO.
### Installation & Setup

1. **Clone the repository:**

   ```bash
   git clone https://github.com/zoharbabin/google-researcher-mcp.git
   cd google-researcher-mcp
   ```

2. **Install dependencies:**

   ```bash
   npm install
   npx playwright install chromium
   ```

3. **Configure environment variables:**

   ```bash
   cp .env.example .env
   ```

   Open `.env` and add your Google API keys. All other variables are optional; see the comments in `.env.example` for detailed explanations.
### Running the Server

#### Running with Docker

```bash
# Build the image
docker build -t google-researcher-mcp .

# STDIO mode (interactive, for local MCP clients)
docker run -i --rm --env-file .env google-researcher-mcp

# HTTP mode (detached, port 3000 exposed)
docker run -d --rm --env-file .env -e MCP_TEST_MODE= -p 3000:3000 google-researcher-mcp
```

**Docker Compose** (quick HTTP transport setup):

```bash
cp .env.example .env
docker compose up --build
curl http://localhost:3000/health
```
**Docker with Claude Code** (`~/.claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--env-file", "/path/to/.env", "google-researcher-mcp"]
    }
  }
}
```

**Security note:** Never bake secrets into the Docker image. Always pass them at runtime via `--env-file` or `-e` flags.
## 💻 Usage Examples

### For AI Assistants (LLMs)

If you are an AI assistant reading this, here's how to use this MCP server.

#### Recommended Tool Selection

| Your Task | Use This Tool |
| --- | --- |
| Research a topic, answer a question | `search_and_scrape`: searches AND retrieves content in one call (recommended) |
| Complex multi-step investigation | `sequential_search`: tracks progress across 3+ searches, supports branching |
| Find academic papers | `academic_search`: searches arXiv, PubMed, IEEE with citations (APA, MLA, BibTeX) |
| Search patents | `patent_search`: Google Patents for prior art, FTO analysis |
| Find recent news | `google_news_search`: freshness filtering and date sorting |
| Find images | `google_image_search`: size/type/color filtering |
| Get a list of URLs only | `google_search`: when you need URLs but will process pages yourself |
| Read a specific URL | `scrape_page`: also extracts YouTube transcripts and parses PDF/DOCX/PPTX |
#### Example Tool Calls

```json
{ "name": "search_and_scrape", "arguments": { "query": "climate change effects 2024", "num_results": 5 } }
{ "name": "sequential_search", "arguments": { "searchStep": "Starting research on quantum computing", "stepNumber": 1, "totalStepsEstimate": 4, "nextStepNeeded": true } }
{ "name": "academic_search", "arguments": { "query": "transformer neural networks", "num_results": 5 } }
{ "name": "patent_search", "arguments": { "query": "machine learning optimization", "search_type": "prior_art" } }
{ "name": "google_news_search", "arguments": { "query": "AI regulations", "freshness": "week" } }
{ "name": "google_image_search", "arguments": { "query": "solar panel installation", "type": "photo" } }
{ "name": "scrape_page", "arguments": { "url": "https://example.com/article" } }
{ "name": "scrape_page", "arguments": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
```
### Client Integration

#### STDIO Client (Local Process)

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server as a local child process over STDIO
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/server.js"]
});

const client = new Client({ name: "my-client", version: "1.0.0" });
await client.connect(transport);

// Plain Google search: returns ranked URLs
const searchResult = await client.callTool({
  name: "google_search",
  arguments: { query: "Model Context Protocol" }
});
console.log(searchResult.content[0].text);

// scrape_page auto-detects YouTube URLs and extracts the transcript
const transcript = await client.callTool({
  name: "scrape_page",
  arguments: { url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
});
console.log(transcript.content[0].text);
```
#### HTTP+SSE Client (Web Application)

Requires a valid OAuth 2.1 Bearer token from your configured authorization server.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3000/mcp"),
  {
    // Attach the Bearer token to outgoing requests
    getAuthorization: async () => `Bearer YOUR_ACCESS_TOKEN`
  }
);

const client = new Client({ name: "my-client", version: "1.0.0" });
await client.connect(transport);

const result = await client.callTool({
  name: "search_and_scrape",
  arguments: { query: "Model Context Protocol", num_results: 3 }
});
console.log(result.content[0].text);
```
## 📚 Documentation

### Available Tools

#### When to Use Each Tool

| Tool | Best For | Use When... |
| --- | --- | --- |
| `search_and_scrape` | Research (recommended) | You need to answer a question using web sources. Most efficient: searches AND retrieves content in one call. Sources are quality-scored. |
| `sequential_search` | Complex investigations | 3+ searches needed with different angles, or research you might abandon early. Tracks progress, supports branching. You reason; it tracks state. |
| `academic_search` | Peer-reviewed papers | Research requiring authoritative academic sources. Returns papers with citations (APA, MLA, BibTeX), abstracts, and PDF links. |
| `patent_search` | Patent research | Prior art search, freedom to operate (FTO) analysis, patent landscaping. Returns patents with numbers, assignees, inventors, and PDF links. |
| `google_search` | Finding URLs only | You only need a list of URLs (not their content), or want to process pages yourself with custom logic. |
| `google_image_search` | Finding images | You need visual content: photos, illustrations, graphics. For text research, use `search_and_scrape`. |
| `google_news_search` | Current news | You need recent news articles. Use `scrape_page` on results to read full articles. |
| `scrape_page` | Reading a specific URL | You have a URL and need its content. Auto-handles YouTube transcripts and documents (PDF, DOCX, PPTX). |
#### Tool Reference

##### `search_and_scrape` (recommended for research)

Searches Google and retrieves content from the top results in one call. Returns quality-scored, deduplicated text with source attribution. Includes size metadata (`estimatedTokens`, `sizeCategory`, `truncated`) in the response.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 3 | Number of results (1-10) |
| `include_sources` | boolean | true | Append source URLs |
| `deduplicate` | boolean | true | Remove duplicate content |
| `max_length_per_source` | number | 50 KB | Max content per source in chars |
| `total_max_length` | number | 300 KB | Max total combined content in chars |
| `filter_by_query` | boolean | false | Filter to only paragraphs containing query keywords |
##### `google_search`

Returns ranked URLs from Google. Use when you only need links, not content.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `time_range` | string | - | `day`, `week`, `month`, `year` |
| `site_search` | string | - | Limit to domain |
| `exact_terms` | string | - | Required phrase |
| `exclude_terms` | string | - | Exclude words |
##### `google_image_search`

Searches Google Images with filtering options.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `size` | string | - | `huge`, `large`, `medium`, `small` |
| `type` | string | - | `clipart`, `face`, `lineart`, `photo`, `animated` |
| `color_type` | string | - | `color`, `gray`, `mono`, `trans` |
| `file_type` | string | - | `jpg`, `gif`, `png`, `bmp`, `svg`, `webp` |
##### `google_news_search`

Searches Google News with freshness filtering and date sorting.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `freshness` | string | week | `hour`, `day`, `week`, `month`, `year` |
| `sort_by` | string | relevance | `relevance`, `date` |
| `news_source` | string | - | Filter to a specific source |
##### `scrape_page`

Extracts text from any URL. Auto-detects: web pages (static/JS), YouTube (transcript), documents (PDF/DOCX/PPTX).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | URL to scrape (max 2048 chars) |
| `max_length` | number | 50 KB | Maximum content length in chars. Content exceeding this is truncated at natural breakpoints. |
| `mode` | string | full | `full` returns content; `preview` returns metadata + structure only (useful to check size before fetching) |
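One way to picture "truncated at natural breakpoints" is a cut that backs up from the hard `max_length` limit to the nearest paragraph or sentence boundary. The sketch below is illustrative only (the function name, the 50% backtrack threshold, and the boundary heuristics are assumptions, not the server's actual algorithm):

```typescript
// Illustrative: truncate text near maxLength, preferring a paragraph
// break, then a sentence end, before falling back to a hard cut.
function truncateAtBreakpoint(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  const slice = text.slice(0, maxLength);

  // Prefer cutting at the last paragraph break within the window
  const para = slice.lastIndexOf("\n\n");
  if (para > maxLength * 0.5) return slice.slice(0, para);

  // Otherwise cut at the last sentence end, keeping the period
  const sentence = slice.lastIndexOf(". ");
  if (sentence > maxLength * 0.5) return slice.slice(0, sentence + 1);

  return slice; // hard cut as a last resort
}
```

The threshold guards against degenerate cuts: if the last boundary falls in the first half of the window, a hard cut loses less content than backtracking.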
##### `sequential_search`

Tracks multi-step research state. Follows the sequential_thinking pattern: you do the reasoning, the tool tracks state.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `searchStep` | string | required | Description of the current step (1-2000 chars) |
| `stepNumber` | number | required | Current step number (starts at 1) |
| `totalStepsEstimate` | number | 5 | Estimated total steps (1-50) |
| `nextStepNeeded` | boolean | required | `true` if more steps are needed, `false` when done |
| `source` | object | - | Source found: `{ url, summary, qualityScore? }` |
| `knowledgeGap` | string | - | Gap identified; what's still missing |
| `isRevision` | boolean | - | `true` if revising a previous step |
| `revisesStep` | number | - | Step number being revised |
| `branchId` | string | - | Identifier for branching research |
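Conceptually, the tool maintains a running record of steps, sources, and knowledge gaps while the model supplies the reasoning. A rough sketch of that bookkeeping (the class, method, and return shape are illustrative assumptions, not the server's actual state model):

```typescript
// Illustrative state tracker in the spirit of sequential_search:
// the caller does the reasoning; this only records steps, sources, and gaps.
interface SearchStep {
  stepNumber: number;
  searchStep: string;
  source?: { url: string; summary: string; qualityScore?: number };
  knowledgeGap?: string;
  branchId?: string;
}

class SequentialSearchState {
  private steps: SearchStep[] = [];

  record(step: SearchStep): { stepsSoFar: number; gaps: string[] } {
    this.steps.push(step);
    return {
      stepsSoFar: this.steps.length,
      // Surface every unresolved gap so the caller can plan the next step
      gaps: this.steps.flatMap(s => (s.knowledgeGap ? [s.knowledgeGap] : []))
    };
  }
}
```

Keeping gaps in the returned summary is what lets a long investigation stay on track: each call reminds the caller what is still missing.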
##### `academic_search`

Searches academic papers via the Google Custom Search API, filtered to academic sources (arXiv, PubMed, IEEE, Nature, Springer, etc.). Returns papers with pre-formatted citations.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of papers (1-10) |
| `year_from` | number | - | Filter by minimum publication year |
| `year_to` | number | - | Filter by maximum publication year |
| `source` | string | all | `all`, `arxiv`, `pubmed`, `ieee`, `nature`, `springer` |
| `pdf_only` | boolean | false | Only return results with PDF links |
| `sort_by` | string | relevance | `relevance`, `date` |
##### `patent_search`

Searches Google Patents for prior art, freedom to operate (FTO) analysis, and patent landscaping. Returns patents with numbers, assignees, inventors, and PDF links.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `search_type` | string | prior_art | `prior_art`, `specific`, `landscape` |
| `patent_office` | string | all | `all`, `US`, `EP`, `WO`, `JP`, `CN`, `KR` |
| `assignee` | string | - | Filter by assignee/company |
| `inventor` | string | - | Filter by inventor name |
| `cpc_code` | string | - | Filter by CPC classification code |
| `year_from` | number | - | Filter by minimum year |
| `year_to` | number | - | Filter by maximum year |
### MCP Resources

The server exposes state via the MCP Resources protocol. Use `resources/list` to discover available resources and `resources/read` to retrieve them.

| URI | Description |
| --- | --- |
| `search://recent` | Last 20 search queries with timestamps and result counts |
| `config://server` | Server configuration (version, start time, transport mode) |
| `stats://cache` | Cache statistics (hit rate, entry count, memory usage) |
| `stats://events` | Event store statistics (event count, storage size) |

Example (using the MCP SDK):

```typescript
const resources = await client.listResources();
const recentSearches = await client.readResource({ uri: "search://recent" });
```
### MCP Prompts

Pre-built research workflow templates are available via the MCP Prompts protocol. Use `prompts/list` to discover prompts and `prompts/get` to retrieve a prompt with arguments.

#### Basic Research Prompts

| Prompt | Arguments | Description |
| --- | --- | --- |
| `comprehensive-research` | `topic`, `depth` (quick/standard/deep) | Multi-source research on a topic |
| `fact-check` | `claim`, `sources` (number) | Verify a claim against multiple sources |
| `summarize-url` | `url`, `format` (brief/detailed/bullets) | Summarize content from a single URL |
| `news-briefing` | `topic`, `timeRange` (day/week/month) | Get a current news summary on a topic |

#### Advanced Research Prompts

| Prompt | Arguments | Description |
| --- | --- | --- |
| `patent-portfolio-analysis` | `company`, `includeSubsidiaries` | Analyze a company's patent holdings |
| `competitive-analysis` | `entities` (comma-separated), `aspects` | Compare companies/products |
| `literature-review` | `topic`, `yearFrom`, `sources` | Academic literature synthesis |
| `technical-deep-dive` | `technology`, `focusArea` | In-depth technical investigation |

Focus areas for `technical-deep-dive`: `architecture`, `implementation`, `comparison`, `best-practices`, `troubleshooting`.

Example (using the MCP SDK):

```typescript
const prompts = await client.listPrompts();

const research = await client.getPrompt({
  name: "comprehensive-research",
  arguments: { topic: "quantum computing", depth: "standard" }
});

const patents = await client.getPrompt({
  name: "patent-portfolio-analysis",
  arguments: { company: "Kaltura", includeSubsidiaries: true }
});

const comparison = await client.getPrompt({
  name: "competitive-analysis",
  arguments: { entities: "React, Vue, Angular", aspects: "performance, learning curve, ecosystem" }
});
```
## 🔧 Technical Details

### System Architecture

```mermaid
graph TD
    A[MCP Client] -->|local process| B[STDIO Transport]
    A -->|network| C[HTTP+SSE Transport]
    C --> L[OAuth 2.1 + Rate Limiter]
    L --> D
    C -.->|session replay| K[Event Store]
    B --> D[McpServer<br>MCP SDK routing + dispatch]
    D --> F[google_search]
    D --> G[scrape_page]
    D --> I[search_and_scrape]
    D --> IMG[google_image_search]
    D --> NEWS[google_news_search]
    I -.->|delegates| F
    I -.->|delegates| G
    I --> Q[Quality Scoring]
    G --> N[SSRF Validator]
    N --> S1[CheerioCrawler<br>static HTML]
    S1 -.->|insufficient content| S2[Playwright<br>JS rendering]
    G --> YT[YouTube Transcript<br>Extractor]
    F & G & IMG & NEWS --> J[Persistent Cache<br>memory + disk]
    D -.-> R[MCP Resources]
    D -.-> P[MCP Prompts]
    style J fill:#f9f,stroke:#333,stroke-width:2px
    style K fill:#ccf,stroke:#333,stroke-width:2px
    style L fill:#f99,stroke:#333,stroke-width:2px
    style N fill:#ff9,stroke:#333,stroke-width:2px
    style Q fill:#9f9,stroke:#333,stroke-width:2px
```

For a detailed explanation, see the Architecture Guide.
### Usage

#### Choosing a Transport

| | STDIO | HTTP+SSE |
| --- | --- | --- |
| **Best for** | Local MCP clients (Claude Code, Cline, Roo Code) | Web apps, multi-client setups, remote access |
| **Auth** | None needed (process-level isolation) | OAuth 2.1 Bearer tokens required |
| **Setup** | Zero config; just provide API keys | Requires OAuth provider (Auth0, Okta, etc.) |
| **Scaling** | One server per client process | Single server, many concurrent clients |

**Recommendation:** Use STDIO for local AI assistant integrations. Use HTTP+SSE only when you need a shared service or web application integration.
### Management API

Administrative and monitoring endpoints (HTTP transport only):

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| GET | `/health` | Server health check (status, version, uptime) | Public |
| GET | `/version` | Server version and runtime info | Public |
| GET | `/mcp/cache-stats` | Cache performance statistics | `mcp:admin:cache:read` |
| GET | `/mcp/event-store-stats` | Event store usage statistics | `mcp:admin:event-store:read` |
| POST | `/mcp/cache-invalidate` | Clear specific cache entries | `mcp:admin:cache:invalidate` |
| POST | `/mcp/cache-persist` | Force cache save to disk | `mcp:admin:cache:persist` |
| GET | `/mcp/oauth-config` | Current OAuth configuration | `mcp:admin:config:read` |
| GET | `/mcp/oauth-scopes` | OAuth scopes documentation | Public |
| GET | `/mcp/oauth-token-info` | Token details | Authenticated |
### Security

#### OAuth 2.1 Authorization

All HTTP endpoints under `/mcp/` (except public documentation) are protected by OAuth 2.1:

- **Token validation:** JWTs are validated against your authorization server's JWKS endpoint (`${OAUTH_ISSUER_URL}/.well-known/jwks.json`).
- **Scope enforcement:** Each tool and admin action requires a specific OAuth scope.

Configure `OAUTH_ISSUER_URL` and `OAUTH_AUDIENCE` in `.env`. See `.env.example` for details.

**STDIO users:** OAuth is not used for the STDIO transport. You can skip all OAuth configuration.
#### Available Scopes

**Tool execution:**

- `mcp:tool:google_search:execute`
- `mcp:tool:google_image_search:execute`
- `mcp:tool:google_news_search:execute`
- `mcp:tool:scrape_page:execute`
- `mcp:tool:search_and_scrape:execute`

**Administration:**

- `mcp:admin:cache:read`
- `mcp:admin:cache:invalidate`
- `mcp:admin:cache:persist`
- `mcp:admin:event-store:read`
- `mcp:admin:config:read`
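Scope enforcement ultimately reduces to checking that the scopes granted in the validated token include the one a given tool or admin action requires. A minimal sketch of that check (function name assumed; the server's real middleware also validates the JWT itself):

```typescript
// Illustrative scope check: does the token grant the required scope?
// OAuth 2.1 "scope" claims are space-delimited strings.
function hasScope(tokenScopes: string, required: string): boolean {
  return tokenScopes.split(" ").includes(required);
}

hasScope("mcp:tool:google_search:execute mcp:admin:cache:read",
         "mcp:admin:cache:read"); // true
```

Exact string matching (rather than prefix matching) is the safe choice here: `mcp:admin:cache:read` must not satisfy a requirement for `mcp:admin:cache:invalidate`.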
## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.