# 🚀 Google Researcher MCP Server

Professional research tools for AI assistants: Google Search, web scraping, academic papers, patents, and more.
## 🚀 Quick Start
### Claude Desktop (macOS)

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### Claude Desktop (Windows)

Add to `%APPDATA%\Claude\claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### One-Click Install (MCPB)

Download the latest `.mcpb` bundle from GitHub Releases and double-click to install in Claude Desktop. You'll be prompted to enter your Google API credentials.
### Claude Code

Add to `~/.claude.json`:

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "npx",
      "args": ["-y", "google-researcher-mcp"],
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_API_KEY_HERE",
        "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ID_HERE"
      }
    }
  }
}
```
### Cline / Roo Code

Use the same JSON configuration above in your MCP settings.

**Need API keys?** See the API Setup Guide for step-by-step instructions to get your Google API credentials.
### Local Development

```bash
git clone https://github.com/zoharbabin/google-researcher-mcp.git && cd google-researcher-mcp
npm install && npx playwright install chromium
cp .env.example .env
npm run dev
```

**Note:** This starts the server in STDIO mode, which is all you need for local AI assistant integrations. HTTP transport with OAuth is only required for web-based or multi-client setups; see Choosing a Transport.
### Verify It Works

Once configured, ask your AI assistant:

> "Search for the latest news about AI regulations"

The assistant will use the `google_news_search` tool and return current articles. If you see search results, the server is working correctly.
## ✨ Features

### Core Capabilities

| Feature | Description |
| --- | --- |
| Web Scraping | Fast static HTML + automatic Playwright fallback for JavaScript-rendered pages |
| YouTube Transcripts | Robust extraction with retry logic and 10 classified error types |
| Document Parsing | Auto-detects and extracts text from PDF, DOCX, PPTX |
| Quality Scoring | Sources ranked by relevance (35%), freshness (20%), authority (25%), and content quality (20%) |
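The quality score amounts to a weighted sum of the four component scores above. A minimal sketch of that idea (the function name, component field names, and 0..1 score scale are illustrative assumptions, not the server's actual implementation):

```typescript
// Illustrative sketch of the weighted quality score described above.
// Component scores are assumed to be normalized to the 0..1 range.
interface QualityComponents {
  relevance: number;
  freshness: number;
  authority: number;
  contentQuality: number;
}

function qualityScore(c: QualityComponents): number {
  return (
    0.35 * c.relevance +      // relevance carries the most weight
    0.20 * c.freshness +
    0.25 * c.authority +
    0.20 * c.contentQuality
  );
}

// A source that is perfect on every component scores ~1.0
qualityScore({ relevance: 1, freshness: 1, authority: 1, contentQuality: 1 });
```

Because the weights sum to 1, the combined score stays on the same 0..1 scale as the components, which makes cross-source ranking straightforward.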
### MCP Protocol Support

| Feature | Description |
| --- | --- |
| Tools | 8 tools: `search_and_scrape`, `google_search`, `google_image_search`, `google_news_search`, `scrape_page`, `sequential_search`, `academic_search`, `patent_search` |
| Resources | Expose server state: `stats://tools` (per-tool metrics), `stats://cache`, `search://recent`, `config://server` |
| Prompts | Pre-built templates: `comprehensive-research`, `fact-check`, `summarize-url`, `news-briefing` |
| Annotations | Content tagged with audience, priority, and timestamps |
### Production Ready

| Feature | Description |
| --- | --- |
| Caching | Two-layer (memory + disk) with per-tool namespaces; reduces API costs |
| Dual Transport | STDIO for local clients, HTTP+SSE for web apps |
| Security | OAuth 2.1, SSRF protection, granular scopes |
| Resilience | Circuit breaker, timeouts, graceful degradation |
| Monitoring | Admin endpoints for cache stats, event store, health checks |
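The two-layer cache noted above can be pictured as a memory-first lookup that falls back to a slower persistent layer, with per-tool namespaces baked into the key. Everything here (class name, key format, the Map standing in for the disk layer) is illustrative, not the server's actual code:

```typescript
// Illustrative two-layer (memory + disk) cache with per-tool namespaces.
// The "disk" layer is modeled as a plain Map for simplicity.
class TwoLayerCache {
  private memory = new Map<string, string>();
  constructor(private disk: Map<string, string>) {}

  private key(tool: string, query: string): string {
    // Per-tool namespace prevents cross-tool key collisions
    return `${tool}:${query}`;
  }

  get(tool: string, query: string): string | undefined {
    const k = this.key(tool, query);
    const hit = this.memory.get(k);
    if (hit !== undefined) return hit;             // fast path: memory
    const fromDisk = this.disk.get(k);             // slow path: persistent layer
    if (fromDisk !== undefined) this.memory.set(k, fromDisk); // promote to memory
    return fromDisk;
  }

  set(tool: string, query: string, value: string): void {
    const k = this.key(tool, query);
    this.memory.set(k, value);  // write through both layers
    this.disk.set(k, value);
  }
}
```

The promotion step is what saves repeated API calls: after a restart, the first lookup is served from disk and every later lookup from memory.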
For detailed documentation: YouTube Transcripts · Architecture · Testing
## 📦 Installation Guide

### Prerequisites

- **Node.js** 20.0.0 or higher
- **Google API keys:** a Custom Search API key and Search Engine ID (see the API Setup Guide)
- **Chromium** (for JavaScript rendering): installed automatically via `npx playwright install chromium`
- **OAuth 2.1 provider** (HTTP transport only): an external authorization server (e.g., Auth0, Okta) to issue JWTs. Not needed for STDIO.
### Installation & Setup

1. **Clone the repository:**

   ```bash
   git clone https://github.com/zoharbabin/google-researcher-mcp.git
   cd google-researcher-mcp
   ```

2. **Install dependencies:**

   ```bash
   npm install
   npx playwright install chromium
   ```

3. **Configure environment variables:**

   ```bash
   cp .env.example .env
   ```

   Open `.env` and add your Google API keys. All other variables are optional; see the comments in `.env.example` for detailed explanations.
### Running the Server

#### Running with Docker

```bash
# Build the image
docker build -t google-researcher-mcp .

# STDIO mode (interactive, for local MCP clients)
docker run -i --rm --env-file .env google-researcher-mcp

# HTTP mode (detached, port 3000 exposed)
docker run -d --rm --env-file .env -e MCP_TEST_MODE= -p 3000:3000 google-researcher-mcp
```

**Docker Compose** (quick HTTP transport setup):

```bash
cp .env.example .env
docker compose up --build
curl http://localhost:3000/health
```
**Docker with Claude Code** (`~/.claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "google-researcher": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--env-file", "/path/to/.env", "google-researcher-mcp"]
    }
  }
}
```

**Security note:** Never bake secrets into the Docker image. Always pass them at runtime via `--env-file` or `-e` flags.
## 💻 Usage Examples

### For AI Assistants (LLMs)

If you are an AI assistant reading this, here's how to use this MCP server.

#### Recommended Tool Selection

| Your Task | Use This Tool |
| --- | --- |
| Research a topic, answer a question | `search_and_scrape`: searches AND retrieves content in one call (recommended) |
| Complex multi-step investigation | `sequential_search`: tracks progress across 3+ searches, supports branching |
| Find academic papers | `academic_search`: searches arXiv, PubMed, IEEE with citations (APA, MLA, BibTeX) |
| Search patents | `patent_search`: Google Patents for prior art, FTO analysis |
| Find recent news | `google_news_search`: freshness filtering and date sorting |
| Find images | `google_image_search`: size/type/color filtering |
| Get a list of URLs only | `google_search`: when you need URLs but will process pages yourself |
| Read a specific URL | `scrape_page`: also extracts YouTube transcripts and parses PDF/DOCX/PPTX |
#### Example Tool Calls

```json
{ "name": "search_and_scrape", "arguments": { "query": "climate change effects 2024", "num_results": 5 } }
{ "name": "sequential_search", "arguments": { "searchStep": "Starting research on quantum computing", "stepNumber": 1, "totalStepsEstimate": 4, "nextStepNeeded": true } }
{ "name": "academic_search", "arguments": { "query": "transformer neural networks", "num_results": 5 } }
{ "name": "patent_search", "arguments": { "query": "machine learning optimization", "search_type": "prior_art" } }
{ "name": "google_news_search", "arguments": { "query": "AI regulations", "freshness": "week" } }
{ "name": "google_image_search", "arguments": { "query": "solar panel installation", "type": "photo" } }
{ "name": "scrape_page", "arguments": { "url": "https://example.com/article" } }
{ "name": "scrape_page", "arguments": { "url": "https://www.youtube.com/watch?v=VIDEO_ID" } }
```
### Client Integration

#### STDIO Client (Local Process)

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server as a local child process over STDIO
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/server.js"]
});

const client = new Client({ name: "my-client", version: "1.0.0" });
await client.connect(transport);

// Plain Google search: returns ranked URLs
const searchResult = await client.callTool({
  name: "google_search",
  arguments: { query: "Model Context Protocol" }
});
console.log(searchResult.content[0].text);

// scrape_page auto-detects YouTube URLs and extracts the transcript
const transcript = await client.callTool({
  name: "scrape_page",
  arguments: { url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
});
console.log(transcript.content[0].text);
```
#### HTTP+SSE Client (Web Application)

Requires a valid OAuth 2.1 Bearer token from your configured authorization server.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3000/mcp"),
  {
    // Attach the Bearer token to outgoing requests
    getAuthorization: async () => `Bearer YOUR_ACCESS_TOKEN`
  }
);

const client = new Client({ name: "my-client", version: "1.0.0" });
await client.connect(transport);

const result = await client.callTool({
  name: "search_and_scrape",
  arguments: { query: "Model Context Protocol", num_results: 3 }
});
console.log(result.content[0].text);
```
## 📚 Documentation

### Available Tools

#### When to Use Each Tool

| Tool | Best For | Use When... |
| --- | --- | --- |
| `search_and_scrape` | Research (recommended) | You need to answer a question using web sources. Most efficient: searches AND retrieves content in one call. Sources are quality-scored. |
| `sequential_search` | Complex investigations | 3+ searches needed with different angles, or research you might abandon early. Tracks progress, supports branching. You reason; it tracks state. |
| `academic_search` | Peer-reviewed papers | Research requiring authoritative academic sources. Returns papers with citations (APA, MLA, BibTeX), abstracts, and PDF links. |
| `patent_search` | Patent research | Prior art search, freedom to operate (FTO) analysis, patent landscaping. Returns patents with numbers, assignees, inventors, and PDF links. |
| `google_search` | Finding URLs only | You only need a list of URLs (not their content), or want to process pages yourself with custom logic. |
| `google_image_search` | Finding images | You need visual content: photos, illustrations, graphics. For text research, use `search_and_scrape`. |
| `google_news_search` | Current news | You need recent news articles. Use `scrape_page` on results to read full articles. |
| `scrape_page` | Reading a specific URL | You have a URL and need its content. Auto-handles YouTube transcripts and documents (PDF, DOCX, PPTX). |
#### Tool Reference

##### `search_and_scrape` (recommended for research)

Searches Google and retrieves content from the top results in one call. Returns quality-scored, deduplicated text with source attribution. Includes size metadata (`estimatedTokens`, `sizeCategory`, `truncated`) in the response.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 3 | Number of results (1-10) |
| `include_sources` | boolean | true | Append source URLs |
| `deduplicate` | boolean | true | Remove duplicate content |
| `max_length_per_source` | number | 50 KB | Max content per source in chars |
| `total_max_length` | number | 300 KB | Max total combined content in chars |
| `filter_by_query` | boolean | false | Filter to only paragraphs containing query keywords |
##### `google_search`

Returns ranked URLs from Google. Use when you only need links, not content.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `time_range` | string | - | `day`, `week`, `month`, `year` |
| `site_search` | string | - | Limit to domain |
| `exact_terms` | string | - | Required phrase |
| `exclude_terms` | string | - | Exclude words |
##### `google_image_search`

Searches Google Images with filtering options.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `size` | string | - | `huge`, `large`, `medium`, `small` |
| `type` | string | - | `clipart`, `face`, `lineart`, `photo`, `animated` |
| `color_type` | string | - | `color`, `gray`, `mono`, `trans` |
| `file_type` | string | - | `jpg`, `gif`, `png`, `bmp`, `svg`, `webp` |
##### `google_news_search`

Searches Google News with freshness filtering and date sorting.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `freshness` | string | week | `hour`, `day`, `week`, `month`, `year` |
| `sort_by` | string | relevance | `relevance`, `date` |
| `news_source` | string | - | Filter to a specific source |
##### `scrape_page`

Extracts text from any URL. Auto-detects: web pages (static/JS), YouTube (transcript), documents (PDF/DOCX/PPTX).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | URL to scrape (max 2048 chars) |
| `max_length` | number | 50 KB | Maximum content length in chars. Content exceeding this is truncated at natural breakpoints. |
| `mode` | string | full | `full` returns content; `preview` returns metadata + structure only (useful to check size before fetching) |
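One way to picture "truncated at natural breakpoints" is a cut that backs up from the hard `max_length` limit to the nearest paragraph or sentence boundary. The sketch below is illustrative only (the function name, the 50% backtrack threshold, and the boundary heuristics are assumptions, not the server's actual algorithm):

```typescript
// Illustrative: truncate text near maxLength, preferring a paragraph
// break, then a sentence end, before falling back to a hard cut.
function truncateAtBreakpoint(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  const slice = text.slice(0, maxLength);

  // Prefer cutting at the last paragraph break within the window
  const para = slice.lastIndexOf("\n\n");
  if (para > maxLength * 0.5) return slice.slice(0, para);

  // Otherwise cut at the last sentence end, keeping the period
  const sentence = slice.lastIndexOf(". ");
  if (sentence > maxLength * 0.5) return slice.slice(0, sentence + 1);

  return slice; // hard cut as a last resort
}
```

The threshold guards against degenerate cuts: if the last boundary falls in the first half of the window, a hard cut loses less content than backtracking.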
##### `sequential_search`

Tracks multi-step research state. Follows the sequential_thinking pattern: you do the reasoning, the tool tracks state.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `searchStep` | string | required | Description of the current step (1-2000 chars) |
| `stepNumber` | number | required | Current step number (starts at 1) |
| `totalStepsEstimate` | number | 5 | Estimated total steps (1-50) |
| `nextStepNeeded` | boolean | required | `true` if more steps are needed, `false` when done |
| `source` | object | - | Source found: `{ url, summary, qualityScore? }` |
| `knowledgeGap` | string | - | Gap identified; what's still missing |
| `isRevision` | boolean | - | `true` if revising a previous step |
| `revisesStep` | number | - | Step number being revised |
| `branchId` | string | - | Identifier for branching research |
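Conceptually, the tool maintains a running record of steps, sources, and knowledge gaps while the model supplies the reasoning. A rough sketch of that bookkeeping (the class, method, and return shape are illustrative assumptions, not the server's actual state model):

```typescript
// Illustrative state tracker in the spirit of sequential_search:
// the caller does the reasoning; this only records steps, sources, and gaps.
interface SearchStep {
  stepNumber: number;
  searchStep: string;
  source?: { url: string; summary: string; qualityScore?: number };
  knowledgeGap?: string;
  branchId?: string;
}

class SequentialSearchState {
  private steps: SearchStep[] = [];

  record(step: SearchStep): { stepsSoFar: number; gaps: string[] } {
    this.steps.push(step);
    return {
      stepsSoFar: this.steps.length,
      // Surface every unresolved gap so the caller can plan the next step
      gaps: this.steps.flatMap(s => (s.knowledgeGap ? [s.knowledgeGap] : []))
    };
  }
}
```

Keeping gaps in the returned summary is what lets a long investigation stay on track: each call reminds the caller what is still missing.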
##### `academic_search`

Searches academic papers via the Google Custom Search API, filtered to academic sources (arXiv, PubMed, IEEE, Nature, Springer, etc.). Returns papers with pre-formatted citations.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of papers (1-10) |
| `year_from` | number | - | Filter by minimum publication year |
| `year_to` | number | - | Filter by maximum publication year |
| `source` | string | all | `all`, `arxiv`, `pubmed`, `ieee`, `nature`, `springer` |
| `pdf_only` | boolean | false | Only return results with PDF links |
| `sort_by` | string | relevance | `relevance`, `date` |
##### `patent_search`

Searches Google Patents for prior art, freedom to operate (FTO) analysis, and patent landscaping. Returns patents with numbers, assignees, inventors, and PDF links.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query (1-500 chars) |
| `num_results` | number | 5 | Number of results (1-10) |
| `search_type` | string | prior_art | `prior_art`, `specific`, `landscape` |
| `patent_office` | string | all | `all`, `US`, `EP`, `WO`, `JP`, `CN`, `KR` |
| `assignee` | string | - | Filter by assignee/company |
| `inventor` | string | - | Filter by inventor name |
| `cpc_code` | string | - | Filter by CPC classification code |
| `year_from` | number | - | Filter by minimum year |
| `year_to` | number | - | Filter by maximum year |
### MCP Resources

The server exposes state via the MCP Resources protocol. Use `resources/list` to discover available resources and `resources/read` to retrieve them.

| URI | Description |
| --- | --- |
| `search://recent` | Last 20 search queries with timestamps and result counts |
| `config://server` | Server configuration (version, start time, transport mode) |
| `stats://cache` | Cache statistics (hit rate, entry count, memory usage) |
| `stats://events` | Event store statistics (event count, storage size) |

Example (using the MCP SDK):

```typescript
const resources = await client.listResources();
const recentSearches = await client.readResource({ uri: "search://recent" });
```
### MCP Prompts

Pre-built research workflow templates are available via the MCP Prompts protocol. Use `prompts/list` to discover prompts and `prompts/get` to retrieve a prompt with arguments.

#### Basic Research Prompts

| Prompt | Arguments | Description |
| --- | --- | --- |
| `comprehensive-research` | `topic`, `depth` (quick/standard/deep) | Multi-source research on a topic |
| `fact-check` | `claim`, `sources` (number) | Verify a claim against multiple sources |
| `summarize-url` | `url`, `format` (brief/detailed/bullets) | Summarize content from a single URL |
| `news-briefing` | `topic`, `timeRange` (day/week/month) | Get a current news summary on a topic |

#### Advanced Research Prompts

| Prompt | Arguments | Description |
| --- | --- | --- |
| `patent-portfolio-analysis` | `company`, `includeSubsidiaries` | Analyze a company's patent holdings |
| `competitive-analysis` | `entities` (comma-separated), `aspects` | Compare companies/products |
| `literature-review` | `topic`, `yearFrom`, `sources` | Academic literature synthesis |
| `technical-deep-dive` | `technology`, `focusArea` | In-depth technical investigation |

Focus areas for `technical-deep-dive`: `architecture`, `implementation`, `comparison`, `best-practices`, `troubleshooting`.

Example (using the MCP SDK):

```typescript
const prompts = await client.listPrompts();

const research = await client.getPrompt({
  name: "comprehensive-research",
  arguments: { topic: "quantum computing", depth: "standard" }
});

const patents = await client.getPrompt({
  name: "patent-portfolio-analysis",
  arguments: { company: "Kaltura", includeSubsidiaries: true }
});

const comparison = await client.getPrompt({
  name: "competitive-analysis",
  arguments: { entities: "React, Vue, Angular", aspects: "performance, learning curve, ecosystem" }
});
```
## 🔧 Technical Details

### System Architecture

```mermaid
graph TD
    A[MCP Client] -->|local process| B[STDIO Transport]
    A -->|network| C[HTTP+SSE Transport]
    C --> L[OAuth 2.1 + Rate Limiter]
    L --> D
    C -.->|session replay| K[Event Store]
    B --> D[McpServer<br>MCP SDK routing + dispatch]
    D --> F[google_search]
    D --> G[scrape_page]
    D --> I[search_and_scrape]
    D --> IMG[google_image_search]
    D --> NEWS[google_news_search]
    I -.->|delegates| F
    I -.->|delegates| G
    I --> Q[Quality Scoring]
    G --> N[SSRF Validator]
    N --> S1[CheerioCrawler<br>static HTML]
    S1 -.->|insufficient content| S2[Playwright<br>JS rendering]
    G --> YT[YouTube Transcript<br>Extractor]
    F & G & IMG & NEWS --> J[Persistent Cache<br>memory + disk]
    D -.-> R[MCP Resources]
    D -.-> P[MCP Prompts]
    style J fill:#f9f,stroke:#333,stroke-width:2px
    style K fill:#ccf,stroke:#333,stroke-width:2px
    style L fill:#f99,stroke:#333,stroke-width:2px
    style N fill:#ff9,stroke:#333,stroke-width:2px
    style Q fill:#9f9,stroke:#333,stroke-width:2px
```

For a detailed explanation, see the Architecture Guide.
### Usage

#### Choosing a Transport

| | STDIO | HTTP+SSE |
| --- | --- | --- |
| **Best for** | Local MCP clients (Claude Code, Cline, Roo Code) | Web apps, multi-client setups, remote access |
| **Auth** | None needed (process-level isolation) | OAuth 2.1 Bearer tokens required |
| **Setup** | Zero config; just provide API keys | Requires OAuth provider (Auth0, Okta, etc.) |
| **Scaling** | One server per client process | Single server, many concurrent clients |

**Recommendation:** Use STDIO for local AI assistant integrations. Use HTTP+SSE only when you need a shared service or web application integration.
### Management API

Administrative and monitoring endpoints (HTTP transport only):

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| GET | `/health` | Server health check (status, version, uptime) | Public |
| GET | `/version` | Server version and runtime info | Public |
| GET | `/mcp/cache-stats` | Cache performance statistics | `mcp:admin:cache:read` |
| GET | `/mcp/event-store-stats` | Event store usage statistics | `mcp:admin:event-store:read` |
| POST | `/mcp/cache-invalidate` | Clear specific cache entries | `mcp:admin:cache:invalidate` |
| POST | `/mcp/cache-persist` | Force cache save to disk | `mcp:admin:cache:persist` |
| GET | `/mcp/oauth-config` | Current OAuth configuration | `mcp:admin:config:read` |
| GET | `/mcp/oauth-scopes` | OAuth scopes documentation | Public |
| GET | `/mcp/oauth-token-info` | Token details | Authenticated |
### Security

#### OAuth 2.1 Authorization

All HTTP endpoints under `/mcp/` (except public documentation) are protected by OAuth 2.1:

- **Token validation:** JWTs are validated against your authorization server's JWKS endpoint (`${OAUTH_ISSUER_URL}/.well-known/jwks.json`).
- **Scope enforcement:** Each tool and admin action requires a specific OAuth scope.

Configure `OAUTH_ISSUER_URL` and `OAUTH_AUDIENCE` in `.env`. See `.env.example` for details.

**STDIO users:** OAuth is not used for the STDIO transport. You can skip all OAuth configuration.
#### Available Scopes

**Tool execution:**

- `mcp:tool:google_search:execute`
- `mcp:tool:google_image_search:execute`
- `mcp:tool:google_news_search:execute`
- `mcp:tool:scrape_page:execute`
- `mcp:tool:search_and_scrape:execute`

**Administration:**

- `mcp:admin:cache:read`
- `mcp:admin:cache:invalidate`
- `mcp:admin:cache:persist`
- `mcp:admin:event-store:read`
- `mcp:admin:config:read`
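Scope enforcement ultimately reduces to checking that the scopes granted in the validated token include the one a given tool or admin action requires. A minimal sketch of that check (function name assumed; the server's real middleware also validates the JWT itself):

```typescript
// Illustrative scope check: does the token grant the required scope?
// OAuth 2.1 "scope" claims are space-delimited strings.
function hasScope(tokenScopes: string, required: string): boolean {
  return tokenScopes.split(" ").includes(required);
}

hasScope("mcp:tool:google_search:execute mcp:admin:cache:read",
         "mcp:admin:cache:read"); // true
```

Exact string matching (rather than prefix matching) is the safe choice here: `mcp:admin:cache:read` must not satisfy a requirement for `mcp:admin:cache:invalidate`.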
## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.