🚀 Search Stack
A web search and scraping middleware layer dedicated to AI agents.
Search Stack provides a unified web access API for AI agents such as OpenClaw, Claude Code, and Dify. It offers features like multi-engine search with automatic fallback, anti-scraping rendering using headless Chrome, cookie injection for login states, and precise extraction of page content. Once deployed, all agents can share this service.
🚀 Quick Start
Prerequisites
- Docker + Docker Compose
- (Optional) Tavily API Key - Free for 1000 requests per month
- (Optional) Serper API Key - Free for 2500 requests
You can still use the service without Tavily/Serper API keys; it will automatically fall back to SearXNG, which is completely free.
Step 1: Clone the Project
git clone https://github.com/pinkpills/search-stack.git
cd search-stack
Step 2: Configure Environment Variables
cp .env.example .env
Edit the .env file:
TAVILY_API_KEY=your_tavily_key
SERPER_API_KEY=your_serper_key
SEARXNG_SECRET=
PROXY_API_KEY=
BROWSERLESS_TOKEN=
REDIS_PASSWORD=
Generate random secrets with one command:
python3 -c "
import secrets
for name in ['SEARXNG_SECRET', 'PROXY_API_KEY', 'BROWSERLESS_TOKEN', 'REDIS_PASSWORD']:
    print(f'{name}={secrets.token_hex(16)}')
" >> .env
Step 3: Configure SearXNG
This step is required! Otherwise, SearXNG's JSON API will return a 403 error.
cp searxng/settings.yml.example searxng/settings.yml
Edit the searxng/settings.yml file to ensure it contains:
search:
  formats:
    - html
    - json
If you have already started SearXNG before (it will automatically generate settings.yml), you need to manually add the formats configuration and restart the container.
Step 4: Start the Service
docker compose -f search-stack.yml up -d
Wait for all containers to be healthy (about 30 seconds):
docker compose -f search-stack.yml ps
The deployment is complete when all containers show healthy.
Step 5: Verification
curl -s -H "X-API-Key: YOUR_PROXY_API_KEY" http://127.0.0.1:17080/health | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "hello world", "count": 3}' | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "hello world", "count": 3, "provider": "searxng"}' | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/fetch \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "render": false}' | python3 -m json.tool
Tip: If SearXNG returns a 502 error or empty results, it is likely that the formats: [html, json] configuration is missing. Refer to Step 3.
✨ Features
- Multi-engine Fallback — Automatically switches between Tavily, Serper, and SearXNG in priority order. The service stays available even if one engine fails.
- Search + Scraping Integration — Use /search for searching and /fetch for scraping page content. Set enrich=true to automatically scrape the full text after searching.
- Anti-scraping Resistance — Uses Browserless Stealth mode to bypass Cloudflare and JS challenges.
- Content Extraction — Employs three engines (trafilatura, BeautifulSoup, and readability) to accurately extract page content.
- Cookie Management — Dynamically add, delete, and update cookies via the API. Cookies are automatically injected into Chrome for rendering. Supports pasting browser cookies directly.
- Login/Anti-scraping Detection — Multi-dimensional heuristic detection: HTTP status codes (401/403), text keywords (Chinese, English, Japanese), page titles, HTML structure (password boxes, CAPTCHA embeds, meta-refresh redirects), and SPA login walls. Returns a needs_login flag to guide cookie updates.
- Cookie Catcher — Remote login in the browser. Uses WebSocket + CDP Screencast to control a remote Chrome instance from the Web UI, complete the login, and save cookies with one click.
- SSRF Protection — Blocks access to private IPs (127/10/172.16/192.168/169.254).
- URL Deduplication — Automatically strips tracking parameters (utm_*, fbclid, etc.) and limits the number of results per domain.
- Redis Caching — Caches results with a 15-minute TTL; repeated queries return cached results immediately.
- API Key Authentication + Rate Limiting — Sliding-window rate limiting.
- MCP Server — Runs an MCP Server in stdio mode (mcp-server.ts). Register it via mcporter for use by agents like OpenClaw.
- TikHub Social Media API — Optionally integrates the TikHub API, which covers 803 social media platforms (e.g., Douyin, TikTok, Weibo), with built-in automatic fallback.
- HTTP Proxy — Supports HTTP/SOCKS5 proxies for fixed-IP anti-scraping or accessing blocked websites (e.g., YouTube).
- Fully Asynchronous — Uses async Redis and a shared httpx connection pool for high concurrency and low latency.
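The URL-deduplication step above can be sketched as follows. This is a simplified illustration: utm_* and fbclid come from this README, while gclid and the helper names are assumptions, not search-stack's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Tracking parameters stripped before deduplication (gclid is an assumed addition).
TRACKING_PARAMS = {"fbclid", "gclid"}

def canonicalize(url: str) -> str:
    """Drop tracking parameters and fragments so duplicate results compare equal."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith("utm_") and k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(query), fragment=""))

def dedupe(urls: list[str], max_per_host: int = 2) -> list[str]:
    """Keep each canonical URL once and at most max_per_host results per domain."""
    seen, per_host, out = set(), {}, []
    for url in urls:
        canon = canonicalize(url)
        host = urlparse(canon).hostname or ""
        if canon in seen or per_host.get(host, 0) >= max_per_host:
            continue
        seen.add(canon)
        per_host[host] = per_host.get(host, 0) + 1
        out.append(canon)
    return out
```

With max_per_host=2 (the service default, per MAX_PER_HOST below), two links differing only in utm_* parameters collapse into one, and a third result from the same domain is dropped.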
📦 Installation
Comparison with Brave Search
The following data is based on actual tests on February 7, 2026, with the search keyword "claude opus 4.6 review".
| Dimension | Search Stack | Brave Search (built into OpenClaw) |
|---|---|---|
| Search speed | 0.8-1.5s (Tavily/Serper) | ~1-2s |
| Cache hit | 13ms (15-minute Redis cache) | No cache; every query is re-requested |
| Chinese search | Rich results (Juejin, Zhihu, Smzdm, etc.) | Fewer Chinese results, mostly English sources |
| English search | Excellent | Excellent |
| High availability | Three engines with automatic fallback | Single point of failure |
| Full-text scraping | Search + content extraction in one step with enrich=true | Returns only summaries; scraping requires an extra step |
| Anti-scraping sites | Rendered by Browserless Chrome | Unable to scrape |
| Sites requiring login | Cookie injection + automatic detection and guidance | Not supported |
| Free quota | Unlimited fallback via SearXNG | Strict limits on free keys |
Conclusion: Similar search speed, better Chinese search quality, and far more features than Brave Search.
Architecture
                     +-----------+
AI Agent ──────────> | search-   |──> Tavily API
(OpenClaw / Claude)  | proxy     |──> Serper API (Google)
POST /search         | (FastAPI) |──> SearXNG (self-hosted)
POST /fetch          +-----+-----+
                           |
             +-------------+-------------+
             |                           |
       +-----+------+           +--------+---------+
       |   Redis    |           |   Browserless    |
       | (cache +   |           | (headless Chrome |
       | rate-limit)|           | anti-bot render) |
       +------------+           +------------------+
Four containers, one-click start:
| Service | Function |
|---|---|
| search-proxy | Core FastAPI proxy providing the unified search/scraping interfaces |
| Redis | Result caching (15-minute TTL) + API rate limiting |
| SearXNG | Self-hosted meta-search engine (aggregates Google, DuckDuckGo, Brave, etc.; free and unlimited) |
| Browserless | Headless Chrome for rendering JS pages and bypassing anti-scraping in Stealth mode |
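The fallback chain through the three engines can be sketched like this. It is a simplified synchronous illustration: the real proxy is asynchronous, and ProviderError plus the provider callables are hypothetical stand-ins, not the project's actual symbols.

```python
from typing import Callable

class ProviderError(Exception):
    """Raised by a provider when it fails or returns an unusable response."""

def search_with_fallback(query: str, providers: list[tuple[str, Callable]]) -> dict:
    """Try each engine in priority order (cf. the ORDER variable); first success wins."""
    errors = {}
    for name, fn in providers:
        try:
            return {"provider": name, "results": fn(query)}
        except ProviderError as exc:
            errors[name] = str(exc)  # remember why this engine was skipped
    raise ProviderError(f"all engines failed: {errors}")
```

If Tavily hits its quota, the query silently moves on to Serper, then SearXNG; the caller only sees a failure when every engine fails.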
💻 Usage Examples
Basic Usage
curl -s -H "X-API-Key: YOUR_PROXY_API_KEY" http://127.0.0.1:17080/health | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "hello world", "count": 3}' | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "hello world", "count": 3, "provider": "searxng"}' | python3 -m json.tool
curl -s -X POST http://127.0.0.1:17080/fetch \
  -H "X-API-Key: YOUR_PROXY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "render": false}' | python3 -m json.tool
Advanced Usage
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: KEY" -H "Content-Type: application/json" \
  -d '{"query": "Python asyncio", "count": 3, "enrich": true}'
📚 Documentation
Why Do We Need Search Stack?
| Pain point | Search Stack's solution |
|---|---|
| Brave/Google searches have free-quota limits | Multi-engine fallback (Tavily → Serper → SearXNG); SearXNG is completely free and unlimited |
| AI page scraping is blocked by Cloudflare/anti-scraping mechanisms | Built-in Browserless headless Chrome with Stealth mode to bypass detection |
| Unable to scrape sites that require login (e.g., Zhihu, Xiaohongshu) | Cookie management API + automatic injection and rendering + login detection and guidance |
| Agents are tricked into accessing internal networks (SSRF) | Built-in private-IP blacklist + DNS verification |
| Search results contain only summaries, not full text | enrich=true automatically scrapes the full text of each result |
Integrating with OpenClaw
Search Stack can serve as the default search/scraping tool for OpenClaw, replacing the built-in Brave search. There are two integration methods:
Method 1: Native Plugin (Recommended)
The native plugin directly registers the tool in the AI's tool list and runs within the OpenClaw process. Compared to the MCP + mcporter method:
- Controllable Timeout — Uses AbortSignal to surface a timeout as an exception, so the AI always sees an error message (instead of a SIGKILL with zero output).
- Lower Latency — No need to start a subprocess.
- More Reliable — Does not depend on the mcporter daemon.
Step 1: Install the Plugin
openclaw plugins install --link /opt/search-stack/plugin/
Step 2: Configuration
Edit ~/.openclaw/openclaw.json and add the following to plugins.entries:
{
  "plugins": {
    "entries": {
      "search-stack": {
        "enabled": true,
        "config": {
          "apiUrl": "http://127.0.0.1:17080",
          "apiKey": "your_proxy_api_key",
          "tikhubApiKey": "your_tikhub_key"
        }
      }
    }
  }
}
The value of apiKey is the PROXY_API_KEY in .env.
tikhubApiKey is optional. Fill in the TikHub API Key to enable the social media API.
Note: The configuration must be placed inside the config nested object, not directly under search-stack.
Step 3: Disable the Built-in Brave Search
Edit ~/.openclaw/openclaw.json:
{
  "tools": {
    "web": {
      "search": {
        "enabled": false
      }
    }
  }
}
Step 4: Create a Skill File
mkdir -p ~/.openclaw/workspace/skills/web-search/
cp /opt/search-stack/skill-template/SKILL.md ~/.openclaw/workspace/skills/web-search/SKILL.md
The Skill file guides the AI on when and how to use the search tool (two-step principle, cookie workflow, TikHub priority, etc.). The template file is located at skill-template/SKILL.md and can be modified as needed.
Step 5: Restart and Verify
sudo systemctl restart openclaw
openclaw plugins list
Important: If the AI is still using the old method, you need to archive the old session. OpenClaw's session context caches the previous tool usage pattern. Even if the configuration is updated, the old session will still use the old behavior. See "Common Issues → AI does not use search-stack" below for details.
Method 2: MCP + mcporter (Alternative)
This method is suitable for scenarios where you do not want to install the native plugin or need to use Search Stack in an environment other than OpenClaw.
Step 1: Install MCP Server Dependencies
The MCP Server runs using Bun + @modelcontextprotocol/sdk:
curl -fsSL https://bun.sh/install | bash
bun add -g @modelcontextprotocol/sdk zod
Step 2: Register with mcporter
Edit ~/.mcporter/mcporter.json and add Search Stack:
{
  "mcpServers": {
    "search-stack": {
      "command": "/home/your_user/.bun/bin/bun",
      "args": ["run", "/opt/search-stack/proxy/mcp-server.ts"],
      "keepAlive": true,
      "env": {
        "SEARCH_STACK_URL": "http://127.0.0.1:17080",
        "SEARCH_STACK_API_KEY": "your_proxy_api_key",
        "TIKHUB_API_KEY": "your_tikhub_key"
      }
    }
  }
}
Verify the registration:
mcporter daemon restart
mcporter list
Step 3: Create a Skill and Restart
Create ~/.openclaw/workspace/skills/web-search/SKILL.md (using the mcporter exec call format), disable Brave search, and restart OpenClaw.
Note: In the MCP + mcporter method, the AI uses the exec tool to execute the mcporter call search-stack.* command. A SIGKILL on timeout will result in zero output, and the AI may think "the search engine is down". It is recommended to use the native plugin method to avoid this issue.
Remote Deployment (OpenClaw and Search Stack on Different Machines)
This scenario is applicable when Search Stack runs on Server A (with a public domain name) and OpenClaw runs on Server B, and you need to remotely call the Search Stack API.
Server A (search-stack)              Server B (openclaw)
┌──────────────────────┐             ┌──────────────────────┐
│ 4 Docker containers  │    HTTPS    │ OpenClaw             │
│ search-proxy :17080  │◄────────────│ search-stack plugin  │
│ Redis / SearXNG      │             │                      │
│ Browserless          │             │ needs only plugin/   │
└──────────────────────┘             └──────────────────────┘
Step 1: Server A — Configure Reverse Proxy (HTTPS)
Search Stack only listens on 127.0.0.1:17080 by default. For remote access, you need to expose the HTTPS port through an Nginx reverse proxy.
Example Nginx configuration (assuming the domain name is search.example.com):
location /search-stack/ {
    proxy_pass http://127.0.0.1:17080/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
Security Reminder: The API Key is transmitted in plain text in the request header. You must use HTTPS. It is recommended to use Certbot to automatically apply for a Let's Encrypt certificate.
Step 2: Server B — Obtain Plugin Code
You only need the plugin/ and skill-template/ directories. There is no need to install Docker or the entire project:
Full clone (simplest):

git clone https://github.com/pinkpills/search-stack.git /opt/search-stack
cd /opt/search-stack/plugin && npm install

Or fetch only the needed directories with a sparse checkout:

mkdir -p /opt/search-stack && cd /opt/search-stack
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/pinkpills/search-stack.git .
git sparse-checkout set plugin skill-template
cd plugin && npm install
Step 3: Server B — Install the Plugin into OpenClaw
openclaw plugins install --link /opt/search-stack/plugin/
Step 4: Server B — Configure the Remote API Address
Edit ~/.openclaw/openclaw.json:
{
  "plugins": {
    "entries": {
      "search-stack": {
        "enabled": true,
        "config": {
          "apiUrl": "https://search.example.com/search-stack",
          "apiKey": "your_proxy_api_key",
          "publicUrl": "https://search.example.com/search-stack",
          "tikhubApiKey": "your_tikhub_key"
        }
      }
    }
  },
  "tools": {
    "web": {
      "search": {
        "enabled": false
      }
    }
  }
}
Configuration Explanation:
| Field | Description |
|---|---|
| apiUrl | The Search Stack API address on Server A (HTTPS URL behind the Nginx proxy) |
| apiKey | The PROXY_API_KEY from .env on Server A |
| publicUrl | Public URL used in Cookie Catcher links (must be reachable from the user's browser); usually the same as apiUrl |
| tikhubApiKey | (Optional) TikHub API Key |
Step 5: Server B — Create a Skill and Restart
mkdir -p ~/.openclaw/workspace/skills/web-search/
cp /opt/search-stack/skill-template/SKILL.md ~/.openclaw/workspace/skills/web-search/SKILL.md
sudo systemctl restart openclaw
Verification
Test in an OpenClaw conversation:
User: "Search for a review of Claude Opus 4.6"
AI: Call web_search → Return results (from the remote Search Stack)
User: "Open the first link to see the full text"
AI: Call page_fetch → Return the full text (rendered by the remote Browserless)
If the tool call fails, check:
- Whether Server B can reach apiUrl: curl -H "X-API-Key: KEY" https://search.example.com/search-stack/health
- Whether the plugin is loaded: openclaw plugins list
- Old-session cache: archive the old session and restart (see "Common Issues → AI does not use search-stack" for details)
Multi-machine Concurrency and Resource Control
When multiple machines share a Search Stack, Browserless Chrome is the main bottleneck — each concurrent rendering session consumes approximately 400 - 500MB of memory. The default configuration is optimized for 3 clients:
MAX_CONCURRENT_SESSIONS=10
MAX_QUEUE_LENGTH=30
CONNECTION_TIMEOUT=120000
deploy:
  resources:
    limits:
      memory: 4g
Suggested adjustments based on the number of clients:
| Number of clients | MAX_CONCURRENT_SESSIONS | memory | Recommended server memory |
|---|---|---|---|
| 1-2 | 5 | 2g | 4GB+ |
| 3-5 | 10 | 4g | 8GB+ |
| 5-10 | 20 | 8g | 16GB+ |
When the concurrency limit is exceeded, Browserless will queue the requests. If the queue is also full, it will return a 429 error. page_fetch will report an error, but it will not affect the search function (web_search does not depend on Chrome). The Cookie Catcher has a hard limit of 2 sessions, and multiple users need to queue for login.
Tool List
Regardless of whether you use the native plugin or the MCP method, the provided tools are the same:
| Tool | Description |
|---|---|
| web_search | Multi-engine search; supports enrich for full-text scraping |
| page_fetch | Scrape page content; supports cookie injection, Chrome rendering, and login detection |
| cookies_list | List the domains with configured cookies |
| cookies_update | Add/update cookies for a domain (supports pasting raw cookie strings) |
| cookies_delete | Delete a domain's cookies |
| cookie_catcher_link | Generate a remote-browser login link (Cookie Catcher) |
| tikhub_call | Call the TikHub social media API (requires a configured key; use as needed) |
Note: The scraping tool is named page_fetch instead of web_fetch to avoid conflicts with the built-in web_fetch tool in OpenClaw. The built-in web_fetch does not support cookie injection and Chrome rendering. Using the same name will cause the AI to call the wrong tool.
Cookie Workflow in Practice
There are two ways to obtain cookies:
Method 1: Manual Copy and Paste (Suitable for Desktop)
User: "Help me check this web page https://zhuanlan.zhihu.com/p/xxxx"
AI: Call page_fetch → Incomplete content (only title/summary)
AI: "This website has strict anti-scraping measures, and the full content was not retrieved.
If you need the full content, you can provide the cookie for this website:
1. Open the website in your browser and log in
2. Press F12 → Network tab → Refresh the page
3. Find the Cookie: line in the request header
4. Copy the entire value and send it to me"
User: "z_c0=xxx; _xsrf=yyy; d_c0=zzz ..."
AI: Automatically extract the domain zhihu.com → cookies_update → Save successfully
AI: Re-scrape with bypass_cache:true → Obtain the full article content
Method 2: Cookie Catcher Remote Login (Suitable for Complex Login Processes)
1. Open in the browser: http://YOUR_HOST:17080/cookie-catcher?key=API_KEY&url=https://threads.net
2. Complete the login in the remote Chrome screen (supports mouse/keyboard/touch screen operations)
3. Click "Save Cookies" → Automatically save to cookies.json
4. Subsequent /fetch requests will automatically inject the cookie
This method is suitable for complex login scenarios that require OAuth redirection, QR code scanning, mobile verification codes, etc.
🔧 Technical Details
Common Issues and Troubleshooting
Deployment-related
Q: SearXNG search returns 403 / 502 / empty results
This is the most common issue. SearXNG disables the JSON search API by default. When search-stack calls it with ?format=json, it will return a 403 Forbidden error.
Solution: Ensure that searxng/settings.yml contains:
search:
  formats:
    - html
    - json
Restart the container after modification:
docker compose -f search-stack.yml restart searxng
Why is this easy to miss? The search-stack proxy treats SearXNG's 403 as "unavailable" and silently skips it, falling back to Tavily/Serper. SearXNG can therefore appear to work while never actually being used. Force a test with provider: "searxng" to surface the issue.
Q: Permission issues when starting SearXNG for the first time
The SearXNG container runs with UID 977. If the mount directory permissions are incorrect, it will fail to start:
sudo chown -R 977:977 /opt/search-stack/searxng/
docker compose -f search-stack.yml restart searxng
Q: Browserless times out or crashes
Browserless allows a maximum of 5 concurrent Chrome sessions by default. If timeouts occur frequently, check the memory (at least 2GB):
docker stats browserless
You can adjust MAX_CONCURRENT_SESSIONS in search-stack.yml.
Q: Redis connection fails
Confirm that REDIS_PASSWORD is set and not empty in .env:
docker exec search-redis redis-cli -a YOUR_REDIS_PASSWORD ping
Q: search-proxy reports a redis.exceptions.ConnectionError when starting
search-proxy depends on Redis and SearXNG to start first. Although docker compose's depends_on + healthcheck usually handle this, if Redis starts slowly:
docker compose -f search-stack.yml restart search-proxy
MCP Server-related
Q: mcporter list shows that search-stack is unhealthy
Check step by step:
- Confirm the Docker containers are running: docker compose -f search-stack.yml ps
- Confirm the API is reachable: curl -H "X-API-Key: KEY" http://127.0.0.1:17080/health
- Confirm the Bun path is correct: which bun
- Run the server directly to check for errors: SEARCH_STACK_URL=http://127.0.0.1:17080 SEARCH_STACK_API_KEY=your_key bun run /opt/search-stack/proxy/mcp-server.ts
Q: z.record() / schema._zod error
This is a known issue with MCP SDK v1.26.0 + Zod v4. z.record() will report Cannot read properties of undefined (reading '_zod') when serializing tools/list.
Solution (handled in this project):
- Use z.string() instead of z.record() for parameters, and JSON.parse() them in the handler.
- Use z.any() for parameters that must accept both objects and strings.
Q: mcporter reports "expected string, received object" when passing JSON parameters
mcporter automatically parses JSON strings into objects and passes them to MCP tools. If the schema is defined as z.string(), the validation will fail.
Solution (handled in this project): Use z.any() and handle both types in the handler:
const rawArgs = params.arguments as unknown;
let args: Record<string, unknown>;
if (typeof rawArgs === "object" && rawArgs !== null) {
  args = rawArgs as Record<string, unknown>;
} else {
  args = JSON.parse((rawArgs as string) || "{}");
}
OpenClaw integration-related
Q: The AI does not use search-stack and still uses the built-in Brave search
Check three points:
- Confirm that Brave search is disabled: "search": { "enabled": false } in ~/.openclaw/openclaw.json.
- Confirm that SKILL.md exists: ls ~/.openclaw/workspace/skills/web-search/SKILL.md
- (Most critical) Archive the old session: OpenClaw's session context (possibly hundreds of thousands of tokens) caches the previous tool-usage pattern. Even if the configuration is updated, an old session keeps its old behavior. You must archive it:
ls -lt ~/.openclaw/agents/main/sessions/*.jsonl | head -3
mv ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl \
  ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl.archived
sudo systemctl restart openclaw
After starting a new session, the AI will read SKILL.md again and use the mcporter call command.
Q: The AI retrieves partial content but does not prompt the user to provide a cookie
SKILL.md must clearly list all conditions that trigger the cookie prompt:
- The response contains ** LOGIN REQUIRED **
- Incomplete content (only title/summary; content truncated or empty)
- Anti-scraping prompts ("Please log in", "Verification required", etc.)
- Content that does not match expectations (e.g., only the sidebar is retrieved on an article page)
At the same time, clearly tell the AI "what not to do" — do not use content interpretation to replace failed scraping, and do not skip the prompt. If only the LOGIN REQUIRED condition is listed, the AI will not trigger the prompt when retrieving partial content.
Q: Scraping fails for SPA websites like Threads/Instagram, with the prompt "JS SPA requiring Chrome"
This is usually not a problem of lacking Chrome (Browserless is running by default). The real reason is that the cookie has expired:
- Browserless uses Chrome to render and injects the cookie, but the session has expired.
- The React SPA does not render the actual content and returns the login page.
- The login page text is short, and the AI mistakenly thinks that the JS rendering has failed.
Solution: Re-export the cookie from the browser (ensure that you are logged in), update it via cookies_update, and retry with bypass_cache: true.
detect_needs_login supports multi-dimensional detection: HTTP status codes (401/403), text keywords (Chinese, English, Japanese), page titles, HTML password boxes/CAPTCHA/meta refresh, SPA login walls (Threads/Instagram/Facebook), and will return a clear needs_login: true prompt.
You can also use the Cookie Catcher for remote login: Open /cookie-catcher?key=API_KEY&url=TARGET_URL, complete the login in the remote Chrome, and save the cookie with one click.
Q: The AI uses exec + curl to call Brave instead of mcporter call
OpenClaw's AI uses the exec tool to execute shell commands to call MCP. SKILL.md must use the specific command format:
mcporter call search-stack.web_search query="keyword" --output json
Do not write it as the abstract search-stack.web_search(query="keyword"), as the AI will not translate it into a shell command on its own.
Q: The AI's behavior does not change after updating SKILL.md
This is one of the most common pitfalls. Reasons and solutions:
Background mechanism: OpenClaw has a Skills Watcher (enabled by default, skills.load.watch: true), which monitors changes to the SKILL.md file and bumps the version number. However, this only refreshes the skill list (which skills are available) and does not force the AI to re-read the content of SKILL.md.
Why does the change to SKILL.md not take effect?
- The AI does not read SKILL.md every round — it only reads it in the first round of the session or when it deems necessary.
- The read content is cached in the session context (possibly hundreds of thousands of tokens).
- More importantly, the AI mostly only looks at the description field of the tool (a short text written at registration) and does not look at SKILL.md.
Correct solutions (in order of recommendation):
- Modify the tool description (most effective) — the tool description is always visible to the AI. Edit the description field in plugin/index.ts → restart OpenClaw → new sessions take effect immediately. Key behavior constraints should be written in the description, not just in SKILL.md.
- Archive the old session — Force the new session to reload all context:
for f in ~/.openclaw/agents/main/sessions/*.jsonl; do
  mv "$f" "$f.archived"
done
echo '{"sessions":[]}' > ~/.openclaw/agents/main/sessions/sessions.json
sudo systemctl restart openclaw
- Wait for the Skills Watcher to take effect — If only the supplementary notes in SKILL.md are modified (not involving core behavior), you can wait for the AI to be triggered to re-read SKILL.md in the next round of conversation.
Best practice: Write core behavior constraints (e.g., "Use tool A first, then tool B") in the tool description, and write detailed processes and examples in SKILL.md. In this way, even if the AI does not read SKILL.md, the tool description can still provide a fallback.
Q: What to do after updating the plugin code (index.ts)
The plugin runs within the OpenClaw process. After changing the code, you need to:
sudo systemctl restart openclaw
If you also modify SKILL.md, it is recommended to archive the old session (see the previous item).
📄 API Documentation
All requests must include the X-API-Key header.
GET /health
Health check.
{
  "ok": true,
  "redis": true,
  "order": ["tavily", "serper", "searxng"],
  "browserless_configured": true,
  "dedupe": { "enabled": true, "max_per_host": 2 }
}
POST /search
Web search.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search keywords |
| count | int | 5 | Number of results to return (1-10) |
| provider | string | automatic | Force a specific engine: tavily / serper / searxng |
| enrich | bool | false | Scrape the full text of each result |
| max_chars | int | 8000 | Maximum characters per page when enriching |
| render | bool | true | Render with a headless browser |
| concurrency | int | 3 | Number of concurrent scrapes when enriching |
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: KEY" -H "Content-Type: application/json" \
  -d '{"query": "Docker best practices", "count": 5}'
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: KEY" -H "Content-Type: application/json" \
  -d '{"query": "Python asyncio", "count": 3, "enrich": true}'
curl -s -X POST http://127.0.0.1:17080/search \
  -H "X-API-Key: KEY" -H "Content-Type: application/json" \
  -d '{"query": "AI news", "count": 5, "provider": "searxng"}'
Example Response:
{
  "query": "Docker best practices",
  "count": 5,
  "cached": false,
  "provider": "tavily",
  "results": [
    {
      "title": "Docker Best Practices",
      "url": "https://example.com/docker",
      "snippet": "Top 10 Docker best practices for production...",
      "source": "tavily"
    }
  ]
}
POST /fetch
Scrape web page content.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | Target URL |
| render | bool | true | Render with a headless browser |
| max_chars | int | 20000 | Maximum characters to extract |
| timeout | float | 25 | Timeout in seconds |
| headers | object | null | Custom request headers |
| bypass_cache | bool | false | Skip the cache (use after updating cookies) |
Example Response:
{
  "cached": false,
  "url": "https://example.com/",
  "status_code": 200,
  "render": false,
  "title": "Example Domain",
  "text": "This domain is for use in illustrative examples..."
}
When the page requires login or is blocked by anti-scraping mechanisms, it returns needs_login: true:
{
  "needs_login": true,
  "has_cookies": false
}
has_cookies: true indicates that there are existing cookies but they have expired and need to be re-exported.
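A caller can branch on these two flags. An illustrative sketch (the helper name and return labels are mine, not part of the API):

```python
def next_action(resp: dict) -> str:
    """Map a /fetch response's login flags to a suggested follow-up step."""
    if not resp.get("needs_login"):
        return "use_content"       # page scraped normally
    if resp.get("has_cookies"):
        return "refresh_cookies"   # cookies exist but have expired: re-export them
    return "provide_cookies"       # no cookies stored yet: ask the user for some
```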
The detection rules cover the following signals (in priority order):

| Signal | Condition | Example |
|---|---|---|
| HTTP 401 | Immediate match | Unauthenticated API endpoint |
| HTTP 403 + short content | text < 2000 characters | Access-denied page |
| Login keywords in text | 1 hit + < 500 characters, or 2+ hits | "Please log in", "sign in to continue", "verify you are human" |
| Login words in page title | + text < 2000 characters | <title>Sign In - Example</title> |
| HTML password input box | + text < 3000 characters | <input type="password"> |
| Meta refresh → login URL | Immediate match | <meta http-equiv="refresh" content="0;url=/login"> |
| CAPTCHA embed | + text < 1000 characters | reCAPTCHA, hCaptcha, Cloudflare Turnstile |
| Empty-shell ICP filing page | 2+ hits + < 800 characters | Pages containing only an ICP filing number (e.g., Xiaohongshu) |
It supports Chinese, English, and Japanese login keywords, as well as OAuth prompts ("continue with Google"), paywalls ("subscribe to continue"), and Cloudflare verification ("checking your browser").
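The rules above reduce to a short heuristic. A condensed sketch with abbreviated keyword lists (the real detector also covers Chinese/Japanese keywords, CAPTCHA embeds, meta-refresh redirects, and SPA login walls):

```python
LOGIN_KEYWORDS = ("please log in", "sign in to continue", "verify you are human")
TITLE_WORDS = ("sign in", "log in", "login")

def detect_needs_login(status_code: int, text: str, title: str = "") -> bool:
    """Simplified version of the rules table above, checked in priority order."""
    body = text.lower()
    if status_code == 401:                       # HTTP 401: immediate match
        return True
    if status_code == 403 and len(text) < 2000:  # 403 + short content
        return True
    hits = sum(kw in body for kw in LOGIN_KEYWORDS)
    if hits >= 2 or (hits == 1 and len(text) < 500):
        return True
    if any(w in title.lower() for w in TITLE_WORDS) and len(text) < 2000:
        return True
    if 'type="password"' in body and len(text) < 3000:
        return True
    return False
```

Note how every text-based signal is gated on a length threshold: a long article that merely mentions "log in" should not trip the detector.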
Cookie Management
Dynamically manage domain cookies without restarting. Cookies are automatically injected into Browserless rendering requests.
GET /cookies
PUT /cookies/zhihu.com
{"raw": "z_c0=xxx; _xsrf=yyy; d_c0=zzz"}
PUT /cookies/zhihu.com
{"cookies": [{"name":"z_c0","value":"xxx"}, {"name":"_xsrf","value":"yyy"}]}
DELETE /cookies/zhihu.com
POST /cookies/reload
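A raw browser Cookie header maps to the structured form roughly like this (a sketch; parse_raw_cookies is an illustrative helper, not part of the API):

```python
def parse_raw_cookies(raw: str) -> list[dict]:
    """Split a browser 'Cookie:' header value into {name, value} pairs,
    the structured form accepted by PUT /cookies/<domain>."""
    cookies = []
    for part in raw.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip malformed fragments without '='
            cookies.append({"name": name, "value": value})
    return cookies
```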
Cookie Catcher (Remote Browser Login)
For scenarios where cookies cannot be directly copied (e.g., mobile devices, complex OAuth processes), the Cookie Catcher provides a Web UI to remotely control Chrome to complete the login:
Browser access: GET /cookie-catcher?key=YOUR_API_KEY[&url=https://target-site.com]
Workflow:
- Open /cookie-catcher?key=API_KEY in the browser to establish a WebSocket connection.
- Enter the target website URL in the address bar and click Go.
- The remote Chrome screen is streamed in real time via CDP Screencast (JPEG frames).
- Operate on the screen (mouse clicks, keyboard input, scrolling) to complete the login.
- Click "Save Cookies" to extract and save all cookies for the current domain in one click.
- Cookies are written to cookies.json automatically, and subsequent /fetch rendering requests inject them.
Technical Details:
| Parameter | Value |
|---|---|
| WebSocket endpoint | WS /cookie-catcher/ws?key=API_KEY |
| Maximum concurrent sessions | 2 |
| Session timeout | Closed automatically after 10 minutes |
| Screen resolution | 1280 x 800 |
| Frame format | JPEG, quality = 60 |
| Input support | Mouse (click/move/scroll), keyboard, touch screen |
WebSocket Message Protocol:
Client → Server:
{"type": "navigate", "url": "https://example.com"}
{"type": "mouse", "action": "mousePressed", "x": 100, "y": 200, "button": "left"}
{"type": "key", "action": "keyDown", "key": "a", "code": "KeyA", "text": "a"}
{"type": "scroll", "x": 640, "y": 400, "deltaX": 0, "deltaY": 100}
{"type": "save_cookies", "domain": "example.com"}
{"type": "close"}
Server → Client:
{"type": "frame", "data": "<base64 JPEG>"}
{"type": "url", "url": "https://example.com/dashboard"}
{"type": "title", "title": "Dashboard"}
{"type": "cookies_saved", "domain": "example.com", "count": 15, "names": ["session", "token", ...]}
{"type": "error", "message": "Too many active sessions (max 2)"}
{"type": "closed"}
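A minimal client-side handler for this protocol could look like the sketch below. The message shapes come from the lists above; the handler functions themselves are illustrative, not part of the shipped Web UI:

```python
import base64
import json

def encode_navigate(url: str) -> str:
    """Client → Server: ask the remote Chrome to load a URL."""
    return json.dumps({"type": "navigate", "url": url})

def handle_server_message(message: str, on_frame) -> dict:
    """Dispatch one Server → Client message; decode screen frames, log the rest."""
    msg = json.loads(message)
    if msg["type"] == "frame":
        on_frame(base64.b64decode(msg["data"]))  # raw JPEG bytes for display
    elif msg["type"] == "cookies_saved":
        print(f'saved {msg["count"]} cookies for {msg["domain"]}')
    elif msg["type"] == "error":
        print("server error:", msg["message"])
    return msg

# Example: decode one screencast frame into JPEG bytes.
frames = []
msg = handle_server_message(
    json.dumps({"type": "frame", "data": base64.b64encode(b"\xff\xd8jpeg").decode()}),
    frames.append,
)
```

In a real client the same dispatcher would run inside a WebSocket receive loop, with mouse/keyboard events encoded the same way as `encode_navigate`.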
Configuration Reference
Environment Variables
| Variable | Default Value | Description |
| --- | --- | --- |
| `TAVILY_API_KEY` | - | Tavily API Key |
| `SERPER_API_KEY` | - | Serper (Google) API Key |
| `ORDER` | `tavily,serper,searxng` | Search engine priority |
| `API_KEYS` | - | Proxy authentication keys (comma-separated, multiple keys supported) |
| `RATE_LIMIT_PER_MIN` | 60 | Maximum requests per minute |
| `CACHE_TTL` | 900 | Cache expiration time in seconds |
| `FETCH_TIMEOUT` | 25 | Scraping timeout in seconds |
| `MAX_FETCH_BYTES` | 2000000 | Maximum bytes to scrape per page |
| `FETCH_DEFAULT_RENDER` | true | Enable browser rendering by default |
| `ALLOW_DOMAINS` | - | Domain whitelist (comma-separated) |
| `BLOCK_DOMAINS` | - | Domain blacklist (comma-separated) |
| `DEDUPE` | true | URL deduplication |
| `MAX_PER_HOST` | 2 | Maximum results returned from the same domain |
| `PROXY_URL` | - | HTTP proxy; see HTTP Proxy (Optional) for details |
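The `ORDER` variable drives the engine fallback chain: each engine is tried in turn until one returns results. A sketch of that loop (illustrative only; the engine callables and error handling are assumptions, not the service's actual code):

```python
import os

def search_with_fallback(query: str, engines: dict) -> tuple[str, list]:
    """Try engines in the order given by ORDER; fall back to the next on failure.

    `engines` maps an engine name to a callable(query) -> list of results.
    """
    order = os.environ.get("ORDER", "tavily,serper,searxng").split(",")
    last_error = None
    for name in (n.strip() for n in order):
        fn = engines.get(name)
        if fn is None:
            continue  # engine not configured (e.g., missing API key)
        try:
            return name, fn(query)
        except Exception as exc:
            last_error = exc  # record the failure and try the next engine
    raise RuntimeError(f"all engines failed: {last_error}")

def _quota_exceeded(query):
    raise RuntimeError("quota exceeded")

# Example: Tavily fails on quota, so the request falls back to Serper.
engine_used, results = search_with_fallback(
    "ai news",
    {"tavily": _quota_exceeded, "serper": lambda q: [{"title": "result"}]},
)
```

Because SearXNG needs no API key, placing it last in `ORDER` guarantees a free engine always remains as the final fallback.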
MCP Server Environment Variables
Passed through the env configuration of mcporter:
| Variable | Description |
| --- | --- |
| `SEARCH_STACK_URL` | REST API address (default `http://127.0.0.1:17080`) |
| `SEARCH_STACK_API_KEY` | Same as `PROXY_API_KEY` |
| `TIKHUB_API_KEY` | (Optional) TikHub social media API Key |
SearXNG Configuration
The SearXNG configuration file is located at searxng/settings.yml. Key configurations:
use_default_settings: true
server:
  secret_key: "your_secret"
  limiter: false
search:
  safe_search: 0
  formats:
    - html
    - json
outgoing:
  request_timeout: 10.0
  max_request_timeout: 20.0
Refer to the SearXNG documentation for detailed configuration.
Project Structure
search-stack/
├── search-stack.yml # Docker Compose orchestration
├── .env # Environment variables (secrets, not included in Git)
├── .env.example # Environment variable template
├── plugin/
│ ├── openclaw.plugin.json # OpenClaw native plugin manifest
│ └── index.ts # Plugin entry point (recommended integration method)
├── skill-template/
│ └── SKILL.md # OpenClaw Skill template (copy to ~/.openclaw/workspace/skills/web-search/)
├── proxy/
│ ├── Dockerfile # Proxy service image
│ ├── app.py # FastAPI main program (REST API)
│ ├── cookie_catcher.py # Cookie Catcher (remote browser CDP session management)
│ ├── mcp-server.ts # MCP Server (stdio, alternative integration method)
│ ├── cookies.json # Cookie storage (automatically updated at runtime)
│ ├── cookies.json.example # Cookie format example
│ ├── requirements.txt # Python dependencies
│ └── static/
│ └── cookie-catcher.html # Cookie Catcher Web UI
└── searxng/
├── settings.yml # SearXNG configuration (automatically generated on first startup)
└── settings.yml.example # SearXNG configuration template
HTTP Proxy (Optional)
After configuring PROXY_URL, all outbound requests are sent through the proxy. This is suitable for:
- Anti-scraping with a fixed IP — The target website sees the proxy IP instead of the server's real IP.
- Accessing blocked websites — Servers in China can access blocked websites such as YouTube and Google.
Configuration
Set in .env:
PROXY_URL=http://host:port
PROXY_URL=http://user:pass@host:port
PROXY_URL=socks5://host:port
PROXY_URL=socks5://user:pass@host:port
If not configured or left blank, all requests will be sent directly, and the behavior will be the same as before.
Proxy Coverage
| Request Type | Uses Proxy? | Description |
| --- | --- | --- |
| Direct httpx scraping (`render=false`) | ✅ | `http_client` created with the proxy parameter |
| Tavily / Serper API calls | ✅ | Same as above |
| SearXNG → Google/DuckDuckGo, etc. | ✅ | Passed to the SearXNG container via the `HTTP_PROXY`/`HTTPS_PROXY` environment variables |
| Browserless Chrome rendering (`render=true`) | ⚠️ Conditional | Via the Chrome `--proxy-server` startup flag; proxies requiring authentication are not supported (see below) |
| Internal container communication (Redis, SearXNG API, Browserless API) | ❌ | Uses a separate `http_internal` client that never goes through the proxy |
Proxy Limitations for Chrome Rendering
The Chrome --proxy-server startup parameter only accepts the scheme://host:port format and has no mechanism for passing usernames and passwords. Therefore:
| Proxy Type | Direct httpx | Chrome Rendering |
| --- | --- | --- |
| No authentication (IP whitelist): `http://host:port` | ✅ | ✅ |
| With authentication: `http://user:pass@host:port` | ✅ | ❌ Skipped automatically; Chrome connects directly |
| SOCKS5 without authentication: `socks5://host:port` | ✅ | ✅ |
| SOCKS5 with authentication: `socks5://user:pass@host:port` | ✅ | ❌ Skipped automatically; Chrome connects directly |
Recommendation: If Chrome rendering also needs to go through the proxy, use a proxy authenticated by IP whitelist (add the server's IP to the whitelist in your proxy provider's dashboard and drop the username and password from the URL). Most fixed-IP proxy providers support this.
The code automatically detects whether PROXY_URL contains credentials (an `@`). If it does, Chrome proxy injection is skipped so that Browserless rendering does not fail on proxy authentication.
Security Notes
- All internal services (Redis, SearXNG, Browserless) do not expose host ports and only communicate through the Docker internal network.
- Redis enables password authentication.
- Built-in SSRF protection: Block access to private IPs (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16).
- Automatically normalize URLs and remove tracking parameters.
- API Key authentication + sliding window rate limiting per minute.
- The `.env` file contains sensitive secrets. Make sure it is listed in `.gitignore`.
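The SSRF guard mentioned above amounts to rejecting any fetch target that resolves into the listed private ranges. A simplified sketch using the standard library (the real service may resolve and pin addresses differently):

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def is_ssrf_safe(url: str) -> bool:
    """Reject URLs whose host is (or resolves to) a private/link-local address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)  # literal IP in the URL
    except ValueError:
        try:
            addr = ipaddress.ip_address(socket.gethostbyname(host))
        except OSError:
            return False  # unresolvable hostname: refuse to fetch
    # Covers 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

Note that checking only the URL string is not enough; a public hostname can resolve to a private IP, which is why the sketch resolves first and checks the address.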
Reverse Proxy (Production Deployment)
The service only listens on 127.0.0.1:17080 by default. In a production environment, if you need external access, use an Nginx reverse proxy + HTTPS:
location /search-stack/ {
proxy_pass http://127.0.0.1:17080/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
}
📄 License
MIT