BrowserControl
BrowserControl is an MCP server that provides real browser automation capabilities for AI agents. It uses a vision-first approach, enabling interactions such as clicking and typing through numbered elements, without the need for CSS selectors or XPath.
Rating: 2.5 points
Downloads: 3.9K
What is BrowserControl?
BrowserControl is an MCP (Model Context Protocol) server that gives AI agents full control of a browser. Unlike traditional text-based browser automation, BrowserControl takes a vision-first approach: the AI sees a screenshot of the web page in which interactive elements are labeled with numbers, and it completes an operation by naming the number to act on. This is closer to the way humans browse web pages and greatly reduces the complexity of AI interaction with web pages.
How to use BrowserControl?
BrowserControl runs as an MCP server and can be integrated with any AI agent (such as Claude or Gemini) or IDE that supports MCP. After installation, the AI agent gains a set of browser control tools, including navigation, clicking, text input, form filling, and tab management. The AI identifies interactive elements by viewing the numbered screenshot and then calls the corresponding tool to perform the operation.
Applicable scenarios
BrowserControl is suitable for scenarios that require AI to interact with web pages:
1. Web automation testing: Let AI automatically test website functions and processes.
2. Data collection and monitoring: Regularly visit websites to obtain the latest information.
3. Automated workflows: Automatically fill out forms, submit applications, and handle other repetitive tasks.
4. Web content analysis: Let AI browse and analyze web content.
5. User behavior simulation: Simulate how real users interact with websites.
Main features
Vision-first approach (Set of Marks)
Every screenshot is automatically annotated with a number on each interactive element. The AI only needs to read the numbers and call the corresponding operation, without having to understand complex HTML structure or CSS selectors.
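The numbering idea can be illustrated with a minimal sketch. This is not BrowserControl's actual implementation: the `Element` type, `assign_marks`, and `click_point` are hypothetical names introduced here, and the element list is made-up sample data. The sketch assumes each interactive element has a known bounding box, gives it a number, and resolves a number back to the point where a click would land.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical description of one element found on a page."""
    tag: str
    x: float
    y: float
    width: float
    height: float
    interactive: bool


def assign_marks(elements):
    """Number the interactive elements with a non-empty box, starting at 1."""
    marks = {}
    next_id = 1
    for el in elements:
        if el.interactive and el.width > 0 and el.height > 0:
            marks[next_id] = el
            next_id += 1
    return marks


def click_point(marks, mark_id):
    """Return the centre of the marked element's bounding box."""
    el = marks[mark_id]
    return (el.x + el.width / 2, el.y + el.height / 2)


# Made-up sample page: one container and two interactive controls.
page = [
    Element("div", 0, 0, 800, 600, interactive=False),
    Element("input", 100, 50, 200, 30, interactive=True),
    Element("button", 100, 100, 80, 40, interactive=True),
]
marks = assign_marks(page)
print(sorted(marks))          # [1, 2]
print(click_point(marks, 2))  # (140.0, 120.0)
```

With a mapping like this, the agent only has to say "click 2"; the selector or coordinate lookup stays on the server side.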
Multi-tab management
Supports creating, switching, closing, and listing open browser tabs, so the AI can move freely between multiple web pages and work across them.
Session and Cookie management
Provides a complete set of cookie tools for setting, getting, deleting, and clearing cookies, so that login state can be maintained across visits.
File upload support
Provides a native file upload tool, so the AI can handle file upload forms on web pages without simulating complex interactions.
Developer tool suite
Includes professional debugging tools such as console log viewing, network request monitoring, page error detection, and element inspection to help AI diagnose web page problems.
Session recording function
Supports recording the complete browser session and generating a replayable recording file, which makes it easier to debug and review what the AI did.
Dynamic viewport control
The browser window size can be adjusted at any time to simulate how pages display on different devices (such as mobile phones, tablets, and desktops).
Persistent sessions
Automatically saves the browser state (cookies, localStorage, etc.). After a restart, the AI retains the previous login state and browsing history.
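How state might survive a restart can be sketched as follows. This is an illustration only and assumes a simple JSON file as the storage format; the source does not say how BrowserControl stores its state. The functions `save_state` and `load_state`, the file name, and the sample cookie are all hypothetical. (Playwright itself offers `BrowserContext.storage_state()` for a similar purpose, but whether BrowserControl relies on it is not stated.)

```python
import json
import tempfile
from pathlib import Path


def save_state(path, cookies, local_storage):
    """Write cookies and localStorage entries to a JSON file."""
    state = {"cookies": cookies, "local_storage": local_storage}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_state(path):
    """Read saved state, or return an empty state on the first run."""
    if not path.exists():
        return {"cookies": [], "local_storage": {}}
    return json.loads(path.read_text(encoding="utf-8"))


# Simulate two runs that share one state file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    first_run = load_state(state_file)  # nothing saved yet
    save_state(
        state_file,
        cookies=[{"name": "sid", "value": "abc", "domain": "example.com"}],
        local_storage={"theme": "dark"},
    )
    restored = load_state(state_file)   # what a later run would see

print(first_run["cookies"])           # []
print(restored["cookies"][0]["name"])  # sid
```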
Advantages
Intuitive and easy to use: The vision-first approach makes it simpler for AI to operate web pages.
Comprehensive functionality: Provides a complete set of tools, from basic navigation to advanced debugging.
Persistent and stable: Automatically saves session state to avoid repeated logins.
Fully local: All operations run on the local machine, with no cloud services required.
Zero cost: Open source and free to use.
Good compatibility: Supports all MCP-compatible AI agents and IDEs.
Limitations
Requires installation: Needs a Python environment and the Chromium browser.
Resource consumption: Running browser instances uses memory and CPU.
Vision dependence: Relies on the AI's visual recognition ability, so elements in complex layouts may be misidentified.
Learning curve: The AI needs to learn how to use the numbered marking system effectively.
How to use
Install BrowserControl
Install the BrowserControl package using pip or uv.
Run the MCP server
Start BrowserControl as an MCP server.
Configure the AI agent
Add the BrowserControl server to the configuration file of the AI agent (such as Claude Desktop).
Start using
Restart the AI agent. The AI can now use the browser control tools.
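The configuration step above can be sketched as a small script that builds the server entry. Claude Desktop reads MCP servers from an `mcpServers` object in its configuration file, where each entry has a `command` and optional `args`. The helper `mcp_server_entry` is hypothetical, and the command name `browsercontrol` is an assumption, since the source does not give the actual launch command; check the project's README for the real value.

```python
import json


def mcp_server_entry(name, command, args=None):
    """Build an `mcpServers` fragment for one MCP server."""
    return {
        "mcpServers": {
            name: {
                "command": command,
                "args": list(args or []),
            }
        }
    }


# ASSUMPTION: "browsercontrol" as the launch command is a placeholder.
config = mcp_server_entry("browsercontrol", "browsercontrol")
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the agent's existing configuration file rather than replacing it, so that other configured servers are preserved.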
Usage examples
Web automation testing
Let AI automatically test a website's login function and verify that the login flow works as expected.
Data collection task
Let AI regularly visit a news website and collect the latest news titles and links.
Multi-step form filling
Let AI fill in and submit a complex multi-page form.
Web debugging and diagnosis
Let AI diagnose web page loading problems and report errors.
Frequently Asked Questions
Does BrowserControl require an internet connection?
Which browsers are supported?
How do I resolve the "Missing X server" error?
Is BrowserControl secure?
How do I view a recorded session?
Does it support mobile device simulation?
Related resources
GitHub repository
The source code and latest version of BrowserControl
MCP protocol documentation
The official documentation of the Model Context Protocol
Playwright documentation
The underlying browser automation framework used by BrowserControl
FastMCP documentation
The MCP server framework used by BrowserControl
Feedback
Report bugs or request new features
