🚀 native-devtools-mcp
native-devtools-mcp is a Model Context Protocol (MCP) server for macOS and Windows, with built-in Android device support. It lets AI agents and MCP clients directly control native desktop applications and Android devices: taking screenshots, performing OCR, running accessibility-based text lookups, simulating input, managing windows, and driving devices over ADB.
This tool comes in handy when browser-only automation falls short, such as when dealing with Electron apps, system dialogs, desktop tools, native app testing, and Android device workflows. It is compatible with Claude Desktop, Claude Code, Cursor, and other MCP-compatible clients.
It is useful for MCP-based computer usage, desktop automation, UI automation, native app testing, end-to-end testing, RPA, screen reading, mouse and keyboard control, and Android device automation.
npx -y native-devtools-mcp
Core capabilities
- Take screenshots, perform OCR, and use the accessibility-first find_text function.
- Execute actions like click, type_text, scroll, launch_app, and quit_app, and manage windows.
- Use element_at_point to inspect accessible UI elements at specific screen coordinates.
- Employ load_image and find_image to locate non-text UI elements such as icons and custom controls.
- Capture Android screenshots, perform text lookups, simulate inputs, and control apps via ADB.
- Ensure local execution, keeping screenshots and inputs on the machine.
For AI agents: Refer to for tool definitions, workflow patterns, and machine-readable usage guidance.

Features • Installation • Getting Started • Recipes • Security & Trust • For AI Agents • Android
🚀 Quick Start
After installing, run the setup wizard:
npx native-devtools-mcp setup
This will:
- Check permissions (macOS) — Verify Accessibility and Screen Recording permissions, and open System Settings if necessary.
- Detect your MCP clients — Identify Claude Desktop, Claude Code, and Cursor.
- Write the configuration — Generate the correct JSON configuration and offer to write it for you.
Then restart your MCP client, and you're ready to go.
Claude Desktop on macOS requires the signed app bundle (Gatekeeper blocks npx). Download NativeDevtools-X.X.X.dmg from GitHub Releases, drag it to /Applications, and then run the setup. It will detect the app and configure Claude Desktop to use it.
VS Code, Windsurf, and other clients: The setup command does not currently auto-detect these clients. Run setup for the permission checks, and then refer to the manual configuration below for the JSON configuration snippet.
Claude Code tip: To avoid approving every tool call (clicks, screenshots), add the following to .claude/settings.local.json:
{ "permissions": { "allow": ["mcp__native-devtools__*"] } }
✨ Features
- 👀 Computer Vision: Capture screenshots of screens, windows, or specific regions. It includes built-in OCR (text recognition) to "read" the screen.
- 🖱️ Input Simulation: Naturally click, drag, scroll, and type text. Supports global coordinates and window-relative actions.
- 🪟 Window Management: List open windows, find applications, and bring them to focus.
- 🧩 Template Matching: Use load_image and find_image to find non-text UI elements (icons, shapes) and obtain precise click coordinates.
- 🔒 Local & Private: 100% local execution. No screenshots or data are sent to external servers.
- 📱 Android Support: Connect to Android devices via ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
- 🔍 Hover Tracking: Track cursor hover transitions across UI elements in real time. A configurable dwell threshold filters out noise; designed for LLMs to observe user navigation patterns. Available on macOS only.
- 🔌 Dual-Mode Interaction:
- Visual/Native: Works with any app via screenshots and coordinates (Universal).
- AppDebugKit: Deep integration for supported apps to inspect the UI tree (DOM-like structure).
🤖 For AI Agents (LLMs)
This MCP server is designed to be highly discoverable and usable by AI models (Claude, Gemini, GPT).
- 📄 Read : A compact, token-optimized technical reference specifically designed for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.
Core Capabilities for System Prompts:
- take_screenshot: The "eyes". Returns images, layout metadata, and text locations (OCR).
- click / type_text: The "hands". Interact with the system based on visual feedback.
- find_text: A shortcut to find text on the screen and immediately obtain its coordinates. Uses the platform accessibility API (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR as a fallback.
- element_at_point: Inspect the accessibility element at given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g., Signal) may restrict their AX tree, returning only a container; use take_screenshot with OCR as a fallback.
- start_hover_tracking / get_hover_events / stop_hover_tracking: Track cursor hover transitions across UI elements, with a configurable dwell threshold. Available on macOS only.
- launch_app / quit_app: Launch apps with optional CLI arguments, or gracefully/forcefully quit them.
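The dwell-threshold idea behind hover tracking can be sketched in Python (the event shape here is illustrative, not the server's actual schema):

```python
def filter_hovers(events, min_dwell_ms=250):
    """Keep only hover events whose dwell time meets the threshold,
    dropping quick pass-over noise before handing events to an LLM."""
    return [e for e in events if e["dwell_ms"] >= min_dwell_ms]

events = [
    {"element": "Save button", "dwell_ms": 900},
    {"element": "toolbar",     "dwell_ms": 40},   # cursor just passing through
    {"element": "File menu",   "dwell_ms": 400},
]
print([e["element"] for e in filter_hovers(events)])  # ['Save button', 'File menu']
```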
📦 Installation
The installation steps are the same on macOS and Windows.
Option 1: Run with npx (no installation required)
npx -y native-devtools-mcp
Option 2: Global installation
npm install -g native-devtools-mcp
Option 3: Build from source (Rust)
Using the build script (clones, builds, and runs setup):
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
Or manually:
git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release
💻 Usage Examples
Basic Usage
npx -y native-devtools-mcp
Advanced Usage
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
📚 Documentation
Recipes and Examples
- Recipes and Examples Index
- Claude Desktop Setup
- Claude Code Setup
- Cursor Setup
- End-to-End Desktop Flow
- Native App Click Flow
- OCR Fallback and Element Inspection
- Template Matching Flow
- Android Quickstart
Manual configuration (without setup)
macOS — Claude Desktop
Config file: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"native-devtools": {
"command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
}
}
}
Windows — Claude Desktop
Config file: %APPDATA%\Claude\claude_desktop_config.json
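The Windows JSON snippet is not shown above; assuming the npx invocation works for Claude Desktop on Windows (there is no Gatekeeper restriction as on macOS), a minimal sketch:

```json
{
  "mcpServers": {
    "native-devtools": {
      "command": "npx",
      "args": ["-y", "native-devtools-mcp"]
    }
  }
}
```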
Claude Code, Cursor, and other MCP clients
{
"mcpServers": {
"native-devtools": {
"command": "npx",
"args": ["-y", "native-devtools-mcp"]
}
}
}
Requires Node.js 18+.
🔧 Technical Details
graph TD
Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
Server -->|Direct API| Sys[System APIs]
Server -->|WebSocket| Debug[AppDebugKit]
Server -->|ADB Protocol| Android[Android Device]
subgraph "Your Machine"
Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
Sys -->|Input| Win[Win32 / SendInput]
Sys -->|Text Search| UIA[UI Automation]
Debug -.->|Inspect| App[Target App]
end
subgraph "Android Device (USB/Wi-Fi)"
Android -->|screencap| Screen[Screenshots]
Android -->|input| Input[Tap / Swipe / Type]
Android -->|uiautomator| UITree[UI Hierarchy]
end
🔧 Technical Details (Under the Hood)
| OS | Feature | API Used |
|----|---------|----------|
| macOS | Screenshots | screencapture (CLI) |
| | Input | CGEvent (CoreGraphics) |
| | Text Search (find_text) | Accessibility API (primary), Vision OCR (fallback) |
| | Element Inspection (element_at_point) | AXUIElementCopyElementAtPosition + AX tree walk fallback (Accessibility API) |
| | Hover Tracking (start_hover_tracking) | CGEvent cursor + Accessibility API polling (macOS only) |
| | OCR | VNRecognizeTextRequest (Vision Framework) |
| Windows | Screenshots | BitBlt (GDI) |
| | Input | SendInput (Win32) |
| | Text Search (find_text) | UI Automation (primary), WinRT OCR (fallback) |
| | Element Inspection (element_at_point) | IUIAutomation::ElementFromPoint (UI Automation) |
| | OCR | Windows.Media.Ocr (WinRT) |
| Android | Screenshots | screencap / ADB framebuffer |
| | Input | adb shell input (tap, swipe, text, keyevent) |
| | Text Search (find_text) | uiautomator dump (accessibility tree) |
| | Device Communication | adb_client crate (native Rust ADB protocol) |
Screenshot Coordinate Precision
Screenshots include metadata for accurate coordinate conversion:
- screenshot_origin_x/y: Screen-space origin of the captured area (in points)
- screenshot_scale: Display scale factor (e.g., 2.0 for Retina displays)
- screenshot_pixel_width/height: Actual pixel dimensions of the image
- screenshot_window_id: Window ID (for window captures)
Coordinate conversion:
screen_x = screenshot_origin_x + (pixel_x / screenshot_scale)
screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)
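The conversion can be sketched in Python (the helper name is illustrative, not part of the server's API):

```python
def pixel_to_screen(pixel_x, pixel_y, origin_x, origin_y, scale):
    """Convert pixel coordinates inside a screenshot to screen points.

    origin_x/origin_y come from screenshot_origin_x/y, and scale from
    screenshot_scale (e.g. 2.0 on a Retina display).
    """
    return (origin_x + pixel_x / scale, origin_y + pixel_y / scale)

# A match found at pixel (400, 300) in a 2x (Retina) window capture
# whose top-left corner sits at screen point (100, 50):
x, y = pixel_to_screen(400, 300, origin_x=100, origin_y=50, scale=2.0)
print(x, y)  # 300.0 200.0
```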
Implementation notes:
- Window captures (macOS): Uses screencapture -o, which excludes the window shadow. The captured image dimensions match kCGWindowBounds × scale exactly, so click coordinates derived from screenshots land on the intended UI elements.
- Region captures: Origin coordinates are aligned to integers to match the actual captured area.
🔐 Security & Trust
This tool requires Accessibility and Screen Recording permissions, which demand a high level of trust. Here's how to verify its trustworthiness.
Verify your binary
native-devtools-mcp verify
This computes the SHA-256 hash of the running binary; compare it with the official checksums published on the GitHub Releases page. If the hashes match, you are running an unmodified official build.
Build from source
If you don't trust pre-built binaries, you can build it yourself:
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
The script clones the repository, optionally allows you to review it before building, compiles the release binary, and runs the setup. See .
Audit the code
documents exactly which permissions are used, where in the source code, and includes an LLM audit prompt that you can paste into any AI model to perform an independent security review.
What this server does NOT do
- No unsolicited network access — The server never sends data to external servers. Network access is used only when the MCP client explicitly invokes app_connect (a WebSocket to a local debug server) or when you run the verify subcommand (which fetches checksums from GitHub).
- No file scanning — It does not read or index your files. The only file reads are load_image (a path explicitly provided by the MCP client) and short-lived temp files for screenshots (deleted immediately after capture).
- No background persistence — It exits when the MCP client disconnects.
- No data exfiltration — Screenshots are returned to the MCP client via stdout and are never stored or transmitted elsewhere.
🔍 Two Approaches to Interaction
We offer two ways for agents to interact, allowing them to choose the most suitable tool for the task.
1. The "Visual" Approach (Universal)
Best for: 99% of apps (Electron, Qt, Games, Browsers).
- How it works: The agent takes a screenshot, visually analyzes it (or uses OCR), and clicks at specific coordinates.
- Tools: take_screenshot, find_text, click, type_text (plus load_image / find_image for icons and shapes).
- Example: "Click the button that looks like a gear icon." → Use find_image with a gear template.
2. The "Structural" Approach (AppDebugKit)
Best for: Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).
- How it works: The agent connects to a debug port and queries the UI tree (similar to the HTML DOM).
- Tools: app_connect, app_query, app_click.
- Example: app_click(element_id="submit-button").
🧩 Template Matching (find_image)
Use find_image when the target is not text (icons, toggles, custom controls) and OCR or find_text cannot identify it.
Typical flow:
take_screenshot(app_name="MyApp") → screenshot_id
load_image(path="/path/to/icon.png") → template_id
find_image(screenshot_id="...", template_id="...") → matches with screen_x/screen_y
click(x=..., y=...)
Fast vs Accurate:
- fast (default): Uses downscaling and early-exit for speed.
- accurate: Uses full-resolution, wider scale search, and smaller stride for thorough matching.
Optional inputs like mask_id, search_region, scales, and rotations can improve precision and performance.
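For intuition, template matching slides the template over the screenshot and scores each position; a toy grayscale sketch in Python (not the server's actual matcher, which also handles masks, scales, and rotations):

```python
def best_match(image, template):
    """Return (x, y, score) for the position of `template` inside `image`
    with the lowest sum of absolute differences (lists of grayscale rows)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = (0, 0, float("inf"))
    for y in range(ih - th + 1):          # slide the template over every
        for x in range(iw - tw + 1):      # valid top-left position
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th) for i in range(tw)
            )
            if score < best[2]:
                best = (x, y, score)
    return best

img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
]
print(best_match(img, [[9, 9], [9, 9]]))  # (1, 1, 0): exact match at x=1, y=1
```

The fast mode's downscaling and early exit trade a coarser search grid for speed; the accurate mode keeps full resolution at the cost of more comparisons like the inner loop above.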
📱 Android Support
Android support is built-in. The MCP server communicates with Android devices via ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.
Prerequisites
- ADB installed on the host machine (brew install android-platform-tools on macOS, or install via the Android SDK)
- USB debugging enabled on the Android device (Settings > Developer options > USB debugging)
- ADB server running — it starts automatically when you run adb devices
Android tools
All Android tools are prefixed with android_ and appear dynamically after connecting to a device:
| Tool | Description |
|------|-------------|
| android_list_devices | List all ADB-connected devices (always available) |
| android_connect | Connect to a device by serial number |
| android_disconnect | Disconnect from the current device |
| android_screenshot | Capture the device screen |
| android_find_text | Find UI elements by text (via uiautomator) |
| android_click | Tap at screen coordinates |
| android_swipe | Swipe between two points |
| android_type_text | Type text on the device |
| android_press_key | Press a key (e.g., KEYCODE_HOME, KEYCODE_BACK) |
| android_launch_app | Launch an app by package name |
| android_list_apps | List installed packages |
| android_get_display_info | Get screen resolution and density |
| android_get_current_activity | Get the current foreground activity |
Typical workflow
android_list_devices → find your device serial
android_connect(serial="...") → connect (unlocks android_* tools)
android_screenshot → see what's on screen
android_find_text(text="OK") → locate a button
android_click(x=..., y=...) → tap it
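Under the hood, android_find_text works from the uiautomator dump, where each node carries a bounds="[x1,y1][x2,y2]" attribute. A sketch of turning such a node into a tap point (the XML here is a made-up example, and the helper is not part of the server's API):

```python
import re
import xml.etree.ElementTree as ET

def tap_point_for_text(dump_xml: str, text: str):
    """Find the first node with the given text in a uiautomator dump
    and return the center of its bounds as (x, y), or None."""
    for node in ET.fromstring(dump_xml).iter("node"):
        if node.get("text") == text:
            # bounds look like "[x1,y1][x2,y2]"
            x1, y1, x2, y2 = map(int, re.findall(r"\d+", node.get("bounds")))
            return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None

dump = '<hierarchy><node text="OK" bounds="[100,200][300,260]"/></hierarchy>'
print(tap_point_for_text(dump, "OK"))  # (200, 230)
```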
Known issues
MIUI / HyperOS (Xiaomi, Redmi, POCO devices): Input injection (android_click, android_type_text, android_press_key, android_swipe) and android_find_text (via uiautomator) require an additional security toggle:
Settings > Developer options > USB debugging (Security settings) — Enable this toggle. MIUI may require you to sign in with a Mi account to enable it.
Without this, you'll see INJECT_EVENTS permission errors for input tools and could not get idle state errors for android_find_text. Screenshot and device info tools work without this toggle.
Wireless ADB: To connect without a USB cable, first connect via USB and run:
adb tcpip 5555
adb connect <phone-ip>:5555
Then use the <phone-ip>:5555 serial in android_connect.
Smoke tests
Smoke tests verify all Android tools against a real connected device. They are #[ignore]d by default and must be run explicitly:
cargo test --test android_smoke_tests -- --ignored --test-threads=1
Tests must run sequentially (--test-threads=1) since they share a single physical device. The device must be unlocked and awake.
⚠️ Operational Safety
- Hands Off: When the agent is "driving" (clicking/typing), do not move your mouse or type.
- Why? Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
- Focus Matters: Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.
🪟 Windows Notes
It works out of the box on Windows 10/11.
- Uses standard Win32 APIs (GDI, SendInput).
find_text uses UI Automation (UIA) as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). It automatically falls back to OCR when UIA finds no matches.
- OCR uses the built-in Windows Media OCR engine (offline).
- Note: It cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
📄 License
MIT © sh3ll3x3c