🚀 native-devtools-mcp
native-devtools-mcp is a Model Context Protocol (MCP) server for macOS and Windows, with built-in Android device support. It lets AI agents and MCP clients directly control native desktop applications and Android devices: taking screenshots, performing OCR, running accessibility-based text lookups, simulating input, managing windows, and driving devices over ADB.
This tool comes in handy when browser-only automation falls short, such as when dealing with Electron apps, system dialogs, desktop tools, native app testing, and Android device workflows. It is compatible with Claude Desktop, Claude Code, Cursor, and other MCP-compatible clients.
It is useful for MCP-based computer usage, desktop automation, UI automation, native app testing, end-to-end testing, RPA, screen reading, mouse and keyboard control, and Android device automation.
npx -y native-devtools-mcp
Core capabilities
- Take screenshots, perform OCR, and use the accessibility-first find_text function.
- Execute actions like click, type_text, scroll, launch_app, and quit_app, and manage windows.
- Use element_at_point to inspect accessible UI elements at specific screen coordinates.
- Employ load_image and find_image to locate non-text UI elements such as icons and custom controls.
- Capture Android screenshots, perform text lookups, simulate inputs, and control apps via ADB.
- Ensure local execution, keeping screenshots and inputs on the machine.
For AI agents: Refer to for tool definitions, workflow patterns, and machine-readable usage guidance.

Features • Installation • Getting Started • Recipes • Security & Trust • For AI Agents • Android
🚀 Quick Start
After installing, run the setup wizard:
npx native-devtools-mcp setup
This will:
- Check permissions (macOS) — Verify Accessibility and Screen Recording permissions, and open System Settings if necessary.
- Detect your MCP clients — Identify Claude Desktop, Claude Code, and Cursor.
- Write the configuration — Generate the correct JSON configuration and offer to write it for you.
Then restart your MCP client, and you're ready to go.
Claude Desktop on macOS requires the signed app bundle (Gatekeeper blocks npx). Download NativeDevtools-X.X.X.dmg from GitHub Releases, drag it to /Applications, and then run the setup. It will detect the app and configure Claude Desktop to use it.
VS Code, Windsurf, and other clients: The setup command does not currently auto-detect these clients. Run setup for the permission checks, and then refer to the manual configuration below for the JSON configuration snippet.
Claude Code tip: To avoid approving every tool call (clicks, screenshots), add the following to .claude/settings.local.json:
{ "permissions": { "allow": ["mcp__native-devtools__*"] } }
✨ Features
- 👀 Computer Vision: Capture screenshots of screens, windows, or specific regions. It includes built-in OCR (text recognition) to "read" the screen.
- 🖱️ Input Simulation: Naturally click, drag, scroll, and type text. Supports global coordinates and window-relative actions.
- 🪟 Window Management: List open windows, find applications, and bring them to focus.
- 🧩 Template Matching: Use load_image and find_image to find non-text UI elements (icons, shapes) and obtain precise click coordinates.
- 🔒 Local & Private: 100% local execution. No screenshots or data are sent to external servers.
- 📱 Android Support: Connect to Android devices via ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
- 🔍 Hover Tracking: Track cursor hover transitions across UI elements in real time. A configurable dwell threshold filters out noise; designed for LLMs to observe user navigation patterns. Available on macOS only.
- 🔌 Dual-Mode Interaction:
- Visual/Native: Works with any app via screenshots and coordinates (Universal).
- AppDebugKit: Deep integration for supported apps to inspect the UI tree (DOM-like structure).
🤖 For AI Agents (LLMs)
This MCP server is designed to be highly discoverable and usable by AI models (Claude, Gemini, GPT).
- 📄 Read : A compact, token-optimized technical reference specifically designed for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.
Core Capabilities for System Prompts:
- take_screenshot: The "eyes". Returns images, layout metadata, and text locations (OCR).
- click / type_text: The "hands". Interact with the system based on visual feedback.
- find_text: A shortcut to find text on the screen and immediately obtain its coordinates. Uses the platform accessibility API (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR as a fallback.
- element_at_point: Inspect the accessibility element at given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g., Signal) may restrict their AX tree, returning only a container; use take_screenshot with OCR as a fallback.
- start_hover_tracking / get_hover_events / stop_hover_tracking: Track cursor hover transitions across UI elements, with a configurable dwell threshold. Available on macOS only.
- launch_app / quit_app: Launch apps with optional CLI arguments, or gracefully/forcefully quit them.
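The dwell-threshold idea behind hover tracking can be sketched in Python (the event shape here is illustrative, not the server's actual schema):

```python
def filter_hovers(events, min_dwell_ms=250):
    """Keep only hover events whose dwell time meets the threshold,
    dropping quick pass-over noise before handing events to an LLM."""
    return [e for e in events if e["dwell_ms"] >= min_dwell_ms]

events = [
    {"element": "Save button", "dwell_ms": 900},
    {"element": "toolbar",     "dwell_ms": 40},   # cursor just passing through
    {"element": "File menu",   "dwell_ms": 400},
]
print([e["element"] for e in filter_hovers(events)])  # ['Save button', 'File menu']
```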
📦 Installation
The installation steps are the same on macOS and Windows.
Option 1: Run with npx (no installation required)
npx -y native-devtools-mcp
Option 2: Global installation
npm install -g native-devtools-mcp
Option 3: Build from source (Rust)
Using the build script (clones, builds, and runs setup):
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
Or manually:
git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release
💻 Usage Examples
Basic Usage
npx -y native-devtools-mcp
Advanced Usage
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
📚 Documentation
Recipes and Examples
- Recipes and Examples Index
- Claude Desktop Setup
- Claude Code Setup
- Cursor Setup
- End-to-End Desktop Flow
- Native App Click Flow
- OCR Fallback and Element Inspection
- Template Matching Flow
- Android Quickstart
Manual configuration (without setup)
macOS — Claude Desktop
Config file: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"native-devtools": {
"command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
}
}
}
Windows — Claude Desktop
Config file: %APPDATA%\Claude\claude_desktop_config.json
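The Windows JSON snippet is not shown above; assuming the npx invocation works for Claude Desktop on Windows (there is no Gatekeeper restriction as on macOS), a minimal sketch:

```json
{
  "mcpServers": {
    "native-devtools": {
      "command": "npx",
      "args": ["-y", "native-devtools-mcp"]
    }
  }
}
```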
Claude Code, Cursor, and other MCP clients
{
"mcpServers": {
"native-devtools": {
"command": "npx",
"args": ["-y", "native-devtools-mcp"]
}
}
}
Requires Node.js 18+.
🔧 Technical Details
graph TD
Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
Server -->|Direct API| Sys[System APIs]
Server -->|WebSocket| Debug[AppDebugKit]
Server -->|ADB Protocol| Android[Android Device]
subgraph "Your Machine"
Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
Sys -->|Input| Win[Win32 / SendInput]
Sys -->|Text Search| UIA[UI Automation]
Debug -.->|Inspect| App[Target App]
end
subgraph "Android Device (USB/Wi-Fi)"
Android -->|screencap| Screen[Screenshots]
Android -->|input| Input[Tap / Swipe / Type]
Android -->|uiautomator| UITree[UI Hierarchy]
end
🔧 Technical Details (Under the Hood)
| OS | Feature | API Used |
|----|---------|----------|
| macOS | Screenshots | screencapture (CLI) |
| | Input | CGEvent (CoreGraphics) |
| | Text Search (find_text) | Accessibility API (primary), Vision OCR (fallback) |
| | Element Inspection (element_at_point) | AXUIElementCopyElementAtPosition + AX tree walk fallback (Accessibility API) |
| | Hover Tracking (start_hover_tracking) | CGEvent cursor + Accessibility API polling (macOS only) |
| | OCR | VNRecognizeTextRequest (Vision Framework) |
| Windows | Screenshots | BitBlt (GDI) |
| | Input | SendInput (Win32) |
| | Text Search (find_text) | UI Automation (primary), WinRT OCR (fallback) |
| | Element Inspection (element_at_point) | IUIAutomation::ElementFromPoint (UI Automation) |
| | OCR | Windows.Media.Ocr (WinRT) |
| Android | Screenshots | screencap / ADB framebuffer |
| | Input | adb shell input (tap, swipe, text, keyevent) |
| | Text Search (find_text) | uiautomator dump (accessibility tree) |
| | Device Communication | adb_client crate (native Rust ADB protocol) |
Screenshot Coordinate Precision
Screenshots include metadata for accurate coordinate conversion:
- screenshot_origin_x/y: Screen-space origin of the captured area (in points)
- screenshot_scale: Display scale factor (e.g., 2.0 for Retina displays)
- screenshot_pixel_width/height: Actual pixel dimensions of the image
- screenshot_window_id: Window ID (for window captures)
Coordinate conversion:
screen_x = screenshot_origin_x + (pixel_x / screenshot_scale)
screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)
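The conversion can be sketched in Python (the helper name is illustrative, not part of the server's API):

```python
def pixel_to_screen(pixel_x, pixel_y, origin_x, origin_y, scale):
    """Convert pixel coordinates inside a screenshot to screen points.

    origin_x/origin_y come from screenshot_origin_x/y, and scale from
    screenshot_scale (e.g. 2.0 on a Retina display).
    """
    return (origin_x + pixel_x / scale, origin_y + pixel_y / scale)

# A match found at pixel (400, 300) in a 2x (Retina) window capture
# whose top-left corner sits at screen point (100, 50):
x, y = pixel_to_screen(400, 300, origin_x=100, origin_y=50, scale=2.0)
print(x, y)  # 300.0 200.0
```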
Implementation notes:
- Window captures (macOS): Uses screencapture -o, which excludes the window shadow. The captured image dimensions match kCGWindowBounds × scale exactly, so click coordinates derived from screenshots land on the intended UI elements.
- Region captures: Origin coordinates are aligned to integers to match the actual captured area.
🔐 Security & Trust
This tool requires Accessibility and Screen Recording permissions, which demand a high level of trust. Here's how to verify its trustworthiness.
Verify your binary
native-devtools-mcp verify
This computes the SHA-256 hash of the running binary; compare it with the official checksums published on the GitHub Releases page. If the hashes match, you are running an unmodified official build.
Build from source
If you don't trust pre-built binaries, you can build it yourself:
curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash
The script clones the repository, optionally allows you to review it before building, compiles the release binary, and runs the setup. See .
Audit the code
documents exactly which permissions are used, where in the source code, and includes an LLM audit prompt that you can paste into any AI model to perform an independent security review.
What this server does NOT do
- No unsolicited network access — The server never sends data to external servers. Network access is used only when the MCP client explicitly invokes app_connect (a WebSocket to a local debug server) or when you run the verify subcommand (which fetches checksums from GitHub).
- No file scanning — It does not read or index your files. The only file reads are load_image (a path explicitly provided by the MCP client) and short-lived temp files for screenshots (deleted immediately after capture).
- No background persistence — It exits when the MCP client disconnects.
- No data exfiltration — Screenshots are returned to the MCP client via stdout and are never stored or transmitted elsewhere.
🔍 Two Approaches to Interaction
We offer two ways for agents to interact, allowing them to choose the most suitable tool for the task.
1. The "Visual" Approach (Universal)
Best for: 99% of apps (Electron, Qt, Games, Browsers).
- How it works: The agent takes a screenshot, visually analyzes it (or uses OCR), and clicks at specific coordinates.
- Tools: take_screenshot, find_text, click, type_text (plus load_image / find_image for icons and shapes).
- Example: "Click the button that looks like a gear icon." → Use find_image with a gear template.
2. The "Structural" Approach (AppDebugKit)
Best for: Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).
- How it works: The agent connects to a debug port and queries the UI tree (similar to the HTML DOM).
- Tools: app_connect, app_query, app_click.
- Example: app_click(element_id="submit-button").
🧩 Template Matching (find_image)
Use find_image when the target is not text (icons, toggles, custom controls) and OCR or find_text cannot identify it.
Typical flow:
take_screenshot(app_name="MyApp") → screenshot_id
load_image(path="/path/to/icon.png") → template_id
find_image(screenshot_id="...", template_id="...") → matches with screen_x/screen_y
click(x=..., y=...)
Fast vs Accurate:
- fast (default): Uses downscaling and early-exit for speed.
- accurate: Uses full-resolution, wider scale search, and smaller stride for thorough matching.
Optional inputs like mask_id, search_region, scales, and rotations can improve precision and performance.
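For intuition, template matching slides the template over the screenshot and scores each position; a toy grayscale sketch in Python (not the server's actual matcher, which also handles masks, scales, and rotations):

```python
def best_match(image, template):
    """Return (x, y, score) for the position of `template` inside `image`
    with the lowest sum of absolute differences (lists of grayscale rows)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = (0, 0, float("inf"))
    for y in range(ih - th + 1):          # slide the template over every
        for x in range(iw - tw + 1):      # valid top-left position
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th) for i in range(tw)
            )
            if score < best[2]:
                best = (x, y, score)
    return best

img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
]
print(best_match(img, [[9, 9], [9, 9]]))  # (1, 1, 0): exact match at x=1, y=1
```

The fast mode's downscaling and early exit trade a coarser search grid for speed; the accurate mode keeps full resolution at the cost of more comparisons like the inner loop above.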
📱 Android Support
Android support is built-in. The MCP server communicates with Android devices via ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.
Prerequisites
- ADB installed on the host machine (brew install android-platform-tools on macOS, or install via the Android SDK)
- USB debugging enabled on the Android device (Settings > Developer options > USB debugging)
- ADB server running — it starts automatically when you run adb devices
Android tools
All Android tools are prefixed with android_ and appear dynamically after connecting to a device:
| Tool | Description |
|------|-------------|
| android_list_devices | List all ADB-connected devices (always available) |
| android_connect | Connect to a device by serial number |
| android_disconnect | Disconnect from the current device |
| android_screenshot | Capture the device screen |
| android_find_text | Find UI elements by text (via uiautomator) |
| android_click | Tap at screen coordinates |
| android_swipe | Swipe between two points |
| android_type_text | Type text on the device |
| android_press_key | Press a key (e.g., KEYCODE_HOME, KEYCODE_BACK) |
| android_launch_app | Launch an app by package name |
| android_list_apps | List installed packages |
| android_get_display_info | Get screen resolution and density |
| android_get_current_activity | Get the current foreground activity |
Typical workflow
android_list_devices → find your device serial
android_connect(serial="...") → connect (unlocks android_* tools)
android_screenshot → see what's on screen
android_find_text(text="OK") → locate a button
android_click(x=..., y=...) → tap it
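Under the hood, android_find_text works from the uiautomator dump, where each node carries a bounds="[x1,y1][x2,y2]" attribute. A sketch of turning such a node into a tap point (the XML here is a made-up example, and the helper is not part of the server's API):

```python
import re
import xml.etree.ElementTree as ET

def tap_point_for_text(dump_xml: str, text: str):
    """Find the first node with the given text in a uiautomator dump
    and return the center of its bounds as (x, y), or None."""
    for node in ET.fromstring(dump_xml).iter("node"):
        if node.get("text") == text:
            # bounds look like "[x1,y1][x2,y2]"
            x1, y1, x2, y2 = map(int, re.findall(r"\d+", node.get("bounds")))
            return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None

dump = '<hierarchy><node text="OK" bounds="[100,200][300,260]"/></hierarchy>'
print(tap_point_for_text(dump, "OK"))  # (200, 230)
```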
Known issues
MIUI / HyperOS (Xiaomi, Redmi, POCO devices): Input injection (android_click, android_type_text, android_press_key, android_swipe) and android_find_text (via uiautomator) require an additional security toggle:
Settings > Developer options > USB debugging (Security settings) — Enable this toggle. MIUI may require you to sign in with a Mi account to enable it.
Without this, you'll see INJECT_EVENTS permission errors for input tools and could not get idle state errors for android_find_text. Screenshot and device info tools work without this toggle.
Wireless ADB: To connect without a USB cable, first connect via USB and run:
adb tcpip 5555
adb connect <phone-ip>:5555
Then use the <phone-ip>:5555 serial in android_connect.
Smoke tests
Smoke tests verify all Android tools against a real connected device. They are #[ignore]d by default and must be run explicitly:
cargo test --test android_smoke_tests -- --ignored --test-threads=1
Tests must run sequentially (--test-threads=1) since they share a single physical device. The device must be unlocked and awake.
⚠️ Operational Safety
- Hands Off: When the agent is "driving" (clicking/typing), do not move your mouse or type.
- Why? Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
- Focus Matters: Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.
🪟 Windows Notes
It works out of the box on Windows 10/11.
- Uses standard Win32 APIs (GDI, SendInput).
find_text uses UI Automation (UIA) as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). It automatically falls back to OCR when UIA finds no matches.
- OCR uses the built-in Windows Media OCR engine (offline).
- Note: It cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
📄 License
MIT © sh3ll3x3c