MCP Autogui Multinode

MCP and HTTP Server Wrapper for PyAutoGUI

This project is an MCP and HTTP server wrapper for PyAutoGUI, which enables LLMs to control your mouse and keyboard.

Quick Start

Prerequisites

  • Python >= 3.12
  • uv package manager (recommended)

Installation

  1. Clone the repository:
git clone https://github.com/stonehill-2345/mcp-autogui-multinode.git
cd mcp-autogui-multinode
  2. Install dependencies based on your deployment scenario:

Local Full Development

For local development with all features (GUI control + testing):

uv sync --group gui --group dev

Deploy MCP Server Only

For deploying MCP server that connects to remote tool service (no GUI dependencies needed):

uv sync --no-group gui

Deploy Tool Service Only

For deploying HTTP tool service that performs actual computer control (requires GUI):

uv sync --group gui

Running the Service

The service supports two independent servers:

1. Run Tool Service (HTTP API)

Starts the HTTP API server for computer control:

uv run python tool.py

2. Run MCP Server

Starts the MCP server that can connect to remote tool services. The server supports two transport modes.

HTTP Transport Mode:

uv run python mcp_local.py http

stdio Transport Mode (default):

uv run python mcp_local.py stdio

After starting, you can access:

  • HTTP API Documentation: http://localhost:8000/docs
  • Health Check: http://localhost:8000/health
  • MCP Endpoint: http://localhost:8001/mcp (if using HTTP transport)

Features

  • Dual Protocol Support: HTTP REST API and MCP (Model Context Protocol)
  • API Key Authentication: Optional API key authentication for service-to-service communication
  • Multiple MCP Transports: Support for both HTTP and stdio (Standard Input/Output) transport modes
  • Mouse Control: Move, click, drag, scroll operations
  • Keyboard Control: Press keys, type text, key combinations
  • Screenshot: Capture screen and get base64-encoded images
  • Screen Info: Get cursor position and screen resolution
  • Configuration Management: Pydantic Settings with environment variable support
  • Auto Documentation: Swagger UI for HTTP API
  • Flexible Deployment: Run HTTP server or MCP server independently
  • Request Tracing: Request ID middleware for request tracking
  • Structured Logging: Loguru-based logging with request ID integration
  • Remote MCP Support: Optional HTTP client for remote tool server integration

Documentation

Architecture

The service supports two deployment architectures:

Architecture 1: LLM -> MCP -> TOOL (Remote Tool Service)

This architecture separates the MCP server from the tool service, allowing the MCP server to connect to a remote tool service via HTTP.

graph LR
    LLM[LLM Client] -->|MCP Protocol| MCP[MCP Server<br/>main.py<br/>Client-based Tools]
    MCP -->|HTTP API<br/>with API Key| TOOLA[Tool Service<br/>tool.py<br/>HTTP API Server]
    MCP -->|HTTP API<br/>with API Key| TOOLB[Tool Service<br/>tool.py<br/>HTTP API Server]

    TOOLA -->|PyAutoGUI| COMPUTERA[Computer Control]
    TOOLB -->|PyAutoGUI| COMPUTERB[Computer Control]
    style LLM fill:#e1f5ff
    style MCP fill:#fff4e1
    style TOOLA fill:#ffe1f5
    style TOOLB fill:#ffe1f5
    style COMPUTERA fill:#e1ffe1
    style COMPUTERB fill:#e1ffe1

Characteristics:

  • MCP server uses client-based tools (register_computer_tools_with_client)
  • MCP server forwards requests to remote tool service via HTTP
  • Tool service performs actual computer control operations
  • Suitable for distributed deployments where MCP server and tool service run on different machines
  • Requires endpoint parameter in MCP tool calls
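
In this architecture each tool call has to name the tool service it targets. The sketch below shows one way a caller might attach that parameter when several nodes are available. The node names, the URLs, and the assumption that `endpoint` takes a base URL are illustrative only; the README does not specify the value format.

```python
# Hypothetical node registry: names and URLs are placeholders.
NODES = {
    "node-a": "http://10.0.0.11:8000",
    "node-b": "http://10.0.0.12:8000",
}


def with_endpoint(node: str, arguments: dict) -> dict:
    """Return a copy of the tool arguments with the target node's endpoint added."""
    if node not in NODES:
        raise KeyError(f"unknown node: {node}")
    # "endpoint" is the parameter name the README mentions; its exact
    # format (assumed here to be a base URL) is not specified.
    return {**arguments, "endpoint": NODES[node]}


args = with_endpoint("node-b", {"x": 100, "y": 200})
print(args)
```

The resulting dictionary could then be passed as the arguments of a client-based tool call such as `move_mouse`.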

Architecture 2: LLM -> MCP (Direct Tools)

This architecture uses direct tools where the MCP server directly performs computer control operations.

graph LR
    LLM[LLM Client] -->|MCP Protocol<br/>stdio/http| MCP[MCP Server<br/>mcp_local.py<br/>Direct Tools]
    MCP -->|PyAutoGUI| COMPUTER[Computer Control]
    
    style LLM fill:#e1f5ff
    style MCP fill:#fff4e1
    style COMPUTER fill:#e1ffe1

Characteristics:

  • MCP server uses direct tools (register_computer_tools)
  • MCP server directly executes computer control operations
  • No separate tool service required
  • Suitable for local deployments where everything runs on the same machine
  • No endpoint parameter needed in MCP tool calls

Usage Examples

Example API Usage

Move Mouse

curl -X POST "http://localhost:8000/api/computer/MoveMouse" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-here" \
  -d '{"x": 100, "y": 200}'

Click Mouse

curl -X POST "http://localhost:8000/api/computer/ClickMouse" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-here" \
  -d '{"x": 100, "y": 200, "button": "left"}'

Take Screenshot

curl -X POST "http://localhost:8000/api/computer/TakeScreenshot" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-here" \
  -d '{}'

Get Cursor Position

curl -X POST "http://localhost:8000/api/computer/GetCursorPosition" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret-api-key-here" \
  -d '{}'

Note: If API_KEY_ENABLED=false, the X-API-Key header is optional. If API_KEY_ENABLED=true, the header is required for all requests except health checks and documentation endpoints.
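
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper name `build_action_request` is hypothetical, and the request is built but not sent so the example runs without a live tool service.

```python
import json
import urllib.request
from typing import Optional


def build_action_request(base_url: str, action: str, params: dict,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/computer/{action}."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed when the server runs with API_KEY_ENABLED=true.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/computer/{action}",
        data=json.dumps(params).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_action_request(
    "http://localhost:8000", "MoveMouse", {"x": 100, "y": 200},
    api_key="your-secret-api-key-here",
)
print(request.get_method(), request.full_url)

# To actually send it (requires a running tool service):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```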

MCP Client Usage with API Key

import asyncio

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport


async def main():
    # Create transport with API key
    transport = StreamableHttpTransport(
        url="http://localhost:8001/mcp",
        headers={"X-API-Key": "your-secret-api-key-here"},
    )

    # Create client with transport; the connection is open inside the block
    client = Client(transport)
    async with client:
        response = await client.call_tool("move_mouse", {"x": 100, "y": 200})
        print(response)


# "async with" must run inside a coroutine, so drive it with an event loop
asyncio.run(main())

API Endpoints

Base Endpoints

  • GET / - Root path, returns API information
  • GET /health - Health check endpoint

Computer Control Endpoints

All computer control actions are available at:

  • POST /api/computer/{action} - Execute a computer control action
  • GET /api/computer/actions - List all available actions

Available Actions

| Action | Description | Parameters |
| --- | --- | --- |
| MoveMouse | Move mouse cursor | x, y (coordinates) |
| ClickMouse | Click mouse button | x, y, button, press, release |
| PressMouse | Press mouse button (hold) | x, y, button |
| ReleaseMouse | Release mouse button | x, y, button |
| DragMouse | Drag mouse from source to target | source_x, source_y, target_x, target_y |
| Scroll | Scroll mouse wheel | scroll_direction, scroll_amount, x, y |
| PressKey | Press keyboard key(s) | key (e.g., "enter", "ctrl c") |
| TypeText | Type text (uses clipboard) | text |
| Wait | Wait for duration | duration (milliseconds) |
| TakeScreenshot | Capture screen | (no parameters) |
| GetCursorPosition | Get mouse position | (no parameters) |
| GetScreenSize | Get screen resolution | (no parameters) |
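
The action list above can be turned into a small client-side check that catches misspelled parameter names before a request is sent. This is a sketch only: the helper `unknown_params` is hypothetical, and because the README does not say which parameters are required or optional, it flags unknown names and nothing else.

```python
# Parameter names copied from the action list; which of them are required
# versus optional is not documented, so this only flags unknown names.
ACTION_PARAMS = {
    "MoveMouse": {"x", "y"},
    "ClickMouse": {"x", "y", "button", "press", "release"},
    "PressMouse": {"x", "y", "button"},
    "ReleaseMouse": {"x", "y", "button"},
    "DragMouse": {"source_x", "source_y", "target_x", "target_y"},
    "Scroll": {"scroll_direction", "scroll_amount", "x", "y"},
    "PressKey": {"key"},
    "TypeText": {"text"},
    "Wait": {"duration"},
    "TakeScreenshot": set(),
    "GetCursorPosition": set(),
    "GetScreenSize": set(),
}


def unknown_params(action: str, payload: dict) -> list:
    """Return the sorted payload keys that the list does not name for the action."""
    if action not in ACTION_PARAMS:
        raise ValueError(f"unknown action: {action}")
    return sorted(set(payload) - ACTION_PARAMS[action])


ok = unknown_params("ClickMouse", {"x": 100, "y": 200, "button": "left"})
bad = unknown_params("MoveMouse", {"x": 100, "y": 200, "speed": 2})
print(ok)   # []
print(bad)  # ['speed']
```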

MCP Tools

Available MCP Tools

All HTTP API actions are available as MCP tools. The MCP tool names use snake_case, while the HTTP API uses PascalCase:

  • move_mouse - Move mouse cursor (HTTP: MoveMouse)
  • click_mouse - Click mouse button (HTTP: ClickMouse)
  • press_mouse - Press mouse button (HTTP: PressMouse)
  • release_mouse - Release mouse button (HTTP: ReleaseMouse)
  • drag_mouse - Drag mouse (HTTP: DragMouse)
  • scroll - Scroll mouse wheel (HTTP: Scroll)
  • press_key - Press keyboard key (HTTP: PressKey)
  • type_text - Type text (HTTP: TypeText)
  • wait - Wait for duration (HTTP: Wait)
  • take_screenshot - Take screenshot (HTTP: TakeScreenshot)
  • get_cursor_position - Get cursor position (HTTP: GetCursorPosition)
  • get_screen_size - Get screen size (HTTP: GetScreenSize)
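
The naming relationship between the two interfaces is mechanical, so a client that works with both can derive one name from the other. A minimal sketch follows; the function name `to_tool_name` is hypothetical and not part of the project.

```python
import re

HTTP_ACTIONS = [
    "MoveMouse", "ClickMouse", "PressMouse", "ReleaseMouse", "DragMouse",
    "Scroll", "PressKey", "TypeText", "Wait", "TakeScreenshot",
    "GetCursorPosition", "GetScreenSize",
]


def to_tool_name(action: str) -> str:
    """Convert a PascalCase HTTP action name to its snake_case MCP tool name."""
    # Insert an underscore before every capital letter except the first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", action).lower()


for action in HTTP_ACTIONS:
    print(f"{action} -> {to_tool_name(action)}")
```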

MCP Transport Modes

The MCP server supports two transport modes:

  1. stdio (default): Standard input/output transport
    • Used for local communication via stdin/stdout
    • Suitable for direct integration with MCP clients
    • Start with: python mcp_local.py stdio
  2. http: HTTP-based transport with stateless mode
    • Used for remote communication over HTTP
    • Suitable for service-to-service communication
    • Start with: python mcp_local.py http
    • Accessible at: http://localhost:8001/mcp

MCP Tool Registration Modes

The service supports two modes of MCP tool registration:

  1. Direct Tools (register_computer_tools): Tools that directly call the local computer control implementation. No endpoint parameter required.
    • Used in mcp_local.py for local MCP server
    • Tools execute computer control actions directly
  2. Client-based Tools (register_computer_tools_with_client): Tools that use an HTTP client to call a remote tool server. Requires an endpoint parameter.
    • Used in mcp_server/register.py for remote MCP server
    • Tools forward requests to a remote tool service via HTTP

The local MCP server (mcp_local.py) uses direct tools by default. The remote MCP server uses client-based tools.

Security Considerations

Important Note

This service provides direct control over your computer's mouse and keyboard. Use with caution:

  • Only run on trusted networks
  • Restrict CORS origins in production (currently allows all origins)
  • Enable API Key Authentication: Set API_KEY_ENABLED=true and configure a strong API_KEY in production
  • Be aware of the security implications of remote computer control

API Key Authentication

The service supports optional API key authentication for securing service-to-service communication:

  1. Enable Authentication: Set API_KEY_ENABLED=true in your .env file
  2. Set API Key: Configure API_KEY=your-secret-api-key-here in your .env file
  3. Pass API Key in Requests: Include the API key in request headers:
    • X-API-Key: your-secret-api-key-here (recommended)
    • Authorization: Bearer your-secret-api-key-here (alternative)

Excluded Paths (no authentication required):

  • /health - Health check endpoint
  • /docs - API documentation
  • /openapi.json - OpenAPI schema
  • /redoc - Alternative API documentation
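
The rules above can be summarized as a small decision function. This is an illustrative sketch, not the project's middleware: the function names are hypothetical, exact path matching for the excluded paths is an assumption, and so is giving `X-API-Key` precedence over the `Authorization` header.

```python
import hmac
from typing import Mapping, Optional

EXCLUDED_PATHS = {"/health", "/docs", "/openapi.json", "/redoc"}


def extract_key(headers: Mapping[str, str]) -> Optional[str]:
    """Read the key from X-API-Key, falling back to Authorization: Bearer."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None


def is_authorized(path: str, headers: Mapping[str, str],
                  api_key: str, enabled: bool = True) -> bool:
    """Decide whether a request may proceed under the rules listed above."""
    if not enabled or path in EXCLUDED_PATHS:
        return True
    supplied = extract_key(headers)
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), api_key.encode())


print(is_authorized("/health", {}, "secret"))
print(is_authorized("/api/computer/MoveMouse", {}, "secret"))
print(is_authorized("/api/computer/MoveMouse", {"X-API-Key": "secret"}, "secret"))
```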

Logging

The service uses Loguru for structured logging with the following features:

  • Request ID Tracking: Each request gets a unique ID that appears in all log entries
  • Environment-aware: Console output in development, file logging in production
  • Structured Format: Includes timestamp, level, request ID, module, function, and line number

Log files are stored in the logs/ directory:

  • app_YYYY-MM-DD.log: General application logs
  • error_YYYY-MM-DD.log: Error logs only

In development mode, logs are only output to the console. In production mode, logs are written to both console and files.
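
To illustrate how a request ID can appear in every log entry, here is a sketch built on the standard library's `logging` and `contextvars` modules rather than Loguru, so it runs without extra dependencies. The format string, the logger name, and `handle_request` are assumptions for illustration; the project's actual middleware and Loguru configuration are not shown in this README.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


stream = io.StringIO()  # stands in for the console or a log file
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s | %(levelname)s | %(request_id)s | "
    "%(module)s:%(funcName)s:%(lineno)d | %(message)s"
))
logger = logging.getLogger("request_id_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(action: str) -> str:
    """Assign a request ID, log under it, and return the ID."""
    request_id = uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("executing %s", action)
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)
    return request_id


first = handle_request("MoveMouse")
second = handle_request("TakeScreenshot")
print(stream.getvalue())
```

Each printed line carries the ID assigned to its request, which is what makes it possible to group the log entries belonging to one call.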

Testing

Run tests using pytest:

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_mcp_client.py

# Run with verbose output
uv run pytest -v

The test suite includes:

  • test_local_mcp_client.py: Tests for local MCP server with HTTP transport (direct tools)
  • test_stdio_mcp_client.py: Tests for local MCP server with stdio transport (direct tools)
  • test_mcp_client.py: Tests for remote MCP server with client-based tools (requires endpoint parameter)

Troubleshooting

Port Already in Use

If you get a port already in use error:

# Change ports in .env file
PORT=8002
MCP_PORT=8003
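
Before changing the ports, it can help to confirm which one is actually taken. The sketch below tries to bind each port locally. It assumes the `PORT` and `MCP_PORT` values are exported into the process environment and falls back to 8000 and 8001, the ports used in the URLs elsewhere in this README; `port_is_free` is a hypothetical helper.

```python
import os
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


# PORT and MCP_PORT are the .env names shown above; 8000 and 8001 are the
# defaults implied by the URLs in this README.
for name, default in (("PORT", 8000), ("MCP_PORT", 8001)):
    port = int(os.environ.get(name, default))
    state = "free" if port_is_free(port) else "in use"
    print(f"{name}={port}: {state}")
```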

MCP Connection Issues

For HTTP transport, ensure the MCP server is running and accessible:

# Test MCP endpoint
curl http://localhost:8001/mcp

# Test with API key (if enabled)
curl -H "X-API-Key: your-secret-api-key-here" http://localhost:8001/mcp

For stdio transport, ensure the MCP server is started with stdio mode:

# Start MCP server in stdio mode
uv run python mcp_local.py stdio

API Key Authentication Issues

If you're getting authentication errors:

  • Verify API_KEY_ENABLED is set correctly in .env
  • Check that API_KEY matches between client and server
  • Ensure the API key is passed in the X-API-Key header or Authorization: Bearer <key> header
  • Check that the request path is not in the excluded paths list

Screenshot Issues

If screenshot functionality fails:

  • Check Python version compatibility (requires Python >= 3.12)
  • Verify display permissions on macOS/Linux
  • Ensure PyAutoGUI and its dependencies are properly installed

License

MIT license

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
