MCP Vision Relay

MCP Vision Relay is an MCP server that gives text-only MCP clients such as Claude and Codex image analysis capabilities. It wraps locally installed Gemini and Qwen command-line tools, allowing them to process images supplied as local paths, URLs, or Base64-encoded data.
6.2K · 2.5 points

What is MCP Vision Relay?

MCP Vision Relay is a bridge that lets AI assistants without native image analysis (such as Claude and Codex) analyze images by calling locally installed multimodal AI tools. It wraps these tools in a standardized MCP server, so your AI assistant can use image analysis as if it were a built-in capability.

How to use MCP Vision Relay?

Usage takes three steps: 1) install and configure Gemini CLI or Qwen CLI on your computer; 2) install and run the MCP Vision Relay server; 3) register the server in your AI assistant (such as Claude Desktop). Once that is done, you can ask the assistant to analyze images directly in conversation.

Applicable scenarios

This tool is useful when the AI assistant you are using (for example, Claude or Codex accessed through certain service providers) has no image analysis capability. It restores multimodal functionality at low cost, without switching providers: for example, analyzing code in a screenshot, explaining a chart, or describing the scene in a photo.

Main features

Unified image analysis tools
Provides two tools, `gemini_analyze_image` and `qwen_analyze_image`, which accept images in three forms: a local file path, an online image URL, or a Base64-encoded string.
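As a rough illustration, the arguments these tools accept might look like the TypeScript shape below; the field names are hypothetical, so check the project's actual tool schema before relying on them.

```typescript
// Hypothetical argument shape for gemini_analyze_image / qwen_analyze_image.
// Field names are illustrative only; the project's actual schema may differ.
interface AnalyzeImageArgs {
  image: string;    // local file path, http(s) URL, or Base64-encoded data
  prompt?: string;  // optional instruction about the image
  model?: string;   // optional override of the provider's default model
}

// The three supported ways of passing an image:
const examples: AnalyzeImageArgs[] = [
  { image: "/home/user/screenshots/error.png", prompt: "Explain this error dialog" },
  { image: "https://example.com/architecture.png", prompt: "Describe the components" },
  { image: "iVBORw0KGgoAAAANSUhEUgAA...", prompt: "What does this diagram show?" },
];
```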
Multi-provider relay architecture
An abstract provider layer makes it easy to switch between or add command-line tools (CLIs) while retaining control over core settings such as model selection and output format.
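A minimal sketch of what such a provider layer could look like, assuming each provider wraps one CLI process; the interface, flags, and file-reference syntax below are illustrative, not the project's actual code.

```typescript
import { spawn } from "node:child_process";

// Hypothetical provider abstraction: each provider knows how to invoke one CLI.
interface VisionProvider {
  name: string;          // e.g. "gemini" or "qwen"
  defaultModel: string;
  analyze(imagePath: string, prompt: string, model?: string): Promise<string>;
}

// Illustrative Gemini-backed provider. The real project may construct the command
// differently (sandbox mode, timeouts, extra arguments, etc.).
const geminiProvider: VisionProvider = {
  name: "gemini",
  defaultModel: "gemini-2.5-flash", // assumed default; configurable in practice
  analyze(imagePath, prompt, model) {
    return new Promise((resolve, reject) => {
      // "-p" comes from the verification command shown later; "-m" and the
      // "@file" attachment syntax are assumptions about the Gemini CLI here.
      const args = ["-m", model ?? this.defaultModel, "-p", `${prompt}\n@${imagePath}`];
      const child = spawn("gemini", args);
      let out = "";
      child.stdout.on("data", (chunk) => (out += chunk));
      child.on("error", reject);
      child.on("close", (code) =>
        code === 0 ? resolve(out.trim()) : reject(new Error(`gemini exited with code ${code}`))
      );
    });
  },
};
```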
Robust input handling
Automatically checks image size and file format. When needed, it downloads remote images or decodes Base64 data into a temporary file and cleans it up after use, keeping resource handling safe and predictable.
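A simplified sketch of that normalization flow, assuming Node.js 18+ (built-in `fetch`); the actual size limits, format checks, and temp-file handling in the project may differ.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative size cap; the project's actual limit may differ.
const MAX_BYTES = 10 * 1024 * 1024;

// Normalize a URL or Base64 input into a temporary local file, run the analysis,
// then always clean up the temporary directory afterwards.
async function withLocalImage(
  source: string,
  analyze: (localPath: string) => Promise<string>
): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "vision-relay-"));
  try {
    let bytes: Buffer;
    if (/^https?:\/\//.test(source)) {
      const res = await fetch(source);                  // Node 18+ built-in fetch
      if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
      bytes = Buffer.from(await res.arrayBuffer());
    } else {
      bytes = Buffer.from(source, "base64");            // treat anything else as Base64
    }
    if (bytes.length > MAX_BYTES) throw new Error("Image exceeds the size limit");
    const localPath = join(dir, "image");
    await writeFile(localPath, bytes);
    return await analyze(localPath);
  } finally {
    await rm(dir, { recursive: true, force: true });    // cleanup, even on failure
  }
}
```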
Highly configurable execution
Supports sandbox mode, timeouts, extra command-line arguments, default-model overrides, and more, all configurable through environment variables or a `.env` file.
Structured and actionable outputs
Organizes the underlying CLI's output and attaches metadata (such as the model used, analysis time, and image source), making it easy for AI assistants to display results or process them further.
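For instance, each result could bundle the CLI's text output with metadata along these lines; the exact field names are hypothetical.

```typescript
// Hypothetical result envelope attached to each analysis; field names are illustrative.
interface AnalysisResult {
  provider: "gemini" | "qwen";        // which backend produced the answer
  model: string;                      // model actually used
  source: "file" | "url" | "base64";  // where the image came from
  durationMs: number;                 // wall-clock analysis time
  text: string;                       // the CLI's description or answer
}
```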

Advantages

Low-cost capability expansion: No need to upgrade to a more expensive vision-enabled AI plan; free or low-cost local CLI tools do the work.
Seamless integration: Once registered in a supported AI assistant (such as Claude Desktop), the image analysis tools appear in the tool list like native functions.
Flexible selection: Multiple backends (Gemini, Qwen) are supported; choose the one that best fits your needs, model performance, or cost.
Controllable privacy: Image analysis runs through the locally invoked CLI, so you can see how your data is sent to the corresponding service provider.

Limitations

Local environment dependency: The corresponding CLI tools must be pre-installed, correctly configured, and authenticated on your computer.
Extra steps: Compared with an AI assistant's built-in vision features, additional installation and configuration are required.
Performance depends on the backend: Analysis speed and quality depend on the CLI tool you choose and the underlying AI model.
Indirect invocation: The AI assistant does not natively understand the image; the task is relayed to another tool, which may feel less seamless than native integration in complex interactive scenarios.

How to use

Environment preparation
Make sure Node.js (version 18 or higher) is installed on your computer. Then install and configure Google Gemini CLI or Qwen CLI, whichever you prefer. Verify that running `gemini -p "hi"` or `qwen -p "hi"` directly in a terminal returns a result; this confirms the CLI is installed and authorized correctly.
Install and build MCP Vision Relay
Download or clone the MCP Vision Relay project, enter the project directory, install dependencies, and build the project.
Configuration (optional)
Copy the project's `.env.example` file to `.env` and adjust the settings as needed, such as the default model and timeout. If you keep the CLI's default installation and configuration, you can skip this step.
Register the server in the AI assistant
In the AI assistant you are using (such as Claude Desktop or the Codex CLI), add MCP Vision Relay as an MCP server. Note: the registration command must call the entry file directly rather than go through an npm script, so that extra output does not interfere with MCP communication.
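For Claude Desktop, registration usually goes in `claude_desktop_config.json`. The server name and entry-file path below are placeholders; point `args` at the project's actual built entry file.

```json
{
  "mcpServers": {
    "mcp-vision-relay": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-vision-relay/dist/index.js"]
    }
  }
}
```

Restart Claude Desktop after editing the file so the new server is picked up.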
Start using
After registration succeeds, the new image analysis tools (such as `gemini_analyze_image`) should appear in the assistant's tool list, and you can ask the assistant to use them to analyze images in conversation.

Usage examples

Analyze technical charts
You took a screenshot of a system architecture diagram and want the AI assistant to explain its components and workflow.
Explain error screenshots
A program popped up an error dialog during execution; after taking a screenshot, you want to know what the error means and its likely causes.
Describe photo content
You have a landscape photo from a trip and want the AI assistant to write an evocative description of it.

Frequently Asked Questions

I already have an AI that can analyze images. Why do I still need this tool?
What should I do if I encounter a 'command not found' error during installation?
I failed to add the server in Claude Desktop and got a handshake error. What should I do?
The tool was called successfully, but I got an error saying 'picture too large' or 'format not supported'. What should I do?
Does it support other models besides Gemini and Qwen?

Related resources

Model Context Protocol official documentation
Understand the standards and specifications of the MCP protocol.
Google Gemini CLI project homepage
Get installation, configuration, and usage instructions for Gemini CLI.
Qwen Code (CLI) NPM page
Get installation and usage information for Qwen CLI.
MCP Vision Relay project code repository
Get the latest source code of this project, report issues, or participate in contributions.

Alternatives

Rsdoctor
Rsdoctor is a build analysis tool specifically designed for the Rspack ecosystem, fully compatible with webpack. It provides visual build analysis, multi-dimensional performance diagnosis, and intelligent optimization suggestions to help developers improve build efficiency and engineering quality.
TypeScript · 6.7K · 5 points

Next Devtools MCP
The Next.js development tools MCP server provides Next.js development tools and utilities for AI programming assistants such as Claude and Cursor, including runtime diagnostics, development automation, and documentation access.
TypeScript · 9.2K · 5 points

Testkube
Testkube is a test orchestration and execution framework for cloud-native applications, providing a unified platform to define, run, and analyze tests. It supports existing testing tools and Kubernetes infrastructure.
Go · 4.9K · 5 points

MCP Windbg
An MCP server that integrates AI models with WinDbg/CDB for analyzing Windows crash dump files and remote debugging, supporting natural language interaction to execute debugging commands.
Python · 9.2K · 5 points

Runno
Runno is a collection of JavaScript toolkits for securely running code in multiple programming languages in environments such as browsers and Node.js. It achieves sandboxed execution through WebAssembly and WASI, supports languages such as Python, Ruby, JavaScript, SQLite, and C/C++, and provides integration methods such as web components and MCP servers.
TypeScript · 6.1K · 5 points

Praisonai
PraisonAI is a production-ready multi-AI agent framework with self-reflection capabilities, designed to create AI agents that automate the solution of problems ranging from simple tasks to complex challenges. It simplifies the construction and management of multi-agent LLM systems by integrating PraisonAI agents, AG2, and CrewAI into a low-code solution, emphasizing simplicity, customization, and effective human-machine collaboration.
Python · 5.9K · 5 points

Netdata
Netdata is an open-source real-time infrastructure monitoring platform that provides second-level metric collection, visualization, machine learning-driven anomaly detection, and automated alerts. It achieves full-stack monitoring without complex configuration.
Go · 8.1K · 5 points

MCP Server
The Mapbox MCP Server is a Model Context Protocol server implemented in Node.js, providing AI applications with access to Mapbox geospatial APIs, including geocoding, point-of-interest search, route planning, isochrone analysis, and static map generation.
TypeScript · 6.6K · 4 points

Notion Api MCP (Certified)
A Python-based MCP server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.
Python · 18.6K · 4.5 points

Markdownify MCP
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown.
TypeScript · 30.6K · 5 points

Duckduckgo MCP Server (Certified)
The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.
Python · 62.6K · 4.3 points

Gitlab MCP Server (Certified)
The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.
TypeScript · 21.2K · 4.3 points

Unity (Certified)
UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real-time state monitoring, remote command execution, and logging.
C# · 26.4K · 5 points

Figma Context MCP
Framelink Figma MCP Server provides access to Figma design data for AI programming tools (such as Cursor). By simplifying the Figma API response, it helps AI more accurately achieve one-click conversion from design to code.
TypeScript · 57.5K · 4.5 points

Gmail MCP Server
A Gmail auto-authentication MCP server designed for Claude Desktop, supporting Gmail management through natural language interaction, including sending emails, label management, batch operations, and more.
TypeScript · 19.5K · 4.5 points

Context7
Context7 MCP is a service that provides real-time, version-specific documentation and code examples for AI programming assistants. It is integrated directly into prompts through the Model Context Protocol to solve the problem of LLMs using outdated information.
TypeScript · 83.0K · 4.7 points