MCP Vision Relay

MCP Vision Relay is an MCP server that gives text-only MCP clients, such as Claude and Codex, image analysis capabilities. It wraps locally installed Gemini and Qwen command-line tools so that these clients can process pictures supplied as local paths, URLs, or base64-encoded data.

What is MCP Vision Relay?

MCP Vision Relay is a bridge that lets AI assistants without native image analysis (such as Claude and Codex) analyze images by calling multimodal AI tools installed locally. It wraps these tools in a standardized MCP server, so your AI assistant can use image analysis as if it were a built-in function.

How to use MCP Vision Relay?

Usage consists of three steps: 1) install and configure Gemini CLI or Qwen CLI on your computer; 2) install and run the MCP Vision Relay server; 3) register the server in your AI assistant (such as Claude Desktop). Once this is done, you can ask the AI assistant to analyze pictures directly in the conversation.

Applicable scenarios

This tool is useful when the AI assistant you use (for example, Claude or Codex accessed through certain service providers) lacks image analysis capabilities. It offers a low-cost way to restore multimodal capabilities without changing service providers. Typical tasks include analyzing the code in a screenshot, explaining the content of a chart, and describing the scene in a photo.

Main features

Unified image analysis tools
Provides two tools, `gemini_analyze_image` and `qwen_analyze_image`. Each accepts a picture in one of three forms: a local file path, an online image link, or a Base64-encoded string.
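One way to think about the three input forms is as a tagged union that the server resolves before calling a CLI. The sketch below is illustrative only: the `ImageInput` type, the `classifyImageInput` function, and the detection heuristic (URL scheme, `data:` URI, otherwise a path) are assumptions, not the project's actual parameter schema.

```typescript
// Hypothetical representation of the three supported input forms.
type ImageInput =
  | { kind: "path"; path: string }
  | { kind: "url"; url: string }
  | { kind: "base64"; data: string };

// Assumed heuristic: http(s) links are URLs, data URIs carry Base64,
// and anything else is treated as a local file path.
function classifyImageInput(raw: string): ImageInput {
  const value = raw.trim();
  if (/^https?:\/\//i.test(value)) {
    return { kind: "url", url: value };
  }
  const dataUri = /^data:image\/[a-z0-9.+-]+;base64,(.+)$/i.exec(value);
  if (dataUri) {
    return { kind: "base64", data: dataUri[1] ?? "" };
  }
  return { kind: "path", path: value };
}
```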
Multi-provider relay architecture
An abstract 'provider' layer makes it easy to switch between command-line tools (CLIs) or add new ones, while retaining control over core settings such as model selection and output format.
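A provider layer of this kind could be sketched as a small interface plus a registry. Everything named here (`VisionProvider`, `AnalyzeRequest`, `makeCliProvider`, the placeholder model names, the `-m` model flag, and the way the image path is appended to the prompt) is a hypothetical illustration; only the `-p` prompt flag appears in this document.

```typescript
// Hypothetical request shape handed to a provider.
interface AnalyzeRequest {
  imagePath: string; // local file the CLI should read
  prompt: string;
  model?: string; // optional override of the provider's default model
}

// Hypothetical provider contract: one implementation per CLI.
interface VisionProvider {
  readonly command: string;
  readonly defaultModel: string;
  buildArgs(req: AnalyzeRequest): string[];
}

function makeCliProvider(command: string, defaultModel: string): VisionProvider {
  return {
    command,
    defaultModel,
    buildArgs(req) {
      const model = req.model ?? defaultModel;
      // "-m" and the "Image:" suffix are assumptions made for this sketch.
      return ["-m", model, "-p", `${req.prompt}\n\nImage: ${req.imagePath}`];
    },
  };
}

// Adding a back-end means adding one entry to the registry.
const providers = new Map<string, VisionProvider>([
  ["gemini", makeCliProvider("gemini", "placeholder-gemini-model")],
  ["qwen", makeCliProvider("qwen", "placeholder-qwen-model")],
]);

function getProvider(name: string): VisionProvider {
  const provider = providers.get(name);
  if (!provider) throw new Error(`unknown provider: ${name}`);
  return provider;
}
```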
Robust input handling
Automatically checks the picture size and file format. When needed, it downloads online pictures or decodes Base64 data to a temporary file and cleans the file up after use, for security and resource management.
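The decode, validate, and clean-up sequence can be sketched with Node.js built-ins as follows. The size limit, the list of allowed extensions, and the `withTempImage` helper are assumptions chosen for illustration; the project's real limits and function names may differ.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_BYTES = 10 * 1024 * 1024; // assumed limit, not the project's value
const ALLOWED_EXT = new Set(["png", "jpg", "jpeg", "webp", "gif"]); // assumed list

// Decodes Base64 data into a temporary file, runs `use` on its path,
// and always removes the temporary directory afterwards.
function withTempImage<T>(base64: string, ext: string, use: (path: string) => T): T {
  const normalizedExt = ext.toLowerCase();
  if (!ALLOWED_EXT.has(normalizedExt)) {
    throw new Error(`format not supported: ${ext}`);
  }
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length === 0) throw new Error("empty image data");
  if (bytes.length > MAX_BYTES) {
    throw new Error(`picture too large: ${bytes.length} bytes`);
  }
  const dir = mkdtempSync(join(tmpdir(), "vision-relay-"));
  const file = join(dir, `input.${normalizedExt}`);
  try {
    writeFileSync(file, bytes);
    return use(file);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: the file exists inside the callback and is gone afterwards.
let seenPath = "";
const existedDuringUse = withTempImage(
  Buffer.from("fake image bytes").toString("base64"),
  "png",
  (path) => {
    seenPath = path;
    return existsSync(path);
  },
);
```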
Highly configurable execution
Supports sandbox mode, timeout settings, additional command-line parameters, default model overrides, and more. Settings can be supplied through environment variables or a `.env` file.
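As a rough sketch of how such settings might be read, the function below maps environment variables to a typed configuration with defaults. The variable names (`GEMINI_TIMEOUT_MS`, `GEMINI_DEFAULT_MODEL`, and so on), the 60-second default, and the `RelayConfig` shape are all hypothetical; consult the project's `.env.example` for the real keys.

```typescript
// Hypothetical configuration shape for one provider.
interface RelayConfig {
  defaultModel?: string;
  timeoutMs: number;
  sandbox: boolean;
  extraArgs: string[];
}

// Reads settings such as `${prefix}_TIMEOUT_MS` from an environment map.
// All variable names here are assumptions made for this sketch.
function loadConfig(env: Record<string, string | undefined>, prefix: string): RelayConfig {
  const timeoutMs = Number(env[`${prefix}_TIMEOUT_MS`] ?? "60000");
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`${prefix}_TIMEOUT_MS must be a positive number`);
  }
  return {
    defaultModel: env[`${prefix}_DEFAULT_MODEL`] || undefined,
    timeoutMs,
    sandbox: env[`${prefix}_SANDBOX`] === "true",
    // Split extra CLI arguments on whitespace and drop empty pieces.
    extraArgs: (env[`${prefix}_EXTRA_ARGS`] ?? "").split(/\s+/).filter(Boolean),
  };
}
```

In a running server this would typically be called as `loadConfig(process.env, "GEMINI")` after any `.env` file has been loaded into the environment.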
Structured and actionable outputs
Organizes the output of the underlying CLI and attaches metadata (such as the model used, analysis time, and picture source), so that AI assistants can display it in the interface or process it further.
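MCP tool results carry a `content` array of typed items and an optional `isError` flag. A minimal sketch of wrapping CLI output with metadata in that shape might look like this; the `AnalysisMeta` fields and the footer layout are assumptions, not the project's actual output format.

```typescript
// Hypothetical metadata attached to each analysis.
interface AnalysisMeta {
  provider: string;
  model: string;
  durationMs: number;
  source: "path" | "url" | "base64";
}

// Wraps raw CLI output in an MCP-style tool result with a metadata footer.
function toToolResult(cliOutput: string, meta: AnalysisMeta) {
  const text = cliOutput.trim();
  const footer = [
    `provider: ${meta.provider}`,
    `model: ${meta.model}`,
    `duration: ${(meta.durationMs / 1000).toFixed(1)}s`,
    `source: ${meta.source}`,
  ].join("\n");
  return {
    content: [{ type: "text" as const, text: `${text}\n\n---\n${footer}` }],
    // Treat empty CLI output as an error so the client can react to it.
    isError: text.length === 0,
  };
}
```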

Advantages

Low-cost expansion of capabilities: There is no need to upgrade to a more expensive AI service package that supports vision. You can use local free or low-cost CLI tools.
Seamless integration: After registration in a supported AI assistant (such as Claude Desktop), the image analysis tools appear in the tool list like native functions.
Flexible selection: Supports multiple back-ends (Gemini, Qwen). You can choose the most suitable one according to your needs, model performance, or cost.
Controllable privacy: Image analysis runs through the CLI invoked on your own machine, so you can see how the data is sent to the corresponding service provider.

Limitations

Dependent on the local environment: The corresponding CLI tools must be pre-installed and correctly configured on your computer, and login authentication must be completed.
Extra steps: Compared with an AI assistant's built-in vision function, additional installation and configuration steps are required.
Performance-dependent: Analysis speed and quality depend on the CLI tool you choose and the underlying AI model.
Indirect call: The AI assistant does not understand images natively. Instead, the task is handed off to another tool, which may be less smooth than native integration in some complex interaction scenarios.

How to use

Environment preparation
Make sure Node.js (version 18 or higher) is installed on your computer. Then install and configure either Google Gemini CLI or Qwen CLI. Running `gemini -p "hi"` or `qwen -p "hi"` directly on the command line should return a normal result, which confirms that the CLI is correctly installed and authorized.
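The same check can be scripted. The sketch below uses Node's `spawnSync` to run a command and report whether it exited successfully with some output; the `cliResponds` helper and its 30-second timeout are assumptions for illustration. On Windows, CLI shims installed through npm may need to be launched through a shell, which this sketch does not handle.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when the command starts, exits with code 0,
// and prints something to stdout within the timeout.
function cliResponds(command: string, args: string[], timeoutMs = 30_000): boolean {
  const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  return (
    result.error === undefined &&
    result.status === 0 &&
    result.stdout.trim().length > 0
  );
}

// Example: cliResponds("gemini", ["-p", "hi"]) or cliResponds("qwen", ["-p", "hi"])
```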
Install and build MCP Vision Relay
Download or clone the MCP Vision Relay project, enter the project directory, install dependencies, and build the project.
Configuration (optional)
Copy the `.env.example` file in the project to `.env` and modify the configuration according to your needs, such as setting the default model and timeout. If you keep the default installation and configuration of the CLI, this step can be skipped.
Register the server in the AI assistant
In the AI assistant you are using (such as Claude Desktop or Codex CLI), add MCP Vision Relay as an MCP server. Note: The registration command must invoke the entry file directly rather than through an npm script, so that extra output does not interfere with communication.
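For clients that read a JSON configuration with an `mcpServers` map, as Claude Desktop does in `claude_desktop_config.json`, the registration might resemble the object printed by the sketch below. The server name `vision-relay` and the `dist/index.js` entry path are placeholders, not values confirmed by the project; substitute the real entry file from your build.

```typescript
// One server entry in an `mcpServers` map.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Builds a config that launches the entry file with `node` directly,
// rather than through an npm script.
function buildClientConfig(
  serverName: string,
  entryFile: string,
): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      [serverName]: { command: "node", args: [entryFile] },
    },
  };
}

// Placeholder name and path; replace with your own.
const config = buildClientConfig(
  "vision-relay",
  "/absolute/path/to/mcp-vision-relay/dist/index.js",
);
console.log(JSON.stringify(config, null, 2));
```

With the stdio transport, stdout carries the protocol's JSON-RPC messages. `npm run` prints its own banner lines to stdout, which is a likely source of the interference mentioned above.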
Start using
After successful registration, in the conversation interface of the AI assistant, you should be able to see the newly added image analysis tools (such as `gemini_analyze_image`). You can then ask the AI assistant to use these tools to analyze pictures in the conversation.

Usage examples

Analyze technical charts
You took a screenshot of a system architecture diagram and want the AI assistant to explain the components and workflow in it.
Explain error screenshots
An error dialog popped up during program execution. After taking a screenshot, you want to know the specific meaning and possible causes of this error.
Describe photo content
You have a landscape photo taken during a trip and want the AI assistant to generate a beautiful description.

Frequently Asked Questions

I already have an AI that can analyze images. Why do I still need this tool?
What should I do if I encounter a 'command not found' error during installation?
I failed to add the server in Claude Desktop and got a handshake error. What should I do?
The tool was called successfully, but I got an error saying 'picture too large' or 'format not supported'. What should I do?
Does it support other models besides Gemini and Qwen?

Related resources

Model Context Protocol official documentation
Understand the standards and specifications of the MCP protocol.
Google Gemini CLI project homepage
Get installation, configuration, and usage instructions for Gemini CLI.
Qwen Code (CLI) NPM page
Get installation and usage information for Qwen CLI.
MCP Vision Relay project code repository
Get the latest source code of this project, report issues, or participate in contributions.

Alternatives

Acemcp
Acemcp is an MCP server for codebase indexing and semantic search, supporting automatic incremental indexing, multi-encoding file processing, .gitignore integration, and a Web management interface, helping developers quickly search for and understand code context.
Python
7.2K
5 points
Blueprint MCP
Blueprint MCP is a chart generation tool based on the Arcade ecosystem. It uses technologies such as Nano Banana Pro to automatically generate visual charts such as architecture diagrams and flowcharts by analyzing codebases and system architectures, helping developers understand complex systems.
Python
7.0K
4 points
MCP Agent Mail
MCP Agent Mail is a mail-based coordination layer designed for AI programming agents, providing identity management, message sending and receiving, file reservation, and search functions, supporting asynchronous collaboration and conflict avoidance among multiple agents.
Python
7.7K
5 points
Klavis
Klavis AI is an open-source project that provides a simple and easy-to-use MCP (Model Context Protocol) service on Slack, Discord, and Web platforms. It includes various functions such as report generation, YouTube tools, and document conversion, supporting non-technical users and developers to use AI workflows.
TypeScript
12.9K
5 points
MCP
The official Microsoft MCP server gives AI assistants search and access functions for the latest Microsoft technical documentation.
12.6K
5 points
Aderyn
Aderyn is an open-source Solidity smart contract static analysis tool written in Rust, which helps developers and security researchers discover vulnerabilities in Solidity code. It supports Foundry and Hardhat projects, can generate reports in multiple formats, and provides a VSCode extension.
Rust
9.5K
5 points
Devtools Debugger MCP
The Node.js Debugger MCP server provides complete debugging capabilities based on the Chrome DevTools protocol, including breakpoint setting, stepping execution, variable inspection, and expression evaluation.
TypeScript
9.9K
4 points
Scrapling
Scrapling is an adaptive web scraping library that can automatically learn website changes and re-locate elements. It supports multiple scraping methods and AI integration, providing high-performance parsing and a developer-friendly experience.
Python
11.3K
5 points
Notion Api MCP
Certified
A Python-based MCP Server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.
Python
16.3K
4.5 points
Gitlab MCP Server
Certified
The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.
TypeScript
18.0K
4.3 points
Markdownify MCP
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown format.
TypeScript
26.3K
5 points
Duckduckgo MCP Server
Certified
The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.
Python
52.0K
4.3 points
Figma Context MCP
Framelink Figma MCP Server is a server that provides access to Figma design data for AI programming tools (such as Cursor). By simplifying the Figma API response, it helps AI more accurately achieve one-click conversion from design to code.
TypeScript
49.8K
4.5 points
Unity
Certified
UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real-time state monitoring, remote command execution, and log functions.
C#
22.1K
5 points
Minimax MCP Server
The MiniMax Model Context Protocol (MCP) is an official server that supports interaction with powerful text-to-speech, video/image generation APIs, and is suitable for various client tools such as Claude Desktop and Cursor.
Python
35.9K
4.8 points
Context7
Context7 MCP is a service that provides real-time, version-specific documentation and code examples for AI programming assistants. It is directly integrated into prompts through the Model Context Protocol to solve the problem of LLMs using outdated information.
TypeScript
73.3K
4.7 points
© 2025 AIBase