MCP Vision Relay

MCP Vision Relay is an MCP server that gives text-only MCP clients, such as Claude and Codex, image analysis capabilities by wrapping the locally installed Gemini and Qwen command-line tools. It lets those clients process images supplied as local paths, URLs, or base64-encoded data.

What is MCP Vision Relay?

MCP Vision Relay is a bridge tool that lets AI assistants without native image analysis (such as Claude and Codex) analyze images by calling locally installed multimodal AI tools. It wraps these tools in a standard MCP server, so your AI assistant can use image analysis as if it were a built-in function.

How to use MCP Vision Relay?

The usage consists of three steps: 1) Install and configure Gemini CLI or Qwen CLI on your computer; 2) Install and run the MCP Vision Relay server; 3) Register this server in your AI assistant (such as Claude Desktop). After completion, you can directly ask the AI assistant to analyze pictures in the conversation.

Applicable scenarios

This tool is useful when the AI assistant you use (for example, Claude or Codex accessed through certain service providers) lacks image analysis. It offers a low-cost way to restore multimodal capabilities without changing providers: for example, reading the code in a screenshot, explaining a chart, or describing the scene in a photo.

Main features

Unified image analysis tools
Provides two tools, `gemini_analyze_image` and `qwen_analyze_image`, each of which accepts an image in one of three forms: a local file path, an online image URL, or a Base64-encoded string.
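One way to tell which of the three input forms a caller supplied is sketched below. It is a hypothetical helper, not the project's code: the `ImageSource` type, the `classifyImageInput` name, and the heuristics (a `data:` URL prefix, an `http(s)://` scheme, or a long string drawn only from the base64 alphabet) are all assumptions.

```typescript
// Hypothetical model of the three accepted input forms; the project's own types may differ.
type ImageSource =
  | { kind: "path"; path: string }
  | { kind: "url"; url: string }
  | { kind: "base64"; data: string; mimeType?: string };

// "data:image/png;base64,..." style input (assumed to be accepted alongside raw base64).
const DATA_URL = /^data:(image\/[a-z0-9.+-]+);base64,([A-Za-z0-9+\/=]+)$/i;
// Raw base64: only the base64 alphabet, with up to two "=" padding characters at the end.
const RAW_BASE64 = /^[A-Za-z0-9+\/]+={0,2}$/;

function classifyImageInput(input: string): ImageSource {
  const value = input.trim();

  const dataUrl = DATA_URL.exec(value);
  if (dataUrl) {
    const [, mimeType = "", data = ""] = dataUrl;
    return { kind: "base64", mimeType: mimeType.toLowerCase(), data };
  }

  if (/^https?:\/\//i.test(value)) {
    return { kind: "url", url: value };
  }

  // Heuristic: long, correctly padded strings made only of base64 characters are treated as data.
  if (value.length >= 64 && value.length % 4 === 0 && RAW_BASE64.test(value)) {
    return { kind: "base64", data: value };
  }

  // Everything else is assumed to be a local file path.
  return { kind: "path", path: value };
}
```

A path with a file extension never matches the raw base64 test, because `.` is outside the base64 alphabet; an explicit input-type argument would remove the guesswork entirely.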
Multi-provider relay architecture
An abstract provider layer makes it easy to switch between or add command-line tools (CLIs), while keeping control over core settings such as model selection and output format.
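A provider layer like this can be reduced to a small interface that knows which binary to launch and how to build its arguments. The sketch assumes that both CLIs accept a prompt through `-p` (as in the `gemini -p "hi"` check mentioned on this page), a model through `-m`, and a local file reference written as `@path` inside the prompt; verify those flags against each CLI's own help output. `VisionProvider`, `makeCliProvider`, and `getProvider` are hypothetical names.

```typescript
// Hypothetical provider abstraction: each backend knows its binary and how to build argv.
interface VisionProvider {
  readonly name: string;
  readonly command: string;
  buildArgs(prompt: string, imagePath: string, model?: string): string[];
}

// ASSUMPTION: both CLIs take the prompt via -p, a model via -m, and resolve "@<path>"
// references inside the prompt to local files. Check each CLI's --help before relying on this.
function makeCliProvider(name: string, command: string, defaultModel?: string): VisionProvider {
  return {
    name,
    command,
    buildArgs(prompt: string, imagePath: string, model: string | undefined = defaultModel): string[] {
      const args = ["-p", `${prompt} @${imagePath}`];
      if (model) args.push("-m", model);
      return args;
    },
  };
}

// Registry of known backends; adding a provider means adding one entry here.
const providers = new Map<string, VisionProvider>([
  ["gemini", makeCliProvider("gemini", "gemini")],
  ["qwen", makeCliProvider("qwen", "qwen")],
]);

function getProvider(name: string): VisionProvider {
  const provider = providers.get(name);
  if (!provider) throw new Error(`Unknown provider: ${name}`);
  return provider;
}
```

Supporting another backend would then mean registering one more entry, as long as that CLI can take an image reference on its command line.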
Robust input handling
Automatically checks image size and file format. When needed, it downloads online images or decodes Base64 data into a temporary file, then deletes that file after use for security and resource management.
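The checks and the temporary-file lifecycle described above might look like the following Node.js sketch. It identifies formats by their leading magic bytes and uses `try`/`finally` so the temporary directory is removed even when analysis fails. The 10 MB limit, the list of supported formats, and all function names are assumptions for illustration.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical size limit (10 MB); the real limit and how it is configured may differ.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

type ImageFormat = "png" | "jpeg" | "gif" | "webp";

// Identify the format from leading "magic" bytes instead of trusting a file extension.
function detectImageFormat(bytes: Uint8Array): ImageFormat | null {
  const startsWith = (sig: number[], offset = 0): boolean =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) return "webp"; // "RIFF" ... "WEBP"
  return null;
}

// Reject oversized or unrecognized images before any CLI is launched.
function validateImage(bytes: Uint8Array): ImageFormat {
  if (bytes.length > MAX_IMAGE_BYTES) {
    throw new Error(`Image is ${bytes.length} bytes; the limit is ${MAX_IMAGE_BYTES}`);
  }
  const format = detectImageFormat(bytes);
  if (format === null) throw new Error("Unsupported image format");
  return format;
}

// Write the bytes into a private temp directory, hand the file path to `run`,
// and delete the directory afterwards whether `run` succeeds or throws.
async function withTempImage<T>(bytes: Uint8Array, run: (filePath: string) => Promise<T>): Promise<T> {
  const format = validateImage(bytes);
  const dir = await mkdtemp(join(tmpdir(), "vision-relay-"));
  try {
    const filePath = join(dir, `input.${format === "jpeg" ? "jpg" : format}`);
    await writeFile(filePath, bytes);
    return await run(filePath);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Base64 input could be passed in as `Buffer.from(data, "base64")` (a `Buffer` is a `Uint8Array`), and a downloaded image as `new Uint8Array(await response.arrayBuffer())` using the `fetch` built into Node.js 18.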
Highly configurable execution
Supports sandbox mode, timeouts, extra command-line arguments, default model overrides, and more, all configurable through environment variables or a `.env` file.
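Environment-driven settings and a timeout could be wired up as below. The `<PREFIX>_TIMEOUT_MS`, `<PREFIX>_MODEL`, and `<PREFIX>_EXTRA_ARGS` variable names and the 60-second default are hypothetical; the project's `.env.example` lists the real keys. Node's `execFile` runs the CLI without a shell, and its `timeout` option kills the child process once the limit is exceeded.

```typescript
import { execFile } from "node:child_process";

// HYPOTHETICAL variable names (<PREFIX>_TIMEOUT_MS, _MODEL, _EXTRA_ARGS) and defaults.
interface RelayConfig {
  timeoutMs: number;
  model: string | undefined;
  extraArgs: string[];
}

const DEFAULT_TIMEOUT_MS = 60000;

function loadConfig(env: Record<string, string | undefined>, prefix: string): RelayConfig {
  const timeout = Number(env[`${prefix}_TIMEOUT_MS`] ?? DEFAULT_TIMEOUT_MS);
  return {
    // Fall back to the default when the value is missing, non-numeric, or not positive.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : DEFAULT_TIMEOUT_MS,
    model: env[`${prefix}_MODEL`] || undefined,
    // Whitespace-separated extra CLI arguments; quoting is not handled in this sketch.
    extraArgs: (env[`${prefix}_EXTRA_ARGS`] ?? "").split(/\s+/).filter(Boolean),
  };
}

// Run the CLI without a shell; execFile's `timeout` option kills the child if it runs too long.
function runCli(command: string, args: string[], config: RelayConfig): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      command,
      [...args, ...config.extraArgs],
      { timeout: config.timeoutMs, maxBuffer: 10 * 1024 * 1024, encoding: "utf8" },
      (error, stdout, stderr) => {
        if (error) reject(new Error(`${command} failed: ${error.message}\n${stderr}`));
        else resolve(stdout);
      },
    );
  });
}
```

Passing arguments as an array rather than a shell string also means a prompt or file path containing spaces or quotes cannot be reinterpreted by a shell.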
Structured and actionable outputs
Organizes the underlying CLI's output and attaches metadata (such as the model used, analysis time, and image source), so AI assistants can display it in their interface or process it further.
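An MCP tool result carries a `content` array of items such as `{ type: "text", text }`, and sets `isError` when the call failed. The sketch below returns the CLI's answer as one text item and the metadata as a second, JSON-encoded item; the `AnalysisMeta` fields and this two-item layout are assumptions, not the project's documented output format.

```typescript
// Minimal slice of an MCP tool result: a list of content items plus an optional error flag.
interface TextContent {
  type: "text";
  text: string;
}
interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

// HYPOTHETICAL metadata fields, mirroring the examples listed above.
interface AnalysisMeta {
  provider: string;
  model: string | undefined;
  source: "path" | "url" | "base64";
  durationMs: number;
}

// Successful analysis: the model's answer first, then the metadata as a JSON text item.
function formatResult(cliOutput: string, meta: AnalysisMeta): ToolResult {
  return {
    content: [
      { type: "text", text: cliOutput.trim() },
      { type: "text", text: JSON.stringify({ metadata: meta }) },
    ],
  };
}

// Failed analysis: isError tells the client the call did not succeed.
function formatError(message: string, meta: Partial<AnalysisMeta>): ToolResult {
  return {
    isError: true,
    content: [{ type: "text", text: `${message}\n${JSON.stringify({ metadata: meta })}` }],
  };
}
```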
Advantages
Low-cost capability expansion: There is no need to upgrade to a more expensive AI plan that supports vision; you can use free or low-cost locally installed CLI tools.
Seamless integration: After registration in a supported AI assistant (such as Claude Desktop), the image analysis tools appear in its tool list like native functions.
Flexible selection: Supports multiple backends (Gemini, Qwen), so you can pick the one that best fits your needs, model performance, or cost.
Controllable privacy: Images are handed to a CLI running on your own machine, so you can see and control how the data is sent to the corresponding service provider.
Limitations
Dependent on the local environment: The corresponding CLI tools must be installed, configured, and logged in on your computer beforehand.
Extra steps: Compared with the built - in vision function of the AI assistant, additional installation and configuration steps are required.
Performance-dependent: Analysis speed and quality depend on the CLI tool you choose and its underlying AI model.
Indirect call: The AI assistant does not understand images natively; the task is relayed to another tool, which may feel less smooth than native integration in some complex interactions.

How to use

Environment preparation
Make sure Node.js 18 or later is installed on your computer. Then install and configure Google Gemini CLI or Qwen CLI, depending on your choice. Running `gemini -p "hi"` or `qwen -p "hi"` directly in a terminal should return a normal response, which confirms that the CLI is correctly installed and authorized.
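Both preconditions can also be checked from a small script, as in this sketch. `meetsNodeRequirement` and `checkCli` are hypothetical helpers; the only CLI invocation used is the `-p "hi"` smoke test described above.

```typescript
import { execFile } from "node:child_process";

// True when a version string such as "18.19.0" or "v20.11.1" has a major version >= minMajor.
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  return Number.isFinite(major) && major >= minMajor;
}

// Runs the `<cli> -p "hi"` smoke test and resolves to true if it exits cleanly with some output.
function checkCli(cli: "gemini" | "qwen", timeoutMs = 60000): Promise<boolean> {
  return new Promise((resolve) => {
    execFile(cli, ["-p", "hi"], { timeout: timeoutMs, encoding: "utf8" }, (error, stdout) => {
      resolve(error === null && stdout.trim().length > 0);
    });
  });
}
```

For example, `meetsNodeRequirement(process.versions.node)` checks the running Node.js, and `checkCli("gemini").then((ok) => console.log(ok ? "gemini ready" : "gemini not ready"))` checks the CLI.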
Install and build MCP Vision Relay
Download or clone the MCP Vision Relay project, enter the project directory, install dependencies, and build the project.
Configuration (optional)
Copy the project's `.env.example` file to `.env` and adjust the settings as needed, such as the default model and timeout. If you keep the CLI's default installation and configuration, you can skip this step.
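To illustrate how a `.env` file typically feeds these settings, here is a minimal loader. It is a stand-in for whatever the project actually uses (libraries such as `dotenv` are common); it skips comments and blank lines, strips matching quotes, and never overrides variables already set in the real environment. `parseDotEnv` and `loadDotEnv` are hypothetical names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Minimal KEY=VALUE parser: ignores blank lines and "#" comments, strips matching outer quotes.
// Sketch only: no multi-line values, escapes, or ${VAR} expansion.
function parseDotEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE pair
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value[0];
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Values already present in the real environment win over values from the .env file.
function loadDotEnv(path = ".env", env: Record<string, string | undefined> = process.env): void {
  if (!existsSync(path)) return;
  for (const [key, value] of Object.entries(parseDotEnv(readFileSync(path, "utf8")))) {
    if (env[key] === undefined) env[key] = value;
  }
}
```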
Register the server in the AI assistant
In the AI assistant you use (such as Claude Desktop or Codex CLI), add MCP Vision Relay as an MCP server. Note: the registration command must call the entry file directly rather than going through an npm script. Over MCP's stdio transport, stdout carries the protocol messages, so any extra output the script prints can interfere with communication.
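Claude Desktop lists MCP servers under an `mcpServers` key in `claude_desktop_config.json`, each with a `command`, `args`, and optional `env`. The helper below builds such an entry that launches the entry file with `node` directly. The server name `vision-relay` and any `dist/index.js` path are placeholders; use the actual path of the built entry file. The `log` function shows the matching server-side rule: diagnostics go to stderr, because stdout is reserved for the protocol stream.

```typescript
// Shape of one server entry under "mcpServers" in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Builds the value for the "mcpServers" key. Launching `node <entry file>` directly keeps
// npm's own output out of the stdio stream.
function buildClaudeDesktopEntry(entryFile: string, env?: Record<string, string>): Record<string, McpServerEntry> {
  const entry: McpServerEntry = { command: "node", args: [entryFile] };
  if (env && Object.keys(env).length > 0) entry.env = env;
  // "vision-relay" is a placeholder server name; any unique key works.
  return { "vision-relay": entry };
}

// Server-side counterpart: write diagnostics to stderr, never stdout, because over the
// stdio transport stdout carries only JSON-RPC messages.
function log(message: string): void {
  console.error(`[vision-relay] ${message}`);
}
```

Serializing `{ mcpServers: buildClaudeDesktopEntry("/absolute/path/to/dist/index.js") }` with `JSON.stringify(value, null, 2)` yields a snippet you can merge into the config file by hand.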
Start using
After successful registration, in the conversation interface of the AI assistant, you should be able to see the newly added image analysis tools (such as `gemini_analyze_image`). You can then ask the AI assistant to use these tools to analyze pictures in the conversation.
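When the assistant decides to use one of these tools, the client sends the server a JSON-RPC `tools/call` request. The sketch builds one for `gemini_analyze_image`; the `image` and `prompt` argument names are assumptions, and the real schema is whatever the server advertises in its `tools/list` response.

```typescript
// HYPOTHETICAL argument names; the server's tools/list schema is authoritative.
interface AnalyzeImageArgs {
  image: string; // local path, http(s) URL, or base64 string
  prompt?: string; // question or instruction for the vision model
}

interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: AnalyzeImageArgs };
}

function buildAnalyzeRequest(
  id: number,
  tool: "gemini_analyze_image" | "qwen_analyze_image",
  args: AnalyzeImageArgs,
): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

const request = buildAnalyzeRequest(1, "gemini_analyze_image", {
  image: "./screenshots/error-dialog.png",
  prompt: "What does this error mean and what usually causes it?",
});

// Over the stdio transport, each message is one line of JSON terminated by a newline.
const line = JSON.stringify(request) + "\n";
```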

Usage examples

Analyze technical charts
You took a screenshot of a system architecture diagram and want the AI assistant to explain the components and workflow in it.
Explain error screenshots
An error dialog popped up during program execution. After taking a screenshot, you want to know the specific meaning and possible causes of this error.
Describe photo content
You have a landscape photo taken during a trip and want the AI assistant to generate a beautiful description.

Frequently Asked Questions

I already have an AI that can analyze images. Why do I still need this tool?
What should I do if I encounter a 'command not found' error during installation?
I failed to add the server in Claude Desktop and got a handshake error. What should I do?
The tool was called successfully, but I got an error saying 'picture too large' or 'format not supported'. What should I do?
Does it support other models besides Gemini and Qwen?

Related resources

Model Context Protocol official documentation
Understand the standards and specifications of the MCP protocol.
Google Gemini CLI project homepage
Get installation, configuration, and usage instructions for Gemini CLI.
Qwen Code (CLI) NPM page
Get installation and usage information for Qwen CLI.
MCP Vision Relay project code repository
Get the latest source code of this project, report issues, or participate in contributions.

Alternatives

Vestige
Vestige is an AI memory engine based on cognitive science. It implements 29 neuroscience modules, such as prediction error gating, FSRS-6 spaced repetition, and memory dreaming, to give AI long-term memory. It includes a 3D visualization dashboard and 21 MCP tools, and runs entirely locally without the cloud.
Rust
9.2K
4.5 points
Moltbrain
MoltBrain is a long-term memory layer plugin designed for OpenClaw, MoltBook, and Claude Code, capable of automatically learning and recalling project context, providing intelligent search, observation recording, analysis statistics, and persistent storage functions.
TypeScript
8.7K
4.5 points
Bm.md
A feature-rich Markdown typesetting tool that supports multiple style themes and platform adaptation, providing real-time editing preview, image export, and API integration.
TypeScript
14.6K
5 points
Security Detections MCP
Security Detections MCP is a Model Context Protocol server that lets LLMs query a unified security detection rule database covering Sigma, Splunk ESCU, Elastic, and KQL formats. Version 3.0 turns it into an autonomous detection engineering platform that can extract TTPs from threat intelligence, analyze coverage gaps, generate detection rules in SIEM-native formats, and run and verify tests. The project includes over 71 tools, 11 pre-built workflow prompts, and a knowledge graph system, and supports multiple SIEM platforms.
TypeScript
7.7K
4 points
Paperbanana
Python
8.8K
5 points
Better Icons
An MCP server and CLI tool that provides search and retrieval of over 200,000 icons, supports more than 150 icon libraries, and helps AI assistants and developers quickly obtain and use icons.
TypeScript
10.4K
4.5 points
Assistant Ui
assistant-ui is an open-source TypeScript/React library for quickly building production-grade AI chat interfaces. It provides composable UI components, streaming responses, and accessibility support, and works with multiple AI backends and models.
TypeScript
8.6K
5 points
Apify MCP Server
The Apify MCP Server is a tool based on the Model Context Protocol (MCP) that allows AI assistants to extract data from websites such as social media, search engines, and e-commerce through thousands of ready-to-use crawlers, scrapers, and automation tools (Apify Actors). It supports OAuth and Skyfire proxy payment and can be integrated into MCP clients such as Claude and VS Code through HTTPS endpoints or local stdio.
TypeScript
9.4K
5 points
Gitlab MCP Server
Certified
The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.
TypeScript
26.7K
4.3 points
Duckduckgo MCP Server
Certified
The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.
Python
79.3K
4.3 points
Markdownify MCP
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown format.
TypeScript
38.6K
5 points
Notion Api MCP
Certified
A Python-based MCP Server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.
Python
24.4K
4.5 points
Figma Context MCP
Framelink Figma MCP Server gives AI coding tools (such as Cursor) access to Figma design data. By simplifying Figma API responses, it helps AI achieve one-click design-to-code conversion more accurately.
TypeScript
71.0K
4.5 points
Unity
Certified
UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real-time state monitoring, remote command execution, and logging.
C#
37.9K
5 points
Minimax MCP Server
MiniMax MCP is the official MiniMax Model Context Protocol server. It supports interaction with powerful text-to-speech and video/image generation APIs and works with client tools such as Claude Desktop and Cursor.
Python
56.6K
4.8 points
Gmail MCP Server
A Gmail MCP server with automatic authentication, designed for Claude Desktop. It supports managing Gmail through natural language, including sending emails, label management, and batch operations.
TypeScript
23.6K
4.5 points
AIBase
Zhiqi Future, Your AI Solution Think Tank
© 2026 AIBase