MCP Vision Relay

MCP Vision Relay is an MCP server that gives text-only MCP clients such as Claude and Codex image analysis capabilities. It wraps locally installed Gemini and Qwen command-line tools, allowing them to process images supplied as local paths, URLs, or Base64-encoded data.
6.2K · 2.5 points

What is MCP Vision Relay?

MCP Vision Relay is a bridge that lets AI assistants without native image analysis (such as Claude and Codex) analyze images by calling locally installed multimodal AI tools. It wraps these tools in a standardized MCP server, so your AI assistant can use image analysis as if it were a built-in capability.

How to use MCP Vision Relay?

Usage takes three steps: 1) install and configure Gemini CLI or Qwen CLI on your computer; 2) install and run the MCP Vision Relay server; 3) register the server in your AI assistant (such as Claude Desktop). Once that is done, you can ask the assistant to analyze images directly in conversation.

Applicable scenarios

This tool is useful when the AI assistant you are using (for example, Claude or Codex accessed through certain service providers) has no image analysis capability. It restores multimodal functionality at low cost, without switching providers: for example, analyzing code in a screenshot, explaining a chart, or describing the scene in a photo.

Main features

Unified image analysis tools
Provides two tools, `gemini_analyze_image` and `qwen_analyze_image`, which accept images in three forms: a local file path, an online image URL, or a Base64-encoded string.
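As a rough illustration, the arguments these tools accept might look like the TypeScript shape below; the field names are hypothetical, so check the project's actual tool schema before relying on them.

```typescript
// Hypothetical argument shape for gemini_analyze_image / qwen_analyze_image.
// Field names are illustrative only; the project's actual schema may differ.
interface AnalyzeImageArgs {
  image: string;    // local file path, http(s) URL, or Base64-encoded data
  prompt?: string;  // optional instruction about the image
  model?: string;   // optional override of the provider's default model
}

// The three supported ways of passing an image:
const examples: AnalyzeImageArgs[] = [
  { image: "/home/user/screenshots/error.png", prompt: "Explain this error dialog" },
  { image: "https://example.com/architecture.png", prompt: "Describe the components" },
  { image: "iVBORw0KGgoAAAANSUhEUgAA...", prompt: "What does this diagram show?" },
];
```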
Multi-provider relay architecture
An abstract provider layer makes it easy to switch between or add command-line tools (CLIs) while retaining control over core settings such as model selection and output format.
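A minimal sketch of what such a provider layer could look like, assuming each provider wraps one CLI process; the interface, flags, and file-reference syntax below are illustrative, not the project's actual code.

```typescript
import { spawn } from "node:child_process";

// Hypothetical provider abstraction: each provider knows how to invoke one CLI.
interface VisionProvider {
  name: string;          // e.g. "gemini" or "qwen"
  defaultModel: string;
  analyze(imagePath: string, prompt: string, model?: string): Promise<string>;
}

// Illustrative Gemini-backed provider. The real project may construct the command
// differently (sandbox mode, timeouts, extra arguments, etc.).
const geminiProvider: VisionProvider = {
  name: "gemini",
  defaultModel: "gemini-2.5-flash", // assumed default; configurable in practice
  analyze(imagePath, prompt, model) {
    return new Promise((resolve, reject) => {
      // "-p" comes from the verification command shown later; "-m" and the
      // "@file" attachment syntax are assumptions about the Gemini CLI here.
      const args = ["-m", model ?? this.defaultModel, "-p", `${prompt}\n@${imagePath}`];
      const child = spawn("gemini", args);
      let out = "";
      child.stdout.on("data", (chunk) => (out += chunk));
      child.on("error", reject);
      child.on("close", (code) =>
        code === 0 ? resolve(out.trim()) : reject(new Error(`gemini exited with code ${code}`))
      );
    });
  },
};
```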
Robust input handling
Automatically checks image size and file format. When needed, it downloads remote images or decodes Base64 data into a temporary file and cleans it up after use, keeping resource handling safe and predictable.
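A simplified sketch of that normalization flow, assuming Node.js 18+ (built-in `fetch`); the actual size limits, format checks, and temp-file handling in the project may differ.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative size cap; the project's actual limit may differ.
const MAX_BYTES = 10 * 1024 * 1024;

// Normalize a URL or Base64 input into a temporary local file, run the analysis,
// then always clean up the temporary directory afterwards.
async function withLocalImage(
  source: string,
  analyze: (localPath: string) => Promise<string>
): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "vision-relay-"));
  try {
    let bytes: Buffer;
    if (/^https?:\/\//.test(source)) {
      const res = await fetch(source);                  // Node 18+ built-in fetch
      if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
      bytes = Buffer.from(await res.arrayBuffer());
    } else {
      bytes = Buffer.from(source, "base64");            // treat anything else as Base64
    }
    if (bytes.length > MAX_BYTES) throw new Error("Image exceeds the size limit");
    const localPath = join(dir, "image");
    await writeFile(localPath, bytes);
    return await analyze(localPath);
  } finally {
    await rm(dir, { recursive: true, force: true });    // cleanup, even on failure
  }
}
```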
Highly configurable execution
Supports sandbox mode, timeouts, extra command-line arguments, default-model overrides, and more, all configurable through environment variables or a `.env` file.
Structured and actionable outputs
Organizes the underlying CLI's output and attaches metadata (such as the model used, analysis time, and image source), making it easy for AI assistants to display results or process them further.
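For instance, each result could bundle the CLI's text output with metadata along these lines; the exact field names are hypothetical.

```typescript
// Hypothetical result envelope attached to each analysis; field names are illustrative.
interface AnalysisResult {
  provider: "gemini" | "qwen";        // which backend produced the answer
  model: string;                      // model actually used
  source: "file" | "url" | "base64";  // where the image came from
  durationMs: number;                 // wall-clock analysis time
  text: string;                       // the CLI's description or answer
}
```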

Advantages

Low-cost capability expansion: No need to upgrade to a more expensive vision-enabled AI plan; free or low-cost local CLI tools do the work.
Seamless integration: Once registered in a supported AI assistant (such as Claude Desktop), the image analysis tools appear in the tool list like native functions.
Flexible selection: Multiple backends (Gemini, Qwen) are supported; choose the one that best fits your needs, model performance, or cost.
Controllable privacy: Image analysis runs through the locally invoked CLI, so you can see how your data is sent to the corresponding service provider.

Limitations

Local environment dependency: The corresponding CLI tools must be pre-installed, correctly configured, and authenticated on your computer.
Extra steps: Compared with an AI assistant's built-in vision features, additional installation and configuration are required.
Performance depends on the backend: Analysis speed and quality depend on the CLI tool you choose and the underlying AI model.
Indirect invocation: The AI assistant does not natively understand the image; the task is relayed to another tool, which may feel less seamless than native integration in complex interactive scenarios.

How to use

Environment preparation
Make sure Node.js (version 18 or higher) is installed on your computer. Then install and configure Google Gemini CLI or Qwen CLI, whichever you prefer. Verify that running `gemini -p "hi"` or `qwen -p "hi"` directly in a terminal returns a result; this confirms the CLI is installed and authorized correctly.
Install and build MCP Vision Relay
Download or clone the MCP Vision Relay project, enter the project directory, install dependencies, and build the project.
Configuration (optional)
Copy the project's `.env.example` file to `.env` and adjust the settings as needed, such as the default model and timeout. If you keep the CLI's default installation and configuration, you can skip this step.
Register the server in the AI assistant
In the AI assistant you are using (such as Claude Desktop or the Codex CLI), add MCP Vision Relay as an MCP server. Note: the registration command must call the entry file directly rather than go through an npm script, so that extra output does not interfere with MCP communication.
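For Claude Desktop, registration usually goes in `claude_desktop_config.json`. The server name and entry-file path below are placeholders; point `args` at the project's actual built entry file.

```json
{
  "mcpServers": {
    "mcp-vision-relay": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-vision-relay/dist/index.js"]
    }
  }
}
```

Restart Claude Desktop after editing the file so the new server is picked up.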
Start using
After registration succeeds, the new image analysis tools (such as `gemini_analyze_image`) should appear in the assistant's tool list, and you can ask the assistant to use them to analyze images in conversation.

Usage examples

Analyze technical charts
You took a screenshot of a system architecture diagram and want the AI assistant to explain its components and workflow.
Explain error screenshots
A program popped up an error dialog during execution; after taking a screenshot, you want to know what the error means and its likely causes.
Describe photo content
You have a landscape photo from a trip and want the AI assistant to write an evocative description of it.

Frequently Asked Questions

I already have an AI that can analyze images. Why do I still need this tool?
What should I do if I encounter a 'command not found' error during installation?
I failed to add the server in Claude Desktop and got a handshake error. What should I do?
The tool was called successfully, but I got an error saying 'picture too large' or 'format not supported'. What should I do?
Does it support other models besides Gemini and Qwen?

Related resources

Model Context Protocol official documentation
Understand the standards and specifications of the MCP protocol.
Google Gemini CLI project homepage
Get installation, configuration, and usage instructions for Gemini CLI.
Qwen Code (CLI) NPM page
Get installation and usage information for Qwen CLI.
MCP Vision Relay project code repository
Get the latest source code of this project, report issues, or participate in contributions.

Alternatives

Rsdoctor
Rsdoctor is a build analysis tool specifically designed for the Rspack ecosystem, fully compatible with webpack. It provides visual build analysis, multi-dimensional performance diagnosis, and intelligent optimization suggestions to help developers improve build efficiency and engineering quality.
TypeScript · 6.7K · 5 points

Next Devtools MCP
The Next.js development tools MCP server provides Next.js development tools and utilities for AI programming assistants such as Claude and Cursor, including runtime diagnostics, development automation, and documentation access.
TypeScript · 9.2K · 5 points

Testkube
Testkube is a test orchestration and execution framework for cloud-native applications, providing a unified platform to define, run, and analyze tests. It supports existing testing tools and Kubernetes infrastructure.
Go · 4.9K · 5 points

MCP Windbg
An MCP server that integrates AI models with WinDbg/CDB for analyzing Windows crash dump files and remote debugging, supporting natural language interaction to execute debugging commands.
Python · 9.2K · 5 points

Runno
Runno is a collection of JavaScript toolkits for securely running code in multiple programming languages in environments such as browsers and Node.js. It achieves sandboxed execution through WebAssembly and WASI, supports languages such as Python, Ruby, JavaScript, SQLite, and C/C++, and provides integration methods such as web components and MCP servers.
TypeScript · 6.1K · 5 points

Praisonai
PraisonAI is a production-ready multi-AI agent framework with self-reflection capabilities, designed to create AI agents that automate the solution of problems ranging from simple tasks to complex challenges. It simplifies the construction and management of multi-agent LLM systems by integrating PraisonAI agents, AG2, and CrewAI into a low-code solution, emphasizing simplicity, customization, and effective human-machine collaboration.
Python · 5.9K · 5 points

Netdata
Netdata is an open-source real-time infrastructure monitoring platform that provides second-level metric collection, visualization, machine learning-driven anomaly detection, and automated alerts. It achieves full-stack monitoring without complex configuration.
Go · 8.1K · 5 points

MCP Server
The Mapbox MCP Server is a Model Context Protocol server implemented in Node.js, providing AI applications with access to Mapbox geospatial APIs, including geocoding, point-of-interest search, route planning, isochrone analysis, and static map generation.
TypeScript · 6.6K · 4 points

Notion Api MCP (Certified)
A Python-based MCP server that provides advanced to-do list management and content organization functions through the Notion API, enabling seamless integration between AI models and Notion.
Python · 18.6K · 4.5 points

Markdownify MCP
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown.
TypeScript · 30.6K · 5 points

Duckduckgo MCP Server (Certified)
The DuckDuckGo Search MCP Server provides web search and content scraping services for LLMs such as Claude.
Python · 62.6K · 4.3 points

Gitlab MCP Server (Certified)
The GitLab MCP server is a project based on the Model Context Protocol that provides a comprehensive toolset for interacting with GitLab accounts, including code review, merge request management, CI/CD configuration, and other functions.
TypeScript · 21.2K · 4.3 points

Unity (Certified)
UnityMCP is a Unity editor plugin that implements the Model Context Protocol (MCP), providing seamless integration between Unity and AI assistants, including real-time state monitoring, remote command execution, and logging.
C# · 26.4K · 5 points

Figma Context MCP
Framelink Figma MCP Server provides access to Figma design data for AI programming tools (such as Cursor). By simplifying the Figma API response, it helps AI more accurately achieve one-click conversion from design to code.
TypeScript · 57.5K · 4.5 points

Gmail MCP Server
A Gmail auto-authentication MCP server designed for Claude Desktop, supporting Gmail management through natural language interaction, including sending emails, label management, batch operations, and more.
TypeScript · 19.5K · 4.5 points

Context7
Context7 MCP is a service that provides real-time, version-specific documentation and code examples for AI programming assistants. It is integrated directly into prompts through the Model Context Protocol to solve the problem of LLMs using outdated information.
TypeScript · 83.0K · 4.7 points