MCP Image Recognition

An MCP server that provides image recognition using the Anthropic and OpenAI vision APIs. Its features include image description, support for multiple image formats, configurable primary and backup providers, and optional OCR text extraction.


Last updated: 2025-04-29

What is the MCP Image Recognition Server?

The MCP Image Recognition Server analyzes images and generates detailed text descriptions of them. It uses vision models from Anthropic (Claude) and OpenAI (GPT-4) to interpret the content, scenes, and objects in an image.

How to use the MCP Image Recognition Server?

Getting started takes four steps:
1) Install the required software (Python, and optionally Tesseract for OCR).
2) Configure your API keys.
3) Start the server with a single command.
4) Provide an image file path or Base64-encoded image data to get a description.

Applicable scenarios

Suitable for scenarios that require automatic analysis of image content, such as:
- Providing image descriptions for people with visual impairments.
- Automatically tagging social media content.
- Analyzing e-commerce product images.
- Extracting content from document images.

Main features

Support for multiple AI providers

Supports both the Anthropic Claude and OpenAI GPT-4 Vision APIs; one can be configured as the primary provider and the other as a backup.

Support for multiple formats

Compatible with common image formats: JPEG, PNG, GIF, and WebP.

Optical Character Recognition (OCR)

Optionally uses the Tesseract OCR engine to extract text from images.

Flexible input methods

Accepts either an image file path or Base64-encoded image data.
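The primary/backup option can be pictured as an ordered fallback: try the primary provider and, if it raises an error, retry with the backup. The sketch below uses stub functions in place of real API clients; the names `describe_with_fallback`, `flaky_primary`, and `backup` are hypothetical and do not come from the project's code.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical model of a provider: a function that takes image bytes and returns a description.
Provider = Callable[[bytes], str]


def describe_with_fallback(image: bytes, providers: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, description) from the first that succeeds."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            return name, provider(image)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers: the primary always fails, so the backup answers.
def flaky_primary(image: bytes) -> str:
    raise TimeoutError("primary timed out")


def backup(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


print(describe_with_fallback(b"\x89PNG", [("anthropic", flaky_primary), ("openai", backup)]))
# prints ('openai', 'an image of 4 bytes')
```

Collecting the error from each attempt means that, if every provider fails, the final exception explains why each one did.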

Advantages and limitations

Advantages

One-click deployment, easy to use.

Supports multiple AI providers with a backup option, improving reliability.

Open source and free, so it can be freely customized.

Capable of generating detailed image descriptions, going beyond simple label recognition.

Limitations

Requires an API key, and calls to the providers' APIs may incur fees.

The OCR feature requires a separate Tesseract installation.

Processing very high-resolution images may be slow.

How to use

Installation preparation

Make sure Python 3.8 or higher is installed. If you need OCR, also install Tesseract.

Get the code

Clone the project repository to the local machine.

Configure the environment

Copy the example environment file and fill in your API key.

Start the server

Start the image recognition service with a single command.

Usage examples

Social media image analysis

Automatically generate alternative text (alt text) for images uploaded to social media.

Text extraction from document images

Extract text from scanned document images.

Frequently Asked Questions

Is it free?

Which languages are supported for text recognition?

How long does it usually take to process an image?

How to switch AI providers?

Related resources

Project GitHub repository

Get the latest source code and submit issues.

Anthropic API documentation

Understand the detailed functions of the Claude Vision API.

OpenAI Vision guide

Usage guide for the GPT-4 Vision API.

Tesseract OCR installation guide

How to install and configure Tesseract OCR.

🚀 MCP Image Recognition Server

An MCP server that uses Anthropic and OpenAI Vision APIs to provide image recognition capabilities. Version 0.1.2.

🚀 Quick Start

The MCP Image Recognition Server leverages Anthropic and OpenAI Vision APIs to offer robust image recognition features. It supports multiple image formats and provides flexible configuration options.

✨ Features

Use Anthropic Claude Vision or OpenAI GPT-4 Vision for image description.
Support multiple image formats (JPEG, PNG, GIF, WebP).
Configurable primary and backup providers.
Accept image input as Base64-encoded data or as a file path.
Optional Tesseract OCR text extraction feature.
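One way to recognize the four supported formats is to check a file's leading "magic bytes" instead of trusting its extension. This is a minimal sketch; the README does not say how the server itself validates formats, and `detect_image_mime` is a hypothetical helper.

```python
from typing import Optional


def detect_image_mime(data: bytes) -> Optional[str]:
    """Return the MIME type for JPEG, PNG, GIF, or WebP data, or None if unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


print(detect_image_mime(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # prints image/png
print(detect_image_mime(b"not an image"))  # prints None
```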

📦 Installation

Dependencies

Python 3.8 or higher.
Tesseract OCR (optional) for text extraction:
- Windows: Download and install from [UB-Mannheim/tesseract](https://github.com/UB-Mannheim/tesseract/wiki).
- Linux (Debian/Ubuntu): sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
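Because OCR is optional, it helps to confirm that Tesseract is actually on your PATH after installing it. The sketch below calls the `tesseract` command-line tool directly (`tesseract <image> stdout` prints the recognized text instead of writing a file). The README does not say whether the server uses the CLI or a Python wrapper, and the helper names here are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable can be found on PATH."""
    return shutil.which("tesseract") is not None


def tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the CLI call; `stdout` as the output base prints text instead of writing a file."""
    # `-l` selects the language data, which must be installed alongside Tesseract.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the extracted text."""
    result = subprocess.run(
        tesseract_command(image_path, lang), capture_output=True, text=True, check=True
    )
    return result.stdout


print(tesseract_available())
print(tesseract_command("scan.png"))  # prints ['tesseract', 'scan.png', 'stdout', '-l', 'eng']
```

If `tesseract_available()` prints `False`, revisit the installation step for your platform. `scan.png` is a placeholder path.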

Steps

Clone the repository:

git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition

Create and configure the environment file:

cp .env.example .env
# Edit the .env file to set API keys and preferences.
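The `.env` file is assumed to use plain `KEY=VALUE` lines, as most `.env` files do. The stdlib sketch below shows how such a file is read; the project may load it differently (for example with a library), and `parse_env` is a hypothetical helper. The key values are placeholders.

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


example = """
# API keys (placeholder values)
ANTHROPIC_API_KEY=sk-ant-placeholder
OPENAI_API_KEY="sk-placeholder"
"""
print(parse_env(example))
# prints {'ANTHROPIC_API_KEY': 'sk-ant-placeholder', 'OPENAI_API_KEY': 'sk-placeholder'}
```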

Build the project (Windows batch script):

build.bat

💻 Usage Examples

Running the Server

Start the server using Python:

python -m image_recognition_server.server
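To use the server from an MCP client, the client needs to know how to launch it. Clients such as Claude Desktop read a JSON config with an `mcpServers` map of `command` and `args` entries; the sketch below generates such an entry for the command above. The server name `image-recognition` is hypothetical, and the sketch assumes the `image_recognition_server` package is installed in (or otherwise importable from) the interpreter it points at.

```python
import json
import sys


def mcp_client_config(server_name: str = "image-recognition") -> dict:
    """Build an mcpServers entry that launches the server with `python -m`."""
    return {
        "mcpServers": {
            server_name: {
                # Use the current interpreter so the server runs in the same environment.
                "command": sys.executable,
                "args": ["-m", "image_recognition_server.server"],
            }
        }
    }


print(json.dumps(mcp_client_config(), indent=2))
```

The printed entry can be merged into the client's config file; API keys can still come from the `.env` file described above.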

Start the server via the batch script:

run.bat server

Run the server in development mode (with MCP inspector):

run.bat debug

Available Tools

describe_image
- Input: Base64-encoded image data and MIME type.
- Output: Detailed description of the image.
describe_image_from_file
- Input: Image file path.
- Output: Detailed description of the image.
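`describe_image` expects Base64-encoded image data plus a MIME type. The sketch below prepares both from a file on disk, mapping the file extension to one of the supported formats. The argument names `image` and `mime_type` are assumptions; check the tool's schema (for example in the MCP inspector) for the real names.

```python
from __future__ import annotations

import base64
import tempfile
from pathlib import Path

# Extension-to-MIME map for the formats the server lists as supported.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def describe_image_arguments(path: str | Path) -> dict[str, str]:
    """Read an image file and return its Base64 data plus its MIME type."""
    p = Path(path)
    mime = MIME_BY_SUFFIX.get(p.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image type: {p.suffix!r}")
    data = base64.b64encode(p.read_bytes()).decode("ascii")
    # Hypothetical key names; the real tool schema may differ.
    return {"image": data, "mime_type": mime}


# Usage: write the 8-byte PNG signature to a temporary file and encode it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "pixel.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    args = describe_image_arguments(sample)
    print(args["mime_type"])  # prints image/png
    print(args["image"])  # prints iVBORw0KGgo=
```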

📚 Documentation

Environment Configuration

ANTHROPIC_API_KEY: Anthropic API key.
OPENAI_API_KEY: OpenAI API key.
IMAGE_SIZE: Size of the processed image (default is "256x256").
MAX_ITERATIONS: Maximum number of iterations (default is 100).
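The settings above can be read from the process environment with their documented defaults. In this sketch, splitting `IMAGE_SIZE` into a width and height in pixels is an assumption about how the value is interpreted, and the returned dictionary keys are hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the documented settings, falling back to the documented defaults."""
    size_text = env.get("IMAGE_SIZE", "256x256")
    # Assumption: IMAGE_SIZE has the form "WIDTHxHEIGHT".
    width_text, sep, height_text = size_text.lower().partition("x")
    if not sep:
        raise ValueError(f"IMAGE_SIZE must look like WIDTHxHEIGHT, got {size_text!r}")
    return {
        "anthropic_api_key": env.get("ANTHROPIC_API_KEY"),
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "image_size": (int(width_text), int(height_text)),
        "max_iterations": int(env.get("MAX_ITERATIONS", "100")),
    }


print(load_settings({}))
# prints {'anthropic_api_key': None, 'openai_api_key': None, 'image_size': (256, 256), 'max_iterations': 100}
```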

OpenRouter Instructions

If you use OpenRouter, refer to its documentation for configuration details.
Make sure your OpenRouter API key and the correct model endpoint are set.

Default Models

Anthropic's Claude Vision model.
OpenAI's GPT-4 Vision model.

🔧 Technical Details

Testing

Run all tests:

run.bat test

Debugging

Run the server in debug mode:

run.bat debug

📄 Release History

Version 0.1.2:
- Fixed known compatibility issues.
- Improved OCR error handling and added comprehensive test coverage for the OCR feature.
Version 0.1.1:
- Initial release with basic features and documentation.