OpenAI OCR MCP

An OCR service based on OpenAI's vision model, integrated with Cursor IDE to automatically extract text from images and save it.

Categories: Developer tools, Image and video processing. Tags: #OCR recognition, #AI vision, #Text extraction, #Automatic saving. Language: TypeScript

Rating: 2 points

Downloads: 16

Update time: 2025-04-29

What is the OpenAI OCR MCP Server?

This is an OCR server based on OpenAI's vision model that extracts text from images in several common formats. It is designed for developers and integrates with Cursor IDE, so image text recognition is available directly in the editor.

How to use the OCR service?

Simply select the image to be recognized in Cursor IDE, and the server will automatically process it and return the extracted text content, while generating a corresponding text file. The whole process is simple and fast, without complex configuration.

Applicable scenarios

Suitable for scenarios where text content needs to be extracted from screenshots, scanned documents, photos, or charts, such as document digitization, data entry, and converting code screenshots to text.

Main features

Intelligent text extraction: Uses OpenAI's GPT-4.1-mini vision model to recognize text across a variety of fonts, layouts, and backgrounds.

Automatic file management: Automatically generates a text file paired with each image, named with a content hash to keep the files associated and to track versions.

Multi-format support: Supports common image formats such as JPG, PNG, GIF, and WebP, covering images from different sources.

Seamless IDE integration: Optimized for Cursor IDE; the OCR function can be invoked through the command panel.

Advantages and limitations

Advantages

High-precision text recognition, capable of accurately extracting text even from complex layouts

Automated workflow, reducing manual operation steps

Intelligent file naming system, facilitating management and retrieval

Deep integration with the development environment, improving work efficiency

Limitations

Depends on the OpenAI API and requires an internet connection

The size of a single file is limited to 5MB

Does not support handwritten text recognition

Processing a large number of images may take a long time

How to use

Installation preparation

Ensure that the Node.js environment is installed, clone the project repository, and install the dependencies

Configure the API key

Create a .env file in the project root directory and add your OpenAI API key

Start the service

Build the project and start the OCR service

Use in Cursor

Configure the MCP server address in the settings of Cursor IDE and invoke the OCR function through the command panel

Usage examples

Convert code screenshots to text: Convert the code snippets in a screenshot to an editable text format

Document digitization: Extract the text from scanned PDFs or images into searchable text

Data table extraction: Extract structured data from a table in an image

Related resources

OpenAI API Documentation

OpenAI's official API usage documentation

Cursor IDE Official Website

The official website of Cursor IDE

GitHub Repository

Project source code and issue tracking

Installation Tutorial Video

Step-by-step installation and configuration video guide

🚀 OpenAI OCR MCP Server

An MCP (Model Context Protocol) server that provides Optical Character Recognition (OCR) using OpenAI's vision models. This server integrates with Cursor IDE to extract text from images.

🚀 Quick Start

The OpenAI OCR MCP Server is designed to work in tandem with Cursor IDE. With this server, you can effortlessly extract text from images and save it as text files.

✨ Features

Image Text Extraction: Uses OpenAI's GPT-4.1-mini model to extract text from various image formats.
Automatic Text File Creation: Automatically saves the extracted text alongside the source image.
Content-Based File Naming: Organizes files using content hashes for efficient management.
Support for Multiple Image Formats: Compatible with JPG, PNG, GIF, and WebP formats.
Robust Error Handling: Comprehensive validation and error reporting.
Detailed Logging: Debug-friendly logging for easy troubleshooting.

🔧 Technical Details

Vision Model

Uses OpenAI's GPT-4.1-mini model.
Optimized for text extraction from images.
Supports high-detail image analysis.
Processes images via OpenAI's vision API.
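As a rough illustration of the points above, the sketch below builds a request body for one image. It assumes the server talks to OpenAI's Chat Completions endpoint with a base64 data URL and `detail: "high"`; the README does not say which endpoint or prompt the server actually uses, and `buildOcrRequest` and the prompt text are hypothetical names chosen for this example.

```typescript
// Hypothetical helper: builds a Chat Completions request body for one image.
// Assumption: the image is sent inline as a base64 data URL with high detail.
export function buildOcrRequest(imageBase64: string, mimeType: string) {
  return {
    model: "gpt-4.1-mini",
    messages: [
      {
        role: "user",
        content: [
          // The prompt wording is an assumption, not taken from the project.
          { type: "text", text: "Extract all text from this image. Return only the text." },
          {
            type: "image_url",
            image_url: {
              url: `data:${mimeType};base64,${imageBase64}`,
              detail: "high", // request high-detail image analysis
            },
          },
        ],
      },
    ],
  };
}
```

A body like this would typically be sent as JSON in a POST request to the Chat Completions endpoint, with the API key in the `Authorization: Bearer` header.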

File Handling

Automatically creates text files.
Generates content hashes.
Supports multiple image formats.
Built-in file size validation.

📦 Installation

Clone the repository.
Install dependencies:
```
npm install
```
Build the TypeScript code:
```
npm run build
```
Set the OpenAI API key in the .env file:
```
OPENAI_API_KEY=your_api_key_here
```

💻 Usage Examples

In Cursor IDE

Configure the MCP server in Cursor settings.
Use the OCR tool through Cursor's command panel.
Select the image file to process.
The extracted text will:
- Be displayed in Cursor.
- Be saved as a text file along with the image.

Text File Output

For each processed image, the server creates a text file following this naming convention:

{Original Image Name}-{Content Hash}.txt

Example:

Input image: document.jpg
Output file: document-a1b2c3d4.txt

The content_hash is an 8-character hash generated from the extracted text, ensuring:

Unique file names for different text contents.
Easy matching between the source image and the extracted text.
Version tracking when the same image yields different results.
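The naming convention above could be implemented along these lines. This is only a sketch: the README does not name the hash algorithm, so SHA-256 truncated to 8 hexadecimal characters is an assumption, and `contentHash` and `outputPathFor` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Assumption: SHA-256 truncated to 8 hex characters. The server's real
// hash function may differ.
export function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex").slice(0, 8);
}

// Builds "<image base name>-<content hash>.txt" next to the source image.
export function outputPathFor(imagePath: string, extractedText: string): string {
  const { dir, name } = path.parse(imagePath);
  return path.join(dir, `${name}-${contentHash(extractedText)}.txt`);
}
```

Because the hash is derived from the extracted text rather than from the image bytes, running OCR twice on the same image produces a second file only when the recognized text changes.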

Supported Image Formats

JPEG/JPG
PNG
GIF
WebP

File Size Limit

Maximum file size: 5MB.
Files exceeding this limit will be rejected with an error message.
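A minimal sketch of the format and size checks described above, assuming the check is done on the file extension and that 5MB means 5 × 1024 × 1024 bytes. Neither detail is confirmed by the README, and `validateImage` is a hypothetical name.

```typescript
import * as path from "node:path";

// Assumption: "5MB" is interpreted as 5 MiB.
const MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024;
const SUPPORTED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".gif", ".webp"]);

// Returns an error message, or null when the file passes both checks.
export function validateImage(filePath: string, sizeInBytes: number): string | null {
  const ext = path.extname(filePath).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return `Unsupported image format: ${ext || "(none)"}`;
  }
  if (sizeInBytes > MAX_FILE_SIZE_BYTES) {
    return `File too large: ${sizeInBytes} bytes (limit ${MAX_FILE_SIZE_BYTES})`;
  }
  return null;
}
```

In practice the size would come from something like `fs.statSync(filePath).size`, checked before the image is read and sent to the API.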

📚 Documentation

Error Handling

The server provides detailed error messages for common issues:

Invalid image formats.
File size exceeding the limit.
File access problems.
API key issues.
Text extraction failures.

Development

Build from Source

npm run build

Run Tests

npm test

Debugging

The server offers detailed logging, including:

API key validation.
File processing steps.
Text extraction results.
File saving operations.

Environment Variables

OPENAI_API_KEY: Your OpenAI API key (required).
Supports standard (sk-...) and project-specific (sk-proj-...) API keys.
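A startup check for this variable might look like the sketch below. It is an assumption about how validation could work, not the server's actual code: `readApiKey` is a hypothetical helper, and it only checks the `sk-` prefix that both key styles share.

```typescript
// Hypothetical helper: reads and sanity-checks OPENAI_API_KEY.
// The env parameter defaults to process.env but can be injected for testing.
export function readApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.OPENAI_API_KEY?.trim();
  if (!key) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  // Both standard (sk-...) and project (sk-proj-...) keys start with "sk-".
  if (!key.startsWith("sk-")) {
    throw new Error("OPENAI_API_KEY does not look like an OpenAI key (expected an sk- prefix)");
  }
  return key;
}
```

A prefix check like this catches copy-paste mistakes early, but only a real API call can confirm that the key is valid.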

Contribution

Contributions are welcome! Feel free to submit pull requests.

📄 License

MIT License
