MCP Server Whisper

MCP Server Whisper is an audio processing server built on OpenAI's Whisper and GPT-4o models. It provides advanced audio transcription, format conversion, batch processing, and text-to-speech, and lets AI assistants use these functions through the Model Context Protocol (MCP) standard.

Categories: Voice processing; Education and learning tools. Tags: Audio transcription, AI processing, Batch processing, Speech synthesis. Language: Python

Last updated: 2025-04-29

What is MCP Server Whisper?

MCP Server Whisper is an intelligent audio processing tool that converts your recordings into text, analyzes audio content, and can generate natural-sounding speech. It is built on OpenAI's Whisper and GPT-4o models and is particularly suited to audio such as meeting recordings, interviews, and podcasts.

How do I use MCP Server Whisper?

You use it by giving simple natural-language instructions (for example, 'Please transcribe my latest recording'). The system automatically finds the audio file, selects the most suitable AI model, and returns the results, so no complex technical steps are required.
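Behind the natural-language interface, the assistant invokes the server's tools over MCP, which exchanges JSON-RPC 2.0 messages. The minimal sketch below builds a `tools/call` request; the tool name `transcribe_audio` and its `file_name` argument are hypothetical placeholders, not confirmed names from this server.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument; the server's `tools/list` response gives the real ones.
print(build_tool_call(1, "transcribe_audio", {"file_name": "latest_recording.mp3"}))
```

In practice the AI assistant's MCP client composes and sends these messages; you never write them by hand.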

Use cases

It suits a range of scenarios, such as transcribing journalist interviews, organizing meeting records, analyzing podcast content, converting voice memos, and processing foreign-language learning materials. It is especially useful for professionals who need to extract information from audio quickly.

Main features

Intelligent audio transcription: Supports multiple AI models for converting speech into text, with options for level of detail and output style (ordinary, professional, story-like, etc.)

Audio content analysis: Lets you "converse" directly with the audio content to obtain AI analysis and insights on the recording

Text-to-speech: Converts text into natural speech, with multiple voice styles and adjustable speed

Batch processing: Processes multiple audio files simultaneously, automatically optimizing the processing order to improve efficiency

Intelligent file management: Searches and filters audio files by criteria such as name, size, and duration
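File management by name, size, and duration can be pictured as a filter over file metadata. This is a minimal sketch: the `AudioFile` record and `filter_audio_files` function are hypothetical, and in a real setup the size and duration would have to be read from the files themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioFile:
    """Hypothetical metadata record for one audio file."""
    name: str
    size_bytes: int
    duration_seconds: float


def filter_audio_files(
    files: list[AudioFile],
    name_contains: Optional[str] = None,
    max_size_bytes: Optional[int] = None,
    min_duration_seconds: Optional[float] = None,
) -> list[AudioFile]:
    """Keep only the files that satisfy every criterion that was supplied."""
    result = []
    for f in files:
        if name_contains is not None and name_contains.lower() not in f.name.lower():
            continue
        if max_size_bytes is not None and f.size_bytes > max_size_bytes:
            continue
        if min_duration_seconds is not None and f.duration_seconds < min_duration_seconds:
            continue
        result.append(f)
    return result


files = [
    AudioFile("meeting_2025-04-01.mp3", 12_000_000, 3600.0),
    AudioFile("memo.m4a", 300_000, 45.0),
    AudioFile("meeting_notes.wav", 40_000_000, 1800.0),
]
# Meeting recordings small enough to upload without compression
small_meetings = filter_audio_files(files, name_contains="meeting", max_size_bytes=25 * 1024 * 1024)
print([f.name for f in small_meetings])  # ['meeting_2025-04-01.mp3']
```

Criteria left as `None` are simply ignored, so one function covers any combination of filters.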

Advantages and limitations

Advantages

Uses OpenAI's GPT-4o models for high transcription accuracy

Supports audio processing in multiple languages, including Chinese

Simple to operate: just describe your needs in natural language

Automatically handles compression and format conversion of large files

Provides multiple enhanced transcription templates to meet different needs

Limitations

Depends on the OpenAI API and requires an internet connection

A single file sent to the OpenAI API must not exceed 25 MB

Some professional terms may require manual proofreading

Extremely fast speech or background noise in a recording may reduce accuracy
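The 25 MB limit suggests checking file size before upload. A minimal sketch, assuming the limit means 25 × 1024 × 1024 bytes; `needs_compression` is a hypothetical helper, not part of the server's API.

```python
import os
import tempfile

# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def needs_compression(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file is larger than the upload limit and must be shrunk first."""
    return os.path.getsize(path) > limit


# Usage: a 1 KiB temporary file is well under the limit.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 1024)
try:
    print(needs_compression(tmp.name))  # False
finally:
    os.remove(tmp.name)
```

If the exact byte definition of the limit matters, passing a slightly smaller `limit` leaves a safety margin.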

How to use

Installation preparation

Ensure that Python 3.10+ and necessary dependencies are installed

Configure the environment

Create a .env file that sets your OpenAI API key and the path to your audio files

Start the service

Run the server so that AI assistants such as Claude can call it

Start using

Use various functions through natural language instructions, such as requesting transcription or analyzing audio
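The configuration step relies on a .env file of KEY=VALUE lines. The sketch below parses one using only the standard library and reports missing settings. The variable names `OPENAI_API_KEY` and `AUDIO_FILES_PATH` are assumptions (the server's own documentation is the authority on which names it reads), and the project may well use a library such as python-dotenv instead.

```python
import os
import tempfile


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped (deliberately minimal)."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Remove optional surrounding quotes from the value
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(values: dict[str, str]) -> list[str]:
    """Return the names of required settings that are missing or empty."""
    required = ("OPENAI_API_KEY", "AUDIO_FILES_PATH")  # assumed variable names
    return [name for name in required if not values.get(name)]


with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write('OPENAI_API_KEY="sk-example"\n# audio folder\nAUDIO_FILES_PATH=/path/to/your/audio/files\n')
    settings = load_env_file(env_path)
    print(check_config(settings))  # []
```

An empty list means both assumed settings are present; keep the real .env file out of version control since it holds your API key.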

Usage examples

Meeting record organization: Automatically converts a one-hour meeting recording into a structured text record

Foreign language learning assistance: Analyzes foreign-language listening materials and explains difficult points

Podcast content summary: Automatically generates a summary of a podcast's core content

Frequently Asked Questions

Which audio formats are supported?

What is the transcription accuracy?

What is the processing speed?

How is the privacy of my audio protected?

Related resources

Official GitHub repository

Get the latest code and updates

Model Context Protocol official website

Understand the MCP protocol standard

OpenAI audio API documentation

Understand the underlying technical details

🚀 MCP Server Whisper

MCP Server Whisper makes audio processing and transcription available to AI assistants through the standardized Model Context Protocol, backed by OpenAI's audio models.

🚀 Quick Start

Get started with MCP Server Whisper by following the steps below.

📦 Installation

Install with pip

pip install mcp-server-whisper

📚 Documentation

System Requirements

Python 3.10 or higher
Node.js (v16 or higher recommended)
Omi Screen Recorder (Mac only)

Install Dependencies

# These are Python packages, so install them with pip (asyncio ships with Python's standard library)
pip install \
  mcp \
  mcp-server-whisper \
  openai \
  pydub \
  ruff \
  mypy

💻 Usage Examples

Basic Usage

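from mcp_server_whisper import start

AUDIO_FILES_PATH = "/path/to/your/audio/files"

# start() is expected to keep running while the server is up, so call it
# only once, with whichever configuration you need:
#   start()                                   # default configuration
#   start(audio_files_path=AUDIO_FILES_PATH)  # custom audio file directory
start(audio_files_path=AUDIO_FILES_PATH)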

📚 Documentation

MCP Server Configuration

Configure the server by creating an mcp_server_whisper_config.json file:

{
  "servers": {
    "whisper": {
      "host": "localhost",
      "port": 3001,
      "workers": 4,
      "max_body_size": "5mb"
    }
  },
  "openai": {
    "api_key": "your_openai_api_key",
    "model": "gpt-4o-transcribe",
    "temperature": 0.7
  }
}
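The JSON above can be read with the standard json module. This sketch is hypothetical: `parse_size` and `parse_config` are illustrative helpers, and treating "mb" as 1024 × 1024 bytes is an assumption about how the server interprets `max_body_size`.

```python
import json
import re

# Assumption: size suffixes use binary multiples (1 mb = 1024 * 1024 bytes).
SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '5mb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * SIZE_UNITS[unit])


def parse_config(raw: str) -> dict:
    """Parse the JSON config and add a derived byte count for max_body_size."""
    config = json.loads(raw)
    whisper = config["servers"]["whisper"]
    whisper["max_body_size_bytes"] = parse_size(whisper["max_body_size"])
    return config


sample = """
{
  "servers": {"whisper": {"host": "localhost", "port": 3001, "workers": 4, "max_body_size": "5mb"}},
  "openai": {"api_key": "your_openai_api_key", "model": "gpt-4o-transcribe", "temperature": 0.7}
}
"""
config = parse_config(sample)
print(config["servers"]["whisper"]["max_body_size_bytes"])  # 5242880
```

To read the actual file, pass the contents of mcp_server_whisper_config.json to `parse_config`.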

🔧 Technical Details

Toolchain

The project uses modern Python development tools:

# Run tests
pytest

# Tests with coverage
pytest --cov=src

# Format code
ruff format src

# Check code style
ruff check src

# Run type checking (strict mode)
mypy --strict src

Key Components

MCP Protocol: Exposes audio processing capabilities through a standardized MCP tool interface.
Parallel Processing: Uses asyncio and batch processing to improve performance.
File Management: Implements detection, verification, conversion, and compression of audio files.
Rich Transcription: Provides high-quality transcription with different OpenAI models (including gpt-4o-transcribe).
Optimized Performance: Built-in caching mechanism to speed up repetitive operations.
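The parallel-processing and caching components can be sketched with asyncio: a semaphore caps how many files are processed at once, `asyncio.gather` runs the batch concurrently, and a dictionary caches finished results. `BatchTranscriber` is hypothetical, its `_transcribe_uncached` method only simulates latency in place of a real OpenAI API call, and the server's actual implementation may differ.

```python
import asyncio


class BatchTranscriber:
    """Hypothetical sketch: bounded-concurrency batch processing with an in-memory cache."""

    def __init__(self, max_concurrency: int = 4) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._cache: dict[str, str] = {}
        self.calls = 0  # how many times the placeholder backend actually ran

    async def _transcribe_uncached(self, path: str) -> str:
        # Placeholder for a real transcription request; it only simulates latency.
        self.calls += 1
        await asyncio.sleep(0.01)
        return f"transcript of {path}"

    async def transcribe(self, path: str) -> str:
        # Serve repeated requests for the same file from the cache.
        if path in self._cache:
            return self._cache[path]
        # The semaphore limits how many files are processed concurrently.
        async with self._semaphore:
            result = await self._transcribe_uncached(path)
        self._cache[path] = result
        return result

    async def transcribe_batch(self, paths: list[str]) -> list[str]:
        # Deduplicate so each file runs at most once per batch, then restore input order.
        unique = list(dict.fromkeys(paths))
        results = await asyncio.gather(*(self.transcribe(p) for p in unique))
        by_path = dict(zip(unique, results))
        return [by_path[p] for p in paths]


async def run_demo() -> tuple[list[str], list[str], int]:
    transcriber = BatchTranscriber(max_concurrency=2)
    first = await transcriber.transcribe_batch(["a.mp3", "b.mp3", "a.mp3"])
    second = await transcriber.transcribe_batch(["a.mp3"])  # answered from the cache
    return first, second, transcriber.calls


print(asyncio.run(run_demo()))
```

In the demo, four transcriptions are requested but the backend runs only twice: the duplicate in the first batch is deduplicated and the second batch is served from the cache.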