Voice Gen MCP

A voice generation MCP server built on Minimax AI and Amazon S3. It provides text-to-speech and automatically uploads the generated audio files to cloud storage.

Voice processing, Cloud storage, #Voice Generation, #Text-to-Speech, #Cloud Storage, #MCP Service, Python

Rating: 2 points

Downloads: 10.2K

Updated: 2025-09-18


What is the Voice Generation MCP Server?

This server is a speech synthesis service that converts text into natural-sounding audio. It calls Minimax AI's voice API to generate the audio and stores each generated file in Amazon S3, so the result can be shared and reused through a link.

How to use the voice generation service?

Provide the text to convert and, optionally, a voice model, a voice (timbre), and a speech rate. The server generates the audio file and returns an accessible S3 link. Clients can connect over HTTP, SSE, or standard input/output (STDIO).

Applicable scenarios

Suitable for any scenario that requires converting text into speech, such as audio content production, voice assistant responses, educational materials, accessible reading aids, and multimedia content creation.

Main Features

Text-to-Speech

Use Minimax AI's high-quality speech synthesis technology to convert text into natural and fluent voice

Automatic Cloud Storage

The generated audio files are automatically uploaded to Amazon S3, providing reliable storage and convenient sharing links

MCP Protocol Support

Fully compatible with the Model Context Protocol and can be seamlessly integrated with various AI assistants

Multiple Transmission Protocols

Supports three communication methods: HTTP, SSE, and STDIO, adapting to different deployment environments

Customizable Audio

Supports adjusting the speech rate, selecting different timbres and models to meet personalized needs

Containerized Deployment

Provides support for Docker and Docker Compose, simplifying deployment and maintenance

Advantages

High-quality voice output that sounds close to natural human speech

Automated workflow, directly storing the text-to-speech result in the cloud service

Flexible configuration options, supporting multiple voice models and parameter adjustments

Easy to integrate, supporting the standard MCP protocol and multiple transmission methods

Containerized deployment, reducing environment dependencies and deployment complexity

Limitations

Requires a Minimax AI API key and an Amazon S3 account

Depends on network access to both Minimax AI and Amazon S3, so a stable Internet connection is required

Audio generation and upload take time, so responses are not instantaneous

The free quota is limited, and heavy usage may incur fees

How to Use

Environment Preparation

Ensure you have Minimax AI API credentials and access to an Amazon S3 bucket

Configure Environment Variables

Set the necessary API key and S3 configuration information

Start the Service

Start the voice generation service locally or with Docker

Call the Generation Interface

Call the generate_voice tool through the MCP protocol to generate voice

Usage Examples

Create a Welcome Voice

Generate a personalized welcome voice message for new users

Produce Educational Content

Convert course text into voice to produce audio learning materials

Quick Voice Prompts

Generate quick voice prompts and notifications for applications

Frequently Asked Questions

What accounts and permissions are required to use this service?

Which voice models and timbres are supported?

How should the speech rate parameter be set?

Where are the generated audio files stored?

Is a custom storage path supported?

Related Resources

Minimax AI Official Documentation

The official technical documentation for the Minimax AI voice API

Amazon S3 Developer Guide

The development documentation and guide for the Amazon S3 storage service

Model Context Protocol Specification

The official specification and implementation guide for the MCP protocol

Docker Official Documentation

The usage and deployment documentation for Docker container technology

🚀 Voice Generation MCP Server

A Model Context Protocol (MCP) server that leverages the Minimax AI API to offer voice generation capabilities. It converts text into speech and automatically uploads the generated audio files to Amazon S3 for convenient access and sharing.

🚀 Quick Start

The Voice Generation MCP Server uses the Minimax AI API to convert text to speech and uploads the resulting audio to Amazon S3. Follow the installation and configuration steps below to get started.

✨ Features

Text-to-Speech Generation: Uses Minimax AI's voice synthesis API to convert text into high-quality speech.
S3 Integration: Automatically upload generated audio files to Amazon S3 with an organized directory structure.
MCP Protocol Support: Fully compatible with the Model Context Protocol for seamless integration with AI assistants.
Authentication: Built-in API key authentication ensures secure access.
Multiple Transport Modes: Supports HTTP, SSE, and STDIO transport protocols.
Docker Support: Facilitates easy deployment with Docker and Docker Compose.
Configurable Audio Settings: Allows customization of sample rate, bitrate, and format options.
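
The README does not spell out the directory structure used inside the bucket. Purely as an illustration, the sketch below shows one way an object key and a virtual-hosted-style S3 URL could be composed. The date-based layout, the `.mp3` extension, and the helper names `build_object_key` and `build_object_url` are assumptions made for this example, not the server's actual behavior. The default prefix mirrors the `S3_PREFIX=voice-gen/` value from the sample configuration.

```python
import uuid
from datetime import datetime, timezone


def build_object_key(prefix="voice-gen/", extension="mp3", now=None):
    """Compose a hypothetical key: <prefix>/<YYYY>/<MM>/<DD>/<random>.<ext>."""
    now = now or datetime.now(timezone.utc)
    prefix = prefix.strip("/")
    # A random hex name avoids collisions between files created the same day.
    name = f"{uuid.uuid4().hex}.{extension}"
    parts = [prefix] if prefix else []
    parts += [now.strftime("%Y"), now.strftime("%m"), now.strftime("%d"), name]
    return "/".join(parts)


def build_object_url(bucket, region, key):
    """Return the virtual-hosted-style S3 URL for a key."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


if __name__ == "__main__":
    key = build_object_key()
    print(build_object_url("your_s3_bucket_name", "us-east-1", key))
```

Whether such a URL is publicly readable depends on the bucket's access policy, which this sketch does not address.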

📦 Installation

Local Installation

Clone the repository

```
git clone <repository-url>
cd voice-gen-mcp
```

Create a virtual environment

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Configure environment variables

```
cp env.example .env
# Edit .env with your actual configuration values
```

Docker Installation

Build the Docker image
```
docker build -t voice-gen-mcp .
```

Run with Docker Compose

```
cp env.example .env
# Edit .env with your configuration
docker-compose up -d
```

📚 Documentation

Prerequisites

Python 3.8 or higher
Minimax AI API credentials
Amazon S3 bucket and credentials
(Optional) Docker and Docker Compose for containerized deployment

Configuration

Environment Variables

Create a .env file based on env.example with the following required variables:

Voice Generation API (Required)

```
VOICE_GEN_API_GROUP_ID=your_minimax_group_id
VOICE_GEN_API_KEY=your_minimax_api_key
```

S3 Configuration (Required)

```
S3_BUCKET_NAME=your_s3_bucket_name
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=your_s3_access_key_id
S3_SECRET_ACCESS_KEY=your_s3_secret_access_key
S3_ENDPOINT=https://s3.amazonaws.com
S3_PREFIX=voice-gen/
```
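
Before starting the server, it can help to confirm that every required variable is actually present. The following is a minimal sketch of such a check, not part of the project itself. It assumes the variables have already been exported into the process environment (for example through Docker's `--env-file`); it does not parse the `.env` file. The helper name `missing_vars` is hypothetical.

```python
import os

# Names taken from the two "Required" groups listed above.
REQUIRED_VARS = [
    "VOICE_GEN_API_GROUP_ID",
    "VOICE_GEN_API_KEY",
    "S3_BUCKET_NAME",
    "S3_REGION",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "S3_ENDPOINT",
    "S3_PREFIX",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing a plain dictionary to `missing_vars` makes the check easy to exercise without touching the real environment.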

💻 Usage Examples

Starting the Server

Local Development

```
python3 server.py
```

Docker

```
docker run -d \
  --name voice-gen-mcp \
  -p 8000:8000 \
  --env-file .env \
  voice-gen-mcp
```

Docker Compose

```
docker-compose up -d
```

MCP Clients

The server supports multiple transport modes:

HTTP: http://localhost:8000/mcp
SSE: http://localhost:8000/sse
STDIO: Direct process communication

Available Tools

`generate_voice`

Converts text to speech and uploads the resulting audio to S3.

Parameters:

text (string, required): The text to convert to speech
model (string, optional): Model to use (default: "speech-2.5-hd-preview")
voice_id (string, optional): Voice ID to use (default: "mylxsw_voice_1")
speed (float, optional): Speech rate multiplier (default: 1.0; typical range 0.5 to 2.0)

Returns:

Success message with S3 URL and file size
Error message if generation fails
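
A client may want to apply the documented defaults and catch obviously bad input before calling the tool. The sketch below is one way to do that. The function name `build_generate_voice_args` is hypothetical, and the range check on `speed` is a client-side assumption based on the "typical" range above; the documentation does not say whether the server rejects values outside it.

```python
def build_generate_voice_args(text, model="speech-2.5-hd-preview",
                              voice_id="mylxsw_voice_1", speed=1.0):
    """Return an arguments dict for generate_voice, or raise ValueError."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    speed = float(speed)
    # Assumption: treat the documented typical range as a hard client limit.
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed should typically be between 0.5 and 2.0")
    return {"text": text, "model": model, "voice_id": voice_id, "speed": speed}


if __name__ == "__main__":
    print(build_generate_voice_args("Welcome aboard!", speed=1.2))
```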

Example:

```
{
  "text": "Hello, this is a test of the voice generation system.",
  "model": "speech-2.5-hd-preview",
  "voice_id": "mylxsw_voice_1",
  "speed": 1.2
}
```
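
MCP clients normally build the wire message for you, but it can be useful to see what a tool call looks like underneath. Under the Model Context Protocol, tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below wraps the example arguments in such a request. It only constructs and prints the message; it does not send anything, and a real session would also need the protocol's initialization handshake first. The helper name `tools_call_request` is hypothetical.

```python
import json


def tools_call_request(arguments, request_id=1):
    """Wrap tool arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_voice", "arguments": arguments},
    }


request = tools_call_request({
    "text": "Hello, this is a test of the voice generation system.",
    "model": "speech-2.5-hd-preview",
    "voice_id": "mylxsw_voice_1",
    "speed": 1.2,
})
print(json.dumps(request, indent=2))
```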