One-click MCP TTS Deployment with REST API: GPU-accelerated & Multilingual in kyutai-tts-docker

Kyutai TTS Docker

A Docker deployment of Kyutai TTS that starts with one command and exposes a web interface, a REST API, and MCP tools, with GPU acceleration and a multilingual interface.

Tags: Voice processing, Artificial intelligence chatbots, Voice synthesis, Docker deployment, MCP tools, GPU acceleration, Python

Rating: 2.5 points

Downloads: 6.5K

Updated: 2025-12-29


What is Kyutai TTS MCP Server?

Kyutai TTS MCP Server is a text-to-speech service built on the Model Context Protocol (MCP). It lets AI assistants such as Claude and Cursor call voice synthesis through standard MCP tool calls and convert text into natural, fluent speech. The service uses the 1.6B-parameter TTS model open-sourced by Kyutai Labs, which supports English and French and produces audio close to human voice quality.

How to use Kyutai TTS MCP Server?

Getting started takes three steps. First, start the service with Docker. Then add the MCP server address to your AI assistant's configuration. Once the assistant has connected, it can call the text-to-speech function directly. The generated audio can be played back, saved to a file, or returned through the API.

Applicable scenarios

Kyutai TTS MCP Server suits several scenarios: AI assistants that need voice output (voice assistants, audiobook generation), applications that convert text content into speech, spoken explanations in educational tools, text-to-speech in accessibility applications, and any automated workflow that needs high-quality voice synthesis.

Main features

MCP protocol integration

Fully compatible with the Model Context Protocol standard, it can be seamlessly integrated with any AI assistant that supports MCP, such as Claude Desktop, Cursor, etc.

High-quality voice synthesis

Based on the 1.6B parameter TTS model of Kyutai Labs, it generates natural and fluent audio close to human voice quality, supporting English and French.

Multiple output formats

Supports multiple output methods such as real-time voice playback, WAV file saving, and Base64 encoding return, meeting the needs of different application scenarios.

Intelligent GPU management

Automatically manages GPU memory, supporting automatic release of GPU resources when idle to optimize resource utilization efficiency.

Flexible configuration

Supports various configuration options, including voice parameter adjustment, output format selection, GPU device specification, etc., to meet personalized needs.

Docker containerization

Provides a complete Docker image and Docker Compose configuration for one-click deployment without complex environment configuration.

Advantages

Standardized integration: Based on the MCP protocol, it has good compatibility with mainstream AI assistants

High-quality output: The 1.6B parameter model provides voice quality close to that of humans

Easy deployment: Docker containerization allows for one-click startup without complex configuration

Resource optimization: Intelligent GPU memory management improves resource utilization

Multilingual support: Natively supports English and French voice synthesis

Flexible output: Supports multiple audio formats and output methods

Limitations

Hardware requirements: GPU acceleration requires an NVIDIA GPU with CUDA support and the NVIDIA container runtime

Language limitations: Currently supports mainly English and French, with limited support for other languages

Model size: The 1.6B-parameter model requires 3-4 GB of GPU memory

Real-time performance: The first request is slower because the model must be loaded, so the service is not suited to ultra-low-latency scenarios

How to use

Start the MCP server

Start the Kyutai TTS MCP server using Docker. Make sure Docker and the NVIDIA container runtime are installed.

Configure the AI assistant

Add the MCP server address to the configuration of your AI assistant (such as Claude Desktop). Usually, you need to specify the server URL and tool list in the configuration file.
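As a rough sketch of this step, the snippet below builds an `mcpServers` entry in the JSON shape that Claude Desktop reads from `claude_desktop_config.json`. The server name `kyutai-tts`, the `docker exec` launch command, and the path to `mcp_server.py` inside the container are assumptions made for illustration; MCP_GUIDE.md in the repository is the place to confirm the values the project actually expects.

```python
import json


def build_mcp_config(container="kyutai-tts", script="mcp_server.py"):
    """Return a Claude Desktop style config dict with one stdio MCP server.

    Assumption: the MCP server can be launched inside the running container
    with `docker exec -i <container> python <script>`.
    """
    return {
        "mcpServers": {
            "kyutai-tts": {
                "command": "docker",
                "args": ["exec", "-i", container, "python", script],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be merged into the assistant's config file.
    print(json.dumps(build_mcp_config(), indent=2))
```

The printed JSON would be merged into the assistant's existing configuration file rather than replacing it.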

Test the connection

Start the AI assistant and test the MCP connection. Usually, you can verify the success of the connection by checking the tool list or performing a simple text-to-speech test.

Start using

Now you can directly use the text-to-speech function in the AI assistant. You can call voice synthesis through natural language instructions or specific commands.

Usage examples

AI assistant voice feedback

Let the AI assistant provide voice output while answering questions to enhance the interaction experience.

Document to audiobook conversion

Convert long documents or articles into audiobooks for easy listening on the go.

Multilingual content voice conversion

Convert English or French content into voice for language learning or content consumption.

Application voice prompts

Add voice prompts and feedback functions to applications.

Frequently Asked Questions

What kind of hardware do I need to run this service?

How to integrate the MCP server with Claude Desktop?

Which languages and voice styles are supported?

How fast is the voice generation?

Can multiple requests be processed simultaneously?

How to monitor the service status and performance?

Related resources

GitHub repository

Complete source code, Docker configuration, and usage documentation

Docker Hub image

Pre-built Docker image supporting one-click deployment

Model Context Protocol documentation

Official documentation and specifications of the MCP protocol

Kyutai Labs official website

Official website of the TTS model development team

MCP guide documentation

Detailed MCP integration and usage guide

🚀 Kyutai TTS Docker Deployment

This project offers a production-ready Docker deployment for Kyutai TTS, featuring a user interface, REST API, and MCP support.


✨ Features

🚀 One-Click Deployment: Automatically selects the GPU and detects ports.
🎨 Three Access Modes: Web UI, REST API, and MCP tools are available.
🧠 Smart GPU Management: Implements lazy loading and automatic memory release.
🌐 Multi-language UI: Offers both English and Chinese interfaces.
📦 All-in-One Image: Contains all necessary models without external dependencies.
🔒 Production Ready: Comes with HTTPS, health checks, and monitoring features.

🚀 Quick Start

Using Docker Hub (Recommended)

docker run -d \
  --name kyutai-tts \
  --gpus all \
  -p 8900:8900 \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  neosun/kyutai-tts:allinone

Access the application at: http://localhost:8900

Using Docker Compose

git clone https://github.com/neosun100/kyutai-tts-docker.git
cd kyutai-tts-docker
./start.sh

📦 Installation

Prerequisites

Docker 20.10 or higher
Docker Compose 2.0 or higher
NVIDIA GPU with CUDA support
nvidia-docker runtime

Method 1: Pull from Docker Hub

docker pull neosun/kyutai-tts:allinone

Method 2: Build from Source

git clone https://github.com/neosun100/kyutai-tts-docker.git
cd kyutai-tts-docker
docker-compose build

⚙️ Configuration

Environment Variables

Variable	Default	Description
`PORT`	`8900`	Port the service listens on.
`DEVICE`	`cuda`	Device type; either `cuda` or `cpu`.
`GPU_IDLE_TIMEOUT`	`60`	Seconds of idle time before GPU memory is released.
`NVIDIA_VISIBLE_DEVICES`	`0`	ID of the GPU to use.

Example `.env` File

PORT=8900
DEVICE=cuda
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
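To make the defaults concrete, here is a minimal sketch of how an application might read these four variables. It is not taken from the project's `app.py`; the `Settings` class and `load_settings` function are hypothetical names, and only the variable names, defaults, and allowed `DEVICE` values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int = 8900
    device: str = "cuda"
    gpu_idle_timeout: int = 60
    nvidia_visible_devices: str = "0"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    device = env.get("DEVICE", "cuda")
    if device not in ("cuda", "cpu"):
        raise ValueError(f"DEVICE must be 'cuda' or 'cpu', got {device!r}")
    return Settings(
        port=int(env.get("PORT", 8900)),
        device=device,
        gpu_idle_timeout=int(env.get("GPU_IDLE_TIMEOUT", 60)),
        nvidia_visible_devices=str(env.get("NVIDIA_VISIBLE_DEVICES", "0")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.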

💻 Usage Examples

Basic Usage

Web UI

Open your browser and navigate to http://localhost:8900.
Enter the text you want to synthesize.
Optionally, adjust the parameters.
Click the "Generate" button.
You can play or download the generated audio.

REST API

Generate Speech

curl -X POST http://localhost:8900/api/tts \
  -F "text=Hello, world!" \
  -F "cfg_coef=2.0" \
  --output output.wav
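The same request can be sketched in Python using only the standard library. The endpoint, the `text` and `cfg_coef` form fields, and the audio response body mirror the curl command above; the helper names (`build_multipart`, `build_tts_request`, `synthesize`) are hypothetical, and the sketch assumes the server accepts a plain multipart form exactly as curl's `-F` sends it.

```python
import urllib.request
import uuid


def build_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_tts_request(text, cfg_coef=2.0, base_url="http://localhost:8900"):
    """Prepare (but do not send) a POST request to /api/tts."""
    body, content_type = build_multipart({"text": text, "cfg_coef": cfg_coef})
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def synthesize(text, out_path="output.wav", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

With the container running, `synthesize("Hello, world!")` would be the counterpart of the curl call.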

Check GPU Status

curl http://localhost:8900/api/gpu/status

Release GPU Memory

curl -X POST http://localhost:8900/api/gpu/offload
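The lazy loading and idle release that this endpoint complements could work roughly as follows. This is a minimal sketch under stated assumptions, not the project's `gpu_manager.py`: the `LazyModel` class and its method names are hypothetical, and the model is represented by whatever object the supplied `loader` returns.

```python
import time


class LazyModel:
    """Load a model on first use and drop it after a period of inactivity."""

    def __init__(self, loader, idle_timeout=60.0, clock=time.monotonic):
        self._loader = loader          # callable that builds the model
        self._idle_timeout = idle_timeout
        self._clock = clock            # injectable for testing
        self._model = None
        self._last_used = None

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        """Return the model, loading it if needed, and record the access time."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def offload(self):
        """Drop the reference; a real GPU manager would likely also free
        cached memory (for example with torch.cuda.empty_cache())."""
        self._model = None

    def release_if_idle(self):
        """Offload when the model has been unused for idle_timeout seconds."""
        if self._model is None:
            return False
        if self._clock() - self._last_used >= self._idle_timeout:
            self.offload()
            return True
        return False
```

A background thread or scheduler would call `release_if_idle()` periodically, while a manual offload request would map to `offload()`.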

MCP Tools

For detailed usage of MCP, refer to MCP_GUIDE.md.

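# `mcp_client` is assumed to be an already-connected MCP client session;
# this call must run inside an async function.
result = await mcp_client.call_tool(
    "text_to_speech",
    {
        "text": "Hello from MCP!",
        # Path where the server should write the generated WAV file
        "output_path": "/tmp/output.wav"
    }
)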

📚 Documentation

Project Structure

kyutai-tts-docker/
├── app.py                 # Flask application
├── gpu_manager.py         # GPU resource manager
├── mcp_server.py          # MCP server
├── Dockerfile             # Docker image
├── Dockerfile.allinone    # All-in-one image
├── docker-compose.yml     # Docker Compose config
├── start.sh               # One-click startup script
├── test_api.sh            # API test script
└── docs/                  # Documentation
    ├── QUICKSTART.md
    ├── MCP_GUIDE.md
    └── TEST_REPORT.md

Tech Stack

Framework: Flask 3.0
ML Framework: PyTorch 2.7 + CUDA 12.1
TTS Model: Kyutai TTS 1.6B (English/French)
API Docs: Swagger/Flasgger
MCP: FastMCP 0.2
Container: Docker + nvidia-docker

API Documentation

Once the application is running, you can access the Swagger documentation at: http://localhost:8900/apidocs

Available Endpoints

GET /health: Health check
GET /api/gpu/status: Check GPU status
POST /api/tts: Generate speech
POST /api/gpu/offload: Release GPU memory
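Because the container can take a while to become ready, a client may want to poll `GET /health` before sending work. The sketch below is one way to do that with the standard library. It assumes only that a healthy service answers with HTTP 200; the response body format is not documented here, and the function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8900", timeout=2.0):
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_healthy(check=is_healthy, attempts=30, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after `docker run` gives a script a simple readiness gate.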

Production Deployment

With Nginx Reverse Proxy

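server {
    listen 443 ssl;
    server_name your-domain.com;

    # Placeholder certificate paths; replace with your own files.
    ssl_certificate     /path/to/fullchain.pem;
    ssl_certificate_key /path/to/privkey.pem;

    location / {
        proxy_pass http://localhost:8900;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}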

Multi-GPU Setup

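# Use a separate Compose project name (-p) per GPU so the second
# command starts a new instance instead of recreating the first.

# GPU 0
NVIDIA_VISIBLE_DEVICES=0 PORT=8900 docker-compose -p kyutai-gpu0 up -d

# GPU 1
NVIDIA_VISIBLE_DEVICES=1 PORT=8901 docker-compose -p kyutai-gpu1 up -d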
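With one instance per GPU, something on the client side has to spread requests across the ports. A minimal round-robin picker might look like the sketch below; `make_round_robin` is a hypothetical helper, and a production setup would more likely put a load balancer in front of the instances.

```python
import itertools


def make_round_robin(ports, host="localhost"):
    """Return a function that yields base URLs for the given ports in turn."""
    urls = [f"http://{host}:{port}" for port in ports]
    if not urls:
        raise ValueError("at least one port is required")
    cycle = itertools.cycle(urls)
    return lambda: next(cycle)


# Ports match the two instances started above.
pick_instance = make_round_robin([8900, 8901])
```

Each call to `pick_instance()` returns the next base URL, alternating between the two instances.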

Performance

Model Size: 1.6B parameters
GPU Memory: 3-4 GB
Latency: 350 ms (L40S, 32 concurrent)
Speed: 3-5x real-time
Audio Quality: 16-bit PCM, 24kHz
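These figures translate into rough storage and timing estimates. The sketch below works through the arithmetic; it assumes mono audio (the channel count is not stated above) and ignores the small WAV header, so the results are approximations rather than measured values.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz, from the figures above
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # assumption: mono output


def pcm_bytes(seconds):
    """Approximate size of raw PCM audio of the given duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def generation_time_range(audio_seconds, min_speed=3.0, max_speed=5.0):
    """Best and worst case generation time at 3-5x real-time."""
    return audio_seconds / max_speed, audio_seconds / min_speed


# One minute of speech: 60 * 24,000 * 2 bytes, generated in 12 to 20 seconds.
one_minute_size = pcm_bytes(60)
one_minute_time = generation_time_range(60)
```

Under these assumptions, one minute of speech is about 2.88 MB of audio data and takes roughly 12 to 20 seconds to generate.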

🤝 Contributing

Contributions are welcome! Please follow these steps to contribute:

Fork the repository.
Create a new feature branch (git checkout -b feature/AmazingFeature).
Commit your changes (git commit -m 'Add some AmazingFeature').
Push to the branch (git push origin feature/AmazingFeature).
Open a Pull Request.

📝 Changelog

v1.0.0 (2025-12-14)

Initial release
Docker deployment with GPU support
Web UI with multi-language support
REST API with Swagger docs
MCP server implementation
All-in-one Docker image

📄 License

Python code is licensed under the MIT License.
Rust code is licensed under the Apache License.
Model weights are licensed under CC-BY 4.0.

🙏 Acknowledgments

Thanks to Kyutai Labs for providing the TTS model.
Thanks to Moshi for the implementation.

