Kokoro Tts MCP

Kokoro Text to Speech (TTS) MCP Server, supporting the generation of MP3 files and optional uploading to S3 storage

Categories: Voice processing, Developer tools · Tags: #Text to speech #Voice synthesis #Cloud storage #Audio processing · Language: Python

Last updated: 2025-04-29

What is the Kokoro TTS MCP service?

The Kokoro TTS MCP service is a text-to-speech (TTS) server that takes text as input and generates the corresponding speech as MP3 files. It is built on the Model Context Protocol (MCP), supports multiple voices and adjustable speech speed, and can automatically upload the generated audio files to AWS S3.

How to use the Kokoro TTS service?

You can use this service through a simple command-line client or by calling the server directly over the MCP protocol. The service can convert text passed in directly or read from a file, and the generated audio files can be kept locally or uploaded to the cloud.
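Calling the server directly over MCP means sending JSON-RPC 2.0 messages. After the usual `initialize` handshake, a client invokes a tool with a `tools/call` request. The sketch below builds such a message with the standard library only; the tool name `text_to_speech` and the argument names are hypothetical placeholders, since this page does not document the server's tool schema (a client can discover the real names with a `tools/list` request).

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # ensure_ascii=False keeps non-English input text readable in the payload.
    return json.dumps(message, ensure_ascii=False)


# Hypothetical tool and argument names; check the server's tools/list response.
request = build_tool_call(
    1,
    "text_to_speech",
    {"text": "Hello, Kokoro!", "voice": "af_heart", "speed": 1.0},
)
print(request)
```

The command-line client handles this framing for you; building the message by hand is only needed when integrating from another MCP client.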

Use cases

This service suits any scenario that needs speech synthesis, such as audiobook generation, voice assistant responses, educational content production, and accessibility. It is particularly well suited to workflows that batch-process text or generate speech automatically.

Main features

Multi-voice support: Provides a variety of preset voices (such as af_heart and en_female) for different scenarios

Speed adjustment: Adjust the speech speed from 0.5 to 2.0 times normal speed

S3 cloud storage integration: Automatically uploads the generated MP3 files to AWS S3 for easy sharing and management

Intelligent file management: Automatically cleans up old files. You can set how many days to keep files, or delete the local copy immediately after upload
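The automatic cleanup can be pictured as a periodic sweep over the MP3 folder. This is a minimal sketch assuming a file's age is judged by its modification time; `cleanup_old_mp3s` is a hypothetical helper, not a function from the project, and the real server may track file age differently.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def cleanup_old_mp3s(folder: str, retention_days: float,
                     now: Optional[float] = None) -> List[str]:
    """Delete .mp3 files in folder last modified more than retention_days ago.

    Returns the names of the deleted files, sorted. Other file types are kept.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    removed = []
    for path in Path(folder).glob("*.mp3"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Passing `now` explicitly makes the sweep easy to test; in a server it would simply default to the current time.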

Advantages and limitations

Advantages

A simple command-line interface that is easy to integrate into automated processes

Supports multiple languages and voices

Flexible cloud storage options that reduce local disk usage

Uses an open-source model, with no additional licensing fees

Limitations

Requires external tools such as ffmpeg

Requires downloading large model files before first use

Offers limited advanced voice customization

How to use

Environment preparation

Install the prerequisites: Python, the uv package manager, and ffmpeg

Download the voice model

Download the Kokoro Onnx weight files (kokoro-v1.0.onnx and voices-v1.0.bin) from GitHub and put them in the project directory

Configure the service

Create a .env file or set environment variables to configure AWS credentials and voice parameters

Start the service

Run the MCP server with uv (uv run mcp-tts.py)

Use the client

Send text through the command-line client for voice synthesis

Usage examples

Generate a welcome message: Create multi-language welcome messages for a website

Batch process documents: Convert long documents into audiobooks

Automated voice reminders: Integrate with a notification system to generate spoken reminders
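For the batch-processing use case, long documents are usually split before synthesis so that each request stays a manageable size. The sketch below shows one simple approach, assuming that sentence-ending punctuation (`.`, `!`, `?`) followed by whitespace marks a boundary; `chunk_text` and the 500-character limit are illustrative choices, not limits documented for this server.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the command-line client in turn. Text in languages that do not put spaces after sentence punctuation (for example Chinese) would need a different boundary rule.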

Frequently asked questions

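How do I change the default voice?

Set the TTS_VOICE environment variable (default: af_heart).

Where are the generated audio files saved?

In the folder set by MP3_FOLDER (default: the mp3 folder in the script directory). When S3 upload is enabled, they are also uploaded to the bucket and folder set by AWS_S3_BUCKET_NAME and AWS_S3_FOLDER.

What languages does the service support?

The default language is set by TTS_LANGUAGE (default: en-us); which languages work depends on the Kokoro model and voices you use.

How do I disable the S3 upload function?

Set S3_ENABLED to a value other than "true" or "1", for example "false".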

Related resources

Kokoro Onnx project

Source code and weight files of the voice model

HuggingFace demo space

Experience the Kokoro TTS effect online

FFmpeg installation guide

Get and install the FFmpeg tool

🚀 Kokoro Text-to-Speech (TTS) MCP Server

The Kokoro Text-to-Speech MCP server generates .mp3 files from text and can optionally upload them to S3.

Try the Kokoro model online: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

🚀 Quick Start

✨ Features

Generate .mp3 files from text.
Provide an option to upload the generated .mp3 files to S3.
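The optional S3 step amounts to a single upload call per file. The sketch below assumes the boto3 SDK (this page does not say which client library the server uses); `build_s3_key` and `upload_mp3` are hypothetical helpers whose parameters mirror the AWS_S3_FOLDER, AWS_S3_ENDPOINT_URL and DELETE_LOCAL_AFTER_S3_UPLOAD settings from the environment variable table. boto3 picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on its own.

```python
import os
from typing import Optional


def build_s3_key(folder: str, local_path: str) -> str:
    """Join the S3 folder prefix (e.g. AWS_S3_FOLDER) with the file's base name."""
    name = os.path.basename(local_path)
    prefix = folder.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_mp3(local_path: str, bucket: str, folder: str = "",
               region: Optional[str] = None,
               endpoint_url: Optional[str] = None,
               delete_local: bool = False) -> str:
    """Upload one MP3 with boto3, optionally deleting the local copy afterwards."""
    import boto3  # imported lazily so build_s3_key works without boto3 installed

    # endpoint_url=None means standard AWS S3; set it for S3-compatible storage.
    s3 = boto3.client("s3", region_name=region, endpoint_url=endpoint_url)
    key = build_s3_key(folder, local_path)
    s3.upload_file(local_path, bucket, key,
                   ExtraArgs={"ContentType": "audio/mpeg"})
    if delete_local:
        os.remove(local_path)  # only reached if upload_file did not raise
    return key
```

Deleting the local file only after `upload_file` returns means a failed upload never loses the audio.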

📦 Installation

Clone the Repository

Clone the repository to your local machine.

Download Model Files

Download kokoro-v1.0.onnx and voices-v1.0.bin from the Kokoro Onnx weights and place them in the cloned repository directory.
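Since synthesis needs both weight files, a quick pre-flight check can turn a confusing startup error into a clear message. This is a hypothetical helper, not part of the project; it only assumes the two file names above and that they sit at the top level of the repository directory.

```python
from pathlib import Path
from typing import List

# File names as given in the download step above.
MODEL_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_model_files(repo_dir: str) -> List[str]:
    """Return the required model files that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```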

Configure MCP

Add the following content to your MCP configuration and update it with your own values.

  "kokoro-tts-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/toyourlocal/kokoro-tts-mcp",
        "run",
        "mcp-tts.py"
      ],
      "env": {
        "TTS_VOICE": "af_heart",
        "TTS_SPEED": "1.0",
        "TTS_LANGUAGE": "en-us",
        "AWS_ACCESS_KEY_ID": "",
        "AWS_SECRET_ACCESS_KEY": "",
        "AWS_REGION": "us-east-1",
        "AWS_S3_FOLDER": "mp3",
        "S3_ENABLED": "true",
        "MP3_FOLDER": "/path/to/mp3"
      } 
    }

Install ffmpeg

ffmpeg is required to convert the generated .wav files to .mp3 files.

For Mac:

brew install ffmpeg
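The WAV-to-MP3 conversion is typically a single ffmpeg invocation. The sketch below shows one way to call it from Python, assuming your ffmpeg build includes the libmp3lame encoder (most packaged builds do); the helper names and the VBR quality setting are illustrative, not taken from the project.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_wav_to_mp3_cmd(wav_path: str, mp3_path: str,
                          quality: int = 2) -> List[str]:
    """Build an ffmpeg command that encodes a WAV file to MP3 (LAME VBR)."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", wav_path,            # input WAV file
        "-codec:a", "libmp3lame",  # MP3 encoder
        "-q:a", str(quality),      # VBR quality: 0 (best) to 9 (smallest)
        mp3_path,
    ]


def wav_to_mp3(wav_path: str, mp3_path: str) -> None:
    """Run the conversion, failing early with a clear error if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(ffmpeg_wav_to_mp3_cmd(wav_path, mp3_path), check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.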

Configure Environment Variables

To run locally, copy env.example to .env and fill in your own values. The supported variables are listed under Supported Environment Variables.

💻 Usage Examples

Run the Server Locally

The recommended way is to use uv:

uv run mcp-tts.py

Text-to-Speech Client

To use the text-to-speech function, run the following command in the terminal:

uv run mcp-client.py --tts text-to-speech <your-text>

Replace <your-text> with the actual text content.

Example: Configure and Run the MCP Server and Client

Start the MCP server:

uv run mcp-tts.py

In another terminal window, start the TTS client:

uv run mcp-client.py --tts text-to-speech "你好，Kokoro！"

This will use the Kokoro model to convert Chinese text to speech and generate an .mp3 file.

📚 Documentation

Supported Environment Variables

Variable	Description
`AWS_ACCESS_KEY_ID`	Your AWS access key ID
`AWS_SECRET_ACCESS_KEY`	Your AWS secret access key
`AWS_S3_BUCKET_NAME`	S3 bucket name
`AWS_S3_REGION`	S3 region (e.g., us-east-1)
`AWS_S3_FOLDER`	Folder path in the S3 bucket
`AWS_S3_ENDPOINT_URL`	Optional custom S3-compatible storage endpoint URL
`MCP_HOST`	The host to which the server binds (default: 0.0.0.0)
`MCP_PORT`	The port to listen on (default: 9876)
`MCP_CLIENT_HOST`	The hostname for the client to connect to the server (default: localhost)
`DEBUG`	Enable debug mode (set to "true" or "1")
`S3_ENABLED`	Enable S3 upload (set to "true" or "1")
`MP3_FOLDER`	The path to store MP3 files (default is the 'mp3' folder in the script directory)
`MP3_RETENTION_DAYS`	The number of days to retain MP3 files before automatic deletion
`DELETE_LOCAL_AFTER_S3_UPLOAD`	Whether to delete local MP3 files after successful upload to S3 (set to "true" or "1")
`TTS_VOICE`	The default voice for the TTS client (default: af_heart)
`TTS_SPEED`	The default speed for the TTS client (default: 1.0)
`TTS_LANGUAGE`	The default language for the TTS client (default: en-us)
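The table above translates naturally into a small settings loader. This is a sketch under stated assumptions, not the project's code: `Settings`, `load_settings` and `env_flag` are hypothetical names, flags are compared case-insensitively (an assumption; the table only says "true" or "1"), and the speed range check reuses the 0.5 to 2.0 range from the feature list.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Treat a flag as enabled only for "true" or "1" (case-insensitive here)."""
    return env.get(name, "").strip().lower() in {"true", "1"}


@dataclass
class Settings:
    host: str
    port: int
    voice: str
    speed: float
    language: str
    s3_enabled: bool
    delete_local_after_upload: bool
    mp3_folder: str
    retention_days: Optional[int]


def load_settings(env: Mapping[str, str], script_dir: str = ".") -> Settings:
    """Read settings from env, applying the defaults listed in the table."""
    speed = float(env.get("TTS_SPEED", "1.0"))
    if not 0.5 <= speed <= 2.0:  # supported range from the feature list
        raise ValueError(f"TTS_SPEED must be between 0.5 and 2.0, got {speed}")
    retention = env.get("MP3_RETENTION_DAYS")
    return Settings(
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "9876")),
        voice=env.get("TTS_VOICE", "af_heart"),
        speed=speed,
        language=env.get("TTS_LANGUAGE", "en-us"),
        s3_enabled=env_flag(env, "S3_ENABLED"),
        delete_local_after_upload=env_flag(env, "DELETE_LOCAL_AFTER_S3_UPLOAD"),
        mp3_folder=env.get("MP3_FOLDER") or os.path.join(script_dir, "mp3"),
        retention_days=int(retention) if retention else None,
    )
```

At startup this would be called as `load_settings(os.environ, script_dir=...)`; accepting any mapping keeps it easy to test without touching the real environment.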

🔧 Technical Details

Customization and Extension

To customize the behavior of the MCP server, edit the configuration parameters in mcp-tts.py. For example, you can adjust the audio output format, sampling rate, or bit depth.
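Sampling rate and bit depth are fixed at the point where the intermediate WAV file is written, before ffmpeg encodes it to MP3. The sketch below shows where those two settings live, using only the standard library; `write_pcm16_wav` is hypothetical, the 24000 Hz default is an assumption about Kokoro's output rate, and mcp-tts.py may set these values elsewhere.

```python
import array
import sys
import wave
from typing import Sequence


def write_pcm16_wav(path: str, samples: Sequence[float],
                    sample_rate: int = 24000) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # bit depth: 2 bytes = 16 bits
        wav.setframerate(sample_rate)  # sampling rate in Hz
        wav.writeframes(pcm.tobytes())
```

Changing `sample_rate` here without resampling the samples would change playback speed and pitch, so in practice the rate should match what the model produced.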

To add more features or integrate other services, extend the code of the MCP server by adding new handlers and services.

Text-to-Speech Example

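import os
import uuid

import soundfile as sf
import uvicorn
from fastapi import FastAPI
from fastapi.responses import FileResponse
from kokoro_onnx import Kokoro
from pydub import AudioSegment

# Standalone sketch, not the code of mcp-tts.py. Assumes the kokoro-onnx,
# soundfile and pydub packages, ffmpeg on PATH (needed for MP3 export),
# and the model files in the working directory.
OUTPUT_DIR = "audio_files"
os.makedirs(OUTPUT_DIR, exist_ok=True)
kokoro = Kokoro("kokoro-v1.0.onnx", "voices-v1.0.bin")
app = FastAPI()

@app.get("/text-to-speech/{text}")
def text_to_speech(text: str):
    # Plain def: FastAPI runs this blocking inference in a worker thread.
    # kokoro.create returns float samples and their sample rate.
    samples, sample_rate = kokoro.create(
        text,
        voice=os.getenv("TTS_VOICE", "af_heart"),
        speed=float(os.getenv("TTS_SPEED", "1.0")),
        lang=os.getenv("TTS_LANGUAGE", "en-us"),
    )
    # Write a temporary WAV, convert it to MP3 (export() returns an open handle).
    base = os.path.join(OUTPUT_DIR, f"output_{uuid.uuid4().hex}")
    sf.write(base + ".wav", samples, sample_rate)
    AudioSegment.from_wav(base + ".wav").export(base + ".mp3", format="mp3").close()
    os.remove(base + ".wav")
    return FileResponse(base + ".mp3", media_type="audio/mpeg")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)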