# MiniMind Docker
All-in-One Docker deployment for MiniMind LLM with UI, API & MCP support
Live Demo · API Docs · Original Project
## Features

- One-Click Docker Deployment: All dependencies are bundled and ready to run.
- Modern Web UI: Responsive design with dark mode and multi-language support.
- OpenAI-Compatible API: A drop-in replacement for existing applications.
- MCP Integration: Model Context Protocol for AI agent workflows.
- Smart GPU Management: Auto-selects an idle GPU and auto-releases memory.
- Real-time Streaming: SSE-based streaming responses.
- Multi-language UI: English, Simplified Chinese, Traditional Chinese, Japanese.
## Quick Start

### Docker (Recommended)

```bash
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest
```

### Docker Compose

```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
./start.sh
```
## Installation

### Prerequisites

- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA 12.1+ (optional, CPU fallback available)
- nvidia-container-toolkit (for GPU support)

### Method 1: Docker Run

```bash
# CPU-only
docker run -d -p 8998:8998 neosun/minimind:latest

# With GPU
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# With GPU and a custom model mounted from the host
docker run -d --gpus all -p 8998:8998 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=/app/models/MiniMind2 \
  neosun/minimind:latest
```
### Method 2: Docker Compose

```yaml
services:
  minimind:
    image: neosun/minimind:latest
    ports:
      - "8998:8998"
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - GPU_IDLE_TIMEOUT=60
      - MODEL_PATH=MiniMind2-Small
    volumes:
      - /tmp/minimind:/app/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

```bash
docker compose up -d
```
### Method 3: Local Development

```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
pip install -r requirements.txt

# Download the model weights from Hugging Face
python -c "from huggingface_hub import snapshot_download; snapshot_download('jingyaogong/MiniMind2-Small', local_dir='MiniMind2-Small')"

python app.py
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8998` | Server port |
| `MODEL_PATH` | `MiniMind2-Small` | Model path or Hugging Face ID |
| `GPU_IDLE_TIMEOUT` | `60` | Seconds before auto-releasing GPU memory |
| `NVIDIA_VISIBLE_DEVICES` | `0` | GPU device ID |
| `MAX_SEQ_LEN` | `8192` | Maximum sequence length |
| `TEMPERATURE` | `0.85` | Default generation temperature |
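These variables are resolved at startup with the defaults above. As a minimal illustration of that resolution (a sketch only; the actual handling lives in `app.py`):

```python
import os

def load_config() -> dict:
    """Resolve server settings from the environment, falling back to defaults."""
    return {
        "port": int(os.environ.get("PORT", "8998")),
        "model_path": os.environ.get("MODEL_PATH", "MiniMind2-Small"),
        "gpu_idle_timeout": int(os.environ.get("GPU_IDLE_TIMEOUT", "60")),
        "max_seq_len": int(os.environ.get("MAX_SEQ_LEN", "8192")),
        "temperature": float(os.environ.get("TEMPERATURE", "0.85")),
    }
```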
### .env Example

```bash
PORT=8998
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
MODEL_PATH=MiniMind2-Small
```
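When deploying with Docker Compose, the same file can be loaded through the standard `env_file` key instead of repeating each variable under `environment` (a sketch; merge this into your existing service definition):

```yaml
services:
  minimind:
    image: neosun/minimind:latest
    env_file:
      - .env
    ports:
      - "8998:8998"
```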
## Usage Examples

### Web UI
Visit http://localhost:8998 for the interactive chat interface.
Features:
- Adjustable parameters (Temperature, Max Tokens, Top P)
- GPU status monitoring
- One-click GPU memory release
- Multi-language support (EN/CN/TW/JP)
- Dark mode support
### REST API

#### Chat Completion (OpenAI-Compatible)

```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'
```
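The same request can be issued from Python with nothing but the standard library. The sketch below mirrors the curl example (the endpoint URL and model name are the assumed defaults from this README; adjust host and port for your deployment):

```python
import json
import urllib.request

API_URL = "http://localhost:8998/v1/chat/completions"  # assumed default port

def build_chat_request(prompt: str, temperature: float = 0.7,
                       max_tokens: int = 512, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }

def chat(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With the server running: chat("Hello!") returns the assistant's reply.
```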
#### Streaming Response

```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
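Streamed responses arrive as Server-Sent Events, one `data: {...}` line per chunk, terminated by `data: [DONE]` in the usual OpenAI-style framing (verify against the actual server output). A minimal parser for that framing:

```python
import json
from typing import Iterable, Iterator

def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text delta from each SSE 'data:' line, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta

# Example with captured SSE lines:
sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_deltas(sample)))  # Once upon
```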
#### GPU Status

```bash
# Query current GPU status
curl http://localhost:8998/api/gpu/status

# Manually release GPU memory
curl -X POST http://localhost:8998/api/gpu/offload
```
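The auto-release behaviour behind `GPU_IDLE_TIMEOUT` amounts to comparing the time of the last request against the timeout. A hypothetical sketch of that logic (illustrative only; the names and actual implementation in `app.py` may differ):

```python
import time
from typing import Optional

class IdleTracker:
    """Track the last request time and decide when GPU memory should be released."""

    def __init__(self, idle_timeout: float = 60.0):
        self.idle_timeout = idle_timeout
        self.last_request = time.monotonic()

    def touch(self) -> None:
        """Record activity; called on every inference request."""
        self.last_request = time.monotonic()

    def should_offload(self, now: Optional[float] = None) -> bool:
        """True once no request has arrived for idle_timeout seconds."""
        now = time.monotonic() if now is None else now
        return (now - self.last_request) >= self.idle_timeout
```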
### MCP Integration

Configure in your MCP client:

```json
{
  "mcpServers": {
    "minimind": {
      "command": "python",
      "args": ["mcp_server.py"],
      "env": {
        "MODEL_PATH": "MiniMind2-Small",
        "GPU_IDLE_TIMEOUT": "600"
      }
    }
  }
}
```
Available Tools:

- `chat` - Single-turn conversation
- `multi_turn_chat` - Multi-turn conversation
- `get_gpu_status` - Query GPU status
- `get_model_info` - Get model information
- `release_gpu` - Release GPU memory
See MCP_GUIDE.md for detailed documentation.
## Documentation

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/api/gpu/status` | GET | GPU status |
| `/api/gpu/offload` | POST | Release GPU memory |
| `/v1/chat/completions` | POST | Chat API (OpenAI-compatible) |
| `/apidocs/` | GET | Swagger documentation |
## Technical Details

```
minimind-docker/
├── app.py                # Main application (UI + API)
├── mcp_server.py         # MCP server
├── Dockerfile            # Docker build file
├── docker-compose.yml    # Docker Compose config
├── start.sh              # One-click start script
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── MCP_GUIDE.md          # MCP documentation
├── model/                # Tokenizer files
├── trainer/              # Training scripts
└── scripts/              # Utility scripts
```
- Framework: Flask + FastMCP
- Model: MiniMind2 (Transformer-based LLM)
- GPU: CUDA 12.1 + PyTorch 2.6
- Container: Docker + nvidia-container-toolkit
- API: OpenAI-compatible REST API
- Docs: Swagger/Flasgger
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/AmazingFeature`).
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4. Push to the branch (`git push origin feature/AmazingFeature`).
5. Open a Pull Request.
## Changelog

### v1.0.0 (2026-01-04)

- Initial release
- Docker all-in-one deployment
- Web UI with multi-language support
- OpenAI-compatible API
- MCP integration
- Smart GPU management
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Based on MiniMind by Jingyao Gong.
## Star History

## Follow Us