# MiniMind Docker
All-in-One Docker deployment for MiniMind LLM with UI, API & MCP support
Live Demo · API Docs · Original Project
## Features

- One-Click Docker Deployment: All dependencies are bundled and ready to run.
- Modern Web UI: Responsive design with dark mode and multi-language support.
- OpenAI-Compatible API: A drop-in replacement for existing applications.
- MCP Integration: Model Context Protocol for AI agent workflows.
- Smart GPU Management: Auto-selects an idle GPU and auto-releases memory.
- Real-time Streaming: SSE-based streaming responses.
- Multi-language UI: English, Simplified Chinese, Traditional Chinese, Japanese.
## Quick Start

### Docker (Recommended)

```bash
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest
```

### Docker Compose

```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
./start.sh
```
## Installation

### Prerequisites

- Docker 20.10+
- Docker Compose 2.0+
- NVIDIA GPU with CUDA 12.1+ (optional, CPU fallback available)
- nvidia-container-toolkit (for GPU support)

### Method 1: Docker Run

```bash
# CPU-only
docker run -d -p 8998:8998 neosun/minimind:latest

# With GPU
docker run -d --gpus all -p 8998:8998 neosun/minimind:latest

# With GPU and a custom model mounted from the host
docker run -d --gpus all -p 8998:8998 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=/app/models/MiniMind2 \
  neosun/minimind:latest
```
### Method 2: Docker Compose

```yaml
services:
  minimind:
    image: neosun/minimind:latest
    ports:
      - "8998:8998"
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - GPU_IDLE_TIMEOUT=60
      - MODEL_PATH=MiniMind2-Small
    volumes:
      - /tmp/minimind:/app/uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

```bash
docker compose up -d
```
### Method 3: Local Development

```bash
git clone https://github.com/neosu/minimind-docker.git
cd minimind-docker
pip install -r requirements.txt

# Download the model weights from Hugging Face
python -c "from huggingface_hub import snapshot_download; snapshot_download('jingyaogong/MiniMind2-Small', local_dir='MiniMind2-Small')"

python app.py
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8998` | Server port |
| `MODEL_PATH` | `MiniMind2-Small` | Model path or Hugging Face ID |
| `GPU_IDLE_TIMEOUT` | `60` | Seconds before auto-releasing GPU memory |
| `NVIDIA_VISIBLE_DEVICES` | `0` | GPU device ID |
| `MAX_SEQ_LEN` | `8192` | Maximum sequence length |
| `TEMPERATURE` | `0.85` | Default generation temperature |
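These variables are resolved at startup with the defaults above. As a minimal illustration of that resolution (a sketch only; the actual handling lives in `app.py`):

```python
import os

def load_config() -> dict:
    """Resolve server settings from the environment, falling back to defaults."""
    return {
        "port": int(os.environ.get("PORT", "8998")),
        "model_path": os.environ.get("MODEL_PATH", "MiniMind2-Small"),
        "gpu_idle_timeout": int(os.environ.get("GPU_IDLE_TIMEOUT", "60")),
        "max_seq_len": int(os.environ.get("MAX_SEQ_LEN", "8192")),
        "temperature": float(os.environ.get("TEMPERATURE", "0.85")),
    }
```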
### .env Example

```bash
PORT=8998
GPU_IDLE_TIMEOUT=60
NVIDIA_VISIBLE_DEVICES=0
MODEL_PATH=MiniMind2-Small
```
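When deploying with Docker Compose, the same file can be loaded through the standard `env_file` key instead of repeating each variable under `environment` (a sketch; merge this into your existing service definition):

```yaml
services:
  minimind:
    image: neosun/minimind:latest
    env_file:
      - .env
    ports:
      - "8998:8998"
```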
## Usage Examples

### Web UI
Visit http://localhost:8998 for the interactive chat interface.
Features:
- Adjustable parameters (Temperature, Max Tokens, Top P)
- GPU status monitoring
- One-click GPU memory release
- Multi-language support (EN/CN/TW/JP)
- Dark mode support
### REST API

#### Chat Completion (OpenAI-Compatible)

```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'
```
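The same request can be issued from Python with nothing but the standard library. The sketch below mirrors the curl example (the endpoint URL and model name are the assumed defaults from this README; adjust host and port for your deployment):

```python
import json
import urllib.request

API_URL = "http://localhost:8998/v1/chat/completions"  # assumed default port

def build_chat_request(prompt: str, temperature: float = 0.7,
                       max_tokens: int = 512, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": "minimind",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }

def chat(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With the server running: chat("Hello!") returns the assistant's reply.
```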
#### Streaming Response

```bash
curl -X POST http://localhost:8998/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimind",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
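Streamed responses arrive as Server-Sent Events, one `data: {...}` line per chunk, terminated by `data: [DONE]` in the usual OpenAI-style framing (verify against the actual server output). A minimal parser for that framing:

```python
import json
from typing import Iterable, Iterator

def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text delta from each SSE 'data:' line, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta

# Example with captured SSE lines:
sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_deltas(sample)))  # Once upon
```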
#### GPU Status

```bash
# Query current GPU status
curl http://localhost:8998/api/gpu/status

# Manually release GPU memory
curl -X POST http://localhost:8998/api/gpu/offload
```
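The auto-release behaviour behind `GPU_IDLE_TIMEOUT` amounts to comparing the time of the last request against the timeout. A hypothetical sketch of that logic (illustrative only; the names and actual implementation in `app.py` may differ):

```python
import time
from typing import Optional

class IdleTracker:
    """Track the last request time and decide when GPU memory should be released."""

    def __init__(self, idle_timeout: float = 60.0):
        self.idle_timeout = idle_timeout
        self.last_request = time.monotonic()

    def touch(self) -> None:
        """Record activity; called on every inference request."""
        self.last_request = time.monotonic()

    def should_offload(self, now: Optional[float] = None) -> bool:
        """True once no request has arrived for idle_timeout seconds."""
        now = time.monotonic() if now is None else now
        return (now - self.last_request) >= self.idle_timeout
```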
### MCP Integration

Configure in your MCP client:

```json
{
  "mcpServers": {
    "minimind": {
      "command": "python",
      "args": ["mcp_server.py"],
      "env": {
        "MODEL_PATH": "MiniMind2-Small",
        "GPU_IDLE_TIMEOUT": "600"
      }
    }
  }
}
```
Available Tools:

- `chat` - Single-turn conversation
- `multi_turn_chat` - Multi-turn conversation
- `get_gpu_status` - Query GPU status
- `get_model_info` - Get model information
- `release_gpu` - Release GPU memory
See MCP_GUIDE.md for detailed documentation.
## Documentation

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI |
| `/health` | GET | Health check |
| `/api/gpu/status` | GET | GPU status |
| `/api/gpu/offload` | POST | Release GPU memory |
| `/v1/chat/completions` | POST | Chat API (OpenAI-compatible) |
| `/apidocs/` | GET | Swagger documentation |
## Technical Details

```
minimind-docker/
├── app.py                # Main application (UI + API)
├── mcp_server.py         # MCP server
├── Dockerfile            # Docker build file
├── docker-compose.yml    # Docker Compose config
├── start.sh              # One-click start script
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── MCP_GUIDE.md          # MCP documentation
├── model/                # Tokenizer files
├── trainer/              # Training scripts
└── scripts/              # Utility scripts
```
- Framework: Flask + FastMCP
- Model: MiniMind2 (Transformer-based LLM)
- GPU: CUDA 12.1 + PyTorch 2.6
- Container: Docker + nvidia-container-toolkit
- API: OpenAI-compatible REST API
- Docs: Swagger/Flasgger
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/AmazingFeature`).
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4. Push to the branch (`git push origin feature/AmazingFeature`).
5. Open a Pull Request.
## Changelog

### v1.0.0 (2026-01-04)

- Initial release
- Docker all-in-one deployment
- Web UI with multi-language support
- OpenAI-compatible API
- MCP integration
- Smart GPU management
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Based on MiniMind by Jingyao Gong.
## Star History

## Follow Us