K8s GPU MCP Server
An instant SRE diagnostic agent for NVIDIA GPU hardware in Kubernetes clusters, providing real-time hardware detection and fault troubleshooting through the MCP protocol.
What is k8s-gpu-mcp-server?
This is a diagnostic tool designed specifically for NVIDIA GPUs in Kubernetes clusters. It integrates with AI assistants (such as Claude and Cursor) through the Model Context Protocol (MCP), so you can ask the assistant directly about GPU health, temperature, errors, and more, without manually running complex command-line tools.
How to use k8s-gpu-mcp-server?
Install and configure it once in your AI assistant (Claude Desktop or Cursor IDE). After that, asking GPU-related questions is as simple as having a conversation, for example: 'Check the GPU temperature of node gpu-worker-5' or 'Analyze recent GPU errors'.
Applicable scenarios
When running GPU-intensive workloads such as AI training and inference in your Kubernetes cluster, this tool helps you quickly diagnose performance degradation, job failures, or abnormal GPU resource behavior. It is particularly suited to operations teams, AI engineers, and researchers.
Main features
Real-time GPU monitoring
Obtain key metrics such as GPU temperature, power draw, memory usage, and utilization in real time, without installing a separate monitoring system.
Hardware health check
Comprehensively check GPU hardware health, including ECC errors, XID error-code analysis, and thermal-throttling status.
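The XID analysis mentioned above can be pictured with a minimal sketch. The code below is illustrative only, not this tool's actual implementation; the code meanings in the table follow NVIDIA's public XID error documentation for a few common codes.

```python
# Minimal sketch of XID error-code analysis (illustrative, not the
# tool's real implementation). Meanings follow NVIDIA's public XID
# documentation for a handful of common codes.
XID_MEANINGS = {
    13: "Graphics engine exception (often an application fault)",
    31: "GPU memory page fault (usually an application bug)",
    48: "Double-bit ECC error (possible hardware problem)",
    63: "ECC page retirement or row remapping recorded",
    79: "GPU has fallen off the bus (serious hardware/power issue)",
}

def analyze_xid(xid: int) -> str:
    """Return a human-readable description for an XID error code."""
    return XID_MEANINGS.get(
        xid, f"XID {xid}: unknown or uncommon code, consult NVIDIA XID docs"
    )

print(analyze_xid(79))
```

A real diagnostic tool would extract these codes from the kernel log or driver events before mapping them to descriptions and suggested actions.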
Kubernetes integration
Automatically correlate GPU hardware with Kubernetes Pods, so you can see which Pod is using which GPU and how resources are allocated.
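The GPU-to-Pod correlation can be sketched as a simple mapping inversion. The data structure and function names below are assumptions for illustration; in a real cluster this allocation data would typically come from the kubelet's pod-resources API or the device plugin, not a hard-coded dict.

```python
# Sketch of correlating GPUs with Pods (illustrative; field names and
# the allocation source are assumptions, not the tool's actual API).
from typing import Dict, List, Tuple

# Hypothetical allocation data: GPU UUID -> (namespace, pod name)
allocations: Dict[str, Tuple[str, str]] = {
    "GPU-1a2b": ("ml-team", "trainer-0"),
    "GPU-3c4d": ("ml-team", "trainer-1"),
}

def pods_using_gpus(alloc: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Invert the mapping: 'namespace/pod' -> list of GPU UUIDs it holds."""
    by_pod: Dict[str, List[str]] = {}
    for uuid, (ns, pod) in alloc.items():
        by_pod.setdefault(f"{ns}/{pod}", []).append(uuid)
    return by_pod

print(pods_using_gpus(allocations))
```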
AI assistant friendly
Designed for AI assistants such as Claude and Cursor, so complex GPU diagnostics can be driven entirely through natural language.
Historical data recording
A built-in flight recorder continuously records GPU metrics, so GPU status at past points in time can be queried.
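A flight recorder of this kind is essentially a bounded ring buffer of timestamped samples that supports nearest-timestamp lookup. The sketch below shows the idea; the class and method names are assumptions for illustration, not the tool's actual code.

```python
# Sketch of a "flight recorder" ring buffer for GPU metrics
# (illustrative; names and structure are assumptions).
from collections import deque
from bisect import bisect_left

class FlightRecorder:
    def __init__(self, capacity: int = 1000):
        # deque(maxlen=...) silently evicts the oldest sample when full
        self.samples = deque(maxlen=capacity)  # (timestamp, metrics dict)

    def record(self, ts: float, metrics: dict) -> None:
        self.samples.append((ts, metrics))

    def at(self, ts: float) -> dict:
        """Return the recorded sample closest to the requested timestamp."""
        times = [t for t, _ in self.samples]
        i = bisect_left(times, ts)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - ts))
        return self.samples[best][1]

rec = FlightRecorder(capacity=3)
rec.record(1.0, {"temp_c": 65})
rec.record(2.0, {"temp_c": 80})
rec.record(3.0, {"temp_c": 72})
print(rec.at(1.9))  # nearest sample is the one recorded at t=2.0
```

The fixed capacity is what keeps idle memory usage low: old samples are dropped automatically instead of growing without bound.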
Safe read-only mode
Runs in read-only mode by default and makes no modifications to the GPU or the system, keeping production environments safe.
Advantages
No need to learn complex commands: Interact with the AI assistant in natural language
Rapid deployment: One-click installation, usable within minutes
Low resource consumption: Only 15–20 MB of memory when idle
Production-ready: Tested on real Tesla T4 GPUs
Open source and free: Released under the Apache 2.0 license
Multi-platform support: Works with Claude Desktop, Cursor IDE, and more
Limitations
NVIDIA GPUs only: Does not support AMD or other vendors' GPUs
Requires NVIDIA drivers: Depends on the NVML library and correctly installed drivers
Kubernetes environment: Designed mainly for K8s clusters; limited for single-machine use
Read-only diagnostics: The current version is primarily a diagnostic tool, and repair operations are limited
How to use
Installation and configuration
Add the MCP server settings to the configuration file of the AI assistant you are using (Claude Desktop or Cursor).
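For orientation, MCP servers are typically registered in the assistant's configuration file (for Claude Desktop this is claude_desktop_config.json). The server name and binary path below are placeholders, not this project's documented values; consult its quick start guide for the exact command.

```json
{
  "mcpServers": {
    "k8s-gpu": {
      "command": "/usr/local/bin/k8s-gpu-mcp-server"
    }
  }
}
```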
Start the AI assistant
Restart your AI assistant (Claude Desktop or Cursor IDE) to make the configuration take effect.
Start the conversation
Directly ask GPU-related questions in the AI assistant, and the assistant will automatically call the corresponding diagnostic tools.
View the results
The AI assistant will display the diagnostic results in a clear and understandable format, including problem analysis and suggestions.
Usage examples
Case 1: Diagnose training task failure
AI training jobs repeatedly fail on a specific node, and you need to quickly pinpoint any GPU hardware problem.
Case 2: Monitor GPU temperature
Server-room temperatures rise in summer, and you need to check whether GPUs are overheating and throttling performance.
Case 3: Troubleshoot resource contention
Multiple teams report insufficient GPU resources, and you need to see the actual usage.
Case 4: Analyze historical problems
There was a brief GPU failure last night, and you need to inspect the GPU status at the time of the failure.
Frequently Asked Questions
Do I need to have an NVIDIA GPU to use this tool?
Is this tool safe? Will it affect the production environment?
Which AI assistants are supported?
Do I need to install it on each Kubernetes node?
Will the diagnostic data be sent to the cloud?
How to update to the new version?
Related resources
GitHub repository
Source code, issue tracking, and contribution guidelines
Quick start guide
Detailed steps to get started in 5 minutes
Kubernetes deployment guide
A complete guide for deploying in a production K8s cluster
Model Context Protocol official website
Understand the technical details and specifications of the MCP protocol
Helm Chart repository
Use Helm to deploy to Kubernetes with one click
