dino-x-mcp - An MCP tool that integrates the DINO-X and Grounding DINO 1.6 APIs for fine-grained object detection and image understanding

Dino X MCP

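DINO-X MCP is a project that connects large language models to the DINO-X and Grounding DINO 1.6 APIs. It targets fine-grained object detection and image understanding, and supports vision tasks and automation scenarios driven by natural language.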

Image and video processing · Developer tools · #object-detection #image-understanding #natural-language-processing #visual-analysis · TypeScript

Rating: 2.5

Downloads: 9.5K

Updated: 2025-07-23

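What is DINO-X MCP?

DINO-X MCP is an image understanding and object detection service that analyzes images in fine detail from natural-language instructions. It combines DINO-X and Grounding DINO 1.6 to provide object localization, attribute recognition, and scene understanding.

How do you use DINO-X MCP?

Give the assistant an image together with a natural-language prompt. DINO-X MCP parses the image and returns object positions, counts, and attributes. The image can be a remote URL or a local file path.

Where does it fit?

It suits scenarios that need precise image understanding, such as industrial quality inspection, smart security, autonomous driving, and medical image analysis. It can also be integrated into multi-step vision workflows.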

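Main functions

Fine-grained image understanding

Recognizes the overall content of an image and also detects and describes specific objects within it.

Natural-language-driven detection

Detects and locates objects in an image according to a natural-language prompt supplied by the user.

Human pose analysis

Detects human body keypoints in an image for pose estimation.

Detection visualization

Draws bounding boxes and labels on the image so that results can be checked visually.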

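Advantages

Accepts natural-language instructions, so it is simple to operate

High-precision object detection and attribute recognition

Can be integrated into multi-step vision workflows

Supports several image formats and input methods

Limitations

Depends on a network connection and API calls

Complex scenes may require more computing resources

Blurry or low-quality images may produce errors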

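How to use

Install Node.js

Download and install Node.js, and make sure the system meets its requirements.

Configure the MCP server

Add the DINO-X MCP server to your MCP client, including its command-line arguments and environment variables.

Get an API key

Request an API key from the DINO-X platform. The key is used for authentication.

Start the service

Start the DINO-X MCP service with your configuration and wait for requests.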

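Use cases

Detect fire areas in a forest

Provide a forest image and a natural-language prompt to detect and visualize where the fire is.

Count cartons in a warehouse

Provide a warehouse image and a natural-language prompt to count the cartons.

Detect red cars

Provide an image and a natural-language prompt to detect and visualize red cars.

Analyze a yoga pose

Provide a yoga image and detect human keypoints to analyze the pose.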

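FAQ

What are the prerequisites for DINO-X MCP?

Node.js, an MCP client, and a DINO-X API key.

How do I get an API key?

Request one from the DINO-X platform.

Which image formats are supported?

jpg, jpeg, png, and webp, supplied as an https:// URL or a file:// path.

Are Chinese prompts supported?

The textPrompt parameter of object-detection-by-text is documented as English only.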

🚀 DINO-X MCP

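DINO-X MCP uses the DINO-X and Grounding DINO 1.6 APIs to give large language models fine-grained object detection and image understanding.

🚀 Quick Start

1. Prerequisites

Install Node.js using one of the following methods:

Option A: Command-line installation 👍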

# For macOS or Linux
# 1. Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# Or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# 2. Add the following lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"

# 3. Activate nvm in the current shell
source ~/.bashrc
# Or
source ~/.zshrc

# 4. Verify the nvm installation
command -v nvm

# 5. Install and use the LTS version of Node.js
nvm install --lts
nvm use --lts

# For Windows
winget install OpenJS.NodeJS.LTS
# Or use PowerShell (as administrator)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y

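Option B: Manual installation

Download the installer from nodejs.org.

You also need an AI assistant or application that works as an MCP client.

2. Configure the MCP server

You can use the DINO-X MCP server in two ways:

Option A: Use the NPM package 👍

Add the following configuration to your MCP client: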

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

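Option B: Use a local project

First, clone and build the project:

# Clone the project
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# Install dependencies
pnpm install

# Build the project
pnpm run build

Then configure your MCP client: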

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

3. Get an API key

Get your API key from the DINO-X platform (new users receive a free quota).

Replace your-api-key-here in the configuration above with your actual API key.
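
If you generate the client configuration from a script, the replacement step can be guarded so the placeholder never ships. The following is a minimal sketch, not part of the project: `buildDinoxConfig` and `PLACEHOLDER_KEY` are hypothetical names, and the object it returns simply mirrors the NPM-package configuration shown in step 2.

```typescript
// Shape of one entry under "mcpServers", as used in the configuration above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// The placeholder text from the sample configuration.
const PLACEHOLDER_KEY = "your-api-key-here";

// Hypothetical helper: builds the Option A (npx) configuration and
// refuses an empty key or the untouched placeholder.
function buildDinoxConfig(
  apiKey: string,
  imageDir?: string
): { mcpServers: Record<string, McpServerEntry> } {
  if (apiKey.trim() === "" || apiKey === PLACEHOLDER_KEY) {
    throw new Error("DINOX_API_KEY must be replaced with a real key");
  }
  const env: Record<string, string> = { DINOX_API_KEY: apiKey };
  // IMAGE_STORAGE_DIRECTORY is optional, so only set it when provided.
  if (imageDir !== undefined) {
    env["IMAGE_STORAGE_DIRECTORY"] = imageDir;
  }
  return {
    mcpServers: {
      "dinox-mcp": {
        command: "npx",
        args: ["-y", "@deepdataspace/dinox-mcp"],
        env: env,
      },
    },
  };
}

// Serialize with JSON.stringify(config, null, 2) and paste into the client.
const config = buildDinoxConfig("example-key", "/path/to/your/image/directory");
```

Where the resulting JSON goes depends on your MCP client, so the sketch stops at building the object.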

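4. Environment variables

The DINO-X MCP server supports the following environment variables:

| Variable | Description | Required | Default | Example |
| --- | --- | --- | --- | --- |
| `DINOX_API_KEY` | DINO-X API key used for authentication | Required | - | `your-api-key-here` |
| `IMAGE_STORAGE_DIRECTORY` | Directory where generated visualization images are saved | Optional | macOS/Linux: `/tmp/dinox-mcp`; Windows: `%TEMP%\dinox-mcp` | `/Users/admin/Downloads/dinox-images` |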
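
The required/optional rules and the per-platform defaults in the table can be expressed as a small resolver. This is a sketch that only mirrors the table; how the server itself reads its environment is not shown in this document. `loadDinoxSettings` is a hypothetical name, and the `platform` argument is assumed to carry a Node.js `process.platform` value such as `"win32"`, `"linux"`, or `"darwin"`.

```typescript
type Env = Record<string, string | undefined>;

interface DinoxSettings {
  apiKey: string;
  imageStorageDirectory: string;
}

// Hypothetical resolver: applies the rules from the environment table.
function loadDinoxSettings(env: Env, platform: string): DinoxSettings {
  const apiKey = env["DINOX_API_KEY"];
  if (apiKey === undefined || apiKey === "") {
    // DINOX_API_KEY is the only required variable.
    throw new Error("DINOX_API_KEY is required");
  }

  let directory = env["IMAGE_STORAGE_DIRECTORY"];
  if (directory === undefined || directory === "") {
    if (platform === "win32") {
      // Documented Windows default: %TEMP%\dinox-mcp
      const temp = env["TEMP"];
      if (temp === undefined || temp === "") {
        throw new Error("TEMP is not set, cannot derive the default directory");
      }
      directory = temp + "\\dinox-mcp";
    } else {
      // Documented macOS/Linux default.
      directory = "/tmp/dinox-mcp";
    }
  }
  return { apiKey: apiKey, imageStorageDirectory: directory };
}

// With only the key set on Linux, the documented default applies.
const settings = loadDinoxSettings({ DINOX_API_KEY: "example-key" }, "linux");
// settings.imageStorageDirectory is "/tmp/dinox-mcp"
```

Passing the environment and platform as arguments keeps the function easy to test; in a real Node.js process you would call it with `process.env` and `process.platform`.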

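5. Available tools

Restart your MCP client. The following tools should then be available:

| Tool | Description | Input | Output |
| --- | --- | --- | --- |
| `detect-all-objects` | Detects and locates all recognizable objects in an image. | Image | Category names + bounding boxes + descriptions |
| `object-detection-by-text` | Detects and locates objects in an image based on a natural-language prompt. | Image + text prompt | Bounding boxes + object descriptions |
| `detect-human-pose-keypoints` | Detects 17 body keypoints for each person in an image, for pose estimation. | Image | Keypoint coordinates and descriptions |
| `visualize-detections` | Visualizes detection results by drawing bounding boxes and labels on the image. | Image + detection results | Annotated image saved to the storage directory |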
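
Your MCP client normally issues these calls for you. For orientation, the sketch below builds the JSON-RPC 2.0 `tools/call` request that the Model Context Protocol defines for invoking a tool. The tool names come from the table above and the argument names from the parameter reference at the end of this document; `buildToolCall` and the example image URL are hypothetical.

```typescript
// The four tools listed in the table above.
type ToolName =
  | "detect-all-objects"
  | "object-detection-by-text"
  | "detect-human-pose-keypoints"
  | "visualize-detections";

// MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: ToolName;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper that assembles the request object.
function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Example: ask for boxes in a warehouse image (placeholder URL).
const request = buildToolCall(1, "object-detection-by-text", {
  imageFileUri: "https://example.com/warehouse.jpg",
  textPrompt: "box",
  includeDescription: false,
});
```

The sketch only constructs the payload. Sending it requires an MCP transport, which your client or an MCP SDK provides.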

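✨ Key Features

Fine-grained image understanding: full-scene recognition and natural-language-based object detection give a detailed understanding of the image.
Precise information retrieval: object counts, positions, and attributes can be obtained accurately to support tasks such as visual question answering.
Multi-step workflows: the server can be combined with other MCP servers to build multi-step vision workflows.
Visual agents: you can build natural-language-driven visual agents for real-world automation scenarios.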
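
Counting is a simple post-processing step once detections are available. The sketch below tallies detections per category. It assumes each detection carries a `name` and a `bbox`, as the `visualize-detections` parameter description states; the exact layout of `bbox` is not specified in this document and is not needed here. `countByName` is a hypothetical helper.

```typescript
// Assumed detection shape: a category name plus a bounding box.
interface Detection {
  name: string;
  bbox: number[];
}

// Hypothetical helper: number of detections per category name.
function countByName(detections: Detection[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const detection of detections) {
    // hasOwnProperty avoids picking up inherited keys such as "constructor".
    const seen = Object.prototype.hasOwnProperty.call(counts, detection.name);
    counts[detection.name] = seen ? counts[detection.name] + 1 : 1;
  }
  return counts;
}

// Made-up sample data for illustration only.
const sample: Detection[] = [
  { name: "box", bbox: [10, 10, 50, 50] },
  { name: "box", bbox: [60, 10, 100, 50] },
  { name: "person", bbox: [120, 20, 160, 140] },
];
const totals = countByName(sample);
// totals.box is 2 and totals.person is 1
```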

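🎬 Use Cases

| 🎯 Scenario | 📝 Input | ✨ Output |
| --- | --- | --- |
| Detection and localization | 💬 Prompt: `Detect and visualize the fire areas in the forest` + 🖼️ input image | |
| Object counting | 💬 Prompt: `Please analyze this warehouse image, detect all the cardboard boxes, and count the total` + 🖼️ input image | |
| Feature detection | 💬 Prompt: `Find all red cars in the image` + 🖼️ input image | |
| Attribute reasoning | 💬 Prompt: `Find the tallest person in the image and describe what they are wearing` + 🖼️ input image | |
| Full-scene detection | 💬 Prompt: `Find the fruit with the highest vitamin C content in the image` + 🖼️ input image | Answer: kiwi (93 mg/100 g) |
| Pose analysis | 💬 Prompt: `Please analyze what yoga pose this is` + 🖼️ input image | |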

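📝 Usage Notes

Supported image inputs

Remote URLs starting with https:// 👍
Local file paths (starting with file://)
Common image formats: jpg, jpeg, png, webp

API documentation

See the DINO-X platform for API usage limits and pricing.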
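
A client-side check before calling a tool can catch unsupported inputs early. This is a minimal sketch assuming the rules listed above are the whole contract; `isSupportedImageUri` is a hypothetical helper, and the server may accept inputs this check rejects (for example, a URL without a file extension).

```typescript
// Hypothetical pre-flight check based on the rules listed above:
// the URI must start with https:// or file:// and end in a listed extension.
function isSupportedImageUri(uri: string): boolean {
  if (!/^(https|file):\/\//.test(uri)) {
    return false;
  }
  // Ignore any query string or fragment when looking at the extension.
  const path = uri.split(/[?#]/)[0];
  return /\.(jpe?g|png|webp)$/i.test(path);
}

const remoteOk = isSupportedImageUri("https://example.com/cars.png?size=large");
const localOk = isSupportedImageUri("file:///tmp/photo.JPG");
const plainPath = isSupportedImageUri("/tmp/photo.jpg");
// remoteOk and localOk are true; plainPath is false because the scheme is missing
```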

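🛠️ Development

Watch mode

During development, use watch mode to rebuild automatically:

pnpm run watch

Debugging

Use the MCP Inspector to debug the server:

pnpm run inspector

📄 License

This project is licensed under the Apache License 2.0.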

object-detection-by-text

Analyze an image based on a text prompt to identify and count specific objects, and return detailed descriptions of the objects and their 2D coordinates.

Parameters

imageFileUri : string*

URI of the input image. Preferred for remote or local files. Must start with "https://" or "file://".

textPrompt : string*

Nouns naming the target objects (English only, avoid adjectives). Use periods to separate multiple categories (e.g., 'person.car.traffic light').

includeDescription : boolean*

Whether to return a description of the detected objects. Enabling this increases processing time.
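
The period-separated format for `textPrompt` is easy to get wrong by hand. The sketch below joins a list of category nouns into that format; `buildTextPrompt` is a hypothetical helper, and rejecting categories that already contain a period is an added safeguard rather than a documented rule.

```typescript
// Hypothetical helper: joins category nouns with periods, as textPrompt expects.
function buildTextPrompt(categories: string[]): string {
  const cleaned = categories
    .map((category) => category.trim())
    .filter((category) => category.length > 0);
  if (cleaned.length === 0) {
    throw new Error("at least one category is required");
  }
  for (const category of cleaned) {
    // A period inside a category would be read as a separator.
    if (category.indexOf(".") !== -1) {
      throw new Error("category must not contain a period: " + category);
    }
  }
  return cleaned.join(".");
}

const prompt = buildTextPrompt(["person", "car", "traffic light"]);
// prompt is "person.car.traffic light"
```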

detect-all-objects

Analyze an image to detect all identifiable objects, returning the category, count, coordinate positions and detailed descriptions for each object.

Parameters

imageFileUri : string*

URI of the input image. Preferred for remote or local files. Must start with "https://" or "file://".

includeDescription : boolean*

Whether to return a description of the detected objects. Enabling this increases processing time.

detect-human-pose-keypoints

Detects 17 keypoints for each person in an image, supporting body posture and movement analysis.

Parameters

imageFileUri : string*

URI of the input image. Preferred for remote or local files. Must start with "https://" or "file://".

includeDescription : boolean*

Whether to return a description of the detected objects. Enabling this increases processing time.
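
Pose analysis from keypoints often reduces to joint angles. The sketch below computes the angle at a middle joint from three 2D points. It assumes each keypoint can be reduced to `{ x, y }` image coordinates; the actual output format and keypoint names of `detect-human-pose-keypoints` are not specified in this document, so `Point`, `jointAngleDegrees`, and the shoulder/elbow/wrist example are hypothetical.

```typescript
// Assumed minimal keypoint representation: 2D image coordinates.
interface Point {
  x: number;
  y: number;
}

// Angle in degrees at joint b, formed by the segments b->a and b->c.
function jointAngleDegrees(a: Point, b: Point, c: Point): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const norm =
    Math.sqrt(v1.x * v1.x + v1.y * v1.y) *
    Math.sqrt(v2.x * v2.x + v2.y * v2.y);
  if (norm === 0) {
    throw new Error("the joint must differ from both neighboring keypoints");
  }
  // Clamp to [-1, 1] to guard against floating-point drift before acos.
  const cosine = Math.max(-1, Math.min(1, dot / norm));
  return (Math.acos(cosine) * 180) / Math.PI;
}

// Made-up coordinates: the forearm is perpendicular to the upper arm.
const shoulder = { x: 0, y: 100 };
const elbow = { x: 0, y: 0 };
const wrist = { x: 100, y: 0 };
const elbowAngle = jointAngleDegrees(shoulder, elbow, wrist);
// elbowAngle is approximately 90
```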

visualize-detections

Visualize detection results by drawing bounding boxes and labels on the original image. Images are saved to the directory specified by IMAGE_STORAGE_DIRECTORY environment variable.

Parameters

imageFileUri : string*

URI of the input image. Preferred for remote or local files. Must start with "https://" or "file://".

detections : array*

Array of detection results with name and bbox information.

fontSize : number*

Font size for labels (default: 24).
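
The arguments for `visualize-detections` can be assembled from an earlier detection result. This is a sketch under stated assumptions: each detection is taken to be `{ name, bbox }` with `bbox` as an array of numbers whose layout this document does not specify, and `buildVisualizeArguments` is a hypothetical helper. The default font size of 24 comes from the parameter description above.

```typescript
// Assumed detection shape, following "name and bbox information".
interface Detection {
  name: string;
  bbox: number[];
}

interface VisualizeArguments {
  imageFileUri: string;
  detections: Detection[];
  fontSize: number;
}

// Documented default label size.
const DEFAULT_FONT_SIZE = 24;

// Hypothetical helper: validates inputs and applies the default font size.
function buildVisualizeArguments(
  imageFileUri: string,
  detections: Detection[],
  fontSize: number = DEFAULT_FONT_SIZE
): VisualizeArguments {
  if (detections.length === 0) {
    throw new Error("nothing to visualize: detections is empty");
  }
  if (!(fontSize > 0)) {
    throw new Error("fontSize must be a positive number");
  }
  return {
    imageFileUri: imageFileUri,
    detections: detections,
    fontSize: fontSize,
  };
}

// Made-up detection for illustration; the URL is a placeholder.
const visualizeArgs = buildVisualizeArguments(
  "https://example.com/street.jpg",
  [{ name: "car", bbox: [34, 80, 210, 190] }]
);
// visualizeArgs.fontSize is 24 because no size was passed
```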