DINO-X MCP enables large language models to perform fine-grained object detection and image understanding, powered by the DINO-X and Grounding DINO 1.6 APIs.
Although multimodal models can understand and describe images, they often lack precise localization and high-quality structured outputs for visual content.
With DINO-X MCP, you can:
- 🧠 Achieve fine-grained image understanding: both full-scene recognition and targeted detection based on natural language.
- 🎯 Accurately obtain object count, position, and attributes, enabling tasks such as visual question answering.
- 🧩 Integrate with other MCP servers to build multi-step visual workflows.
- 🛠️ Build natural language-driven visual agents for real-world automation scenarios.
You can install Node.js using one of the following methods:
# For macOS or Linux
# 1. Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# OR
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# 2. Add these lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"
# 3. Activate nvm in current shell
source ~/.bashrc
# Or
source ~/.zshrc
# 4. Verify nvm installation
command -v nvm
# 5. Install and use LTS version of Node.js
nvm install --lts
nvm use --lts
# For Windows
winget install OpenJS.NodeJS.LTS
# Or, from an Administrator PowerShell prompt, install Chocolatey and then Node.js LTS
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y
Alternatively, download the installer from nodejs.org.
You also need an AI assistant or application that can act as an MCP client.
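You can run the DINO-X MCP server in two ways: from the published npm package via npx, or from a local build of the source.
To use the npx package, add the following configuration to your MCP client: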
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
To run from source instead, first clone and build the project (the commands below use pnpm):
# Clone the project
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP
# Install dependencies
pnpm install
# Build the project
pnpm run build
Then configure your MCP client:
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
Get your API key from the DINO-X Platform (a free quota is available for new users), then replace your-api-key-here in the configuration above with your actual API key.
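If you generate client configuration from a script, the two launch modes above can be produced by one small function. The following is a minimal sketch, not part of the DINO-X MCP package: buildDinoxEntry and McpServerEntry are hypothetical names, and the key in the usage line is made up. It also rejects the README placeholder so an unedited key is caught early.

```typescript
// Shape of one entry under "mcpServers" in an MCP client configuration.
interface McpServerEntry {
  command: string;
  args: string[];
  env: { [name: string]: string };
}

// Placeholder used in the configuration snippets; it must be replaced.
const API_KEY_PLACEHOLDER = "your-api-key-here";

// Hypothetical helper: builds the "dinox-mcp" entry for either launch mode.
// Pass localBuildPath to run a local build with node; omit it to use npx.
function buildDinoxEntry(
  apiKey: string,
  imageDir?: string,
  localBuildPath?: string
): McpServerEntry {
  if (apiKey.trim() === "" || apiKey === API_KEY_PLACEHOLDER) {
    throw new Error("DINOX_API_KEY must be set to a real API key");
  }
  const env: { [name: string]: string } = { DINOX_API_KEY: apiKey };
  if (imageDir !== undefined) {
    env["IMAGE_STORAGE_DIRECTORY"] = imageDir; // optional variable
  }
  return localBuildPath !== undefined
    ? { command: "node", args: [localBuildPath], env: env }
    : { command: "npx", args: ["-y", "@deepdataspace/dinox-mcp"], env: env };
}

// Usage: print a fragment to paste into an MCP client (the key is made up).
const config = { mcpServers: { "dinox-mcp": buildDinoxEntry("example-key-123") } };
console.log(JSON.stringify(config, null, 2));
```

Keeping the key as a function argument rather than a literal makes it easier to read it from a secret store instead of committing it to a config file.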
The DINO-X MCP server supports the following environment variables:
| Variable Name | Description | Required | Default Value | Example |
|---|---|---|---|---|
| DINOX_API_KEY | Your DINO-X API key for authentication | Required | - | your-api-key-here |
| IMAGE_STORAGE_DIRECTORY | Directory where generated visualization images will be saved | Optional | macOS/Linux: /tmp/dinox-mcp; Windows: %TEMP%\dinox-mcp | /Users/admin/Downloads/dinox-images |
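The default for IMAGE_STORAGE_DIRECTORY depends on the platform, which can be easier to follow as code. The sketch below only illustrates the documented behavior; it is not the server's actual implementation. resolveImageStorageDirectory is a hypothetical name, the platform argument is assumed to use Node.js process.platform values, and throwing when TEMP is unset on Windows is an assumption of this sketch.

```typescript
// Environment variables as a plain lookup table (for example, process.env in Node.js).
type Env = { [name: string]: string | undefined };

// Sketch: choose the directory where visualization images would be saved.
// platform follows Node.js process.platform values ("win32", "darwin", "linux").
function resolveImageStorageDirectory(env: Env, platform: string): string {
  const configured = env["IMAGE_STORAGE_DIRECTORY"];
  if (configured !== undefined && configured.trim() !== "") {
    return configured; // an explicit setting always wins
  }
  if (platform === "win32") {
    // %TEMP% in the table corresponds to the TEMP environment variable.
    const temp = env["TEMP"];
    if (temp === undefined || temp === "") {
      throw new Error("TEMP is not set"); // assumption: no further fallback
    }
    return temp + "\\dinox-mcp";
  }
  return "/tmp/dinox-mcp"; // documented default on macOS and Linux
}

// Usage
console.log(resolveImageStorageDirectory({}, "linux")); // /tmp/dinox-mcp
console.log(resolveImageStorageDirectory({ TEMP: "C:\\Temp" }, "win32")); // C:\Temp\dinox-mcp
```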
Restart your MCP client, and you should be able to use the following tools:
| Method Name | Description | Input | Output |
|---|---|---|---|
| detect-all-objects | Detects and localizes all recognizable objects in an image. | Image | Category names + bounding boxes + captions |
| object-detection-by-text | Detects and localizes objects in an image based on a natural language prompt. | Image + Text prompt | Bounding boxes + object captions |
| detect-human-pose-keypoints | Detects 17 human body keypoints per person in an image for pose estimation. | Image | Keypoint coordinates and captions |
| visualize-detections | Visualizes detection results by drawing bounding boxes and labels on the image. | Image + Detection results | Annotated image saved to storage directory |
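Your MCP client normally invokes these tools for you, but it can help to see what a call looks like on the wire. MCP uses JSON-RPC 2.0, and tools are invoked with the tools/call method. The sketch below builds such a request for the tool names in the table above. The argument names imageFileUri and textPrompt are hypothetical placeholders, as are the URL and prompt; check the tool schema reported by the server for the real parameter names.

```typescript
// The four tool names listed in the table above.
type DinoxTool =
  | "detect-all-objects"
  | "object-detection-by-text"
  | "detect-human-pose-keypoints"
  | "visualize-detections";

// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: DinoxTool; arguments: { [key: string]: unknown } };
}

// Wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  tool: DinoxTool,
  args: { [key: string]: unknown }
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// Usage: imageFileUri and textPrompt are assumed names, not confirmed by the source.
const request = buildToolCall(1, "object-detection-by-text", {
  imageFileUri: "https://example.com/street.jpg",
  textPrompt: "red car",
});
console.log(JSON.stringify(request, null, 2));
```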
The tools accept the following image inputs:
- Remote URLs starting with https://
- Local file paths (starting with file://)
- Common image formats: jpg, jpeg, png, webp
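A quick client-side check against these rules can catch a bad image reference before a tool call is made. This is a purely syntactic sketch under two assumptions that the text above does not state: that the scheme match is case-insensitive, and that the image format is visible in the file extension. checkImageUri is a hypothetical helper, not part of the server.

```typescript
// Returns a reason string when the URI looks unusable, or null when it passes.
function checkImageUri(uri: string): string | null {
  const isRemote = /^https:\/\//i.test(uri);
  const isLocal = /^file:\/\//i.test(uri);
  if (!isRemote && !isLocal) {
    return "URI must start with https:// or file://";
  }
  // Ignore any query string or fragment when looking at the extension.
  const path = uri.replace(/[?#].*$/, "");
  if (!/\.(jpe?g|png|webp)$/i.test(path)) {
    return "unsupported image format; expected jpg, jpeg, png, or webp";
  }
  return null;
}

// Usage
console.log(checkImageUri("https://example.com/cat.png?size=large")); // accepted
console.log(checkImageUri("http://example.com/cat.png")); // rejected: plain http
console.log(checkImageUri("file:///home/me/scan.tiff")); // rejected: tiff is not listed
```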
Please refer to DINO-X Platform for API usage limits and pricing information.
During development, you can use watch mode for automatic rebuilding:
pnpm run watch
Use MCP Inspector to debug the server:
pnpm run inspector
Apache License 2.0