Gemini Media Analysis

An MCP server built on Google Gemini AI that provides image, audio, and video recognition, with support for multiple transport types and MCP client integration.

Categories: Image and video processing, Voice processing. Tags: #Video Recognition #AI Analysis #Multimodality #Gemini. Language: TypeScript

Rating: 2.5 points

Downloads: 19

Updated: 2025-04-28

What is the MCP Video Recognition Server?

This is a Model Context Protocol (MCP) server that uses Google Gemini AI to analyze image, audio, and video content. It identifies and describes the content of multimedia files and returns the result as text.

How to use the MCP Video Recognition Server?

You can use this service through API calls or by adding it to a development environment such as FLUJO. Provide the path to a multimedia file and, optionally, an analysis prompt; the server returns a detailed description of the content.

Use Cases

Suitable for scenarios such as content moderation, multimedia indexing, accessibility (describing images/videos for the visually impaired), and media content analysis.

Main Features

Image Recognition: Analyze image content using Google Gemini AI and provide a detailed text description.

Audio Recognition: Transcribe and analyze the content of audio files, with support for custom prompts to guide the analysis.

Video Recognition: Analyze video content and describe scene changes and key events.

Advantages and Limitations

Advantages

Built on Google Gemini AI, which provides high-quality recognition results

Supports multiple media types (image, audio, video)

Easy to integrate into existing development environments (such as FLUJO)

Supports custom analysis prompts for flexible control of the output

Limitations

Requires a Google API key

Relies on an external API service, which may impose usage limits

Large files take a long time to process

How to Use

Install the Server

It can be installed manually or using the FLUJO integrated environment.

Configure the API Key

Set the GOOGLE_API_KEY environment variable.

Start the Server

Start the server with npm start.

Send an Analysis Request

Send a request containing the file path and an optional analysis prompt over MCP.

Usage Examples

Image Content Description: Analyze a landscape photo and generate a detailed description.

Meeting Recording Transcription: Convert a meeting recording into text and extract key points.

Video Content Analysis: Analyze a teaching video and extract the main content.

Frequently Asked Questions

How to obtain a Google Gemini API key?

Which file formats are supported?

Are there any restrictions on processing large files?

How to integrate it into my application?

Related Resources

Google Gemini API Documentation

Official guide for using the Gemini API

FLUJO Project Homepage

Integrated development environment project

MCP Protocol Specification

Official documentation for the Model Context Protocol

🚀 MCP Video Recognition Server

A server based on the Model Context Protocol (MCP) that provides image, audio, and video recognition tools using Google's Gemini AI.

✨ Features

Image Recognition: Analyze and describe images using Google Gemini AI.
Audio Recognition: Analyze and transcribe audio using Google Gemini AI.
Video Recognition: Analyze and describe videos using Google Gemini AI.
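
Since each media type has its own tool, a client may want to choose the tool from the file name. The following is a minimal sketch of that idea, not part of the project. The tool name `video_recognition` is assumed by analogy with the `image_recognition` and `audio_recognition` names used later in this README, and the extension table is purely illustrative, because the formats the server accepts are not stated here.

```typescript
type ToolName = "image_recognition" | "audio_recognition" | "video_recognition";

// Illustrative mapping only: the server's supported formats are not listed in this
// README, and "video_recognition" is an assumed tool name.
const EXTENSION_TO_TOOL = new Map<string, ToolName>([
  ["jpg", "image_recognition"],
  ["jpeg", "image_recognition"],
  ["png", "image_recognition"],
  ["mp3", "audio_recognition"],
  ["wav", "audio_recognition"],
  ["mp4", "video_recognition"],
  ["mov", "video_recognition"],
]);

// Return the tool for a file path, or undefined when the extension is unknown.
function pickTool(filePath: string): ToolName | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return undefined;
  const extension = filePath.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_TOOL.get(extension);
}

console.log(pickTool("path/to/image.jpg"));
```

Returning `undefined` for unknown extensions leaves it to the caller to decide whether to reject the file or ask the user which tool to use.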

📋 Prerequisites

Node.js 18 or higher.
Google Gemini API key.

📦 Installation

Manual Installation

Clone the repository:

git clone https://github.com/yourusername/mcp-video-recognition.git
cd mcp-video-recognition

Install dependencies:
```
npm install
```
Build the project:
```
npm run build
```

Installation in FLUJO

Click "Add Server".
Copy and paste the GitHub URL into FLUJO.
Click "Resolve", "Clone", "Install", "Build", and "Save".

Installation via Configuration File

To integrate with Cline or other MCP clients via a configuration file:

Open your Cline settings:
- In VS Code, go to File -> Preferences -> Settings.
- Search for "Cline MCP settings".
- Click "Edit in settings.json".

Add the server configuration to the mcpServers object:

{
  "mcpServers": {
    "video-recognition": {
      "command": "node",
      "args": [
        "/path/to/mcp-video-recognition/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in the project directory. On Windows, use forward slashes (/) or double backslashes (\\), because a single backslash is an escape character in JSON.
Save the settings file. Cline should automatically connect to the server.
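
The path rule above is easy to get wrong by hand, so one option is to generate the entry instead of typing it. The sketch below rebuilds the configuration shown above and converts backslashes to forward slashes first. The helper names and the example Windows path are hypothetical; only the configuration keys come from this README.

```typescript
// Convert Windows-style separators to forward slashes so the path needs no JSON escaping.
function toForwardSlashes(path: string): string {
  return path.replace(/\\/g, "/");
}

// Build the mcpServers entry shown above for a given index.js location.
function buildServerEntry(indexJsPath: string) {
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [toForwardSlashes(indexJsPath)],
        disabled: false,
        autoApprove: [] as string[],
      },
    },
  };
}

// Hypothetical install location, written with escaped backslashes in the source code.
const entry = buildServerEntry("C:\\Users\\me\\mcp-video-recognition\\dist\\index.js");
console.log(JSON.stringify(entry, null, 2));
```

If the backslashes were left in place, `JSON.stringify` would emit them doubled, which is the other form the instructions allow.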

⚙️ Configuration

Configure the server using the following environment variables:

GOOGLE_API_KEY: Google Gemini API key.
TRANSPORT_TYPE: Transport type (e.g., stdio, sse).
PORT: Server port.
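
As a rough illustration of how these variables might be read and validated at startup, consider the sketch below. It is not the server's actual code: the `ServerConfig` shape, the `loadConfig` helper, and both default values are assumptions made for the example. Only the three variable names come from this README.

```typescript
// Hypothetical config shape for illustration.
interface ServerConfig {
  googleApiKey: string;
  transportType: string;
  port: number;
}

// Assumed defaults; the project's real defaults are not stated in this README.
const ASSUMED_DEFAULT_TRANSPORT = "stdio";
const ASSUMED_DEFAULT_PORT = 3000;

// Read the three documented variables from an environment-like object.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }
  const rawPort = env.PORT;
  const port = rawPort === undefined ? ASSUMED_DEFAULT_PORT : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }
  return {
    googleApiKey,
    transportType: env.TRANSPORT_TYPE ?? ASSUMED_DEFAULT_TRANSPORT,
    port,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Passing the environment in as an argument keeps the function easy to test without touching real environment variables.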

💻 Usage Examples

Starting the Server

npm start

Tool Usage Instructions

Basic Usage

Image Recognition

{
  "tool": "image_recognition",
  "params": {
    "input_path": "path/to/image.jpg"
  }
}
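
The JSON above shows the tool name and its parameters in shorthand. On the wire, MCP clients send tool calls as JSON-RPC 2.0 `tools/call` requests, with the tool name under `name` and the parameters under `arguments`. A minimal sketch of that envelope follows. The `buildToolCall` helper and the `ToolCallRequest` type are hypothetical names for this example; the tool name and `input_path` parameter are taken from the example above.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wrap a tool name and its parameters in the JSON-RPC envelope.
function buildToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

const request = buildToolCall(1, "image_recognition", {
  input_path: "path/to/image.jpg",
});
console.log(JSON.stringify(request, null, 2));
```

Most MCP client libraries build this envelope for you, so the sketch is mainly useful for debugging raw traffic or writing a thin client by hand.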

Audio Recognition

{
  "tool": "audio_recognition",
  ...
}
