VideoCutter MCP Server: All-in-one Tool for Video, Audio & Image with AI Editing

VideoCutter is a professional multimedia tool that integrates video, audio, and image processing. It supports AI-assisted editing and the MCP protocol, providing a one-stop creation solution.

Rating: 2 points

Downloads: 6.2K

Updated: 2025-09-18

What is the VideoCutter MCP server?

The VideoCutter MCP server is a media processing service based on the Model Context Protocol. It lets AI agents call 67 professional tools, covering video editing, audio processing, and image editing, through natural language. It supports two transmission modes, SSE and HTTP Streamable, giving AI applications multimedia processing capabilities.

How to use the VideoCutter MCP server?

Describe your media processing requirements to an AI agent in natural language, for example, 'Help me trim this video to 1 minute in length and add subtitles'. The server supports two connection methods: the SSE mode for real-time progress monitoring and the HTTP Streamable mode for complex interaction scenarios.

Applicable scenarios

It is suitable for scenarios that require media processing, such as content creation, short-video production, podcast editing, education and training, and corporate promotion. Both individual creators and professional teams can quickly complete complex media processing tasks through AI agents.

Main Features

Video Processing

Supports a full set of video editing functions, including splitting, merging, speed change, rotation, cropping, filters, color adjustment, and overlay compositing.

Audio Processing

Provides professional audio processing tools such as audio splitting, merging, speed change, volume adjustment, fade-in and fade-out, reverb effects, and vocal enhancement.

Image Processing

Includes comprehensive image editing functions such as image cropping, rotation, scaling, filters, special effects, overlay synthesis, and format conversion.

AI Generation Function

Integrates AI models for text generation, image generation, and video generation, supporting creation functions such as text-to-image, text-to-video, and image-to-video conversion.

Batch Processing

Supports batch overlay of images and text through command files, greatly improving processing efficiency. Overlays can be placed with the 81-grid precise positioning system.

MCP Agent Integration

Lets AI agents call 67 professional tools through natural language, over either the SSE or the HTTP Streamable transmission mode.

Advantages

One-stop media processing: Integrates three major processing modules for video, audio, and images to meet all editing needs.

AI intelligent optimization: Multiple built-in AI models provide text, image, and video generation capabilities.

Natural language interaction: Lets AI agents call complex functions in natural language through the MCP protocol.

Dual transmission modes: Supports both SSE real-time monitoring and HTTP Streamable two-way interaction.

Precise positioning system: A unique 81-grid positioning system provides pixel-level precise control.

Batch processing efficiency: Supports batch operations, greatly improving processing efficiency.

Limitations

Hardware requirements: AI models and video processing need capable CPU and GPU hardware.

Learning curve: Although natural language is supported, complex functions still require an understanding of basic concepts.

Network dependency: Cloud-based AI services require a stable network connection.

File size limitation: Processing extremely large files may be limited by memory.

How to Use

Start the MCP Server

Ensure that the VideoCutter service is running normally. The MCP server will automatically start on ports 8000 (SSE) and 8001 (HTTP Streamable).

Connect the AI Agent

Configure your AI application or agent to connect to the MCP server, using the corresponding server address and transmission mode.
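
As a rough illustration of this step, the snippet below builds a client configuration for either transmission mode. The endpoint URLs are the ones listed in this guide. The `mcpServers` layout and the entry names `videocutter-sse` and `videocutter-streamable` are assumptions modeled on common MCP client configuration files, so check your client's documentation for the exact schema it expects.

```python
import json

# Endpoints as listed in this guide; adjust host and port for your deployment.
SSE_URL = "http://localhost:8000/mcp/sse"
STREAMABLE_URL = "http://localhost:8001/mcp/streamable"


def build_client_config(mode: str = "sse") -> dict:
    """Return a config dict for one transmission mode ("sse" or "streamable").

    The "mcpServers" layout is an assumption; real clients may use other keys.
    """
    urls = {"sse": SSE_URL, "streamable": STREAMABLE_URL}
    if mode not in urls:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"mcpServers": {f"videocutter-{mode}": {"url": urls[mode]}}}


if __name__ == "__main__":
    # Print a JSON document that could be pasted into a client config file.
    print(json.dumps(build_client_config("sse"), indent=2))
```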

Use Natural Language to Call Functions

Use an AI agent to describe your needs in natural language, such as video editing, audio processing, and image editing tasks.

Monitor the Processing Progress

Monitor the progress of long-running processing tasks in real time through the SSE mode, or interact in real time through the HTTP Streamable mode.

Usage Examples

Short-Video Editing and Production

A user wants to create a 15-second short video for social media sharing, which requires cropping the video, adding filters, and inserting text and background music.

Podcast Audio Processing

A user has recorded a podcast and needs to remove noise, normalize the volume, add fade-in and fade-out effects, and extract subtitle text for the key content.

Product Promotion Image Creation

A user needs to create a promotional image for a new product, including product image modification, text addition, and multi-image compositing.

Frequently Asked Questions

What is the MCP server? How is it different from a regular API?

What is the difference between the SSE mode and the HTTP Streamable mode? Which one should I choose?

Do I need programming knowledge to use the MCP server?

What file formats are supported? Are there any size limitations?

What should I do if an error occurs during the processing?

Related Resources

VideoCutter GitHub Repository

The project's source code and latest updates

VideoCutter Gitee Repository

A mirror repository hosted in China, with faster access for users there

API Usage Documentation

Detailed REST API interface description and usage guide

MCP Tool Documentation

Complete list of MCP tools and usage methods

AI Model Usage Instructions

AI function configuration and model usage guide

Detailed Explanation of Position Parameters

Detailed description and examples of the 81-grid positioning system

🚀 VideoCutter User Guide

VideoCutter is a professional multimedia processing tool that integrates video, audio, and image processing modules. Relying on advanced AI technology and a powerful engine, it deeply supports the MCP intelligent agent protocol, enabling AI agents to call functions through natural language. With both SSE and HTTP Streamable modes, it provides a one-stop and intelligent editing solution for content creation such as short videos.

✨ Features

🎯 One-stop Processing: Integrates video, audio, and image processing modules to meet all media editing needs.
⚡ High-performance Processing: Supports hardware acceleration, significantly improving processing speed and efficiency.
🤖 AI Intelligent Optimization: Built-in multiple AI models, providing intelligent text generation, image generation, and video generation capabilities.
🎨 AI Creation Tools: Supports AI creation functions such as text-to-image, text-to-video, and image-to-video conversion.
🧠 Intelligent Processing: AI-assisted functions such as intelligent speed change, intelligent scene detection, voice recognition, and subtitle extraction.
🤖 MCP Intelligent Agent: Deeply supports AI agents, providing natural language call and intelligent workflow capabilities, supporting both SSE and HTTP Streamable modes.
🔌 Multiple Interface Support: Provides REST API and MCP protocol, supporting various integration methods.
📱 Cross-platform Compatibility: Supports mainstream operating systems such as Windows, macOS, and Linux.
🎯 Precise Positioning: Supports an 81-grid precise positioning system, providing pixel-level precise control.
📦 Batch Processing: Supports efficient batch operations such as batch image overlay and text overlay.

Whether you are an individual creator, a content production team, or an enterprise user, you can easily complete complex media processing tasks with VideoCutter.

📞 Contact the Author

The author provides one-stop deployment, installation, and activation services.

GitHub: https://github.com/daimaxiuligong/VideoCutter

Gitee: https://gitee.com/daimaxiuligong/VideoCutter

🤖 MCP Intelligent Agent Support

VideoCutter deeply integrates the Model Context Protocol (MCP), providing powerful media processing capabilities for AI agents.

MCP Transmission Modes

SSE Mode: Server-Sent Events mode, supporting real-time streaming data transmission.
- Server address: http://localhost:8000/mcp/sse
- Features: One-way real-time push, suitable for progress monitoring and status updates.
- Application scenarios: Real-time feedback for long-running processing tasks.
HTTP Streamable Mode: HTTP streaming mode, supporting two-way streaming communication.
- Server address: http://localhost:8001/mcp/streamable
- Features: Two-way streaming communication, supporting real-time interaction.
- Application scenarios: Complex workflows that require real-time interaction.
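
To make the one-way push model of the SSE mode more concrete, here is a minimal sketch of how a client might parse a `text/event-stream` body into events. It follows the general Server-Sent Events wire format (`event:` and `data:` fields, a blank line ending each event). The `progress` event name and its JSON payload are hypothetical; the events VideoCutter actually emits are not described in this guide.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from the lines of a text/event-stream body."""
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


if __name__ == "__main__":
    # Hypothetical stream: one progress update, a keep-alive, one plain message.
    sample = ["event: progress\n", 'data: {"percent": 50}\n', "\n",
              ": ping\n", "data: done\n", "\n"]
    for evt in parse_sse(sample):
        print(evt)
```

In practice an MCP client library would normally handle this parsing; the sketch only shows what travels over the connection.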

AI Agent Capabilities

Through the MCP protocol, AI agents can:

Natural Language Call: Use natural language to describe requirements, and AI agents automatically call corresponding media processing functions.
Intelligent Workflow: AI agents can combine multiple processing steps to create complex media processing workflows.
Real-time Collaboration: Support real-time collaboration between AI agents and users, adjusting processing strategies based on user feedback.
Context Understanding: AI agents can understand the context of media content and provide more accurate processing suggestions.
Automated Creation: From content planning to final output, AI agents can handle the entire process automatically.
Streamed Response: Support real-time progress feedback and result streaming, enhancing the user experience.

🔗 Multi-interface Ecosystem

VideoCutter has built a complete multi-interface ecosystem:

REST API: Provides standardized HTTP interfaces for traditional applications and web services.
MCP Protocol: Provides specialized protocol support for AI agents and AI applications.
- SSE Mode: Real-time streaming data transmission, suitable for progress monitoring.
- HTTP Streamable Mode: Two-way streaming communication, supporting real-time interaction.
Local Deployment: Supports local deployment of models, protecting data privacy.
Cloud Service: Supports cloud AI services such as Doubao and SiliconFlow, providing powerful computing power.
Plugin Extension: Supports third-party plugins and custom function extensions.

💡 Product Highlights

1. Powerful Video Processing Capabilities

Basic Editing Functions

Video Splitting: Precise video splitting down to milliseconds, supporting specified time ranges.
Video Merging: Intelligent merging of multiple video files, automatically handling format compatibility.
Video Speed Change: 0.1 - 16x speed adjustment, maintaining audio-video synchronization.
Video Reverse Playback: Complete reverse playback of the timeline.
Video Rotation: Rotation at any angle, automatically adjusting the output size.
Video Cropping: Precise pixel-level area cropping.
Video Scaling: Intelligent size adjustment, maintaining the aspect ratio.
Video Padding: Adding borders and padding effects to videos.
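
The millisecond precision and the 0.1 to 16x speed range above imply some simple arithmetic when preparing requests. The helpers below are an illustrative sketch only: the function names are hypothetical, and the guide does not say which timestamp format the tools accept, so `HH:MM:SS.mmm` is an assumption.

```python
def ms_to_timestamp(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm (assumed format)."""
    if ms < 0:
        raise ValueError("offset must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def duration_after_speed(duration_ms: int, speed: float) -> int:
    """Return the clip length in ms after a speed change.

    The 0.1 to 16 range comes from the feature list above.
    """
    if not 0.1 <= speed <= 16.0:
        raise ValueError("speed must be between 0.1 and 16")
    return round(duration_ms / speed)


if __name__ == "__main__":
    print(ms_to_timestamp(83_450))            # 00:01:23.450
    print(duration_after_speed(60_000, 2.0))  # 30000
```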

Video Special Effects Functions

Video Filters: Various artistic filters such as black and white, sepia, vintage, and blur.
Color Adjustment: Fine adjustment of brightness, contrast, saturation, and gamma value.
Video Sharpening: Enhancing image details and clarity.
Mosaic Processing: Adding mosaic effects to specified areas.
Intelligent Speed Change: Intelligent accelerated playback based on content similarity.
Intelligent Scene Detection: Automatically identifying video scene change points.

Overlay and Composition Functions

Video Overlay: Overlaying another video on the main video.
Image Overlay: Overlaying static images on videos, supporting 81-grid precise positioning.
Text Overlay: Adding text watermarks and subtitles to videos, supporting multiple fonts and effects.
Audio Overlay: Overlaying audio tracks on videos.
Audio-Video Separation: Separating audio and video tracks in videos.
Batch Overlay: Supporting batch addition of image and text watermarks through command files.

Format Conversion Functions

Video to GIF: Converting videos to GIF animations.
Video Frame Extraction: Extracting single frames from specified time points.

2. Professional Audio Processing

Basic Audio Editing

Audio Splitting: Millisecond-level audio splitting, supporting specified time ranges.
Audio Merging: Seamless merging of multiple audio files.
Audio Speed Change: Speed change processing while maintaining audio quality.
Audio Reverse Playback: Complete reverse playback of the timeline.
Volume Adjustment: Precise volume control and normalization.

Audio Enhancement Effects

Audio Normalization: Bringing audio volume to a consistent target level.
Fade In/Out: Adding smooth fade-in and fade-out effects to audio.
Reverb Effect: Simulating acoustic effects of different spatial environments.
Audio Compressor: Professional-level dynamic range compression.
Voice Enhancement: Highlighting voices and improving clarity.
Audio Mixing: Mixing multiple audio tracks into a single track.
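
For readers unfamiliar with these effects, the sketch below shows two of them on a list of float samples in the range -1.0 to 1.0. Peak normalization and a linear fade are common textbook techniques; this guide does not say which methods VideoCutter uses internally (loudness-based normalization is another option), so treat this as an illustration rather than the tool's implementation.

```python
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 1.0) -> List[float]:
    """Scale samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def fade_in(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in over the first fade_len samples."""
    fade_len = min(fade_len, len(samples))
    out = list(samples)
    for i in range(fade_len):
        out[i] *= i / fade_len  # ramps from 0.0 toward 1.0
    return out


if __name__ == "__main__":
    print(normalize_peak([0.25, -0.5]))  # [0.5, -1.0]
    print(fade_in([1.0] * 4, 4))         # [0.0, 0.25, 0.5, 0.75]
```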

Advanced Functions

Audio Looping: Creating looped audio.
Audio Format Conversion: Supporting mutual conversion of all mainstream audio formats.
Subtitle Extraction: Automatically extracting subtitle text from audio.
Text-to-Speech: Supporting CosyVoice pre-trained voice and voice cloning modes.
Audio Information Retrieval: Obtaining detailed information about audio files.

3. Comprehensive Image Processing

Basic Image Editing

Image Cropping: Precise pixel-level cropping control.
Image Rotation: Rotation at any angle and mirror flipping.
Image Scaling: Intelligent size adjustment while maintaining the aspect ratio.
Image Flipping: Horizontal and vertical flipping.
Brightness Adjustment: Precise brightness control.
Contrast Adjustment: Enhancing the difference between light and dark.
Saturation Adjustment: Controlling color saturation.

Image Special Effects

Image Filters: Various effects such as black and white, vintage, blur, and sharpening.
Noise Effect: Adding various types of noise effects.
Vignette Effect: Creating a professional photography atmosphere.
Image Sharpening: Enhancing image details and clarity.
Mosaic Processing: Adding mosaic effects to specified areas.

Image Composition

Image Overlay (Absolute Position): Overlaying images at specified coordinate positions.
Image Overlay (Relative Position): Overlaying images using relative positions, supporting 81-grid precise positioning.
Text Overlay (Absolute Position): Adding text at specified coordinate positions.
Text Overlay (Relative Position): Adding text using relative positions, supporting multiple fonts and effects.
Collage Creation: Multi-image collages and grid layouts.
Batch Overlay: Supporting batch overlay of images and text through command files, improving processing efficiency.
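
This guide does not describe how the 81 positions are laid out. The sketch below assumes a 9 x 9 grid numbered 1 to 81 in row-major order and centers the overlay on the chosen cell; both the numbering and the function name are assumptions for illustration. The actual parameter format is covered in the position parameter documentation.

```python
from typing import Tuple

GRID = 9  # assumption: 9 x 9 = 81 cells


def grid_to_xy(cell: int, canvas: Tuple[int, int], overlay: Tuple[int, int]) -> Tuple[int, int]:
    """Return the top-left pixel at which to draw an overlay centered on a cell.

    cell is 1..81, counted left to right, then top to bottom (assumed).
    """
    if not 1 <= cell <= GRID * GRID:
        raise ValueError("cell must be between 1 and 81")
    row, col = divmod(cell - 1, GRID)
    canvas_w, canvas_h = canvas
    overlay_w, overlay_h = overlay
    # Center point of the chosen cell.
    center_x = (col + 0.5) * canvas_w / GRID
    center_y = (row + 0.5) * canvas_h / GRID
    x = round(center_x - overlay_w / 2)
    y = round(center_y - overlay_h / 2)
    # Clamp so the overlay stays on the canvas where possible.
    x = max(0, min(x, canvas_w - overlay_w))
    y = max(0, min(y, canvas_h - overlay_h))
    return x, y


if __name__ == "__main__":
    # Middle cell of an 1800 x 900 canvas with a 200 x 100 overlay.
    print(grid_to_xy(41, (1800, 900), (200, 100)))  # (800, 400)
```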

Format Conversion Functions

Image to Video: Converting static images to videos.
Multiple Images to GIF: Combining multiple images into GIF animations.
Image Format Conversion: Supporting mutual conversion of all mainstream image formats.
Watermark Removal: Intelligent removal of watermarks from images.
Beauty Enhancement: Simple face beauty effects.
Image Thumbnail Generation: Generating thumbnails of specified sizes.
Image Information Retrieval: Obtaining detailed information about image files.

4. Powerful AI Intelligent Functions

AI Model Services

Multi-model Support: Integrates mainstream AI service providers such as Ollama, Doubao, and SiliconFlow.
Local Deployment: Supports local deployment of Ollama models, protecting data privacy.
Cloud Service: Supports cloud AI services such as Doubao and SiliconFlow, providing powerful computing power.
Flexible Configuration: Can enable or disable different AI service providers according to needs.
MCP Integration: Provides 67 professional tools for AI agents through the MCP protocol.

Text Generation Functions

Intelligent Text Generation: Generate high-quality text content from prompts.
Multi-language Support: Support text generation in multiple languages such as Chinese and English.
Parameter Adjustment: Support fine adjustment of parameters such as temperature and maximum length.
Segmented Content Generation: Automatically generate video segment descriptions and corresponding subtitle text.

Image Generation Functions

Text-to-Image: Generate high-quality images based on text descriptions.
Multi-resolution Support: Support various resolutions from 512x512 to 2048x2048.
Multi-aspect Ratio Support: Support various aspect ratios such as 1:1, 4:3, 16:9, and 9:16.
Artistic Styles: Support multiple artistic and creative styles.

Video Generation Functions

Text-to-Video: Generate dynamic video content based on text descriptions.
Image-to-Video: Convert static images to dynamic videos.
Multi-resolution Support: Support various resolutions such as 480p, 720p, and 1080p.
Duration Control: Support video duration adjustment from 3 to 12 seconds.
Action Description: Control actions and changes in videos through text descriptions.
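
A client can check the documented limits before sending a generation request. The sketch below validates against the values listed above (480p, 720p, 1080p; 3 to 12 seconds). The class and field names are hypothetical, and since the resolutions are introduced with "such as", the allowed set may be incomplete.

```python
from dataclasses import dataclass

# Values taken from the feature list above; the list may not be exhaustive.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")
MIN_DURATION_S, MAX_DURATION_S = 3, 12


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for text-to-video parameters."""
    prompt: str
    resolution: str = "720p"
    duration_s: int = 5

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            raise ValueError("duration must be between 3 and 12 seconds")


if __name__ == "__main__":
    print(VideoRequest("a cat walking on a beach", "1080p", 12))
```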

Intelligent Processing Functions

Intelligent Speed Change: Automatically detect and accelerate repeated segments based on content similarity.
Intelligent Scene Detection: Automatically identify video scene change points for precise editing.
Voice Recognition: Automatically extract subtitle text from videos.
Voice Enhancement: Intelligent enhancement of the voice part in audio.
Audio Noise Reduction: Automatically remove background noise from audio.

AI-assisted Creation

Content Planning: AI helps plan the structure and segments of video content.
Subtitle Generation: Automatically generate subtitle text that matches video content.
Creative Suggestions: Provide creative inspiration and suggestions based on themes.
Quality Optimization: AI-assisted optimization of video, audio, and image quality.

MCP Intelligent Agent Integration

Natural Language Interaction: Interact with AI agents through natural language to complete complex media processing tasks.
Intelligent Workflow: AI agents can automatically combine multiple processing steps to create end-to-end processing flows.
Context Awareness: AI agents can understand the context of media content and provide more accurate processing suggestions.
Real-time Collaboration: Support real-time collaboration between AI agents and users, adjusting processing strategies based on feedback.
Automated Creation: From creative conception to final output, AI agents can handle the entire process automatically.
Toolchain Integration: AI agents can call all 67 professional tools of VideoCutter to achieve complex tasks.

5. Efficient Batch Processing Functions

Batch Image Overlay

Command File Support: Define batch overlay commands through TXT files.
Flexible Command Format: Support two command formats: image overlay and text overlay.
Parameterized Configuration: Support custom configuration of parameters such as position, transparency, scaling, and font.
Intelligent Command Recognition: Automatically recognize image and text commands without manual specification.
Batch Execution: Process multiple overlay operations at once, significantly improving efficiency.

Batch Text Overlay

Multi-font Support: Support system fonts and custom font files.
Rich Text Effects: Support various text effects such as shadows, strokes, and glows.
Precise Positioning: Support 81-grid precise positioning system for pixel-level precise control.
Parameterized Configuration: Support custom configuration of parameters such as font size, color, and transparency.

Advantages of Batch Processing

Efficient Processing: Batch operations are several times more efficient than single operations.
Command Reuse: Command files can be saved and reused.
Error Handling: A single command failure does not affect the overall processing flow.
Flexible Configuration: Support default values and parameter overrides to adapt to different scenario requirements.
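
The error-handling behavior described above, where one failing command does not stop the rest, can be sketched as a loop that records failures and keeps going. The command file syntax is not given in this guide, so the sketch treats each non-empty line as an opaque command and passes it to a caller-supplied handler. Skipping lines that start with `#` is an assumption.

```python
from typing import Callable, List, Tuple


def run_batch(
    lines: List[str], handler: Callable[[str], None]
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Run every command line; collect failures instead of aborting.

    Returns (line numbers that succeeded, [(line number, error message), ...]).
    """
    succeeded: List[int] = []
    failed: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        command = raw.strip()
        if not command or command.startswith("#"):
            continue  # blank line or comment (assumed convention)
        try:
            handler(command)
            succeeded.append(lineno)
        except Exception as exc:  # isolate the failure to this one command
            failed.append((lineno, str(exc)))
    return succeeded, failed


if __name__ == "__main__":
    def demo_handler(command: str) -> None:
        # Stand-in for the real overlay operation.
        if "bad" in command:
            raise ValueError("cannot parse")

    print(run_batch(["logo.png", "", "bad", "# note", "Hello"], demo_handler))
```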

🔌 Interface Integration Guide

1. REST API Interface

VideoCutter provides a complete REST API, so its processing functions can be called directly through HTTP requests.

Service Information

API Service Address: http://localhost:8900
Interactive Documentation: http://localhost:8900/docs
ReDoc Documentation: http://localhost:8900/redoc
Health Check: http://localhost:8900/health
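
Before sending work to the API, a client can probe the health endpoint listed above. This is a minimal sketch using only the Python standard library. It checks the HTTP status code only, because the response body format of `/health` is not documented here.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8900"  # API service address from this guide


def is_healthy(base_url: str = BASE_URL, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("VideoCutter API reachable:", is_healthy())
```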

Interface Features

Standardized Design: Follows RESTful API design specifications.
Unified Response Format: All interfaces return a unified JSON format.
File Upload Support: Supports multipart/form-data file uploads.
Parameter Validation: Complete request parameter validation and error handling.
AI Model Integration: Built-in AI model APIs, supporting text generation, image generation, and video generation.
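
To show what a multipart/form-data upload looks like on the wire, the sketch below assembles a request body by hand with the standard library. The field names `file` and `start_ms` are hypothetical, and no endpoint path is shown because this guide does not list one; see the interactive documentation at /docs for the real routes and parameters.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str],
    file_field: str,
    filename: str,
    content: bytes,
    boundary: str = "",
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # The file part carries a filename and a content type.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + content + b"\r\n")
    # Closing delimiter.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = encode_multipart({"start_ms": "0"}, "file", "clip.mp4", b"DATA")
    print(content_type)
    print(len(body), "bytes")
```

The body and content type can then be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})` or an equivalent HTTP client.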

AI Model APIs

Text Generation: Supports text generation models from providers such as Ollama, Doubao, and SiliconFlow.
Image Generation: Supports text-to-image generation, with multiple resolutions and artistic styles.
Video Generation: Supports text-to-video and image-to-video generation.
Segmented Content Generation: AI-assisted generation of video segment descriptions and subtitles.

Detailed Documentation

For the complete API interface documentation, please refer to: VideoCutter_API User Guide.md

2. MCP Protocol Interface

The Model Context Protocol (MCP) allows AI models to directly call various functions of VideoCutter. It supports two transmission modes to meet different application scenario requirements.

Service Information

SSE Server: http://localhost:8000/mcp/sse
HTTP Streamable Server: http://localhost:8001/mcp/streamable
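
MCP messages are JSON-RPC 2.0 objects, and the protocol defines `tools/list` and `tools/call` methods for discovering and invoking tools. The sketch below only builds those request payloads. The tool name `video_split` and its arguments are hypothetical (the real names are in the MCP tool documentation), and a real session also starts with an `initialize` handshake that an MCP client library would normally handle.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request() -> Dict[str, Any]:
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Invoke one tool by name with a dict of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    request = call_tool_request("video_split", {"start_ms": 0, "end_ms": 60000})
    print(json.dumps(request, indent=2))
```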

Transmission Mode Features

SSE Mode (Server-Sent Events)

Features: One-way real-time push, where the server actively sends data to the client.
Advantages: Low latency, easy to use, suitable for progress monitoring.
Application Scenarios: Real-time feedback for long-running processing tasks, status updates.
Technical Features: Based on HTTP long connections, with an automatic reconnection mechanism.

HTTP Streamable Mode

Features: Two-way streaming communication, supporting real-time interaction between the client and the server.
Advantages: Supports complex interactions, real-time collaboration, and dynamic adjustment.
Application Scenarios: Complex workflows that require real-time interaction, AI agent collaboration.
Technical Features: Based on HTTP/2 streaming, supporting concurrent processing.

Protocol Features

AI-friendly: Designed specifically for AI model integration, supporting natural-language invocation.
Streamed Response: Supports real-time progress feedback and result streaming.
Dual Transmission Modes: Supports both SSE and HTTP Streamable transmission modes.
Rich Tools: Provides 67 professional media processing tools.
AI Tool Integration: Built-in AI model call tools, supporting text, image, and video generation.

AI Tool Support

Text Generation Tools: Support text generation functions of multiple AI models.
Image Generation Tools: Support text-to-image and image processing functions.
Video Generation Tools: Support text-to-video and image-to-video functions.
Intelligent Processing Tools: Support AI-assisted functions such as intelligent speed change and scene detection.

Detailed Documentation

For the complete MCP tool documentation, please refer to: VideoCutter_MCP User Guide.md

📚 Documentation Resources

API User Guide: VideoCutter_API User Guide.md - Detailed REST API interface description.
MCP User Guide: VideoCutter_MCP User Guide.md - Complete MCP tool usage guide.
AI Model Usage Instructions: AI Model Usage Instructions.md - AI function configuration and usage guide.
Position Parameter Details: VideoCutter_Position Position Parameter Details.md - Detailed description of the 81-grid positioning system.
User Guide: This document - Product introduction and integration guide.