Video Processing
Video Processing Overview
Deeptrain's multi-dimensional video processing engine lets AI agents ingest, interpret, and learn from video content. It converts raw video into text and structured metadata that fit within an LLM context window, so your agents can work with both the visual sequences and the audio of videos from local files, YouTube, and Vimeo.
The platform supports three primary video sources:
- Local Storage: Direct file paths from your server or local environment.
- YouTube: Public or unlisted video URLs.
- Vimeo: Professional video hosting links.
The Transcribe API
The Transcribe API is the primary interface for processing video data. It extracts the audio track, converts speech to text, and synchronizes visual metadata with the transcript, producing a single dataset for your AI models.
Usage Example
from deeptrain import VideoManager
# Initialize the manager
vm = VideoManager(api_key="your_api_key")
# Process a video for AI training
processed_video = vm.transcribe(
    source="https://www.youtube.com/watch?v=example",
    provider="youtube",
    config={
        "extract_metadata": True,
        "language": "en"
    }
)
print(processed_video.transcript)
print(processed_video.metadata)
API Reference: transcribe()
| Parameter | Type | Description |
| :--- | :--- | :--- |
| source | str | The file path or URL of the video. |
| provider | str | The source type: local, youtube, or vimeo. |
| config | dict | (Optional) Configuration for processing (e.g., sampling_rate, chunk_size). |
Returns: A VideoData object containing the transcript, timestamps, and extracted visual context.
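Because provider must match the kind of source passed in, it can be convenient to derive it from the source string. The following is a minimal sketch, not part of Deeptrain: infer_provider and the hostname table are hypothetical, and the set of hostnames Deeptrain itself accepts is not specified in this document.

```python
from urllib.parse import urlparse

# Hostnames mapped to each hosted provider. This table is an assumption made
# for the sketch; it is not taken from the Deeptrain API.
_PROVIDER_HOSTS = {
    "youtube": {"youtube.com", "m.youtube.com", "youtu.be"},
    "vimeo": {"vimeo.com"},
}


def infer_provider(source: str) -> str:
    """Guess the `provider` argument ("local", "youtube", "vimeo") for a source."""
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        # Anything that is not a web URL is treated as a local file path.
        return "local"
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for provider, hosts in _PROVIDER_HOSTS.items():
        if host in hosts:
            return provider
    raise ValueError(f"Unsupported video host: {host!r}")
```

The result could then be passed as the provider argument, for example vm.transcribe(source=url, provider=infer_provider(url)).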
Working with Local Video Files
To process videos stored on your local file system, make sure the path is readable by the process running Deeptrain. The local provider is a good fit for proprietary data, internal recordings, or sensitive training material.
# Processing a local MP4 file
local_data = vm.transcribe(
    source="/path/to/video/training_demo.mp4",
    provider="local"
)
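Checking the path before calling transcribe() can surface access problems early. The sketch below uses only the Python standard library; check_local_video is a hypothetical helper, and the extension list is an assumption, since this document does not state which container formats Deeptrain accepts.

```python
import os
from pathlib import Path

# Extensions accepted by this sketch only. The formats Deeptrain actually
# supports are not listed in this document.
ASSUMED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def check_local_video(path_str: str) -> Path:
    """Return the resolved path if it is a readable video file, else raise."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"File is not readable: {path}")
    if path.suffix.lower() not in ASSUMED_VIDEO_EXTENSIONS:
        raise ValueError(f"Unexpected extension {path.suffix!r}")
    return path
```

The returned path can be converted with str() and passed as source together with provider="local".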
Integrating YouTube and Vimeo
Deeptrain handles the web scraping and platform API management for you. Given a URL, it fetches the streams it needs for processing, so no manual download is required.
YouTube Integration
Use the youtube provider to ingest public educational content, tutorials, or webinars directly into your agent's knowledge base.
# Syncing a YouTube tutorial
yt_context = vm.transcribe(
    source="https://youtu.be/dQw4w9WgXcQ",
    provider="youtube"
)
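The examples in this document use two YouTube URL forms (youtube.com/watch?v=... and youtu.be/...). If you want to avoid processing the same video twice, one option is to normalize both forms to the video ID before calling transcribe(). This is a sketch under that assumption; youtube_video_id is a hypothetical helper, not a Deeptrain function, and it only covers these two URL forms.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        # Watch URLs carry the ID in the `v` query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    # Any other URL shape is out of scope for this sketch.
    return None
```

The ID can serve as a cache key, for example in a dict that maps IDs to previously returned results.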
Vimeo Integration
For high-quality professional content or private enterprise videos, use the vimeo provider.
# Syncing a Vimeo presentation
vimeo_context = vm.transcribe(
    source="https://vimeo.com/123456789",
    provider="vimeo"
)
Multi-dimensional Output
When a video is processed, Deeptrain generates structured output that can be fed directly into an LLM or a vector database:
- Textual Transcript: A full text-based representation of the audio.
- Temporal Metadata: Time-coded segments that allow the AI to reference specific moments in the video.
- Visual Descriptions (Optional): Keyframe analysis that describes the visual scene, enabling non-vision models to "understand" the video content.
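Before loading time-coded segments into a vector database, it is common to merge them into larger chunks that still carry a time range. The sketch below assumes each segment is a dict with the start_time, end_time, and text keys documented under Data Structure; chunk_segments and the max_chars budget are hypothetical and not part of Deeptrain.

```python
from typing import Dict, List


def _merge(segs: List[Dict]) -> Dict:
    """Combine consecutive segments into one chunk spanning their time range."""
    return {
        "start_time": segs[0]["start_time"],
        "end_time": segs[-1]["end_time"],
        "text": " ".join(s["text"] for s in segs),
    }


def chunk_segments(segments: List[Dict], max_chars: int = 500) -> List[Dict]:
    """Greedily merge consecutive segments into chunks of at most max_chars.

    A single segment longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[Dict] = []
    current: List[Dict] = []
    current_len = 0  # length of the joined text of `current`
    for seg in segments:
        seg_len = len(seg["text"])
        # Flush the running chunk if adding this segment (plus a joining
        # space) would exceed the character budget.
        if current and current_len + 1 + seg_len > max_chars:
            chunks.append(_merge(current))
            current, current_len = [], 0
        current_len += seg_len + (1 if current else 0)
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks
```

Each chunk's text can be embedded and stored alongside its start_time and end_time, so a retrieval hit can point back to a moment in the video.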
Data Structure
The returned VideoData object follows this structure:
{
  "source_id": "unique_video_id",
  "full_text": "The transcript content...",
  "segments": [
    {
      "start_time": 0.0,
      "end_time": 10.5,
      "text": "Introduction to the topic."
    }
  ],
  "metadata": {
    "duration": 120,
    "resolution": "1080p",
    "provider": "youtube"
  }
}
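To illustrate how the segments array supports time-based references, here is a small sketch that finds the segment covering a given moment and formats timestamps for display. It operates on the structure above as a plain dict. It assumes times are in seconds (the unit is not stated in this document) and that segments are sorted by start_time; segment_at and format_timestamp are hypothetical helpers.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def segment_at(segments: List[Dict], t: float) -> Optional[Dict]:
    """Return the segment whose [start_time, end_time) interval contains t."""
    starts = [s["start_time"] for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i]["end_time"]:
        return segments[i]
    return None


def format_timestamp(seconds: float) -> str:
    """Format a time offset (assumed to be seconds) as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


video = {
    "segments": [
        {"start_time": 0.0, "end_time": 10.5, "text": "Introduction to the topic."}
    ]
}
hit = segment_at(video["segments"], 5.0)
if hit is not None:
    print(f"[{format_timestamp(hit['start_time'])}] {hit['text']}")
```

The half-open interval means a timestamp exactly equal to a segment's end_time belongs to the next segment, if there is one.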