Video Knowledge Extraction
Deeptrain's video processing engine allows AI agents to ingest, interpret, and learn from temporal data. By converting video content into structured, searchable knowledge, you can bridge the gap between visual media and text-based Large Language Models.
Supported Video Sources
Deeptrain provides a unified interface for various video formats and hosting platforms:
- Public Platforms: Seamlessly ingest content from YouTube and Vimeo using standard URLs.
- Local Storage: Process .mp4, .avi, .mov, and other standard video formats stored on your local file system.
- Self-Hosted/Direct Links: Integrate videos hosted on private servers or cloud storage via direct HTTP/S links.
Transcribe API
The Transcribe API is the primary interface for extracting knowledge from video sources. It handles the multi-dimensional task of audio transcription and visual context extraction, preparing the data for LLM consumption.
Input Parameters
The Transcribe API accepts the following parameters (source and source_type are required; options may be omitted):
| Parameter | Type | Description |
| :--- | :--- | :--- |
| source | string | The URL (YouTube/Vimeo/Direct) or the local file path to the video. |
| source_type | string | The type of source: "youtube", "vimeo", "local", or "url". |
| options | dict | (Optional) Configuration for transcription accuracy, language hints, or timestamping. |
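In practice, the correct source_type can usually be inferred from the source string itself. The helper below is an illustrative sketch, not part of the Deeptrain SDK, showing one way to map a source string to the four documented values:

```python
from urllib.parse import urlparse

def infer_source_type(source: str) -> str:
    """Guess the Transcribe API source_type for a given source string.

    Illustrative helper only -- not part of the Deeptrain SDK.
    """
    host = urlparse(source).netloc.lower()
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"
    if "vimeo.com" in host:
        return "vimeo"
    if source.startswith(("http://", "https://")):
        return "url"  # direct HTTP/S link to a self-hosted video
    return "local"    # otherwise, treat it as a local file path
```

A helper like this keeps calling code uniform: agents can pass any URL or path through one entry point instead of branching on the source kind themselves.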
Output Schema
The API returns a structured object containing the extracted knowledge:
{
  "video_id": "string",
  "transcript": "string",
  "metadata": {
    "duration": "float",
    "resolution": "string",
    "platform": "string"
  },
  "knowledge_chunks": [
    {
      "timestamp": "00:01:20",
      "content": "Segment text or visual description"
    }
  ]
}
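A response shaped like this schema can be post-processed in plain Python, for example by converting each chunk's HH:MM:SS timestamp into seconds for indexing. The payload below is an illustrative sample, not real API output:

```python
def timestamp_to_seconds(ts: str) -> int:
    """Convert an "HH:MM:SS" chunk timestamp into an offset in seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Illustrative payload shaped like the Transcribe API output schema.
result = {
    "video_id": "abc123",
    "transcript": "Full transcript text...",
    "metadata": {"duration": 95.0, "resolution": "1920x1080", "platform": "youtube"},
    "knowledge_chunks": [
        {"timestamp": "00:01:20", "content": "Speaker introduces the demo."},
    ],
}

for chunk in result["knowledge_chunks"]:
    offset = timestamp_to_seconds(chunk["timestamp"])  # 80 for "00:01:20"
    print(offset, chunk["content"])
```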
Usage Examples
Processing a YouTube Video
To expand your agent's knowledge base with a YouTube tutorial or lecture:
from deeptrain import VideoModule

# Initialize the Video Module
video_engine = VideoModule(api_key="your_api_key")

# Extract knowledge from a YouTube URL
result = video_engine.transcribe(
    source="https://www.youtube.com/watch?v=example",
    source_type="youtube"
)

print(f"Extracted Transcript: {result['transcript'][:100]}...")
Processing Local Video Files
For proprietary data or internal training videos stored locally:
# Extract knowledge from a local MP4 file
local_result = video_engine.transcribe(
    source="./data/internal_demo.mp4",
    source_type="local"
)

# Inject the extracted content into your AI agent's memory
agent.learn(local_result['transcript'])
Key Capabilities
- Multi-Dimensional Analysis: Deeptrain doesn't just look at text; it processes audio and visual cues to provide a comprehensive context that standard transcription services often miss.
- Temporal Indexing: Every piece of extracted knowledge is timestamped, allowing your AI agent to reference specific moments within a video during a conversation.
- Model Agnostic: Extracted video data can be fed into any of the 200+ supported LLMs, whether they are private deployments or open-source models.
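Temporal indexing makes it straightforward for an agent to cite the moment a statement was made. A minimal sketch, assuming knowledge_chunks are sorted by timestamp as in the output schema above (the sample chunks are illustrative):

```python
import bisect

def chunk_at(chunks: list[dict], second: int) -> dict:
    """Return the knowledge chunk covering a given second of the video.

    Assumes chunks are sorted ascending by their "timestamp" field.
    """
    def to_seconds(ts: str) -> int:
        h, m, s = (int(p) for p in ts.split(":"))
        return h * 3600 + m * 60 + s

    starts = [to_seconds(c["timestamp"]) for c in chunks]
    # Index of the last chunk that starts at or before `second`.
    i = max(bisect.bisect_right(starts, second) - 1, 0)
    return chunks[i]

chunks = [
    {"timestamp": "00:00:00", "content": "Intro"},
    {"timestamp": "00:01:20", "content": "Architecture overview"},
    {"timestamp": "00:05:00", "content": "Live demo"},
]
print(chunk_at(chunks, 200)["content"])  # 200 s falls in the 00:01:20 chunk
```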
Note: For high-resolution or long-form videos, processing time may vary based on the complexity of the visual data. It is recommended to use the asynchronous processing flag for videos exceeding 30 minutes.
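The 30-minute guideline above can be encoded as a small gate before submitting a job. The exact name of the asynchronous processing flag is not specified here, so the helper below only makes the decision; wiring it to the actual flag is left to the SDK call:

```python
LONG_FORM_THRESHOLD_S = 30 * 60  # 30 minutes, per the note above

def should_process_async(duration_seconds: float) -> bool:
    """Return True when a video is long enough to warrant async processing."""
    return duration_seconds > LONG_FORM_THRESHOLD_S

print(should_process_async(45 * 60))  # a 45-minute lecture -> True
```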