Video Ingestion
Deeptrain allows you to transform video content into actionable data for your AI agents. By processing visual and auditory information from various sources, you can equip your models with context that was previously locked in video formats.
Supported Sources
Deeptrain supports video ingestion from both remote and local sources:
- Public Platforms: YouTube and Vimeo.
- Local Storage: Common video formats (MP4, AVI, MOV, etc.) stored on your machine or server.
- Self-Hosted: Direct links to video files hosted on private servers.
Ingesting Videos
To integrate video content, you use the VideoIngestor module. This module handles the fetching, frame analysis, and audio extraction required to prepare the data for your LLM.
Using Public URLs (YouTube/Vimeo)
You can ingest videos directly by providing the URL. Deeptrain handles the stream resolution and data extraction.
```python
from deeptrain import VideoIngestor

# Initialize the ingestor
ingestor = VideoIngestor()

# Ingest a YouTube video
video_data = ingestor.ingest(
    source="https://www.youtube.com/watch?v=example",
    include_timestamps=True
)

print(video_data.summary)
```
Using Local Files
For local files, provide the absolute or relative path to the video file.
```python
# Ingest a local MP4 file
local_video = ingestor.ingest(
    source="./data/recordings/meeting_01.mp4",
    sampling_rate=1.0  # Capture 1 frame per second for visual context
)
```
Transcribe API
The Transcribe API is the primary interface for converting video speech into text that AI agents can index and query. It supports multi-dimensional processing, meaning it tracks visual changes alongside the transcribed text.
API Interface
Method: transcribe(source, **options) — options are passed as keyword arguments, as in the usage example below.
Inputs:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| source | str | The URL (YouTube/Vimeo) or local file path. |
| language | str | (Optional) ISO 639-1 language code. Defaults to auto-detection. |
| output_format | str | The desired format: json, text, or srt. |
| visual_context | bool | If True, includes image descriptions mapped to timestamps. |
Output:
Returns a TranscriptionResponse object containing:
- text: The full transcript string.
- segments: A list of dictionaries containing start_time, end_time, and content.
- metadata: Video duration, resolution, and source details.
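When output_format is srt, timestamps follow the standard SubRip convention (HH:MM:SS,mmm). If you need to produce or validate such timestamps yourself, the formatting is straightforward (a standalone sketch, independent of the Deeptrain API):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(srt_timestamp(3725.5))  # 01:02:05,500
```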
Usage Example
```python
from deeptrain.api import TranscribeAPI

# Set up the API
api = TranscribeAPI(api_key="your_api_key")

# Process a video for AI training
response = api.transcribe(
    source="https://vimeo.com/123456789",
    visual_context=True
)

# Access the processed data
for segment in response.segments:
    print(f"[{segment['start_time']}] {segment['content']}")
```
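The loop above prints segments one at a time; the same data can be flattened into a single timestamped transcript string, which is often the shape you want for indexing. A hypothetical helper (segments modeled as plain dicts, per the output schema described earlier):

```python
def segments_to_text(segments: list[dict]) -> str:
    """Join transcript segments into one timestamped string."""
    return "\n".join(
        f"[{seg['start_time']}] {seg['content']}" for seg in segments
    )

# Example segments in the documented shape
segments = [
    {"start_time": "00:00:01", "end_time": "00:00:04", "content": "Welcome to the demo."},
    {"start_time": "00:00:05", "end_time": "00:00:09", "content": "Today we cover ingestion."},
]
print(segments_to_text(segments))
```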
Integration with AI Agents
Once ingested, the video data can be fed into your local embedding database or used directly as context for supported language models.
```python
from deeptrain.agents import Agent

agent = Agent(model="gpt-4")

# Attach the ingested video context to the agent
agent.add_context(video_data)

# Query the agent about the video content
response = agent.ask("What were the key takeaways from the video presentation?")
```
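Long videos can produce more transcript than a model's context window comfortably holds. One pragmatic approach (a sketch, not a Deeptrain feature) is to cap the joined segment text at a character budget before attaching it as context:

```python
def budget_context(segments: list[dict], max_chars: int) -> str:
    """Concatenate segment contents until the character budget is reached."""
    parts: list[str] = []
    used = 0
    for seg in segments:
        content = seg["content"]
        if used + len(content) > max_chars:
            break  # stop before exceeding the budget
        parts.append(content)
        used += len(content) + 1  # +1 for the joining space
    return " ".join(parts)

segments = [{"content": "alpha"}, {"content": "beta"}, {"content": "gamma"}]
print(budget_context(segments, 11))  # "alpha beta" fits; "gamma" does not
```

A token-based budget (via your model's tokenizer) is more precise, but a character cap is a reasonable first approximation.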
Configuration Note
While the ingestion process is automated, ensure your environment has the necessary codecs installed (e.g., ffmpeg) for processing local files. For public URLs, Deeptrain manages dependencies internally via its cloud-optimized scrapers.
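A quick way to verify the codec dependency before ingesting local files is to check that ffmpeg is on your PATH (standard-library only; the `check_ffmpeg` helper is illustrative):

```python
import shutil

def check_ffmpeg() -> str:
    """Report whether ffmpeg is available on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return "ffmpeg not found; install it (e.g. apt install ffmpeg or brew install ffmpeg)"
    return f"ffmpeg found at {path}"

print(check_ffmpeg())
```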