# Third-Party Platform Integration
Deeptrain provides native connectors for streaming and hosted video platforms, letting you ingest live data directly into your AI workflows. By bridging the gap between external media hosting and LLM processing, you can use real-time video content as a knowledge source without manual downloading or preprocessing.
## Supported Platforms
The platform currently supports data extraction from the following sources:
- YouTube: Public and unlisted videos via standard URLs.
- Vimeo: Hosted professional video content.
- Web Sources: Direct links to hosted MP4, WebM, and OGG files (e.g., self-hosted or S3 buckets).
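Because the Transcribe API can auto-detect the source platform when it is not specified, the detection logic can be pictured roughly as follows. This is an illustrative sketch only; `detect_platform` is a hypothetical helper, not part of the Deeptrain SDK:

```python
from urllib.parse import urlparse

def detect_platform(url: str) -> str:
    """Guess the source platform from a URL (illustrative only)."""
    host = urlparse(url).netloc.lower()
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"
    if "vimeo.com" in host:
        return "vimeo"
    # Direct links to hosted media files are treated as generic web sources
    if url.lower().endswith((".mp4", ".webm", ".ogg")):
        return "web"
    raise ValueError(f"Unsupported source: {url}")
```

Passing `platform` explicitly skips this guessing step, which is useful when a URL does not match the common patterns.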
## Live Stream Ingestion
To integrate data from third-party platforms, use the VideoLoader interface. This allows Deeptrain to fetch, frame-sample, and process the video for multi-modal understanding.
```python
from deeptrain import VideoLoader

# Ingest a YouTube video for AI analysis
video_data = VideoLoader.from_source(
    url="https://www.youtube.com/watch?v=example_id",
    platform="youtube"
)

# Add the processed video to an LLM agent's context
agent.context.add(video_data)
```
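Frame sampling means the loader extracts frames at regular intervals rather than processing every frame. The sampling schedule can be sketched as below; the interval value and `frame_sample_times` helper are assumptions for illustration, not Deeptrain's actual sampling policy:

```python
def frame_sample_times(duration_s: float, every_s: float = 2.0) -> list[float]:
    """Timestamps (in seconds) at which frames would be sampled (illustrative)."""
    n = int(duration_s // every_s) + 1
    return [round(i * every_s, 3) for i in range(n)]

# A 10-second clip sampled every 2 seconds yields 6 frames
times = frame_sample_times(10.0, every_s=2.0)
```

Sparser sampling lowers processing cost at the expense of temporal detail, which is the usual trade-off for multi-modal video understanding.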
## Transcribe API
The Transcribe API is the primary gateway for converting audio/visual data from third-party platforms into a text-based format that non-vision LLMs can interpret. It handles the heavy lifting of audio extraction and speech-to-text conversion.
### Basic Usage
```python
from deeptrain.api import TranscribeAPI

# Transcribe a video from a third-party URL
transcription = TranscribeAPI.process(
    source="https://vimeo.com/123456789",
    include_timestamps=True
)

print(transcription.text)
```
### API Specifications

The `TranscribeAPI.process` method accepts the following parameters:

| Parameter | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| `source` | string | Yes | The full URL of the YouTube/Vimeo video, or a path to a local file. |
| `platform` | string | No | Explicitly sets the source platform (`youtube`, `vimeo`, or `web`). Auto-detected if omitted. |
| `include_timestamps` | boolean | No | If `True`, returns segments with start/end times for granular retrieval. |
| `language` | string | No | ISO code for the source language (e.g., `en`, `es`). Defaults to auto-detection. |
Output Object:

The API returns a `TranscriptionResult` object containing:

- `text` (string): The full transcribed content.
- `segments` (list): A collection of dictionaries containing `start`, `end`, and `text` for each fragment.
- `metadata` (dict): Information about the video duration, source, and processing confidence.
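With `include_timestamps=True`, the segment list supports time-windowed retrieval. A minimal sketch, assuming the segment dictionary shape documented above (`segments_in_window` and the sample data are hypothetical):

```python
def segments_in_window(segments: list[dict], start_s: float, end_s: float) -> list[dict]:
    """Return all segments that overlap the window [start_s, end_s]."""
    return [s for s in segments if s["end"] > start_s and s["start"] < end_s]

# Sample segments in the documented {start, end, text} shape
segs = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the demo."},
    {"start": 4.2, "end": 9.8, "text": "First we load the video."},
]

# Retrieve only what was said between seconds 5 and 6
hits = segments_in_window(segs, 5.0, 6.0)
```

This kind of windowing is the typical building block for citing a specific moment of a video in an LLM response.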
## Webhook Integration for Live Data
For applications requiring real-time updates from third-party platforms (such as monitoring a YouTube channel for new uploads), Deeptrain supports webhook triggers.
- Configure Listener: Set up an endpoint in your application to receive Deeptrain's processing updates.
- Payload: When a new video is processed from a third-party source, Deeptrain sends a JSON payload containing the processed embeddings and transcription.
```json
{
  "event": "processing.completed",
  "source_platform": "youtube",
  "video_id": "example_id",
  "data": {
    "transcription_summary": "...",
    "embedding_ref": "vmtp_db_xyz"
  }
}
```
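A listener for this payload could be sketched as follows. The field names come from the sample payload above; `handle_webhook` itself is a hypothetical helper you would wire into your web framework's endpoint:

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Validate a processing.completed payload and return its embedding reference."""
    payload = json.loads(raw_body)
    if payload.get("event") != "processing.completed":
        raise ValueError(f"unexpected event: {payload.get('event')}")
    # The embedding reference points at the stored vectors for this video
    return payload["data"]["embedding_ref"]
```

In production you would also verify the request's authenticity (for example, a signature header) before trusting the payload.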
## Configuration and Limits
To use third-party platforms, configure your environment with the necessary API keys if you are accessing private content or expect to hit rate limits on high-traffic sources. Define these in your `.env` file or system environment variables:
- `YOUTUBE_API_KEY`: (Optional) For high-frequency metadata fetching.
- `VIMEO_ACCESS_TOKEN`: (Optional) Required for private or restricted Vimeo content.
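Reading these variables in application code can look like the sketch below; `load_platform_credentials` is a hypothetical helper, and both keys are optional, so missing values are returned as `None` rather than raising:

```python
import os

def load_platform_credentials() -> dict:
    """Collect optional third-party credentials from the environment."""
    return {
        "youtube_api_key": os.getenv("YOUTUBE_API_KEY"),
        "vimeo_access_token": os.getenv("VIMEO_ACCESS_TOKEN"),
    }
```

Keeping credentials in the environment (or a `.env` file loaded at startup) keeps them out of source control.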