# Data Sourcing Workflow

## Overview
The Deeptrain workflow transforms raw, unstructured multi-modal data into structured, model-interpretable formats. This lifecycle ensures that whether your data originates from a YouTube video, a local image, or a complex flowchart, it is normalized and vectorized for seamless consumption by your AI agents.
## Step 1: Data Ingestion
Deeptrain acts as a universal bridge, allowing you to ingest data from diverse external and local sources.
### Supported Sources

- Video: YouTube URLs, Vimeo links, and local video files (`.mp4`, `.mkv`, etc.)
- Audio: Local recordings and hosted audio streams
- Visuals: Images, flowcharts, and diagrams (`.png`, `.jpg`, `.svg`)
- Text: Local documents and live web data
### Configuration Example

To begin ingestion, define your source and initialize the connector:

```python
from deeptrain import DataConnector

# Initialize a connector for a remote video source
connector = DataConnector(
    source_type="video",
    provider="youtube",
    config={"api_key": "YOUR_OPTIONAL_KEY"}
)

# Ingest a specific resource
data_stream = connector.ingest("https://www.youtube.com/watch?v=example_id")
```
## Step 2: Processing & Transcription
Once ingested, raw data is passed through Deeptrain’s processing layer. For video and audio, this involves the Transcribe API, which converts multi-dimensional signals into text-based tokens that LLMs can process.
### The Transcribe API
The Transcribe API is the primary interface for converting visual and auditory data into textual context.
**Input Parameters:**

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `input_data` | String/File | The URL or local path to the media file. |
| `mode` | String | Processing mode: `transcribe` (audio-to-text) or `analyze` (visual context). |
| `language` | String | (Optional) Target language for the transcription output. |
**Usage Example:**

```python
import deeptrain

# Process a video file for AI training
processed_content = deeptrain.transcribe(
    input_data="path/to/video.mp4",
    mode="transcribe"
)

print(processed_content["text"])  # Extracted transcription
```
## Step 3: Vectorization and Local Storage
To overcome the context window limitations of standard LLMs, Deeptrain utilizes a Localized Embedding Database. Processed data is automatically chunked and converted into vector embeddings.
- Internal Role: While the vectorization logic is internal, users interact with the storage layer to retrieve relevant context in real-time.
- Real-time Retrieval: When an AI agent is queried, Deeptrain performs a similarity search within this local database to fetch the most relevant data "shards" from your ingested sources.
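Deeptrain's actual embedding model and storage engine are internal, but the chunk-embed-search pattern described above can be sketched in plain Python. In this illustration, `chunk_text`, `embed`, and `similarity_search` are hypothetical helpers (not Deeptrain APIs), and the bag-of-characters embedding is a stand-in for a real learned embedding model:

```python
import math

def chunk_text(text, chunk_size=40):
    """Split text into fixed-size word chunks (the 'shards' described above)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy bag-of-characters embedding; a real system uses a learned model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def similarity_search(query, chunks, top_k=2):
    """Return the chunks closest to the query by cosine similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

At query time, the agent's question plays the role of `query`, and the top-scoring shards are injected into the prompt context. The key design point is that retrieval selects only the most relevant fragments, so the full ingested corpus never has to fit in the model's context window.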
## Step 4: Model Integration
The final stage of the workflow is feeding the processed, relevant data into your chosen model. Deeptrain is model-agnostic, supporting over 200 private and open-source models.
### Integrating with Agents

You can connect the processed data stream directly to your agent's inference cycle:

```python
from deeptrain import Agent

# Initialize an agent with a specific LLM (e.g. "gpt-4" or "llama-3")
my_agent = Agent(model="gpt-4")

# Attach the processed multi-modal context to the agent
my_agent.attach_context(data_stream)

# The agent now has access to video/image data in its prompt context
response = my_agent.query("What was discussed in the video at the 5-minute mark?")
```
## Output Format
The data returned to the model is formatted as high-density text or structured JSON, ensuring compatibility even with non-vision-enabled models. This allows you to "enable" computer vision and audio capabilities on standard text-based LLMs.
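To make the idea concrete, a structured-JSON context payload might look like the sketch below. The field names (`source`, `modality`, `segments`) are illustrative assumptions, not Deeptrain's documented schema:

```python
import json

# Illustrative sketch only: field names are assumptions for this example,
# not Deeptrain's documented output schema.
context_payload = {
    "source": "https://www.youtube.com/watch?v=example_id",
    "modality": "video",
    "segments": [
        {
            "start": "00:04:58",
            "end": "00:05:12",
            "text": "The speaker introduces the vectorization pipeline.",
        },
    ],
}

# Serialized to JSON, the payload is plain text, so even a model with no
# vision or audio capability can consume it in its prompt context.
print(json.dumps(context_payload, indent=2))
```

Because the payload is ordinary text, it can be prepended to any prompt, which is what lets standard text-only LLMs answer questions about video and image content.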