# Data Sourcing Workflow

## Overview
The Deeptrain workflow transforms raw, unstructured multi-modal data into structured, model-interpretable formats. This lifecycle ensures that whether your data originates from a YouTube video, a local image, or a complex flowchart, it is normalized and vectorized for seamless consumption by your AI agents.
## Step 1: Data Ingestion
Deeptrain acts as a universal bridge, allowing you to ingest data from diverse external and local sources.
### Supported Sources

- Video: YouTube URLs, Vimeo links, and local video files (`.mp4`, `.mkv`, etc.)
- Audio: Local recordings and hosted audio streams
- Visuals: Images, flowcharts, and diagrams (`.png`, `.jpg`, `.svg`)
- Text: Local documents and live web data
### Configuration Example

To begin ingestion, define your source and initialize the connector:

```python
from deeptrain import DataConnector

# Initialize a connector for a remote video source
connector = DataConnector(
    source_type="video",
    provider="youtube",
    config={"api_key": "YOUR_OPTIONAL_KEY"}
)

# Ingest a specific resource
data_stream = connector.ingest("https://www.youtube.com/watch?v=example_id")
```
## Step 2: Processing & Transcription
Once ingested, raw data is passed through Deeptrain’s processing layer. For video and audio, this involves the Transcribe API, which converts multi-dimensional signals into text-based tokens that LLMs can process.
### The Transcribe API
The Transcribe API is the primary interface for converting visual and auditory data into textual context.
**Input Parameters:**

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `input_data` | String/File | The URL or local path to the media file. |
| `mode` | String | Processing mode: `transcribe` (audio-to-text) or `analyze` (visual context). |
| `language` | String | (Optional) Target language for the transcription output. |
**Usage Example:**

```python
import deeptrain

# Process a video file for AI training
processed_content = deeptrain.transcribe(
    input_data="path/to/video.mp4",
    mode="transcribe"
)

print(processed_content["text"])  # Extracted transcription
```
## Step 3: Vectorization and Local Storage
To overcome the context window limitations of standard LLMs, Deeptrain utilizes a Localized Embedding Database. Processed data is automatically chunked and converted into vector embeddings.
- Internal Role: While the vectorization logic is internal, users interact with the storage layer to retrieve relevant context in real-time.
- Real-time Retrieval: When an AI agent is queried, Deeptrain performs a similarity search within this local database to fetch the most relevant data "shards" from your ingested sources.
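Deeptrain's actual embedding model and storage engine are internal, but the chunk-embed-search pattern described above can be sketched in plain Python. In this illustration, `chunk_text`, `embed`, and `similarity_search` are hypothetical helpers (not Deeptrain APIs), and the bag-of-characters embedding is a stand-in for a real learned embedding model:

```python
import math

def chunk_text(text, chunk_size=40):
    """Split text into fixed-size word chunks (the 'shards' described above)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy bag-of-characters embedding; a real system uses a learned model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def similarity_search(query, chunks, top_k=2):
    """Return the chunks closest to the query by cosine similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

At query time, the agent's question plays the role of `query`, and the top-scoring shards are injected into the prompt context. The key design point is that retrieval selects only the most relevant fragments, so the full ingested corpus never has to fit in the model's context window.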
## Step 4: Model Integration
The final stage of the workflow is feeding the processed, relevant data into your chosen model. Deeptrain is model-agnostic, supporting over 200 private and open-source models.
### Integrating with Agents

You can connect the processed data stream directly to your agent's inference cycle:

```python
from deeptrain import Agent

# Initialize an agent with a specific LLM (e.g. "gpt-4" or "llama-3")
my_agent = Agent(model="gpt-4")

# Attach the processed multi-modal context to the agent
my_agent.attach_context(data_stream)

# The agent now has access to video/image data in its prompt context
response = my_agent.query("What was discussed in the video at the 5-minute mark?")
```
## Output Format
The data returned to the model is formatted as high-density text or structured JSON, ensuring compatibility even with non-vision-enabled models. This allows you to "enable" computer vision and audio capabilities on standard text-based LLMs.
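To make the idea concrete, a structured-JSON context payload might look like the sketch below. The field names (`source`, `modality`, `segments`) are illustrative assumptions, not Deeptrain's documented schema:

```python
import json

# Illustrative sketch only: field names are assumptions for this example,
# not Deeptrain's documented output schema.
context_payload = {
    "source": "https://www.youtube.com/watch?v=example_id",
    "modality": "video",
    "segments": [
        {
            "start": "00:04:58",
            "end": "00:05:12",
            "text": "The speaker introduces the vectorization pipeline.",
        },
    ],
}

# Serialized to JSON, the payload is plain text, so even a model with no
# vision or audio capability can consume it in its prompt context.
print(json.dumps(context_payload, indent=2))
```

Because the payload is ordinary text, it can be prepended to any prompt, which is what lets standard text-only LLMs answer questions about video and image content.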