LLM Integration
Overview
Deeptrain is engineered to be model-agnostic, acting as a high-performance bridge between multi-modal data sources and Large Language Models (LLMs). It currently supports integration with over 200 proprietary and open-source models, including those from OpenAI, Anthropic, Google, and the Hugging Face ecosystem.
By leveraging Deeptrain, you can augment standard LLMs with the ability to process visual, auditory, and structural data that falls outside their native context window or architectural capabilities.
Connecting a Model
To integrate an LLM with Deeptrain, you must configure the environment with the appropriate provider credentials and initialize the connector.
Configuration
Deeptrain utilizes a standardized interface to interact with different providers. Configuration typically involves specifying the model provider, the model name, and your API credentials.
```python
from deeptrain import DeeptrainConnector

# Initialize the connector for a specific LLM
connector = DeeptrainConnector(
    model_provider="openai",  # e.g., 'anthropic', 'huggingface', 'local'
    model_name="gpt-4",
    api_key="your_api_key_here"
)
```
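Hard-coding API keys in source files is risky in practice; a common pattern is to read them from the environment instead. The helper below is a hypothetical sketch (the `load_llm_config` function and the `OPENAI_API_KEY` variable name are illustrative, not part of Deeptrain):

```python
import os

def load_llm_config(provider: str, model: str, env_var: str = "OPENAI_API_KEY") -> dict:
    """Build connector keyword arguments, reading the API key from the
    environment rather than embedding it in source control."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before initializing the connector")
    return {"model_provider": provider, "model_name": model, "api_key": api_key}
```

The resulting dictionary can then be unpacked into the constructor, e.g. `DeeptrainConnector(**load_llm_config("openai", "gpt-4"))`.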
Multi-modal Data Injection
The core utility of the integration is feeding non-textual data into the LLM's processing stream. Deeptrain handles the transformation of complex assets into formats the LLM can interpret.
1. Vision Integration (Images & Diagrams)
For models that do not natively support vision, Deeptrain processes images, flowcharts, and graphs, providing the LLM with a structured textual or embedding-based representation of the visual content.
```python
# Integrating visual data into the LLM context
response = connector.process_with_vision(
    prompt="Analyze the logic in this flowchart.",
    image_path="./assets/workflow.png"
)
```
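Under the hood, image bytes typically need to be serialized before they can travel in a request payload. A minimal sketch of that step (the `encode_image` helper is illustrative, not Deeptrain's API):

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read an image file and base64-encode it for transport in a JSON payload."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```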
2. Video and Audio via Transcribe API
Deeptrain provides a dedicated Transcribe API to ingest video (YouTube, Vimeo, or local) and audio files. This API converts time-based media into searchable, interpretable text for the AI agent.
| Parameter | Type | Description |
| :--- | :--- | :--- |
| source | string | URL or local path to the video/audio file. |
| sync_to_db | boolean | Whether to store the transcription in the localized embedding database. |
| language | string | (Optional) The target language for transcription. |
Example Usage:

```python
# Processing a video source for LLM consumption
video_data = connector.transcribe_api.process(
    source="https://www.youtube.com/watch?v=example",
    sync_to_db=True
)

# Query the LLM about the video content
answer = connector.query("What are the key takeaways from the video?")
```
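Before a long transcript can be stored in the embedding database, it is usually split into overlapping chunks so that retrieval can return focused passages rather than the whole transcript. A hypothetical sketch of that preprocessing step (`chunk_transcript` is illustrative; Deeptrain's actual chunking strategy is not documented here):

```python
def chunk_transcript(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split a transcript into overlapping word-window chunks for embedding.

    Overlap preserves context across chunk boundaries, so a sentence cut in
    half at one boundary still appears whole in the next chunk.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final window already covers the tail of the transcript
    return chunks
```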
Localized Embedding Database
Deeptrain uses a localized embedding database to work around the context-window limits of standard LLMs: instead of packing all source material into the prompt, the AI agent retrieves only the most relevant content, including real-time content from live data sources, at inference time.
Usage
When an LLM is integrated, Deeptrain automatically handles the RAG (Retrieval-Augmented Generation) workflow:
- Embedding: Input data (text, video transcripts, etc.) is vectorized.
- Retrieval: Relevant context is pulled based on the user prompt.
- Augmentation: The LLM receives the prompt along with the most relevant data shards.
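The three steps can be sketched end-to-end with a toy in-memory implementation. Bag-of-words counts stand in for real dense embeddings here, and all function names are illustrative rather than Deeptrain's API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(prompt: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retrieval: rank stored documents by similarity to the prompt."""
    q = embed(prompt)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def augment(prompt: str, corpus: list[str]) -> str:
    """Augmentation: prepend the most relevant shards to the user prompt."""
    context = "\n".join(retrieve(prompt, corpus))
    return f"Context:\n{context}\n\nQuestion: {prompt}"
```

A production pipeline replaces `embed` with a neural embedding model and the linear scan in `retrieve` with an approximate nearest-neighbor index, but the data flow is the same.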
```python
# Update the local knowledge base
connector.knowledge_base.add_source("./docs/internal_manual.pdf")

# The LLM now has access to this data regardless of its native training cut-off
result = connector.query("How do I reset the internal server based on the manual?")
```
Supported Providers
Deeptrain supports a wide array of model providers. Ensure you have the corresponding SDK or API access for the provider you choose:
- Proprietary: OpenAI, Azure AI, Anthropic, Google Vertex AI, Cohere.
- Open-Source: Llama (via Meta or local), Mistral, Falcon, and other models hosted via Hugging Face Inference Endpoints or local Ollama instances.
- Custom Models: (Upcoming) Support for side-loading weights for proprietary fine-tuned models.