Model-Agnostic Integration
Deeptrain is designed to be model-agnostic, acting as a bridge between your multi-modal data and more than 200 private and open-source Large Language Models (LLMs). This architecture lets you swap the underlying model without reconfiguring your entire data pipeline.
Connecting to Language Models
To integrate Deeptrain with your preferred model, you configure a connector through the primary configuration interface. The system supports major providers, including OpenAI, Anthropic, and Google, as well as open-source models hosted on Hugging Face or in self-hosted environments (e.g., Ollama, vLLM).
Basic Initialization
The following example demonstrates how to initialize Deeptrain with a specific model provider:
```python
from deeptrain import DeeptrainConnector

# Initialize the connector for a specific model
connector = DeeptrainConnector(
    provider="openai",      # e.g., 'anthropic', 'huggingface', 'ollama'
    model="gpt-4-vision",   # The specific model identifier
    api_key="your_api_key"  # Your provider API key
)

# Connect multi-modal data (e.g., a video) to the model
response = connector.process(
    source="https://www.youtube.com/watch?v=example",
    prompt="Analyze the sequence of events in this video."
)

print(response.text)
```
Configuration Parameters
When initializing the connection, the following parameters define the interface between Deeptrain and the target LLM:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `provider` | string | The hosting provider or framework (e.g., `openai`, `anthropic`, `local`). |
| `model` | string | The specific model ID (e.g., `claude-3-opus`, `llama-3`). |
| `api_key` | string | Required for private providers; can be omitted for local models. |
| `base_url` | string | (Optional) The endpoint for self-hosted or proxy models. |
| `config` | dict | (Optional) Additional model-specific parameters such as `temperature` or `top_p`. |
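As an illustration, the optional `config` dictionary passes model-specific sampling parameters through to the target model. The values shown below are examples, not recommended defaults:

```python
from deeptrain import DeeptrainConnector

# The sampling values below are illustrative, not defaults.
connector = DeeptrainConnector(
    provider="anthropic",
    model="claude-3-opus",
    api_key="your_api_key",
    config={"temperature": 0.2, "top_p": 0.9},  # forwarded to the model
)
```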
Supported Model Categories
Deeptrain’s integration layer is categorized into three main implementation paths:
1. Private Managed Models
Direct integration for API-based models. Deeptrain handles the formatting of multi-modal inputs (like image embeddings or video transcriptions) into the specific format required by the provider's API.
- Supported: OpenAI (GPT-4o), Anthropic (Claude 3.5), Google (Gemini 1.5 Pro).
2. Open-Source & Self-Hosted
For developers running models locally or on private clouds. By specifying a `base_url`, Deeptrain can communicate with any OpenAI-compatible server.
- Supported: Llama 3, Mistral, Mixtral, Phi-3.
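For example, a connector can target a local Ollama instance through its OpenAI-compatible endpoint by setting `base_url`. The port shown is Ollama's default; adjust it for your deployment:

```python
from deeptrain import DeeptrainConnector

# No api_key is needed for a local model; base_url points Deeptrain
# at the OpenAI-compatible server.
connector = DeeptrainConnector(
    provider="ollama",
    model="llama-3",
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
)
```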
3. Non-Vision Models (Vision Enhancement)
One of Deeptrain's core strengths is enabling vision capabilities for models that do not natively support image or video processing. Deeptrain processes the visual data into a format (such as detailed spatial descriptions or temporal transcriptions) that a standard text-based LLM can interpret.
Handling Multi-modal Inputs
When using Deeptrain's model-agnostic layer, the input types are automatically handled based on the target model's capabilities:
```python
# Example: Sending an image to a non-vision model.
# Deeptrain will automatically process the image into a context-rich
# description before sending it to the text-only model.
connector = DeeptrainConnector(provider="local", model="llama-3-8b")

result = connector.process(
    image_path="./diagram.png",
    prompt="Explain this flowchart step by step."
)
```
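Conceptually, this vision-enhancement path turns the image into text and splices it into the prompt for the text-only model. A minimal sketch of that pattern, assuming a simple description-plus-question format (Deeptrain's internal prompt template is not documented here):

```python
def build_text_prompt(description: str, user_prompt: str) -> str:
    """Combine a generated visual description with the user's question
    so a text-only model can answer it. The exact format Deeptrain
    uses internally is an assumption; this is illustrative only."""
    return (
        "The following is a detailed description of an image:\n"
        f"{description}\n\n"
        f"Question: {user_prompt}"
    )

prompt = build_text_prompt(
    "A flowchart with three boxes connected by arrows: Start -> Process -> End.",
    "Explain this flowchart step by step.",
)
```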
Custom Model Integration (Internal Interface)
While Deeptrain supports 200+ models out of the box, it also provides an internal `BaseModelAdapter` class.
- Note: This is considered an internal component. It maps custom model response schemas to Deeptrain's standard output. If you are using a proprietary model with a unique API structure, you can extend this adapter to ensure compatibility with Deeptrain's multi-modal data retrieval.
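As an illustration of the kind of mapping such an adapter performs, the sketch below converts a hypothetical proprietary response schema into a standard text-plus-usage shape. The class name, method name, and field names here are all assumptions for illustration; a real adapter would extend the internal `BaseModelAdapter` rather than stand alone:

```python
class CustomModelAdapter:
    """Stand-in sketch of a custom adapter. In practice this would
    extend deeptrain's internal BaseModelAdapter; that interface is
    internal, so a self-contained example is shown instead."""

    def parse_response(self, raw: dict) -> dict:
        # Hypothetical proprietary schema:
        #   {"output": {"content": "...", "tokens_used": N}}
        output = raw.get("output", {})
        return {
            "text": output.get("content", ""),
            "usage": {"total_tokens": output.get("tokens_used", 0)},
        }

adapter = CustomModelAdapter()
standard = adapter.parse_response(
    {"output": {"content": "A flowchart with three steps.", "tokens_used": 42}}
)
```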