Model-Agnostic Setup
Overview
Deeptrain is designed to be architecture-neutral, supporting over 200 private and open-source Large Language Models. By abstracting the data ingestion and multi-modal processing layers, you can swap underlying LLMs without rewriting your data pipelines or integration logic.
Configuring the Model Provider
To connect Deeptrain to your preferred model provider, initialize the DeeptrainConfig class. It serves as the bridge between Deeptrain's multi-modal data streams and your model's inference API.
Basic Initialization
```python
from deeptrain import DeeptrainConfig

# Initialize for a hosted provider (e.g., OpenAI, Anthropic)
config = DeeptrainConfig(
    model_name="gpt-4-vision-preview",
    provider="openai",
    api_key="your_api_key_here",
)

# Initialize for a local or self-hosted model (e.g., Llama 3 via Ollama)
config = DeeptrainConfig(
    model_name="llama3",
    provider="local",
    base_url="http://localhost:11434/v1",
)
```
Integration Patterns
1. Hosted LLMs (API-Based)
For models hosted via providers like OpenAI, Anthropic, or Google, Deeptrain handles the formatting of multi-modal payloads (images, audio, and video) to match the provider's specific schema.
| Parameter | Type | Description |
| :--- | :--- | :--- |
| provider | string | The model host (e.g., openai, anthropic, cohere). |
| model_name | string | The specific model identifier. |
| api_key | string | Your authentication token for the provider. |
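To make the payload formatting concrete: hosted vision providers typically expect images inlined in the chat payload as base64 data URLs. The sketch below (plain Python, independent of Deeptrain; the helper name `build_openai_image_message` is hypothetical) shows roughly the conversion Deeptrain performs internally for an OpenAI-style schema:

```python
import base64


def build_openai_image_message(prompt: str, image_bytes: bytes,
                               mime: str = "image/png") -> dict:
    """Package a text prompt plus one image into an OpenAI-style
    chat message. Hypothetical helper for illustration only; Deeptrain
    performs an equivalent conversion per provider schema."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }


msg = build_openai_image_message("Describe this chart.", b"\x89PNG...")
```

Other providers use different envelopes (Anthropic nests images under a `source` key, for example), which is exactly the per-provider detail the abstraction hides.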
2. Open-Source & Custom Models
For custom architectures or models hosted on HuggingFace/vLLM, Deeptrain uses a standardized interface to pass processed multi-modal embeddings or transcribed text.
```python
import deeptrain
from deeptrain import ModelConnector

# Setup for a custom VCD (Vision-Language-Data) architecture
custom_connector = ModelConnector(
    endpoint="https://your-custom-gpu-endpoint.com/v1",
    model_type="vision-enabled",
    headers={"Authorization": "Bearer token"},
)

deeptrain.link(custom_connector)
```
API Reference: DeeptrainConfig
The DeeptrainConfig object is the primary interface for defining how data is passed to your model.
Inputs
| Argument | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| model_name | str | Yes | - | The name/ID of the LLM. |
| provider | str | Yes | - | The backend service or local framework. |
| api_key | str | No | None | Required for private API providers. |
| max_tokens | int | No | 4096 | Limits the response length from the model. |
| temperature | float | No | 0.7 | Controls the randomness of the output. |
| context_window | int | No | 128000 | Defines the maximum token limit for the session. |
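The argument surface above can be summarized as a plain dataclass. This is an illustrative mirror of the documented fields and defaults, not the real DeeptrainConfig implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Illustrative mirror of the documented DeeptrainConfig arguments,
    with the defaults from the table above. Not Deeptrain's actual class."""
    model_name: str                 # required: name/ID of the LLM
    provider: str                   # required: backend service or local framework
    api_key: Optional[str] = None   # required only for private API providers
    max_tokens: int = 4096
    temperature: float = 0.7
    context_window: int = 128000


# Only the two required arguments need to be supplied explicitly.
cfg = ConfigSketch(model_name="llama3", provider="local")
```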
Outputs
- Returns: A configured `Client` instance ready to accept multi-modal inputs (Text, Images, Audio, Video).
Multi-modal Input Mapping
When the setup is complete, Deeptrain automatically maps different data types to your model's capabilities:
- Text-only Models: Multi-modal inputs (like Video or Audio) are automatically converted to text/transcriptions via the Transcribe API before being sent to the model.
- Vision-supported Models: Images and video frames are passed as raw tensors or encoded strings, depending on the provider's requirements.
- Flowcharts/Graphs: Automatically converted into structural descriptions or image-text pairs to ensure the model understands the visual logic.
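The mapping rules above amount to a small dispatch on input type and model capability. A minimal sketch (hypothetical helper, not Deeptrain's API):

```python
def route_input(item_type: str, vision_capable: bool) -> str:
    """Decide how one input item reaches the model, following the
    mapping rules above. Hypothetical helper for illustration only."""
    if item_type == "text":
        return "pass-through"
    if item_type in ("image", "video"):
        # Vision models receive encoded frames; text-only models
        # fall back to transcription instead.
        return "encode" if vision_capable else "transcribe"
    if item_type == "audio":
        # Audio is always transcribed before being sent to the model.
        return "transcribe"
    if item_type in ("flowchart", "graph"):
        return "structural-description"
    raise ValueError(f"unsupported input type: {item_type!r}")
```

For example, `route_input("video", vision_capable=False)` returns `"transcribe"`, matching the text-only fallback described above.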