# Supported Language Models
Deeptrain is designed with a model-agnostic architecture, providing seamless integration with over 200 private and open-source Large Language Models (LLMs). This flexibility allows you to leverage Deeptrain’s multi-modal data processing regardless of your preferred inference engine or provider.
## Model Categories
Deeptrain categorizes supported models into two primary groups to ensure compatibility across various infrastructure requirements:
- Private/Proprietary Models: Direct integration with industry-leading APIs including OpenAI (GPT-4o, GPT-3.5), Anthropic (Claude 3.5/3), and Google (Gemini).
- Open-Source Models: Support for models hosted via providers like Hugging Face, Together AI, and Anyscale, as well as locally hosted instances using Ollama or vLLM. Supported architectures include Llama 3, Mistral, Mixtral, and Falcon.
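To illustrate how a router might distinguish these two groups, here is a minimal, self-contained sketch. The prefix table and function name are illustrative assumptions, not part of Deeptrain's API:

```python
# Illustrative sketch only: classify a model identifier by prefix.
# The prefix list below is an example, not Deeptrain's internal table.
PROPRIETARY_PREFIXES = ("gpt-", "claude-", "gemini-")

def model_category(model_id: str) -> str:
    """Classify a model identifier as 'private' or 'open-source'."""
    if model_id.lower().startswith(PROPRIETARY_PREFIXES):
        return "private"
    return "open-source"

print(model_category("gpt-4o"))       # private
print(model_category("llama-3-70b"))  # open-source
```

In practice such routing would consult a full registry rather than prefixes, but the dispatch idea is the same.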
## Configuration and Usage
To use a specific model with Deeptrain, specify the model identifier and provider in your configuration. Deeptrain handles the underlying data transformation so that your multi-modal inputs (images, audio, video) are formatted correctly for the target model's specific API requirements.
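As an example of the kind of transformation involved, the sketch below wraps a text prompt and an image URL in OpenAI's multi-modal chat message structure. The helper function is our own illustration; Deeptrain's internal implementation is not shown here:

```python
def build_openai_vision_message(prompt: str, image_url: str) -> dict:
    """Format a text prompt plus an image into the content-list
    structure used by OpenAI's chat completions API for vision input."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_openai_vision_message(
    "Describe this image.", "https://example.com/cat.png"
)
```

Each target provider expects a different payload shape, which is why a connector layer that normalizes inputs per provider is useful.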
### Example: Initializing with a Specific Model
```python
from deeptrain import DeeptrainConnector

# Initialize the connector with your desired model.
# Deeptrain supports 200+ model identifiers.
connector = DeeptrainConnector(
    model="gpt-4-vision-preview",
    provider="openai",
    api_key="your_api_key_here",
)

# Example: processing a video input for the LLM
response = connector.process_video(
    source="https://www.youtube.com/watch?v=example",
    prompt="Summarize the key visual milestones in this video.",
)

print(response.text)
```
## Supported Providers
Deeptrain supports models across the following ecosystems:
| Provider | Supported Models (Partial List) | Connection Type |
| :--- | :--- | :--- |
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 | API |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus/Haiku | API |
| Google | Gemini 1.5 Pro/Flash, PaLM 2 | API / Vertex AI |
| Hugging Face | Llama 3, Mistral 7B, Zephyr, etc. | Inference Endpoints |
| Groq / Together | Llama 3 (70B), Mixtral 8x7B | High-speed API |
| Local / Self-hosted | Ollama, vLLM, Local Transformers | Localhost / Private IP |
## Multi-modal Compatibility Matrix
While Deeptrain enhances model capabilities, the level of multi-modal interaction available depends on the model's base architecture. Deeptrain bridges these gaps using the following logic:
- Vision-Native Models: Deeptrain passes optimized image/video frames directly to the model's vision encoder.
- Text-Only Models: Deeptrain utilizes internal vision-to-text processing (OCR, Image Captioning) to provide textual descriptions of visual data, enabling non-vision models to "understand" visual context.
- Audio/Video: All supported models can leverage the Transcribe API to receive processed transcriptions and temporal metadata.
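The fallback logic above can be sketched roughly as follows. The `caption_image` stub stands in for Deeptrain's internal vision-to-text processing (OCR, image captioning), which is not public; names and signatures are assumptions:

```python
def caption_image(image: bytes) -> str:
    # Stand-in for internal OCR / image-captioning (assumption only).
    return "[textual description of the image]"

def prepare_visual_input(image: bytes, model_is_vision_native: bool) -> dict:
    """Pass images through to vision-native models; for text-only
    models, substitute a textual description of the visual content."""
    if model_is_vision_native:
        return {"type": "image", "data": image}
    return {"type": "text", "data": caption_image(image)}
```

The same branching applies to video: vision-native models receive optimized frames, while text-only models receive captions and transcriptions.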
## Custom Model Support (Upcoming)
Future updates to Deeptrain will introduce a Custom Model Wrapper, allowing users to register their own fine-tuned weights or proprietary model architectures. This will enable the same multi-modal data connector features for private, in-house AI systems.
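One plausible shape for such a wrapper is an abstract interface that users implement for their own models. Since the feature is unreleased, the class and method names below are purely speculative:

```python
from abc import ABC, abstractmethod

class CustomModelWrapper(ABC):
    """Hypothetical interface for registering an in-house model.
    Names and signature are assumptions, not a released Deeptrain API."""

    @abstractmethod
    def generate(self, prompt: str, **inputs) -> str:
        """Run inference and return the model's text output."""

class EchoModel(CustomModelWrapper):
    # Toy implementation for illustration only.
    def generate(self, prompt: str, **inputs) -> str:
        return f"echo: {prompt}"
```

A registration hook of this kind would let Deeptrain apply the same multi-modal preprocessing (frame extraction, captioning, transcription) before delegating inference to the user's own weights.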