# Performance Tuning
## Optimizing Retrieval Latency
For real-time applications, minimizing the time between a user query and the multi-modal response is critical. Deeptrain's performance is primarily determined by embedding retrieval speed and the processing overhead of non-textual assets.
### Localized Embedding Tuning
Deeptrain utilizes a localized embedding database to handle high-frequency updates from live data sources. To optimize search latency, configure the retrieval parameters within your configuration object.
- Top-K Retrieval: Reducing the number of retrieved segments improves speed but may reduce context depth.
- Chunk Overlap: Lowering the overlap between adjacent chunks reduces redundant processing during the embedding phase.
```python
from deeptrain import DeeptrainConfig

# Example: Tuning for low-latency text retrieval
config = DeeptrainConfig(
    embedding_model="text-embedding-3-small",
    top_k=3,             # Limit retrieval to the top 3 relevant chunks
    chunk_size=512,      # Standardize chunk size for predictable latency
    cache_results=True,  # Enable local caching for frequent queries
)
```
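The trade-off behind the chunk-overlap setting can be quantified with plain Python. The `chunk_text` helper below is an illustrative sketch, not part of the Deeptrain API; it shows that a larger overlap produces more chunks for the same document, and therefore more embedding work:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks with a given character overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 4096

# More overlap -> more chunks to embed -> more processing time
print(len(chunk_text(doc, chunk_size=512, overlap=0)))    # 8 chunks
print(len(chunk_text(doc, chunk_size=512, overlap=128)))  # 11 chunks
```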
## Video and Audio Processing Efficiency
The Transcribe API is the primary interface for ingesting media. To reduce processing time for large files, use the `sampling_rate` and `frame_rate` parameters to limit the data density the transformer must process.
### Transcribe API Configuration
| Parameter | Type | Description | Optimization Impact |
| :--- | :--- | :--- | :--- |
| `sampling_rate` | int | Audio sampling frequency (Hz). | Lower rates (e.g., 16000) speed up transcription. |
| `frame_rate` | float | Frames extracted per second of video. | Lowering the FPS reduces vision processing overhead. |
| `stream` | bool | Whether to process data as a stream. | Set to `True` for real-time feedback. |
**Example: High-speed video ingestion**

```python
from deeptrain.api import TranscribeAPI

# Process a video with reduced visual sampling to prioritize speed
transcriber = TranscribeAPI()
result = transcriber.process(
    source="https://www.youtube.com/watch?v=example",
    frame_rate=1.0,       # Extract 1 frame per second
    sampling_rate=16000,  # Optimized for voice clarity
    priority="latency",   # Internal flag to prioritize speed over exhaustive analysis
)
```
## Multi-modal Throughput
When handling multiple data types (Images, Graphs, and Text) simultaneously, performance can be optimized through asynchronous processing.
### Asynchronous Ingestion
Deeptrain supports async operations to prevent visual content processing from blocking text-based retrievals.
```python
import asyncio

from deeptrain import MultiModalConnector

async def process_data_stream(video_url, doc_path):
    connector = MultiModalConnector()
    # Run vision and text ingestion in parallel
    task1 = connector.ingest_video(video_url)
    task2 = connector.ingest_text(doc_path)
    results = await asyncio.gather(task1, task2)
    return results
```
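The same pattern can be exercised standalone with stand-in coroutines (the `ingest_video` and `ingest_text` functions below are simulations, not the Deeptrain API). It confirms the key property: concurrent ingestion takes roughly as long as the slowest task, not the sum of both:

```python
import asyncio
import time

async def ingest_video(url: str) -> str:
    await asyncio.sleep(0.2)  # simulate slow vision processing
    return f"video:{url}"

async def ingest_text(path: str) -> str:
    await asyncio.sleep(0.1)  # simulate faster text ingestion
    return f"text:{path}"

async def main() -> list[str]:
    # gather() schedules both coroutines concurrently on one event loop
    return await asyncio.gather(ingest_video("clip.mp4"), ingest_text("doc.md"))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['video:clip.mp4', 'text:doc.md']
print(elapsed)  # ~0.2 s total, not 0.3 s
```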
## Memory Management for Vision Tasks
Processing flowcharts and high-resolution graphs can be memory-intensive. To maintain system stability during high-load periods:
- Image Scaling: Use the `image_resolution` parameter to downscale high-resolution diagrams before they reach the vision encoder.
- Batch Processing: For bulk ingestion of images or frames, use the `batch_size` parameter to control the memory footprint.
```python
# Optimize flowchart/graph analysis
graph_data = connector.process_diagram(
    file_path="complex_flowchart.png",
    target_resolution=(800, 600),  # Resize for faster inference
    mode="fast",                   # Use a lightweight vision pass
)
```
## Advanced Configuration Summary
To achieve the best performance across different model architectures, refer to the following global settings:
- Model-Agnostic Caching: Deeptrain provides an internal caching layer that stores pre-computed embeddings for static content. Ensure `enable_global_cache=True` is set in your environment variables.
- Context Window Management: Since Deeptrain operates beyond native context limits, use the `relevancy_threshold` parameter (0.0 - 1.0) to filter out low-confidence data before it is sent to the LLM, reducing the token count and API costs.
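The thresholding step amounts to a simple score filter. The sketch below uses hypothetical retrieval results and a stand-in `filter_by_relevancy` helper to show the effect on the data sent downstream:

```python
def filter_by_relevancy(chunks: list[dict], threshold: float) -> list[dict]:
    """Drop retrieved chunks whose similarity score falls below the threshold."""
    return [c for c in chunks if c["score"] >= threshold]

retrieved = [
    {"text": "intro", "score": 0.91},
    {"text": "sidebar", "score": 0.42},
    {"text": "body", "score": 0.77},
]

# Only high-confidence chunks reach the LLM, shrinking the prompt
kept = filter_by_relevancy(retrieved, threshold=0.7)
print([c["text"] for c in kept])  # ['intro', 'body']
```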