# Performance Tuning
## Optimizing Retrieval Latency
For real-time applications, minimizing the time between a user query and the multi-modal response is critical. Deeptrain's performance is primarily determined by embedding retrieval speed and the processing overhead of non-textual assets.
### Localized Embedding Tuning
Deeptrain utilizes a localized embedding database to handle high-frequency updates from live data sources. To optimize search latency, configure the retrieval parameters within your configuration object.
- Top-K Retrieval: Reducing the number of retrieved segments improves speed but may reduce context depth.
- Chunk Overlap: Lowering the overlap between adjacent chunks reduces redundant processing during the embedding phase.
```python
from deeptrain import DeeptrainConfig

# Example: Tuning for low-latency text retrieval
config = DeeptrainConfig(
    embedding_model="text-embedding-3-small",
    top_k=3,             # Limit retrieval to the top 3 relevant chunks
    chunk_size=512,      # Standardize chunk size for predictable latency
    cache_results=True,  # Enable local caching for frequent queries
)
```
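The trade-off behind the chunk-overlap setting can be quantified with plain Python. The `chunk_text` helper below is an illustrative sketch, not part of the Deeptrain API; it shows that a larger overlap produces more chunks for the same document, and therefore more embedding work:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks with a given character overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 4096

# More overlap -> more chunks to embed -> more processing time
print(len(chunk_text(doc, chunk_size=512, overlap=0)))    # 8 chunks
print(len(chunk_text(doc, chunk_size=512, overlap=128)))  # 11 chunks
```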
## Video and Audio Processing Efficiency
The Transcribe API is the primary interface for ingesting media. To reduce processing time for large files, use the `sampling_rate` and `frame_rate` parameters to limit the data density the transformer must process.
### Transcribe API Configuration
| Parameter | Type | Description | Optimization Impact |
| :--- | :--- | :--- | :--- |
| `sampling_rate` | int | Audio sampling frequency (Hz). | Lower rates (e.g., 16000) speed up transcription. |
| `frame_rate` | float | Frames extracted per second of video. | Lowering the FPS reduces vision processing overhead. |
| `stream` | bool | Whether to process data as a stream. | Set to `True` for real-time feedback. |
**Example: High-speed video ingestion**

```python
from deeptrain.api import TranscribeAPI

# Process a video with reduced visual sampling to prioritize speed
transcriber = TranscribeAPI()
result = transcriber.process(
    source="https://www.youtube.com/watch?v=example",
    frame_rate=1.0,       # Extract 1 frame per second
    sampling_rate=16000,  # Optimized for voice clarity
    priority="latency",   # Internal flag to prioritize speed over exhaustive analysis
)
```
## Multi-modal Throughput
When handling multiple data types (Images, Graphs, and Text) simultaneously, performance can be optimized through asynchronous processing.
### Asynchronous Ingestion
Deeptrain supports async operations to prevent visual content processing from blocking text-based retrievals.
```python
import asyncio

from deeptrain import MultiModalConnector

async def process_data_stream(video_url, doc_path):
    connector = MultiModalConnector()
    # Run vision and text ingestion in parallel
    task1 = connector.ingest_video(video_url)
    task2 = connector.ingest_text(doc_path)
    results = await asyncio.gather(task1, task2)
    return results
```
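The same pattern can be exercised standalone with stand-in coroutines (the `ingest_video` and `ingest_text` functions below are simulations, not the Deeptrain API). It confirms the key property: concurrent ingestion takes roughly as long as the slowest task, not the sum of both:

```python
import asyncio
import time

async def ingest_video(url: str) -> str:
    await asyncio.sleep(0.2)  # simulate slow vision processing
    return f"video:{url}"

async def ingest_text(path: str) -> str:
    await asyncio.sleep(0.1)  # simulate faster text ingestion
    return f"text:{path}"

async def main() -> list[str]:
    # gather() schedules both coroutines concurrently on one event loop
    return await asyncio.gather(ingest_video("clip.mp4"), ingest_text("doc.md"))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['video:clip.mp4', 'text:doc.md']
print(elapsed)  # ~0.2 s total, not 0.3 s
```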
## Memory Management for Vision Tasks
Processing flowcharts and high-resolution graphs can be memory-intensive. To maintain system stability during high-load periods:
- Image Scaling: Use the `image_resolution` parameter to downscale high-resolution diagrams before they reach the vision encoder.
- Batch Processing: For bulk ingestion of images or frames, use the `batch_size` parameter to control the memory footprint.
```python
# Optimize flowchart/graph analysis
graph_data = connector.process_diagram(
    file_path="complex_flowchart.png",
    target_resolution=(800, 600),  # Resize for faster inference
    mode="fast",                   # Use a lightweight vision pass
)
```
## Advanced Configuration Summary
To achieve the best performance across different model architectures, refer to the following global settings:
- Model-Agnostic Caching: Deeptrain provides an internal caching layer that stores pre-computed embeddings for static content. Ensure `enable_global_cache=True` is set in your environment variables.
- Context Window Management: Since Deeptrain operates beyond native context limits, use the `relevancy_threshold` parameter (0.0 - 1.0) to filter out low-confidence data before it is sent to the LLM, reducing the token count and API costs.
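The thresholding step amounts to a simple score filter. The sketch below uses hypothetical retrieval results and a stand-in `filter_by_relevancy` helper to show the effect on the data sent downstream:

```python
def filter_by_relevancy(chunks: list[dict], threshold: float) -> list[dict]:
    """Drop retrieved chunks whose similarity score falls below the threshold."""
    return [c for c in chunks if c["score"] >= threshold]

retrieved = [
    {"text": "intro", "score": 0.91},
    {"text": "sidebar", "score": 0.42},
    {"text": "body", "score": 0.77},
]

# Only high-confidence chunks reach the LLM, shrinking the prompt
kept = filter_by_relevancy(retrieved, threshold=0.7)
print([c["text"] for c in kept])  # ['intro', 'body']
```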