Real-time Live Retrieval
Deeptrain’s Real-time Live Retrieval system allows AI agents to transcend static training data by fetching, embedding, and injecting live content into the model's context during inference. By utilizing a localized embedding database, Deeptrain enables models to access up-to-the-minute information from external data sources without requiring constant fine-tuning.
Overview
Standard LLMs are limited by their knowledge cutoff and context window constraints. Deeptrain solves this by:
- Sourcing: Connecting to live web data, local files, or API streams.
- Vectorization: Converting that data into high-dimensional embeddings in real-time.
- Injection: Retrieving only the most relevant "chunks" of data and providing them to the LLM to ground its response in factual, current data.
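These three stages can be illustrated with a minimal, self-contained sketch. The toy bag-of-words "embedding" and cosine similarity below stand in for Deeptrain's real embedding model; none of these helper names are part of the Deeptrain SDK.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Sourcing: live documents arrive from a feed.
documents = [
    "The mission successfully landed at 14:00 UTC.",
    "Ticket prices for the gala were announced today.",
]

# 2. Vectorization: embed each chunk as it arrives.
index = [(doc, embed(doc)) for doc in documents]

# 3. Injection: retrieve the most relevant chunk and prepend it to the prompt.
query = "When did the mission land?"
qv = embed(query)
best = max(index, key=lambda item: cosine(qv, item[1]))[0]
prompt = f"Context: {best}\n\nQuestion: {query}"
```

A production system replaces the toy embedding with a neural model and the linear scan with a vector index, but the retrieve-then-inject shape is the same.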
Configuration
To enable live retrieval, you must configure a connector (shown below with MultiModalConnector). The connector defines where the data originates and how frequently the localized database should refresh its index.
from deeptrain import MultiModalConnector

# Initialize the connector with a live web source
connector = MultiModalConnector(
    source_type="live_web",
    refresh_interval="5m",  # Refresh embeddings every 5 minutes
    embedding_model="text-embedding-3-small",
)

# Connect a specific URL or data stream
connector.add_source("https://news.example.com/live-updates")
Usage: Retrieval-Augmented Inference
Once a live source is configured, the retrieval mechanism automatically intercepts queries to provide context. You can also manually trigger retrieval to inspect the data being fed to your agent.
Automatic Retrieval
In most implementations, Deeptrain handles the "Retrieve-then-Generate" flow automatically via the query method.
response = connector.query(
    prompt="What are the latest updates on the project?",
    use_live_context=True,
    top_k=5,  # Retrieve the 5 most relevant segments from the live source
)
print(response.content)
Manual Retrieval API
If you need to process the retrieved data before sending it to the LLM, use the retrieve method.
Endpoint / Method: connector.retrieve(query_string, limit, min_score)
| Parameter | Type | Description |
| :--- | :--- | :--- |
| query_string | string | The natural language query used to search the embedding database. |
| limit | int | The number of context chunks to return (default: 3). |
| min_score | float | The similarity threshold (0.0 to 1.0) for context relevance. |
Example Response:
[
  {
    "id": "chunk_9921",
    "content": "The live stream indicates that the mission successfully landed at 14:00 UTC.",
    "source": "https://news.example.com/live-updates",
    "similarity_score": 0.98,
    "timestamp": "2023-10-27T14:05:00Z"
  }
]
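A common pre-processing step is to drop low-scoring chunks and flatten the remainder into a context block before prompting the model. The helper below operates on the response shape shown above; it is plain Python for illustration, not part of the Deeptrain SDK.

```python
def build_context(chunks, min_score=0.75):
    """Keep chunks at or above min_score and format them for prompt injection."""
    kept = [c for c in chunks if c["similarity_score"] >= min_score]
    lines = [
        f'[{c["timestamp"]}] {c["content"]} (source: {c["source"]})'
        for c in kept
    ]
    return "\n".join(lines)

# Chunks in the shape returned by connector.retrieve (second entry is made up
# to show filtering).
chunks = [
    {
        "id": "chunk_9921",
        "content": "The live stream indicates that the mission successfully "
                   "landed at 14:00 UTC.",
        "source": "https://news.example.com/live-updates",
        "similarity_score": 0.98,
        "timestamp": "2023-10-27T14:05:00Z",
    },
    {
        "id": "chunk_4410",
        "content": "Unrelated archive footage description.",
        "source": "https://news.example.com/live-updates",
        "similarity_score": 0.41,
        "timestamp": "2023-10-27T13:10:00Z",
    },
]

context = build_context(chunks, min_score=0.75)
```

The resulting string can then be prepended to the user's prompt, or inspected and edited before the final LLM call.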
Context Window Optimization
One of Deeptrain’s primary advantages is its ability to operate beyond predefined context window limitations. Instead of stuffing a massive document into the prompt, the Real-time Live Retrieval engine performs a semantic search to select only the relevant snippets. This reduces latency and token costs while maintaining high accuracy.
- Localized Embedding Database: Deeptrain maintains a local vector index of your live data to ensure low-latency retrieval (typically <50ms).
- Dynamic Pruning: Irrelevant data is automatically filtered out based on the min_score parameter, ensuring the AI agent only receives high-signal information.
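A sketch of how score-based pruning under a token budget might work: chunks are taken greedily in score order, and each must both clear the relevance threshold and fit the remaining budget. The chars-divided-by-4 token estimate and all names here are illustrative assumptions, not Deeptrain internals.

```python
def prune(chunks, min_score=0.8, token_budget=200):
    """Greedily keep the highest-scoring chunks that fit the token budget."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["similarity_score"], reverse=True):
        if chunk["similarity_score"] < min_score:
            break  # chunks are score-sorted, so the rest score even lower
        cost = max(1, len(chunk["content"]) // 4)  # rough tokens ~= chars / 4
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append(chunk)
        used += cost
    return selected

chunks = [
    {"content": "a" * 400, "similarity_score": 0.95},
    {"content": "b" * 400, "similarity_score": 0.90},
    {"content": "c" * 400, "similarity_score": 0.70},
]

# Budget of 150 tokens fits only the top-scoring 100-token chunk; the 0.90
# chunk is skipped for size and the 0.70 chunk fails the threshold.
kept = prune(chunks, min_score=0.8, token_budget=150)
```

This is why tightening min_score or the token budget trades recall for lower latency and token cost.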
Supported Live Sources
Deeptrain currently supports real-time retrieval from:
- Web Scrapers: Real-time extraction from HTML/JavaScript-heavy sites.
- Cloud Storage: Live monitoring of S3 buckets or Google Drive folders.
- API Integrations: Direct ingestion from Slack, Jira, or custom JSON endpoints.
- Video/Audio Streams: Real-time transcription via the Transcribe API followed by immediate text embedding.