# Localized Embeddings

## Overview
Localized embeddings enable Deeptrain to process and retrieve data that exceeds the native context window of your LLM. By maintaining a local vector database, Deeptrain can perform real-time content retrieval, feeding only the most relevant snippets of your live data sources into the model's prompt. This approach ensures your AI agents stay context-aware without the high latency or costs associated with processing massive datasets in every request.
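Conceptually, the retrieval step ranks stored chunks by vector similarity to the query embedding and keeps only the best matches. A minimal sketch of that ranking, using toy hand-written embeddings rather than Deeptrain's internal index or a real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, embedding) pairs; return the k most similar texts
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional embeddings, for illustration only
chunks = [
    ("project timeline", [0.9, 0.1, 0.0]),
    ("billing FAQ", [0.0, 0.2, 0.9]),
    ("release schedule", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))
# -> ['project timeline', 'release schedule']
```

Only these few top-ranked snippets are injected into the prompt, which is what keeps per-request latency and cost flat as the dataset grows.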
## Initializing the Localized Database
To start using localized embeddings, you must first initialize a vector storage instance. Deeptrain is model-agnostic, meaning you can use embeddings from various providers (OpenAI, HuggingFace, etc.) to populate your local database.
```python
from deeptrain import LocalEmbeddings

# Initialize the embedding engine
# You can specify the model used to generate vector representations
db = LocalEmbeddings(
    storage_path="./data/embeddings",
    model="text-embedding-3-small"
)
```
## Data Ingestion
Deeptrain allows you to source data from multiple formats and live streams. When data is added, it is automatically chunked, embedded, and indexed for retrieval.
### Adding Static Text
For documents or pre-existing datasets:
```python
# Add local text files or strings
db.add_source("path/to/document.pdf")
db.add_source("This is a raw string of text to be indexed.")
```
### Managing Live Data Sources
One of Deeptrain's core strengths is its ability to handle live content. You can connect the localized embedding database to real-time feeds.
```python
# Connect to a live data source for real-time indexing
db.connect_live_source(
    url="https://api.yourservice.com/v1/updates",
    refresh_interval=300  # refresh every 5 minutes
)
```
## Retrieval and Context Enhancement
Once the database is populated, you can retrieve relevant context to enhance LLM responses. This process identifies the most semantically similar chunks based on a user's query.
### Basic Querying
Retrieve the top k most relevant segments from your localized database:
```python
query = "What are the latest updates on the project timeline?"
context_segments = db.query(query, top_k=5)
# context_segments is a list of strings to be injected into your prompt
```
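The returned segments are typically concatenated into a context block ahead of the user's question. A minimal sketch of that injection step (the template and separator here are illustrative, not a Deeptrain API):

```python
def build_prompt(question, context_segments):
    # Join retrieved segments into a context block preceding the question
    context = "\n---\n".join(context_segments)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

segments = ["Milestone 2 slipped to June.", "QA starts after milestone 2."]
prompt = build_prompt("What are the latest updates on the project timeline?",
                      segments)
print(prompt)
```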
### Integration with LLM Flows
Pass the retrieved context directly into your model's context window:
```python
from deeptrain import Agent

agent = Agent(model="gpt-4-turbo")

# The agent uses the localized DB to fetch context before generating a response
response = agent.ask(
    "How do I set up the new video processing feature?",
    use_embeddings=True
)
print(response)
```
## Configuration Options
The behavior of the localized embedding database can be tuned via the following parameters:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `chunk_size` | `int` | The maximum number of tokens per text segment. |
| `overlap` | `int` | The number of overlapping tokens between segments to maintain context. |
| `distance_metric` | `string` | The method used to calculate similarity (e.g., `cosine`, `euclidean`). |
| `persistence` | `bool` | Whether to save the database to disk for future sessions. |
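`chunk_size` and `overlap` interact as a sliding window: each segment holds at most `chunk_size` tokens, and each new segment repeats the last `overlap` tokens of the previous one so context is not cut mid-thought. A minimal sketch over whitespace tokens (Deeptrain's actual tokenizer and chunker will differ):

```python
def chunk_tokens(tokens, chunk_size, overlap):
    # Slide a window of chunk_size tokens, stepping by chunk_size - overlap
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = "one two three four five six seven".split()
print(chunk_tokens(tokens, chunk_size=4, overlap=1))
# -> [['one', 'two', 'three', 'four'], ['four', 'five', 'six', 'seven']]
```

Note how `'four'` appears in both segments: that repetition is exactly what the `overlap` parameter buys you.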
## Best Practices
- Periodic Re-indexing: If your live data sources change frequently, ensure your `refresh_interval` is aligned with the data's volatility.
- Model Matching: Use the same embedding model for both ingestion and querying to ensure vector consistency.
- Chunk Optimization: For technical documentation, smaller `chunk_size` values (200-500 tokens) typically yield more precise retrieval results.