# Localized Embeddings

## Overview
Localized embeddings enable Deeptrain to process and retrieve data that exceeds the native context window of your LLM. By maintaining a local vector database, Deeptrain can perform real-time content retrieval, feeding only the most relevant snippets of your live data sources into the model's prompt. This approach ensures your AI agents stay context-aware without the high latency or costs associated with processing massive datasets in every request.
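Conceptually, the retrieval step ranks stored chunks by vector similarity to the query embedding and keeps only the best matches. A minimal sketch of that ranking, using toy hand-written embeddings rather than Deeptrain's internal index or a real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, embedding) pairs; return the k most similar texts
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional embeddings, for illustration only
chunks = [
    ("project timeline", [0.9, 0.1, 0.0]),
    ("billing FAQ", [0.0, 0.2, 0.9]),
    ("release schedule", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))
# -> ['project timeline', 'release schedule']
```

Only these few top-ranked snippets are injected into the prompt, which is what keeps per-request latency and cost flat as the dataset grows.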
## Initializing the Localized Database
To start using localized embeddings, you must first initialize a vector storage instance. Deeptrain is model-agnostic, meaning you can use embeddings from various providers (OpenAI, HuggingFace, etc.) to populate your local database.
```python
from deeptrain import LocalEmbeddings

# Initialize the embedding engine
# You can specify the model used to generate vector representations
db = LocalEmbeddings(
    storage_path="./data/embeddings",
    model="text-embedding-3-small"
)
```
## Data Ingestion
Deeptrain allows you to source data from multiple formats and live streams. When data is added, it is automatically chunked, embedded, and indexed for retrieval.
### Adding Static Text
For documents or pre-existing datasets:
```python
# Add local text files or strings
db.add_source("path/to/document.pdf")
db.add_source("This is a raw string of text to be indexed.")
```
### Managing Live Data Sources
One of Deeptrain's core strengths is its ability to handle live content. You can connect the localized embedding database to real-time feeds.
```python
# Connect to a live data source for real-time indexing
db.connect_live_source(
    url="https://api.yourservice.com/v1/updates",
    refresh_interval=300  # refresh every 5 minutes
)
```
## Retrieval and Context Enhancement
Once the database is populated, you can retrieve relevant context to enhance LLM responses. This process identifies the most semantically similar chunks based on a user's query.
### Basic Querying
Retrieve the top k most relevant segments from your localized database:
```python
query = "What are the latest updates on the project timeline?"
context_segments = db.query(query, top_k=5)
# context_segments is a list of strings to be injected into your prompt
```
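The returned segments are typically concatenated into a context block ahead of the user's question. A minimal sketch of that injection step (the template and separator here are illustrative, not a Deeptrain API):

```python
def build_prompt(question, context_segments):
    # Join retrieved segments into a context block preceding the question
    context = "\n---\n".join(context_segments)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

segments = ["Milestone 2 slipped to June.", "QA starts after milestone 2."]
prompt = build_prompt("What are the latest updates on the project timeline?",
                      segments)
print(prompt)
```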
### Integration with LLM Flows
Pass the retrieved context directly into your model's context window:
```python
from deeptrain import Agent

agent = Agent(model="gpt-4-turbo")

# The agent uses the localized DB to fetch context before generating a response
response = agent.ask(
    "How do I set up the new video processing feature?",
    use_embeddings=True
)
print(response)
```
## Configuration Options
The behavior of the localized embedding database can be tuned via the following parameters:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `chunk_size` | `int` | The maximum number of tokens per text segment. |
| `overlap` | `int` | The number of overlapping tokens between segments to maintain context. |
| `distance_metric` | `string` | The method used to calculate similarity (e.g., `cosine`, `euclidean`). |
| `persistence` | `bool` | Whether to save the database to disk for future sessions. |
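`chunk_size` and `overlap` interact as a sliding window: each segment holds at most `chunk_size` tokens, and each new segment repeats the last `overlap` tokens of the previous one so context is not cut mid-thought. A minimal sketch over whitespace tokens (Deeptrain's actual tokenizer and chunker will differ):

```python
def chunk_tokens(tokens, chunk_size, overlap):
    # Slide a window of chunk_size tokens, stepping by chunk_size - overlap
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = "one two three four five six seven".split()
print(chunk_tokens(tokens, chunk_size=4, overlap=1))
# -> [['one', 'two', 'three', 'four'], ['four', 'five', 'six', 'seven']]
```

Note how `'four'` appears in both segments: that repetition is exactly what the `overlap` parameter buys you.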
## Best Practices
- Periodic Re-indexing: If your live data sources change frequently, ensure your `refresh_interval` is aligned with the data's volatility.
- Model Matching: Use the same embedding model for both ingestion and querying to ensure vector consistency.
- Chunk Optimization: For technical documentation, smaller `chunk_size` values (200-500 tokens) typically yield more precise retrieval results.