# AI Agent Customization

## Overview
Customizing AI agents within Deeptrain involves tailoring how models process and respond to multi-modal inputs. Since Deeptrain acts as a data connector, customization is achieved by configuring the ingestion pipelines and the localized embedding database that feeds your agent's context.
## Multi-modal Input Configuration
Deeptrain allows you to extend the capabilities of standard LLMs by injecting non-textual data into the model's processing stream.
### Text and Live Data Integration
To customize an agent beyond its native context window, you can configure a localized embedding database. This allows the agent to retrieve real-time content from specific live sources.
- Real-time Retrieval: Configure the agent to query live data sources before generating a response.
- Context Extension: Use the embedding database to store vast amounts of proprietary documentation, allowing the agent to "remember" information outside its training data.
```python
# Example: Configuring a text-based agent with a localized data source
agent.configure_source(
    type="text",
    source_url="https://api.yourdomain.com/live-updates",
    embedding_model="text-embedding-3-small",
    refresh_rate="5m",
)
```
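The retrieval step behind a localized embedding database can be sketched in plain Python. This is an illustrative sketch, not Deeptrain's implementation: the toy vectors stand in for output from an embedding model such as `text-embedding-3-small`, and the in-memory list stands in for the localized store.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, store, top_k=2):
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda d: cosine_similarity(query_vec, d["vec"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# Toy store: in practice these vectors come from the configured embedding
# model and are refreshed on the schedule set by refresh_rate.
store = [
    {"text": "release notes", "vec": [0.9, 0.1, 0.0]},
    {"text": "billing FAQ",   "vec": [0.0, 0.2, 0.9]},
    {"text": "API changelog", "vec": [0.8, 0.3, 0.1]},
]

context = retrieve([1.0, 0.2, 0.0], store, top_k=2)
```

The retrieved chunks are then prepended to the model's prompt, which is how the agent "remembers" information outside its training data.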
### Computer Vision for Non-Vision Models
You can customize agents to interpret visual data (Images, Flowcharts, Graphs) even if the underlying LLM does not natively support vision. Deeptrain processes these inputs and converts them into a structured format the model can understand.
- Flowcharts and Graphs: Enable `graph_interpretation` to allow agents to analyze logic flows or statistical data from images.
- Image Tagging: Automatically generate descriptive metadata for images to provide context to the agent.
```python
# Example: Enabling visual data processing
agent.enable_capability("vision", {
    "support_graphs": True,
    "ocr_enabled": True,
    "detail_level": "high",
})
```
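To see how a text-only model ends up "understanding" a flowchart, consider the serialization step: structure extracted from the image has to be rendered as text before it reaches the prompt. The function and edge format below are hypothetical stand-ins for whatever Deeptrain's vision layer actually emits.

```python
def graph_to_text(title, edges):
    """Serialize extracted flowchart edges into a textual description
    that a text-only model can reason over in its prompt."""
    lines = [f"Flowchart: {title}"]
    for src, dst, label in edges:
        step = f"- {src} -> {dst}"
        if label:
            step += f" (condition: {label})"
        lines.append(step)
    return "\n".join(lines)

# Edges as a vision layer might extract them (hypothetical output format).
edges = [
    ("Start", "Validate input", None),
    ("Validate input", "Process", "valid"),
    ("Validate input", "Reject", "invalid"),
]
prompt_fragment = graph_to_text("Request handling", edges)
```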
## Video and Audio Customization
Deeptrain’s Transcribe API is the primary interface for customizing agents with temporal data. You can feed videos or audio files to expand an agent’s knowledge base or enable it to respond to user-submitted media.
### Using the Transcribe API
The Transcribe API processes video/audio and provides a structured text-based representation (including timestamps and speaker identification) to the agent.
Endpoint: `POST /api/v1/transcribe`
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `source` | String | URL of the video (YouTube, Vimeo, or self-hosted) or a local file path. |
| `model_priority` | String | Accuracy/speed trade-off (e.g., `high_fidelity`, `fast_processing`). |
| `include_metadata` | Boolean | Whether to include visual scene descriptions alongside the transcript text. |
Sample API request to customize agent knowledge via video:

```json
{
  "agent_id": "agent_01",
  "source": "https://vimeo.com/example-video",
  "options": {
    "transcribe": true,
    "analyze_visuals": true
  }
}
```
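Assuming a standard HTTPS JSON API, a request like the one above can be assembled with the Python standard library. The host name is a placeholder and the payload builder is illustrative; only the endpoint path and the parameters in the table come from this document.

```python
import json
import urllib.request

def build_transcribe_request(agent_id, source, model_priority="high_fidelity",
                             include_metadata=True):
    """Build a POST request for /api/v1/transcribe from the documented parameters."""
    payload = {
        "agent_id": agent_id,
        "source": source,
        "model_priority": model_priority,
        "include_metadata": include_metadata,
    }
    return urllib.request.Request(
        "https://api.deeptrain.example/api/v1/transcribe",  # placeholder host
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_transcribe_request("agent_01", "https://vimeo.com/example-video")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```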
## Model-Agnostic Setup
Deeptrain supports over 200 models. Customizing your agent behavior involves selecting the model that best fits your multi-modal data needs and configuring the connection parameters.
- Select Model: Choose from private (on-premise) models, open-source models (e.g., Llama 3, Mistral), or proprietary hosted models (e.g., GPT-4).
- Define System Prompt: Customize the agent's "personality" and how it should prioritize different data modalities (e.g., "Always prioritize information found in video transcripts over general knowledge").
- Set Modality Weights: Adjust how much weight the agent gives to different data types (Text vs. Image vs. Audio).
```python
# Example: Configuring model-agnostic agent parameters
agent.set_model("llama-3-70b", provider="ollama")
agent.set_system_instruction(
    "You are a technical assistant. Use the provided flowchart data "
    "to explain system architectures to users."
)
```
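Modality weights are easiest to reason about when normalized so they sum to 1. The helper below is a minimal sketch (the modality keys are illustrative, not a documented Deeptrain schema):

```python
def normalize_modality_weights(weights):
    """Normalize raw modality weights to sum to 1.0, so that
    text=2, image=1, audio=1 means text counts twice as much."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one modality weight must be positive")
    return {modality: w / total for modality, w in weights.items()}

weights = normalize_modality_weights({"text": 2.0, "image": 1.0, "audio": 1.0})
# weights == {"text": 0.5, "image": 0.25, "audio": 0.25}
```

Normalizing up front makes configurations comparable across agents even when authors use different raw scales.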
## Best Practices for Customization
- Data Cleaning: Ensure that visual content (graphs/flowcharts) is high-resolution for better interpretation by the computer vision layer.
- Embedding Optimization: For large text datasets, use a localized embedding database to minimize latency and reduce costs associated with long-context tokens.
- Video Processing: When using the Transcribe API, enable `analyze_visuals` if the video contains on-screen text or demonstrations essential for the agent's task.
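The embedding-optimization advice above usually starts with chunking: splitting long documents into pieces small enough to embed and retrieve precisely. A minimal word-based chunker follows; chunking by words is an illustrative simplification, since production systems typically chunk by tokens.

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping word-based chunks for embedding.
    Overlap preserves context that would otherwise be cut at chunk edges."""
    if max_words <= overlap:
        raise ValueError("max_words must exceed overlap")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A 250-word document yields three overlapping chunks at these settings.
doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc, max_words=100, overlap=20)
```

Smaller chunks reduce the long-context tokens each retrieval injects, which is where the latency and cost savings mentioned above come from.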