Local Storage & Self-Hosting
Deeptrain can ingest data directly from your internal infrastructure, which keeps sensitive data on systems you control and reduces reliance on external cloud providers. This section covers how to configure and use local file systems and self-hosted servers as data sources for your AI agents.
Local File System Integration
You can source text, images, audio, and video directly from your local machine or network-attached storage (NAS). This is ideal for processing sensitive datasets or high-resolution media that is too large for frequent cloud uploads.
Sourcing Local Media
To process local files, provide the absolute path to the file or directory. Deeptrain’s loaders automatically detect the file format and apply the necessary pre-processing.
from deeptrain import DataConnector

# Initialize connector
connector = DataConnector()

# Load a local directory of documents for embedding
connector.load_source(
    path="/home/user/data/documents/",
    source_type="local",
    content_type="text"
)

# Load a local video for multi-dimensional processing
video_data = connector.process_video(
    path="./assets/training_video.mp4",
    source_type="local"
)
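The text above asks for an absolute path, while the video example passes a relative one. The following is a minimal sketch, using only the Python standard library, of how a path could be resolved and checked before it is handed to a loader. The helper names `resolve_local_source` and `guess_content_type` are hypothetical and not part of Deeptrain, and the MIME-based guess only illustrates the idea of format detection; it is not how Deeptrain's loaders detect formats.

```python
import mimetypes
import os
from pathlib import Path


def resolve_local_source(path: str) -> str:
    """Return an absolute path, or raise if the path is missing or unreadable."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.exists():
        raise FileNotFoundError(f"No such file or directory: {resolved}")
    if not os.access(resolved, os.R_OK):
        raise PermissionError(f"Path is not readable: {resolved}")
    return str(resolved)


def guess_content_type(path: str) -> str:
    """Map a file name to "text", "image", "audio" or "video" via its MIME type.

    Hypothetical helper: returns "unknown" when the type cannot be guessed.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "unknown"
    major = mime.split("/", 1)[0]
    return major if major in {"text", "image", "audio", "video"} else "unknown"
```

The resolved string can then be passed as the `path` argument shown in the examples above, so a missing file or a permissions problem surfaces before any processing starts.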
Self-Hosted Infrastructure
For organizations using private clouds or self-hosted media servers (e.g., MinIO, private Vimeo instances, or custom HTTP servers), Deeptrain supports custom endpoint configurations.
Connecting to Private Servers
When using self-hosted platforms, you must specify the endpoint URL and any required authentication credentials.
# Sourcing from a self-hosted video server
connector.load_source(
    path="https://media.internal.company.com/v/12345",
    source_type="self_hosted",
    auth_token="YOUR_INTERNAL_API_KEY"
)
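Hard-coding a token in source code is easy to leak. As a hedged sketch, the arguments for the call above could be assembled from an environment variable instead. The function `self_hosted_source_args` and the variable name `INTERNAL_MEDIA_TOKEN` are assumptions made for illustration; only the keyword names `path`, `source_type`, and `auth_token` come from the example above.

```python
import os
from urllib.parse import urlparse


def self_hosted_source_args(url: str, token_var: str = "INTERNAL_MEDIA_TOKEN") -> dict:
    """Build keyword arguments for a self-hosted source.

    `token_var` names a hypothetical environment variable holding the token.
    """
    parsed = urlparse(url)
    # Reject anything that is not a plain http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an http(s) URL, got: {url!r}")
    token = os.environ.get(token_var)
    if not token:
        raise RuntimeError(f"Environment variable {token_var} is not set")
    return {"path": url, "source_type": "self_hosted", "auth_token": token}
```

With the variable set, the result could be unpacked into the call, for example `connector.load_source(**self_hosted_source_args(url))`.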
Transcribe API (Local & Self-Hosted)
The Transcribe API is the primary interface for converting audio and video content from local or self-hosted sources into LLM-readable formats. It extracts dialogue and metadata for use in AI training.
API Specification
Method: connector.transcribe()
| Parameter | Type | Description |
| :--- | :--- | :--- |
| input_path | string | The local file path or self-hosted URL. |
| source | string | One of: "local", "self_hosted", "vimeo", "youtube". |
| language | string | (Optional) The language code (e.g., "en"). Defaults to auto-detect. |
| output_format | string | The desired format: "json", "text", or "embedding". |
Example Usage:
# Transcribing a local audio file for agent memory enhancement
transcription = connector.transcribe(
    input_path="/data/audio/meeting_record.wav",
    source="local",
    output_format="text"
)
print(transcription)
# Output: "The meeting discussion centered around multi-modal integration..."
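The parameter table lists a fixed set of values for `source` and `output_format`. A small validation step, sketched below under stated assumptions, can catch a typo before a long transcription job starts. The helper `transcribe_args` is hypothetical; the allowed values are copied from the table, and the rule that a `"local"` source must not be an http(s) URL is an assumption, not documented Deeptrain behavior.

```python
from typing import Optional
from urllib.parse import urlparse

# Allowed values as listed in the parameter table above.
ALLOWED_SOURCES = {"local", "self_hosted", "vimeo", "youtube"}
ALLOWED_FORMATS = {"json", "text", "embedding"}


def transcribe_args(
    input_path: str,
    source: str,
    output_format: str,
    language: Optional[str] = None,
) -> dict:
    """Validate arguments and return them as a keyword dictionary."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"Unknown source: {source!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unknown output_format: {output_format!r}")
    # Assumption: a "local" source should be a file path, not a URL.
    if source == "local" and urlparse(input_path).scheme in ("http", "https"):
        raise ValueError("source='local' was given a URL")
    args = {"input_path": input_path, "source": source, "output_format": output_format}
    if language is not None:
        # Omitting the key leaves the documented auto-detect default in effect.
        args["language"] = language
    return args
```

The returned dictionary mirrors the keyword names in the table, so it could be unpacked into `connector.transcribe(**args)`.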
Configuration Settings
To streamline local operations, you can define global paths and environment variables for your self-hosted infrastructure.
| Environment Variable | Description |
| :--- | :--- |
| DEEPTRAIN_LOCAL_ROOT | Sets the default base directory for all local path lookups. |
| DEEPTRAIN_SELF_HOSTED_ENDPOINT | The default URL for your internal media server. |
| DEEPTRAIN_VERIFY_SSL | Set to False to disable SSL certificate verification, for example when internal servers use self-signed certificates. Leave verification enabled on untrusted networks. |
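Environment variables are always strings, so a value such as `False` has to be parsed before it can act as a boolean. The sketch below shows one way an application might read the three variables into a typed object. The `DeeptrainEnv` class, the `read_env` function, the accepted false-like spellings, and the default of leaving verification on are all assumptions made for illustration; how Deeptrain itself parses these variables is not specified here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Spellings treated as "off" (an assumption, compared case-insensitively).
_FALSE_VALUES = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class DeeptrainEnv:
    """Hypothetical container for the settings in the table above."""
    local_root: Optional[str]
    self_hosted_endpoint: Optional[str]
    verify_ssl: bool


def read_env(env: Mapping[str, str] = os.environ) -> DeeptrainEnv:
    """Read the settings from a mapping; verification defaults to enabled."""
    raw_verify = env.get("DEEPTRAIN_VERIFY_SSL", "true")
    return DeeptrainEnv(
        local_root=env.get("DEEPTRAIN_LOCAL_ROOT"),
        self_hosted_endpoint=env.get("DEEPTRAIN_SELF_HOSTED_ENDPOINT"),
        verify_ssl=raw_verify.strip().lower() not in _FALSE_VALUES,
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the function easy to test with a plain dictionary.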
Note: When using local storage, ensure the environment running Deeptrain has read permissions for the specified directories. For self-hosted video processing, the server must support byte-range requests for efficient seeking and processing.
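Whether a server honors byte-range requests can be checked with a one-byte ranged request: a `206 Partial Content` response carrying a `Content-Range` header, or an `Accept-Ranges: bytes` header, indicates support. The following is a rough sketch using only the standard library. Both function names are hypothetical, and the probe sends no authentication, so a protected server would need the appropriate header added.

```python
import urllib.request
from typing import Mapping


def indicates_range_support(status: int, headers: Mapping[str, str]) -> bool:
    """Interpret the response to a request that carried 'Range: bytes=0-0'."""
    normalized = {key.lower(): value for key, value in headers.items()}
    # A partial response with Content-Range means the range was honored.
    if status == 206 and "content-range" in normalized:
        return True
    # Otherwise fall back to the server's advertised capability.
    return normalized.get("accept-ranges", "").strip().lower() == "bytes"


def probe_byte_ranges(url: str, timeout: float = 10.0) -> bool:
    """Request the first byte of `url` and report whether ranges appear supported."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return indicates_range_support(response.status, dict(response.headers.items()))
```

A server that ignores the `Range` header typically answers `200 OK` with the full body and no `Accept-Ranges: bytes` header, in which case the probe returns `False`.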