Security & Data Privacy

Data Privacy Architecture

Deeptrain is built with a "Privacy-First" approach to multi-modal data integration. Because the platform acts as a bridge between your proprietary data (local files, private databases) and Large Language Models (LLMs), we prioritize data sovereignty and secure transit.

Localized Embedding Database

One of Deeptrain’s core security features is support for localized embedding databases. Unlike cloud-only vector stores, which require uploading your raw data to a third-party provider, Deeptrain lets you keep your embeddings local.

Data Residency: Your raw text, images, and audio metadata remain within your infrastructure.
Real-time Retrieval: Deeptrain queries your local database to enhance AI responses without exposing the entire dataset to the LLM provider.
Vector Isolation: Only relevant chunks of context are sent to the model via the prompt, minimizing data exposure.
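The following minimal, self-contained Python sketch illustrates the retrieval pattern described above. It is not Deeptrain's API. `LocalVectorStore`, `embed`, and `build_prompt` are hypothetical names, and the bag-of-words `embed` function stands in for a real embedding model. The point is that the full corpus stays in local memory and only the top-k chunks are copied into the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalVectorStore:
    """In-process store: raw chunks and their vectors never leave this machine."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(store: LocalVectorStore, question: str, k: int = 2) -> str:
    # Only the k best-matching chunks are copied into the prompt sent to the LLM.
    context = "\n".join(store.top_k(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = LocalVectorStore()
for chunk in [
    "refund policy allows returns within 30 days",
    "office wifi password rotates monthly",
    "shipping takes 5 days",
]:
    store.add(chunk)

prompt = build_prompt(store, "what is the refund policy", k=1)
```

Swapping in a real embedding model and a persistent local store would change retrieval quality but not the privacy property: the prompt still carries only `k` chunks.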

Secure Model Connectivity

Deeptrain supports over 200 private and open-source models. Security for these connections is handled through standard authentication protocols.

API Key Management

When connecting to private model providers (e.g., OpenAI, Anthropic, or self-hosted instances), Deeptrain uses environment-based configuration, so API keys are supplied at runtime instead of being hardcoded in source files. Avoid printing or logging them as well.

# Example configuration for secure model connection
import os

from deeptrain import MultiModalConnector

connector = MultiModalConnector(
    model_name="your-private-model",
    # Read the key from the environment so it never appears in source control.
    api_key=os.getenv("DEEPTRAIN_MODEL_API_KEY"),
    base_url="https://your-private-endpoint.internal",  # For VPC-hosted models
)
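`os.getenv` returns `None` when a variable is unset, so a missing key would only surface later as an authentication error. The sketch below defines two hypothetical helpers, which are not part of Deeptrain. `require_env` fails fast at startup, and `mask_secret` renders a key safely if it ever has to appear in diagnostics.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or fail fast if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Render a secret for diagnostics, showing at most the last `visible` characters."""
    if len(secret) <= visible:
        # Too short to reveal any part safely: mask everything.
        return "*" * len(secret)
    hidden = len(secret) - max(visible, 0)
    return "*" * hidden + secret[hidden:]
```

With these helpers, the connector call above would pass `api_key=require_env("DEEPTRAIN_MODEL_API_KEY")` instead of calling `os.getenv` directly.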

End-to-End Encryption

All data transmitted between Deeptrain and external model providers is encrypted in transit using TLS 1.2 or later. When using self-hosted models, serve your endpoints over HTTPS with TLS 1.2+ as well, so that no hop in the chain falls back to unencrypted traffic.
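For self-hosted endpoints, one way to enforce the TLS 1.2 floor on the client side is Python's standard `ssl` module. This sketch assumes your own code makes the HTTP calls to the endpoint. It does not show how Deeptrain configures its internal transport, and `make_tls_context` and `require_https` are hypothetical helper names.

```python
import ssl
from urllib.parse import urlparse


def make_tls_context() -> ssl.SSLContext:
    # create_default_context() verifies certificates and checks hostnames.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def require_https(base_url: str) -> str:
    # Reject plaintext endpoints so no hop in the chain uses unencrypted HTTP.
    if urlparse(base_url).scheme != "https":
        raise ValueError(f"Refusing non-TLS endpoint: {base_url}")
    return base_url
```

You can pass the context to standard-library calls such as `urllib.request.urlopen(url, context=make_tls_context())`.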

Multi-modal Data Handling

Processing non-textual data like video and audio introduces unique privacy challenges. Deeptrain manages these via strict processing pipelines.

Video and Audio Processing

When using the Transcribe API or local storage connectors:

Local Processing: Files stored on local storage are processed in-place. Deeptrain does not mirror your media files to its own servers unless explicitly configured for cloud-relay.
Third-party Integration: For platforms like YouTube or Vimeo, Deeptrain only fetches the necessary streams for processing and does not store the original media beyond the active session cache.
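You can approximate session-scoped caching with a scratch directory whose lifetime is tied to a `with` block. This sketch rests on several assumptions. `session_cache` is a hypothetical helper, the byte write stands in for fetching a stream, and the code does not describe how Deeptrain implements its own cache.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def session_cache(prefix: str = "deeptrain-session-") -> Iterator[Path]:
    """Yield a scratch directory that is deleted when the session ends.

    tempfile.mkdtemp() creates the directory so only the current user can access it.
    """
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Runs even if processing raised, so fetched media never outlives the session.
        shutil.rmtree(path, ignore_errors=True)


with session_cache() as cache:
    segment = cache / "segment.bin"
    segment.write_bytes(b"\x00" * 1024)  # stand-in for a fetched audio/video stream
    # ... transcribe `segment` here ...
```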

Transcription Privacy

The Transcribe API can be configured to use local inference engines or secure cloud providers. When using the API to convert user video/audio into text for training:

Input Sanitization: Deeptrain recommends stripping PII (Personally Identifiable Information) before passing transcription outputs into the localized embedding database.
Transient State: Media files processed via the API are treated as transient; once the transcription is generated and vectorized, the source media is cleared from the processing buffer.
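Here is a minimal sketch of the sanitization step, applied to a transcript before it is vectorized. The regular expressions are illustrative assumptions that cover only email addresses and North American-style phone numbers. Names, postal addresses, and national ID formats require a dedicated PII detection tool.

```python
import re

# Illustrative patterns only; real PII detection needs broader, locale-aware rules.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("PHONE", re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")),
]


def redact_pii(text: str) -> str:
    # Replace each match with a typed placeholder so context survives without the value.
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


transcript = "Reach me at jane.doe@example.com or 555-123-4567."
clean = redact_pii(transcript)  # pass `clean`, not `transcript`, to the embedding step
```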

Best Practices for Private Deployments

To maximize security when using Deeptrain in a production environment, follow these guidelines:

VPC Isolation: Deploy Deeptrain within a Virtual Private Cloud (VPC) to ensure that the localized embedding database is not accessible via the public internet.
Least Privilege Access: Configure your API keys with the minimum scopes required for transcription or model inference.
Audit Logging: Enable logging for the MultiModalConnector to track which data sources are being accessed by your AI agents, but ensure sensitive content is masked in logs.
Local Inference: For high-security environments, pair Deeptrain with locally hosted open-source models (e.g., Llama 3 via vLLM) to ensure data never leaves your network.
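To follow the audit-logging guideline, you can use Python's standard `logging` module to mask content before any handler writes it. This sketch uses a hypothetical logger name, `deeptrain.audit`, and illustrative masking rules that cover email addresses and `sk-`/`key-`/`tok-` style tokens. It does not reflect how `MultiModalConnector` logs internally.

```python
import logging
import re

# Illustrative masking rules: email addresses and key-like tokens such as "sk-..." values.
SENSITIVE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
    r"|\b(?:sk|key|tok)-[A-Za-z0-9]{8,}\b"
)


class MaskingFilter(logging.Filter):
    """Mask sensitive substrings in a record before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so values passed as arguments are masked too.
        record.msg = SENSITIVE.sub("[REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the record; only its content changes


audit = logging.getLogger("deeptrain.audit")  # hypothetical logger name
audit.addFilter(MaskingFilter())
```

With a handler configured at INFO level, `audit.info("agent read source=%s for user=%s", "crm_db", "jane@example.com")` records the data source while the email address is written as `[REDACTED]`. A filter attached to a logger only sees records logged through that exact logger. To cover child loggers as well, attach the filter to the handler instead.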

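# Example: Setting up a local processing environment
export DEEPTRAIN_DATA_DIR="/mnt/secure/embeddings"
# Placeholder value: supply the real key from a secrets manager rather than typing it into scripts or shell history.
export DEEPTRAIN_ENCRYPTION_KEY="your-system-level-key"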
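Because `DEEPTRAIN_DATA_DIR` holds the embeddings, it is worth checking at startup that other users cannot read the directory. The sketch below is POSIX-only and uses a hypothetical `check_data_dir` helper. The 0o700 policy is an assumed reasonable default, not a documented Deeptrain requirement.

```python
import os
import stat
from pathlib import Path


def check_data_dir(env_var: str = "DEEPTRAIN_DATA_DIR") -> Path:
    """Fail fast if the embeddings directory is missing or accessible to other users."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(raw)
    if not path.is_dir():
        raise RuntimeError(f"{path} does not exist or is not a directory")
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group or "other" permission bit means the directory is not private.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o700 or stricter")
    return path
```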

By leveraging local embeddings and model-agnostic routing, Deeptrain ensures that your multi-modal AI capabilities do not come at the cost of data privacy or compliance.