# Transcribe API

## Overview
The Transcribe API is Deeptrain's core gateway for converting raw multi-modal content into machine-readable text. It allows you to ingest audio and video from various sources—including local files, self-hosted servers, and external platforms like YouTube or Vimeo—and transform them into structured training data for your LLMs and AI agents.
By leveraging this API, you can bridge the gap between non-textual media and your localized embedding database, enabling your models to "listen" to and "watch" content to improve their contextual awareness.
## API Specification

**Endpoint:** `POST /v1/transcribe`
The Transcribe API accepts either a direct file upload or a source URL. It processes the media asynchronously or synchronously depending on your configuration and returns the transcribed text along with metadata.
### Request Parameters
| Parameter | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| `source_url` | String | No* | The URL of the video or audio file (YouTube, Vimeo, S3, etc.). |
| `file` | Multipart | No* | The local binary file to be uploaded. |
| `language` | String | No | ISO 639-1 code (e.g., `en`, `fr`). Defaults to auto-detection. |
| `output_format` | String | No | The desired output structure (`text`, `json`, `srt`). Defaults to `text`. |
| `training_sync` | Boolean | No | If `true`, the result is automatically indexed into your Deeptrain embedding database. |
\*Either `source_url` or `file` must be provided.
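The either/or rule on `source_url` and `file` can be enforced client-side before any request is sent. The sketch below is illustrative; the helper name is not part of the API or SDK:

```python
def build_transcribe_payload(source_url=None, file=None, language=None,
                             output_format="text", training_sync=False):
    """Assemble a request payload for POST /v1/transcribe.

    Exactly one media source -- source_url or file -- must be given,
    mirroring the API's own validation rule.
    """
    if (source_url is None) == (file is None):
        raise ValueError("Provide exactly one of source_url or file")
    payload = {"output_format": output_format, "training_sync": training_sync}
    if language is not None:
        payload["language"] = language  # ISO 639-1 code, e.g. "en"
    if source_url is not None:
        payload["source_url"] = source_url
    else:
        payload["file"] = file
    return payload
```

Failing fast locally avoids a round trip that the server would reject anyway.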
### Supported Formats

- **Audio:** MP3, WAV, M4A, FLAC, AAC
- **Video:** MP4, MOV, AVI, MKV, WEBM
- **External platforms:** YouTube, Vimeo, and direct links to self-hosted media files
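A simple pre-upload check against the supported file extensions can catch unsupported media before it leaves your machine. This helper is illustrative, not part of the SDK:

```python
# File extensions accepted per the Supported Formats list above.
SUPPORTED_EXTENSIONS = {
    # Audio
    ".mp3", ".wav", ".m4a", ".flac", ".aac",
    # Video
    ".mp4", ".mov", ".avi", ".mkv", ".webm",
}

def is_supported_media(filename):
    """Return True if the filename's extension is in the supported set."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in SUPPORTED_EXTENSIONS
```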
## Usage Examples

### Transcribing a YouTube Video for Training
To ingest a public video and automatically sync it to your AI agent's knowledge base, use the `source_url` parameter and set `training_sync` to `true`.
```shell
curl -X POST https://api.deeptrain.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://www.youtube.com/watch?v=example",
    "language": "en",
    "training_sync": true
  }'
```
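The same request can be built from plain Python with the standard library, with no SDK required. The sketch below only constructs the request (it does not send it); the endpoint and bearer-token scheme mirror the curl example, and `YOUR_API_KEY` is a placeholder:

```python
import json
import urllib.request

body = {
    "source_url": "https://www.youtube.com/watch?v=example",
    "language": "en",
    "training_sync": True,
}
req = urllib.request.Request(
    "https://api.deeptrain.ai/v1/transcribe",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send
```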
### Transcribing a Local Audio File (Python SDK)
If you are using the Deeptrain Python client, you can pass a local file path directly.
```python
from deeptrain import DeeptrainClient

client = DeeptrainClient(api_key="YOUR_API_KEY")

result = client.transcribe(
    file_path="./meeting_notes.mp3",
    language="en",
    training_sync=False,
)
print(f"Transcription: {result['text']}")
```
## Response Schema
Upon a successful request, the API returns a JSON object containing the transcribed content and processing metadata.
```json
{
  "id": "trans_8472910",
  "status": "completed",
  "data": {
    "text": "Welcome to the Deeptrain multi-modal tutorial...",
    "duration": 124.5,
    "language": "en",
    "word_count": 450
  },
  "training_status": {
    "synced": true,
    "vector_id": "vec_001923"
  }
}
```
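A response shaped like the schema above can be unpacked defensively before use. The field names follow the documented schema; the helper itself is an illustrative sketch, not an SDK function:

```python
def summarize_transcription(resp):
    """Extract the key fields from a /v1/transcribe response dict."""
    if resp.get("status") != "completed":
        raise RuntimeError(
            f"Transcription {resp.get('id')} not finished: {resp.get('status')}"
        )
    data = resp["data"]
    training = resp.get("training_status", {})
    return {
        "id": resp["id"],
        "text": data["text"],
        "duration_sec": data["duration"],
        "language": data["language"],
        # vector_id is only meaningful when the result was synced
        "vector_id": training.get("vector_id") if training.get("synced") else None,
    }
```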
## Best Practices
- **Batching:** For large video libraries, use the asynchronous processing flag to avoid connection timeouts.
- **Contextual accuracy:** For technical content (e.g., coding tutorials), set the `language` parameter explicitly to improve transcription accuracy for domain-specific terminology.
- **Data privacy:** When using local files, Deeptrain processes the data through secure, encrypted buffers before indexing it into your localized embedding database.
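When processing asynchronously, the client needs to poll for completion. The section above does not specify a status endpoint, so the fetcher is injected as a callable in this sketch; everything here is illustrative:

```python
import time

def wait_for_transcription(fetch_status, job_id, poll_interval=5.0, timeout=600.0):
    """Poll an asynchronous transcription job until it finishes.

    fetch_status(job_id) is any callable returning the job's current
    response dict (e.g. a GET on a job-status endpoint of your choosing).
    Returns the final response, or raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = fetch_status(job_id)
        if resp.get("status") in ("completed", "failed"):
            return resp
        time.sleep(poll_interval)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
```

Injecting the fetcher keeps the polling loop testable and independent of any particular HTTP client.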