# Transcribe API Reference
The Transcribe API is a core component of the Deeptrain platform, designed to convert multi-modal audio and video inputs into machine-readable text. This enables LLMs and AI agents to process content from sources that are traditionally inaccessible to text-based models.
## Endpoint Overview
| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `POST` | `/api/v1/transcribe` | Transcribes audio or video content from a URL or a local file upload. |
## Request Parameters
The API accepts a `multipart/form-data` or `application/json` request body with the following parameters:
| Parameter | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| `source` | string | Yes | The source of the media: a public URL (YouTube, Vimeo), a self-hosted link, or a path to a local file. |
| `type` | string | Yes | The type of media being processed. Accepted values: `video` or `audio`. |
| `model_hint` | string | No | Optional hint naming a preferred transcription engine or target LLM context (e.g., `whisper-1`, `nova-2`). Defaults to optimized multi-modal processing. |
| `language` | string | No | ISO 639-1 language code (e.g., `en`, `es`). If omitted, the language is auto-detected. |
| `timestamps` | boolean | No | If `true`, the response includes word-level or segment-level timestamps. Defaults to `false`. |
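The required and optional parameters above can be validated client-side before a request is sent. The helper below is a local sketch, not part of any official Deeptrain SDK; it assembles a JSON body and rejects unsupported `type` values:

```python
# Client-side sketch of the parameter rules in the table above.
# build_request_body is a hypothetical local helper, not an official API.
VALID_TYPES = {"video", "audio"}

def build_request_body(source, media_type, language=None,
                       timestamps=False, model_hint=None):
    """Assemble the JSON request body, enforcing the required fields."""
    if media_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    body = {"source": source, "type": media_type, "timestamps": timestamps}
    if language:
        body["language"] = language
    if model_hint:
        body["model_hint"] = model_hint
    return body
```

The resulting dictionary can be passed directly as the `json=` argument of a `requests.post` call, as shown in the usage examples below.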
## Supported Sources
The Transcribe API is built to be flexible across different hosting environments:
- Streaming Platforms: YouTube, Vimeo.
- Cloud Storage: AWS S3, Google Cloud Storage, or direct public URLs.
- Local Storage: Files uploaded directly to the Deeptrain buffer.
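For local files, the request is sent as `multipart/form-data` rather than JSON. The sketch below uses the `requests` library; the form field name `file` is an assumption and should be checked against your account's documentation:

```python
def build_form_fields(media_type, language=None, timestamps=False):
    """Non-file form fields for a multipart upload; values must be strings."""
    fields = {"type": media_type}
    if language:
        fields["language"] = language
    if timestamps:
        fields["timestamps"] = "true"  # form values are sent as strings
    return fields

def transcribe_local_file(path, api_key, media_type="audio", **opts):
    """POST a local media file as multipart/form-data.

    The form field name "file" is an assumption, not confirmed by the
    reference above; verify it before relying on this sketch.
    """
    import requests  # deferred so the pure helper above has no dependency

    headers = {"Authorization": f"Bearer {api_key}"}
    with open(path, "rb") as fh:
        resp = requests.post(
            "https://api.deeptrain.ai/v1/transcribe",
            headers=headers,
            data=build_form_fields(media_type, **opts),
            files={"file": fh},
        )
    resp.raise_for_status()
    return resp.json()
```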
## Usage Examples
### Python (Requests)
```python
import requests

url = "https://api.deeptrain.ai/v1/transcribe"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {
    "source": "https://www.youtube.com/watch?v=example",
    "type": "video",
    "timestamps": True,  # request segment-level timestamps
}

response = requests.post(url, json=data, headers=headers)
print(response.json())
```
### cURL
```bash
curl -X POST https://api.deeptrain.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://vimeo.com/123456789",
    "type": "video",
    "language": "en"
  }'
```
## Response Schema
Upon a successful request, the API returns a JSON object containing the transcribed text and associated metadata.
### Success Response (200 OK)
| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | string | Unique identifier for the transcription job. |
| `status` | string | Current state (e.g., `completed`, `processing`). |
| `transcript` | string | The full text extracted from the media. |
| `segments` | array | (Optional) List of objects with `start`, `end`, and `text`; present only when `timestamps` was set to `true`. |
| `metadata` | object | Details about the source media (duration, format, resolution). |
### Example Response Body
```json
{
  "id": "tx_8829104",
  "status": "completed",
  "transcript": "Welcome to the Deeptrain multi-modal tutorial. In this video, we will cover...",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to the Deeptrain multi-modal tutorial."
    }
  ],
  "metadata": {
    "duration": 124.5,
    "source_type": "youtube",
    "language_detected": "en"
  }
}
```
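A client consuming a response like the one above might flatten `segments` into readable, timestamped lines, falling back to the full `transcript` when timestamps were not requested. A minimal sketch:

```python
def segments_to_lines(response):
    """Render timestamped segments as "[start-end] text" lines.

    Falls back to the full transcript when no segments are present
    (i.e. the request did not set timestamps=true).
    """
    segments = response.get("segments") or []
    if not segments:
        return [response.get("transcript", "")]
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
        for seg in segments
    ]

example = {
    "transcript": "Welcome to the Deeptrain multi-modal tutorial.",
    "segments": [
        {"start": 0.0, "end": 5.2,
         "text": "Welcome to the Deeptrain multi-modal tutorial."},
    ],
}
print(segments_to_lines(example)[0])
# [0.0-5.2] Welcome to the Deeptrain multi-modal tutorial.
```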
## Error Codes
| Code | Description |
| :--- | :--- |
| `400 Bad Request` | Missing required parameters or unsupported file format. |
| `401 Unauthorized` | Invalid or missing API key. |
| `404 Not Found` | The provided `source` URL could not be reached or the file does not exist. |
| `422 Unprocessable Entity` | The media file is corrupted or its duration exceeds the account limit. |
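A client can map these codes to actionable hints before retrying or surfacing an error. A small local sketch (the hint text is illustrative, not official):

```python
def explain_error(status_code):
    """Map Transcribe API error codes to short, actionable hints.

    Hypothetical helper; the hint wording is paraphrased from the
    error table above, not returned by the API itself.
    """
    hints = {
        400: "Check required parameters (source, type) and the file format.",
        401: "Verify the Authorization header and API key.",
        404: "The source URL is unreachable or the file does not exist.",
        422: "The media may be corrupted or exceed the account duration limit.",
    }
    return hints.get(
        status_code,
        f"Unexpected status {status_code}; inspect the response body.",
    )
```

Used after a request, e.g. `if not response.ok: print(explain_error(response.status_code))`.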
## Internal Notes
While the Transcribe API orchestrates transcription, the underlying processing is handled by the internal `MediaProcessor` class, which keeps the output model-agnostic by mapping it to the specific input requirements of the 200+ supported LLMs. Users do not need to interact with `MediaProcessor` directly.