# Transcribe API Reference
The Transcribe API is a core component of the Deeptrain platform, designed to convert multi-modal audio and video inputs into machine-readable text. This enables LLMs and AI agents to process content from sources that are traditionally inaccessible to text-based models.
## Endpoint Overview
| Method | Endpoint | Description |
| :--- | :--- | :--- |
| `POST` | `/api/v1/transcribe` | Transcribes audio or video content from a URL or a local file upload. |
## Request Parameters
The API accepts a `multipart/form-data` or `application/json` request body with the following parameters:
| Parameter | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| `source` | string | Yes | The source of the media: a public URL (YouTube, Vimeo), a self-hosted link, or a path to a local file. |
| `type` | string | Yes | The type of media being processed. Accepted values: `video` or `audio`. |
| `model_hint` | string | No | Optional hint naming a preferred transcription engine or target LLM context (e.g., `whisper-1`, `nova-2`). Defaults to optimized multi-modal processing. |
| `language` | string | No | ISO 639-1 language code (e.g., `en`, `es`). If omitted, the language is auto-detected. |
| `timestamps` | boolean | No | If `true`, the response includes word-level or segment-level timestamps. Defaults to `false`. |
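The required and optional parameters above can be validated client-side before a request is sent. The helper below is a local sketch, not part of any official Deeptrain SDK; it assembles a JSON body and rejects unsupported `type` values:

```python
# Client-side sketch of the parameter rules in the table above.
# build_request_body is a hypothetical local helper, not an official API.
VALID_TYPES = {"video", "audio"}

def build_request_body(source, media_type, language=None,
                       timestamps=False, model_hint=None):
    """Assemble the JSON request body, enforcing the required fields."""
    if media_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    body = {"source": source, "type": media_type, "timestamps": timestamps}
    if language:
        body["language"] = language
    if model_hint:
        body["model_hint"] = model_hint
    return body
```

The resulting dictionary can be passed directly as the `json=` argument of a `requests.post` call, as shown in the usage examples below.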
## Supported Sources
The Transcribe API is built to be flexible across different hosting environments:
- Streaming Platforms: YouTube, Vimeo.
- Cloud Storage: AWS S3, Google Cloud Storage, or direct public URLs.
- Local Storage: Files uploaded directly to the Deeptrain buffer.
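For local files, the request is sent as `multipart/form-data` rather than JSON. The sketch below uses the `requests` library; the form field name `file` is an assumption and should be checked against your account's documentation:

```python
def build_form_fields(media_type, language=None, timestamps=False):
    """Non-file form fields for a multipart upload; values must be strings."""
    fields = {"type": media_type}
    if language:
        fields["language"] = language
    if timestamps:
        fields["timestamps"] = "true"  # form values are sent as strings
    return fields

def transcribe_local_file(path, api_key, media_type="audio", **opts):
    """POST a local media file as multipart/form-data.

    The form field name "file" is an assumption, not confirmed by the
    reference above; verify it before relying on this sketch.
    """
    import requests  # deferred so the pure helper above has no dependency

    headers = {"Authorization": f"Bearer {api_key}"}
    with open(path, "rb") as fh:
        resp = requests.post(
            "https://api.deeptrain.ai/v1/transcribe",
            headers=headers,
            data=build_form_fields(media_type, **opts),
            files={"file": fh},
        )
    resp.raise_for_status()
    return resp.json()
```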
## Usage Examples
### Python (Requests)
```python
import requests

url = "https://api.deeptrain.ai/v1/transcribe"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {
    "source": "https://www.youtube.com/watch?v=example",
    "type": "video",
    "timestamps": True,  # request segment-level timestamps
}

response = requests.post(url, json=data, headers=headers)
print(response.json())
```
### cURL
```bash
curl -X POST https://api.deeptrain.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://vimeo.com/123456789",
    "type": "video",
    "language": "en"
  }'
```
## Response Schema
Upon a successful request, the API returns a JSON object containing the transcribed text and associated metadata.
### Success Response (200 OK)
| Field | Type | Description |
| :--- | :--- | :--- |
| `id` | string | Unique identifier for the transcription job. |
| `status` | string | Current state (e.g., `completed`, `processing`). |
| `transcript` | string | The full text extracted from the media. |
| `segments` | array | (Optional) List of objects with `start`, `end`, and `text`; present only when `timestamps` was set to `true`. |
| `metadata` | object | Details about the source media (duration, format, resolution). |
### Example Response Body
```json
{
  "id": "tx_8829104",
  "status": "completed",
  "transcript": "Welcome to the Deeptrain multi-modal tutorial. In this video, we will cover...",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to the Deeptrain multi-modal tutorial."
    }
  ],
  "metadata": {
    "duration": 124.5,
    "source_type": "youtube",
    "language_detected": "en"
  }
}
```
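A client consuming a response like the one above might flatten `segments` into readable, timestamped lines, falling back to the full `transcript` when timestamps were not requested. A minimal sketch:

```python
def segments_to_lines(response):
    """Render timestamped segments as "[start-end] text" lines.

    Falls back to the full transcript when no segments are present
    (i.e. the request did not set timestamps=true).
    """
    segments = response.get("segments") or []
    if not segments:
        return [response.get("transcript", "")]
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
        for seg in segments
    ]

example = {
    "transcript": "Welcome to the Deeptrain multi-modal tutorial.",
    "segments": [
        {"start": 0.0, "end": 5.2,
         "text": "Welcome to the Deeptrain multi-modal tutorial."},
    ],
}
print(segments_to_lines(example)[0])
# [0.0-5.2] Welcome to the Deeptrain multi-modal tutorial.
```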
## Error Codes
| Code | Description |
| :--- | :--- |
| `400 Bad Request` | Missing required parameters or unsupported file format. |
| `401 Unauthorized` | Invalid or missing API key. |
| `404 Not Found` | The provided `source` URL could not be reached or the file does not exist. |
| `422 Unprocessable Entity` | The media file is corrupted or its duration exceeds the account limit. |
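A client can map these codes to actionable hints before retrying or surfacing an error. A small local sketch (the hint text is illustrative, not official):

```python
def explain_error(status_code):
    """Map Transcribe API error codes to short, actionable hints.

    Hypothetical helper; the hint wording is paraphrased from the
    error table above, not returned by the API itself.
    """
    hints = {
        400: "Check required parameters (source, type) and the file format.",
        401: "Verify the Authorization header and API key.",
        404: "The source URL is unreachable or the file does not exist.",
        422: "The media may be corrupted or exceed the account duration limit.",
    }
    return hints.get(
        status_code,
        f"Unexpected status {status_code}; inspect the response body.",
    )
```

Used after a request, e.g. `if not response.ok: print(explain_error(response.status_code))`.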
## Internal Notes
While the Transcribe API orchestrates transcription, the underlying processing is handled by the internal `MediaProcessor` class, which keeps the output model-agnostic by mapping it to the specific input requirements of the 200+ supported LLMs. Users do not need to interact with `MediaProcessor` directly.