Diagram Analysis

Overview of Diagram Analysis

Deeptrain’s diagram analysis engine bridges the gap between static visual assets and the logical reasoning required by LLMs. While traditional OCR focuses on text extraction, Deeptrain parses the structural relationships, decision nodes, and directional flows within technical diagrams. This allows AI agents to "understand" system architectures, business workflows, and complex data relations as actionable logic rather than just pixels.

Supported Visual Assets

The platform is optimized for a variety of technical and structural visuals:

Flowcharts: Mapping decision paths, loops, and terminal points.
System Architectures: Identifying components (databases, servers, APIs) and their interconnections.
UML & Sequence Diagrams: Extracting temporal logic and interaction orders between entities.
Graphs and Charts: Converting visual data points into structured numerical formats.

Logic Extraction Interface

To process a diagram, you send it to the multi-modal ingestion endpoint. Deeptrain converts the visual input into a structured schema (typically JSON or Markdown) that can be placed directly in an LLM's context window.

Usage Example

The following example demonstrates how to send a visual flowchart to Deeptrain and retrieve a logical mapping for use in an AI agent's prompt.

from deeptrain import MultiModalConnector

# Initialize the connector
dt = MultiModalConnector(api_key="your_api_key")

# Analyze a technical flowchart
analysis_result = dt.diagram.analyze(
    source="./assets/system_workflow.png",
    output_format="structured_logic",
    detail_level="high"
)

# The result can now be passed directly to an LLM
print(analysis_result.logic_summary)

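Input Parameters

The usage example above passes three parameters to dt.diagram.analyze:

source: The diagram to analyze. The example passes a local image path ("./assets/system_workflow.png").
output_format: The schema of the returned result. The example uses "structured_logic", which is described under Output Structure.
detail_level: The level of detail requested for the analysis. The example uses "high".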

Integrating with AI Agents

Once logic is extracted, it acts as "Visual Context." This allows you to build agents that can perform the following:

Workflow Validation: Ask an agent, "Does this flowchart have any circular logic?"
Code Generation: Provide a diagram and ask the agent to "Generate Python boilerplate for this architecture."
Troubleshooting: Upload a system diagram and a log file, then ask the agent to "Identify which component in the diagram is likely failing based on these logs."
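
As a rough illustration of how extracted logic might be supplied as Visual Context, the sketch below assembles a plain-text prompt from a summary string, an optional extra input such as log lines, and a question. The function name `build_agent_prompt` and the section headings are hypothetical and not part of Deeptrain. The summary string stands in for whatever text the analysis returns.

```python
def build_agent_prompt(visual_context: str, question: str, extra_context: str = "") -> str:
    """Combine extracted diagram logic with a question into a single prompt.

    Hypothetical helper for illustration only; not a Deeptrain API.
    """
    sections = [
        "You are given the extracted logic of a technical diagram.",
        "### Visual Context\n" + visual_context.strip(),
    ]
    # Optional material such as log excerpts for troubleshooting.
    if extra_context.strip():
        sections.append("### Additional Context\n" + extra_context.strip())
    sections.append("### Task\n" + question.strip())
    return "\n\n".join(sections)


prompt = build_agent_prompt(
    visual_context="A user authentication workflow where credentials lead to a decision node...",
    question="Does this flowchart have any circular logic?",
)
print(prompt)
```

The same helper covers the troubleshooting case by passing log lines as `extra_context`. The resulting string can be sent to whichever LLM client you already use.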

Output Structure

When using the structured_logic format, Deeptrain returns a standardized object representing the diagram's flow:

{
  "nodes": [
    {"id": "node_1", "label": "User Login", "type": "process"},
    {"id": "node_2", "label": "Authenticated?", "type": "decision"}
  ],
  "edges": [
    {"from": "node_1", "to": "node_2", "label": "submit"},
    {"from": "node_2", "to": "node_3", "label": "Yes"},
    {"from": "node_2", "to": "node_4", "label": "No"}
  ],
  "semantic_summary": "A user authentication workflow where credentials lead to a decision node..."
}
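
Because the output is a plain node and edge list, some checks can run locally before any prompt is sent. The sketch below assumes only the field names shown above (`nodes`, `edges`, `id`, `from`, `to`). The helpers `undefined_targets` and `has_cycle` are illustrative and not part of Deeptrain. The sample reuses the object above with the summary omitted. Its edges point to `node_3` and `node_4`, which the abbreviated `nodes` list does not define, so the first check reports them.

```python
import json

diagram = json.loads("""
{
  "nodes": [
    {"id": "node_1", "label": "User Login", "type": "process"},
    {"id": "node_2", "label": "Authenticated?", "type": "decision"}
  ],
  "edges": [
    {"from": "node_1", "to": "node_2", "label": "submit"},
    {"from": "node_2", "to": "node_3", "label": "Yes"},
    {"from": "node_2", "to": "node_4", "label": "No"}
  ]
}
""")


def undefined_targets(diagram: dict) -> list:
    """Return ids used by edges that have no matching entry in "nodes"."""
    known = {node["id"] for node in diagram["nodes"]}
    referenced = {edge[key] for edge in diagram["edges"] for key in ("from", "to")}
    return sorted(referenced - known)


def has_cycle(diagram: dict) -> bool:
    """Detect a directed cycle with a depth-first search over the edges."""
    adjacency = {}
    for edge in diagram["edges"]:
        adjacency.setdefault(edge["from"], []).append(edge["to"])

    IN_PROGRESS, DONE = 1, 2
    state = {}  # nodes absent from this dict have not been visited yet

    def visit(node) -> bool:
        state[node] = IN_PROGRESS
        for nxt in adjacency.get(node, []):
            if state.get(nxt) == IN_PROGRESS:
                return True  # back edge: nxt is on the current path
            if nxt not in state and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in list(adjacency))


print(undefined_targets(diagram))  # ['node_3', 'node_4']
print(has_cycle(diagram))          # False
```

A deterministic check like this can complement the agent-based Workflow Validation use case, since it does not depend on model output.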

This structured representation helps keep prompts within a model's context limits: it conveys the diagram's logic as compact, high-density text rather than raw image data.
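
To make the density point concrete, the sketch below flattens the node and edge list into one line per edge, a form that is often shorter than the JSON itself. The helper `render_compact` and the arrow notation are assumptions chosen for illustration, not a Deeptrain output format. Character count is only a rough proxy for token usage.

```python
def render_compact(diagram: dict) -> str:
    """Render each edge as 'source --[label]--> target' using node labels."""
    labels = {node["id"]: node["label"] for node in diagram.get("nodes", [])}
    lines = []
    for edge in diagram.get("edges", []):
        # Fall back to the raw id when a node is not defined in "nodes".
        src = labels.get(edge["from"], edge["from"])
        dst = labels.get(edge["to"], edge["to"])
        label = edge.get("label")
        arrow = f" --[{label}]--> " if label else " --> "
        lines.append(f"{src}{arrow}{dst}")
    return "\n".join(lines)


diagram = {
    "nodes": [
        {"id": "node_1", "label": "User Login", "type": "process"},
        {"id": "node_2", "label": "Authenticated?", "type": "decision"},
    ],
    "edges": [
        {"from": "node_1", "to": "node_2", "label": "submit"},
        {"from": "node_2", "to": "node_3", "label": "Yes"},
        {"from": "node_2", "to": "node_4", "label": "No"},
    ],
}

compact = render_compact(diagram)
print(compact)
print(len(compact), "characters")
```

Note that this rendering drops the node `type` field, so it suits prompts where only the flow matters.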