Diagram Analysis
Overview of Diagram Analysis
Deeptrain’s diagram analysis engine bridges the gap between static visual assets and the logical reasoning required by LLMs. While traditional OCR focuses on text extraction, Deeptrain parses the structural relationships, decision nodes, and directional flows within technical diagrams. This allows AI agents to "understand" system architectures, business workflows, and complex data relations as actionable logic rather than just pixels.
Supported Visual Assets
The platform is optimized for a variety of technical and structural visuals:
- Flowcharts: Mapping decision paths, loops, and terminal points.
- System Architectures: Identifying components (databases, servers, APIs) and their interconnections.
- UML & Sequence Diagrams: Extracting temporal logic and interaction orders between entities.
- Graphs and Charts: Converting visual data points into structured numerical formats.
Logic Extraction Interface
To process a diagram, send it to the multi-modal ingestion endpoint. Deeptrain converts the visual input into a structured schema, typically JSON or Markdown, that can be placed directly in an LLM's context window.
Usage Example
The following example sends a flowchart image to Deeptrain and retrieves a logical mapping that can be used in an AI agent's prompt.
```python
from deeptrain import MultiModalConnector

# Initialize the connector
dt = MultiModalConnector(api_key="your_api_key")

# Analyze a technical flowchart
analysis_result = dt.diagram.analyze(
    source="./assets/system_workflow.png",
    output_format="structured_logic",
    detail_level="high"
)

# The result can now be passed directly to an LLM
print(analysis_result.logic_summary)
```
Input Parameters
| Parameter | Type | Description |
| :--- | :--- | :--- |
| source | String / File | Path to the image file (PNG, JPEG, SVG) or a direct URL. |
| output_format | String | Determines the structure of the returned data: json, markdown_list, or graphviz. The examples on this page also use structured_logic. |
| detail_level | String | low (summary) or high (full node-by-node extraction). |
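
The parameter table can be turned into a small local check that runs before any request is made. The sketch below is illustrative only: `check_analyze_args` is a hypothetical helper, not part of the Deeptrain SDK, and the allowed values are copied from the table and the usage example on this page, so they are assumptions about what the service accepts. Treating `.jpg` as a JPEG extension is also an assumption.

```python
from pathlib import PurePath

# Values copied from the parameter table and the usage example on this page.
# They are assumptions about what the service accepts, not a confirmed list.
ALLOWED_OUTPUT_FORMATS = {"structured_logic", "json", "markdown_list", "graphviz"}
ALLOWED_DETAIL_LEVELS = {"low", "high"}
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".svg"}


def check_analyze_args(source, output_format, detail_level):
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        problems.append(f"unknown output_format: {output_format!r}")
    if detail_level not in ALLOWED_DETAIL_LEVELS:
        problems.append(f"unknown detail_level: {detail_level!r}")

    # URLs are passed through; only local paths are checked by extension.
    is_url = source.startswith(("http://", "https://"))
    if not is_url:
        suffix = PurePath(source).suffix.lower()
        if suffix not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported file type: {suffix or '(none)'}")
    return problems
```

Calling `check_analyze_args("./assets/system_workflow.png", "structured_logic", "high")` returns an empty list, so the arguments from the usage example pass this check.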
Integrating with AI Agents
Once logic is extracted, it acts as "Visual Context." This allows you to build agents that can perform the following:
- Workflow Validation: Ask an agent, "Does this flowchart have any circular logic?"
- Code Generation: Provide a diagram and ask the agent to "Generate Python boilerplate for this architecture."
- Troubleshooting: Upload a system diagram and a log file, then ask the agent to "Identify which component in the diagram is likely failing based on these logs."
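Some of these checks can also be run deterministically before an agent is involved. As a minimal sketch of the workflow validation case, the function below looks for circular logic with a depth-first search. It assumes the extracted result is available as a plain dict with the `edges` shape shown under Output Structure; `find_cycle` is a hypothetical helper, not a Deeptrain function.

```python
def find_cycle(diagram):
    """Return one cycle as a list of node ids (first id repeated last), or None."""
    # Build an adjacency list from the "edges" array.
    graph = {}
    for edge in diagram.get("edges", []):
        graph.setdefault(edge["from"], []).append(edge["to"])
        graph.setdefault(edge["to"], [])

    # Standard three-state DFS: unvisited, on the current path, finished.
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:
                # Back edge: the cycle is the part of the path starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

The search is recursive, which is fine for diagrams with a few hundred nodes; very deep graphs would need an iterative version. A result of `None` means no loop was found, and the agent can then be asked about intent rather than structure.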
Output Structure
When using the structured_logic format, Deeptrain returns a standardized object representing the diagram's flow:
```json
{
  "nodes": [
    {"id": "node_1", "label": "User Login", "type": "process"},
    {"id": "node_2", "label": "Authenticated?", "type": "decision"}
  ],
  "edges": [
    {"from": "node_1", "to": "node_2", "label": "submit"},
    {"from": "node_2", "to": "node_3", "label": "Yes"},
    {"from": "node_2", "to": "node_4", "label": "No"}
  ],
  "semantic_summary": "A user authentication workflow where credentials lead to a decision node..."
}
```
Because this representation is far more compact than the raw image, it helps keep prompts within a model's context limit while preserving the diagram's logic.
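To illustrate how the object could be condensed further for a prompt, here is a minimal sketch that renders each edge as one line of text. It assumes the result has been loaded into a plain dict with the `nodes`, `edges`, and `semantic_summary` keys shown in this section; `render_visual_context` and `build_prompt` are hypothetical helpers, not part of Deeptrain.

```python
def render_visual_context(diagram):
    """Render the summary and edges as compact text lines for an LLM prompt."""
    labels = {node["id"]: node["label"] for node in diagram.get("nodes", [])}
    lines = []

    summary = diagram.get("semantic_summary")
    if summary:
        lines.append(f"Summary: {summary}")

    for edge in diagram.get("edges", []):
        # Fall back to the raw id when an edge points at a node that is
        # not listed under "nodes".
        src = labels.get(edge["from"], edge["from"])
        dst = labels.get(edge["to"], edge["to"])
        tag = f" [{edge['label']}]" if edge.get("label") else ""
        lines.append(f"{src} -> {dst}{tag}")

    return "\n".join(lines)


def build_prompt(diagram, question):
    """Combine the rendered diagram with a user question."""
    context = render_visual_context(diagram)
    return f"Visual Context:\n{context}\n\nQuestion: {question}"
```

One line per edge keeps the token count roughly proportional to the number of connections, and labels are used instead of ids so the model sees the names that appear in the diagram.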