MLLMs for Chart and Data Understanding: Reading Graphs Like a Human
Multimodal LLMs can now read charts, extract data from graphs, and answer questions about visualizations. Here's how well they actually work, where they fail, and how to use them effectively.
View all mllms depths βDepth ladder for this topic:
Someone sends you a screenshot of a dashboard with 6 charts. You need to extract the key numbers, identify trends, and write a summary. A year ago, youβd do this manually. Today, multimodal LLMs can do it β with caveats.
What MLLMs Can Actually Do with Charts
The Good
Trend identification β βSales are trending upward in Q3 and Q4β β models get this right 90%+ of the time. They understand the visual language of line charts, bar charts, and most common visualization types.
Relative comparisons β βCategory A is significantly larger than Category Bβ β reliable for obvious visual differences. Models can rank bars by height, compare line slopes, and identify the largest slice in a pie chart.
Chart type recognition β Models correctly identify bar charts, line charts, scatter plots, pie charts, heatmaps, and most standard visualization types.
Qualitative reading β βRevenue peaked in July then declined sharplyβ β models extract narrative-level insights well, similar to how a non-expert human would describe a chart.
The Unreliable
Exact value extraction β βWhat is the exact value for March?β β models frequently get this wrong, especially when the chart lacks data labels. They estimate values from axis positions but with significant error (Β±10β20% is common).
Dense charts β Charts with many overlapping lines, small labels, or tight spacing confuse models. A line chart with 3 lines is fine; one with 12 lines produces errors.
Axis reading β Models sometimes misread axis scales, especially logarithmic axes, dual axes, or axes that donβt start at zero. This is a major source of incorrect numerical claims.
Small differences β βIs the Q2 bar slightly taller than Q1?β β when differences are subtle (less than ~15% visually), models are unreliable.
Model Comparison (March 2026)
| Capability | GPT-5 Vision | Claude 4 | Gemini 2.5 Ultra |
|---|---|---|---|
| Trend identification | Excellent | Excellent | Excellent |
| Value extraction | Good | Good | Very Good |
| Complex charts | Good | Very Good | Good |
| Chart reasoning | Very Good | Excellent | Good |
| Table extraction | Excellent | Excellent | Very Good |
Geminiβs edge on value extraction comes from its training data β Google has extensive chart/data pairs. Claude tends to be more cautious, hedging when itβs uncertain rather than guessing (which is actually preferable for accuracy-critical applications).
Prompting for Chart Understanding
Basic: Get the Facts
Look at this chart and tell me:
1. What type of chart is this?
2. What are the axes?
3. What are the approximate values for each data point?
4. What is the main trend or insight?
Better: Structured Extraction
Extract data from this chart into a markdown table.
For each data point, provide:
- Category/date label (from x-axis)
- Value (estimated from y-axis)
- Confidence: HIGH if the value is labeled, MEDIUM if clearly readable
from the axis, LOW if estimated
Format as a table. If you're uncertain about any value, say so.
Best: Task-Specific with Context
This is a monthly revenue chart for our SaaS product.
I need you to:
1. Extract monthly values into a table (estimate is fine, note uncertainty)
2. Calculate month-over-month growth rates
3. Identify any anomalies (months that deviate from the trend)
4. Summarize in 2-3 sentences for an executive audience
Our fiscal year starts in April. Revenue should be in the $2-5M range
per month based on our business size.
Providing context (expected ranges, domain) helps the model calibrate its value estimates and catch its own errors.
Building a Chart Analysis Pipeline
For automated chart analysis at scale:
async def analyze_chart(image_path, chart_context=""):
# Step 1: Extract raw data
extraction_prompt = f"""
Extract all data from this chart into structured format.
Context: {chart_context}
Return as JSON with fields: chart_type, title, x_axis, y_axis,
data_points (list of {{label, value, confidence}})
"""
raw_data = await call_mllm(image_path, extraction_prompt)
# Step 2: Validate extraction
validation_prompt = f"""
I extracted this data from a chart: {raw_data}
Looking at the chart again, check:
- Are the values reasonable given the axis scale?
- Did I miss any data points?
- Are there any obvious errors?
Return corrected data if needed.
"""
validated_data = await call_mllm(image_path, validation_prompt)
# Step 3: Generate insights
insight_prompt = f"""
Based on this extracted data: {validated_data}
Provide 3-5 key insights. Focus on trends, anomalies,
and actionable observations.
"""
insights = await call_mllm(image_path, insight_prompt)
return {"data": validated_data, "insights": insights}
The two-pass approach (extract, then validate) significantly improves accuracy. The model catches many of its own errors when explicitly asked to double-check.
Handling Complex Dashboards
Real-world dashboards contain multiple charts, KPI cards, tables, and text. For these:
Crop individual charts. Processing one chart at a time is more accurate than asking the model to interpret an entire dashboard. Use simple image cropping to isolate each visualization.
Provide layout context. βThis is the top-left chart from a sales dashboard. The dashboard covers Q1 2026 performance.β Layout context helps the model interpret ambiguous labels.
Process KPI cards separately. Text-heavy elements (big numbers, metric cards) are easier for models to read accurately. Extract these first to establish baseline numbers that inform chart interpretation.
Limitations to Remember
- Donβt trust exact numbers unless the chart has data labels. Always verify critical values.
- Color-coded information can be unreliable β models sometimes confuse similar colors in legends.
- 3D charts are harder for models (and humans). Ask the chart creator for a 2D version.
- Interactive charts in screenshots lose hover/tooltip information. Screenshot with tooltips visible if possible.
- Low resolution kills accuracy. Use high-resolution screenshots (2x or retina resolution).
When to Use MLLMs vs. Structured Data
If you have access to the underlying data (CSV, database), always use that for exact analysis. MLLMs reading charts are estimating from pixels β useful when you only have the image, but inferior to actual data.
The sweet spot for MLLM chart analysis: processing screenshots from reports, slides, and dashboards where you donβt have access to the source data. For your own charts, query the data directly.
Simplify
β Benchmarking Multimodal LLMs: What to Measure and How
Go deeper
MLLMs for Code and Visual Reasoning: When Models Read Diagrams, Screenshots, and Whiteboards β
Related reads
Stay ahead of the AI curve
Weekly insights on AI β explained at the level that's right for you. No hype, no jargon, just what matters.
No spam. Unsubscribe anytime. We respect your inbox.