Vision API pricing 2026: analyzing one image costs from $0.0002 to $0.014.

Vision API pricing in May 2026 spans a roughly 70× range. The comparison covers seven model configurations from OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude Sonnet 4.6, Claude Opus 4.7) and Google (Gemini 2.0 Flash, Gemini 1.5 Pro), with Google Cloud Vision API and Microsoft Azure AI Vision as classical-CV reference points. Analyzing a 3 MP image on Claude Opus 4.7 costs about 70× more than analyzing a 1 MP image on Gemini 2.0 Flash, and that is before a single output token is billed. The cost surface is not what the per-token pricing page suggests: "mini" models that look cheap by token rate can cost more per image than flagship models, because of how many tokens they bill for each image.

As of: May 28, 2026
Image tested: 1 MP (1000×1000 px)
Sample: n=7 model configurations
Source: Vendor docs + math
Updated: Monthly
Topic: AI Tagging

Cost to analyze a single 1 MP image · May 2026

Input tokens only · output cost depends on response length

Provider · model	Image tokens (1 MP)	Input rate / 1M tokens	Cost / image	Cost / 1k images
Google · Gemini 2.0 Flash, ai.google.dev	~258 (1 tile)	~$0.075	$0.00019	$0.19
OpenAI · GPT-4o (high detail), platform.openai.com	765 (85 + 170 × 4 tiles)	$2.50	$0.0019	$1.91
Google · Gemini 1.5 Pro, cloud.google.com	~1,000	$3.00	$0.0030	$3.00
Anthropic · Claude Sonnet 4.6, platform.claude.com	~1,334	$3.00	$0.0040	$4.00
Anthropic · Claude Opus 4.7 (1 MP), platform.claude.com	~1,334	$5.00	$0.0067	$6.70
OpenAI · GPT-4o-mini (high detail, 768×2048 max size), platform.openai.com	~48,000 (effective)	$0.15	$0.0072	$7.20
Anthropic · Claude Opus 4.7 (3 MP), platform.claude.com	~2,800	$5.00	$0.0140	$14.00

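Token counts use each vendor's published image-tokenization formula. OpenAI high-detail formula: 85 + 170 × ceil(w/512) × ceil(h/512), applied after the image is scaled to fit within 2048 × 2048 px and then scaled down so its shortest side is at most 768 px. Anthropic formula: tokens ≈ (width × height) / 750. Gemini tile formula: 258 tokens per 768×768 tile. The GPT-4o-mini row uses a maximum-size 768 × 2048 image and the second Opus row a 3 MP image, not the 1 MP baseline. Full methodology below.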
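The OpenAI and Anthropic formulas are simple enough to script before choosing a vendor. A minimal Python sketch, assuming OpenAI's documented resize steps (fit within 2048 × 2048, then shortest side to 768 px) only ever downscale, and rounding Anthropic's estimate up to match the table; the function names are hypothetical:

```python
import math


def openai_high_detail_tokens(width: int, height: int) -> int:
    """Approximate GPT-4o high-detail image tokens: resize, then count tiles."""
    # Step 1: fit within a 2048 x 2048 square.
    # Assumption: images are only ever downscaled, never upscaled.
    s = min(1.0, 2048 / max(width, height))
    w, h = width * s, height * s
    # Step 2: scale down so the shortest side is at most 768 px.
    s = min(1.0, 768 / min(w, h))
    w, h = round(w * s), round(h * s)
    # Step 3: 85 base tokens plus 170 tokens per 512 x 512 tile.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def anthropic_tokens(width: int, height: int) -> int:
    """Anthropic's approximation (w x h) / 750, rounded up as in the table."""
    return math.ceil(width * height / 750)


def cost_per_image(tokens: int, rate_per_million: float) -> float:
    """Input-only cost in USD for one image."""
    return tokens * rate_per_million / 1_000_000


if __name__ == "__main__":
    gpt4o = openai_high_detail_tokens(1000, 1000)
    claude = anthropic_tokens(1000, 1000)
    print(gpt4o, f"{cost_per_image(gpt4o, 2.50):.4f}")    # 765 0.0019
    print(claude, f"{cost_per_image(claude, 3.00):.4f}")  # 1334 0.0040 (Sonnet 4.6)
    print(claude, f"{cost_per_image(claude, 5.00):.4f}")  # 1334 0.0067 (Opus 4.7)
```

Running it reproduces the GPT-4o, Sonnet 4.6 and Opus 4.7 (1 MP) rows of the table. The same function also gives 1,105 tokens for a 2048 × 4096 image, which is scaled to 768 × 1536 and split into 6 tiles.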

Google Cloud Vision API pricing vs alternatives

If you arrived here searching for Google Cloud Vision API pricing, here is the comparison most pricing pages omit. Google publishes a per-feature rate (label detection, OCR, face detection) of roughly $1.50 per 1,000 units at the first paid volume tier, where each feature applied to an image counts as one unit. For a single feature that is $0.0015 per image: slightly cheaper than one GPT-4o high-detail call ($0.0019) and well below Claude Sonnet 4.6 or Opus 4.7 for the same workload. Classical CV providers like Google Cloud Vision beat most frontier multimodal LLMs on per-image cost for closed-taxonomy tagging (Gemini 2.0 Flash is the exception). The LLMs win when the task requires natural-language reasoning about an image, at roughly 1.3–9× the per-call cost of a single-feature Cloud Vision request. OpenAI GPT-4 Vision API pricing sits in the middle: $0.0019 per 1 MP image on GPT-4o high-detail, with significant savings from batch processing and prompt caching. Azure Computer Vision pricing is roughly equivalent to Google Cloud Vision at the per-feature level. Claude vision pricing is the most expensive of the multimodal LLMs compared here, ranging from $0.0040 (Sonnet 4.6) to $0.014 (Opus 4.7 at 3 MP).
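Because Cloud Vision bills per feature rather than per image, the effective per-image price grows with every feature a request asks for. A minimal sketch, assuming one flat $1.50 per 1,000 units and ignoring free allowances and cheaper high-volume tiers; cloud_vision_cost is a hypothetical helper, not a Google API:

```python
def cloud_vision_cost(images: int, features_per_image: int,
                      rate_per_1k_units: float = 1.50) -> float:
    """USD cost when every feature applied to an image is billed as one unit."""
    # Assumption: a single flat rate (the first paid tier). Free allowances
    # and lower high-volume tiers are deliberately left out.
    units = images * features_per_image
    return units * rate_per_1k_units / 1_000


# Label detection + OCR on one image = 2 billable units.
print(f"{cloud_vision_cost(1, 2):.4f}")  # 0.0030
```

Requesting two features on every image puts Cloud Vision at about $0.0030 per image, above the $0.0019 of a single GPT-4o high-detail call that could return labels and text together.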

The "mini" trap

GPT-4o-mini lists at $0.15 per million input tokens, roughly sixteen times cheaper than full GPT-4o at $2.50. Operators routinely assume that means they'll spend sixteen times less on image inference. They do not. OpenAI counts image tokens on the mini model at roughly 33× the GPT-4o count: a maximum-size high-detail image (768 × 2048 px) comes to about 48,000 effective tokens on mini, versus 1,445 tokens for the same image on full GPT-4o. The 33× token multiplier outweighs the ~16.7× rate discount, so a single large image costs roughly twice as much on mini ($0.0072) as on the flagship (about $0.0036).

This is documented in OpenAI's developer community (linked in the sources below) but is not surfaced on the headline pricing page, and it routinely catches operators on first deployment.
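The arithmetic behind the trap fits in a few lines. This sketch assumes the per-model constants OpenAI publishes for high-detail images (85 base and 170 per tile for GPT-4o; 2,833 base and 5,667 per tile for GPT-4o-mini), which should be re-checked against current docs; MODELS and image_cost are hypothetical names:

```python
import math

# Assumed high-detail constants; verify against OpenAI's current vision docs.
# "rate" is the input price in USD per 1M tokens.
MODELS = {
    "gpt-4o":      {"base": 85,   "per_tile": 170,  "rate": 2.50},
    "gpt-4o-mini": {"base": 2833, "per_tile": 5667, "rate": 0.15},
}


def image_cost(model: str, tiles: int) -> tuple[int, float]:
    """Return (billed tokens, input cost in USD) for an image with `tiles` tiles."""
    m = MODELS[model]
    tokens = m["base"] + m["per_tile"] * tiles
    return tokens, tokens * m["rate"] / 1_000_000


# A 768 x 2048 image needs no resize and splits into 2 x 4 = 8 tiles.
tiles = math.ceil(768 / 512) * math.ceil(2048 / 512)
for name in MODELS:
    tokens, cost = image_cost(name, tiles)
    print(f"{name}: {tokens} tokens, ${cost:.4f}")
# gpt-4o: 1445 tokens, $0.0036
# gpt-4o-mini: 48169 tokens, $0.0072
```

The mini call bills about 33× the tokens at a 16.7× lower rate, so it ends up roughly twice as expensive per image.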

What this means for product choices

Gemini 2.0 Flash is the cost-leader by a wide margin — roughly 10× cheaper per image than GPT-4o, ~20× cheaper than Claude Sonnet 4.6, ~70× cheaper than Claude Opus 4.7 at 3 MP. For high-volume taxonomic tagging where reasoning quality is sufficient, the math is decisive.
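The Sonnet/Opus gap on Claude is meaningful. Same image, same megapixels: Opus 4.7 charges about 1.7× the Sonnet price ($5.00 vs $3.00 per million input tokens). At 3 MP, Opus costs 3.5× Sonnet's 1 MP price, because the higher rate (1.7×) compounds with roughly 2.1× more image tokens (~2,800 vs ~1,334) when Opus processes the higher-resolution input.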
GPT-4o is mid-pack on cost and near the top on availability. It's not the cheapest, but it is one of the best-documented and most operator-friendly models for production deployment. The cost-quality tradeoff is real.
Avoid optimizing for token rate; the headline rate is misleading. Compute per-image cost from each vendor's image-token formula before architecting a pipeline, because the per-image ranking can differ sharply from the token-rate ranking.
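To see how far the two orderings diverge, here is a short sketch that ranks the table's rows by headline input rate and by per-image cost. The figures are copied from the table, not recomputed, and the row labels are abbreviated:

```python
# (label, input rate per 1M tokens, cost per image), copied from the table.
ROWS = [
    ("Gemini 2.0 Flash",       0.075, 0.00019),
    ("GPT-4o (high detail)",   2.50,  0.0019),
    ("Gemini 1.5 Pro",         3.00,  0.0030),
    ("Claude Sonnet 4.6",      3.00,  0.0040),
    ("Claude Opus 4.7 (1 MP)", 5.00,  0.0067),
    ("GPT-4o-mini (max size)", 0.15,  0.0072),
    ("Claude Opus 4.7 (3 MP)", 5.00,  0.0140),
]

# sorted() is stable, so tied rates keep their table order.
by_rate = [r[0] for r in sorted(ROWS, key=lambda r: r[1])]
by_cost = [r[0] for r in sorted(ROWS, key=lambda r: r[2])]

for label in by_rate:
    print(f"{label:<24} rate rank {by_rate.index(label) + 1}, "
          f"cost rank {by_cost.index(label) + 1}")
```

GPT-4o-mini moves from second place on token rate to sixth on per-image cost, while GPT-4o moves up from third to second.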

How we computed this

For each provider we used the official published image-tokenization formula (or, where unpublished, vendor-confirmed numbers from developer documentation) multiplied by the current input rate per 1M tokens. Output tokens are not included: they vary by response length and, for long responses, can dwarf input cost. For Anthropic, the documented formula is tokens ≈ (w × h) / 750; on Sonnet, images are scaled down once they exceed a 1,568 px long edge or roughly 1,600 tokens (around 1.19 MP for a square image), while Opus 4.7 accepts higher-resolution inputs and bills correspondingly more tokens (~2,800 for the 3 MP row). For OpenAI, the high-detail tile formula (applied after the image is scaled to fit within 2048 × 2048 px with its shortest side at most 768 px) is published on the platform docs and confirmed via the OpenAI developer community. For Google, tile sizes vary by model (768×768 for Gemini 2.0 Flash; up to 2304×2304 for max-quality requests on Gemini 1.5 Pro).
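The Sonnet cap means a larger upload does not raise the bill much past roughly 1,600 tokens. A minimal sketch, assuming a uniform downscale that satisfies both a 1,568 px long-edge limit and a ~1,600-token limit; sonnet_billed_tokens is a hypothetical helper, and Anthropic's exact resizing rule may differ:

```python
import math

# Assumed limits, taken from Anthropic's vision docs at the time of writing.
MAX_LONG_EDGE = 1568  # pixels
MAX_TOKENS = 1600     # approximate token ceiling before downscaling


def sonnet_billed_tokens(width: int, height: int) -> int:
    """Estimate billed tokens after an assumed downscale to fit both limits."""
    # One uniform scale factor that satisfies the edge limit and the token
    # limit (tokens = w * h / 750, so the pixel budget is MAX_TOKENS * 750).
    scale = min(
        1.0,
        MAX_LONG_EDGE / max(width, height),
        math.sqrt(MAX_TOKENS * 750 / (width * height)),
    )
    w, h = math.floor(width * scale), math.floor(height * scale)
    return math.ceil(w * h / 750)


print(sonnet_billed_tokens(1000, 1000))  # 1334: under both limits, no resize
print(sonnet_billed_tokens(2000, 1500))  # 3 MP input, downscaled to ~1.2 MP
```

Under these assumptions a 3 MP upload to Sonnet 4.6 bills about 1,598 tokens, roughly $0.0048 at $3.00 per million, versus $0.014 for the 3 MP Opus 4.7 row.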

What this statistic does not capture

Output tokens. A tagging prompt that returns 200 tokens of JSON adds only a modest output charge; a long-form description with reasoning can return thousands of tokens. Output rates are often 4–10× the input rate, so output frequently dominates total cost.
Batch discounts and prompt caching. Most providers offer 50% off for asynchronous batch processing. Anthropic and OpenAI both offer prompt-cache reads at meaningfully lower rates if the same system prompt is reused across image calls.
Quality. A more expensive image inference may produce a more accurate tag, more useful description, or richer structured output. The economics of quality vs cost are the subject of Report 03.
Egress / orchestration. If images live in another cloud or behind a CDN, the cost of moving them to the inference endpoint can rival the inference itself, especially at scale.
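Output tokens and batch discounts can be folded into the per-image estimate. A minimal sketch; call_cost is a hypothetical helper, the 200-token reply matches the tagging example above, and the $10.00 output rate is an illustrative assumption at 4× GPT-4o's input rate rather than a quoted price:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float,
              batch_discount: float = 0.0) -> float:
    """USD cost of one call; rates are per 1M tokens, discount applies to both."""
    gross = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return gross * (1 - batch_discount)


# GPT-4o image from the table (765 tokens at $2.50 per 1M), plus a 200-token
# JSON reply at an ASSUMED output rate of $10.00 per 1M tokens.
image_only = call_cost(765, 0, 2.50, 10.00)
with_output = call_cost(765, 200, 2.50, 10.00)
batched = call_cost(765, 200, 2.50, 10.00, batch_discount=0.5)
print(f"{image_only:.4f} {with_output:.4f} {batched:.4f}")  # 0.0019 0.0039 0.0020
```

Under these assumptions a short JSON reply roughly doubles the input-only figure, and a 50% batch discount brings the total back to about the image-only price.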

Sources

Anthropic vision pricing — platform.claude.com/docs/en/build-with-claude/vision
OpenAI pricing — developers.openai.com/api/docs/pricing
OpenAI image-token mini multiplier (community-confirmed) — OpenAI developer community thread
Google Gemini pricing & tile formulas — ai.google.dev/pricing

Cite this statistic

DAM LLM Research. "Per-image inference cost across frontier multimodal LLMs, May 2026." damllm.ai, 2026. https://damllm.ai/statistics/per-image-cost-multimodal-llms/