Statistic · AI Tagging · From the AI Tagging Provider Index
9 of 10
Vision APIs publishing per-unit pricing.
Nine of the 10 leading image-tagging APIs publish a per-unit price (per image, per 1,000 requests, or per 1M tokens) directly on their public pricing page. The remaining provider has a pricing page but requires a sales conversation for per-unit rates. For the three frontier multimodal LLMs, the unit is tokens, not images; mapping that back to a per-image cost takes conversion work that most operators do not anticipate at the planning stage.
Public per-unit pricing · by provider
| Provider | Pricing published | Unit | Notes |
|---|---|---|---|
| Google Cloud Vision | Yes | per 1,000 requests | Per-feature pricing tiers public. |
| AWS Rekognition | Yes | per 1,000 images | Tiered by volume; per-feature rates listed. |
| Azure AI Vision | Yes | per 1,000 transactions | Tiered F0/S1 pricing public. |
| Clarifai | Yes | per operation | Operation-based pricing on public site. |
| Imagga | Yes | per month / per image | Plan tiers with per-image fall-through. |
| Cloudinary AI | Yes | per credit | Credit-based model maps to AI operations. |
| Anthropic Claude (vision) | Yes | per 1M tokens | Per-token pricing public; images priced as input tokens. |
| OpenAI GPT-4o (vision) | Yes | per 1M tokens | Per-token pricing public; image-token pricing documented. |
| Google Gemini (vision) | Yes | per 1M tokens | Per-token pricing public. |
| Hive AI | Partial | — | Pricing page references plans but no per-unit number; sales contact required for production rates. |
"Yes" requires that an outside developer can find a per-unit price (per image, per request, per 1,000 ops, per 1M tokens) without filling in a contact form. Token-based pricing for multimodal LLMs is counted as "Yes" because the rate is public, even if mapping to per-image cost takes a calculator. Cells re-verified monthly. Methodology →
The token-cost gotcha
For Anthropic, OpenAI, and Google Gemini, the listed token rate looks competitive next to classical per-image pricing until you do the conversion. A single high-resolution image sent to GPT-4o consumes roughly 1,500-3,000 input tokens; sent to Claude, several thousand. At list rates, that puts a single image inference at roughly 5-15× the per-image cost of Google Cloud Vision or AWS Rekognition for a comparable label-extraction task. Teams pay the premium for what they get in return: open-ended reasoning, instruction following, and multi-image context. But the cost surprise is real, and we see operators caught off guard by it more often than by any other line item in the index.
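The conversion itself is simple arithmetic, and a minimal sketch may help at the planning stage. The function names below are hypothetical, and every rate and token count in the usage lines is an illustrative placeholder, not a quote from any vendor's price list; substitute the numbers from the current pricing pages before relying on the result.

```python
TOKENS_PER_MILLION = 1_000_000


def token_cost_per_image(
    input_tokens: int,
    usd_per_million_input: float,
    output_tokens: int = 0,
    usd_per_million_output: float = 0.0,
) -> float:
    """Per-image cost in USD for a token-priced multimodal LLM.

    Image tokens are billed as input; the tags the model returns are billed as output.
    """
    input_cost = input_tokens * usd_per_million_input
    output_cost = output_tokens * usd_per_million_output
    return (input_cost + output_cost) / TOKENS_PER_MILLION


def classical_cost_per_image(usd_per_thousand_images: float) -> float:
    """Per-image cost in USD for an API priced per 1,000 images or requests."""
    return usd_per_thousand_images / 1_000


def cost_multiple(llm_cost: float, classical_cost: float) -> float:
    """How many times more expensive the LLM call is than the classical call."""
    if classical_cost <= 0:
        raise ValueError("classical_cost must be positive")
    return llm_cost / classical_cost


# Placeholder inputs (assumptions, not vendor prices):
# 3,000 image tokens at $3.00 per 1M input tokens, versus $1.00 per 1,000 images.
llm = token_cost_per_image(input_tokens=3_000, usd_per_million_input=3.00)
classical = classical_cost_per_image(usd_per_thousand_images=1.00)
print(f"LLM: ${llm:.4f}/image, classical: ${classical:.4f}/image, "
      f"multiple: {cost_multiple(llm, classical):.1f}x")
```

Output tokens are optional here because a short tag list is usually small next to the image itself, but output rates tend to be higher than input rates, so including them gives a more honest estimate.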
What counts
- Yes — per-unit price is published on the vendor's public pricing page.
- Partial — pricing page exists but per-unit rate is not disclosed; sales contact required.
- No — pricing is entirely behind a sales call.
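As a rough illustration, the three-way rubric above can be written as a small decision function and applied to the table. The two boolean inputs are a hypothetical encoding of the rubric, not the index's actual data model; the provider labels are copied from the table in this page.

```python
from collections import Counter


def classify(has_pricing_page: bool, per_unit_rate_public: bool) -> str:
    """Apply the Yes / Partial / No rubric to one provider."""
    if has_pricing_page and per_unit_rate_public:
        return "Yes"      # per-unit price is on the public pricing page
    if has_pricing_page:
        return "Partial"  # page exists, but the per-unit rate needs a sales contact
    return "No"           # pricing is entirely behind a sales call


# (has_pricing_page, per_unit_rate_public) per provider, matching the table above.
PROVIDERS = {
    "Google Cloud Vision": (True, True),
    "AWS Rekognition": (True, True),
    "Azure AI Vision": (True, True),
    "Clarifai": (True, True),
    "Imagga": (True, True),
    "Cloudinary AI": (True, True),
    "Anthropic Claude (vision)": (True, True),
    "OpenAI GPT-4o (vision)": (True, True),
    "Google Gemini (vision)": (True, True),
    "Hive AI": (True, False),
}

tally = Counter(classify(*flags) for flags in PROVIDERS.values())
print(f"{tally['Yes']} of {len(PROVIDERS)}")  # prints "9 of 10"
```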
Cite this statistic
DAM LLM Research. "Vision APIs publishing per-unit pricing, May 2026." damllm.ai, 2026. https://damllm.ai/statistics/vision-apis-with-public-pricing/