Statistic · AI Tagging · May 2026
450× spread
How many images can a frontier multimodal LLM ingest in one API call?
As of May 2026, frontier multimodal LLMs accept anywhere from 8 to 3,600 images in a single API call — a 450× spread across providers. Google Gemini 1.5 Pro and Flash top the list at 3,600 images per request; Gemini 2.5 Pro follows at 3,000. Anthropic Claude allows 100 images on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on its smaller-context variants. OpenAI publishes no hard per-request image cap for GPT-4o; the effective ceiling is bounded by your tokens-per-minute quota. Mistral Pixtral Large caps at 8.
TL;DR
- Google Gemini 1.5 Pro and Flash accept up to 3,600 images per request — a 450× higher ceiling than Mistral Pixtral Large's 8.
- Anthropic Claude caps at 100 images per request on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on smaller-context models, but the 32 MB total payload limit is usually reached first.
- OpenAI does not publish a hard per-request image cap for GPT-4o; the effective limit is bounded by your tokens-per-minute quota and context window. Community testing finds quality degradation past ~30 images.
- Mistral Pixtral Large's documented cap is 8 images per request, despite the model's 128K context window theoretically holding 30+.
- Per-image token costs also differ widely: Gemini bills 258 tokens per image, GPT-4o bills 85 to 2,125 tokens depending on detail level, and Claude's cost scales with megapixels.
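To make the token figures concrete, here is a minimal arithmetic sketch. It assumes the per-image costs quoted above are flat rates, reads "128K" as 128,000 tokens, and ignores the tokens consumed by the text prompt and the response, so the results are upper bounds rather than guarantees. The function names are illustrative only.

```python
# Per-image token figures as quoted in the TL;DR; treated as flat rates for illustration.
GEMINI_TOKENS_PER_IMAGE = 258
GPT4O_TOKENS_LOW = 85       # quoted floor (low detail)
GPT4O_TOKENS_HIGH = 2125    # quoted ceiling (high detail)


def tokens_for_images(n_images: int, tokens_per_image: int) -> int:
    """Total image tokens for a request with n_images images."""
    return n_images * tokens_per_image


def images_that_fit(context_tokens: int, tokens_per_image: int) -> int:
    """How many images fit in a context window if nothing else used tokens."""
    return context_tokens // tokens_per_image


# A maximal 3,600-image Gemini request, in image tokens alone.
gemini_full_request = tokens_for_images(3600, GEMINI_TOKENS_PER_IMAGE)

# Assumed 128,000-token window for GPT-4o at each end of the quoted range.
gpt4o_best_case = images_that_fit(128_000, GPT4O_TOKENS_LOW)
gpt4o_worst_case = images_that_fit(128_000, GPT4O_TOKENS_HIGH)

print(gemini_full_request, gpt4o_best_case, gpt4o_worst_case)
```

Under these assumptions a full 3,600-image Gemini request uses 928,800 image tokens, which fits inside a 1M-token window, while a 128,000-token GPT-4o window holds between 60 and 1,505 images depending on detail level.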
Per-request image limits across frontier multimodal LLM APIs · May 2026
| Provider · model | Max images / request | Size cap (per image unless noted) | Context window | Source |
|---|---|---|---|---|
| Google — Gemini 1.5 Pro / Flash | 3,600 | 20 MB total payload | 1M / 2M tokens | ai.google.dev |
| Google — Gemini 2.5 Pro | 3,000 | 7 MB inline / 30 MB GCS | 1M tokens | docs.cloud.google.com |
| Anthropic — Claude (sub-200k models) | 600 | 5 MB; 8000×8000 px | varies | platform.claude.com |
| Anthropic — Claude Opus 4.7 / Sonnet 4.5 (200k context) | 100 | 5 MB; 8000×8000 px | 200K tokens | platform.claude.com |
| OpenAI — GPT-4o / GPT-5 | no official cap (TPM-bounded; ~16-30 effective) | 20 MB | 128K-400K tokens | platform.openai.com |
| Mistral — Pixtral Large | 8 | 20 MB | 128K tokens | platform-docs-public.pages.dev |
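The headline spread can be reproduced from the table with a short sketch. The dictionary keys below are hypothetical labels chosen for this example, not official API model identifiers, and OpenAI is recorded as `None` because no fixed cap is published.

```python
from typing import Dict, Optional

# Max images per request, copied from the table above.
# Keys are illustrative labels, not real API model IDs.
MAX_IMAGES_PER_REQUEST: Dict[str, Optional[int]] = {
    "gemini-1.5-pro-flash": 3600,
    "gemini-2.5-pro": 3000,
    "claude-smaller-context": 600,
    "claude-200k-context": 100,
    "gpt-4o": None,  # no official cap published
    "pixtral-large": 8,
}


def published_spread(limits: Dict[str, Optional[int]]) -> float:
    """Ratio of the highest to the lowest published cap, skipping unpublished ones."""
    published = [cap for cap in limits.values() if cap is not None]
    return max(published) / min(published)


spread = published_spread(MAX_IMAGES_PER_REQUEST)
print(f"{spread:.0f}x spread")
```

Skipping the unpublished entry is a design choice: the 450× figure compares only the caps that vendors actually document.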
Methodology
This survey reads each provider's official vision/multimodal API documentation as it stood on May 28, 2026. We do not run our own black-box probes of the limit; we cite the vendor's published number. Where a provider distinguishes context-window tiers — for instance Claude's 200K-context Opus 4.7 and Sonnet 4.5 versus its smaller-context variants — each tier is listed as its own row in the table. OpenAI is the one provider that publishes no fixed per-request image ceiling for GPT-4o; we record "no official cap" and note that the practical bound comes from your tokens-per-minute quota and from reproducible community testing showing quality degradation past roughly 30 images. Image-generation models (DALL-E, Imagen, Stable Diffusion) and self-hosted vision weights are excluded — different API surfaces.
- For each provider, the maximum-images-per-request value was read from the provider's primary vision/multimodal documentation page.
- Every cited URL was loaded on 2026-05-28 and the limit re-confirmed against the page text.
- Where the provider distinguishes between context-window tiers (e.g. Claude's 200K-context models vs its smaller-context models), each tier is listed as a separate row.
What we excluded:
- Models with vision support only via fine-tunes or self-hosted weights without an official hosted-API limit (e.g. Llama 3.2 Vision via Hugging Face).
- Image generation models (DALL-E 3, Gemini Imagen, Stable Diffusion), which use a different API surface.
- Video understanding limits, which are covered in a separate statistic.
Frequently asked
Which provider lets you send the most images in one API call?
Google Gemini 1.5 Pro and Flash, at 3,600 images per request — verified against Google's image-understanding documentation on May 28, 2026. Gemini 2.5 Pro sits just behind at 3,000. Beyond Gemini, the next-most-permissive cap is Anthropic Claude's smaller-context models at 600. Practical caveat: every provider also enforces a payload-size limit (20 MB inline for Gemini, 32 MB for Claude). Most real workloads hit the payload ceiling before the image-count one.
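The payload caveat can be checked with a rough sketch. It assumes every image in a request has the same size and compares raw file sizes against the payload cap; any transport encoding overhead is ignored, so real limits may be reached sooner. The function names are illustrative.

```python
def break_even_image_mb(max_images: int, payload_cap_mb: float) -> float:
    """Average image size at which the count cap and the payload cap coincide."""
    return payload_cap_mb / max_images


def effective_image_cap(max_images: int, payload_cap_mb: float, avg_image_mb: float) -> int:
    """Images per request allowed once both caps are applied (uniform image size assumed)."""
    fits_in_payload = int(payload_cap_mb // avg_image_mb)
    return min(max_images, fits_in_payload)


# Claude 200K-context tier: 100 images, 32 MB payload (figures from this article).
claude_break_even = break_even_image_mb(100, 32)   # MB per image
claude_at_1mb = effective_image_cap(100, 32, 1.0)  # images of 1 MB each

# Gemini 1.5: 3,600 images, 20 MB inline payload (figures from this article).
gemini_at_half_mb = effective_image_cap(3600, 20, 0.5)

print(claude_break_even, claude_at_1mb, gemini_at_half_mb)
```

With these figures, Claude's payload cap becomes the binding limit once images average more than 0.32 MB, and 1 MB images allow only 32 per request. For Gemini 1.5 with inline uploads, 0.5 MB images allow 40 per request, far below the 3,600 count cap.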
Why does Mistral Pixtral Large only allow 8 images per request?
The 8-image cap is documented by Mistral but is conservative relative to Pixtral Large's 128K context window, which the open-weights community has demonstrated can hold 30+ high-resolution images. The discrepancy is API policy, not a hard model constraint. If you need higher per-request volume on Mistral, the practical workaround is to chain sequential requests under a shared system prompt, or to switch providers for batch workloads.
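The sequential-request workaround amounts to splitting the image list into batches no larger than the cap. The sketch below shows only the batching logic. The request dictionary shape is a hypothetical placeholder, not Mistral's actual API payload, and the example URLs are invented for illustration.

```python
from typing import Dict, Iterator, List, Sequence


def chunk_images(images: Sequence[str], max_per_request: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_per_request images."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(images), max_per_request):
        yield list(images[start:start + max_per_request])


def build_requests(
    image_urls: Sequence[str], system_prompt: str, max_per_request: int = 8
) -> List[Dict[str, object]]:
    """Build one placeholder request per batch, all sharing the same system prompt."""
    return [
        {"system": system_prompt, "images": batch}  # hypothetical request shape
        for batch in chunk_images(image_urls, max_per_request)
    ]


# Invented example input: 20 image URLs under the 8-image cap.
urls = [f"https://example.com/img_{i}.jpg" for i in range(20)]
requests_to_send = build_requests(urls, "Tag each image with up to five keywords.")
batch_sizes = [len(r["images"]) for r in requests_to_send]
print(batch_sizes)
```

Twenty images under an 8-image cap become three requests of 8, 8, and 4 images. Each batch is processed independently, so the model cannot compare images across batches.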
Does OpenAI publish a per-request image limit for GPT-4o?
No. OpenAI's vision documentation describes how image tokens count against your tokens-per-minute (TPM) quota and against the model's 128K-400K context window, but states no absolute images-per-request number. Independent community testing finds reliable behavior up to roughly 30 images per request and quality degradation past that. That threshold is an observation, not a hard cap; your TPM tier and image-detail setting determine the actual ceiling.
Sources
- Vision — Claude API Docs (general limits section) — Anthropic, accessed May 28, 2026.
- Image understanding — Gemini API — Google AI for Developers, accessed May 28, 2026.
- Gemini 2.5 Pro — model specifications — Google Cloud, accessed May 28, 2026.
- Vision capability — Mistral AI platform docs — Mistral AI, accessed May 28, 2026.
- Vision guide — OpenAI Platform — OpenAI, accessed May 28, 2026.