How many images can a frontier multimodal LLM ingest in one API call?

As of May 2026, frontier multimodal LLMs accept anywhere from 8 to 3,600 images in a single API call — a 450× spread across providers. Google Gemini 1.5 Pro and Flash top the list at 3,600 images per request; Gemini 2.5 Pro follows at 3,000. Anthropic Claude allows 100 images on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on its smaller-context variants. OpenAI publishes no hard per-request image cap for GPT-4o; the effective ceiling is bounded by your tokens-per-minute quota. Mistral Pixtral Large caps at 8.

As of: May 28, 2026
Sample: n=6
Sources: 5 cited
Updated: Monthly
Topic: AI Tagging

TL;DR

Google Gemini 1.5 Pro and Flash accept up to 3,600 images per request — a 450× higher ceiling than Mistral Pixtral Large's 8.
Anthropic Claude caps at 100 images per request on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on smaller-context models, but a 32 MB total payload limit usually trips first.
OpenAI does not publish a hard per-request image cap for GPT-4o; the effective limit is bounded by your tokens-per-minute quota and context window. Community testing finds quality degradation past ~30 images.
Mistral Pixtral Large's documented cap is 8 images per request, despite the model's 128K context window theoretically holding 30+.
Per-image token costs also differ widely: Gemini bills 258 tokens per image, GPT-4o bills 85 to 2,125 tokens depending on detail level, and Claude scales with megapixels.
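
To see what these per-image figures mean for a whole request, the arithmetic can be sketched as follows. This is a minimal illustration, not any provider's billing logic: the function name is hypothetical, the token figures are the ones quoted above, and prompt text and output tokens are ignored.

```python
def image_token_budget(num_images: int, tokens_per_image: int) -> int:
    """Tokens consumed by the images alone (prompt text is not counted)."""
    if num_images < 0 or tokens_per_image < 0:
        raise ValueError("counts must be non-negative")
    return num_images * tokens_per_image


# Gemini at the flat 258 tokens per image quoted in this article.
gemini_full_request = image_token_budget(3600, 258)
print(gemini_full_request)  # 928800

# GPT-4o range for 30 images, using the 85 and 2,125 token figures above.
gpt4o_low = image_token_budget(30, 85)
gpt4o_high = image_token_budget(30, 2125)
print(gpt4o_low, gpt4o_high)  # 2550 63750
```

Under these assumptions, a full 3,600-image Gemini request would spend roughly 929K tokens on images alone, which is close to a 1M-token context window before any prompt text is added.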

Per-request image limits across frontier multimodal LLM APIs · May 2026

6 rows · 5 sources · verified May 28, 2026

Provider · model	Max images / request	Size cap (per image unless noted)	Context window	Source
Google — Gemini 1.5 Pro / Flash	3,600	20 MB total payload	1M / 2M tokens	ai.google.dev
Google — Gemini 2.5 Pro	3,000	7 MB inline / 30 MB GCS	1M tokens	docs.cloud.google.com
Anthropic — Claude (sub-200k models)	600	5 MB; 8000×8000 px	varies	platform.claude.com
Anthropic — Claude Opus 4.7 / Sonnet 4.5 (200k context)	100	5 MB; 8000×8000 px	200K tokens	platform.claude.com
OpenAI — GPT-4o / GPT-5	no official cap (TPM-bounded; ~16-30 effective)	20 MB	128K-400K tokens	platform.openai.com
Mistral — Pixtral Large	8	20 MB	128K tokens	platform-docs-public.pages.dev

Methodology

This survey reads each provider's official vision/multimodal API documentation as it stood on May 28, 2026. We do not run our own black-box probes of the limit; we cite the vendor's published number. Where a provider distinguishes context-window tiers — for instance Claude's 200K-context Opus 4.7 and Sonnet 4.5 versus its smaller-context variants — each tier is listed as its own row in the table. OpenAI is the one provider that publishes no fixed per-request image ceiling for GPT-4o; we record "no official cap" and note that the practical bound comes from your tokens-per-minute quota and from reproducible community testing showing quality degradation past roughly 30 images. Image-generation models (DALL-E, Imagen, Stable Diffusion) and self-hosted vision weights are excluded — different API surfaces.

For each provider, the maximum-images-per-request value was read from the provider's primary vision/multimodal documentation page.
Every cited URL was loaded on 2026-05-28 and the limit re-confirmed against the page text.
Where the provider distinguishes between context-window tiers (e.g. Claude's 200k vs smaller models), each tier is listed as a separate row.

What we excluded: models with vision support only via fine-tunes or self-hosted weights without an official hosted-API limit (e.g. Llama 3.2 Vision via Hugging Face); image generation models (DALL-E 3, Gemini Imagen, Stable Diffusion), which use a different API surface; and video understanding limits, which are covered in a separate statistic.

Frequently asked

Which provider lets you send the most images in one API call?

Google Gemini 1.5 Pro and Flash, at 3,600 images per request — verified against Google's image-understanding documentation on May 28, 2026. Gemini 2.5 Pro sits just behind at 3,000. Beyond Gemini, the next-most-permissive cap is Anthropic Claude's smaller-context models at 600. Practical caveat: every provider also enforces a payload-size limit (20 MB inline for Gemini, 32 MB for Claude). Most real workloads hit the payload ceiling before the image-count one.
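
The payload caveat can be checked with a rough calculation. The sketch below is an assumption-laden estimate rather than a provider API: `binding_limit` is a hypothetical helper, the average image size is a made-up input, and base64 encoding overhead is ignored. The count and payload caps are the figures quoted in this article.

```python
def binding_limit(count_cap: int, payload_cap_mb: float, avg_image_mb: float) -> tuple[int, str]:
    """Return (max images per request, name of the limit that binds first)."""
    if avg_image_mb <= 0:
        raise ValueError("avg_image_mb must be positive")
    # How many average-sized images fit under the payload cap.
    by_payload = int(payload_cap_mb // avg_image_mb)
    if by_payload < count_cap:
        return by_payload, "payload"
    return count_cap, "count"


# Claude 200K-context tier: 100 images, 32 MB payload, 1 MB images (assumed size).
print(binding_limit(100, 32, 1.0))    # (32, 'payload')
# Gemini 1.5: 3,600 images, 20 MB inline payload, 0.5 MB images (assumed size).
print(binding_limit(3600, 20, 0.5))   # (40, 'payload')
# Small 0.25 MB thumbnails on Claude: the image-count cap binds instead.
print(binding_limit(100, 32, 0.25))   # (100, 'count')
```

With typical photo sizes the payload cap binds far below the image-count cap; only small thumbnails get close to the documented counts.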

Why does Mistral Pixtral Large only allow 8 images per request?

The 8-image cap is documented by Mistral but is conservative relative to Pixtral Large's 128K context window, which the open-weights community has demonstrated can hold 30+ high-resolution images. The discrepancy is API policy, not a hard model constraint. If you need higher per-request volume on Mistral, the practical workaround is to chain sequential requests under a shared system prompt, or to switch providers for batch workloads.
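
The sequential-request workaround amounts to splitting the image list into batches no larger than the cap. A minimal sketch, assuming a plain list of file paths: `batch_images` and the file names are hypothetical, and the actual API call (with the shared system prompt) is left to whatever client you use.

```python
from typing import Iterator, Sequence


def batch_images(paths: Sequence[str], max_per_request: int = 8) -> Iterator[list[str]]:
    """Yield consecutive slices of `paths`, each within the per-request cap."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(paths), max_per_request):
        yield list(paths[start:start + max_per_request])


# Hypothetical input: 20 images split for an 8-image cap.
paths = [f"img_{i:03d}.jpg" for i in range(20)]
batches = list(batch_images(paths))
print([len(b) for b in batches])  # [8, 8, 4]
```

Each batch would then go out as its own request, so any cross-image reasoning has to be stitched together afterwards from the per-batch responses.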

Does OpenAI publish a per-request image limit for GPT-4o?

No. OpenAI's vision documentation describes how image tokens count against your tokens-per-minute (TPM) quota and against the model's 128K-400K context window, but states no absolute images-per-request number. Independent community testing finds reliable behavior up to roughly 30 images per request and quality degradation past that. That threshold is not a hard cap: your TPM tier and image-detail setting determine the actual ceiling.
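
How the TPM tier and detail setting interact can be estimated with integer division. This is a rough sketch, not OpenAI's accounting: the function name and the 30,000 TPM quota are hypothetical placeholders, and the 85 and 2,125 per-image token figures are the low and high ends quoted in this article.

```python
def max_images_under_tpm(tpm_quota: int, tokens_per_image: int, text_tokens: int = 0) -> int:
    """Images that fit in one minute's token quota after reserving text tokens."""
    if tokens_per_image <= 0:
        raise ValueError("tokens_per_image must be positive")
    available = tpm_quota - text_tokens
    return max(0, available // tokens_per_image)


HYPOTHETICAL_TPM = 30_000  # assumed quota; check your own account tier

# Low-detail images at 85 tokens each, with 500 tokens reserved for the prompt.
print(max_images_under_tpm(HYPOTHETICAL_TPM, 85, text_tokens=500))    # 347
# High-detail images at the 2,125-token upper bound.
print(max_images_under_tpm(HYPOTHETICAL_TPM, 2125, text_tokens=500))  # 13
```

The spread between the two results shows why the detail setting matters as much as the quota itself when estimating a practical ceiling.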

Sources

Vision — Claude API Docs (general limits section) — Anthropic, accessed May 28, 2026.
Image understanding — Gemini API — Google AI for Developers, accessed May 28, 2026.
Gemini 2.5 Pro — model specifications — Google Cloud, accessed May 28, 2026.
Vision capability — Mistral AI platform docs — Mistral AI, accessed May 28, 2026.
Vision guide — OpenAI Platform — OpenAI, accessed May 28, 2026.
