DAM LLM · Independent research · AI × DAM

Statistic · AI Tagging · May 2026

450× spread

How many images can a frontier multimodal LLM ingest in one API call?

As of May 2026, frontier multimodal LLMs accept anywhere from 8 to 3,600 images in a single API call — a 450× spread across providers. Google Gemini 1.5 Pro and Flash top the list at 3,600 images per request; Gemini 2.5 Pro follows at 3,000. Anthropic Claude allows 100 images on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on its smaller-context variants. OpenAI publishes no hard per-request image cap for GPT-4o; the effective ceiling is bounded by your tokens-per-minute quota. Mistral Pixtral Large caps at 8.

As of: May 28, 2026 · Sample: n=6 · Sources: 5 cited · Updated: Monthly · Topic: AI Tagging

TL;DR

  • Google Gemini 1.5 Pro and Flash accept up to 3,600 images per request, 450× the 8-image cap on Mistral Pixtral Large.
  • Anthropic Claude caps at 100 images per request on its 200K-context models (Opus 4.7, Sonnet 4.5) and 600 on smaller-context models, but the 32 MB total payload limit usually trips first.
  • OpenAI does not publish a hard per-request image cap for GPT-4o; the effective limit is bounded by your tokens-per-minute quota and context window. Community testing finds quality degradation past ~30 images.
  • Mistral Pixtral Large's documented cap is 8 images per request, even though the model's 128K context window could in theory hold 30+ images.
  • Per-image token costs differ widely too: Gemini bills 258 tokens per image, GPT-4o bills 85 to 2,125 tokens depending on detail level, and Claude's cost scales with image megapixels.
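The per-image token figures in the last bullet turn into batch costs by plain multiplication. The sketch below is illustrative only: `batch_image_tokens` is a hypothetical helper name, the constants are the figures quoted above, and text-prompt tokens are not counted.

```python
# Per-image token figures as quoted in the TL;DR (not independently measured).
GEMINI_TOKENS_PER_IMAGE = 258
GPT4O_TOKENS_PER_IMAGE_RANGE = (85, 2125)  # low-detail .. high-detail


def batch_image_tokens(n_images: int, tokens_per_image: int) -> int:
    """Tokens consumed by the images alone; the text prompt is not included."""
    if n_images < 0 or tokens_per_image < 0:
        raise ValueError("counts must be non-negative")
    return n_images * tokens_per_image


# A full 3,600-image Gemini batch at the flat per-image rate.
gemini_full_batch = batch_image_tokens(3600, GEMINI_TOKENS_PER_IMAGE)

# A 30-image GPT-4o batch at the low and high ends of the quoted range.
gpt4o_low, gpt4o_high = (
    batch_image_tokens(30, rate) for rate in GPT4O_TOKENS_PER_IMAGE_RANGE
)

print(gemini_full_batch)      # 928800
print(gpt4o_low, gpt4o_high)  # 2550 63750
```

On these figures, a full 3,600-image Gemini batch spends roughly 929K tokens on images alone, which sits just under a 1M-token context window.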

Per-request image limits across frontier multimodal LLM APIs · May 2026

6 rows · 5 sources · verified May 28, 2026

| Provider · model | Max images / request | Size limit (per image unless noted) | Context window | Source |
|---|---|---|---|---|
| Google · Gemini 1.5 Pro / Flash | 3,600 | 20 MB total payload | 1M / 2M tokens | ai.google.dev |
| Google · Gemini 2.5 Pro | 3,000 | 7 MB inline / 30 MB GCS | 1M tokens | docs.cloud.google.com |
| Anthropic · Claude (sub-200K models) | 600 | 5 MB; 8000×8000 px | varies | platform.claude.com |
| Anthropic · Claude Opus 4.7 / Sonnet 4.5 (200K context) | 100 | 5 MB; 8000×8000 px | 200K tokens | platform.claude.com |
| OpenAI · GPT-4o / GPT-5 | no official cap (TPM-bounded; ~16-30 effective) | 20 MB | 128K-400K tokens | platform.openai.com |
| Mistral · Pixtral Large | 8 | 20 MB | 128K tokens | platform-docs-public.pages.dev |
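For readers who want the table in machine-readable form, the rows can be held in a small lookup structure. This is a sketch, not an official schema: `ImageLimit`, `spread`, and `models_accepting` are illustrative names, and `None` stands in for OpenAI's "no official cap".

```python
from typing import List, NamedTuple, Optional


class ImageLimit(NamedTuple):
    provider: str
    model: str
    max_images: Optional[int]  # None = provider publishes no fixed cap


# Values copied from the table above (as of May 28, 2026).
LIMITS = [
    ImageLimit("Google", "Gemini 1.5 Pro / Flash", 3600),
    ImageLimit("Google", "Gemini 2.5 Pro", 3000),
    ImageLimit("Anthropic", "Claude (sub-200K models)", 600),
    ImageLimit("Anthropic", "Claude Opus 4.7 / Sonnet 4.5", 100),
    ImageLimit("OpenAI", "GPT-4o / GPT-5", None),
    ImageLimit("Mistral", "Pixtral Large", 8),
]


def spread(limits: List[ImageLimit]) -> float:
    """Ratio of the highest to the lowest published cap."""
    caps = [row.max_images for row in limits if row.max_images is not None]
    return max(caps) / min(caps)


def models_accepting(limits: List[ImageLimit], n_images: int) -> List[str]:
    """Models whose published cap allows n_images in one request.

    Rows without a published cap are included, since no documented
    limit rules them out; other bounds (quota, payload) may still apply.
    """
    return [
        row.model
        for row in limits
        if row.max_images is None or row.max_images >= n_images
    ]


print(spread(LIMITS))  # 450.0
print(models_accepting(LIMITS, 500))
```

The `spread` result reproduces the headline 450× figure: 3,600 divided by 8.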

Methodology

This survey reads each provider's official vision/multimodal API documentation as it stood on May 28, 2026. We do not run our own black-box probes of the limit; we cite the vendor's published number. Where a provider distinguishes context-window tiers — for instance Claude's 200K-context Opus 4.7 and Sonnet 4.5 versus its smaller-context variants — each tier is listed as its own row in the table. OpenAI is the one provider that publishes no fixed per-request image ceiling for GPT-4o; we record "no official cap" and note that the practical bound comes from your tokens-per-minute quota and from reproducible community testing showing quality degradation past roughly 30 images. Image-generation models (DALL-E, Imagen, Stable Diffusion) and self-hosted vision weights are excluded — different API surfaces.

  • For each provider, the maximum-images-per-request value was read from the provider's primary vision/multimodal documentation page.
  • Every cited URL was loaded on 2026-05-28 and the limit re-confirmed against the page text.
  • Where the provider distinguishes between context-window tiers (e.g. Claude's 200K-context models vs. its smaller-context models), each tier is listed as a separate row.

What we excluded:

  • Models with vision support only via fine-tunes or self-hosted weights without an official hosted-API limit (e.g. Llama 3.2 Vision via Hugging Face).
  • Image generation models (DALL-E 3, Gemini Imagen, Stable Diffusion), which use a different API surface.
  • Video understanding limits, which are covered in a separate statistic.

Frequently asked

Which provider lets you send the most images in one API call?

Google Gemini 1.5 Pro and Flash, at 3,600 images per request — verified against Google's image-understanding documentation on May 28, 2026. Gemini 2.5 Pro sits just behind at 3,000. Beyond Gemini, the next-most-permissive cap is Anthropic Claude's smaller-context models at 600. Practical caveat: every provider also enforces a payload-size limit (20 MB inline for Gemini, 32 MB for Claude). Most real workloads hit the payload ceiling before the image-count one.
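Which ceiling binds first depends on average image size. The sketch below is a rough model under stated assumptions: `binding_limit` is a hypothetical helper, a megabyte is taken as 1,000,000 bytes, the average image sizes are made-up inputs, and any encoding overhead (base64 adds about a third) is ignored.

```python
from typing import Tuple

MB = 1_000_000  # assumption: decimal megabytes; a provider may count differently


def binding_limit(
    avg_image_bytes: int, max_images: int, payload_bytes: int
) -> Tuple[int, str]:
    """Return (images that fit in one request, which limit binds first)."""
    if avg_image_bytes <= 0:
        raise ValueError("avg_image_bytes must be positive")
    fits_in_payload = payload_bytes // avg_image_bytes
    if fits_in_payload < max_images:
        return fits_in_payload, "payload"
    return max_images, "count"


# Claude 200K-context tier: 100-image cap, 32 MB payload limit.
print(binding_limit(500_000, 100, 32 * MB))   # (64, 'payload')
print(binding_limit(200_000, 100, 32 * MB))   # (100, 'count')

# Gemini 1.5: 3,600-image cap, 20 MB payload limit.
print(binding_limit(500_000, 3600, 20 * MB))  # (40, 'payload')
```

With 500 KB images, the payload limit stops a Gemini 1.5 request at about 40 images, far below the 3,600-image count cap.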

Why does Mistral Pixtral Large only allow 8 images per request?

The 8-image cap is documented by Mistral but is conservative relative to Pixtral Large's 128K context window, which the open-weights community has demonstrated can hold 30+ high-resolution images. The discrepancy is API policy, not a hard model constraint. If you need higher per-request volume on Mistral, the practical workaround is to chain sequential requests under a shared system prompt, or to switch providers for batch workloads.
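The chaining workaround amounts to splitting the image list into groups of at most 8 and repeating the same system prompt on each request. The sketch below only builds the request bodies; the dictionary shape (`system`, `images`) is a hypothetical placeholder, not Mistral's actual request schema, and no network call is made.

```python
from typing import Dict, Iterator, List, Sequence

PIXTRAL_MAX_IMAGES = 8  # documented per-request cap cited above


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_requests(
    system_prompt: str,
    image_urls: Sequence[str],
    max_images: int = PIXTRAL_MAX_IMAGES,
) -> List[Dict[str, object]]:
    """One request body per batch, all sharing the same system prompt.

    The body layout is illustrative; map it onto the real API payload
    before sending anything.
    """
    return [
        {"system": system_prompt, "images": list(batch)}
        for batch in chunked(image_urls, max_images)
    ]


# Hypothetical asset URLs for illustration.
urls = [f"https://example.com/asset-{i}.jpg" for i in range(20)]
requests_to_send = build_requests("Tag each image with 5 keywords.", urls)
print([len(r["images"]) for r in requests_to_send])  # [8, 8, 4]
```

Each request is independent, so results that depend on comparing images across batches would need a separate merge step.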

Does OpenAI publish a per-request image limit for GPT-4o?

No. OpenAI's vision documentation describes how image tokens count against your tokens-per-minute (TPM) quota and against the model's 128K-400K context window, but states no absolute images-per-request number. Independent community testing finds reliable behavior up to roughly 30 images per request and quality degradation past that, but it is not a hard cap — your TPM tier and image-detail setting determine the actual ceiling.
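A back-of-envelope version of that ceiling divides the available token budget by the per-image token cost. This is a simplification under labeled assumptions: `max_images_per_request` is a hypothetical helper, a single request is assumed unable to exceed one minute of TPM quota, the 30,000 TPM figure is an invented example tier, and the 85 and 2,125 token costs are the low and high ends of the per-image range quoted in the TL;DR.

```python
def max_images_per_request(
    tpm_quota: int,
    context_window: int,
    tokens_per_image: int,
    prompt_tokens: int = 0,
) -> int:
    """Estimate how many images fit once quota and context are both respected."""
    if tokens_per_image <= 0:
        raise ValueError("tokens_per_image must be positive")
    # The tighter of the two bounds sets the budget; text tokens come off the top.
    budget = min(tpm_quota, context_window) - prompt_tokens
    return max(budget // tokens_per_image, 0)


# Hypothetical 30,000 TPM tier against a 128K-token context window.
print(max_images_per_request(30_000, 128_000, 2125))       # 14 (high detail)
print(max_images_per_request(30_000, 128_000, 85))         # 352 (low detail)
print(max_images_per_request(30_000, 128_000, 2125, 500))  # 13 (with a 500-token prompt)
```

The estimate ignores output tokens and any quality degradation, so it is an upper bound on what is billable, not on what works well.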

Sources

  1. Vision - Claude API Docs (general limits section). Anthropic, accessed May 28, 2026.
  2. Image understanding - Gemini API. Google AI for Developers, accessed May 28, 2026.
  3. Gemini 2.5 Pro - model specifications. Google Cloud, accessed May 28, 2026.
  4. Vision capability - Mistral AI platform docs. Mistral AI, accessed May 28, 2026.
  5. Vision guide - OpenAI Platform. OpenAI, accessed May 28, 2026.
