Statistic · AI Tagging · From the AI Tagging Provider Index
6 of 10
Vision APIs that let operators train custom models via API.
Six of the 10 leading image-tagging APIs document an API for training on operator-supplied labeled data: a real training pipeline, not just classification against a fixed taxonomy. The split is sharp. Every classical CV provider in v1.0 except Cloudinary supports custom training, while every frontier multimodal LLM is rated "Partial" or "No" for vision-specific fine-tuning. This is the dimension on which classical CV vendors still win.
Custom model training via API · by provider
| Provider | Custom training | Notes |
|---|---|---|
| Google Cloud Vision | Yes | AutoML Vision / Vertex AI custom training. |
| AWS Rekognition | Yes | Amazon Rekognition Custom Labels. |
| Azure AI Vision | Yes | Azure Custom Vision service. |
| Clarifai | Yes | Custom training is a hallmark feature, exposed through an end-to-end API. |
| Imagga | Yes | Custom Training API documented. |
| Hive AI | Yes | Hive AutoML for custom moderation/tagging. |
| OpenAI GPT-4o (vision) | Partial | Fine-tuning is generally available for text; vision fine-tuning is still rolling out and improving rapidly. |
| Google Gemini (vision) | Partial | Vertex AI supports tuning Gemini models; the vision-specific tuning surface is less developed than the text one. |
| Anthropic Claude (vision) | No | No public fine-tuning offered for Claude models in v1.0. |
| Cloudinary AI | No | AI tagging is delivered via swappable third-party models; no first-party custom training surface. |
"Yes" requires a documented API or workflow for uploading labeled training data, kicking off training, and serving the resulting custom model. "Partial" means fine-tuning is supported in principle but vision coverage is incomplete or in preview. Cells re-verified monthly.
Why this still matters
For high-volume, narrow-domain tagging — your specific product catalog, your specific brand assets, your specific defect-detection task — a small custom-trained classical CV model still beats a frontier LLM on cost, latency, and predictability by an order of magnitude. The frontier multimodal models are catching up, but in v1.0 if your problem is "tag 10 million product photos against my 800-SKU catalog," you are not yet picking Claude or GPT-4o to do it.
What counts
- Yes — documented API or workflow for training a custom model on operator-supplied labeled data.
- Partial — fine-tuning is offered but vision coverage is limited, in preview, or rolling out.
- No — no first-party custom-training capability.
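The rubric above can be sketched as a small decision function, and the headline figure can be checked by tallying the ratings in the provider table. This is a minimal illustration, not the index's actual tooling: the `Evidence` record, its field names, and the `rate` function are hypothetical, and only the ratings in `RATINGS` are copied from the table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record for one provider (field names are illustrative)."""
    first_party_training: bool  # documented API or workflow for training on operator-supplied labeled data
    vision_fully_covered: bool  # False if vision coverage is limited, in preview, or rolling out


def rate(evidence: Evidence) -> str:
    """Map an evidence record onto the Yes / Partial / No rubric."""
    if not evidence.first_party_training:
        return "No"
    return "Yes" if evidence.vision_fully_covered else "Partial"


# Ratings copied from the provider table, in table order.
RATINGS = {
    "Google Cloud Vision": "Yes",
    "AWS Rekognition": "Yes",
    "Azure AI Vision": "Yes",
    "Clarifai": "Yes",
    "Imagga": "Yes",
    "Hive AI": "Yes",
    "OpenAI GPT-4o (vision)": "Partial",
    "Google Gemini (vision)": "Partial",
    "Anthropic Claude (vision)": "No",
    "Cloudinary AI": "No",
}

counts = Counter(RATINGS.values())
headline = f"{counts['Yes']} of {len(RATINGS)}"
print(headline)  # 6 of 10
```

Keeping the rubric as a pure function of explicit evidence fields makes each monthly re-verification a matter of updating the inputs rather than re-arguing the rating.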
Cite this statistic
DAM LLM Research. "Vision APIs with custom model training, May 2026." damllm.ai, 2026. https://damllm.ai/statistics/vision-apis-with-custom-training/