
Multimodal embedding model supporting text and images with 128K token context for semantic search and retrieval systems.
Cohere Embed v4 transforms text, images, and mixed content into unified vector representations for semantic search, RAG, document clustering, and similarity matching. The model produces 1,536-dimensional embeddings by default, with flexible compression via Matryoshka embeddings (256, 512, 1024, or 1,536 dimensions). Priced at $0.12 per 1M text tokens and $0.47 per 1M image tokens, it delivers multimodal capability at pricing competitive with text-only alternatives. The API accepts batch requests of up to 128,000 tokens and offers asymmetric search optimization. Limitation: embeddings are incompatible with v3, so upgrading requires re-embedding the entire corpus.
Bottom line: *Use for multimodal retrieval, document understanding, and vector search requiring unified text-image embeddings. Avoid for text-only applications where cheaper models suffice or when legacy v3 embeddings must be preserved.*
Text embedding pricing at $0.12 per 1 million input tokens with no monthly minimum
Image embedding pricing at $0.47 per 1 million image tokens
Rate-limited free tier for development and proof-of-concept; not permitted for production use
Volume discounts and custom agreements for high-throughput workloads (billions of tokens monthly)
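To make the pay-as-you-go rates above concrete, here is a minimal cost estimator. The rates are hardcoded from this page and may be out of date, so verify against cohere.com before budgeting:

```python
# Rough cost estimator for Embed v4 pay-as-you-go pricing.
# Rates taken from this page ($0.12/1M text tokens, $0.47/1M image
# tokens); check cohere.com for current figures before relying on them.

TEXT_RATE_PER_M = 0.12   # USD per 1M text tokens
IMAGE_RATE_PER_M = 0.47  # USD per 1M image tokens

def embed_cost(text_tokens: int = 0, image_tokens: int = 0) -> float:
    """Return estimated USD cost for a given token volume."""
    return (text_tokens / 1_000_000) * TEXT_RATE_PER_M + \
           (image_tokens / 1_000_000) * IMAGE_RATE_PER_M

# Example: a 10M-token text corpus plus 1M image tokens
print(round(embed_cost(text_tokens=10_000_000, image_tokens=1_000_000), 2))
```

At these rates, embedding a fairly large mixed corpus stays in the low single-digit dollars, which is why the "Avoid for text-only applications where cheaper models suffice" advice above is mostly about very high-volume workloads.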
View full pricing on cohere.com →
Pricing may have changed since last verified. Check the official site for current plans.
Embed v4 is Cohere’s unified multimodal embedding model released in April 2025. It processes text, images, and interleaved multimodal content in a single model, outputting 1,536-dimensional vectors by default. The model supports multiple embedding formats (float, int8, uint8, binary, ubinary, base64) and configurable output dimensions (256, 512, 1024, 1536) via Matryoshka Representation Learning.
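Matryoshka-trained embeddings can be shortened by keeping a prefix of the vector and re-normalizing, which is how the smaller output dimensions work in principle. A minimal sketch of that truncation step (plain Python, no Cohere SDK; the 1,536-dim vector here is synthetic):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka-style embedding
    and L2-normalize the result so similarity scores stay comparable."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# Synthetic stand-in for a 1,536-dim Embed v4 vector
full = [math.sin(i + 1) for i in range(1536)]

compact = truncate_embedding(full, 256)  # one of the supported sizes
print(len(compact))  # 256
norm = math.sqrt(sum(x * x for x in compact))
print(abs(norm - 1.0) < 1e-9)  # True: unit length after re-normalization
```

In practice you would request the smaller dimension directly from the API (so you pay storage for 256 floats, not 1,536), but the prefix-truncation property is what makes the compressed sizes interchangeable with the full vectors.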
The model scores 65.2 on the MTEB retrieval benchmark and improves cross-lingual retrieval by 35% over prior versions. Batch requests accept up to 128,000 tokens, and images up to 2,458,624 pixels are supported. Asymmetric search optimization embeds documents and queries differently, improving retrieval accuracy in RAG systems.
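Asymmetric search is driven by an input-type flag on each embed call: documents and queries are marked differently so the model can optimize each side. A sketch of how the two request payloads differ (the `build_embed_request` helper is hypothetical; the field names mirror Cohere's public embed API, but verify them against the current docs):

```python
def build_embed_request(texts, input_type, dim=1536):
    """Hypothetical helper assembling an Embed v4 request body.
    `input_type` distinguishes the document side from the query side
    of asymmetric search; field names follow Cohere's embed API."""
    assert input_type in {"search_document", "search_query"}
    return {
        "model": "embed-v4.0",
        "input_type": input_type,
        "texts": texts,
        "embedding_types": ["float"],
        "output_dimension": dim,
    }

# Index-time: embed documents with the document flag
doc_req = build_embed_request(["Quarterly revenue grew 12%."], "search_document")
# Query-time: embed the user's search string with the query flag
qry_req = build_embed_request(["How fast did revenue grow?"], "search_query")

print(doc_req["input_type"], qry_req["input_type"])
```

Mixing the two up (embedding queries as documents, or vice versa) silently degrades retrieval quality, so it is worth encoding the flag in your indexing and query paths rather than passing it ad hoc.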
Available via Cohere Platform, AWS Bedrock, Azure AI Foundry, and SageMaker. Pricing structure: text at $0.12 per million tokens, images at $0.47 per million tokens, with pay-as-you-go billing. Enterprise customers may negotiate volume discounts.
Key architectural difference: Embed v4 embeddings are not backward-compatible with v3; migrating requires re-embedding the entire corpus. The model supports 100+ languages for text input and English for image input.
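Because v3 vectors cannot be reused, a migration is essentially a loop that re-embeds the whole corpus in request-sized chunks. A sketch of the batching step, assuming a crude whitespace token estimate (real token counts come from the API's tokenizer, so leave headroom under the 128,000-token request limit):

```python
def batch_documents(docs, token_budget=100_000):
    """Group documents into batches whose estimated token count stays
    under `token_budget` (kept below the 128K request limit for
    headroom). Tokens are approximated by whitespace splitting, which
    undercounts real BPE tokens -- an assumption, not the API tokenizer."""
    batches, current, current_tokens = [], [], 0
    for doc in docs:
        tokens = len(doc.split())  # crude estimate
        if current and current_tokens + tokens > token_budget:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(doc)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

corpus = [("word " * 40_000).strip(),
          ("word " * 70_000).strip(),
          "short doc"]
batches = batch_documents(corpus)
print([len(b) for b in batches])  # → [1, 2]
```

Each batch would then go to a single embed call; budgeting the whole migration is just `embed_cost` over the corpus's total token count at the rates listed above.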