
Cohere Embed v4 vs Llama 3.2 90B Vision Instruct

Cohere Embed v4 (a multimodal embedding model) and Llama 3.2 90B Vision Instruct (a vision-language model) are both tracked by AIDiveForge. Below is a side-by-side comparison of pricing, capabilities, platforms, and ownership, sourced from each tool's live website and verified before publishing.

Cohere Embed v4

Cohere Embed v4 transforms text, images, and mixed content into unified vector representations for semantic search, RAG, document clustering, and similarity matching. The model supports 1,536-dimensional embeddings with flexible compression via Matryoshka embeddings (256, 512, 1024, 1536 dimensions). Priced at $0.12/1M text tokens and $0.47/1M image tokens, it delivers multimodal capabilities competitive with text-only alternatives. The API supports batch processing up to 128,000 tokens per request with asymmetric search optimization. Limitation: incompatible with v3 embeddings; corpus re-embedding required for upgrades.
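The Matryoshka property described above means the leading dimensions of an embedding carry the most information, so a client can truncate a 1,536-dimensional vector to 256, 512, or 1,024 dimensions and re-normalize it. A minimal sketch of that truncation step, with a Cohere SDK call shown only as an illustrative assumption (parameter names are not verified against the v4 endpoint):

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` values of a Matryoshka embedding and L2-normalize.

    With Matryoshka training, a re-normalized prefix is itself a valid
    lower-dimensional embedding, trading a little accuracy for storage.
    """
    v = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Hypothetical API call (names follow the Cohere Python SDK; the exact
# model string and parameters are assumptions, not verified):
# import cohere
# co = cohere.Client("YOUR_API_KEY")
# resp = co.embed(texts=["quarterly revenue report"],
#                 model="embed-v4.0",
#                 input_type="search_query")  # asymmetric search side
# query_vec = truncate_embedding(resp.embeddings[0], 256)

# Stand-in for an API response, for demonstration:
full = np.random.default_rng(0).normal(size=1536)
short = truncate_embedding(full, 256)
print(short.shape)
```

Using a smaller `input_type`-aware query vector against document vectors of the same truncated width cuts vector-store size roughly in proportion to the dimension reduction.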

Llama 3.2 90B Vision Instruct

Llama 3.2 90B Vision Instruct is Meta's 90-billion-parameter multimodal large language model with vision capabilities, fine-tuned for instruction following across text and image understanding tasks.
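Because the weights are published, the model can be run locally. A sketch of the usual Hugging Face `transformers` flow is below; the class and model names follow the published Llama 3.2 integration, but exact generation parameters are illustrative assumptions, and the 90B checkpoint needs multiple high-memory GPUs, so the heavy calls are left commented out:

```python
# A chat turn pairing one image with a text instruction, in the
# message format the Llama 3.2 vision chat template expects:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Heavy model-loading and generation steps (requires GPUs; illustrative):
# import torch
# from transformers import MllamaForConditionalGeneration, AutoProcessor
# model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"
# model = MllamaForConditionalGeneration.from_pretrained(
#     model_id, torch_dtype=torch.bfloat16, device_map="auto")
# processor = AutoProcessor.from_pretrained(model_id)
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
# inputs = processor(image, prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=64)
# print(processor.decode(out[0]))

print(messages[0]["role"])
```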

| Attribute | Cohere Embed v4 | Llama 3.2 90B Vision Instruct |
| --- | --- | --- |
| Pricing | Paid | |
| Price | $0.12 per 1M text tokens; $0.47 per 1M image tokens | |
| Free trial | 0 days | No |
| Open source | No | Yes |
| Has API | Yes | No |
| Self-hosted option | No | No |
| Platforms | Cohere Platform, AWS Bedrock, Azure AI Foundry, Amazon SageMaker, GitHub Models | |
| Languages | English and 100+ languages for text input; English for image input | |
| Released | 2025-04-15 | |
Pros

Cohere Embed v4:
  • Unified multimodal model reduces infrastructure complexity
  • Competitive pricing at $0.12/1M tokens for text embeddings
  • Flexible output dimensions (256-1536) via Matryoshka embeddings reduce storage and latency
  • Strong MTEB performance (65.2) with a 35% improvement in cross-lingual retrieval
  • Supports asymmetric search for optimized query-document retrieval

Llama 3.2 90B Vision Instruct:
  • Strong multimodal capabilities combining text and vision in a single model
  • Competitive performance with proprietary vision models such as GPT-4V
  • Fully open source, with published weights under a permissive license
  • 90B-parameter size is efficient enough for on-premise deployment
  • Excellent instruction-following and reasoning abilities

Cons

Cohere Embed v4:
  • v4 vectors are incompatible with v3, so migrating requires re-embedding the full corpus
  • Image pricing ($0.47/1M tokens) is much higher than text pricing, limiting image-heavy workloads
  • Trial keys are rate-limited and unsuitable for production, so a production key is needed from day one

Llama 3.2 90B Vision Instruct:
  • Requires significant computational resources (GPU memory) for inference
  • Vision performance not yet benchmarked against all major proprietary competitors
  • Slightly lower performance on some specialized vision tasks than larger proprietary models
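The pricing caveats above are easy to quantify. A back-of-envelope cost estimate from the listed Embed v4 prices ($0.12 per 1M text tokens, $0.47 per 1M image tokens); the corpus sizes below are hypothetical examples, not figures from either vendor:

```python
# Listed Embed v4 prices, expressed per token.
TEXT_PRICE_PER_TOKEN = 0.12 / 1_000_000
IMAGE_PRICE_PER_TOKEN = 0.47 / 1_000_000

def embed_cost(text_tokens: int, image_tokens: int = 0) -> float:
    """Estimated USD cost of embedding the given token counts."""
    return (text_tokens * TEXT_PRICE_PER_TOKEN
            + image_tokens * IMAGE_PRICE_PER_TOKEN)

# Re-embedding a hypothetical 50M-token text corpus (e.g. for the
# v3 -> v4 migration noted above):
print(f"${embed_cost(50_000_000):.2f}")

# The same corpus plus 10M image tokens, showing how image pricing
# dominates mixed workloads:
print(f"${embed_cost(50_000_000, 10_000_000):.2f}")
```

Since image tokens cost nearly four times as much as text tokens, the image share of a mixed corpus drives most of the bill.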
Bottom line

Llama 3.2 90B Vision Instruct is open source; Cohere Embed v4 is the only one of the two with a hosted public API. Choose based on which of those differences matters most for your workflow.

Comparison data is sourced and verified by the AIDiveForge data pipeline. AIDiveForge is editorially independent.