Figure: Model Trust Scores: Model Comparisons. An interactive scatter plot that compares models on two scores. Dropdowns select the X-axis and Y-axis, and an industry filter defaults to "Show All Industries".
Affordability Score: normalized inverse of price (USD per 1M tokens).
Capability Score: the model's ability to successfully complete tasks and generate high-quality outputs.
Model Averages Across Use Cases (overall score per model): Claude-3.5-Sonnet-1022, Claude-3.7-Sonnet, Claude-3.7-Sonnet-Thinking, Cohere-Command-A, Cohere-Command-R-Plus, DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, GPT-4.1, GPT-4.5, GPT-4o-0513, Gemini-1.5-Pro, Gemini-2.0-Flash-Thinking-0121, Gemini-2.5-Pro-0325, Grok-3-Think, Llama-3.1-405B, Llama-3.3-70B, Llama-4-Maverick-17B, Mistral-Large, Mistral-Large-2, OpenAI-O1-1217, OpenAI-O1-mini, OpenAI-O3-high, OpenAI-O3-medium, OpenAI-O3-mini-high, OpenAI-O3-mini-medium.
Industries: Advertising & Marketing, Agriculture, Customer Service & Support, Defense, Design & Creative Services, Education, Financial Services, Generic, Healthcare, Human Resources, Knowledge Management, Legal, Logistics, Manufacturing, Media & Entertainment, Pharmaceutical, Real Estate & Construction, Sciences, Software Development, Technology, Transportation, Utilities.
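The Affordability Score is described only as a normalized inverse of USD per 1M tokens. The chart does not say how that normalization is done.

The sketch below is one possible reading, not the chart's actual method. It assumes that each model's inverse price is divided by the largest inverse price among the compared models, so the cheapest model scores 1.0. The function name `affordability_scores` and the example prices are hypothetical. Min-max scaling would be an equally plausible interpretation.

```python
def affordability_scores(prices_usd_per_1m: dict[str, float]) -> dict[str, float]:
    """Hypothetical sketch of a normalized-inverse-price score.

    Assumption (not stated in the source): score = (1 / price) / max(1 / price),
    which equals cheapest_price / price, so the cheapest model gets 1.0.
    """
    if not prices_usd_per_1m:
        return {}
    if any(p <= 0 for p in prices_usd_per_1m.values()):
        raise ValueError("prices must be positive")
    # Invert prices: cheaper models get larger values.
    inverse = {model: 1.0 / price for model, price in prices_usd_per_1m.items()}
    # Normalize by the largest inverse price so scores fall in (0, 1].
    top = max(inverse.values())
    return {model: value / top for model, value in inverse.items()}


# Illustrative prices only (USD per 1M tokens), not taken from the chart.
print(affordability_scores({"model-a": 2.0, "model-b": 10.0, "model-c": 4.0}))
```

Under this assumption, a model that costs five times as much as the cheapest one scores 0.2. The score is also relative to the set of models being compared, so adding a cheaper model would lower every other model's score.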