Model Trust Scores: Generic Model Comparisons

[Figure: Safety Score (x-axis) versus Capability Score (y-axis) for each model, colored by developer.]

Safety Score: the model's ability to handle tasks safely and avoid harmful outputs.
Capability Score: the model's ability to successfully complete tasks and generate high-quality outputs.

Models compared:
OpenAI-O3-high
OpenAI-O3-medium
OpenAI-O3-mini-medium
OpenAI-O3-mini-high
OpenAI-O1-1217
OpenAI-O1-mini
GPT-4.5
GPT-4.1
GPT-4o-0513
DeepSeek-R1
DeepSeek-V3
DeepSeek-V3-0324
Gemini-1.5-Pro
Gemini-2.0-Flash-Thinking-0121
Gemini-2.5-Pro-0325
Claude-3.5-Sonnet-1022
Claude-3.7-Sonnet
Claude-3.7-Sonnet-Thinking
Llama-3.3-70B
Llama-3.1-405B
Llama-4-Maverick-17B
Mistral-Large-2
Mistral-Large
Grok-3-Think
Cohere-Command-R-Plus
Cohere-Command-A