[Figure: Model Trust Scores: Generic Model Comparisons]

X-axis: Safety Score, the model's ability to handle tasks safely and avoid harmful outputs.
Y-axis: Capability Score, the model's ability to successfully complete tasks and generate high-quality outputs.
Color: by developer.

Models: OpenAI-O3-high, OpenAI-O3-medium, OpenAI-O3-mini-medium, OpenAI-O3-mini-high, OpenAI-O1-1217, OpenAI-O1-mini, GPT-4.5, GPT-4.1, GPT-4o-0513, DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, Gemini-1.5-Pro, Gemini-2.0-Flash-Thinking-0121, Gemini-2.5-Pro-0325, Claude-3.5-Sonnet-1022, Claude-3.7-Sonnet, Claude-3.7-Sonnet-Thinking, Llama-3.3-70B, Llama-3.1-405B, Llama-4-Maverick-17B, Mistral-Large-2, Mistral-Large, Grok-3-Think, Cohere-Command-R-Plus, Cohere-Command-A