[Figure: Model Trust Scores: Model Comparisons]

Axes: Affordability Score (normalized inverse USD per 1M tokens) and Capability Score (the model's ability to successfully complete tasks and generate high-quality outputs). Axis tick ranges in the extracted chart: 0.92 to 1 and 0.3 to 0.6.

Legend, Model Averages Across Use Cases (each model shown as an overall average): Claude-3.5-Sonnet-1022, Claude-3.7-Sonnet, Claude-3.7-Sonnet-Thinking, Cohere-Command-A, Cohere-Command-R-Plus, DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, GPT-4.1, GPT-4.5, GPT-4o-0513, Gemini-1.5-Pro, Gemini-2.0-Flash-Thinking-0121, Gemini-2.5-Pro-0325, Grok-3-Think, Llama-3.1-405B, Llama-3.3-70B, Llama-4-Maverick-17B, Mistral-Large, Mistral-Large-2, OpenAI-O1-1217, OpenAI-O1-mini, OpenAI-O3-high, OpenAI-O3-medium, OpenAI-O3-mini-high, OpenAI-O3-mini-medium.

Industries (filter, currently showing all): Advertising & Marketing, Agriculture, Customer Service & Support, Defense, Design & Creative Services, Education, Financial Services, Generic, Healthcare, Human Resources, Knowledge Management, Legal, Logistics, Manufacturing, Media & Entertainment, Pharmaceutical, Real Estate & Construction, Sciences, Software Development, Technology, Transportation, Utilities.