OpenAI has just released HealthBench, an open-source benchmark for evaluating large language models in realistic clinical scenarios. Built with insights from 262 physicians across 60 countries, HealthBench includes 5,000 multi-turn, multilin...