Physicians deliberately compared the AI system's performance with their own clinical reasoning, using expert human judgment as the benchmark for the model.