We're taking a big step towards medical superintelligence. AI models have aced multiple-choice medical exams – but real patients don’t come with ABC answer options. Now MAI-DxO can solve some of the world’s toughest open-ended cases with higher accuracy and lower costs.
While AI has achieved near-perfect scores on the US Medical Licensing Exam, we set a higher benchmark: 304 cases from the New England Journal of Medicine. These are some of the toughest and most diagnostically complex cases a physician can face.
Microsoft AI built MAI-DxO to simulate a virtual panel of physicians with different approaches collaborating to find a diagnosis on each case. They also included the ability to set a budget, which keeps the system from ordering tests without limit and running up higher costs and longer wait times.
What they found:
- MAI-DxO boosted performance of every model tested on those 304 cases
- 85.5% solve rate vs. 20% by a group of physicians
- Its higher accuracy came with LOWER overall testing costs than lone LLMs or physicians
MAI-DxO in action, tackling one of those complex cases:
This research is just the first step on a long, exciting journey. We’re excited to keep testing and learning with our healthcare partners in pursuit of better, more accessible care for people everywhere. More on the blog today: