Math and coding are rapidly falling to AI, and it looks like medicine might be next.
OpenAI’s o1-preview model handily outperformed human doctors at diagnosing illnesses, a research paper has found. Titled “Superhuman performance of a large language model on the reasoning tasks of a physician,” the paper compared the accuracy of diagnoses of clinical cases between human doctors and OpenAI’s o1-preview model, which uses chain-of-thought reasoning. In one test, the o1-preview model correctly diagnosed around 80 percent of cases, compared to around 60 percent for GPT-4 and just 30 percent for human clinicians.
“We evaluated the medical reasoning abilities of the o1-preview model across five diverse experiments, comparing the model to historical controls of human baselines and GPT-4,” the research paper said. “As in non-medical studies, we saw significant gains in performance for most tasks for o1-preview. For differential diagnosis generation, o1-preview surpasses both GPT-4 and previous non-LLM differential generators, as well as the human baseline,” it added.
The paper used several different kinds of medical cases to determine how well the model performed in diagnosing them. The cases were fed into the model, and the model then gave its diagnosis. The results were scored by human doctors, who decided whether the model’s diagnosis was accurate. After looking at the results, the researchers called the performance of AI models “superhuman”.
Medicine does look to be a field that could very easily be disrupted by AI. Most modern doctors look at lab reports and hear symptoms from patients before making their diagnosis and prescribing a course of treatment. All this data can be fed into an LLM, and modern LLMs are seemingly already able to parse it correctly and, drawing on the vast amount of medical knowledge they were trained on, diagnose the underlying condition. Additionally, AI systems have become extremely adept at reading X-rays and other medical images, and can also double as radiologists. The current results indicate that it might not be long before most doctors have an LLM with them that helps guide their diagnoses. And if these LLMs can produce good results, they might end up slowly chipping away at many doctor jobs going forward.