ChatGPT outperformed human doctors in diagnosing diseases and medical conditions in a study. The findings, published last month, suggest that artificial intelligence (AI) chatbots might be more efficient at analysing patient histories and conditions, and might provide more accurate diagnoses. While the study aimed to understand whether AI chatbots could help doctors provide better diagnoses, the results unexpectedly revealed that OpenAI's GPT-4-powered chatbot performed much better without human assistance than when paired with a doctor.
ChatGPT Outperforms Doctors in Diagnosing Diseases
The study, published in the journal JAMA Network Open, was conducted by a group of researchers at the Beth Israel Deaconess Medical Center in Boston. The experiment aimed to find out whether AI can help doctors diagnose diseases better than they can with traditional methods.
According to a New York Times report, the experiment involved 50 doctors, a mix of residents and attending physicians. They were recruited through multiple large hospital systems in the US and were given six patient case histories. The participants were reportedly asked to suggest a diagnosis for each case and to explain why they favoured or ruled out certain diagnoses. They were also reportedly graded on whether their final diagnosis was right.
To evaluate the participants' performance, medical experts were reportedly selected as graders. While the graders were shown the answers, they were not told whether a response came from a doctor with access to AI, a doctor working alone, or ChatGPT alone.
Further, to ensure the cases were realistic, the researchers reportedly picked case histories of real patients. These cases have been used by researchers for decades but have never been published, which avoids contamination. This point is important because ChatGPT could not have been trained on data that has never been published.
The findings of the study were surprising. Doctors who did not use any AI tool to diagnose the case histories had an average score of 74 percent, whereas those who used the chatbot scored 76 percent on average. However, when ChatGPT alone analysed the case histories and provided diagnoses, it scored an average of 90 percent.
While various factors could have affected the outcome of the study, from the experience level of the doctors to individual biases towards certain diagnoses, the researchers believe the study highlights that the potential of AI systems in medical institutions cannot be ignored.