When researchers at Harvard Medical School and Beth Israel Deaconess Medical Center compared a new artificial intelligence model with practicing physicians on initial diagnosis, the result was uncomfortable for the medical community: in certain scenarios, the AI was more accurate. This does not mean the algorithm will replace doctors, but it does mean the question of "who makes the diagnosis" is no longer straightforward.
A model that reads tumors
Harvard's CHIEF model (Clinical Histopathology Imaging Evaluation Foundation) was trained on 15 million image tiles taken from digitized slides of tumor tissue. According to the journal Cancer (Wiley), it outperforms other leading AI methods by up to 36% across a range of diagnostic tasks, from detecting cancer cells to predicting patient survival and treatment response.
"Our ambition is to create a flexible, universal ChatGPT-like platform that performs a wide range of oncological diagnostic tasks"
Kun-Hsing Yu, Associate Professor of Biomedical Informatics at Harvard Medical School
Important: CHIEF does not simply detect the presence of cancer. It predicts the tumor's molecular profile from the visual features of its cells, without additional genetic testing. For patients in countries without advanced molecular diagnostics, this could open access to personalized treatment where it was previously unavailable.
At the same time, another Harvard study found a troubling side effect: pathology AI models can infer patients' demographic characteristics (age, gender, race) directly from tissue slides. This can introduce bias into diagnosis, with accuracy varying across population groups.
Preterm birth: from 39% sensitivity to FDA approval
A parallel line of work is under way in obstetrics. Ultrasound AI published the results of the PAIR study (Perinatal Artificial Intelligence in Ultrasound) in The Journal of Maternal-Fetal & Neonatal Medicine. Its model analyzes standard ultrasound images and predicts the date of delivery with R² = 0.95 for term pregnancies and R² = 0.92 across all cases.
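R² (the coefficient of determination) measures how much of the variation in actual delivery timing a model's predictions explain: 1.0 is a perfect fit, while 0 is no better than always guessing the average. Below is a minimal Python sketch of the standard formula; the gestational ages are made-up illustrative values, not PAIR study data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical gestational age at delivery, in days (illustrative only)
actual_days = [273, 280, 266, 287, 259]
predicted_days = [275, 278, 268, 284, 262]

print(round(r_squared(actual_days, predicted_days), 2))  # 0.94
```

Here every prediction is off by only 2 to 3 days and R² comes out at about 0.94. Read the same way, R² = 0.95 means the model accounts for roughly 95% of the variance in delivery timing. That is close, but not an exact date.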
For preterm birth specifically, the model's sensitivity increased over several rounds of training, while specificity held at 93%. The FDA granted the tool De Novo authorization, the pathway for novel medical devices that have no existing equivalent on the market, after reviewing evidence of its safety and effectiveness.
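Sensitivity and specificity measure different things. Sensitivity is the share of actual preterm births the model flags. Specificity is the share of term pregnancies it correctly leaves unflagged. Below is a minimal sketch that computes both from confusion-matrix counts. The counts are hypothetical, chosen only to reproduce the 39% and 93% figures in this section, and are not PAIR study data.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives (preterm births) the model catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives (term births) the model correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical cohort: 100 preterm and 1,000 term pregnancies (illustrative only)
print(sensitivity(true_pos=39, false_neg=61))   # 0.39
print(specificity(true_neg=930, false_pos=70))  # 0.93
```

Raising sensitivity usually means flagging more pregnancies, which tends to add false positives and lower specificity. That trade-off is why it matters that specificity held steady as sensitivity rose.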
The developers particularly emphasize applications in "obstetric deserts," regions without access to specialized care. The tool requires no new equipment: it works with ultrasound data that routine clinics already collect.
What stands between the algorithm and the patient
Technical accuracy is not the only barrier. Among the unresolved issues:
- Data bias: Models trained on homogeneous samples perform worse on underrepresented groups; this has already been documented in Harvard's pathology research.
- Regulatory gap: The FDA approves individual tools, but there is still no unified standard for auditing AI diagnostics in clinical practice.
- Accountability: If an AI system makes a mistake, who bears legal responsibility: the developer, the doctor who trusted the system, or the hospital?
Until regulators clearly answer the accountability question, mass adoption of AI diagnostics will remain the domain of enthusiasts rather than standard clinical practice. If next year brings a court precedent involving an error by an FDA-approved AI tool, it will change the industry faster than any new accuracy study.