Artificial Intelligence (AI) has tremendous potential to advance healthcare and improve the lives of everyone. But successful clinical translation requires evaluating the performance of AI models on large and diverse real-world datasets. MLCommons, an open global engineering consortium dedicated to making machine learning better for everyone, announced on July 17, 2023, a major milestone toward addressing this challenge with the publication of “Federated Benchmarking of Medical Artificial Intelligence with MedPerf” in the Nature Machine Intelligence journal.
MedPerf is an open benchmarking platform that efficiently evaluates AI models on diverse real-world medical data to demonstrate clinical efficacy, while prioritizing patient privacy and mitigating legal and regulatory risks. The publication in Nature Machine Intelligence is the result of a two-year global collaboration spearheaded by the MLCommons Medical Working Group, with the participation of experts from more than 20 companies, more than 20 academic institutions, and nine hospitals across 13 countries, including Dana-Farber Cancer Institute.
“To assess whether patients will benefit from new tools or algorithms deployed clinically at a large scale, as well as to measure potential biases, we need to test the tools on large, diverse patient populations in different communities and medical settings. We hope this open benchmarking platform, and the collaborative international partnership underlying it, can help enable and democratize access to the many benefits of medical AI on the horizon,” says Jason Johnson, PhD, senior vice president and chief data and analytics officer at Dana-Farber.
MedPerf is a foundational step towards the MLCommons Medical Working Group’s mission to develop benchmarks and best practices to accelerate medical AI through an open, neutral, and scientific approach. The team believes that such efforts will increase trust in medical AI, accelerate ML adoption in clinical settings, and ultimately enable medical AI to personalize patient treatment, reduce costs, and improve both healthcare provider and patient experience.
Validating across diverse populations
Medical AI models are usually trained with data from limited and specific clinical settings, which may lead to unintended bias with respect to specific patient populations. This lack of generalizability can reduce the real-world impact of medical AI. However, getting access to train models on larger, diverse datasets is difficult because data owners are constrained by privacy, legal, and regulatory risks. MedPerf can improve medical AI by making data easily and safely accessible to AI researchers, which reduces bias and improves generalizability and clinical impact.
“Our goal is to use benchmarking as a tool to enhance medical AI,” says Alex Karargyris, PhD, Co-Chair, MLCommons. “Neutral and scientific testing of models on large and diverse datasets can improve effectiveness, reduce bias, build public trust and support regulatory compliance.”
Critically, MedPerf enables healthcare organizations to assess and validate AI models in an efficient and human-supervised process without accessing patient data. The platform’s design relies on federated evaluation in which medical AI models are remotely deployed and evaluated within the premises of data providers. This approach alleviates data privacy concerns and builds trust among healthcare stakeholders, leading to more effective collaboration.
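The federated-evaluation pattern described above can be sketched in a few lines of Python. This is an illustrative toy, not the actual MedPerf API: all function and variable names here are assumptions. The key property it demonstrates is that inference and scoring run inside each data provider's premises, and only aggregate metrics ever leave a site.

```python
# Hedged sketch of federated evaluation (illustrative names, not MedPerf's API).
# The model travels to each site; patient records never leave the site --
# only summary metrics are returned to the benchmark owner.
from statistics import mean


def evaluate_on_site(model, local_records):
    """Runs inside the data owner's premises; returns metrics only."""
    correct = [model(r["input"]) == r["label"] for r in local_records]
    return {"accuracy": mean(correct), "n": len(local_records)}


def federated_benchmark(model, sites):
    """The benchmark owner sees per-site metrics, never raw data."""
    return {name: evaluate_on_site(model, records)
            for name, records in sites.items()}


# Toy demo with synthetic "sites" standing in for hospital datasets.
model = lambda x: x >= 0.5  # stand-in binary classifier
sites = {
    "hospital_a": [{"input": 0.9, "label": True}, {"input": 0.2, "label": False}],
    "hospital_b": [{"input": 0.7, "label": True}, {"input": 0.6, "label": False}],
}
results = federated_benchmark(model, sites)
```

In a real deployment the `evaluate_on_site` step would be a containerized workload executed by the data provider, with human supervision of what leaves the firewall; the dictionary of metrics is the only artifact that crosses institutional boundaries.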
Streamlining research from months to hours
MedPerf’s orchestration and workflow automation capabilities can significantly accelerate federated learning studies.
“With MedPerf’s orchestration capabilities, we can evaluate multiple AI models through the same collaborators in hours instead of months,” explains Spyridon Bakas, PhD, assistant professor at the University of Pennsylvania’s Perelman School of Medicine and vice chair for Benchmarking and Clinical Translation at the MLCommons Medical Working Group.
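The speedup Bakas describes comes from automating the model-by-site sweep: once collaborators are onboarded, every registered model can be dispatched to every site without per-pair manual coordination. A minimal sketch, with all names assumed for illustration (this is not MedPerf's actual interface):

```python
# Hedged sketch of benchmark orchestration (illustrative, not MedPerf's API):
# sweep every registered model over the same set of onboarded sites
# automatically, rather than coordinating each model-site pair by hand.
from statistics import mean


def evaluate(model, records):
    """Local scoring at one site; returns a single accuracy figure."""
    return mean(model(r["x"]) == r["y"] for r in records)


def orchestrate(models, sites):
    """Cartesian sweep: every model evaluated at every collaborator."""
    return {
        (model_name, site_name): evaluate(model, records)
        for model_name, model in models.items()
        for site_name, records in sites.items()
    }


# Two toy models benchmarked over one synthetic site.
models = {
    "threshold_0.5": lambda x: x >= 0.5,
    "threshold_0.8": lambda x: x >= 0.8,
}
sites = {"site_a": [{"x": 0.9, "y": True}, {"x": 0.6, "y": False}]}
leaderboard = orchestrate(models, sites)
```

The onboarding cost (legal agreements, compute setup at each hospital) is paid once; after that, adding the 41st model to a benchmark costs roughly the same as adding the second.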
This efficiency was demonstrated in the Federated Tumor Segmentation (FeTS) Challenge, the largest federated experiment on glioblastoma to date. The FeTS Challenge spanned 32 sites across six continents and successfully employed MedPerf to benchmark 41 different models. Thanks to active involvement by teams from Dana-Farber, IHU Strasbourg, Intel, Nutanix, and the University of Pennsylvania, MedPerf was also validated through a series of pilot studies representative of academic medical research. These studies involved public and private data, ran on both on-premises and cloud infrastructure, and covered brain tumor segmentation, pancreas segmentation, and surgical workflow phase recognition.
“We pressure-tested the MedPerf open benchmark framework by supporting the largest real-world federated challenge to date,” says Renato Umeton, PhD, director of Artificial Intelligence Operations and Data Science Services, Informatics & Analytics Department at Dana-Farber. “Having medical AI models delivered to hospitals to assess how well the AI generalizes as part of an open benchmark, without sharing patient data, is a step in the right direction toward better and more inclusive AI-powered medicine.”
Extending MedPerf to other biomedical tasks
While initial uses of MedPerf focused on radiology, it is a flexible platform that supports any biomedical task. Through its sister project GaNDLF, which focuses on quickly and easily building AI pipelines, MedPerf can accommodate additional tasks such as digital pathology and omics. To support the open community, MedPerf is also developing examples for specialized low-code libraries such as PathML and SlideFlow (computational pathology), Spark NLP (natural language processing), and MONAI (medical imaging) to fill the data engineering gap and provide access to state-of-the-art pre-trained computer vision and natural language processing models.
Call for participation
To continue to drive medical AI innovation and bridge the gap between AI research and real-world clinical impact, there is a critical need for broad collaboration; reproducible, standardized, and open computation; and a passionate community that spans academia, industry, and clinical practice. Healthcare professionals, patient advocacy groups, AI researchers, data owners, and regulators are invited to join the MedPerf effort.