Public Health Datasets for Deep Learning: Challenges and Opportunities

Ziad Obermeyer, Katy Haynes, Amy Pitelka, Josh Risley, Katie Lin

Abstract: With today’s publicly available, de-identified clinical datasets, it is possible to ask questions such as “Can an algorithm read an electrocardiogram (ECG) as well as a cardiologist can?” However, other kinds of questions, such as “Does this ECG relate to a later cardiac arrest?”, can’t be answered with the limited public data available today. Research using private datasets gives us reason to be optimistic, but progress will be slow unless suitable de-identified datasets become open, allowing researchers to collaborate and compete efficiently. Learn about an effort underway at the University of Chicago, led by Ziad Obermeyer, Sendhil Mullainathan, and their team, to provide a secure and public “ImageNet for clinical data” that balances the concerns of patients, healthcare institutions, and researchers.