A Comprehensive EHR Timeseries Pre-training Benchmark
Matthew McDermott (Massachusetts Institute of Technology), Bret Nestor (University of Toronto), Evan Kim (Massachusetts Institute of Technology), Wancong Zhang (New York University), Anna Goldenberg (Hospital for Sick Children, University of Toronto, Vector Institute), Peter Szolovits (MIT), Marzyeh Ghassemi (University of Toronto, Vector Institute for Artificial Intelligence)
Abstract: Pre-training (PT) has been used successfully in many areas of machine learning. One area where PT could be especially impactful is electronic health record (EHR) data. Successful PT strategies on this modality could improve model performance in data-scarce contexts, such as modeling rare diseases, and could allow smaller hospitals to benefit from data from larger health systems. While many PT strategies have been explored in other domains, far less exploration has occurred for EHR data. One possible reason is the lack of standardized benchmarks suitable for developing and testing PT algorithms. In this work, we establish a PT benchmark dataset for EHR timeseries data, defining cohorts, a diverse set of fine-tuning tasks, and PT-focused evaluation regimes across two public EHR datasets: MIMIC-III and eICU. This benchmark fills an essential gap in the field by enabling robust iteration on PT strategies for this modality. To show the value of this benchmark and provide baselines for further research, we also profile two simple PT algorithms: a self-supervised masked imputation system and a weakly-supervised multi-task system. We find that PT strategies (in particular weakly-supervised PT methods) can offer significant gains over traditional learning in few-shot settings, especially on tasks with strong class imbalance. Our full benchmark and code are publicly available at https://github.com/mmcdermott/comprehensive_MTL_EHR.