DynEHR: Dynamic Adaptation of Models with Data Heterogeneity in Electronic Health Records

Lida Zhang (Texas A&M University); Xiaohan Chen, Tianlong Chen, and Zhangyang Wang (University of Texas at Austin); Bobak J. Mortazavi (Texas A&M University)

Abstract: Electronic health records (EHRs) provide an abundance of data for clinical outcomes modeling. The prevalence of EHR data has enabled many studies that use a variety of machine learning algorithms to predict potential adverse events. However, these studies do not account for the heterogeneity present in EHR data. This includes varying lengths of stay, varying frequencies of vital signs captured invasively versus non-invasively, and varying repetitions (or lack thereof) of laboratory examinations. As a result, studies limit the types of features extracted or the domain considered so that machine learning models receive a more homogeneous training set. Yet this heterogeneity reflects important differences in each patient's risk. In this work, we examine such data in an intensive care unit (ICU) setting, where the length of stay and the frequency of data collection may vary significantly with the severity of the patient's condition. It is therefore unreasonable to use the same model for patients just entering the ICU and for those who have stayed for above-average lengths of time. Developing multiple individual models to cover different patient cohorts, different lengths of stay, and different sources of key vital-sign data is tedious and may handle rare cases poorly. We address this challenge by developing a dynamic model, based on meta-learning, that adapts to data heterogeneity and generates predictions of various outcomes across records of different lengths. We compare this technique against a set of benchmarks on a publicly available ICU dataset (MIMIC-III) and demonstrate improved model performance by accounting for data heterogeneity.