An Empirical Framework for Domain Generalization in Clinical Settings
Haoran Zhang (University of Toronto, Vector Institute), Natalie Dullerud (University of Toronto, Vector Institute), Laleh Seyyed-Kalantari (University of Toronto), Quaid Morris (Memorial Sloan Kettering Cancer Center), Shalmali Joshi (Harvard University), Marzyeh Ghassemi (University of Toronto, Vector Institute for Artificial Intelligence)
Abstract: Clinical machine learning models have been found to significantly degrade in performance on hospitals or regions not seen during training. Recent developments in domain generalization offer a promising solution to this problem, by creating models that learn invariances which hold across environments. In this work, we benchmark the performance of eight domain generalization methods on clinical time series and medical imaging data. We introduce a framework to induce practical confounding and sampling bias to stress-test these methods over existing non-healthcare benchmarks. We find, consistent with prior work, that current domain generalization methods do not achieve significant gains in out-of-distribution performance over empirical risk minimization on real-world medical imaging data. However, we do find a subset of realistic confounding scenarios where significant performance gains are observed. We characterize these scenarios in detail, and recommend best practices for domain generalization in the clinical setting.