A Data-Driven Approach to Estimating Infectious Disease Transmission from Graphs: A Case of Class Imbalance Driven Low Homophily

Jeeheh Oh (University of Michigan, Ann Arbor); Jenna Wiens (University of Michigan)

Abstract: We explore the application of graph neural networks (GNNs) to the problem of estimating exposure to an infectious pathogen and the probability of transmission. Specifically, given a dataset in which a subset of patients are known to be infected, along with information in the form of a graph about who has interacted with whom, we aim to directly estimate transmission dynamics, i.e., what types of interactions (e.g., length and number) lead to transmission events. While GNNs have proven capable of learning meaningful representations from graph data, they commonly assume tasks with high homophily (i.e., nodes that share an edge look similar). Recently, researchers have proposed techniques for addressing problems with low homophily (e.g., adding residual connections to GNNs). In our problem setting, homophily is high on average because the majority of patients do not become infected, but it remains low with respect to the minority class. In this paper, we characterize this setting as particularly challenging for GNNs. Given the asymmetry in homophily between classes, we hypothesize that solutions designed to address low homophily on average will not suffice, and we instead propose a solution based on attention. Applying GNNs to both real-world and synthetic network data, we test this hypothesis and explore their ability to learn complex transmission dynamics directly from network data. Overall, attention proves to be an effective mechanism for addressing low homophily in the minority class (AUROC with 95% CI: GCN 0.684 (0.659, 0.710) vs. GAT 0.715 (0.688, 0.742)), and such a data-driven approach can outperform approaches based on potentially flawed expert knowledge.
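
The asymmetry in homophily described above can be illustrated with a small sketch. The toy graph, the labels, and the two helper functions below are hypothetical and are not taken from the paper's data or code; the per-class measure used here (the fraction of a class's edge endpoints whose neighbor shares that class) is one reasonable definition among several, assumed only for illustration.

```python
from collections import defaultdict

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def class_homophily(edges, labels):
    """Per class: fraction of that class's edge endpoints whose
    neighbor carries the same label (an assumed definition)."""
    same = defaultdict(int)
    total = defaultdict(int)
    for u, v in edges:
        # Count each undirected edge once from each endpoint's view.
        for a, b in ((u, v), (v, u)):
            total[labels[a]] += 1
            same[labels[a]] += int(labels[a] == labels[b])
    return {c: same[c] / total[c] for c in total}

# Hypothetical toy network: 10 patients, 2 infected (label 1).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
edges = [
    (0, 2), (0, 3), (1, 4), (1, 5), (0, 1),   # edges touching infected patients
    (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),   # edges among uninfected patients
    (7, 8), (8, 9), (2, 6), (3, 7), (4, 8), (5, 9),
]

overall = edge_homophily(edges, labels)
per_class = class_homophily(edges, labels)
print(f"overall: {overall:.3f}")
print(f"uninfected (majority): {per_class[0]:.3f}")
print(f"infected (minority): {per_class[1]:.3f}")
```

In this toy example, overall homophily is dominated by edges between uninfected patients, so it stays high even though most neighbors of an infected patient are uninfected. This is the kind of imbalance-driven gap that an average homophily score hides.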