Tutorials from CHIL 2022

Changing patient trajectory: A case study exploring implementation and deployment of clinical machine learning models

Yindalon Aphinyanaphongs

Abstract: You’ve created an awesome model that predicts with near 100 percent accuracy. Now what? In this tutorial, we will give insight into the implementation, deployment, integration, and evaluation steps that follow the building of a clinical model. Specifically, we will discuss how each step should inform the design choices you make as you build a model. For example, aggressive feature selection is a necessary step toward integration, because real-time streams of every data point a machine learning model might consume may not be accessible or feasible. We will use our implementation and evaluation of a Covid-19 adverse event model at our institution as a representative case study. This case study will demonstrate the full lifecycle of a clinical model, how we move from a model to affecting patient outcomes, and the socio-technical challenges that stand in the way of success.

Bio: Yindalon Aphinyanaphongs, MD, PhD (Predictive Analytics Team Lead) is a physician-scientist in the Center for Healthcare Innovation and Delivery Science in the Department of Population Health at NYU Langone Health in New York City. Academically, he is an assistant professor, and his lab focuses on novel applications of machine learning to clinical problems and the science behind successful translation of predictive models into clinical practice to drive value. Operationally, he is the Director of Operational Data Science and Machine Learning at NYU Langone Health. In this role, he leads a Predictive Analytics Unit composed of data scientists and engineers who build, evaluate, benchmark, and deploy predictive algorithms into the clinical enterprise.

Distributed Statistical Learning and Inference with Electronic Health Records Data

Rui Duan

Abstract: The growing availability and variety of healthcare data sources has provided unique opportunities for data integration and evidence synthesis, which can potentially accelerate knowledge discovery and enable better clinical decision-making. However, many practical and technical challenges, such as data privacy, high dimensionality, and heterogeneity across different datasets, remain to be addressed. In this talk, I will introduce several methods for the effective and efficient integration of electronic health records and other healthcare datasets. Specifically, we develop communication-efficient distributed algorithms for jointly analyzing multiple datasets without the need to share patient-level data. Our algorithms can account for heterogeneity across different datasets. We provide theoretical guarantees for the performance of our algorithms, along with examples of applying the algorithms in real-world clinical research networks.

Bio: Dr. Duan is an Assistant Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. She received her Ph.D. in Biostatistics in May 2020 from the University of Pennsylvania. Her research interests focus on three distinct areas: methods for integrating evidence from different data sources, identifying signals from high-dimensional data, and accounting for suboptimality of real-world data, such as missing data and measurement errors.

Challenges in Developing Online Learning and Experimentation Algorithms in Digital Health

Walter Dempsey

Abstract: Digital health technologies provide promising ways to deliver interventions outside of clinical settings. Wearable sensors and mobile phones supply real-time data streams that carry information about an individual’s current health, including both internal (e.g., mood) and external (e.g., location) contexts. This tutorial discusses the algorithms underlying mobile health clinical trials. Specifically, we introduce the micro-randomized trial (MRT), an experimental design for optimizing real-time interventions. We define the causal excursion effect and discuss reasons why this effect is often considered the primary causal effect of interest in MRT analysis. We introduce statistical methods for primary and secondary analyses of MRTs. Attendees will have access to synthetic digital health experimental data to better understand online learning and experimentation algorithms, the systems underlying real-time delivery of treatment, and their evaluation using collected data.

Bio: Walter Dempsey is an Assistant Professor of Biostatistics and an Assistant Research Professor in the d3lab located in the Institute for Social Research. His research focuses on statistical methods for digital and mobile health. His current work involves three complementary research themes: (1) experimental design and data analytic methods to inform multi-stage decision making in health; (2) statistical modeling of complex longitudinal and survival data; and (3) statistical modeling of complex relational structures such as interaction networks.

Causal Inference from Text Data

Dhanya Sridhar

Abstract: Does increasing the dosage of a drug treatment cause adverse reactions in patients? This is a causal question: did increased drug dosage cause some patients to have an adverse reaction, or would they have had the reaction anyway due to other factors? A classical approach to studying this causal question from observational data involves applying causal inference techniques to observed measurements of all the relevant clinical variables. However, there is a growing recognition that abundant text data, such as medical records, physicians' notes, or even forum posts from online medical communities, provide a rich source of information for causal inference. In this tutorial, I'll introduce causal inference and highlight the unique challenges that high-dimensional and noisy text data pose. Then, I'll use two text applications involving online forums and consumer complaints to motivate recent approaches that extend natural language processing (NLP) methods in service of causal inference. I'll discuss some new assumptions we need to introduce to bridge the gap between noisy text data and valid causal inference. I'll conclude by summarizing open research questions at the intersection of causal inference and text analysis.

Bio: Dhanya Sridhar is an assistant professor at the University of Montreal and a core academic member at Mila - Quebec AI Institute. She holds a Canada CIFAR AI Chair. She was a postdoctoral researcher at Columbia University and completed her PhD at the University of California, Santa Cruz. Her research interests are at the intersection of causality and machine learning, focusing on applications to text and social network data.

'Are log scales endemic yet?' Strategies for visualizing biomedical and public health data

Ana Crisan

Abstract: Data visualization is essential for analyzing biomedical and public health data and communicating the findings to key stakeholders. However, the presence of a data visualization is not enough; the choices we make when visualizing data are equally important in establishing its understandability and impact. This tutorial will discuss strategies for visualizing data and evaluating its impact with an appropriate target audience. The aim is to build an intuition for developing and assessing visualizations by drawing on visualization theories together with examples from prior research and ongoing attempts to visualize the present pandemic.

Bio: Ana Crisan is currently a senior research scientist at Tableau, a Salesforce company. She conducts interdisciplinary research that integrates techniques and methods from machine learning, human-computer interaction, and data visualization. Her research focuses on the intersection of data science and data visualization, especially on the ways humans can work collaboratively with ML/AI systems through visual interfaces. She completed her Ph.D. in Computer Science at the University of British Columbia, under the joint supervision of Dr. Tamara Munzner and Dr. Jennifer L. Gardy. Prior to that, she was a research scientist at the British Columbia Centre for Disease Control and Decipher Biosciences, where she conducted machine learning and data visualization research toward applications in infectious disease and cancer genomics, respectively. Her research has appeared in publications of the ACM (CHI), IEEE (TVCG, CG&A), Bioinformatics, and Nature.