Causal Inference from Text Data

Dhanya Sridhar

Abstract: Does increasing the dosage of a drug treatment cause adverse reactions in patients? This is a causal question: did increased drug dosage cause some patients to have an adverse reaction, or would they have had the reaction anyway due to other factors? A classical approach to studying this causal question from observational data involves applying causal inference techniques to observed measurements of all the relevant clinical variables. However, there is a growing recognition that abundant text data, such as medical records, physicians' notes, or even forum posts from online medical communities, provide a rich source of information for causal inference. In this tutorial, I'll introduce causal inference and highlight the unique challenges that high-dimensional and noisy text data pose. Then, I'll use two text applications involving online forums and consumer complaints to motivate recent approaches that extend natural language processing (NLP) methods in service of causal inference. I'll discuss some new assumptions we need to introduce to bridge the gap between noisy text data and valid causal inference. I'll conclude by summarizing open research questions at the intersection of causal inference and text analysis.

Bio: Dhanya Sridhar is an assistant professor at the University of Montreal and a core academic member at Mila - Quebec AI Institute. She holds a Canada CIFAR AI Chair. She was a postdoctoral researcher at Columbia University and completed her PhD at the University of California, Santa Cruz. Her research interests are at the intersection of causality and machine learning, focusing on applications to text and social network data.