Algorithmic fairness and the science of health disparities
Abstract: It has been shown that eliminating health disparities could avert more deaths than the number of lives saved by medical advances alone in the same time frame. Moreover, without a simultaneous focus on innovation and equity, advances in health for one group can come at the cost of added challenges for another. In this talk I will introduce the science of health disparities and juxtapose it with the machine learning subfield of algorithmic fairness. Given the key foci and principles of health equity and health disparities within public and population health, I will show examples of how machine learning and the principles of public and population health can be synergized to use data to advance the science of health disparities and the sustainable health of entire populations.
Bio: Dr. Rumi Chunara is an Associate Professor at New York University, jointly appointed at the Tandon School of Engineering (in Computer Science) and the School of Global Public Health (in Biostatistics/Epidemiology). Her PhD is from the Harvard-MIT Division of Health Sciences and Technology and her BSc from Caltech. Her research group focuses on developing computational and statistical approaches for acquiring, integrating and using data to improve population and public health. She is an MIT TR35, NSF CAREER, Bill & Melinda Gates Foundation Grand Challenges, Facebook Research and Max Planck Sabbatical award winner.
Machine Learning for Human Genetics: A Multi-Scale View on Complex Traits and Disease
Abstract: A common goal in genome-wide association (GWA) studies is to characterize the relationship between genotypic and phenotypic variation. Linear models are widely used tools in GWA analyses, in part, because they provide significance measures which detail how individual single nucleotide polymorphisms (SNPs) are statistically associated with a trait or disease of interest. However, traditional linear regression largely ignores non-additive genetic variation, and the univariate SNP-level mapping approach has been shown to be underpowered and challenging to interpret for certain trait architectures. While machine learning (ML) methods such as neural networks are well known to account for complex data structures, these same algorithms have also been criticized as “black box” since they do not naturally carry out statistical hypothesis testing like classic linear models. This limitation has prevented ML approaches from being used for association mapping tasks in GWA applications. In this talk, we present flexible and scalable classes of Bayesian feedforward models which provide interpretable probabilistic summaries, such as posterior inclusion probabilities and credible sets, which allow researchers to simultaneously perform (i) fine-mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Analyzing real data assayed in diverse self-identified human ancestries from the UK Biobank, BioBank Japan, and the PAGE consortium, we demonstrate that interpretable ML has the power to increase the return on investment in multi-ancestry biobanks. Furthermore, we highlight that by prioritizing biological mechanism we can identify associations that are robust across ancestries---suggesting that ML can play a key role in making personalized medicine a reality for all.
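To make the posterior summaries above concrete, here is a minimal sketch (not the speaker's actual models) of how posterior inclusion probabilities (PIPs) and a credible set can be read off sampled binary inclusion indicators from any sparse Bayesian regression; the indicator draws below are synthetic stand-ins.

```python
import numpy as np

def posterior_inclusion_probs(inclusion_samples):
    """Fraction of posterior draws in which each SNP is included.

    inclusion_samples: (n_draws, n_snps) binary array of sampled
    inclusion indicators from a sparse Bayesian model.
    """
    return inclusion_samples.mean(axis=0)

def credible_set(pips, level=0.95):
    """Smallest set of SNPs whose PIPs cover at least `level` of the
    total inclusion mass (one common heuristic, not the only one)."""
    order = np.argsort(pips)[::-1]
    cum = np.cumsum(pips[order]) / pips.sum()
    k = int(np.searchsorted(cum, level)) + 1
    return order[:k]

rng = np.random.default_rng(0)
# Toy posterior: SNP 2 is included in ~90% of draws, the rest rarely.
draws = (rng.random((1000, 5)) < np.array([0.05, 0.1, 0.9, 0.05, 0.1])).astype(int)
pips = posterior_inclusion_probs(draws)
cset = credible_set(pips)
```

In a real analysis the indicator draws would come from an MCMC or variational fit of the Bayesian model; the summaries themselves are just these frequency computations.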
Bio: Lorin Crawford is a Senior Researcher at Microsoft Research New England. He also holds a position as the RGSS Assistant Professor of Biostatistics at Brown University. His scientific research interests involve the development of novel and efficient computational methodologies to address complex problems in statistical genetics, cancer pharmacology, and radiomics (e.g., cancer imaging). Dr. Crawford has an extensive background in modeling massive data sets of high-throughput molecular information as it pertains to functional genomics and cellular-based biological processes. His most recent work has earned him a place on Forbes 30 Under 30 list, The Root 100 Most Influential African Americans list, and recognition as an Alfred P. Sloan Research Fellow and a David & Lucile Packard Foundation Fellowship for Science and Engineering. Before joining Brown, Dr. Crawford received his PhD from the Department of Statistical Science at Duke University and received his Bachelor of Science degree in Mathematics from Clark Atlanta University.
Understanding Heterogeneity as a Route to Understanding Health
Abstract: Machine learning presents an opportunity to understand the patient journey over high-dimensional data in the clinical context. This aligns with one of the foundational questions of machine learning for healthcare: how do you represent a patient state? Improving state representations allows us to (i) visualise and cluster deteriorating patients; (ii) understand the patient journey, and thus the heterogeneous pathways to improvement or clinical deterioration, across different data modalities; and (iii) more quickly identify situations for intervention. In this talk, I present motivating examples of understanding heterogeneity as a route towards understanding health and personalising healthcare interventions.
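As a toy illustration of clustering patient states, the sketch below runs a minimal k-means (my choice of method for illustration, not necessarily the speaker's) on synthetic "stable" versus "deteriorating" state vectors; all features and numbers are made up.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal k-means for clustering patient state vectors."""
    # Farthest-point initialisation: spreads the initial centers apart.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each patient state to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; skip a cluster if it empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Synthetic 4-dimensional "patient states": two well-separated groups.
rng = np.random.default_rng(1)
stable = rng.normal(0.0, 0.3, size=(50, 4))
deteriorating = rng.normal(2.0, 0.3, size=(50, 4))
X = np.vstack([stable, deteriorating])
labels, _ = kmeans(X, k=2)
```

Real patient-state representations would first be learned from multi-modal clinical data; the clustering step shown here is the easy part.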
Bio: Danielle Belgrave is a Senior Staff Research Scientist at DeepMind. Prior to joining DeepMind she worked in the Healthcare Intelligence group at Microsoft Research and was a tenured research fellow at Imperial College London. Her research focuses on integrating medical domain knowledge, machine learning and causal modelling frameworks to understand health. She obtained a BSc in Mathematics and Statistics from London School of Economics, an MSc in Statistics from University College London and a PhD in the area of machine learning in health applications from the University of Manchester.
Data Science against COVID-19
Abstract: In my talk, I will describe the work that I have been doing since March 2020, leading a multi-disciplinary team of 20+ volunteer scientists working very closely with the Presidency of the Valencian Government in Spain on 4 large areas: (1) human mobility modeling; (2) computational epidemiological models (metapopulation, individual-based and LSTM-based models); (3) predictive models; and (4) a large-scale online citizen survey, the COVID19impactsurvey (https://covid19impactsurvey.org), with over 700,000 responses worldwide. This survey has enabled us to shed light on the impact that the pandemic is having on people's lives. I will present the results obtained in each of these four areas, including winning the 500K XPRIZE Pandemic Response Challenge and obtaining a best paper award at ECML-PKDD 2021. I will share the lessons learned in this very special initiative of collaboration between civil society at large (through the survey), the scientific community (through the Expert Group) and a public administration (through the Commissioner at the Presidency level). For those interested in knowing more, WIRED magazine published an extensive article describing our story: https://www.wired.co.uk/article/valencia-ai-covid-data.
Bio: Nuria Oliver is Co-founder and Vice-president of ELLIS (The European Laboratory for Learning and Intelligent Systems), Co-founder and Director of the ELLIS Unit Alicante, Chief Data Scientist at Data-Pop Alliance and Chief Scientific Advisor to the Vodafone Institute. Nuria earned her PhD from MIT. She is a Fellow of the ACM, IEEE and EurAI. She is the youngest member (and fourth woman) of the Spanish Royal Academy of Engineering, and the only Spanish scientist in the SIGCHI Academy. She has over 25 years of research experience in human-centric AI and is the author of over 180 widely cited scientific articles, an inventor of 40+ patents and a frequent public speaker. Her work is regularly featured in the media and has received numerous recognitions, including the Spanish National Computer Science Award; the MIT TR100 (today TR35) Young Innovator Award, which she was the first Spanish scientist to receive; the 2020 Data Scientist of the Year award from ESRI; the 2021 King Jaume I Award in New Technologies; and the 2021 Abie Technology Leadership Award. In March 2020, she was appointed Commissioner to the President of the Valencian Government on AI Strategy and Data Science against COVID-19. In that role, she co-led ValenciaIA4COVID, the winning team of the 500K XPRIZE Pandemic Response Challenge. Their work was featured in WIRED, among other media.
Machine Learning in Public Health: are we there yet?
Abstract: Spoiler alert: no. And yes, we are much, much further away than you might think. Public health has not traditionally been a data-driven field. The good news is that this has been changing in recent years, accelerated significantly by the COVID-19 pandemic. But public health and human services organizations have many more fundamental things to worry about before we have the luxury of considering what machine learning can enable. These fundamentals include data-related facets such as electronic data capture and exchange, data quality, data governance, information technology infrastructure, and data management best practices. In addition, data literacy, workforce development, and compensation that is a fraction of what 'quants' can earn in industry are major stumbling blocks on the path toward advanced analytics in public health. At the start of the COVID-19 pandemic, many communicable diseases were still reported by fax machine and then hand-entered into a database. Although there was significant interest in predictive modeling to project future hospital capacity, even the most sophisticated models offered policy makers little beyond basic trends and observations from the front lines. The most notable exception, where AI is in fact proving useful in public health, is the use of 'robotic process automation' (RPA) as a band-aid for poorly designed systems that require mindless human intervention. These tools serve as workarounds for systems that lack interoperability, emulating human users to do the grunt work of data entry and wrangling. This talk will be a reality check from the trenches of state government on the heels of the COVID-19 pandemic.
Bio: Dr. Tenenbaum serves as the Chief Data Officer (CDO) for the North Carolina DHHS, where she oversees data strategy across the Department, enabling the use of information to inform and evaluate policy and improve the health and well-being of residents of North Carolina. Prior to taking on the role of CDO, Dr. Tenenbaum was a founding faculty member of the Division of Translational Biomedical Informatics within Duke University's Department of Biostatistics and Bioinformatics, where her research focused on informatics methods to enable precision medicine, particularly in mental health. She is also interested in ethical, legal, and social issues around big data and precision medicine. Nationally, Dr. Tenenbaum has served as Associate Editor of the Journal of Biomedical Informatics and as an elected member of the Board of Directors of the American Medical Informatics Association (AMIA). She currently serves on the Board of Scientific Counselors for the National Library of Medicine. After earning her bachelor's degree in biology from Harvard, Dr. Tenenbaum was a Program Manager at Microsoft Corporation in Redmond, WA for six years before pursuing a PhD in biomedical informatics at Stanford University. Dr. Tenenbaum is a strong promoter and advocate of young women interested in STEM (science, technology, engineering, and math) careers.
Reducing bias in machine learning systems: Understanding drivers of pain
Abstract: AI systems tend to amplify biases and disparities. When we feed them data that reflects our biases, they mimic them---from antisemitic chatbots to racially biased software. In this talk I will discuss examples of how AI can instead help us reduce biases and disparities. In particular, I will explain how we can use AI to understand why underserved populations experience higher levels of pain. This is true even after controlling for the objective severity of diseases like osteoarthritis, as graded by human physicians using medical images, which raises the possibility that underserved patients’ pain stems from factors external to the knee, such as stress. We develop a deep learning approach that measures the severity of osteoarthritis by using knee X-rays to predict patients’ experienced pain, and show that this approach dramatically reduces unexplained racial disparities in pain.
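The notion of "unexplained disparity" above can be illustrated with a toy simulation (synthetic data, not the authors' analysis): regress reported pain on a severity measure and compare mean residuals across groups. A severity measure that captures more of the true signal, as the deep learning score is reported to do, leaves a smaller unexplained between-group gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                        # 0/1 toy group labels
true_severity = rng.normal(0, 1, n) + 0.8 * group    # group 1 is sicker on average
pain = 2.0 * true_severity + rng.normal(0, 0.5, n)   # pain driven by severity

# A coarse human grade misses part of the severity signal; a richer
# image-based score (a stand-in for the deep learning model) captures more.
human_grade = true_severity + rng.normal(0, 1.0, n)
model_score = true_severity + rng.normal(0, 0.2, n)

def residual_gap(severity_measure):
    """Mean pain gap between groups after regressing out severity."""
    X = np.column_stack([np.ones(n), severity_measure])
    beta, *_ = np.linalg.lstsq(X, pain, rcond=None)
    resid = pain - X @ beta
    return resid[group == 1].mean() - resid[group == 0].mean()

gap_human = residual_gap(human_grade)   # large "unexplained" disparity
gap_model = residual_gap(model_score)   # much smaller residual gap
```

Here the entire residual gap is an artifact of measurement error in the severity grade; the study's point is that better severity measurement can likewise shrink apparent unexplained disparities.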
Bio: Jure Leskovec is an associate professor of Computer Science at Stanford University, the Chief Scientist at Pinterest, and an Investigator at the Chan Zuckerberg Biohub. He co-founded a machine learning startup, Kosei, which was later acquired by Pinterest. Leskovec's research area is machine learning and data science for complex, richly-labeled relational structures, graphs, and networks for systems at all scales, from interactions of proteins in a cell to interactions between humans in a society. Applications include commonsense reasoning, recommender systems, social network analysis, computational social science, and computational biology with an emphasis on drug discovery. This research has won several awards including a Lagrange Prize, a Microsoft Research Faculty Fellowship, an Alfred P. Sloan Fellowship, and numerous best paper and test of time awards. It has also been featured in popular press outlets such as the New York Times and the Wall Street Journal. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, earned his PhD in machine learning from Carnegie Mellon University, and completed postdoctoral training at Cornell University. You can follow him on Twitter at @jure.
Changing patient trajectory: A case study exploring implementation and deployment of clinical machine learning models
Abstract: You’ve created an awesome model that predicts with near 100 percent accuracy. Now what? In this tutorial, we will give insight into the implementation, deployment, integration, and evaluation steps that follow the building of a clinical model. Specifically, we will discuss each step in the context of informing design choices as you build a model. For example, aggressive feature selection is a necessary step toward integration, because real-time data streams of all the data points a machine learning model may consume may not be accessible or feasible. We will use our implementation and evaluation of a COVID-19 adverse event model at our institution as a representative case study. This case study will demonstrate the full lifecycle of a clinical model, how we transition from a model to affecting patient outcomes, and the socio-technical challenges to success.
Bio: Yindalon Aphinyanaphongs, MD, PhD (Predictive Analytics Team Lead) is a physician scientist in the Center for Healthcare Innovation and Delivery Science in the Department of Population Health at NYU Langone Health in New York City. Academically, he is an assistant professor and his lab focuses on novel applications of machine learning to clinical problems and the science behind successful translation of predictive models into clinical practice to drive value. Operationally, he is the Director of Operational Data Science and Machine Learning at NYU Langone Health. In this role, he leads a Predictive Analytics Unit composed of data scientists and engineers that build, evaluate, benchmark, and deploy predictive algorithms into the clinical enterprise.
Distributed Statistical Learning and Inference with Electronic Health Records Data
Abstract: The growing availability and variety of healthcare data sources have provided unique opportunities for data integration and evidence synthesis, which can potentially accelerate knowledge discovery and enable better clinical decision-making. However, many practical and technical challenges, such as data privacy, high dimensionality and heterogeneity across different datasets, remain to be addressed. In this talk, I will introduce several methods for the effective and efficient integration of electronic health records and other healthcare datasets. Specifically, we develop communication-efficient distributed algorithms for jointly analyzing multiple datasets without the need to share patient-level data. Our algorithms can account for heterogeneity across different datasets. We provide theoretical guarantees for the performance of our algorithms, and examples of implementing the algorithms in real-world clinical research networks.
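One simple instance of communication-efficient distributed estimation (a generic one-shot averaging sketch, not the speaker's specific algorithms) has each site fit a model on its own patients and share only coefficient estimates and sample sizes; no patient-level rows ever leave a site.

```python
import numpy as np

def local_fit(X, y):
    """Fit OLS locally; only the coefficients and n leave the site."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, len(y)

def combine(site_summaries):
    """One-shot, sample-size-weighted average of site estimates."""
    betas, ns = zip(*site_summaries)
    return np.average(np.array(betas), axis=0, weights=np.array(ns, float))

rng = np.random.default_rng(0)
true_beta = np.array([0.5, 1.0, -2.0])     # intercept + two covariates

summaries = []
for n in [300, 500, 800]:                  # three hospitals of different sizes
    X = rng.normal(size=(n, 2))
    y = np.column_stack([np.ones(n), X]) @ true_beta + rng.normal(0, 0.1, n)
    summaries.append(local_fit(X, y))      # only (beta, n) is communicated

beta_pooled = combine(summaries)
```

Methods of the kind described in the talk go further, e.g. by exchanging gradients or surrogate likelihood summaries to approach the accuracy of pooled analysis under heterogeneity; the sketch shows only the communication pattern.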
Bio: Dr. Duan is an Assistant Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. She received her Ph.D. in Biostatistics in May 2020 from the University of Pennsylvania. Her research interests focus on three distinct areas: methods for integrating evidence from different data sources, identifying signals from high dimensional data, and accounting for suboptimality of real-world data, such as missing data and measurement errors.
Challenges in Developing Online Learning and Experimentation Algorithms in Digital Health
Abstract: Digital health technologies provide promising ways to deliver interventions outside of clinical settings. Wearable sensors and mobile phones produce real-time data streams with information about an individual’s current health, including both internal (e.g., mood) and external (e.g., location) contexts. This tutorial discusses the algorithms underlying mobile health clinical trials. Specifically, we introduce the micro-randomized trial (MRT), an experimental design for optimizing real-time interventions. We define the causal excursion effect and discuss why this effect is often considered the primary causal effect of interest in MRT analyses. We introduce statistical methods for primary and secondary analyses of MRTs. Attendees will have access to synthetic digital health experimental data to better understand online learning and experimentation algorithms, the systems underlying real-time delivery of treatment, and their evaluation using collected data.
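The core mechanic of a micro-randomized trial, randomizing the intervention at every decision point when the participant is available and logging the randomization probability for later causal analysis, can be sketched as follows (a toy illustration; names and probabilities are made up, not from the tutorial):

```python
import random

def micro_randomize(decision_points, p=0.5, seed=0):
    """At each decision point, deliver the intervention with probability p
    if the participant is available; log everything needed for analysis."""
    rng = random.Random(seed)
    log = []
    for t, available in decision_points:
        action = int(available and rng.random() < p)
        # The randomization probability is stored so that weighted
        # estimators (e.g., for causal excursion effects) can use it later.
        log.append({"t": t, "available": available, "action": action, "prob": p})
    return log

# Five decision points in a day; participant unavailable at t=2 (e.g., driving).
points = [(0, True), (1, True), (2, False), (3, True), (4, True)]
log = micro_randomize(points, p=0.4)
```

In a deployed MRT the probability p may itself be adapted over time by an online learning algorithm, which is part of what makes the analysis methods discussed in the tutorial necessary.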
Bio: Walter Dempsey is an Assistant Professor of Biostatistics and an Assistant Research Professor in the d3lab, located in the Institute for Social Research at the University of Michigan. His research focuses on statistical methods for digital and mobile health. His current work involves three complementary research themes: (1) experimental design and data-analytic methods to inform multi-stage decision making in health; (2) statistical modeling of complex longitudinal and survival data; and (3) statistical modeling of complex relational structures such as interaction networks.
Causal Inference from Text Data
Abstract: Does increasing the dosage of a drug treatment cause adverse reactions in patients? This is a causal question: did increased drug dosage cause some patients to have an adverse reaction, or would they have had the reaction anyway due to other factors? A classical approach to studying this causal question from observational data involves applying causal inference techniques to observed measurements of all the relevant clinical variables. However, there is a growing recognition that abundant text data, such as medical records, physicians' notes, or even forum posts from online medical communities, provide a rich source of information for causal inference. In this tutorial, I'll introduce causal inference and highlight the unique challenges that high-dimensional and noisy text data pose. Then, I'll use two text applications involving online forums and consumer complaints to motivate recent approaches that extend natural language processing (NLP) methods in service of causal inference. I'll discuss some new assumptions we need to introduce to bridge the gap between noisy text data and valid causal inference. I'll conclude by summarizing open research questions at the intersection of causal inference and text analysis.
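As a minimal illustration of why adjustment matters here, the following toy sketch (synthetic data, not from the talk) has a single text-derived confounder, whether a post mentions severe symptoms, drive both treatment and outcome. Inverse-probability weighting with a propensity estimated from that text feature recovers the small true effect that the naive comparison overstates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Text-derived confounder: does the post mention severe symptoms?
severe = rng.random(n) < 0.4
# Treatment (e.g., a dosage increase) is more likely for severe cases.
treated = rng.random(n) < np.where(severe, 0.7, 0.2)
# Adverse reaction driven mostly by severity, plus a small true effect (0.05).
outcome = rng.random(n) < (0.1 + 0.3 * severe + 0.05 * treated)

# Naive difference in outcome rates is confounded by severity.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Propensity of treatment given the text feature, then IPW estimation.
p_hat = np.array([treated[severe == s].mean() for s in (False, True)])
ps = p_hat[severe.astype(int)]
ipw = (outcome * treated / ps).mean() - (outcome * ~treated / (1 - ps)).mean()
```

With real text the confounder is not a single flag; the NLP methods in the tutorial exist precisely to extract and adjust for such confounding information from high-dimensional, noisy documents.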
Bio: Dhanya Sridhar is an assistant professor at the University of Montreal and a core academic member at Mila - Quebec AI Institute. She holds a Canada CIFAR AI Chair. She was a postdoctoral researcher at Columbia University and completed her PhD at the University of California, Santa Cruz. Her research interests are at the intersection of causality and machine learning, focusing on applications to text and social network data.
'Are log scales endemic yet?' Strategies for visualizing biomedical and public health data
Abstract: Data visualization is essential for analyzing biomedical and public health data and communicating the findings to key stakeholders. However, the presence of a data visualization is not enough; the choices we make when visualizing data are equally important in establishing its understandability and impact. This tutorial will discuss strategies for visualizing data and evaluating its impact with an appropriate target audience. The aim is to build an intuition for developing and assessing visualizations by drawing on visualization theory together with examples from prior research and ongoing attempts to visualize the present pandemic.
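One reason log scales matter for epidemic curves (per the title) is that exponential growth becomes a straight line whose slope is the daily growth rate. This self-contained sketch on synthetic case counts fits that slope by least squares and derives the doubling time, the quantity a reader of a log-scale chart is implicitly estimating by eye:

```python
import math

# Toy daily case counts growing at ~8% per day (pure exponential).
days = list(range(30))
cases = [100 * math.exp(0.08 * t) for t in days]

# On a log scale this series is exactly linear; the least-squares slope
# of log(cases) against day recovers the daily growth rate.
n = len(days)
mean_t = sum(days) / n
mean_log = sum(math.log(c) for c in cases) / n
slope = sum((t - mean_t) * (math.log(c) - mean_log)
            for t, c in zip(days, cases)) / sum((t - mean_t) ** 2 for t in days)
doubling_time = math.log(2) / slope   # days for cases to double
```

On a linear axis the same series looks like a featureless hockey stick; the log transform is what turns growth rate into a directly readable visual property.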
Bio: Ana Crisan is a senior research scientist at Tableau, a Salesforce company. She conducts interdisciplinary research that integrates techniques and methods from machine learning, human-computer interaction, and data visualization. Her research focuses on the intersection of data science and data visualization, especially the ways humans can work collaboratively with ML/AI systems through visual interfaces. She completed her Ph.D. in Computer Science at the University of British Columbia under the joint supervision of Dr. Tamara Munzner and Dr. Jennifer L. Gardy. Prior to that, she was a research scientist at the British Columbia Centre for Disease Control and Decipher Biosciences, where she conducted machine learning and data visualization research toward applications in infectious disease and cancer genomics, respectively. Her research has appeared in publications of the ACM (CHI), IEEE (TVCG, CG&A), Bioinformatics, and Nature.
Human centered AI for health and wellness
This research roundtable will focus on ways that AI can be incorporated into computational system design to improve health and wellness. We will discuss the hypothesis that advancing an ecological approach to data collection will lead to human-centered AI. We will also discuss the importance of intellectual diversity in computing and how this allows us to tackle a unique set of research questions.
Social and environmental determinants of health
This research roundtable will focus on the use of emerging sources of digital data for characterising urban environmental features and exposures. For example, we will discuss monitoring of socio-economic status, housing quality, transport characteristics, and air pollution in urban areas at high spatial resolution.
Responsible AI for health
This research roundtable will focus on issues of fairness, bias, and ethics in AI applications for health. For example, medical knowledge systems can disproportionately represent a majoritized population (e.g., women with heart attacks have worse outcomes when cared for by male cardiologists), and AI models developed in the healthcare domain often exhibit bias against those who already have worse outcomes to begin with (a kind of digital divide carried over into the domain of AI for health).