Alzheimer's disease (AD) is a major public health crisis and a national priority area of high significance. There is growing recognition that neurodegeneration and AD are multifactorial and may be attributed to harmful changes at multiple levels. AD research must therefore confront the challenge of elucidating disease mechanisms by leveraging big health data such as -omics data, imaging data, and electronic health records (EHRs) data. To harness the full power of such rich yet complex health data, powerful statistical and machine learning methods have been developed for risk prediction, clinical decision support, and many other important tasks. However, such data are known to contain sensitive information about individuals, and it is widely recognized that an adversary who exploits the output of these algorithms may be able to identify some individuals in a particular dataset, which presents serious privacy concerns. In addition, powerful statistical and machine learning methods can unintentionally lead to unfair outcomes for some (marginalized) populations, defined by, for example, sex, race/ethnicity, or age. While there is a growing body of literature on improving the fairness of these algorithms for people across racial, gender, and other identities, there has been little work on assessing the impact of missing data on fairness. Moreover, privacy and fairness have been under-investigated in AD research. Building on recent work on privacy and fairness, this project seeks to develop and assess methods for privacy and fairness in the analysis of big health data for AD research.
Our specific aims are as follows.
In Aim 1, we will refine the state-of-the-art Gaussian Differential Privacy (GDP) method for analysis of big health data such as -omics data, imaging data, and EHRs data using statistical and machine learning algorithms, and compare its performance with that of existing methods based on (ε, δ)-DP.
In Aim 2, we will assess the impact of missing data on biases in big health datasets and on algorithmic fairness for analysis of big health data for AD, particularly with respect to protected features such as sex and race/ethnicity.
In Aim 3, we will assess and compare the impact of existing imputation methods on algorithmic fairness for analysis of big health data for AD, particularly with respect to protected features such as sex and race/ethnicity.
This project is expected to fill significant gaps in privacy and fairness for analysis of big health data for AD research that have not been investigated before. The results generated from this study will advance methodology for privacy protection and fairness protection in statistical and machine learning for AD research.
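For orientation on the comparison named in Aim 1, the standard relationship between μ-GDP and (ε, δ)-DP can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's method: the function names (`gdp_delta`, `gaussian_mechanism`, `private_mean`), the clipping bounds, and the replace-one neighboring-dataset convention are all hypothetical choices made for the example. It relies on two well-known facts: adding Gaussian noise with standard deviation sensitivity/μ gives a μ-GDP release, and a μ-GDP mechanism is (ε, δ(ε))-DP for every ε ≥ 0, with δ(ε) = Φ(-ε/μ + μ/2) - e^ε Φ(-ε/μ - μ/2).

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(mu, eps):
    """Smallest delta such that a mu-GDP mechanism is (eps, delta)-DP."""
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


def gaussian_mechanism(value, sensitivity, mu, rng=random):
    """Release value + N(0, (sensitivity / mu)^2), which satisfies mu-GDP."""
    sigma = sensitivity / mu
    return value + rng.gauss(0.0, sigma)


def private_mean(values, lower, upper, mu, rng=random):
    """Hypothetical example: mu-GDP mean of values clipped to [lower, upper].

    Assumes neighboring datasets differ by replacing one record, so the
    clipped mean of n records has sensitivity (upper - lower) / n.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return gaussian_mechanism(sum(clipped) / n, sensitivity, mu, rng)
```

Calling `gdp_delta(mu, eps)` over a grid of ε values traces the whole family of (ε, δ) guarantees implied by a single μ, which is one way a GDP-based analysis could be placed on the same scale as an (ε, δ)-DP baseline.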

Public Health Relevance

This project is expected to fill significant gaps in privacy and fairness for analysis of big health data for AD research that have not been investigated before. The results generated from this study will advance methodology for protecting privacy and ensuring fairness in statistical and machine learning.

Agency
National Institutes of Health (NIH)
Institute
National Institute on Aging (NIA)
Type
Multi-Year Funded Research Project Grant (RF1)
Project #
3RF1AG063481-01A1S1
Application #
10130699
Study Section
Program Officer
Petanceska, Suzana
Project Start
2019-08-15
Project End
2024-03-31
Budget Start
2020-08-15
Budget End
2024-03-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104