Social media is a part of everyday life. A post about college graduation shares the joy of a significant accomplishment. Informing friends about a death in the family can be cathartic. A simple interaction with a long-lost friend can bring back whimsical childhood memories of simpler times. Given its widespread use, social media has been applied to a broad range of critical health-related tasks, including, but not limited to, the early detection of disease outbreaks, the extraction and monitoring of adverse drug reactions, the measurement of health behaviors such as smoking, and the mining of individuals' mental and physical health. The detection and treatment of mental and physical health problems may soon meet individuals on the social media platforms they already inhabit. Before these applications are integrated into decision-making processes, from public policy to personal health, it is essential to understand how these tools perform in real-world environments. Specifically, decisions must be fair across factors such as age, gender, race, ethnicity, and economic status. The main novelty of this project will be its capacity to measure the fairness of these public health monitoring systems across many underrepresented groups. Overall, if the goal is to identify and treat individuals in online spaces or make policy decisions based on social media data, we must measure the fairness of the tools. Otherwise, the unethical use of biased tools may increase health disparities for many underrepresented groups.
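To make the fairness requirement concrete, the following is a minimal Python sketch of one common way to quantify it: computing recall (true positive rate) per demographic group for a binary health classifier and reporting the gap between the best- and worst-served groups. The data, group names, and choice of metric are illustrative assumptions, not details taken from the award.

```python
# Minimal sketch (illustrative assumptions, not the award's implementation):
# quantify fairness as the gap in recall (true positive rate) between the
# best- and worst-served demographic groups for a binary health classifier.
from collections import defaultdict

def recall_by_group(records):
    """records: iterable of (group, y_true, y_pred) for a binary task."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            true_pos[group] += int(y_pred == 1)
    return {g: true_pos[g] / n for g, n in positives.items()}

# Hypothetical predictions from a health-monitoring model.
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 1, 0), ("group_b", 0, 1),
]
recalls = recall_by_group(records)
gap = max(recalls.values()) - min(recalls.values())
print(recalls)        # per-group recall
print(f"gap: {gap}")  # equal-opportunity-style disparity
```

The same bookkeeping extends to other group-wise metrics (precision, false positive rate); the point is simply that fairness auditing requires enough examples from each group for the per-group estimates to be meaningful, which motivates the synthetic-data approach described next.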

This project will introduce a novel framework for measuring the fairness of public health monitoring systems. The major challenge is that underrepresented groups rarely appear in standard datasets, or worse, do not appear at all, making fairness for these groups difficult to measure. Moreover, it is both costly and difficult to annotate data for all demographic factors of interest in a timely manner. This project aims to address this limitation in two ways. First, it will use style transfer to generate synthetic data that emulates the lexical, syntactic, and semantic characteristics of text generated by underrepresented groups. This group-specific synthetic data will be used to overcome data sparsity when measuring fairness. Second, while it is crucial to measure fairness across standard demographic factors, it is also essential to understand how tools will perform for specific communities. Therefore, style transfer methods will be expanded to generate location-specific text. The major challenge will be scaling to a large number of locations, which this project will address by taking advantage of recent advances in adversarial learning. Finally, the project will impact the broader AI community via the release of open-source software that implements the tools and techniques this award generates. Moreover, public officials will gain access to easy-to-use tools that describe how the use of individual systems can adversely impact specific communities. Most importantly, the tools will help officials make informed decisions about data generated from social media.
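As a rough illustration of the first aim, the sketch below rewrites existing labeled posts into the style of a target group and then scores a classifier on the resulting synthetic set. Both `transfer_style` and `health_classifier` are hypothetical placeholders for models the project would develop, not released components; the same conditioning idea would extend from demographic groups to locations.

```python
# Conceptual sketch only: `transfer_style` and `health_classifier` are
# hypothetical stand-ins, not components produced by this award.
def transfer_style(text: str, target_group: str) -> str:
    # A real model would rewrite `text` to match the lexical, syntactic,
    # and semantic characteristics of `target_group`; this placeholder
    # merely tags the text.
    return f"[{target_group}] {text}"

def health_classifier(text: str) -> int:
    # Placeholder binary detector (e.g., flags a self-reported symptom).
    return int("symptom" in text.lower())

# A few labeled source posts: (text, health label).
labeled_posts = [
    ("Been dealing with this symptom all week", 1),
    ("Great game last night!", 0),
]

# Back-fill each sparse group with style-transferred copies, then measure
# per-group accuracy on the synthetic evaluation sets.
for group in ("group_a", "group_b"):
    synthetic = [(transfer_style(text, group), label)
                 for text, label in labeled_posts]
    accuracy = sum(health_classifier(t) == y for t, y in synthetic) / len(synthetic)
    print(group, accuracy)
```

In a real pipeline, per-group scores computed this way would feed the fairness-gap measurement sketched earlier, letting groups that are sparse or absent in the original data still be audited.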

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1947697
Program Officer: Fay Cobb Payton
Budget Start: 2020-04-01
Budget End: 2022-03-31
Fiscal Year: 2019
Total Cost: $174,797
Name: University of Texas at San Antonio
City: San Antonio
State: TX
Country: United States
Zip Code: 78249