Multi-view data (collected on the same samples from multiple sources) are increasingly common with advances in multi-omics, neuroimaging and wearable technologies. For example, wearable devices such as physical activity trackers, continuous glucose monitors and ambulatory blood pressure monitors are worn concurrently to provide measurements of distinct subject characteristics. There is enormous potential in integrating this concurrent information from distinct vantage points to better understand between-view associations and improve prediction of health outcomes. Existing tools for data integration are sensitive to outliers and are not designed for mixed data types (e.g. continuous skewed glucose measurements, zero-inflated activity counts, binary indicators of sleep/wake). The PI will develop a more robust framework for multi-view data integration that better accounts for outliers, better matches the mixed types of data actually collected, and more accurately separates common from view-specific signals. The new methods will be implemented in open-source software accompanied by reproducible workflow examples, providing immediate and easy access for other researchers. The educational component centers on the development of structured research experiences (SREs) for students. SREs enhance students’ written communication, software development and reproducible research skills, all of which are lacking in the traditional curriculum. This will improve students’ preparation for conducting research and widen their STEM employment opportunities. The involvement of students from traditionally underrepresented groups will positively impact their retention rate and will broaden the participation of underrepresented groups in STEM.

Popular dimension reduction methods, such as principal component analysis and discriminant analysis, are tailored for single-view data, and thus fail to discover coordinated multi-view signals on a global level. On the other hand, existing multi-view dimension reduction methods suffer from reliance on the Gaussianity assumption, an inability to capture joint functional signals, and a lack of theoretical guarantees. The PI will address these drawbacks by developing (i) a joint dimension reduction framework for skewed continuous, binary and zero-inflated view types; (ii) a joint dimension reduction framework for mixed functional multi-view data; and (iii) a new paradigm for simultaneous extraction of signals across views based on hierarchical low-rank constraints. This work will lead to critically needed new statistical methods for data integration, with direct relevance for researchers working with wearable monitors, microbiome and multi-omics data through the PI's interdisciplinary collaborations. The proposed structured research experiences will center on the design and reproducibility of simulation studies, and will align with the computational components of the proposed research, including direct student involvement in multiple simulation studies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 2044823
Program Officer: Pena Edsel
Project Start:
Project End:
Budget Start: 2021-06-01
Budget End: 2026-05-31
Support Year:
Fiscal Year: 2020
Total Cost: $78,917
Indirect Cost:
Name: Texas A&M University
Department:
Type:
DUNS #:
City: College Station
State: TX
Country: United States
Zip Code: 77845