A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks

Witten, Daniela; Bien, Jacob

Abstract

New advances in biomedical research have made it possible to collect multiple data "views" (for example, genetic, metabolomic, and clinical data) for a single patient. Such multi-view data promises to offer deeper insights into a patient's health and disease than would be possible if just one data view were available. However, in order to achieve this promise, new statistical methods are needed. This proposal involves developing statistical methods for the analysis of multi-view data. These methods can be used to answer the following fundamental question: do the data views contain redundant information about the observations, or does each data view contain a different set of information? The answer to this question will provide insight into the data views, as well as insight into the observations. If two data views contain redundant information about the observations, then those two data views are related to each other. Furthermore, if each data view tells the same "story" about the observations, then we can be quite confident that the story is true. The investigators will develop a unified framework for modeling multi-view data, which will then be applied in a number of settings.
In Aim 1, this framework will be applied to multi-view multivariate data (e.g. a single set of patients, with both clinical and genetic measurements), in order to determine whether a single clustering can adequately describe the patients across all data views, or whether the patients cluster separately in each data view.
In Aim 2, the framework will be applied to multi-view network data (e.g. a single set of proteins, with both binary and co-complex interactions measured), in order to determine whether the nodes belong to a single set of communities across the data views, or a separate set of communities in each data view.
In Aim 3, the framework will be applied to multi-view multivariate data in order to determine whether the observations can be embedded in a single latent space across all data views, or whether they belong to a separate latent space in each data view.
In Aims 1-3, the methods developed will be applied to the Pioneer 100 study, and to the protein interactome.
In Aim 4 (a), the availability of multiple data views will be used in order to develop a method for tuning parameter selection in unsupervised learning.
In Aim 4 (b), protein communities that were identified in Aim 2 will be validated experimentally. High-quality open source software will be developed in Aim 5. The methods developed in this proposal will be used to determine whether the findings from multiple data views are the same or different. The application of these methods to multi-view data sets, including the Pioneer 100 study and the protein interactome, will improve our understanding of human health and disease, as well as fundamental biology.
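
To make the question behind Aim 1 concrete (do patients cluster the same way in each data view?), one common descriptive check is to cluster each view separately and measure how well the two label vectors agree, for instance with the adjusted Rand index. The sketch below is an illustration only, not the framework the investigators propose; the function name and the toy labels are hypothetical, and it assumes each view has already been clustered by some method.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same observations."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two observations")

    # Count pairs of observations that fall together in each contingency
    # cell, in each cluster of view A, and in each cluster of view B.
    pairs_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both clusterings are trivial
    return (pairs_cells - expected) / (max_index - expected)


# Hypothetical cluster labels for six patients in two data views.
clinical_labels = [0, 0, 0, 1, 1, 1]
genetic_same = [1, 1, 1, 0, 0, 0]      # same grouping, different label names
genetic_different = [0, 1, 0, 1, 0, 1]  # unrelated grouping

print(adjusted_rand_index(clinical_labels, genetic_same))
print(adjusted_rand_index(clinical_labels, genetic_different))
```

A value near 1 suggests the two views tell the same "story" about the patients, while a value near or below 0 suggests the views group them differently. A descriptive score like this does not account for clustering uncertainty, which is one reason a formal modeling framework is needed.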

Public Health Relevance

Biomedical researchers often collect multiple "types" of data (e.g. clinical data and genetic data) for a single patient, in order to get a fuller picture of that patient's health or disease status than would be possible using any single data type. This proposal involves developing new statistical methods that can be used to analyze data sets that consist of multiple data types. Applying these methods will lead to new insights and better understanding of human health and disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM123993-01
Application #: 9361170
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ravichandran, Veerasamy

Project Start: 2017-08-01
Project End: 2021-06-30
Budget Start: 2017-08-01
Budget End: 2018-06-30
Support Year: 1
Fiscal Year: 2017

Institution

Name: University of Washington
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2020 R01 GM	A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks Witten, Daniela; Bien, Jacob / University of Washington
NIH 2019 R01 GM	A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks Witten, Daniela; Bien, Jacob / University of Washington
NIH 2018 R01 GM	A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks Witten, Daniela; Bien, Jacob / University of Washington
NIH 2017 R01 GM	A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks Witten, Daniela; Bien, Jacob / University of Washington
