In medical research, a growing number of high-content platforms and technologies are used to measure diverse but related information. Examples include sequencing of the genome, epigenome, transcriptome and translatome, metabolite profiling, and imaging modalities. Moreover, data from the same high-content platform are often measured over multiple dimensions, such as multiple tissues, body regions, or developmental time points. We refer to data measured over multiple platforms or technologies as multi-source, and data measured over multiple dimensions as multi-way. Many modern biomedical studies collect data that are both multi-source and multi-way, meaning multi-way data are collected from multiple platforms. Multi-source multi-way data have enormous potential to capture and synthesize every facet of a complex biological system. However, little methodology has been developed to date for fully integrative analysis of such data. We will focus on developing methods to identify biomarkers for a clinical outcome from multi-source multi-way data. Biomarkers are often used as surrogates for disease progression or as endpoints for clinical trials, and so their precision in capturing a given medical phenomenon is crucial. We propose to develop new composite biomarker methods that identify patterns across multiple sources of data, and multiple dimensions, that are associated with a clinical outcome. Our central hypothesis is that a fully integrated and multivariate approach will yield more precise biomarkers and simplify their interpretation. The novel product of this project will be a suite of methods extending common biomarker tasks to the multi-source multi-way context, including dimension reduction (Aim 1a), missing value imputation (Aim 1b), high-dimensional prediction (Aim 2) and dependent hypothesis testing (Aim 3).
This work is motivated by our involvement in several ongoing collaborative translational projects with rich multi-source multi-way data, including biomarker discovery for the development of lung cancer in chronic obstructive pulmonary disease patients, for the progression of neurodegenerative disorders such as Friedreich's Ataxia, and for brain iron deficiency in infants. We will apply and rigorously assess our multi-source multi-way approaches on these applications. All methods will be implemented in free, open-source and easily accessible software to facilitate their use by other researchers and practitioners.
Biomarkers, which are measurable features that indicate a medical phenomenon, are very useful as surrogates of disease progression and for better understanding the causal mechanisms of a disease. We will develop methods that yield more accurate, powerful and interpretable biomarkers to describe complex biological systems by identifying patterns across multiple sources of data, and multiple tissues and body regions, that are associated with a clinical outcome.