This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

Intellectual Merit: The research objective of this proposal is to develop a new regularization framework for effectively identifying molecular signatures under four practical challenges: (a) small-n, large-p data; (b) sparse signatures; (c) structured features; and (d) noisy features. Molecular profiling usually collects a massive number of features (large p) for only a small number of biological individuals (small n), so identifying the underlying sparse signatures amounts to finding a very few needles in a haystack. A regularized orthogonal-components regression framework is proposed to address these challenges through effective dimension reduction and, thus, signature identification.

Traditional unsupervised dimension reduction can be used to exclude many features from constructed sparse signatures, but the false discovery rate (FDR) can be very high. Available supervised dimension reduction, on the other hand, ignores the sparse nature of the underlying signatures. Furthermore, all of these methods assume that the feature values are accurately measured, and they do not incorporate the functional relatedness of candidate features. As a result, despite years of searching, only a handful of predictive biomarkers have advanced to general clinical practice. Clearly, more effective approaches are needed to realize the true potential of predictive molecular signatures.

The main idea behind the proposed approach is to construct properly regularized orthogonal components using generalized thresholding estimators, which can be implemented by a Bayesian approach. Such a Bayesian implementation will provide a flexible structure for realizing different types of regularization. For example, a sparsity-oriented regularization (SORE) will provide sparse signatures, while a locality-oriented regularization (LORE) will be able to incorporate the structure of features (e.g., functional relatedness). Indeed, even the implementation with a simple SORE has some ability to identify signatures with clustered features. Furthermore, the proposed framework naturally integrates collinear or nearly collinear features. These properties will make it possible to mitigate the effects of measurement errors in observed feature values.

Preliminary studies show that this new framework, even implemented with a very simple SORE, provides a clear and significant benefit to the general task of variable selection in the large-p, small-n paradigm with clustered features. A simulation study demonstrated that it can reduce the loss by as much as 79% and decrease the FDR from 41% to 9%. The study confirmed the utility of the new method in molecular signature identification, indicating enormous promise for its use in early disease detection, assessment of prognosis, measurement of drug efficacy, and, eventually, personalized medicine. The full potential of the new framework, however, lies in providing breakthrough solutions for structured noisy features.
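To make the thresholding idea concrete, the following is a minimal sketch only, not the framework proposed in this award. It assumes plain soft thresholding as a stand-in for the generalized thresholding estimators, builds just a single component direction (no Bayesian implementation, no orthogonalization against further components), and computes the false discovery proportion of a selected feature set. All function names (`soft_threshold`, `sparse_component`, `false_discovery_proportion`) and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(w, lam):
    """Shrink each entry of w toward zero by lam; entries with |w| <= lam become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def sparse_component(X, y, lam):
    """Return one sparse unit-length direction (assumed stand-in for a SORE-type step).

    The direction is the vector of feature-response covariances,
    soft-thresholded at lam and then normalized.
    """
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the response
    w = Xc.T @ yc / len(y)           # covariance of each feature with y
    w = soft_threshold(w, lam)       # zero out weakly related features
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def false_discovery_proportion(selected, truth):
    """Fraction of selected features that are not in the true signature."""
    selected, truth = set(selected), set(truth)
    if not selected:
        return 0.0
    return len(selected - truth) / len(selected)


if __name__ == "__main__":
    # Simulated small-n, large-p data (illustrative values only).
    rng = np.random.default_rng(0)
    n, p, true_features = 20, 200, [0, 1, 2, 3, 4]
    X = rng.normal(size=(n, p))
    y = X[:, true_features].sum(axis=1) + rng.normal(scale=0.5, size=n)

    w = sparse_component(X, y, lam=0.6)
    selected = np.flatnonzero(w)
    print("selected features:", selected.tolist())
    print("false discovery proportion:",
          false_discovery_proportion(selected.tolist(), true_features))
```

Raising `lam` selects fewer features and typically lowers the false discovery proportion at the cost of missing weaker true features; the value used here is arbitrary.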

Broader Impacts: Molecular signatures are crucial for understanding complex biological systems. With the new regularization framework, signature identification methodologies will be developed with applications to gene expression profiling, expression quantitative trait loci mapping, genome-wide association studies, and comparative metabolomics. This project will involve and train interdisciplinary graduate students in analyzing high-dimensional data. The research results will be disseminated through cceHUB, the web server of the multi-institutional Cancer Care Engineering project, to benefit research both inside and outside this community.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0844945
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2009-06-01
Budget End: 2014-05-31
Support Year:
Fiscal Year: 2008
Total Cost: $433,291
Indirect Cost:

Name: Purdue University
Department:
Type:
DUNS #:
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907