Genomic Signal Processing (GSP) is the engineering discipline that studies modeling and statistical issues related to biological signals measured by high-throughput technology, such as gene-expression microarrays or protein-abundance mass spectrometry. Research in GSP typically involves the discovery of reliable molecular markers for disease diagnosis and prognosis, using pattern recognition or machine learning approaches. Such approaches rely on accurate estimation of classification and prediction error. Accuracy is particularly critical because small sample sizes are common in GSP applications. Novel, robust small-sample error estimation methodologies are needed in GSP to enable reproducible scientific discovery that leads to genuine medical advancement.

The goal of this research is to solve significant computational and statistical problems in small-sample error estimation. The open problems to be addressed are: (1) obtaining exact and approximate representations of the joint sampling distribution of the estimated and true errors for linear continuous classifiers, which will lead to better-performing error estimators and practical tools for assessing the significance of results; (2) studying error estimation for discrete classifiers, including the binary coefficient of determination (CoD), using both analytical and complete enumeration approaches; (3) developing the methodology of bolstered error estimation, addressing its application in high-dimensional spaces and with adaptive kernels, with an emphasis on feature selection; and (4) applying these error estimation techniques to biomarker discovery for diagnosis and prognosis in cancer and infectious diseases, in partnership with medical collaborators at Translational Genomics (TGen), the Johns Hopkins Medical School, and the Oswaldo Cruz Foundation, Brazil (FIOCRUZ).

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0845407
Program Officer: John Cozzens
Project Start:
Project End:
Budget Start: 2009-03-01
Budget End: 2014-02-28
Support Year:
Fiscal Year: 2008
Total Cost: $400,000
Indirect Cost:
Name: Texas Engineering Experiment Station
Department:
Type:
DUNS #:
City: College Station
State: TX
Country: United States
Zip Code: 77845