The broad, long-term objective of this project concerns the development of novel statistical methods and computational tools for statistical and probabilistic modeling of genomic data motivated by important biological questions and experiments.
The specific aim of the current project is to develop new statistical models and methods for analysis of genomic data with graphical structures, focusing on methods for analyzing genetic pathways and networks, including the development of nonparametric pathway-smooth tests for two-sample and analysis of variance problems for identifying pathways with perturbed activity between two or multiple experimental conditions, the development of group Lasso and group threshold gradient descent regularized estimation procedures for the pathway-smoothed generalized linear models, Cox proportional hazards models and the accelerated failure time models in order to identify pathways that are related to various clinical phenotypes. These methods hinge on novel integration of spectral graph theory, non-parametric methods for analysis of multivariate data and regularized estimation methods fro statistical learning. The new methods can be applied to different types of genomic data and will ideally facilitate the identification of genes and biological pathways underlying various complex human diseases and complex biological processes. The project will also investigate the robustness, power and efficiencies o these methods and compare them with existing methods. In addition, this project will develop practical a feasible computer programs in order to implement the proposed methods, to evaluate the performance o these methods through application to real data on microarray gene expression studies of human hear failure, cardiac allograft rejection and neuroblastoma. The work proposed here will contribute both statistical methodology to modeling genomic data with graphical structures, to studying complex phenotypes and biological systems and methods for high-dimensional data analysis, and offer insight into each of the clinical areas represented by the various data sets to evaluate these new methods. All programs developed under this grant and detailed documentation will be made available free-of-charge to interested researchers via the World Wide Web.
Showing the most recent 10 out of 63 publications