We propose developing, evaluating and comparing statistical methods in analyzing and interpreting microarray data, including a heart failure dataset collected in the co-Principal Investigator's lab. Some of the proposed methods will incorporate or be applied to other types of genomic or proteomic data.
In Aim A.1, we consider detecting differential gene expression. A weighted permutation scheme is proposed to improve permutation-based inference procedures, and these methods will be compared with several recently proposed parametric and semi-parametric methods. We also propose incorporating existing biological data in the statistical methods.
In Aim A.2, we study a clustering-based classification (CBC) method for gene function prediction using microarray data. CBC will be compared with other state-of-the-art supervised machine learning algorithms, such as support vector machines and random forests. Other sources of biological data, such as protein-protein interaction data, will be incorporated in the proposed method.
In Aim A.3, we consider sample classification and prediction based on gene expression profiles in a general framework called penalized partial least squares (PPLS). PPLS will be compared with other supervised machine learning algorithms. We will extend PPLS to combine microarray data from multiple studies. We plan to implement the proposed statistical methods in R and make the software publicly and freely available.
Showing the most recent 10 out of 69 publications