We propose to develop, evaluate and compare biologically motivated statistical methods for analyzing and interpreting microarray data, including a heart failure dataset. Some of the proposed methods will incorporate, or be applied to, multiple types of genomic or proteomic data. The overarching theme is that, to increase statistical power for new discoveries and to maximize the use of existing knowledge and data, we propose integrating gene networks with multiple types of high-throughput data, such as gene expression data, DNA-protein binding data, DNA sequences and SNP data, through novel analysis methods and applications. Specifically, we propose:
1) further developing and evaluating a network-based statistical analysis method for genomic discovery, with applications to several real datasets;
2) developing analysis strategies that integrate gene networks and gene functional annotations for genomic discovery, such as detecting differentially expressed genes from expression data and identifying the binding target genes of a single transcription factor from DNA-protein binding (i.e., ChIP-chip) data;
3) developing analysis strategies that integrate gene networks with multiple types of genomic and proteomic data, such as gene expression data, DNA-protein binding data and DNA sequences;
4) integrating gene networks into regression analysis for variable selection and parameter smoothing, with applications to inferring expression quantitative trait loci (eQTL) by regressing expression data on genotype data; and
5) developing software for free public use.
This proposed research is expected not only to advance statistical methodology and theory for complex data with complicated dependency structures, but also to contribute valuable analysis tools to the elucidation of molecular mechanisms underlying diseases.