Genetical genomics, the combined analysis of genetic and gene expression data, holds great promise for elucidating gene regulation and predicting gene networks associated with complex phenotypes. In this project, the investigators aim to develop novel statistical inference procedures and computational tools for understanding, from a systems biology perspective, different gene regulation mechanisms such as cis- and trans-regulation, nonlinear regulation, and joint regulation of multiple genes. Both parametric and nonparametric inference procedures that incorporate pathway or gene set information are developed to identify novel regulators and to gain insight into the gene regulation underlying developmental diversity. Statistical tests for high-dimensional nonparametric regression, together with penalized and empirical likelihood methods for semiparametric models, are proposed, and their asymptotic distributions are evaluated. Dense SNP genotype data and next-generation RNA-Seq data are combined under the proposed framework while accounting for the discrete nature of mapped sequence reads.
The genetic basis of a complex trait often involves multiple inherited genetic factors that function together as a network. Gene regulation is thought to play a pivotal role in determining trait variation by promoting or reducing the expression of functional genes directly or indirectly related to the phenotype. Drawing on methodology from statistics, genetics and systems biology, this project contributes to knowledge about gene regulation by providing biologically meaningful statistical infrastructure: it develops novel statistical methods to handle various complexities in genetic and gene expression data (e.g., RNA-Seq) and incorporates pathway information to enhance the biological relevance of the models. The developed models are particularly well suited to addressing long-standing genetic questions regarding the interplay between gene expression and regulation. If successful, the project will advance the discovery of novel genes, regulators and pathways, facilitating the identification of drug targets to enhance public health and helping animal and plant breeders improve trait quality. Computational tools are made available for public use free of charge. The research is integrated into education to train new generations in statistical genetics, and is widely disseminated through publications, presentations, online software and collaborations with biologists.