Evidence is accumulating that different combinations of histone modifications confer different functional specificities. Identifying histone modification patterns and linking them to functional elements of the genome is therefore of great interest in epigenetics. High-throughput experimental techniques, such as ChIP-chip and ChIP-Seq, have produced a wealth of histone modification data. However, current experimental and computational methods have explored these data only to a very limited extent. The long-term objective of this project is to develop novel statistical methods for sparse structure identification from histone modification data. Imposing sparsity is well suited to handling extremely high-dimensional data that combine noisy measurements with small sample sizes.
Four specific aims are proposed: (1) identification of new functional sites on the genome; (2) accurate discrimination between different regulatory elements; (3) identification of interactions between histone modifications in regulation; and (4) discovery of the DNA motifs predictive of the chromatin signature. Novel sparse statistical methods will be developed to achieve these aims: a high-dimensional clustering method combined with variable selection, a classification method featuring dimension reduction based on sparse covariance estimation, a joint estimation of graphical models for multiple functional elements, and a multi-response, multi-predictor regression method. This project will be conducted as a collaboration between two statisticians and a biochemist. The proposed methods will be validated on and applied to both published datasets and those provided by the epigenome roadmap project, in which one of the PIs is involved.
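As a rough illustration of what "sparse covariance estimation" can mean in practice, the sketch below soft-thresholds the off-diagonal entries of a sample covariance matrix so that weak, noise-level associations become exactly zero. This is only one standard way to impose sparsity and is not necessarily the estimator the project will develop. The simulated "marks", the function names, and the threshold `lam` are all hypothetical choices made for this example.

```python
import random

def sample_covariance(rows):
    """Unbiased sample covariance of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            s = sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows)
            cov[j][k] = cov[k][j] = s / (n - 1)
    return cov

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparse_covariance(rows, lam):
    """Soft-threshold the off-diagonal entries; leave the variances untouched."""
    cov = sample_covariance(rows)
    p = len(cov)
    return [[cov[j][k] if j == k else soft_threshold(cov[j][k], lam)
             for k in range(p)] for j in range(p)]

def simulate_marks(n, seed=0):
    """Hypothetical toy data: 4 'marks'; only marks 0 and 1 share a signal."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        rows.append([shared + 0.3 * rng.gauss(0, 1),
                     shared + 0.3 * rng.gauss(0, 1),
                     rng.gauss(0, 1),
                     rng.gauss(0, 1)])
    return rows

rows = simulate_marks(n=500)
sigma = sparse_covariance(rows, lam=0.3)  # lam is an arbitrary illustrative value
for row in sigma:
    print(["%.2f" % v for v in row])
```

In this toy setting only the entry linking marks 0 and 1 should survive thresholding, which is the kind of sparse structure the aims above seek to recover at much larger scale. In real use the threshold would be chosen by a data-driven criterion such as cross-validation rather than fixed by hand.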
Epigenetic modifications such as histone modifications play critical roles in regulating gene expression, and aberrant epigenetic modifications have been observed in many diseases. A statistically rigorous characterization and understanding of such modifications can greatly facilitate the development of new therapeutics.
Won, Kyoung-Jae; Zhang, Xian; Wang, Tao et al. (2013) Comparative annotation of functional regions in the human genome using epigenomic data. Nucleic Acids Res 41:4423-32
Huang, Shuai; Li, Jing; Ye, Jieping et al. (2013) A sparse structure learning algorithm for Gaussian Bayesian Network identification from high-dimensional data. IEEE Trans Pattern Anal Mach Intell 35:1328-42