New statistical and analytical methods will be developed to study the regulatory role of histone modifications in Saccharomyces cerevisiae. Gene activities in eukaryotic cells are regulated in concert by transcription factors and chromatin structure. The basic repeating unit of chromatin is the nucleosome, built around an octamer containing two copies each of the four core histone proteins. While nucleosome occupancy in promoter regions typically occludes transcription factor binding, thereby repressing global gene expression, the role of histone modifications is more complex. Histone tails can be modified in various ways, including acetylation, methylation, phosphorylation, and ubiquitination. Even the regulatory role of histone acetylation, the best characterized modification to date, is still not fully understood. Mass spectral and genome-wide microarray data from Saccharomyces cerevisiae have offered new opportunities for investigators to evaluate the regulatory effects of histone modifications. The investigators will develop statistical methods for identifying the target genes of histone modifications and the DNA sequence features associated with those modifications. The investigators will also develop computational and statistical methods for predicting histone modifications and their interactions.

Experimental data are noisy and high dimensional, which renders many traditional statistical methods ineffective. Building prediction models with only a small set of informative variables adds another layer of complexity. New statistical methods will be developed to surmount these challenges. The proposed methods lead to a statistical framework for integrating multiple types of proteomic and genomic data. A complete framework for such integration has not been developed and tested in the statistics and computational biology literature. The proposed methods can produce innovative methodologies for analyzing very large amounts of heterogeneous data, suggest new lines of quantitative investigation in systems biology, and offer opportunities for students to participate in interdisciplinary research.

Project Report

Gene activities in eukaryotic cells are regulated in concert by special proteins, called transcription factors, and by chromatin structure. Extensive studies have been done to understand how various transcription factors (TFs) are recruited to activate or repress gene expression in response to different environmental conditions. However, research probing the regulatory role of chromatin is still very limited. The basic repeating unit of chromatin is the nucleosome. It consists of 146 base pairs of DNA wrapped around an octamer of histone proteins: two each of the histones H2A, H2B, H3, and H4. The N-terminal tails of the four core histones are highly conserved in sequence. Each tail is subject to several types of covalent modifications, including acetylation, methylation, phosphorylation, and ubiquitination. Of these, acetylation and methylation have been studied most extensively. The enzymes responsible for adding acetyl groups, i.e., histone acetyltransferases (HATs), or removing them, i.e., histone deacetylases (HDACs), were first identified in the 1990s. Since then, approximately 10 HATs and 5 HDACs have been identified in the yeast genome. Most of these enzymes have orthologs in higher eukaryotes, including humans. Chromatin plays important roles in diverse biological processes, including gene regulation, DNA replication, and DNA repair. The regulatory role of histone modifications has been the subject of much recent interest. However, the functions of specific modifications are not well understood. In particular, the histone code hypothesis claims that different combinations of residue modifications have distinct functions. Recently, high-resolution mass spectrometry, genome-wide microarrays, and RNA-seq for the detection and localization of histone modifications have become available, offering researchers a great opportunity to delineate the regulatory role of histone modifications.
Our work built computational models to estimate the effect of histone modifications on TF binding and gene expression in yeast. It provides a unique perspective for testing the histone code hypothesis by effectively integrating information from sequence, gene expression, histone modification, and nucleosome data. For example, the diagram shows one computational method we developed, called MotifExpress, for discovering transcription factor binding patterns. Unlike existing methods, which either use only DNA sequence information or integrate sequence information with a single-sample measurement of gene expression, MotifExpress integrates DNA sequence information with gene expression measured in multiple samples. By selecting transcription factor binding patterns that are significantly associated with gene expression, we can identify the binding patterns that are active under specific experimental conditions and thus provide clues for the construction of regulatory networks. Compared with existing methods, MotifExpress substantially reduces the number of spurious results. Statistically, MotifExpress uses a penalized multivariate regression approach with a composite absolute penalty, which is highly stable and can effectively find the globally optimal set of active motifs. We demonstrate the excellent performance of MotifExpress by applying it to synthetic data and to real examples from Saccharomyces cerevisiae. The software is available at http://publish.illinois.edu/pingma/motifexpress/. Such computational frameworks have also been applied in many other research fields to help scientists make new discoveries.
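
The idea of selecting motifs through a penalized multivariate regression can be illustrated with a small sketch. This is not the MotifExpress implementation: it assumes a row-wise group penalty (an L2 norm on each motif's coefficients across samples, summed over motifs), which is only one simple member of the composite absolute penalty family, and it uses a plain proximal gradient solver chosen here for brevity. The function names, the data layout (genes by motif scores, genes by samples), and the synthetic data are all hypothetical.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def group_soft_threshold(row, t):
    """Shrink a coefficient row; zero it entirely if its L2 norm is <= t."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= t:
        return [0.0] * len(row)
    return [(1.0 - t / norm) * v for v in row]


def fit_row_sparse(X, Y, lam, n_iter=500):
    """Proximal gradient for (1/2n)||Y - XB||_F^2 + lam * sum_j ||B[j]||_2.

    X: n genes x p motif scores; Y: n genes x q samples (assumed layout).
    Returns B (p x q); an all-zero row means that motif was not selected.
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    Xt = [list(col) for col in zip(*X)]
    # Conservative step size: trace(X^T X)/n bounds the largest eigenvalue.
    step = n / sum(v * v for row in X for v in row)
    B = [[0.0] * q for _ in range(p)]
    for _ in range(n_iter):
        fitted = matmul(X, B)
        resid = [[f - y for f, y in zip(fr, yr)] for fr, yr in zip(fitted, Y)]
        grad = matmul(Xt, resid)  # p x q, not yet divided by n
        B = [
            group_soft_threshold(
                [b - step * g / n for b, g in zip(b_row, g_row)], step * lam
            )
            for b_row, g_row in zip(B, grad)
        ]
    return B


if __name__ == "__main__":
    random.seed(0)
    n, p, q = 60, 8, 4
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Synthetic truth: only motifs 1 and 5 influence expression.
    true_B = [[0.0] * q for _ in range(p)]
    true_B[1] = [1.5, -1.0, 0.8, 1.2]
    true_B[5] = [-1.0, 1.3, 0.9, -0.7]
    Y = [[s + random.gauss(0, 0.3) for s in row] for row in matmul(X, true_B)]
    B = fit_row_sparse(X, Y, lam=0.3)
    selected = [j for j, row in enumerate(B) if any(v != 0.0 for v in row)]
    print("selected motifs:", selected)
```

Because the penalty acts on a motif's whole coefficient row, a motif is either kept for all samples or dropped for all of them, which is how using multiple samples jointly can suppress motifs that look relevant in only one noisy measurement. The penalty weight `lam` is a tuning parameter and would normally be chosen by cross-validation or a similar criterion.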

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0800631
Program Officer: Mary Ann Horn
Project Start:
Project End:
Budget Start: 2008-06-15
Budget End: 2013-05-31
Support Year:
Fiscal Year: 2008
Total Cost: $595,000
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820