Although numerous genomes, including the human genome, have been completely sequenced, the specific function of most of the DNA remains unknown. Identifying all the functional components of genomes has become an important goal of the NIH (e.g., via the ENCODE and modENCODE initiatives). A significant fraction of this DNA is believed to be involved in regulating gene expression, a fundamental process that plays key roles in both normal development and disease. A basic unit for gene regulation is the cis-regulatory module (CRM; often referred to as an "enhancer"), but identification of these modules on a genomic scale has proven difficult. For the most part, computational methods for CRM discovery have been effective only in those situations where there is already an extensive body of knowledge about the transcription factors that bind to the CRMs, and the sequences (motifs) to which they bind. In this proposal, we develop novel computational tools for CRM discovery. In particular, we depart from current approaches to CRM discovery by developing algorithms that do not rely on prior knowledge of transcription factor binding motifs. By doing so, we are able to identify CRMs even in less well-studied biological contexts where prior knowledge is minimal or lacking. We then expand upon this approach by additionally developing methods that utilize partial prior knowledge of CRMs known to be involved in a particular biological process. We will combine our new methods with promising existing approaches to generate a computational pipeline that uses complementary strategies for sensitive and specific CRM discovery, and conduct extensive prediction of CRMs that function in many tissues and cell types.
We will take advantage of the powerful genomic and experimental resources available for the model organism Drosophila melanogaster to subject all of our methods to validation both in silico and in vivo, using a large body of existing CRM data that we have compiled and extensive empirical testing in transgenic animals, respectively. The methods we develop here will be instrumental in helping to identify an important class of genomic functional element, the cis-regulatory module, in any metazoan genome. cis-Regulatory modules (CRMs) are key mediators of normal phenotypic variation, drivers of evolutionary change, and causes of birth defects as well as chronic and acute disease. Identifying CRMs genome-wide is an important first step toward comprehending both normal and pathological aspects of gene regulation and gene function, with broad implications for understanding disease, predicting disease risk, and preventing and curing disease.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM085233-05
Application #: 8303253
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sledjeski, Darren D
Project Start: 2008-08-01
Project End: 2014-07-31
Budget Start: 2012-08-01
Budget End: 2014-07-31
Support Year: 5
Fiscal Year: 2012
Total Cost: $330,559
Indirect Cost: $60,687

Name: University of Illinois Urbana-Champaign
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 041544081
City: Champaign
State: IL
Country: United States
Zip Code: 61820

Blatti, Charles; Sinha, Saurabh (2014) Motif enrichment tool. Nucleic Acids Res 42:W20-5
Samee, Md Abul Hassan; Sinha, Saurabh (2014) Quantitative modeling of a gene's expression from its intergenic sequence. PLoS Comput Biol 10:e1003467
Atkinson, Taylor J; Halfon, Marc S (2014) Regulation of gene expression in the genomic context. Comput Struct Biotechnol J 9:e201401001
Duque, Thyago; Samee, Md Abul Hassan; Kazemian, Majid et al. (2014) Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 31:184-200
Cheng, Qiong; Kazemian, Majid; Pham, Hannah et al. (2013) Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 9:e1003571
Samee, Abul Hassan; Sinha, Saurabh (2013) Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data. Methods 62:79-90
Suleimenov, Yerzhan; Ay, Ahmet; Samee, Md Abul Hassan et al. (2013) Global parameter estimation for thermodynamic models of transcriptional regulation. Methods 62:99-108
Kazemian, Majid; Pham, Hannah; Wolfe, Scot A et al. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res 41:8237-52
Ament, Seth A; Wang, Ying; Chen, Chieh-Chun et al. (2012) The transcription factor ultraspiracle influences honey bee social behavior and behavior-related gene expression. PLoS Genet 8:e1002596
He, Xin; Duque, Thyago S P C; Sinha, Saurabh (2012) Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 29:1059-70

Showing the most recent 10 out of 18 publications