Although numerous genomes, including the human genome, have been completely sequenced, the specific function of most of the DNA remains unknown. Identifying all the functional components of genomes has become an important goal of the NIH (e.g., via the ENCODE and modENCODE initiatives). A significant fraction of this DNA is believed to be involved in regulating gene expression, a fundamental process that plays key roles in both normal development and disease. A basic unit of gene regulation is the cis-regulatory module (CRM; often referred to as an "enhancer"), but identifying these modules on a genomic scale has proven difficult. For the most part, computational methods for CRM discovery have been effective only where there is already an extensive body of knowledge about the transcription factors that bind to the CRMs and the sequences (motifs) to which they bind. In this proposal, we develop novel computational tools for CRM discovery. In particular, we depart from current approaches by developing algorithms that do not rely on prior knowledge of transcription factor binding motifs. By doing so, we are able to identify CRMs even in less well-studied biological contexts where prior knowledge is minimal or lacking. We then expand upon this approach by developing methods that use partial prior knowledge of CRMs known to be involved in a particular biological process. We will combine our new methods with promising existing approaches to generate a computational pipeline that uses complementary strategies for sensitive and specific CRM discovery, and will conduct extensive prediction of CRMs that function in many tissues and cell types.
We will take advantage of the powerful genomic and experimental resources available for the model organism Drosophila melanogaster to subject all of our methods to validation both in silico and in vivo, using a large body of existing CRM data that we have compiled and extensive empirical testing in transgenic animals, respectively. The methods we develop here will be instrumental in helping to identify an important class of genomic functional element, the cis-regulatory module, in any metazoan genome.
cis-Regulatory modules (CRMs) are key mediators of normal phenotypic variation, drivers of evolutionary change, and causes of birth defects as well as chronic and acute disease. Identifying CRMs genome-wide is an important first step toward comprehending both normal and pathological aspects of gene regulation and gene function, with broad implications for understanding disease, predicting disease risk, and preventing and curing disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM085233-01
Application #
7506876
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Tompkins, Laurie
Project Start
2008-08-01
Project End
2013-07-31
Budget Start
2008-08-01
Budget End
2009-07-31
Support Year
1
Fiscal Year
2008
Total Cost
$351,875
Indirect Cost
Name
University of Illinois Urbana-Champaign
Department
Biostatistics & Other Math Sci
Type
Schools of Engineering
DUNS #
041544081
City
Champaign
State
IL
Country
United States
Zip Code
61820
Suryamohan, Kushal; Hanson, Casey; Andrews, Emily et al. (2016) Redeployment of a conserved gene regulatory network during Aedes aegypti development. Dev Biol 416:402-13
Blatti, Charles; Kazemian, Majid; Wolfe, Scot et al. (2015) Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 43:3998-4012
Suryamohan, Kushal; Halfon, Marc S (2015) Identifying transcriptional cis-regulatory modules in animal genomes. Wiley Interdiscip Rev Dev Biol 4:59-84
Duque, Thyago; Samee, Md Abul Hassan; Kazemian, Majid et al. (2014) Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 31:184-200
Blatti, Charles; Sinha, Saurabh (2014) Motif enrichment tool. Nucleic Acids Res 42:W20-5
Atkinson, Taylor J; Halfon, Marc S (2014) Regulation of gene expression in the genomic context. Comput Struct Biotechnol J 9:e201401001
Samee, Md Abul Hassan; Sinha, Saurabh (2014) Quantitative modeling of a gene's expression from its intergenic sequence. PLoS Comput Biol 10:e1003467
Kazemian, Majid; Suryamohan, Kushal; Chen, Jia-Yu et al. (2014) Evidence for deep regulatory similarities in early developmental programs across highly diverged insects. Genome Biol Evol 6:2301-20
Samee, Abul Hassan; Sinha, Saurabh (2013) Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data. Methods 62:79-90
Kazemian, Majid; Pham, Hannah; Wolfe, Scot A et al. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res 41:8237-52

Showing the most recent 10 out of 25 publications