Although numerous genomes, including the human genome, have been completely sequenced, the specific function of most of the DNA remains unknown. Identifying all the functional components of genomes has become an important goal of the NIH (e.g., via the ENCODE and modENCODE initiatives). A significant fraction of this DNA is believed to be involved in regulating gene expression, a fundamental process that plays key roles in both normal development and disease. A basic unit for gene regulation is the cis-regulatory module (CRM; often referred to as an "enhancer"), but identification of these modules on a genomic scale has proven difficult. For the most part, computational methods for CRM discovery have been effective only in those situations where there is already an extensive body of knowledge about the transcription factors that bind to the CRMs, and the sequences (motifs) to which they bind. In this proposal, we develop novel computational tools for CRM discovery. In particular, we depart from current approaches to CRM discovery by developing algorithms that do not rely on prior knowledge of transcription factor binding motifs. By doing so, we are able to identify CRMs even in less well-studied biological contexts where prior knowledge is minimal or lacking. We then expand upon this approach by additionally developing methods that utilize partial prior knowledge of CRMs known to be involved in a particular biological process. We will combine our new methods with promising existing approaches to generate a computational pipeline that uses complementary strategies for sensitive and specific CRM discovery, and conduct extensive prediction of CRMs that function in many tissues and cell types.
We will take advantage of the powerful genomic and experimental resources available for the model organism Drosophila melanogaster to subject all of our methods to validation both in silico and in vivo, using a large body of existing CRM data that we have compiled and extensive empirical testing in transgenic animals, respectively. The methods we develop here will be instrumental in helping to identify an important class of genomic functional element, the cis-regulatory module, in any metazoan genome. cis-Regulatory modules (CRMs) are key mediators of normal phenotypic variation, drivers of evolutionary change, and causes of birth defects as well as chronic and acute disease. Identifying CRMs genome-wide is an important first step on the way to comprehending both normal and pathological aspects of gene regulation and gene function, with broad implications for understanding disease, predicting disease risk, and preventing and curing disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Tompkins, Laurie
University of Illinois Urbana-Champaign
Biostatistics & Other Math Sci
Schools of Engineering
United States
Suryamohan, Kushal; Hanson, Casey; Andrews, Emily et al. (2016) Redeployment of a conserved gene regulatory network during Aedes aegypti development. Dev Biol 416:402-13
Blatti, Charles; Kazemian, Majid; Wolfe, Scot et al. (2015) Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 43:3998-4012
Suryamohan, Kushal; Halfon, Marc S (2015) Identifying transcriptional cis-regulatory modules in animal genomes. Wiley Interdiscip Rev Dev Biol 4:59-84
Duque, Thyago; Samee, Md Abul Hassan; Kazemian, Majid et al. (2014) Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 31:184-200
Atkinson, Taylor J; Halfon, Marc S (2014) Regulation of gene expression in the genomic context. Comput Struct Biotechnol J 9:e201401001
Blatti, Charles; Sinha, Saurabh (2014) Motif enrichment tool. Nucleic Acids Res 42:W20-5
Samee, Md Abul Hassan; Sinha, Saurabh (2014) Quantitative modeling of a gene's expression from its intergenic sequence. PLoS Comput Biol 10:e1003467
Kazemian, Majid; Suryamohan, Kushal; Chen, Jia-Yu et al. (2014) Evidence for deep regulatory similarities in early developmental programs across highly diverged insects. Genome Biol Evol 6:2301-20
Samee, Abul Hassan; Sinha, Saurabh (2013) Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data. Methods 62:79-90
Kazemian, Majid; Pham, Hannah; Wolfe, Scot A et al. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res 41:8237-52
