Gene regulation is the framework on which neuronal cellular diversity is built. The substantial cellular diversity of the vertebrate central nervous system, including that of humans, must therefore rest on immense regulatory complexity. Although regulatory control acts at many levels, we will focus on the roles played by cis-regulatory elements (REs) in controlling the timing, location and levels of neuronal transcripts. The biological relevance of non-coding sequences, however, cannot be inferred from sequence alone. Perhaps the most commonly used indicator of non-coding REs is evolutionary sequence conservation. Although conservation can uncover functionally constrained sequences, it cannot predict biological function, and regulatory function is not always confined to conserved sequences. At its simplest level, regulatory instructions are encoded in transcription factor binding sites (TFBS) within REs. Yet, although many TFBS have been identified, TFBS combinations that predict specific regulatory control have not yet emerged for vertebrates. We posit that motif combinations accounting for tissue-specific regulatory control can be identified in REs of genes expressed in those cell types. The long-range goal of this proposal is to begin identifying TFBS combinations that predict neuronal REs, a first step in developing a neuronal regulatory lexicon. We propose three aims to address this challenge directly. First, we will evaluate ~500 putative neuronal REs in vivo (Aim 1), prioritizing genes critical to catecholaminergic (CA) neurogenesis and function because of the prominent role of these neurons in neurodegenerative and psychiatric disorders, and establishing a repository of regulatory data to support the study of neuronal development and dysfunction. Critically, such an undertaking would not be cost effective in mice.
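One simple way to look for sequence features shared by REs of genes expressed in a given cell type is to compare k-mer (short word) frequencies between a positive set of putative REs and a background set. The sketch below assumes plain log2 frequency-ratio enrichment of reverse-complement-collapsed k-mers with a pseudocount; it is not the proposal's method and is much simpler than discriminative classifiers such as kmer-SVM. The function names, the default k = 6 and the pseudocount of 1 are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so a motif and its opposite-strand version are counted together."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_counts(seqs, k):
    """Count canonical k-mers over all sequences, skipping windows with non-ACGT bases."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if set(window) <= set("ACGT"):
                counts[canonical(window)] += 1
    return counts


def enriched_kmers(positives, background, k=6, pseudo=1.0, top=10):
    """Rank canonical k-mers by the log2 ratio of their pseudocount-smoothed
    frequency in `positives` versus `background` (hypothetical scoring scheme)."""
    pos, bg = kmer_counts(positives, k), kmer_counts(background, k)
    vocab = set(pos) | set(bg)
    pos_total = sum(pos.values()) + pseudo * len(vocab)
    bg_total = sum(bg.values()) + pseudo * len(vocab)
    scores = {
        km: math.log2(((pos[km] + pseudo) / pos_total) / ((bg[km] + pseudo) / bg_total))
        for km in vocab
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

On toy input such as `enriched_kmers(["GGAAGGAAGGAA", "TTGGAAGGAACC"], ["ACACACACACAC", "TGTGTGTGTGTG"], k=4, top=3)`, the repeated `GGAA` word ranks first. Single k-mers only hint at motifs; capturing the *combinations* the proposal targets would require pairwise or gapped features and a trained classifier.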
We have developed a highly efficient reporter transgene system in zebrafish that can accurately evaluate the regulatory control of mammalian sequences, enabling characterization of reporter expression during development at a fraction of the cost. Second, we will directly determine what fraction of regulatory information is overlooked by conservation-based approaches, tiling across four loci (approximately 150 amplicons) and testing all non-coding sequences in our in vivo assay (Aim 2). Third, we will use these and published data sets to improve upon existing computational tools for predicting and evaluating the biological relevance of sequences at genes not tested in Aims 1 and 2 (Aim 3). This proposal is a crucial first step towards a neuronal regulatory lexicon that is independent of conservation, and subsequently towards similar lexicons for other cell types.
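Aim 2 tests every non-coding sequence across whole loci rather than only conserved ones. As a minimal sketch of how a locus might be divided into testable amplicons, the code below subtracts coding-sequence intervals from a locus and covers each remaining non-coding segment with overlapping windows. The function names, the half-open coordinate convention and the default 2,000 bp amplicon size with 200 bp overlap are hypothetical placeholders, not the proposal's actual design; primer placement and repeat masking are ignored.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates (assumed convention)


def subtract(locus: Interval, exons: List[Interval]) -> List[Interval]:
    """Return the parts of `locus` not covered by any coding interval."""
    start, end = locus
    segments: List[Interval] = []
    cursor = start
    for s, e in sorted(exons):
        s, e = max(s, start), min(e, end)  # clip the interval to the locus
        if e <= s:
            continue  # interval lies entirely outside the locus
        if s > cursor:
            segments.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < end:
        segments.append((cursor, end))
    return segments


def tile(segment: Interval, size: int, overlap: int) -> List[Interval]:
    """Cover `segment` with windows of at most `size` bp that overlap by `overlap` bp."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    start, end = segment
    step = size - overlap
    tiles: List[Interval] = []
    pos = start
    while True:
        stop = min(pos + size, end)  # last tile is truncated at the segment end
        tiles.append((pos, stop))
        if stop == end:
            break
        pos += step
    return tiles


def tile_noncoding(locus: Interval, exons: List[Interval],
                   size: int = 2000, overlap: int = 200) -> List[Interval]:
    """Hypothetical helper: tile every non-coding segment of a locus."""
    return [t for seg in subtract(locus, exons) for t in tile(seg, size, overlap)]
```

With these placeholder settings, a 10 kb locus containing coding intervals at 3,000 to 3,500 and 7,000 to 7,200 yields six overlapping amplicons. The overlap means that any element no longer than 200 bp lying within a non-coding segment is wholly contained in at least one amplicon.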

Public Health Relevance

We wish to better understand how the regulatory instructions of critical developmental and disease genes are encoded in DNA sequence. We will focus on genes important for the neurons that are lost in disorders such as Parkinson's disease. We also aim to establish new computational paradigms and generate reagents that will be widely applicable to interpreting the wealth of information arising from genome sequencing efforts.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Research Project (R01)
Study Section: Molecular Neurogenetics Study Section (MNG)
Program Officer: Gwinn, Katrina
Johns Hopkins University
Schools of Medicine
United States
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A (2014) Robust k-mer frequency estimation using gapped k-mers. J Math Biol 69:469-500
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S et al. (2013) kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets. Nucleic Acids Res 41:W544-56
Praetorius, Christian; Grill, Christine; Stacey, Simon N et al. (2013) A polymorphism in IRF4 affects human pigmentation through a tyrosinase-dependent MITF/TFAP2A pathway. Cell 155:1022-33
Pol, Suyog U; Lang, Jennifer K; O'Bara, Melanie A et al. (2013) Sox10-MCS5 enhancer dynamically tracks human oligodendrocyte progenitor fate. Exp Neurol 247:694-702
Burzynski, Grzegorz M; Reed, Xylena; Maragh, Samantha et al. (2013) Integration of genomic and functional approaches reveals enhancers at LMX1A and LMX1B. Mol Genet Genomics 288:579-89
Hodonsky, Chani J; Kleinbrink, Erica L; Charney, Kira N et al. (2012) SOX10 regulates expression of the SH3-domain kinase binding protein 1 (Sh3kbp1) locus in Schwann cells via an alternative promoter. Mol Cell Neurosci 49:85-96
Taher, Leila; McGaughey, David M; Maragh, Samantha et al. (2011) Genome-wide identification of conserved regulatory function in diverged sequences. Genome Res 21:1139-49
Lee, Dongwon; Karchin, Rachel; Beer, Michael A (2011) Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res 21:2167-80
Stine, Zachary E; McGaughey, David M; Bessling, Seneca L et al. (2011) Steroid hormone modulation of RET through two estrogen responsive enhancers in breast cancer. Hum Mol Genet 20:3746-56
Prasad, Megana K; Reed, Xylena; Gorkin, David U et al. (2011) SOX10 directly modulates ERBB3 transcription via an intronic neural crest enhancer. BMC Dev Biol 11:40

Showing the most recent 10 out of 12 publications