The analysis of individual parts of the genome enables a more comprehensive understanding of how the components work together in the broader context of disease. The following projects continue work from previous years and represent integrative analyses of independent genomic data types that address the genome as a complex regulatory system. The analysis areas include cis-acting, trans-acting and epigenetic regulators of the human genome.
Addressing the evolution of the human genome through the emergence of new human-specific genes regulated by bidirectional promoters: Research from my group previously established the enrichment of bidirectional promoters in vertebrate genomes including human, mouse, rat, and cow (Yang et al. 2008), which indicates evolutionary selection to maintain their presence. Despite the cross-species similarities, we discovered that some bidirectional promoters correspond to positions of unidirectional promoters in other vertebrate species, leading to the hypothesis that species-specific bidirectional promoters provide a targeted way to detect species-specific transcripts in any genome. We confirmed this hypothesis while participating in the Bovine Genome Consortium (Bovine Seq. Cons. et al. 2009) and identified new species-specific bidirectional gene pairs (Piontkivska et al. 2009). To find human-specific transcripts, my group identified a set of 1,400 nonconserved, novel noncoding transcripts flanking bidirectional promoters (Gotea et al. 2013). Once these transcripts were identified, we developed a test for positive selection in them and elsewhere, as an indicator of beneficial function of noncoding sequences in the human genome. We also recognized that positive selection can be measured inaccurately in the genome, and developed a test for GC-biased gene conversion to distinguish adaptive forces from non-adaptive forces (Gotea et al. 2014).
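To make the distinction between adaptive and non-adaptive forces concrete, the sketch below counts substitution classes between two aligned sequences. It rests on one general property of GC-biased gene conversion: it favors weak-to-strong (A/T to G/C) changes over strong-to-weak ones, so a marked excess of weak-to-strong substitutions is a warning sign that an apparent burst of substitutions may not be adaptive. This is only an illustration and not the test published in Gotea et al. 2014; the function names and the example sequences are hypothetical.

```python
from collections import Counter

WEAK = {"A", "T"}      # bases forming two hydrogen bonds
STRONG = {"G", "C"}    # bases forming three hydrogen bonds
BASES = WEAK | STRONG


def classify_substitution(ancestral, derived):
    """Return 'W>S', 'S>W' or 'other'; None if there is no usable substitution."""
    a, d = ancestral.upper(), derived.upper()
    # Skip identical positions, gaps and ambiguity codes.
    if a == d or a not in BASES or d not in BASES:
        return None
    if a in WEAK and d in STRONG:
        return "W>S"
    if a in STRONG and d in WEAK:
        return "S>W"
    return "other"  # weak-to-weak or strong-to-strong change


def substitution_spectrum(ancestral_seq, derived_seq):
    """Count substitution classes between two aligned, equal-length sequences."""
    if len(ancestral_seq) != len(derived_seq):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, d in zip(ancestral_seq, derived_seq):
        kind = classify_substitution(a, d)
        if kind is not None:
            counts[kind] += 1
    return counts


def weak_to_strong_fraction(counts):
    """Fraction of W>S among W>S and S>W substitutions; None if there are none."""
    total = counts["W>S"] + counts["S>W"]
    return counts["W>S"] / total if total else None


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    spectrum = substitution_spectrum("AATTGCGA", "GCTTACGT")
    print(dict(spectrum), weak_to_strong_fraction(spectrum))
```

A fraction well above the genomic background in a candidate region would argue for caution before attributing its substitutions to positive selection. A real analysis would also need an inferred ancestral sequence and a statistical comparison against neutral expectations, which this sketch omits.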
After integrating computational and experimental data, we found nucleotide substitutions that facilitate the emergence of new exons in the bidirectional transcripts of the genome. The resulting gene list provides the basis for studying the role of novel transcripts that are unique to the human genome. Moreover, using this approach, novel transcripts can be identified for any species (Yang and Elnitski 2015). Postscript: The model for the emergence of new noncoding genes through bidirectional promoters is consistent with recent reports showing that the majority of lincRNA genes have bidirectional promoters, encompassing many species-specific transcripts.
Comparing genome-wide methylation patterns in subtypes of ovarian tumors and mouse models: Altered DNA methylation in promoter regions can distinguish genes that are relevant to ovarian tumor pathology (Kolbe et al. 2012 PLoS ONE). Given the sporadic nature of 90% of ovarian cancers, disruption of normal gene regulation is a likely contributor to disease etiology. Methylation patterns at 25,475 unique loci in 43 samples of ovarian, endometrial or metastatic tumors, along with normal fallopian tube and normal endometrium, showed that methylation mirrors histopathological subdivisions and discriminates tumor types with finer granularity and greater reproducibility than published gene expression assays. The extensive differences we showed between tumor and normal samples constitute the first report of a methylator phenotype in ovarian endometrioid tumors, analogous to the methylator phenotype identified in colorectal cancer and glioblastoma. We expanded these studies to show that methylator phenotypes can be identified in a majority of tumor types (Sanchez-Vega et al. 2015).
Profiling common epigenetic features in solid human epithelial tumors: The study of aberrant DNA methylation in cancer can lead to the discovery of novel biological markers for diagnostics and can help to delineate important mechanisms of disease.
We have identified 12 loci that are differentially methylated in serous ovarian cancers and in endometrioid ovarian and endometrial cancers with respect to normal controls. The strongest signal showed hypermethylation in tumors at a CpG island within the ZNF154 promoter (Sanchez-Vega et al. 2013). We show that hypermethylation of this locus is recurrent across solid human epithelial tumor samples for 15 of 16 distinct cancer types from TCGA. Furthermore, ZNF154 hypermethylation is strikingly present across a diverse panel of ENCODE cell lines, and unique to cell lines derived from tumor cells. By extending our analysis from the Illumina 27K Infinium platform to the 450K platform, and then to PCR amplification of bisulfite-treated DNA, we demonstrate that hypermethylation extends across the breadth of the ZNF154 CpG island. We have also identified recurrent hypomethylation in two genomic regions associated with CASP8 and VHL. All three genes (ZNF154, CASP8 and VHL) exhibit significant negative correlation between methylation and gene expression across many cancer types, as well as patterns of DNaseI hypersensitivity and histone marks that reflect different chromatin accessibility in cancer vs. normal cell lines. Our findings emphasize hypermethylation of ZNF154 as a biological marker of relevance for tumor diagnostics. Epigenetic modifications affecting the promoters of ZNF154, CASP8 and VHL are shared across a vast array of tumor types and may therefore be important for understanding the genomic landscape of cancer. I am studying ZNF154 as a marker for the development of diagnostic tests for many epithelial cancers.
Update of research projects on individual functional elements and community impact: Exon Skipping. My work to identify sequence mutations that cause exon skipping (Woolfe et al. 2010) applied statistical tests to determine which features showed statistically significant predictive ability to discriminate neutral variants from disease-causing mutations.
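As an illustration of what it means for a feature to discriminate neutral variants from disease-causing mutations, the sketch below computes a rank-based area under the ROC curve, which is the Mann-Whitney U statistic divided by the number of variant pairs. The text above does not specify which statistical tests were applied, so this is an assumed stand-in rather than the published method; the function name and the feature values are hypothetical.

```python
def rank_auc(neutral_scores, disease_scores):
    """Probability that a randomly chosen disease-causing variant scores higher
    on a feature than a randomly chosen neutral variant (ties count as half).

    Equals the Mann-Whitney U statistic divided by (n_neutral * n_disease):
    0.5 means no discrimination, 1.0 means perfect separation.
    """
    if not neutral_scores or not disease_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for d in disease_scores:
        for n in neutral_scores:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(disease_scores) * len(neutral_scores))


if __name__ == "__main__":
    # Hypothetical values of one sequence feature, for illustration only.
    neutral = [0.1, 0.4, 0.35, 0.8]
    disease = [0.9, 0.6, 0.8, 0.7]
    print(rank_auc(neutral, disease))  # 13.5 favorable pairs out of 16 -> 0.84375
```

The pairwise loop is quadratic in the number of variants, which is acceptable for a sketch. Establishing statistical significance would additionally require a p-value, for example from a permutation of the group labels, which is not shown here.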
We implemented the results in a web server, Skippy (http://research.nhgri.nih.gov/skippy/), that evaluates variants of unknown function to predict those most likely to cause exon skipping. Skippy continues to receive the most visits, and downloads for private use, of all NHGRI web servers. In the last year, the Skippy server had 39,381 total page views, 107 average page views per day and 12.21 average page views per visit. In an application of the Skippy toolset, my group showed that synonymous substitutions detected in cystic fibrosis patients cause exon skipping in CFTR. These variants are novel candidates for uncharacterized second allele mutations in CFTR (Scott et al. 2012 J. Cystic Fibrosis). This project extends into collaborative projects on the study of recurrent functional synonymous mutations in melanoma (Gartner et al. 2013 and Gotea et al. 2015).
Negative regulatory elements: My group developed the first systematic expression vector system to experimentally assay negative regulatory elements (Petrykowska et al. 2008 Gen. Res.). Despite the commonly held hypothesis that negative cis-acting elements are present in the human genome, examples have not been widely defined or characterized. My research to help identify negative elements has broader importance because mutations in these elements would be activating and could play a role in a host of diseases. Annotations of negative elements discovered by my group are posted on the UCSC Human Genome Browser test web site. Since inception of the assay, I have provided the vectors as source materials to the community and continue to collaborate with other labs upon request. Furthermore, I have participated in the ENCODE Consortium analysis groups to experimentally assess the functional activity of putative negative regulatory elements predicted in genomic sequences (ENCODE Cons. et al. 2011 PLoS Bio., ENCODE Cons. et al. 2012 Nature, Kellis et al. 2014a PNAS, and Kellis et al. 2014).