Bioinformatics Developments

In 2018, the Comparative Genomics Analysis Unit began a project to mine existing long-read data from NCBI's Sequence Read Archive (SRA) to interrogate regions of the genome containing fixed or polymorphic insertions of the endogenous retrovirus HERV-K/HML2. This work, which will eventually be extended to other types of mobile elements, uses software developed by Dr. Adam Phillippy's group (MashMap, Canu) to obtain high-quality consensus sequences for regions of the genome that are typically hard to sequence. The results of these analyses can be viewed in an R/Shiny web application developed as part of the project, which we hope to make available online in the coming year.

The unit continues to develop and distribute its SVanalyzer package for comparing, merging, and benchmarking structural variant calls in the presence of repetitive sequence. This software, used extensively in analyses performed by the Genome in a Bottle structural analysis working group, is publicly available on GitHub and will be described in a manuscript in the near future.

Collaborative Work

The unit's participation in a hackathon held before the Biological Data Science Meeting at Cold Spring Harbor Laboratory in 2016 contributed to the creation of software to construct human genome graphs from long-read assemblies. The resulting pipeline, named NovoGraph, is described in a peer-reviewed manuscript published in F1000 Research. This work was a multi-center collaboration of scientists from the University Hospital of Dusseldorf, the New York Genome Center, Lawrence Berkeley National Laboratory, Cold Spring Harbor Laboratory, the University of Arizona, Tucson, and Baylor College of Medicine (Biederstedt, Oliver et al. 2018).

In a continuation of our unit's collaboration with Dr. Douglas Stewart of NCI, we reported on a genomic analysis of atypical neurofibromas (ANFs), lesions that present a high risk of transformation to malignant peripheral nerve sheath tumors (MPNSTs) in neurofibromatosis type 1 patients. Using whole-exome sequence data from 16 matched tumor/normal pairs, we analyzed somatic small mutations and copy number alterations, establishing that ANFs have a relatively low somatic mutation burden but show frequent inactivation of NF1, CDKN2A, CDKN2B, and SMARCA2. We also found that ANFs are distinct from MPNSTs in not showing recurrent mutation of the PRC2 genes SUZ12 and EED, or of TP53 (Pemov, Hansen et al. 2019).

In collaboration with Dr. Shawn Burgess and the NIH Intramural Sequencing Center, we sequenced and assembled the goldfish (Carassius auratus) genome with a variety of sequencing methods and assemblers. Initially, PCR-free Illumina libraries were sequenced with 2x250 base reads and assembled using DISCOVAR de novo. This assembly helped us identify the fraction of the genome that was homozygous due to inbreeding, and showed that the heterozygous fraction was 1% divergent. We also used 10X Genomics linked-read sequencing and found that, while homozygous regions assembled well, the highly heterozygous regions proved too difficult for the 10X assembler, Supernova 2.0.1, to handle correctly. Finally, we used the Pacific Biosciences RS II to generate 71-fold coverage and the Canu assembler to generate a final assembly for this fish. Additional goldfish were sequenced using PCR-free Illumina sequencing for variation discovery, yielding over 12 million SNPs and over 2 million indels. Comparative genomics methods were used to compare the goldfish genome with the genomes of carp and the zebrafish reference, better resolving the times to the common ancestors of these species (Chen, Omori et al. 2019).

Support Year: 15
Fiscal Year: 2019
Name: National Human Genome Research Institute
Le Gallo, Matthieu; Rudd, Meghan L; Urick, Mary Ellen et al. (2018) The FOXA2 transcription factor is frequently somatically mutated in uterine carcinosarcomas and carcinomas. Cancer 124:65-73
Chen, Y-C; Sudre, G; Sharp, W et al. (2018) Neuroanatomic, epigenetic and genetic differences in monozygotic twins discordant for attention deficit hyperactivity disorder. Mol Psychiatry 23:683-690
Randall, Thomas A; Mullikin, James C; Mueller, Geoffrey A (2018) The Draft Genome Assembly of Dermatophagoides pteronyssinus Supports Identification of Novel Allergen Isoforms in Dermatophagoides Species. Int Arch Allergy Immunol 175:136-146
Gandolfi, Barbara; Alhaddad, Hasan; Abdi, Mona et al. (2018) Applications and efficiencies of the first cat 63K DNA array. Sci Rep 8:7024
Serrano Negron, Yazmin L; Hansen, Nancy F; Harbison, Susan T (2018) The Sleep Inbred Panel, a Collection of Inbred Drosophila melanogaster with Extreme Long and Short Sleep Duration. G3 (Bethesda) 8:2865-2873
Le Gallo, Matthieu; Rudd, Meghan L; Urick, Mary Ellen et al. (2017) Somatic mutation profiles of clear cell endometrial tumors revealed by whole exome and targeted gene sequencing. Cancer 123:3261-3268
Kwon, Erika M; Connelly, John P; Hansen, Nancy F et al. (2017) iPSCs and fibroblast subclones from the same fibroblast population contain comparable levels of sequence variations. Proc Natl Acad Sci U S A 114:1964-1969
Dewan, Ramita; Pemov, Alexander; Dutra, Amalia S et al. (2017) First insight into the somatic mutation burden of neurofibromatosis type 2-associated grade I and grade II meningiomas: a case report comprehensive genomic study of two cranial meningiomas with vastly different clinical presentation. BMC Cancer 17:127
Ng, David; Hong, Celine S; Singh, Larry N et al. (2017) Assessing the capability of massively parallel sequencing for opportunistic pharmacogenetic screening. Genet Med 19:357-361
Pemov, A; Li, H; Patidar, R et al. (2017) The primacy of NF1 loss as the driver of tumorigenesis in neurofibromatosis type 1-associated plexiform neurofibromas. Oncogene 36:3168-3177

Showing the most recent 10 out of 141 publications