Bioinformatics Developments

The Comparative Genomics Analysis Unit continues to develop, maintain, and distribute software tools for the analysis of DNA and RNA sequence data. This year, we expanded our suite of custom-developed bioinformatics software to include a set of tools that perform quality control analyses on RNA-Seq data, including the detection and identification of artifacts, errors, and other features introduced during library preparation, sequencing, or alignment. These tools have been packaged together as QoRTs, the Quality of RNA-Seq Toolset. QoRTs generates a wide variety of plots that make it easy for a bioinformatician to identify consistent biases that would otherwise be obscured by the vast size and dimensionality of RNA-Seq data.

We continue to work on our copy number detection algorithms. In November 2013, we presented the BardCNV somatic copy number variant detection package at Cold Spring Harbor's Genome Informatics meeting, and we have made BardCNV publicly available on GitHub. In addition, we are currently testing our germline copy number variant detector, GSVseq, on captured exomic DNA sequence data, and hope to submit manuscripts on both of these tools in the coming year.

Whole Exome Pipeline Developments

This year, as NISC prepared for certification under the Centers for Medicare & Medicaid Services Clinical Laboratory Improvement Amendments (CLIA), the Comparative Genomics Analysis Unit performed a complete assessment of our whole exome software pipeline, comparing variant and genotype calls for the Coriell sample NA12878 to the high-quality integrated dataset for that same sample produced by the Genome in a Bottle Consortium at the National Institute of Standards and Technology (NIST).
These comparisons demonstrated that the sensitivity and specificity of our NovoMPG pipeline are comparable to those of the Broad Institute's Genome Analysis Toolkit (GATK), while NovoMPG is simpler to install and run than GATK and executes faster. In addition to measuring accuracy, we demonstrated the precision of our variant calling pipeline by comparing results from separate datasets prepared for the same sample by capturing and sequencing two duplicate libraries. We have now begun to distribute our entire pipeline, enabling other groups at the NIH to run the programs on their own next-generation exomic datasets, and we have prepared a manuscript describing our work.

Collaborative Work

In collaboration with Patrick Duffy at the National Institute of Allergy and Infectious Diseases (NIAID), we have sequenced, assembled, and annotated the genome of Plasmodium coatneyi, a species of Plasmodium that serves as a model for malaria sequestration in macaque monkeys. We have deposited the read sequences and assembly contigs into GenBank (accessions JFFQ00000000 and GCA_000725905, respectively; BioProject PRJNA233970), and are working with EuPathDB to make them available in the next release of PlasmoDB. In addition, we used RNA-Seq data to enhance predicted gene models, significantly improving on other Plasmodium sequencing efforts in annotating the important and rapidly evolving surface antigen genes. In P. coatneyi, we found a full complement of roughly 200 SICAvar genes, which are linked to sequestration and were previously known only in P. knowlesi, the closest sequenced relative of P. coatneyi. Finally, we performed a phylogenomic analysis of the ten sequenced Plasmodium genomes that revealed strong support for a bird or reptile origin of P. falciparum, correlating its phenotypic differences from the other primate malarias with evolutionary distance.
In collaboration with Svante Pääbo, we sequenced DNA extracted from a Neanderthal toe bone to a high depth of coverage. The analyses of these data add further evidence of interbreeding between hominins (human, Neanderthal, and Denisovan), which has left clear genomic signatures in present-day humans and illuminates the history and evolution of our species (Prüfer, Racimo et al. 2014).

With Shawn Burgess, we characterized a zebrafish line, NHGRI-01, by sequencing it to a depth of 50x and aligning the reads to the Zv9 reference sequence. Variants were identified using bam2mpg and annotated with ANNOVAR against Ensembl transcripts. We deposited the raw sequence and variant calls into NCBI's Sequence Read Archive (SRA). This zebrafish line will be particularly useful for any researcher who needs to know the exact sequence of a particular genomic region, or who wants to robustly map sequences back to a genome with all possible variants defined (LaFave, Varshney et al. 2014).

In collaboration with Aravinda Chakravarti at Johns Hopkins, we analyzed sequence from 43 individuals and 16 HapMap controls in the region of the cardiomyocyte intercalated disc protein NOS1AP, a region previously established to be associated with long QT interval. This analysis aided in the discovery of a functional non-coding variant, lying within an enhancer, that correlates with increased NOS1AP expression (Kapoor, Sekar et al. 2014).

In collaboration with Andy Baxevanis, we sequenced, assembled, and annotated the genome of the ctenophore Mnemiopsis leidyi. Phylogenomic analyses of both amino acid positions and gene content suggest that ctenophores, rather than sponges, are the sister lineage to all other animals (Ryan, Pang et al. 2013).

Two projects resulting from dye-terminator sequencing of PCR amplicons were brought to completion this year.
In collaboration with Charles Rotimi, we sequenced five lipid-associated genes in 48 African Americans, leading to the observation of an ethnicity-specific association of a variant in the LPL gene with serum lipid levels (Bentley, Chen et al. 2014). In another project, with Susana Seixas, we designed primers and sequenced amplicons across three subspecies of chimpanzee, and observed signatures of strong selective constraint in the region of the WFDC6 gene, a recent paralog of the epididymal protease inhibitor EPPIN (Ferreira, Hurle et al. 2013).

In collaboration with Yardena Samuels, analysis of next-generation whole genome and whole exome sequence from 29 melanoma samples revealed somatic mutations of MAP3K5 in five samples; these mutations appear to occur exclusively in samples that are wild-type for the BRAF gene (Prickett, Zerlanko et al. 2014). With Leslie Biesecker, we investigated malignant hyperthermia (Gonsalves, Ng et al. 2013) and genes affecting coronary artery calcification (Sen, Barb et al. 2014; Sen, Boelte et al. 2014).

Support Year: 10
Fiscal Year: 2014
Name: Human Genome Research
