Comparative Genomics Unit Research

Mullikin, James

Abstract

Bioinformatics developments

We continue the development of our genotyping program bam2mpg (Teer et al., 2010). The MPG algorithm used by the program is based on Bayesian modeling of sequence read data, and bam2mpg's default genotype scoring reflects the probability that the called genotypes are correct. To extend the algorithm to the comparison of two samples (e.g., a tumor and matched normal tissue from the same individual), we implemented a new scoring method called MPV, or "most probable variant" scoring. When run with the MPV option, bam2mpg reports scores that reflect the probability that any sequence variant exists at a site, rather than the probability that the genotype itself is accurate. The resulting scores enable more sensitive somatic variant detection, and were described and used in a publication with Yardena Samuels and Elliott Margulies analyzing the whole genome sequence of melanoma tumors (10).

When Bayesian methods like MPG are applied to the comparison of very similar samples, the predicted differences must be filtered carefully, since Bayesian genotype callers are easily fooled by areas of low sequence coverage or unusually high error rates. In collaboration with Paul Liu and Linzhao Cheng at Johns Hopkins, our group analyzed whole genome sequence from three induced pluripotent stem (iPS) cell lines and compared the data to whole genome sequence from each line's parent cell sample. Our filtering method limited the false positive rate for discovered stem-cell-specific variants to just 6% while still detecting thousands of differences in each iPS cell line (5).

We have also developed a second software package for somatic variant detection from next generation sequencing data, called Shimmer. Shimmer detects single nucleotide variants by applying Fisher's exact test to sample allele frequencies and correcting for multiple testing using the Benjamini-Hochberg procedure.
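The testing strategy described for Shimmer (Fisher's exact test at each site, followed by Benjamini-Hochberg correction) can be illustrated with a minimal sketch. This is not Shimmer's code: the function names, the input layout (reference and alternate allele read counts in the normal and tumor samples), the two-sided form of the test, and the 0.05 false discovery rate threshold are all assumptions made for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_somatic_sites(sites, fdr=0.05):
    """sites: (name, normal_ref, normal_alt, tumor_ref, tumor_alt) tuples
    (hypothetical layout). Returns (name, p, q) for sites with q <= fdr."""
    pvals = [fisher_exact_two_sided(nr, na, tr, ta) for _, nr, na, tr, ta in sites]
    qvals = benjamini_hochberg(pvals)
    return [(s[0], p, q) for s, p, q in zip(sites, pvals, qvals) if q <= fdr]

# Hypothetical read counts: site1 shows a strong tumor-only alternate allele.
example_sites = [("site1", 30, 0, 15, 15), ("site2", 20, 1, 19, 2)]
calls = call_somatic_sites(example_sites)
```

With the hypothetical counts above, only the first site passes the threshold; the second site has nearly identical allele counts in both samples and is not called.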
This simple testing method is more accurate than existing Bayesian methods, even without filtering of predictions. Our comparisons on simulated sequence and on known true mutations from the COLO-829 melanoma cell line show that Shimmer is more accurate than other programs while maintaining comparable sensitivity for known true positives. In addition, Shimmer will predict copy number alterations in tumor sequences using a hidden Markov model (HMM).

Genome assembly

We have been working on various projects that require whole genome assemblies. One of these is the assembly of whole genome sequence from various mouse strains, e.g., C57BL6 and C3H. The method we have developed for this is alignment-based, followed by local de novo assembly. Since these strains are inbred, they have minimal within-sample genetic variation, which allows for accurate local assemblies and results in high-quality consensus sequence. We are also working on the assembly of microbial genome sequence from a variety of genome sequencing platforms, e.g., 454, HiSeq, MiSeq, and Pacific Biosciences. Two primate genome assemblies were published in the last year, bonobo (16) and gorilla (18). In addition, earlier work on the cat genome and polymorphism discovery in cat was used to create a 1,536-SNP panel, which was used in conjunction with a 15,000 rad radiation hybrid panel to demonstrate improved efficiencies in mapping techniques (2).

Whole exome pipeline developments

In collaboration with the NISC bioinformatics group, the Mullikin group continues to develop its whole exome bioinformatics pipeline for the analysis of next generation sequence from captured exomic DNA. In addition to expanding the pipeline's capabilities to include the analysis of mouse and dog sequences, we have made improvements in the annotation of human mitochondrial sequences, and upgraded and improved the annotation tool ANNOVAR.
In collaboration with Joan Bailey-Wilson's group, we are evaluating and improving bam2mpg's algorithm for calling small insertions and deletions. As part of the ClinSeq project, we have performed principal component analysis on whole exome genotypes from over 600 individuals to examine population structure, and have submitted 374,499 high-confidence variants discovered from Agilent-captured DNA to dbSNP, where these variants are publicly available for download as part of dbSNP's Human Build 137. Our group continued to develop and improve the variant-viewing program VarSifter, incorporating suggested changes from numerous collaborators and publishing a Bioinformatics applications note (20) this year describing its capabilities. In collaboration with Les Biesecker's group, members of the Mullikin group also examined the frequency of high-penetrance variants involved in cancer susceptibility. In a publication examining the implications of secondary discovery of these variants in exome sequencing, we made recommendations for the development of better procedures for the interpretation of incidental findings in large sequencing projects (9).

Sanger-based Medical Sequencing Collaborations

Results continue to be published using the Mullikin group's analysis pipeline for Sanger medical sequencing reads. Daphne Bell's research group has sequenced all coding exons of the Atad5 gene in 108 primary endometrial tumors and, using our analysis methods, discovered 11 somatic mutations in 5 of them. This increased prevalence of somatic mutation in Atad5 and the observation that 90% of mice haploinsufficient for Atad5 develop tumors were detailed in a PLoS Genetics publication implicating Atad5 defects in the development of murine cancer (3).
In collaboration with Ajit Varki at UCSC and the NISC Sequencing Center, our group designed PCR primers for the amplification of SIGLEC genes in multiple primates including human, a task that requires extensive screening to ensure uniqueness and efficacy of priming in all species. These genes were sequenced and analyzed for polymorphisms and fixed differences at NISC, and the resulting data were included in publications examining the evolution of SIGLEC11 and SIGLEC16 (21) and showing that two SIGLEC genes (SIGLEC13 and SIGLEC17) have been inactivated during human evolution (22).

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAHG200330-08
Application #: 8565548
Support Year: 8
Fiscal Year: 2012
Total Cost: $1,358,435

Institution

Name: National Human Genome Research Institute

Publications

Le Gallo, Matthieu; Rudd, Meghan L; Urick, Mary Ellen et al. (2018) The FOXA2 transcription factor is frequently somatically mutated in uterine carcinosarcomas and carcinomas. Cancer 124:65-73

Chen, Y-C; Sudre, G; Sharp, W et al. (2018) Neuroanatomic, epigenetic and genetic differences in monozygotic twins discordant for attention deficit hyperactivity disorder. Mol Psychiatry 23:683-690

Randall, Thomas A; Mullikin, James C; Mueller, Geoffrey A (2018) The Draft Genome Assembly of Dermatophagoides pteronyssinus Supports Identification of Novel Allergen Isoforms in Dermatophagoides Species. Int Arch Allergy Immunol 175:136-146

Gandolfi, Barbara; Alhaddad, Hasan; Abdi, Mona et al. (2018) Applications and efficiencies of the first cat 63K DNA array. Sci Rep 8:7024

Serrano Negron, Yazmin L; Hansen, Nancy F; Harbison, Susan T (2018) The Sleep Inbred Panel, a Collection of Inbred Drosophila melanogaster with Extreme Long and Short Sleep Duration. G3 (Bethesda) 8:2865-2873

Le Gallo, Matthieu; Rudd, Meghan L; Urick, Mary Ellen et al. (2017) Somatic mutation profiles of clear cell endometrial tumors revealed by whole exome and targeted gene sequencing. Cancer 123:3261-3268

Kwon, Erika M; Connelly, John P; Hansen, Nancy F et al. (2017) iPSCs and fibroblast subclones from the same fibroblast population contain comparable levels of sequence variations. Proc Natl Acad Sci U S A 114:1964-1969

Dewan, Ramita; Pemov, Alexander; Dutra, Amalia S et al. (2017) First insight into the somatic mutation burden of neurofibromatosis type 2-associated grade I and grade II meningiomas: a case report comprehensive genomic study of two cranial meningiomas with vastly different clinical presentation. BMC Cancer 17:127

Ng, David; Hong, Celine S; Singh, Larry N et al. (2017) Assessing the capability of massively parallel sequencing for opportunistic pharmacogenetic screening. Genet Med 19:357-361

Pemov, A; Li, H; Patidar, R et al. (2017) The primacy of NF1 loss as the driver of tumorigenesis in neurofibromatosis type 1-associated plexiform neurofibromas. Oncogene 36:3168-3177

Showing the most recent 10 out of 141 publications
