The NHGRI Bioinformatics and Scientific Programming Core supports research by NHGRI/DIR investigators by providing expertise and assistance in bioinformatics and computational analysis. The Core facilitates access to specialized software and hardware; develops generalized software solutions that address a variety of questions in genomic research; develops database solutions for the efficient archiving and retrieval of experimental and clinical data; disseminates new software and database solutions to the genome community at large; collaborates with NHGRI researchers on computationally intensive projects; and provides educational opportunities in bioinformatics to NHGRI investigators and trainees. Most engagements between the Core and DIR investigators are collaborative interactions intended to advance specific research projects. The support provided for these projects includes not only data analysis but also related efforts in data collection and dissemination through the public NHGRI/DIR Web site.

Scientific projects undertaken during the reporting period include the development of a new variant discovery and phenotyping pipeline to meet the increased demand for variant calling on human genome and exome data. A GATK-based pipeline that builds upon best practices established by the Broad Institute has been designed and implemented. This standardized and validated pipeline is currently being used in The Genome Ascertainment Consortium (TGAC) effort led by Dr. Leslie Biesecker, whose goals are to improve our understanding of the phenotypic consequences of genetic variation and to predict phenotypes from genotypes. To that end, the pipeline has enabled the creation of a uniformly processed and formatted genotype callset across multiple cohorts, based on data from multiple sources.
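A uniformly processed callset across cohorts is what the joint-genotyping design of the GATK Best Practices is meant to produce: each sample is first called to a per-sample GVCF, and all GVCFs are then genotyped together. The sketch below only builds the GATK4 command lines for those steps and does not run GATK. It is not the Core's actual pipeline: the file names, reference and known-sites paths, and interval list are hypothetical placeholders, read alignment is assumed to be done already, and downstream variant filtering (e.g., VQSR) is omitted.

```python
"""Minimal sketch: build (not run) GATK4 command lines for per-sample GVCF
calling followed by cohort-wide joint genotyping.

Assumptions (hypothetical, not the Core's configuration): input BAMs are
already aligned and named <sample>.bam; reference, known-sites, and interval
paths are placeholders supplied by the caller.
"""
import shlex
from typing import List, Sequence

Command = List[str]


def per_sample_commands(sample: str, ref: str, known_sites: str) -> List[Command]:
    """Steps that turn one aligned BAM into a per-sample GVCF."""
    bam = f"{sample}.bam"
    dedup = f"{sample}.dedup.bam"
    table = f"{sample}.recal.table"
    recal = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        # Flag PCR/optical duplicates so they do not inflate read support.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt"],
        # Model systematic base-quality errors, masking known variant sites.
        ["gatk", "BaseRecalibrator", "-I", dedup, "-R", ref,
         "--known-sites", known_sites, "-O", table],
        # Rewrite base qualities using the recalibration table.
        ["gatk", "ApplyBQSR", "-I", dedup, "-R", ref,
         "--bqsr-recal-file", table, "-O", recal],
        # Call in GVCF mode so reference-confidence blocks are kept for joint calling.
        ["gatk", "HaplotypeCaller", "-I", recal, "-R", ref,
         "-O", gvcf, "-ERC", "GVCF"],
    ]


def joint_commands(samples: Sequence[str], ref: str, intervals: str,
                   workspace: str = "cohort_db",
                   out: str = "cohort.vcf.gz") -> List[Command]:
    """Combine all per-sample GVCFs and genotype them together."""
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for sample in samples:
        import_cmd += ["-V", f"{sample}.g.vcf.gz"]
    genotype_cmd = ["gatk", "GenotypeGVCFs", "-R", ref,
                    "-V", f"gendb://{workspace}", "-O", out]
    return [import_cmd, genotype_cmd]


if __name__ == "__main__":
    ref, sites = "ref.fasta", "dbsnp.vcf.gz"  # hypothetical paths
    cohort = ["S1", "S2"]                       # hypothetical sample IDs
    for s in cohort:
        for cmd in per_sample_commands(s, ref, sites):
            print(shlex.join(cmd))
    for cmd in joint_commands(cohort, ref, "targets.interval_list"):
        print(shlex.join(cmd))
```

Because per-sample GVCFs are kept, samples from an additional cohort can be brought in by repeating only the import and joint-genotyping steps rather than re-calling every existing sample, which is one way a harmonized multi-cohort callset can be maintained.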
To date, the pipeline has processed over 1,500 exome samples from the ClinSeq cohort, and a larger dataset of 4,600 genomes from the INOVA Translational Medicine Institute is currently being processed.

Additional projects include the development of computational methods to analyze RNA-seq data from the zebrafish translatome; the implementation of new gene prediction pipelines for annotating whole-genome sequencing data; annotation of samples from the TGAC cohorts with HLA genotypes and integration of these data into the gnomAD browser; implementation of the GEMINI database to allow viewing of full sample-level genotypes from the TGAC cohort; development of a website to return negative secondary findings to participants from the A2 ClinSeq cohort; analysis of ClinSeq exomes for somatic variants in genes implicated in clonal hematopoiesis of indeterminate potential (CHIP); and support for a somatic variant-calling pipeline and downstream analyses used to study mosaic variation in overgrowth syndromes. The Core has also developed a public web browser and BLAT interface for the goldfish genome assembly, based on the UCSC Genome Browser.

Prostate cancer projects include RNA-seq analyses and eQTL mapping to identify modifier genes responsible for aggressive forms of prostate cancer in (TRAMP x WSB) F2 mice and (HiMyc x DO) F1 mice; ChIP-seq analyses to determine how HIST1H1A dysregulation affects the binding of transcription factors and chromatin-associated proteins; ATAC-seq analyses to determine how HIST1H1A dysregulation affects prostate cancer-specific chromatin structure; and RNA-seq analyses comparing gene expression in prostate tissue from wild-type vs. HIST1H1A knock-out mice, to determine how HIST1H1A affects metastasis susceptibility in prostate cancer.

Other efforts include updating the Skippy web server with additional complementary tools for splicing prediction; designing and implementing surveys that assess the health of dogs whose DNA samples have been submitted to scientific studies; and molecular modeling and comparative secondary structure analyses of DHX15, an RNA helicase involved in pre-mRNA splicing and ribosome biogenesis, with results informing the development of appropriate knock-out and knock-in animal models for further study. Finally, as part of the Finnish-United States Investigation of NIDDM Genetics (FUSION) Project, we have provided computational support for investigating the functional basis of diabetes risk using single-cell RNA sequencing, which interrogates the transcriptome at the level of individual cells. A particular focus of the FUSION study has been pancreatic islets, which consist of at least five major cell types, so that each cell and cell type can be assessed individually.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Scientific Cores Intramural Research (ZIC)
Human Genome Research