A crucial component of recent major advances in genomic research has been the union of advances in biology with those in computing, informatics, and networking. As technologies have advanced to allow high-throughput, genomics-scale data collection, the technological burden has shifted to analysis and informatics. This project was established to ensure that the necessary computational tools and resources are available to the NIH intramural community.
OIR's long-term collaboration with Dr. Louis Staudt (Distinguished Investigator, NCI Center for Cancer Research, and Director, NCI Center for Cancer Genomics) has yielded significant findings and discoveries that have led to improvements in the treatment of lymphoma. With OIR providing comprehensive computational expertise, resources, and support, Dr. Staudt's lab has been able to perform sophisticated analyses of large-scale, high-dimensional data, which have in turn been instrumental in achieving a number of highly significant findings.
In 2018, OIR made significant contributions to the identification of genetic subtypes of diffuse large B-cell lymphoma (DLBCL) based on the patterns in which mutations and other genetic aberrations occur. Predictions from the novel iterative GenClass method were validated using a random forest model. To distinguish somatic from germline mutations, a random forest model was trained on 46 tumor-normal pairs from The Cancer Genome Atlas (TCGA), using variant-deleteriousness scores from ANNOVAR annotations (such as GERP++ and MutationTaster) as features. This model achieved perfect sensitivity at 90% positive predictive value (PPV) and has also proven applicable to other cancers. To identify aberrant somatic hypermutation based on the activation-induced cytidine deaminase (AID) sequence motif, we trained two classifiers on a set of 44 genes previously reported to be hypermutated in DLBCL, using features considered characteristic of somatic hypermutation together with the nucleotide patterns flanking each mutation.
OIR provides comprehensive computational support to Dr. Staudt's laboratory.
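The reported operating point for the somatic/germline model ("perfect sensitivity at 90% PPV") follows from the standard definitions sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below is illustrative only: the scores and labels are hypothetical, treating somatic calls as the positive class is an assumption, and it does not reproduce the OIR random forest or its TCGA training data. It evaluates calls against a score threshold and picks the highest threshold that still keeps every true somatic call.

```python
from typing import Sequence, Tuple


def sensitivity_ppv(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and PPV when every call with score >= threshold is predicted positive.

    labels: 1 = true somatic (positive class, assumed), 0 = germline.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


def full_sensitivity_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Highest threshold that still calls every positive.

    Lowering the threshold further keeps TP fixed and can only add false
    positives, so this choice maximizes PPV among thresholds with sensitivity 1.0.
    """
    return min(s for s, y in zip(scores, labels) if y == 1)


# Hypothetical classifier scores (e.g., random forest vote fractions) and true labels
scores = [0.95, 0.91, 0.88, 0.85, 0.80, 0.77, 0.70, 0.66, 0.62, 0.72, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

t = full_sensitivity_threshold(scores, labels)
sens, ppv = sensitivity_ppv(scores, labels, t)
print(t, sens, ppv)  # 0.62 1.0 0.9
```

In this toy data, nine true somatic calls and one germline call score at or above 0.62, which gives sensitivity 1.0 and PPV 9/10.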
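AID preferentially deaminates the C in a WRCY hotspot (W = A/T, R = A/G, Y = C/T), or equivalently the G in RGYW when the motif is read on the opposite strand. The following minimal sketch shows how a single mutation could be turned into classifier features. It assumes that hotspot membership and the flanking k-mer are the kinds of features described above; the function names and the window size k = 2 are hypothetical choices, not the features actually used in the 44-gene classifiers.

```python
from typing import Dict, Union

# IUPAC nucleotide classes used by the AID hotspot motif
W = frozenset("AT")  # weak
R = frozenset("AG")  # purine
Y = frozenset("CT")  # pyrimidine


def in_aid_hotspot(seq: str, pos: int) -> bool:
    """True if seq[pos] (0-based) is the C of a WRCY motif or the G of an RGYW motif."""
    seq = seq.upper()
    base = seq[pos]
    if base == "C" and pos >= 2 and pos + 1 < len(seq):
        # W R [C] Y on the given strand
        return seq[pos - 2] in W and seq[pos - 1] in R and seq[pos + 1] in Y
    if base == "G" and pos >= 1 and pos + 2 < len(seq):
        # R [G] Y W: the WRCY motif read on the opposite strand
        return seq[pos - 1] in R and seq[pos + 1] in Y and seq[pos + 2] in W
    return False


def flanking_context(seq: str, pos: int, k: int = 2) -> str:
    """Up to k bases on each side of pos, including the mutated base itself."""
    return seq.upper()[max(0, pos - k): pos + k + 1]


def mutation_features(seq: str, pos: int, k: int = 2) -> Dict[str, Union[bool, str]]:
    """Hypothetical feature record for one mutation (context can be one-hot encoded downstream)."""
    return {
        "aid_hotspot": in_aid_hotspot(seq, pos),
        "context": flanking_context(seq, pos, k),
    }


print(mutation_features("aagcta", 3))  # {'aid_hotspot': True, 'context': 'AGCTA'}
print(mutation_features("GGGCGG", 3))  # {'aid_hotspot': False, 'context': 'GGCGG'}
```

AGCT is the canonical example of a hotspot that matches on both strands, so both its C and its G are flagged.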
This support entails maintaining databases of genomic data, providing computational servers with custom software for running a variety of analyses, and developing and maintaining public and local-access Web sites. These supported resources include the following:
- LLMPP/SPECS: The Lymphoma/Leukemia Molecular Profiling Project/Strategic Partnering to Evaluate Cancer Signatures (SPECS) program is a multi-institution translational cancer research grant funded by the National Cancer Institute. This Web site is designed for entering and managing clinical data for the cases associated with samples included in the SPECS study. The LLMPP/SPECS project uses microarrays and other high-throughput, whole-genome technologies to define the molecular profiles of all types of human lymphoid malignancies. One primary goal of this project is to redefine the classification of human lymphoid malignancies in molecular terms. A second major goal is to define molecular correlates of clinical parameters that can be used for prognosis and for selecting appropriate therapy for these patients. As members of the international LLMPP/SPECS consortium, we provide the informatics development and support critical to the success of this project. A database and tools have been implemented to facilitate integrating and analyzing clinical parameters together with genomic and genetic data from high-throughput technologies. The consortium involves 12 participating centers in 7 countries, and data for 3,000 clinical cases have been uploaded into the system.
- LYMPHCX: A Web site that allows researchers to predict DLBCL subtypes from samples processed with a NanoString protocol. Determining these subtypes can be critical in choosing appropriate therapy, since some subtypes are more aggressive than others.
- LymphoDB: An interactive Web site and database for researchers to search and compare over 1.5 million lymphoma mutations reported in 57 prominent publications. All mutations have been validated and stored along with relevant annotations and metrics to enable comparative quantitative analyses.
- Signature database: A companion Web site to Shaffer AL et al. A library of gene expression signatures to illuminate normal and pathological lymphoid biology. Immunol Rev. 2006 Apr;210:67-85.
- Staudt lab analytical test bed: A Web site that supports quick turnaround when testing new analytical methods and lets lab members easily explore their own data with new algorithms.
- Database support: OIR maintains information on more than 10 million mutations across over 3,000 clinical samples. Digital expression data are also stored.
The mAdb (microArray database, https://madb.nci.nih.gov) system provides a secure data management system for gathering, storing, and managing experimental information and expression array data. A variety of Web-accessible tools have been implemented to support the multiple analytical approaches needed to interpret array data more meaningfully. Important to the mAdb system design is compatibility with any platform (Unix, Windows, or Macintosh) capable of running a Web browser. A natural extension of mAdb has been the inclusion of additional data resources, including supporting information from various sources (e.g., Gene Ontology, GenBank, Entrez Gene, UniGene, BioSystems Pathways, BioCarta Pathways, COSMIC, and 1000 Genomes) that enables users to drill down into the rapidly expanding biological knowledge base. To ensure effective use of this resource, ongoing user training and support are provided through CIT facilities. While development of new and improved analysis tools continues, the mAdb system is in routine service: it has supported over 1,900 NIH researchers and collaborators and contains over 111,000 microarray experiments.
A critical design element of the mAdb system was scalability, so that it could expand to support other ICDs. The design allows separate Web servers serving different user communities to run from a single code base; for example, mAdb has been set up on separate Web servers to support users of the NIAID microarray core facility. In addition to user-specific, Web-based analyses, our group has facilitated the submission of over 7,000 samples to the NCBI Gene Expression Omnibus (GEO) public repository, as required for sharing data associated with publications.
In collaboration with Dr. Timothy Myers of NIAID, OIR also provides comprehensive computational support to the Genomic Technologies Section (GTS) of NIAID. Because GTS provides state-of-the-art bioinformatics support to the entire NIAID intramural research program, we effectively support all users of the GTS facility. In addition to maintaining GTS computational servers and databases, OIR maintains a number of commercial software packages for GTS, including CLC Genomics and SAS Visual Analyzer. In 2018, OIR contributed to the development of an improved de novo assembly procedure for early detection of minor populations of drug-resistant HIV strains. We also developed a statistical test to identify variant pairs with unexpectedly high co-occurrence frequencies.
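The specific co-occurrence test is not described here. One standard way to ask whether two variants appear together on the same reads more often than expected under independence is a one-sided Fisher's exact test on a 2x2 table of reads that cover both positions. The sketch below assumes read-level phasing (both variant sites observed on the same read), and the function name is hypothetical; it is not necessarily the test OIR developed.

```python
from math import comb


def cooccurrence_pvalue(both: int, a_only: int, b_only: int, neither: int) -> float:
    """One-sided Fisher's exact test: P(at least `both` reads carry A and B | fixed margins).

    Counts are reads spanning both variant positions (assumed phased on the same read).
    """
    n_reads = both + a_only + b_only + neither
    with_a = both + a_only  # reads carrying variant A
    with_b = both + b_only  # reads carrying variant B
    # Under independence with fixed margins, the count of reads with both variants
    # is hypergeometric; sum its upper tail using exact integer arithmetic.
    upper = min(with_a, with_b)
    tail = sum(
        comb(with_a, x) * comb(n_reads - with_a, with_b - x)
        for x in range(both, upper + 1)
    )
    return tail / comb(n_reads, with_b)


print(cooccurrence_pvalue(5, 0, 0, 5))  # about 0.00397 (= 1/252): strong linkage
print(cooccurrence_pvalue(1, 1, 1, 1))  # about 0.833 (= 5/6): no evidence of linkage
```

When many variant pairs are screened, the resulting p-values would need a multiple-testing correction (for example, Bonferroni or Benjamini-Hochberg). If SciPy is available, `scipy.stats.fisher_exact([[both, a_only], [b_only, neither]], alternative="greater")` should return the same one-sided p-value.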

Agency
National Institutes of Health (NIH)
Institute
Center for Information Technology (CIT)
Type
Scientific Computing Intramural Research (ZIH)
Project #
1ZIHCT000260-23
Application #
9787091
Support Year
23
Fiscal Year
2018
Name
Computer Research and Technology
Schmitz, Roland; Wright, George W; Huang, Da Wei et al. (2018) Genetics and Pathogenesis of Diffuse Large B-Cell Lymphoma. N Engl J Med 378:1396-1407
Liang, Ma; Raley, Castle; Zheng, Xin et al. (2016) Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads. BioData Min 9:13