Informatics, Machine Learning & Biomedical Data Science

Johnson, Calvin

Abstract

Over the past year, we have been active in: (1) developing computationally efficient methods and algorithms to solve known problems in the analysis of biomedical and clinical data and to study complex interactions in biological systems; (2) developing knowledge-based data management systems for the discovery and curation of biomedical knowledge, including distributed annotation systems and clinical information management systems; (3) applying predictive-analytic models to scientific and administrative domains; and (4) consulting with NIH leadership to provide evidence-based solutions to improve the grant application and review process. Specifically, in 2017, collaborative efforts in support of these goals included the following:
- In partnership with Dr. John Tsang of the NIAID Laboratory of Systems Biology, HPCIO is conducting a multifaceted project to profile the immune system using the latest high-throughput, multiplexed technologies and systems approaches. One of the goals of this collaboration is to develop novel computational methodologies that can exploit inter-subject heterogeneity and measurements at various scales to assess the roles of the immune system in health and disease. We have collected samples from a large cohort of patients with immune-mediated monogenic diseases and are in the process of deeply phenotyping blood samples from these patients. By studying the immune system across multiple monogenic, immune-mediated diseases, we will have the opportunity to infer cellular and molecular networks of the human immune system. HPCIO is actively involved in the development of a database to record clinical information from patient visits and in the bioinformatics analyses of data generated by the project.
- HPCIO is working with the NCI Occupational & Environmental Epidemiology Branch to develop methodologies to incorporate occupational risk factors into epidemiological models. We are enlarging the training data to improve our novel classifiers for coding free-text job descriptions into the 840 codes of the 2010 U.S. Standard Occupational Classification (SOC) System. Agreement between our classification system and expert coders is measured using SOC code agreement and exposure prediction from CANJEM, a job-exposure matrix of over 250 exposure agents developed by Jerome Lavoue at the University of Montreal. We are also working with NCI to develop a two-stage mixed generalized linear model to predict lifetime occupational exposure to lead.
- In collaboration with the Membrane Transport Biophysics Section of NINDS, HPCIO is (1) developing a computational tool to accurately identify the boundaries of lysosomes in fluorescence microscopy and (2) using the fluorescence ratio to measure lysosomal pH within each organelle, to better understand lysosomal pH regulation.
- A freely available plasmid database that is interoperable with popular freeware is currently being developed for the NIDA Optogenetics and Transgenic Technology Core. The Plasmid Manager offers a versatile yet simple platform for scientists to store and analyze their plasmid data. Motivated by the need for a more comprehensive approach to archiving plasmid data, the database platform is enriched with numerous components beyond the repository, serving as an informatics platform designed to enhance the efficiency and analytic capabilities of scientists.
- As high-throughput next-generation sequencing (NGS) technology plays an important role in systematically identifying novel cancer driver mutations in genome-wide surveys, NGS data generation is rapidly increasing, currently accumulating at a rate of several terabytes every month at the Lymphoid Malignancies Section of NCI. In collaboration with the Louis Staudt Laboratory, a bioinformatics website is being developed containing useful tools for the analysis of the laboratory's Diffuse Large B-Cell Lymphoma data. This website enables users with very little computer expertise to run their own analyses, as opposed to having a specialist run the analyses for them. Methodologies in parallelization and text searching have also been incorporated to return analysis results much more quickly and efficiently than before. In 2017, a new dimension to this collaboration was the development of machine learning methods to distinguish somatic from germline mutations in NGS data. Machine learning models have also been tested to identify subtypes of diffuse large B-cell lymphoma based on features of their gene aberrations.
- In collaboration with NIA and NCI, we are applying machine learning and visualization techniques to large biological datasets to discover novel patterns of functional gene or protein interactions related to aging. In this collaboration, we are developing a machine learning method that models the temporal nature of longitudinal clinical data to predict the progression of amyotrophic lateral sclerosis. Such a method may also be well suited to prediction from high-dimensional time-series genomic data.
- The Human Salivary Proteome Wiki is a community-driven Web portal developed by HPCIO, in collaboration with NIDCR, to enable scientists to add their own research data, share results, and discover new knowledge. Many features and external content have been incorporated over the last few years to make it easier for users to extract different kinds of information from the wiki. One of the latest enhancements is the integration of RNA-seq transcriptional and protein immunohistochemistry data from the Human Protein Atlas. This affords users the ability to weigh evidence generated by different, independent modalities, in addition to the original mass-spectrometry-based data, to assess the status of a protein.
- In collaboration with CSR, HPCIO is applying text analytics to provide CSR leadership with evidence-based decision support in evaluating the grant review process. A Web-based automated referral tool, called ART, was developed and deployed to help PIs and SROs identify the most relevant study section(s) or special emphasis panel(s) based on the scientific content of an application. In addition, HPCIO is analyzing text from quick feedback surveys on peer review. HPCIO has developed a system to capture the sentiment of reviewer comments in these surveys and to classify the comments, with their sentiment scores, into broad categories. Progress has been made in identifying needs and suggestions offered by the reviewers and in assigning topic labels to them. In 2017, HPCIO began to explore appropriate topological network mapping diagrams of CSR study sections, superimposed with measures of scientific productivity for those study sections.
- In collaboration with the Office of Data Analysis Tools and Systems, NIH Office of Extramural Research, HPCIO has been developing a standard database update pipeline for NIH Topic Maps, originally developed by Dr. Ned Talley of NINDS. This effort was concluded in 2017.
- In collaboration with NIAID, HPCIO has supported NIAID's release of HT JoinSolver(R), a new application capable of analyzing V(D)J recombination in thousands of immunoglobulin gene sequences produced by high-throughput sequencing.
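As an illustration of the "SOC code agreement" measure mentioned in the occupational coding effort above, the following is a minimal sketch, not the project's actual evaluation code. It assumes codes are written in the usual "NN-NNNN" form and that the leading 2, 3, 5, and 6 digits identify the major group, minor group, broad occupation, and detailed occupation respectively. The function names and the example codes are hypothetical.

```python
from typing import Dict, Sequence

# Assumption: leading-digit counts that identify each level of a 2010 SOC
# code written as "NN-NNNN" (the hyphen is removed before comparing).
SOC_LEVELS = {"major": 2, "minor": 3, "broad": 5, "detailed": 6}


def soc_digits(code: str) -> str:
    """Return the six digits of a SOC code such as '11-1011'."""
    digits = code.replace("-", "").strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"not a 6-digit SOC code: {code!r}")
    return digits


def soc_agreement(predicted: Sequence[str], expert: Sequence[str]) -> Dict[str, float]:
    """Fraction of jobs where classifier and expert agree, per SOC level."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")
    if not predicted:
        raise ValueError("at least one coded job is required")
    pairs = [(soc_digits(p), soc_digits(e)) for p, e in zip(predicted, expert)]
    return {
        level: sum(p[:n] == e[:n] for p, e in pairs) / len(pairs)
        for level, n in SOC_LEVELS.items()
    }


if __name__ == "__main__":
    # Hypothetical classifier output and expert codes for three jobs.
    classifier_codes = ["11-1011", "15-1132", "47-2061"]
    expert_codes = ["11-1011", "15-1131", "47-2111"]
    for level, rate in soc_agreement(classifier_codes, expert_codes).items():
        print(f"{level}: {rate:.2f}")
```

Reporting agreement at each level separately shows whether disagreements are near misses within the same occupational group or assignments to entirely different groups.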

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Center for Information Technology (CIT)
Type: Scientific Computing Intramural Research (ZIH)
Project #: 1ZIHCT000200-28
Application #: 9550738
Support Year: 28
Fiscal Year: 2017

Institution

Name: Computer Research and Technology

Related projects


NIH 2019 ZIH CT	Informatics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Center for Information Technology
NIH 2018 ZIH CT	Informatics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Computer Research and Technology
NIH 2017 ZIH CT	Informatics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Computer Research and Technology
NIH 2016 ZIH CT	Informatics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Computer Research and Technology
NIH 2015 ZIH CT	Informatics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Computer Research and Technology
NIH 2014 ZIH CT	Text Analytics, Machine Learning & Biomedical Data Science	Johnson, Calvin A. / Computer Research and Technology
NIH 2013 ZIH CT	Text Analytics, Machine Learning & High Performance Computing	Johnson, Calvin A. / Center for Information Technology	$2,419,860
NIH 2012 ZIH CT	Text Analytics, Knowledge Engineering, & High Performance Computing	Johnson, Calvin A. / Center for Information Technology	$2,726,852
NIH 2010 ZIH CT	Collective Intelligence, Knowledge Infrastructure, & High Performance Computing	Johnson, Calvin A. / Center for Information Technology	$2,823,000
NIH 2009 ZIH CT	Collective Intelligence, Knowledge Infrastructure, & High Performance Computing	Johnson, Calvin A. / Center for Information Technology	$2,941,656

Publications

Schmitz, Roland; Wright, George W; Huang, Da Wei et al. (2018) Genetics and Pathogenesis of Diffuse Large B-Cell Lymphoma. N Engl J Med 378:1396-1407

Martins, Andrew J; Narayanan, Manikandan; Prüstel, Thorsten et al. (2017) Environment Tunes Propagation of Cell-to-Cell Variation in the Human Macrophage Gene Network. Cell Syst 4:379-392.e12

Wilcox, Amber N; Silverman, Debra T; Friesen, Melissa C et al. (2016) Smoking status, usual adult occupation, and risk of recurrent urothelial bladder carcinoma: data from The Cancer Genome Atlas (TCGA) Project. Cancer Causes Control 27:1429-1435

Liang, Ma; Raley, Castle; Zheng, Xin et al. (2016) Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads. BioData Min 9:13

Lau, William W; Tsang, John S (2016) Humoral Fingerprinting of Immune Responses: 'Super-Resolution', High-Dimensional Serology. Trends Immunol 37:167-169

Lau, William W; Sparks, Rachel; OMiCC Jamboree Working Group et al. (2016) Meta-analysis of crowdsourced data compendia suggests pan-disease transcriptional signatures of autoimmunity. F1000Res 5:2884

Sparks, Rachel; Lau, William W; Tsang, John S (2016) Expanding the Immunology Toolbox: Embracing Public-Data Reuse and Crowdsourcing. Immunity 45:1191-1204

Russ, Daniel E; Ho, Kwan-Yuet; Colt, Joanne S et al. (2016) Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies. Occup Environ Med 73:417-24

Maudsley, Stuart; Martin, Bronwen; Gesty-Palmer, Diane et al. (2015) Delineation of a conserved arrestin-biased signaling repertoire in vivo. Mol Pharmacol 87:706-17

Russ, Daniel E; Ho, Kwan-Yuet; Longo, Nancy S (2015) HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs. BMC Bioinformatics 16:170

Showing the most recent 10 out of 14 publications
