A critical challenge in Big Data science is the overall lack of data analysis platforms available for transforming Big Data into biological knowledge. To address this challenge, we propose a set of interconnected computational tools capable of organizing and analyzing heterogeneous data to support combined inquiries and to de-convolute complex relationships embedded within large-scale data. We demonstrate their utility with a cardiovascular-centric platform that is easily generalizable to similar efforts in other disciplines. Our Center has designed a federated data architecture of existing resources, substantiated by a solid and growing user base, together with innovations to elevate functionality. Novel crowdsourcing and text-mining methods will extract the wealth of untapped knowledge embedded in biomedical literature, and novel in-depth proteomics analytical tools will elucidate dynamic protein features in unprecedented detail. A key strength of our platform will be its rigorous validation using clinical data from the Jackson Heart Study and the Healthy Elderly Active Longevity (HEAL; Wellderly) cohorts. Our proposal includes nine scientific aims that address three main focus areas: (i) we will build a new model platform that amalgamates community-supported Big Data resources, enabling data annotations and collaborative analyses; (ii) we will integrate molecular data with drug and disease information, both structured and unstructured, for knowledge aggregation; and (iii) we will create on-the-cloud analytical and modeling tools to power in-depth protein discoveries.
Specifically, we will create a novel distributed query system and cloud-based infrastructure capable of providing unified access to multi-omics datasets; we will develop computational and crowdsourcing methods to systematically define relationships between genes, proteins, diseases, and drugs from the literature, emphasizing cardiovascular medicine; we will rally community participation and promote awareness of collaborative research through outreach and educational games; we will create a platform to analyze and visualize multi-scale pathway models of genes, proteins, and metabolites; we will develop tools and algorithms to mechanistically model spatiotemporal protein networks in organelles and to predict higher physiological phenotypes; and we will correlate individual phenotypes, health histories, and multi-scale molecular profiles to examine cardiovascular disease mechanisms. These tools will be implemented, delivered, and executed on the cloud infrastructure to minimize the computational power required of users.

Public Health Relevance

The challenges of biomedical Big Data are multifaceted. Every day, biomedical researchers face the daunting task of storing, analyzing, and distributing large-scale genomics and proteomics data, and of aggregating all of this information to discern deeper meaning. Only through a coherent effort can we harness copious amounts of unruly genomics and proteomics data and transform them into testable hypotheses that can dovetail with the rest of scientific research. This Data Science Research Component is designed to address these challenges.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Specialized Center--Cooperative Agreements (U54)
Project #
5U54GM114833-04
Application #
9298693
Study Section
Special Emphasis Panel (ZRG1-BST-R)
Project Start
Project End
2019-04-30
Budget Start
2017-05-01
Budget End
2018-04-30
Support Year
4
Fiscal Year
2017
Total Cost
$3,211,023
Indirect Cost
$337,146
Name
University of California Los Angeles
Department
Type
Domestic Higher Education
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095
Liem, David A; Murali, Sanjana; Sigdel, Dibakar et al. (2018) Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease. Am J Physiol Heart Circ Physiol 315:H910-H924
Mouton, Alan J; DeLeon-Pennell, Kristine Y; Rivera Gonzalez, Osvaldo J et al. (2018) Mapping macrophage polarization over the myocardial infarction time continuum. Basic Res Cardiol 113:26
Wang, Jie; Choi, Howard; Chung, Neo C et al. (2018) Integrated Dissection of Cysteine Oxidative Post-translational Modification Proteome During Cardiac Hypertrophy. J Proteome Res :
Caufield, John Harry; Liem, David A; Garlid, Anders O et al. (2018) A Metadata Extraction Approach for Clinical Case Reports to Enable Advanced Understanding of Biomedical Concepts. J Vis Exp :
Lau, Edward; Cao, Quan; Lam, Maggie P Y et al. (2018) Integrated omics dissection of proteome dynamics during cardiac remodeling. Nat Commun 9:120
Moon, Clara; Stupp, Gregory S; Su, Andrew I et al. (2018) Metaproteomics of Colonic Microbiota Unveils Discrete Protein Functions among Colitic Mice and Control Groups. Proteomics 18:
Jupe, Steve; Ray, Keith; Roca, Corina Duenas et al. (2018) Interleukins and their signaling pathways in the Reactome biological pathway database. J Allergy Clin Immunol 141:1411-1416
Caufield, J Harry; Zhou, Yijiang; Garlid, Anders O et al. (2018) A reference set of curated biomedical data and metadata from clinical case reports. Sci Data 5:180258
Ping, Peipei; Hermjakob, Henning; Polson, Jennifer S et al. (2018) Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine. Circ Res 122:1290-1301
Lindsey, Merry L; Jung, Mira; Yabluchanskiy, Andriy et al. (2018) Exogenous CXCL4 Infusion Inhibits Macrophage Phagocytosis by Limiting CD36 Signaling to Enhance Post-myocardial Infarction Cardiac Dilation and Mortality. Cardiovasc Res :

Showing the most recent 10 out of 118 publications