Data Science Research

Sinha, Saurabh; Song, Jun; Weinshilboum, Richard

Abstract

DATA SCIENCE RESEARCH BACKGROUND AND SIGNIFICANCE

Biology in the 21st century has emerged as a big data science on par with physics or astronomy. Beginning with the landmark sequencing projects over a decade ago [1, 2], there have been successive waves of technological breakthroughs in probing cellular information on a genome-wide scale: microarrays [3], next-generation sequencing [4], large-scale proteomics [5], and their many derivatives [6, 7]. Quick and widespread adoption of high-throughput technologies has created massive amounts of data, yet there is a consensus that the floodgates have only barely opened [8]. The explosive growth of data volume has fostered intense research in the development of informatics tools to store, manage, and analyze such data [9]. However, the scale and efficiency of analysis are lagging behind the generation of data, a fact recognized by the major national funding agencies, with the result that the true potential of the data to accelerate biological discovery is not being realized.

Analysis of biological data today is hampered by two major bottlenecks:

(1) Integration: Different biotechnological tools record different kinds of cellular activities that provide complementary views of the same underlying biological phenomena. However, it has proved extremely difficult to integrate those partial descriptions into a well-organized whole, even though the advantages of such an integrative analysis of diverse data types are well recognized [10].

(2) Scalability: The challenge of data integration is generally met with the most heavy-duty machine learning techniques of the day [10], which typically do not scale well with data size.

Biology needs analysis tools that can handle the data deluge of its modern omics era. We propose to develop an E-science framework that will address the issues of integrative analysis and scalability associated with big data analysis in biology. We will build this environment from the ground up, laying its algorithmic foundations, engineering the scalable systems that form its skeleton frame, and creating the human-computer interface that makes it hospitable.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 5U54GM114838-04
Application #: 9301577
Study Section: Special Emphasis Panel (ZRG1-BST-R)

Project Start
Project End: 2019-04-30
Budget Start: 2017-05-01
Budget End: 2018-04-30
Support Year: 4
Fiscal Year: 2017
Total Cost: $3,003,221
Indirect Cost: $806,194

Institution

Name: University of Illinois Urbana-Champaign
Department
Type: Domestic Higher Education
DUNS #: 041544081

City: Champaign
State: IL
Country: United States
Zip Code: 61820

Publications

Huang, Edward W; Wang, Sheng; Zhai, ChengXiang (2018) VisAGE: Integrating external knowledge into electronic medical record visualization. Pac Symp Biocomput 23:578-589

Zhang, Yi; Manjunath, Mohith; Zhang, Shilu et al. (2018) Integrative Genomic Analysis Predicts Causative Cis-Regulatory Mechanisms of the Breast Cancer-Associated Genetic Variant rs4415084. Cancer Res 78:1579-1591

Athreya, Arjun; Iyer, Ravishankar; Neavin, Drew et al. (2018) Augmentation of Physician Assessments with Multi-Omics Enhances Predictability of Drug Response: A Case Study of Major Depressive Disorder. IEEE Comput Intell Mag 13:20-31

Zhang, Yi; Manjunath, Mohith; Kim, Yeonsung et al. (2018) SequencEnG: an Interactive Knowledge Base of Sequencing Techniques. Bioinformatics :

Shi, Yu; Gui, Huan; Zhu, Qi et al. (2018) AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. Proc SIAM Int Conf Data Min 2018:144-152

Baheti, Saurabh; Tang, Xiaojia; O'Brien, Daniel R et al. (2018) HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data. BMC Bioinformatics 19:271

Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave et al. (2018) A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. Sci Rep 8:6620

Ho, Ming-Fen; Correia, Cristina; Ingle, James N et al. (2018) Ketamine and ketamine metabolites as novel estrogen receptor ligands: Induction of cytochrome P450 and AMPA glutamate receptor gene expression. Biochem Pharmacol 152:279-292

Adami, Guy R; Tangney, Christy C; Tang, Jessica L et al. (2018) Effects of green tea on miRNA and microbiome of oral epithelium. Sci Rep 8:5873

Xiao, Jinfeng; Blatti, Charles; Sinha, Saurabh (2018) SigMat: a classification scheme for gene signature matching. Bioinformatics 34:i547-i554

Showing the most recent 10 out of 74 publications
