DATA SCIENCE RESEARCH BACKGROUND AND SIGNIFICANCE

Biology in the 21st century has emerged as a "big data" science on par with physics or astronomy. Beginning with the landmark sequencing projects over a decade ago [1, 2], there have been successive waves of technological breakthroughs in probing cellular information on a genome-wide scale: microarrays [3], next-generation sequencing [4], large-scale proteomics [5] and their many derivatives [6, 7]. Quick and widespread adoption of high-throughput technologies has created massive amounts of data, yet there is a consensus that the floodgates have only barely opened [8]. The explosive growth of data volume has fostered intense research in the development of informatics tools to store, manage and analyze such data [9]. However, the scale and efficiency of analysis lag behind the generation of data, a fact recognized by the major national funding agencies, and as a result the true potential of the data to accelerate biological discovery is not being realized.

Analysis of biological data today is hampered by two major bottlenecks:

(1) Integration: Different biotechnological tools record different kinds of cellular activities that provide complementary views of the same underlying biological phenomena. However, it has proved extremely difficult to integrate those partial descriptions into a well-organized whole, even though the advantages of such an integrative analysis of diverse data types are well recognized [10].

(2) Scalability: The challenge of data integration is generally met with the most heavy-duty machine learning techniques of the day [10], which typically do not scale well with data size. Biology needs analysis tools that can handle the data deluge of its modern "omics" era.

We propose to develop an E-science framework that will address the issues of integrative analysis and scalability associated with big data analysis in biology.
We will build this environment from the ground up, laying its algorithmic foundations, engineering the scalable systems that form its skeleton frame, and creating the human-computer interface that makes it hospitable.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 1U54GM114838-01
Application #: 8907580
Study Section: Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer: Lyster, Peter
Project Start: 2014-09-29
Project End: 2018-04-30
Budget Start: 2014-07-01
Budget End: 2015-06-30
Support Year: 1
Fiscal Year: 2014
Total Cost: $1,352,683
Indirect Cost: $430,029

Name: University of Illinois Urbana-Champaign
DUNS #: 041544081
City: Champaign
State: IL
Country: United States
Zip Code: 61820

Wang, Sheng; Qu, Meng; Peng, Jian (2017) ProSNet: Integrating Homology with Molecular Networks for Protein Function Prediction. Pac Symp Biocomput 22:27-38
Gui, Huan; Liu, Jialu; Tao, Fangbo et al. (2017) Embedding Learning with Events in Heterogeneous Information Networks. IEEE Trans Knowl Data Eng 29:2428-2441
Giacomini, Kathleen M; Yee, Sook Wah; Mushiroda, Taisei et al. (2017) Genome-wide association studies of drug response and toxicity: an opportunity for genome medicine. Nat Rev Drug Discov 16:1
Weinshilboum, Richard M; Wang, Liewei (2017) Pharmacogenomics: Precision Medicine and Drug Response. Mayo Clin Proc 92:1711-1722
Emad, Amin; Cairns, Junmei; Kalari, Krishna R et al. (2017) Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance. Genome Biol 18:153
Kim, Minji; Kim, Yeonsung; Qian, Lei et al. (2017) TeachEnG: a Teaching Engine for Genomics. Bioinformatics 33:3296-3298
Shi, Yu; Kim, Myunghwan; Chatterjee, Shaunak et al. (2016) Dynamics of Large Multi-View Social Networks: Synergy, Cannibalization and Cross-View Interplay. KDD 2016:1855-1864
Blatti, Charles; Sinha, Saurabh (2016) Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks. Bioinformatics 32:2167-75
Yu, Hongkun; Shang, Jingbo; Hsu, Meichun et al. (2016) Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis. Proc ACM Int Conf Inf Knowl Manag 2016:939-948
Liu, Jialu; Ren, Xiang; Shang, Jingbo et al. (2016) Representing Documents via Latent Keyphrase Inference. Proc Int World Wide Web Conf 2016:1057-1067

Showing the most recent 10 out of 54 publications