The BD2K initiative was developed by the NIH to enable biomedical researchers to capitalize on the Big Data being generated, foster new discovery, and increase biological knowledge. The need to train a new generation of scientists skilled in computation, informatics, and statistics to surmount the challenges of big data analysis for biological and biomedical science is widely recognized. An important recommendation with respect to big data computing was to build capacity by training the workforce in the relevant quantitative sciences, such as bioinformatics, biomathematics, biostatistics, and clinical informatics. Basic science and biomedical advances rely increasingly on very large, complex datasets generated by high-throughput -omic and other biological technologies, and sound statistical reasoning and sophisticated computational techniques are needed throughout the process of analysis and discovery. This includes all stages of investigation, from experimental design, data pre-processing, de-noising, and normalization to integrating multiple datasets, testing hypotheses, and visualizing data in interactive and informative ways. The new challenges posed by high-dimensional and complex data require that life and computer scientists working with big data acquire a substantive understanding of statistics and bioinformatics, and that statisticians working in this area, in turn, acquire a substantive understanding of biological principles, experimental technologies, and computation. These fields will converge into an interdisciplinary domain where existing statistical and computational tools are used and combined effectively, and novel methods are generated, to promote innovation and discovery in big data analysis for biomedical science.
This interdisciplinary communication is essential for the emergence of a new cadre of researchers who can work effectively with their peers in the complementary disciplines required to tackle real big data problems in the life sciences. The Biomedical Big Data to Knowledge (B2D2K) Training Program at The Pennsylvania State University will bring together Data Science researchers and educators from five colleges at Penn State (the Colleges of Science, Engineering, Health and Human Development, Information Sciences and Technology, and Medicine) and from Geisinger Health System to create a truly transformative multi-disciplinary predoctoral training environment. The goal of the B2D2K program is to train a diverse cohort of next-generation biomedical data scientists with a deep knowledge of Data Science, who can develop novel algorithmic and statistical methods for building predictive, explanatory, and causal models through integrative analyses of disparate types of biomedical data (including Electronic Health Records, genomics, behavioral, socio-economic, and environmental data) to advance science and improve health. We believe that investment in this generation of data scientists will be critical to realizing the full potential of biomedical big data.

Public Health Relevance

The Biomedical Big Data to Knowledge (B2D2K) Training Program at The Pennsylvania State University will bring together Data Science researchers and educators to create a truly transformative multi-disciplinary predoctoral training environment. The goal of the B2D2K program is to train a diverse cohort of next-generation biomedical data scientists.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Institutional National Research Service Award (T32)
Project #: 5T32LM012415-04
Application #: 9718266
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Ye, Jane
Project Start: 2016-04-01
Project End: 2021-03-31
Budget Start: 2019-04-01
Budget End: 2020-03-31
Support Year: 4
Fiscal Year: 2019
Total Cost:
Indirect Cost:
Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953
City: University Park
State: PA
Country: United States
Zip Code: 16802
Basile, Anna Okula; Ritchie, Marylyn DeRiggi (2018) Informatics and machine learning to define the phenotype. Expert Rev Mol Diagn 18:219-226
Li, Runze; Ren, Jian-Jian; Yang, Guangren et al. (2018) Asymptotic Behavior of Cox's Partial Likelihood and its Application to Variable Selection. Stat Sin 28:2713-2731
Tian, Yuan; Nichols, Robert G; Cai, Jingwei et al. (2018) Vitamin A deficiency in mice alters host and gut microbial metabolism leading to altered energy homeostasis. J Nutr Biochem 54:28-34
El-Manzalawy, Yasser; Hsieh, Tsung-Yu; Shivakumar, Manu et al. (2018) Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data. BMC Med Genomics 11:71
Kürüm, Esra; Hughes, John; Li, Runze et al. (2018) Time-varying copula models for longitudinal data. Stat Interface 11:203-221
Coble, Joel L; Sheldon, Kathryn E; Yue, Feng et al. (2017) Identification of a rare LAMB4 variant associated with familial diverticulitis through exome sequencing. Hum Mol Genet 26:3212-3220
Hubbard, Troy D; Murray, Iain A; Nichols, Robert G et al. (2017) Dietary Broccoli Impacts Microbial Community Structure and Attenuates Chemically Induced Colitis in Mice in an Ah receptor dependent manner. J Funct Foods 37:685-698
Hall, Molly A; Wallace, John; Lucas, Anastasia et al. (2017) PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies. Nat Commun 8:1167
Kim, Dokyoon; Basile, Anna O; Bang, Lisa et al. (2017) Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease. BMC Med Inform Decis Mak 17:61
Walia, Rasna R; El-Manzalawy, Yasser; Honavar, Vasant G et al. (2017) Sequence-Based Prediction of RNA-Binding Residues in Proteins. Methods Mol Biol 1484:205-235

Showing the most recent 10 out of 13 publications