Advances in information and high-throughput technologies have set the stage for the 'big data age' in biomedicine. However, there remain unresolved challenges that could limit the impact of big data exploration in the basic, clinical, and biomedical sciences. These challenges range from assuring privacy and security in cloud computing environments, to establishing the integrity and reproducibility of quantitative analysis tools, to proving the validity and generalizability of common probabilistic frameworks used to interpret big data in a biomedical context. Our community can prepare for these challenges by developing a workforce that studies data as a science and engineers scalable technologies. Vanderbilt University is uniquely positioned to establish such a program and train the next generation of data scientists. The proposed program lays a foundation in, and emphasizes the symbiotic relationship between, biomedical informatics, computer science, and biostatistics. Data scientists must be highly knowledgeable in 1) computational techniques, technologies, and infrastructure for collecting, processing, and analyzing data on a massive scale; 2) statistical methodologies that accommodate large-scale, complex, high-dimensional biomedical data (e.g., model building and validation, false discovery rates, missing data imputation, recalibration for measurement error, and assessing the strength of statistical evidence); and 3) the scientific method and the specific biomedical and clinical contexts that led to data capture, downstream discovery, and next-generation decision support systems (which govern the generalizability of results and quantitative tools). Because this field is evolving quickly, it is paramount to provide students with pragmatic training environments that emphasize and develop critical thinking skills and expose them to modern biomedical data analysis in real systems.
For over a decade, the biomedical informatics doctoral program at Vanderbilt University has provided students with these experiences, leading to innovations in big data analytics with high impact on the underlying scientific applications in real clinical environments. Despite this, there is no formal program dedicated to big data science where students can study this area in the context of real biomedical collaborations. Previous students have managed to do so through a patchwork of goodwill and determination, which necessarily entails inefficient coursework and laborious research collaborations. This proposal seeks to build on Vanderbilt's strength in this area to establish the Vanderbilt Training Program in Big Biomedical Data Science (BIDS) for the next generation of data scientists. The program will be managed as a track within the existing biomedical informatics doctoral program and led by three PIs with complementary expertise in 1) computational infrastructure, 2) statistical methodologies, and 3) management of NIH-sponsored training programs. Mentorship comes from a collection of well-established university faculty, ensuring that students are exposed to novel biomedical problems and interdisciplinary team-based investigations.
Advances in information and high-throughput technologies have made it possible to store, process, and share vast quantities of biomedical data. The goal of Vanderbilt's Training Program in Big Biomedical Data Science (BIDS) is to train the next generation of investigators and practitioners in the foundations of data science (biomedical informatics, computer science, statistical science, and biomedical science) so that they can take advantage of readily available large-scale, complex biomedical data and the specialized tools and analysis techniques needed to interpret them properly. The proposed program leverages Vanderbilt's wide-ranging faculty expertise in big data analytics, cloud computing, and biomedical applications.
Chen, Bob; Herring, Charles A; Lau, Ken S (2018) pyNVR: Investigating factors affecting feature selection from scRNA-seq data for lineage reconstruction. Bioinformatics
Liu, Qi; Herring, Charles A; Sheng, Quanhu et al. (2018) Quantitative assessment of cell population diversity in single-cell landscapes. PLoS Biol 16:e2006687
Gao, Yurui; Schilling, Kurt G; Stepniewska, Iwona et al. (2018) Tests of cortical parcellation based on white matter connectivity using diffusion tensor imaging. Neuroimage 170:321-331
Damon, Stephen M; Boyd, Brian D; Plassard, Andrew J et al. (2017) DAX - The Next Generation: Towards One Million Processes on Commodity Hardware. Proc SPIE Int Soc Opt Eng 2017
Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J et al. (2017) Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine. Proc SPIE Int Soc Opt Eng 10138
Yao, Xiuya; Chaganti, Shikha; Nabar, Kunal P et al. (2017) Structural-Functional Relationships Between Eye Orbital Imaging Biomarkers and Clinical Visual Assessments. Proc SPIE Int Soc Opt Eng 10133
Bao, Shunxing; Plassard, Andrew J; Landman, Bennett A et al. (2017) Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service. Proc IEEE Int Conf Cloud Eng 2017:127-137
Plassard, Andrew J; D'Haese, Pierre F; Pallavaram, Srivatsan et al. (2017) Multi-Modal and Targeted Imaging Improves Automated Mid-Brain Segmentation. Proc SPIE Int Soc Opt Eng 10133
Plassard, Andrew J; Landman, Bennett A (2017) Multiprotocol, multiatlas statistical fusion: theory and application. J Med Imaging (Bellingham) 4:034002
Plassard, Andrew J; McHugo, Maureen; Heckers, Stephan et al. (2017) Multi-Scale Hippocampal Parcellation Improves Atlas-Based Segmentation Accuracy. Proc SPIE Int Soc Opt Eng 10133