Problems in data generation, acquisition, management, analysis, visualization, and interpretation, which have always been important in biomedical science, now play leading roles in the massive effort to understand health and disease. The unprecedented size, complexity, and heterogeneity of big biomedical data demand research that will allow us to extract knowledge from data more efficiently in order to make better predictions, to characterize biological systems, and generally to enable subsequent investigation. The research directed toward these problems is an amalgamation of multiple disciplines: computer science, statistics/biostatistics, and specific biomedical science areas all offer critical insights into biomedical data science, or what we call bio-data science in this proposal. This science affects basic biological investigations as well as translational studies and clinical research. Taking advantage of existing PhD programs, the collaborative research infrastructure, and various initiatives in data science and big data at UW, we propose cross-training in bio-data science for pre-doctoral students. Trainees will come from one of three focus areas, will complete coursework in all three areas, and will be trained in interdisciplinary research, computing infrastructure, and the responsible conduct of research on their way to discovering new knowledge in their PhD thesis work.
Modern biological, medical, and health studies often involve large, heterogeneous data sets from which useful, accurate information cannot be efficiently extracted with available methods. Research to improve the analysis of biomedical big data is active at the interface of computer science, statistics, and biomedical domains such as genomics and brain science. We propose to train researchers for this interface in order to further advance areas of biomedicine that rely on big data.