The expanded ability to collect data at all scales (molecular, cellular, tissue, organism, and population) has created unparalleled opportunities for biomedical discovery. These opportunities span all areas of research, from basic science to clinical care. In response to these emerging challenges in data science, Stanford University announced the creation of a new Department of Biomedical Data Science (DBDS), to begin in the fall of 2015. Fundamental to the DBDS is bringing together faculty in (1) informatics and computer science and (2) biostatistics and mathematical modeling, who work closely with a broad range of (3) biomedical science collaborators to advance knowledge. The Stanford Biomedical Informatics (BMI) training program is focused on the creation of new methods for the organization, analysis, and modeling of biomedical data and knowledge. The BMI program has been a small interdisciplinary program at Stanford for more than 33 years; nonetheless, it has produced many leaders in biomedical informatics and data science. The BMI program will now have its administrative home in the DBDS and will become the epicenter for biomedical data science training at Stanford. We can respond quickly to the shortage of trained scientists in biomedical data science because of a flexible curriculum, an unusually rich set of course offerings, and abundant research opportunities. In this proposal, we outline a plan to engage faculty broadly across the University to create scalable mechanisms for training the next generation of biomedical data scientists, and to create a pathway for data science within the BMI program that stresses statistical reasoning, machine learning, and data mining of biomedical data.

Public Health Relevance

Our ability to collect large amounts of data at the molecular, cellular, tissue, organism, and population levels creates fantastic opportunities for discovery in health. We propose a plan for training scientists with skills to harness these data and turn them into valuable knowledge for biology and medicine.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Institutional National Research Service Award (T32)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Ye, Jane
Stanford University
Internal Medicine/Medicine
Schools of Medicine
United States
DeBoever, Christopher; Tanigawa, Yosuke; Lindholm, Malene E et al. (2018) Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study. Nat Commun 9:1612
Lavertu, Adam; McInnes, Greg; Daneshjou, Roxana et al. (2018) Pharmacogenomics and big genomic data: from lab to clinic and back again. Hum Mol Genet 27:R72-R78
Paskov, Kelley M; Wall, Dennis P (2018) A Low Rank Model for Phenotype Imputation in Autism Spectrum Disorder. AMIA Jt Summits Transl Sci Proc 2017:178-187
Merker, Jason D; Devereaux, Kelly; Iafrate, A John et al. (2018) Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays. Arch Pathol Lab Med :
Gupta, Anika; Sun, Min Woo; Paskov, Kelley Marie et al. (2018) Coalitional game theory as a promising approach to identify candidate autism genes. Pac Symp Biocomput 23:436-447
Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole et al. (2017) Cloud-based interactive analytics for terabytes of genomic variants data. Bioinformatics 33:3709-3715
Lalonde, Simon; Stone, Oliver A; Lessard, Samuel et al. (2017) Frameshift indels introduced by genome editing can lead to in-frame exon skipping. PLoS One 12:e0178700