The University of Washington conducts world-class research in the development of big data analytics, as well as in many areas of biomedical research. However, most predoctoral students in biomedical science do not receive cutting-edge training in statistical and computational methods for big data. Furthermore, most predoctoral students in statistics and computing do not receive in-depth training in biomedical science. In short, the university currently lacks an integrated training program that spans computation, statistics, and biomedical science. Given the growing importance of big data across many areas of biomedical research, such an integrated program is critically needed. In order to train a new generation of researchers with expertise in statistics, computing, and biomedical science, we propose the University of Washington PhD Training in Big Data from Genomics and Neuroscience (BDGN). This program will focus on two areas of biomedical science, both of which are characterized by huge amounts of data as well as extensive expertise at the University of Washington: genomics and neuroscience. The program will draw six predoctoral students per year from the following seven PhD programs: Applied Mathematics, Biology, Biostatistics, Computer Science & Engineering, Genome Sciences, Neuroscience, and Statistics. Trainees will be appointed to the training grant during their first or second year of PhD studies and will continue on the training grant for two years. They will take a rigorous curriculum that involves three courses in statistics, machine learning, and data science, and three courses in either genomics or neuroscience. Each trainee will be paired with two world-class faculty mentors: one specializing in either genomics or neuroscience, and a second specializing in the development of either computational or statistical methods for big data.
Other key features of the training program include three one-quarter rotations, with at least one focusing on genomics or neuroscience and one focusing on statistical or computational methods, a summer internship program, opportunities to attend world-class summer courses run through UW programs, peer mentoring, seminars, journal clubs, and courses on reproducible research and on responsible conduct of research. All predoctoral trainees will leave the BDGN Training Program with a core set of skills and a common language required for generating, interpreting, and developing statistical and computational methods for big data from genomics or neuroscience.

Public Health Relevance

New technologies in biomedical research have led to the generation of very large data sets, such as DNA sequences and fMRI images. However, in order to use these big data sets to improve scientific understanding and human health, there is a need for a new generation of biomedical scientists with interdisciplinary training that spans three areas: computer science, statistics, and biomedicine. The proposed program will provide predoctoral training for students engaged in big data research in genomics and neuroscience at the University of Washington.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Institutional National Research Service Award (T32)
Study Section: Special Emphasis Panel (ZRG1-IMST-B (50))
Program Officer: Lim, Susan E
University of Washington
Schools of Medicine
United States
Alexandre, Cristina M; Urton, James R; Jean-Baptiste, Ken et al. (2018) Complex Relationships between Chromatin Accessibility, Sequence Divergence, and Gene Expression in Arabidopsis thaliana. Mol Biol Evol 35:837-854