Genomic data are transforming how scientists in medicine and basic science conduct research. The advancement of genome science requires a new generation of scientists with strong computational and statistical skills and the ability to interact effectively with experimentalists. The proposed Penn State Computation, Bioinformatics, and Statistics (CBIOS) Training Program will prepare a cadre of investigators to think innovatively and keep pace with the quickly evolving landscape of high-throughput genomic technologies. The program faculty are interdisciplinary and highly collaborative, with expertise in computation, bioinformatics, statistics, and functional, medical, and evolutionary genomics. Learning these discipline-crossing skills will make trainees competitive for future careers in the emerging and rapidly advancing fields of comparative, systems, statistical, and medical genomics. The educational objectives of the CBIOS program are to engender in the trainees the following:
1. A thorough understanding of hypothesis testing in the scientific process.
2. The ability to work from theory to data and back.
3. Fluency in the use of computational and statistical tools for high-throughput data.
4. The ability to integrate and innovate computational and statistical analysis of high-throughput data.
5. Excellence in cross-disciplinary scientific communication, including the ethical implications of computational and bioinformatics research.
6. The ability to lead cross-disciplinary research teams.
The CBIOS training program will accomplish these objectives through a set of existing core and elective courses along with a new practicum course, all of which are integrated with a journal club and seminar series. The program will enhance professional development through invited seminar speakers and retreats, and will specifically develop trainees' communication skills to enable dissemination of genomics research to a broad audience.
Predoctoral trainees will be selected early in their graduate program for two years of intensive training. A total of 15 trainees (10 NIH-supported and 5 PSU-supported) will be trained during the five-year grant period. The faculty supporting this training program have a combined annual research funding base of $65 million in direct costs, and thus offer a robust mentoring foundation for student research experience and opportunities.

Public Health Relevance

A genome sequence holds all the information needed to make an organism, and the genome sequences of humans and model species were determined in order to use this comprehensive knowledge to uncover new insights into the molecular and cellular basis of disease. Analysis and interpretation of the comprehensive datasets generated by genomics research require skills that cut across the traditional disciplines of computer science, bioinformatics, and statistics, and these skills must be applied in a manner that brings out the underlying biology. We propose a new predoctoral training program to prepare a cadre of young scientists who excel in these cross-disciplinary research skills; these trainees will be vital members of the genomics and bioinformatics research community that strives to harvest insights from genomic data to improve human health.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Institutional National Research Service Award (T32)
Study Section
Training and Workforce Development Subcommittee - D (TWD)
Program Officer
Marcus, Stephen
Pennsylvania State University
Schools of Arts and Sciences
University Park
United States
Vinayachandran, Vinesh; Reja, Rohit; Rossi, Matthew J et al. (2018) Widespread and precise reprogramming of yeast protein-genome interactions in response to heat shock. Genome Res :
Zhang, Yan; An, Lin; Xu, Jie et al. (2018) Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 9:750
Iyer, Janani; Singh, Mayanglambam Dhruba; Jensen, Matthew et al. (2018) Pervasive genetic interactions modulate neurodevelopmental defects of the autism-associated 16p11.2 deletion in Drosophila melanogaster. Nat Commun 9:2548
Warris, Sven; Schijlen, Elio; van de Geest, Henri et al. (2018) Correcting palindromes in long reads after whole-genome amplification. BMC Genomics 19:798
Rangavittal, Samarth; Harris, Robert S; Cechova, Monika et al. (2018) RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly. Bioinformatics 34:1125-1131
Lee, Sang Y; Zhu, Junjia; Salzberg, Anna C et al. (2017) Analysis of single nucleotide variants of HFE gene and association to survival in The Cancer Genome Atlas GBM data. PLoS One 12:e0174778
Yang, Tao; Zhang, Feipeng; Yardımcı, Galip Gürkan et al. (2017) HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27:1939-1949
Jensen, Matthew; Girirajan, Santhosh (2017) Mapping a shared genetic basis for neurodevelopmental disorders. Genome Med 9:109
Rieber, Lila; Mahony, Shaun (2017) miniMDS: 3D structural inference from high-resolution Hi-C data. Bioinformatics 33:i261-i266
Wang, Qingyu; Shashikant, Cooduvalli S; Jensen, Matthew et al. (2017) Novel metrics to measure coverage in whole exome sequencing datasets reveal local and global non-uniformity. Sci Rep 7:885
