Genomic data are transforming how scientists in medicine and basic science conduct research. The advancement of genome science requires a new generation of scientists with strong computational and statistical skills and the ability to interact effectively with experimentalists. The proposed Penn State Computation, Bioinformatics, and Statistics (CBIOS) Training Program will prepare a cadre of investigators to think innovatively and keep pace with the quickly evolving landscape of high-throughput genomic technologies. The program faculty are interdisciplinary and highly collaborative, with expertise in computation, bioinformatics, statistics, and functional, medical, and evolutionary genomics. Learning these discipline-crossing skills will make trainees competitive for future careers in the emerging and rapidly advancing fields of comparative, systems, statistical, and medical genomics.

The educational objectives of the CBIOS program are to engender in the trainees the following:
1. A thorough understanding of hypothesis testing in the scientific process.
2. The ability to work from theory to data and back.
3. Fluency in the use of computational and statistical tools for high-throughput data.
4. The ability to integrate and innovate computational and statistical analysis of high-throughput data.
5. Excellence in cross-disciplinary scientific communication, including the ethical implications of computational and bioinformatics research.
6. The ability to lead cross-disciplinary research teams.

The CBIOS training program will accomplish these objectives through a set of existing core and elective courses along with a new practicum course, all of which are integrated with a journal club and seminar series. The program will enhance professional development through invited seminar speakers and retreats, and will specifically develop trainees' communication skills to enable dissemination of genomics research to a broad audience.

Predoctoral trainees will be selected early in their graduate program for two years of intensive training. A total of 15 trainees (10 NIH-supported and 5 PSU-supported) will be trained during a five-year granting period. The faculty supporting this training program have a combined annual research funding base of $65 million in direct costs, and thus offer a robust mentoring foundation for student research experience and opportunities.

Public Health Relevance

A genome sequence holds all the information needed to make an organism, and the genome sequences of humans and model species were determined in order to use this comprehensive knowledge to uncover new insights into the molecular and cellular basis of disease. Analysis and interpretation of the comprehensive datasets generated by genomics research require skills cutting across the traditional disciplines of computer science, bioinformatics, and statistics, and these skills must be applied in a manner that brings out the underlying biology. We propose a new predoctoral training program to prepare a cadre of young scientists who excel in these cross-disciplinary research skills; these trainees will be vital members of the genomics and bioinformatics research community that strives to harvest insights from genomic data to improve human health.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Institutional National Research Service Award (T32)
Study Section: Training and Workforce Development Subcommittee - D (TWD)
Program Officer: Marcus, Stephen
Pennsylvania State University
Schools of Arts and Sciences
University Park
United States
Wang, Qingyu; Shashikant, Cooduvalli S; Jensen, Matthew et al. (2017) Novel metrics to measure coverage in whole exome sequencing datasets reveal local and global non-uniformity. Sci Rep 7:885
Jensen, Matthew; Girirajan, Santhosh (2017) Mapping a shared genetic basis for neurodevelopmental disorders. Genome Med 9:109
Yang, Tao; Zhang, Feipeng; Yardımcı, Galip Gürkan et al. (2017) HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27:1939-1949
Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried et al. (2016) Streamlined analysis of duplex sequencing data with Du Novo. Genome Biol 17:180
Tomaszkiewicz, Marta; Rangavittal, Samarth; Cechova, Monika et al. (2016) A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y. Genome Res 26:530-40
Fuller, Zachary L; Niño, Elina L; Patch, Harland M et al. (2015) Genome-wide analysis of signatures of selection in populations of African honey bees (Apis mellifera) using new web-based tools. BMC Genomics 16:518
Wang, Feng; Polydore, Seth; Axtell, Michael J (2015) More than meets the eye? Factors that affect target selection by plant miRNAs and heterochromatic siRNAs. Curr Opin Plant Biol 27:118-24
Fuller, Zachary L; Haynes, Gwilym D; Zhu, Dianhui et al. (2014) Evidence for stabilizing selection on codon usage in chromosomal rearrangements of Drosophila pseudoobscura. G3 (Bethesda) 4:2433-49
Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Stoler, Nicholas et al. (2014) Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc Natl Acad Sci U S A 111:15474-9
Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil et al. (2014) Dissemination of scientific software with Galaxy ToolShed. Genome Biol 15:403
