Cornell University is awarded a grant by the NSF Faculty Early Career Development (CAREER) Program to develop new statistical models and algorithms for the identification of novel functional elements in the genomes of humans and eukaryotic model organisms. These new models describe both the structure and the evolution of functional elements, and will make full use of the large quantities of comparative sequence data and other high-throughput genomic data that have recently become available. They will allow functional sequences to be identified by their evolutionary signatures, and at the same time will reveal aspects of the evolutionary history of the identified sequences. Software implementations will be developed and applied genome-wide to the latest mammalian and Drosophila comparative sequence data. Predicted elements will be browsable by the public in the UC Santa Cruz Genome Browser, and a subset of them will be tested experimentally. This research will be closely integrated with several educational goals, including the development of new undergraduate and graduate courses in computational genomics, the supervision and mentoring of students and postdoctoral associates, and the development of an already widely used collection of computer programs into a well-documented software package that is easy to use for students and other nonexperts.

Project Report

This NSF CAREER award supported broad research and educational activities in the area of computational biology, with a particular focus on comparative genomics of mammals. It also involved some research in comparative genomics of the important model organism Drosophila melanogaster (the fruit fly) and closely related species. Our main research accomplishments were to develop new statistical methods and computer programs for discovering new genes and regulatory sequences based on patterns of variation across species, for measuring the influence of natural selection on these sequences, and for assessing the historical relationships between major human population groups (such as divergence times and rates of gene flow between groups). The project was primarily concerned with the development of computational methods, but it also included some experimental work to validate computational predictions, particularly of new human genes. As with all CAREER awards, career development and teaching were major focuses of activity. The principal investigator developed three new courses in computational biology and machine learning, mentored more than a dozen students and postdoctoral researchers (including several women and under-represented minorities), and participated broadly in other teaching, outreach, and advisory activities. The project produced a widely used open-source software package, called PHAST (Phylogenetic Analysis with Space/Time models), and several widely used genome browser tracks in the University of California, Santa Cruz Genome Browser. The project led to twenty-six publications in peer-reviewed journals.

Agency
National Science Foundation (NSF)
Institute
Division of Biological Infrastructure (DBI)
Application #
0644111
Program Officer
Julie Dickerson
Project Start
Project End
Budget Start
2007-03-01
Budget End
2013-02-28
Support Year
Fiscal Year
2006
Total Cost
$645,870
Indirect Cost
Name
Cornell Univ - State: Awds Made Prior May 2010
Department
Type
DUNS #
City
Ithaca
State
NY
Country
United States
Zip Code
14850