Massachusetts Institute of Technology researcher Manolis Kellis is awarded a Faculty Early Career Development award to develop comparative genomics methods that make maximal use of mammalian sequences for biological signal discovery and to systematically interpret the human genome. In the first aim, comparative methods for ortholog identification will be developed that can scale to dozens of complete mammalian genomes and can account for the complex phylogenies relating them, for gene duplication and loss, and for varying rates of divergence across gene families and across species. In the second aim, a classification framework for gene identification will be constructed. Beyond the simple use of comparative genomics to recognize conserved regions, evolutionary signatures specific to protein-coding regions will be defined, based on patterns of insertion and deletion, codon mutational biases, and motif abundance in the proximity of exons. Finally, phylogenetically informed tools for motif discovery and enhancer identification will be developed. The study of rates of motif gain and loss will allow parameters to be derived for every lineage based on the information content of each motif. These parameters will then be used to discover motifs at the genome scale, tolerating motif movement and loss, based on a likelihood ratio of motif conservation over a branch length within a given window of tolerated movement. Further research will develop methods for identifying functional motif combinations, use these to identify regions of motif clustering, and relate them to experimentally identified enhancer elements and other regions of regulatory importance. Functional information on expression, regulator binding, and chromatin structure from human tissues and cell lines, across the entire genome and in selected ENCODE regions, will provide training data for the methods and validation datasets for the predictions.
Two key model organisms will be the yeast Saccharomyces cerevisiae and the fruit fly Drosophila melanogaster, which provide unique models for the computational and experimental efforts. Both benefit from large phylogenetic trees with many completely sequenced relatives, compact genomes, and extensive experimental information, which can help gauge the power of the methods. The ability to work with multiple model organisms will allow each aspect of human biology to be studied at the appropriate scale of complexity, from yeast to fly, mouse, and human, by virtue of the early shared ancestry of our common biology. The education plan involves developing new courses and curricula and a computational biology textbook. The tools produced by the project will be distributed freely. Already strong efforts to include underrepresented groups will continue alongside outreach activities, such as lectures at museums and work with high school teachers in the Boston area.