Genetic studies identify regions of the human genome that are associated with disease and contain large numbers of potentially causative variants. Prioritizing these variants for follow-up functional studies is one of the foremost issues in genetic research today, but existing datasets lack the precision that is needed. We propose to use a new cost-effective genome assembly method to sequence 150 additional placental mammals, and then to produce a comprehensive annotation resource for the human genome as well as for primates, rodents, and dogs. This will increase the understanding of the function and evolution of the human genome and enable the prioritization of mutations underlying diseases and biological functions that have emerged over a variety of time-scales. We will select 150 placental mammals for sequencing in collaboration with the San Diego Zoo. In combination with the ~50 existing high-quality placental mammalian assemblies, we aim to achieve one sequenced placental mammal per family. We will use a new assembly method, DISCOVAR de novo, which allows the production of a good-quality novel genome assembly using only a single sequencing library type. We will thus reduce the sequencing costs of new genome assemblies ten-fold and will require only small amounts of normal-quality DNA, solving two of the major problems for projects of this nature. We will align our 150 new placental mammalian assemblies together with the ~50 existing assemblies, and will then use this alignment to annotate many protein-coding and non-coding features of the human genome, along with a conservation (similarity) score. We will perform a similar analysis aligning all primate assemblies to the human genome, all rodent assemblies to the mouse genome, and all Laurasiatheria assemblies to the dog genome.
In addition, we will analyse large-scale disease genetics datasets to prioritize causative variants, determine rates of transcription factor binding site turnover, and examine convergent evolution and genotype-phenotype correlation across placental mammals. Overall, this work will yield important new resources for the genomics community: a wealth of new mammalian genomes, together with conservation tracks and analyses to aid the discovery of evolutionarily important and disease-causing variants. Collaborators include Dr. Ryder, an expert in phylogeny and conservation; the Broad Institute and SciLifeLab Genomics Platforms, experts in sequencing; Dr. Haussler, leader of the UCSC Genome Browser; and a consortium of analysts, Drs. Bejerano, Marques-Bonet, Lewin, Taipale, and Ponting, all of whom are outstanding computational biologists in the field of comparative genomics.

Public Health Relevance

Most current large-scale genetic studies can tell us which general area of the human genome is associated with a given disease, but not which exact changes cause the disease. By sequencing the genomes of 150 mammals, we can determine which bases in the human genome are functional and therefore likely to be important for disease.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG008742-03
Application #
9547480
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Troyer, Jennifer L
Project Start
2016-09-28
Project End
2019-08-31
Budget Start
2018-09-01
Budget End
2019-08-31
Support Year
3
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Broad Institute, Inc.
Department
Type
DUNS #
623544785
City
Cambridge
State
MA
Country
United States
Zip Code
02142