Anthropogenic pressure and natural causes have severely disrupted global ecosystems in recent years, leading to loss of biodiversity and invasion by non-native plants and animals. A particular problem is that it is not enough to determine the population sizes of various species; it is also important to determine whether enough genetic diversity exists within a species to ensure its survival. It is therefore necessary to estimate the genetic biodiversity of various areas in order to decide which plants and animals are most in need of protection, and where, and to predict the outcome of proposed interventions. However, current algorithms for computing biodiversity, which are based on computing the genetic "distance" between samples of organisms, are too computationally intensive and slow to be applied at large scale. This project will overcome the problem by developing new, highly efficient algorithms for computing biodiversity. As a result, this work will provide tools needed to improve our knowledge of ecosystems and to make better decisions for managing plant and animal natural resources.

In place of the currently popular technique of isolating and sequencing specific phylogenetically informative regions, the PIs propose low-pass whole-genome sequencing (genome skims) combined with alignment-free methods for barcoding. To enable this approach, the PIs will develop algorithms and tools to match genome skims against a given library, to use genome skims for phylogenetic reconstruction, and to use meta-barcoding and genome skims as a mechanism for examining populations of organisms. The proposed activities will allow genomic biodiversity to be estimated for a fraction of the current costs of labor and genome sequencing. The proposal uses a number of novel algorithmic and statistical techniques and describes the first systematic study of the feasibility of computing genomic distance using only a small, random fraction of the genome. If successful, the project will advance the field by providing a simple and inexpensive protocol for measuring biodiversity with higher sensitivity than is currently achievable.

The investigators have a strong history of prior research in related fields and bring complementary expertise in evolution and phylogenetic reconstruction and in computational population genomics. There are three aims: given genome skims of two organisms, estimate the Hamming distance between their genomes and use that estimate to search a given library; use genome skims for phylogenetic reconstruction; and, given a meta-barcoding query (genome skims of a mix of organisms), identify the constituent organisms and their relative abundance in the sample.
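To make the idea of an alignment-free distance concrete, the following is a minimal sketch, not the PIs' method. It assumes a k-mer based approach: two sequences are reduced to their sets of k-mers, the Jaccard index of the two sets is computed, and a Mash-style formula converts that index into an estimated per-base substitution rate. The function names, the choice of k = 15, and the simulated sequences are all illustrative assumptions. The sketch uses complete, error-free sequences; a real genome skim covers only part of the genome and contains sequencing errors, both of which would bias this naive estimate and are not handled here.

```python
import math
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mash_distance(j, k):
    """Mash-style distance estimate from a Jaccard index j and k-mer length k.

    Uses D = -(1/k) * ln(2j / (1 + j)); returns 1.0 when no k-mers are shared.
    """
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Illustrative simulation (all values below are assumptions for the demo).
rng = random.Random(0)
K = 15
TRUE_RATE = 0.05
genome_a = "".join(rng.choice("ACGT") for _ in range(20000))
genome_b = mutate(genome_a, TRUE_RATE, rng)

estimate = mash_distance(jaccard(kmers(genome_a, K), kmers(genome_b, K)), K)
print(f"true substitution rate: {TRUE_RATE}, estimated: {estimate:.4f}")
```

On simulated data of this kind the estimate should land close to the true substitution rate, which is the per-base analogue of the Hamming distance named in the first aim. The same pairwise distance could in principle be used to rank entries in a reference library or to fill a distance matrix for phylogenetic reconstruction.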

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1815485
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2018-08-15
Budget End: 2021-07-31
Support Year:
Fiscal Year: 2018
Total Cost: $499,987
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093