Methods to predict molecular complexity in sequencing experiments

Smith, Andrew

Abstract

Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of DNA sequencing. In applications like RNA-seq and single-cell sequencing, the molecular complexity of the underlying biological sample is also of central interest. This project will produce computational methods for predicting the number of distinct molecules that will be observed when an existing sequencing library is sequenced more deeply. We will adapt these methods to also predict saturation in RNA-seq and, in genome resequencing, the fraction of the genome covered above a given fold coverage as a function of sequencing depth. We will also develop methods for estimating heterogeneity of phenotypes in a tissue based on single-cell RNA-seq experiments. These methods will allow investigators to optimize their use of DNA sequencing resources, minimizing waste and improving throughput.

Public Health Relevance

DNA sequencing technology will inevitably revolutionize the practice of medicine. Clinical DNA sequencing, for example in diagnosis or guiding treatment, requires robust statistical methods to evaluate the information content of DNA samples and detect the presence of technical artifacts in sequencing data. This project develops statistical methods to evaluate the quality of DNA sequencing libraries based on very small amounts of sequencing, which will assist in developing more reliable and cost-effective clinical sequencing protocols.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007650-02
Application #: 8986188
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Sofia, Heidi J

Project Start: 2014-12-16
Project End: 2017-11-30
Budget Start: 2015-12-01
Budget End: 2016-11-30
Support Year: 2
Fiscal Year: 2016
Total Cost
Indirect Cost

Institution

Name: University of Southern California
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 072933393

City: Los Angeles
State: CA
Country: United States
Zip Code: 90032

Related projects


NIH 2017 R01 HG	Methods to predict molecular complexity in sequencing experiments Smith, Andrew David / University of Southern California
NIH 2016 R01 HG	Methods to predict molecular complexity in sequencing experiments Smith, Andrew David / University of Southern California
NIH 2015 R01 HG	Methods to predict molecular complexity in sequencing experiments Smith, Andrew David / University of Southern California	$381,975

Publications

Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas et al. (2018) ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Genome Biol 19:36

Delás, M Joaquina; Sabin, Leah R; Dolzhenko, Egor et al. (2017) lncRNA requirements for mouse acute myeloid leukemia and normal differentiation. Elife 6:

Deng, Chao; Daley, Timothy; Smith, Andrew D (2015) Applications of species accumulation curves in large-scale biological data analysis. Quant Biol 3:135-144
