In a wide range of problems in genomics and personalized medicine, it is critically important to accurately reconstruct the distinct nucleotide sequences present in a heterogeneous mixture. Examples include viral quasispecies reconstruction, mapping the repertoire of immune cells, and haplotyping. While recent advances in high-throughput DNA sequencing have enabled affordable studies of genetic variation, the technological limitations of sequencing platforms, together with potentially non-uniform frequencies of the sequences in a mixture, make the analysis of heterogeneous mixtures a challenging and computationally intensive task.

This research aims to develop fast and accurate algorithms for reconstructing sequences in diverse mixtures and estimating their frequencies, in order to assist practitioners in pharmacogenomics and personalized medicine. The project also focuses on fostering diversity, disseminating new interdisciplinary research, and enriching the educational experience of participating engineering students.

The project has three specific goals. First, the design and analysis of matrix factorization methods for accurate and efficient reconstruction of the distinct sequences present in a heterogeneous mixture and for estimation of their frequencies. In the proposed framework, sequence reconstruction is formulated as the problem of factorizing structured, partially observed low-rank matrices, and it is solved efficiently by exploiting salient features of high-throughput sequencing data. Second, the development of a methodology for the analysis of dynamically evolving mixtures of sequences that are temporally sampled by means of high-throughput sequencing. This research thrust will lead to novel sequence reconstruction methods capable of tracking the evolution of sequences over time and accurately identifying their frequencies. Third, the development of algorithmic solutions to specific sequence diversity analysis problems that fully exploit the structural features of the respective applications and thus enable superior performance.
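To make the matrix factorization formulation concrete, the following is a minimal sketch and not the project's actual algorithm. It assumes that each read is a row of a matrix R with entries +1 or -1 at covered biallelic positions and 0 where the position is unobserved, and that R is approximated by a product U V, where each row of U selects exactly one of k sequences and the rows of V are the sequences. The function name `reconstruct`, the +1/-1 encoding, the farthest-first initialization, and the alternating assignment and majority-vote updates are all illustrative assumptions.

```python
def reconstruct(reads, k, iters=20):
    """Alternating minimization sketch for R ~ U V (illustrative only).

    reads: list of equal-length lists with entries +1, -1, or 0 (unobserved).
    Returns (seqs, freqs): k estimated sequences and their read fractions.
    Assumes iters >= 1.
    """
    n, m = len(reads), len(reads[0])

    def mismatches(read, seq):
        # Count disagreements only where both entries are observed.
        return sum(1 for r, s in zip(read, seq) if r * s < 0)

    # Farthest-first initialization of V: start from the first read, then
    # repeatedly add the read that disagrees most with the current rows.
    seqs = [list(reads[0])]
    while len(seqs) < k:
        far = max(reads, key=lambda r: min(mismatches(r, s) for s in seqs))
        seqs.append(list(far))

    assign = None
    for _ in range(iters):
        # Update U: assign each read to the sequence with fewest mismatches.
        new_assign = [
            min(range(k), key=lambda j: mismatches(r, seqs[j])) for r in reads
        ]
        if new_assign == assign:
            break  # assignments are stable, so the sequences are too
        assign = new_assign
        # Update V: majority vote per position over the assigned reads.
        for j in range(k):
            for p in range(m):
                vote = sum(reads[i][p] for i in range(n) if assign[i] == j)
                if vote != 0:
                    seqs[j][p] = 1 if vote > 0 else -1

    # Frequency estimate: fraction of reads assigned to each sequence.
    freqs = [assign.count(j) / n for j in range(k)]
    return seqs, freqs


# Toy mixture: four short reads from one sequence, two from another.
reads = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [-1, -1, -1, 0, 0, 0],
    [0, 0, 0, -1, -1, -1],
]
seqs, freqs = reconstruct(reads, k=2)
print(seqs, freqs)
```

On this toy input the two underlying sequences are recovered even though no single read covers all positions, and the read fractions serve as a crude frequency estimate. Real sequencing data would also require handling sequencing errors, unequal coverage, and the choice of k, none of which this sketch addresses.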

Project Start:
Project End:
Budget Start: 2016-09-01
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2016
Total Cost: $400,000
Indirect Cost:
Name: University of Texas Austin
Department:
Type:
DUNS #:
City: Austin
State: TX
Country: United States
Zip Code: 78759