Statistical methods for cancer genomics and cell-free DNA analysis

Miller, Jeffrey

Abstract

If detected early, many cancers can be successfully treated, leading to a high rate of survival. Unfortunately, cancer is often detected only at late stages since current screening technologies have insufficient sensitivity and specificity at low tumor fractions. Further, screening itself is often invasive or even harmful, leading health policy experts to recommend delaying or avoiding screening since the disadvantages may outweigh the benefit. Cell-free DNA (cfDNA) sequencing presents an exciting recent possibility for highly accurate, non-invasive cancer screening. When cells die, they often release small fragments of their DNA into the body, and these cell-free DNA fragments temporarily circulate in the bloodstream. Thus, when cancer is present, plasma obtained from routine blood draws contains DNA fragments from cancer cells. By performing genome sequencing on this plasma cfDNA, it is possible to non-invasively detect and analyze cancers. However, advanced statistical methods are needed to extract the signal from the noise. The fraction of tumor-derived cfDNA fragments is very small, on the order of 1/1000 or less for early stage cancers. The main objective of the proposed project is to develop and test a flexible suite of statistical methods for cancer detection and analysis using cfDNA sequencing data at low tumor fractions. Our central hypothesis is that structured probabilistic models of genomic signals of cancer in cfDNA data, along with careful handling of errors and biases, will enable cancer detection and classification with high sensitivity and specificity.
(Aim 1) Develop a robust nonparametric Poisson regression framework, applied to mutational signatures. The mutational processes that lead to cancer exhibit characteristic genome-wide signatures that are naturally modeled using nonnegative matrix factorization (NMF). We generalize the Poisson NMF model to a nonparametric hierarchical Bayesian regression model with priors informed by latent cancer type/subtype, covariates, known biological structure, and large databases of cancer genomes.
(Aim 2) Develop grammar-based methods for complex models of sequential data, applied to somatic copy number alterations (SCNAs). Accurate genome-wide SCNA modeling requires continuous and discrete latent states, asynchronous emissions, inhomogeneous transition kernels, and informed priors based on previously observed cancer/normal genomes. We develop a grammar and algorithms for complex sequence models with these features.
(Aim 3) Develop an integrated Bayesian framework for robust cancer detection from cfDNA sequencing. We will combine the methods from Aims 1 and 2 in a hierarchical model with cancer type/subtype as a latent variable.
(Aim 4) Develop software, provide documentation, and disseminate results to facilitate reproducibility. We will provide user-friendly open-source software, preprocessed public data, and thorough documentation to enable reproducibility and maximize ease-of-use.
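
As a point of reference for Aim 1, the baseline Poisson NMF model can be sketched in a few lines. The example below is only an illustration, not the project's method: it fits the plain maximum-likelihood Poisson NMF using the standard multiplicative updates for the generalized Kullback-Leibler divergence, whereas the proposal generalizes this model to a nonparametric hierarchical Bayesian regression. The function names, the toy count matrix, and the choice of two signatures are all assumptions made for the example.

```python
import math
import random

def matmul(W, H):
    """Multiply an m-by-k matrix W by a k-by-n matrix H (lists of lists)."""
    k, n = len(H), len(H[0])
    return [[sum(row[a] * H[a][j] for a in range(k)) for j in range(n)] for row in W]

def poisson_nll(V, W, H, eps=1e-12):
    """Negative Poisson log-likelihood of counts V with rates W @ H,
    dropping the log(V!) term, which does not depend on W or H."""
    R = matmul(W, H)
    return sum(r - v * math.log(r + eps)
               for v_row, r_row in zip(V, R) for v, r in zip(v_row, r_row))

def count_to_rate_ratio(V, W, H, eps=1e-12):
    """Elementwise V / (W @ H), the quantity shared by both updates."""
    R = matmul(W, H)
    return [[v / (r + eps) for v, r in zip(v_row, r_row)] for v_row, r_row in zip(V, R)]

def poisson_nmf(V, k, n_iter=200, seed=0, eps=1e-12):
    """Fit V ~ Poisson(W @ H) with nonnegative W (categories x signatures)
    and H (signatures x samples) by multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    history = [poisson_nll(V, W, H)]
    for _ in range(n_iter):
        # Update exposures H with signatures W held fixed.
        ratio = count_to_rate_ratio(V, W, H, eps)
        for a in range(k):
            col_sum = sum(W[i][a] for i in range(m)) + eps
            for j in range(n):
                H[a][j] *= sum(W[i][a] * ratio[i][j] for i in range(m)) / col_sum
        # Update signatures W with exposures H held fixed.
        ratio = count_to_rate_ratio(V, W, H, eps)
        for a in range(k):
            row_sum = sum(H[a]) + eps
            for i in range(m):
                W[i][a] *= sum(H[a][j] * ratio[i][j] for j in range(n)) / row_sum
        history.append(poisson_nll(V, W, H))
    return W, H, history

# Made-up toy counts: 4 mutation categories (rows) by 5 samples (columns).
V_toy = [
    [20, 2, 11, 18, 4],
    [10, 1, 6, 9, 2],
    [1, 15, 8, 3, 14],
    [2, 30, 16, 5, 27],
]
W_fit, H_fit, nll_history = poisson_nmf(V_toy, k=2, n_iter=100)
```

Each column of `W_fit` plays the role of a signature over mutation categories and each column of `H_fit` gives one sample's exposures to those signatures; `nll_history` should be non-increasing, which is a convenient sanity check on the updates.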

Public Health Relevance

The blood contains free-floating DNA from cells throughout the body, and when cancer is present, some of this free-floating DNA comes from cancer cells. This makes it possible to detect cancer from a routine blood draw; however, advanced statistical techniques are needed since this requires the detection of weak signals in large quantities of complex data. The purpose of this project is to develop statistical methods and software tools for this task, with the goal of enabling accurate and noninvasive early cancer screening.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 1R01CA240299-01A1
Application #: 10052095
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Chen, Huann-Sheng

Project Start: 2020-09-01
Project End: 2025-05-31
Budget Start: 2020-09-01
Budget End: 2021-05-31
Support Year: 1
Fiscal Year: 2020

Miller, Jeffrey Wayne
Harvard University, Boston, MA, United States
