Software Systems for Detecting Rare Mutations

Smith, Todd

Abstract

Next-generation DNA sequencing (NGS) technologies hold great promise as tools for building a new understanding of health and disease. In cancer research, deep sequencing provides more sensitive ways to detect the germline and somatic mutations that cause different types of cancer, and to identify new mutations within small subpopulations of tumor cells that can be prognostic indicators of tumor growth or drug resistance. Completing the transition from proof-of-principle applications to practical applications, however, requires that many basic and clinical research groups be able to use NGS effectively. Ongoing technical developments and intense competition among NGS platform and service providers are commoditizing data collection costs, making systems more accessible. However, the single greatest impediment to the adoption of NGS technology is the lack of systems that provide easy access to the immense bioinformatics and IT infrastructures needed to work with the data. In the case of variant analysis, such systems will need to process very large datasets and accurately predict common, rare, and de novo levels of variation. Genetic variation must be presented in an annotation-rich biological context to determine its clinical utility, frequency, and putative biological impact. Software systems used for this work must integrate data from many samples together with resources ranging from core analysis algorithms to application-specific datasets to annotations, all woven into computational systems with interactive user interfaces (UIs). Such end-to-end systems do not currently exist. In this project, Geospiza will create integrated methods for robust detection and rich contextualization of genetic variants.
Using variation analysis in cancer genomics as a model system, we will conduct research to improve assay sensitivity by deeply characterizing data from existing and emerging NGS platforms, quality value (QV) recalibration tools, and alignment algorithms to understand the systematic artifacts that create errors in the data. To improve how researchers understand a variant's biological context, function, and potential clinical utility, we will develop methods to combine assay results from many samples with de novo NGS datasets for assays like RNA-Seq, existing data such as those in GEO and SRA, and information resources from dbSNP, cancer genome databases, and ENCODE. Finally, we will develop the scalable computing infrastructure and novel UIs needed to organize and process the data and to explore and annotate the results. Through this work and follow-on product development, we will produce integrated, sensitive assay systems that harness NGS to identify very low (1:1000) levels of change between DNA sequences, in order to detect cancerous mutations and emerging drug resistance. Our tools and infrastructure can later be applied in assays designed to follow viral epidemics and understand autoimmune disorders.

Public Health Relevance

The SBIR project "Software Systems for Detecting Rare Mutations" will deliver new software technologies to further advance the applications for deep DNA sequencing in personalized medicine by improving methods for detecting rare mutations that define cancer types and determine how a cancer cell may grow and respond to, or resist, treatment. In addition to improving cancer research and diagnostics, the software developed will have general use for any application where DNA sequencing is used to understand the genetic basis of human health, disease, and response to drug therapies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44HG005297-03
Application #: 8209085
Study Section: Special Emphasis Panel (ZRG1-IMST-J (15))
Program Officer: Brooks, Lisa

Project Start: 2009-09-30
Project End: 2013-12-31
Budget Start: 2012-01-01
Budget End: 2013-12-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $582,698

Institution

Name: Geospiza, Inc.
DUNS #: 117537170

City: Seattle
State: WA
Country: United States
Zip Code: 98107

Related projects


NIH 2012 R44 HG	Software Systems for Detecting Rare Mutations Smith, Todd M. / Geospiza, Inc.	$582,698
NIH 2011 R44 HG	Software Systems for Detecting Rare Mutations Smith, Todd M. / Geospiza, Inc.	$582,698

Publications

Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E et al. (2015) The impact of read length on quantification of differentially expressed genes and splice junction detection. Genome Biol 16:131

SEQC/MAQC-III Consortium (2014) A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32:903-14

Mason, Christopher E; Porter, Sandra G; Smith, Todd M (2014) Characterizing multi-omic data in systems biology. Adv Exp Med Biol 799:15-38

Chiron, David; Martin, Peter; Di Liberto, Maurizio et al. (2013) Induction of prolonged early G1 arrest by CDK4/CDK6 inhibition reprograms lymphoma cells for durable PI3Kδ inhibition through PIK3IP1. Cell Cycle 12:1892-900

Li, Sheng; Garrett-Bakelman, Francine E; Akalin, Altuna et al. (2013) An optimized algorithm for detecting and annotating regional differential methylation. BMC Bioinformatics 14 Suppl 5:S10

Ricarte-Filho, Julio C; Li, Sheng; Garcia-Rendueles, Maria E R et al. (2013) Identification of kinase fusion oncogenes in post-Chernobyl radiation-induced thyroid cancers. J Clin Invest 123:4935-44

Rosenfeld, Jeffrey A; Mason, Christopher E; Smith, Todd M (2012) Limitations of the human reference genome for personalized genomics. PLoS One 7:e40294

Laborde, Rebecca R; Wang, Vivian W; Smith, Todd M et al. (2012) Transcriptional profiling by sequencing of oropharyngeal cancer. Mayo Clin Proc 87:226-32
