High-throughput sequencing (HTS) platforms are revolutionizing genomics and health research. The throughput of new sequencing instruments has enabled sequencing of genomes, exomes, methylomes, and transcriptomes in both research and clinical settings. As the cost of DNA sequencing has plummeted, two important trends have become apparent. First, the cost of analysis, in terms of computing resources and personnel, will soon surpass the cost of data generation. This will intensify the demand for analytical algorithms that run faster and use fewer CPU and memory resources while processing ever-growing data sets. Second, HTS technologies have put low-cost, high-throughput sequencing into the hands of small research labs and clinical investigators, groups that are not accustomed to dealing with data of this type and scale. These developments will undoubtedly yield an unprecedented number of new discoveries, clinical insights, and medical breakthroughs in the coming years, provided that the outstanding challenges of HTS data analysis (short read lengths, inherent sequencing errors, and the sheer number of reads) can be resolved. Until now, most HTS has taken place in large genome centers with teams of bioinformaticians and substantial computing infrastructure. There is an urgent need to make the analysis tools and next-generation pipelines developed at these centers available to the wider research community as easy-to-install, easy-to-use packages. We have spent several years developing a computational framework and innovative tools for HTS data analysis, with a particular focus on the discovery and interpretation of genetic variants. Our goal in this proposal is to make these tools available to the wider community, both individually and as part of a complete informatics solution spanning alignment, variant detection, and interpretation.
The solution we describe is flexible and powerful enough to be adopted by experienced laboratories, while also providing high-quality, push-button analysis of sequence data for those with little bioinformatics expertise. The framework will run either in the cloud or on a single CPU, enabling researchers, educators, and clinicians to speed the transition from sequencing technology adoption to biological knowledge and clinical application.

Public Health Relevance

The promise of personalized medicine will only be realized when each individual's genetic code can be read and analyzed in the clinical setting. Unfortunately, the associated technologies generate massive amounts of data that are difficult to analyze and interpret. The software described in this proposal will enable widespread and easy analysis and interpretation of genetic data, accelerating the overall understanding of genetic information and its application to human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O3))
Program Officer: Sofia, Heidi J
Washington University, Schools of Medicine, Saint Louis, United States
Publications (most recent 10 of 13)

Niu, Beifang; Ye, Kai; Zhang, Qunyuan et al. (2014) MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30:1015-6
Zighelboim, Israel; Mutch, David G; Knapp, Amy et al. (2014) High frequency strand slippage mutations in CTCF in MSI-positive endometrial cancers. Hum Mutat 35:63-5
Miller, Christopher A; White, Brian S; Dees, Nathan D et al. (2014) SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput Biol 10:e1003665
Fan, Xian; Abbott, Travis E; Larson, David et al. (2014) BreakDancer - Identification of Genomic Structural Variation from Paired-End Read Mapping. Curr Protoc Bioinformatics 2014:
Chen, Ken; Chen, Lei; Fan, Xian et al. (2014) TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res 24:310-7
Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan et al. (2014) Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing. PLoS Genet 10:e1004462
Ding, Li; Wendl, Michael C; McMichael, Joshua F et al. (2014) Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 15:556-70
Ding, Li; Raphael, Benjamin J; Chen, Feng et al. (2013) Advances for studying clonal evolution in cancer. Cancer Lett 340:212-9
Chen, Ken; Navin, Nicholas E; Wang, Yong et al. (2013) BreakTrans: uncovering the genomic architecture of gene fusions. Genome Biol 14:R87
Gonzalez-Perez, Abel; Mustonen, Ville; Reva, Boris et al. (2013) Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 10:723-9