The vast throughput of next-generation sequencing technologies will enable cost-effective organismal polymorphism discovery, complete mutational profiling, and individual human resequencing. An ambitious undertaking, the 1000 Genomes Project aims to discover all common human genetic variations by sequencing a large number of individuals. These projects will generate a vast amount of data, posing formidable challenges for data storage and analysis. The shorter read length of next-generation technologies and the need to support new sequencing applications demand new, efficient informatics tools. Building on our existing prototype software, we will develop a complete suite of tools to support next-generation resequencing applications. Specifically, we will develop base calling programs that improve upon the native software supplied by the machine manufacturers. We will delineate those regions of genomes that can be unambiguously resequenced with the shorter next-generation reads, and propose novel protocols for efficient representation of such annotations. We will develop a flexible, high-performance read alignment program that can map billions of reads to large, complex genome sequences. We will expand our existing SNP and short-INDEL polymorphism discovery program, and build new software for structural variation discovery. Finally, we will develop a graphical assembly viewer program to aid data validation and hypothesis generation by integrating gene annotations with primary data views. Our tools will be used both in whole-genome and in targeted individual human resequencing applications: in normal samples to discover segregating markers for medical association studies; in cases and controls to identify the causative alleles in regions implicated by such studies; and in cancer samples to find point mutations and structural rearrangements.
The projects enabled by our tools will help us understand the genetic causes of human diseases, leading to improved diagnostic procedures and more successful treatment. We are developing computer software for DNA sequencing projects to uncover the genetic causes of human diseases. The discoveries made from these projects will help to better understand, diagnose, and treat these diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Boston College
Schools of Arts and Sciences
Chestnut Hill
United States
Lee, Wan-Ping; Wu, Jiantao; Marth, Gabor T (2015) Toolbox for mobile-element insertion detection on cancer genomes. Cancer Inform 14:37-44
Challis, Danny; Antunes, Lilian; Garrison, Erik et al. (2015) The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics 16:143
Qiao, Yi; Quinlan, Aaron R; Jazaeri, Amir A et al. (2014) SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Genome Biol 15:443
Wu, Jiantao; Lee, Wan-Ping; Ward, Alistair et al. (2014) Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15:795
Farrell, Andrew; Coleman, Bradley I; Benenati, Brian et al. (2014) Whole genome profiling of spontaneous and chemically induced mutations in Toxoplasma gondii. BMC Genomics 15:354
Lee, Wan-Ping; Wu, Jiantao; Marth, Gabor T (2014) Toolbox for mobile-element insertion detection on cancer genomes. Cancer Inform 13:45-52
Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair et al. (2014) MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 9:e90581
Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik P et al. (2013) SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 8:e82138
Busby, Michele A; Stewart, Chip; Miller, Chase A et al. (2013) Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. Bioinformatics 29:656-7
Miller, Chase A; Anthony, Jon; Meyer, Michelle M et al. (2013) Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web. Bioinformatics 29:381-3

Showing the most recent 10 out of 25 publications