The vast throughput of next-generation sequencing technologies will enable cost-effective organismal polymorphism discovery, complete mutational profiling, and individual human resequencing. An ambitious undertaking, the 1000 Genomes Project aims to discover all common human genetic variations by sequencing a large number of individuals. These projects will generate a vast amount of data, posing formidable challenges for data storage and analysis. The shorter read length of next-generation technologies and the need to support new sequencing applications demand new, efficient informatics tools. Building on our existing prototype software, we will develop a complete suite of tools to support next-generation resequencing applications. Specifically, we will develop base calling programs that improve upon the native software supplied by the machine manufacturers. We will delineate those regions of genomes that can be unambiguously resequenced with the shorter next-generation reads, and propose novel protocols for efficient representation of such annotations. We will develop a flexible, high-performance read alignment program that can map billions of reads to large, complex genome sequences. We will expand our existing SNP and short-INDEL polymorphism discovery program, and build new software for structural variation discovery. Finally, we will develop a graphical assembly viewer program to aid data validation and hypothesis generation by integrating gene annotations with primary data views. Our tools will be used both in whole-genome and in targeted individual human resequencing applications: in normal samples to discover segregating markers for medical association studies; in cases and controls to identify the causative alleles in regions implicated by such studies; and in cancer samples to find point mutations and structural rearrangements.
The projects enabled by our tools will help us understand the genetic causes of human diseases, leading to improved diagnostic procedures and more successful treatment. We are developing computer software for DNA sequencing projects to uncover the genetic causes of human diseases. The discoveries made from these projects will help to better understand, diagnose, and treat these diseases.