Rapid advances in genome sequencing techniques and throughput are giving scientists increasingly detailed views of individual genomes, furthering our understanding of genetic variation in a wide array of organisms, of human population history, and of the biology of Mendelian disorders and complex traits. Experiments that until recently were restricted to very large genome centers, such as the resequencing of human genomes, can now be carried out by a wide range of investigators. While these technological advances will enable many new discoveries in human and model-organism genetics, they also pose formidable computational challenges. RFA-HG-10-018, "Informatics Tools for High-Throughput Sequence Data Analysis", is intended to fund further development of existing software so that any biological or biomedical research laboratory can benefit from advances in sequencing technologies. We have developed specialized, state-of-the-art tools for processing and analyzing next-generation sequence data. These tools cover many key steps of sequence data analysis: quality control, read mapping, the identification, genotyping and annotation of many classes of sequence variation, and downstream association analyses that seek to connect identified variants with organismal phenotypes. They have supported the analysis of several large, challenging datasets, including data from the 1000 Genomes Project as well as more than 1,000 whole genomes and more than 2,500 exomes sequenced in medical sequencing projects. Here, we propose to develop these tools into easy-to-use, portable, well-documented packages and complete pipelines that facilitate biomedical research in a wide variety of settings. A key component of the proposal is deploying these tools in the Galaxy cloud, where they will be accessible to investigators who lack a local high-throughput computing and data storage facility.

Public Health Relevance

We are developing computer software to discover and interpret genetic differences between individual human genomes from DNA sequencing data. We are starting with existing computer programs and turning them into stable software packages that any biological laboratory can readily use. These tools will advance the study of human genetic variation and our understanding of heritable human diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (O3))
Program Officer
Sofia, Heidi J
Boston College
Schools of Arts and Sciences
Chestnut Hill
United States
Pistis, Giorgio; Porcu, Eleonora; Vrieze, Scott I et al. (2015) Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs. Eur J Hum Genet 23:975-83
Qiao, Yi; Quinlan, Aaron R; Jazaeri, Amir A et al. (2014) SubcloneSeeker: a computational framework for reconstructing tumor clone structure for cancer variant interpretation and prioritization. Genome Biol 15:443
Brown, Kevin M; Suvorova, Elena; Farrell, Andrew et al. (2014) Forward genetic screening identifies a small molecule that blocks Toxoplasma gondii growth by inhibiting both host- and parasite-encoded kinases. PLoS Pathog 10:e1004180
Miller, Chase A; Qiao, Yi; DiSera, Tonya et al. (2014) bam.iobio: a web-based, real-time, sequence alignment file inspector. Nat Methods 11:1189
Wu, Jiantao; Lee, Wan-Ping; Ward, Alistair et al. (2014) Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15:795
Lee, Wan-Ping; Wu, Jiantao; Marth, Gabor T (2014) Toolbox for mobile-element insertion detection on cancer genomes. Cancer Inform 13:45-52
Feng, Shuang; Liu, Dajiang; Zhan, Xiaowei et al. (2014) RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 30:2828-9
Lee, Seunggeung; Abecasis, Gonçalo R; Boehnke, Michael et al. (2014) Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 95:5-23
Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair et al. (2014) MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 9:e90581
Wang, Chaolong; Zhan, Xiaowei; Bragg-Gresham, Jennifer et al. (2014) Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet 46:409-15