Remarkable improvements in throughput, accuracy and cost-effectiveness of next-generation sequencing (next-gen) technologies are ushering in a new era of clinical medicine. Genome-wide association studies (GWAS) in particular have begun to leverage these advances to determine the complete catalog of common and rare variants for each member of a cohort. The resolving power of this approach has the potential to greatly accelerate our understanding, diagnosis and treatment of human disease. Unfortunately, analysis of these massive data sets requires that several disparate pieces of software be cobbled together, including a large-capacity next-gen sequencing assembler, variation detection modules, mapping and comparison tools for tens to hundreds of variant reports, statistical analysis packages, reporting tools, and so on. Combining and using these tools typically requires extensive bioinformatic expertise, as the software is rarely well documented or supported and often depends on elaborate hardware. These hurdles make next-gen based GWAS inaccessible to the vast majority of the crucial user base, the physician researchers. The goal of this proposal is to assemble the essential next-gen based GWAS software components into a single coherent pipeline that is fully equipped to meet the needs of the medical research community. Consistent with DNASTAR's 28-year tradition, the software will be easy to use, run on a reasonably priced (<$3000) desktop computer, and will be fully documented and supported. The pipeline will consist of two modules already available through DNASTAR, SeqMan NGen 3.0 (SM NGen 3.0) and ArrayStar. SM NGen 3.0, our recently released human genome scale assembly and analysis package, forms the front end of the pipeline. Reference-guided assemblies of whole human genome or exome next-gen data sets produce variation reports including impact on gene features and associations with the dbSNP database.
Putative variations can be verified by direct inspection of the alignment through the SeqMan Pro component of the package. Variation reports from each member of a GWAS cohort will then be fed into our multi-sample comparison and analysis program, ArrayStar, at the back end of the pipeline. ArrayStar has the infrastructure for multi-sample management and processing, which can be readily adapted to GWAS analysis. These adaptations and their documentation are a central focus of this application. Critical to the successful development of this software is our collaboration with Dr. Douglas McNeel (Dept. of Oncology, UW-Madison). The exomes from a panel of prostate cancer vaccine recipients from the McNeel lab, including responders and non-responders, will be sequenced to provide the input data on which the pipeline is built, using iterative cycles of development followed by evaluation by the McNeel group. This relationship offers an ideal opportunity to build the analysis and reporting software needed by physician researchers to form, test and validate GWAS-generated hypotheses.

Public Health Relevance

The easy-to-use tools to be developed and integrated in this project will dramatically enhance the efficiency of clinical and diagnostic research for a wide range of life scientists and medical professionals using next-generation DNA sequencing technologies, allowing new treatments to be brought to market sooner, enhancing scientists' understanding of treatment efficacy, and supporting the tailoring of different treatments to specific groups of individuals based on their genetic composition. These tools will be flexible enough to support critical analysis of large populations for clinical research and easy enough to use that all life scientists and medical professionals will feel comfortable with them.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O3))
Program Officer: Sofia, Heidi J
Dnastar, Inc.
United States