The objective of this project is to develop software for the analysis of data from large-scale genotyping and sequencing studies, building on the existing software package PLINK and its companion package PLINK/Seq. Designed to manipulate and analyze whole-genome SNP datasets, PLINK has been actively developed for over six years and has a wide user base, with over 5,000 citations in peer-reviewed journals. In recent years, we have added considerable support in PLINK/Seq for the analysis of large rare-variant datasets, primarily from whole-exome sequencing studies. In this renewal application, we seek to 1) provide tighter integration between PLINK and PLINK/Seq, aiming for a single interface for both genome-wide association and sequencing studies, particularly in the context of large statistically imputed datasets; 2) enhance the existing data-integration facilities, across different classes of genetic variation as well as large, diverse datasets; 3) improve the handling of family-based datasets, focusing on de novo and inherited variation in (nuclear) family-based association studies; and 4) improve performance on very large datasets. Particular attention will be paid to ensuring interoperability with other major software, file formats and resources generated by the broader genetics community.

Public Health Relevance

This project will develop software for the analysis of large datasets from modern genetic studies. New high-throughput genotyping and sequencing technologies can produce vast amounts of data, but biomedical researchers need analytic tools they can use to work with it. These studies have the potential to uncover genetic determinants of a large number of diseases and traits, which can inform risk prediction and give insight into novel targets for treatment.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
2R01HG005827-05A1
Application #
8762148
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brooks, Lisa
Project Start
2010-09-27
Project End
2017-06-30
Budget Start
2014-08-01
Budget End
2015-06-30
Support Year
5
Fiscal Year
2014
Total Cost
$370,000
Indirect Cost
$136,385
Name
Icahn School of Medicine at Mount Sinai
Department
Psychiatry
Type
Schools of Medicine
DUNS #
078861598
City
New York
State
NY
Country
United States
Zip Code
10029
Fromer, Menachem; Purcell, Shaun M (2014) Using XHMM Software to Detect Copy Number Variation in Whole-Exome Sequencing Data. Curr Protoc Hum Genet 81:7.23.1-7.23.21
Sham, Pak C; Purcell, Shaun M (2014) Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15:335-46
Purcell, Shaun M; Moran, Jennifer L; Fromer, Menachem et al. (2014) A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506:185-90
Fromer, Menachem; Pocklington, Andrew J; Kavanagh, David H et al. (2014) De novo mutations in schizophrenia implicate synaptic networks. Nature 506:179-84