The objective of this project is to develop software for the analysis of data from large-scale genotyping and sequencing studies, building on the existing software package PLINK and its companion package PLINK/Seq. Designed to manipulate and analyze whole-genome SNP datasets, PLINK has been actively developed for over six years and has a wide base of users, with over 5000 citations in peer-reviewed journals. In recent years, we have added considerable support in PLINK/Seq for the analysis of large rare-variant datasets, primarily from whole-exome sequencing studies. In this renewal application, we seek to 1) provide tighter integration between PLINK and PLINK/Seq, aiming for a single interface for both genome-wide association and sequencing studies, particularly in the context of large statistically imputed datasets; 2) enhance the data-integration facilities already present, across different classes of genetic variation as well as large, diverse datasets; 3) provide improved handling of family-based datasets, focused on de novo and inherited variation in (nuclear) family-based association studies; and 4) improve performance on very large datasets. Particular attention will be paid to ensuring interoperability with other major software, file formats and resources that are generated by the broader genetics community.

Public Health Relevance

This project will develop software for the analysis of large datasets from modern genetic studies. New high-throughput genotyping and sequencing technologies are capable of producing vast amounts of data, but there is a need for analytic tools that biomedical researchers can use. These studies have the potential to uncover genetic determinants of a large number of diseases and traits, which can inform the prediction of risk and give insight into novel targets for treatment.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brooks, Lisa
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States
Grinde, Kelsey E; Qi, Qibin; Thornton, Timothy A et al. (2018) Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet Epidemiol :
Ruderfer, Douglas M; Charney, Alexander W; Readhead, Ben et al. (2016) Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach. Lancet Psychiatry 3:350-7
Ruderfer, Douglas M; Hamamsy, Tymor; Lek, Monkol et al. (2016) Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat Genet 48:1107-11
Rees, E; Kirov, G; Walters, J T et al. (2015) Analysis of exome sequence in 604 trios for recessive genotypes in schizophrenia. Transl Psychiatry 5:e607
Sham, Pak C; Purcell, Shaun M (2014) Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15:335-46
Purcell, Shaun M; Moran, Jennifer L; Fromer, Menachem et al. (2014) A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506:185-90
Fromer, Menachem; Pocklington, Andrew J; Kavanagh, David H et al. (2014) De novo mutations in schizophrenia implicate synaptic networks. Nature 506:179-84
Fromer, Menachem; Purcell, Shaun M (2014) Using XHMM Software to Detect Copy Number Variation in Whole-Exome Sequencing Data. Curr Protoc Hum Genet 81:7.23.1-21
Fromer, Menachem; Moran, Jennifer L; Chambert, Kimberly et al. (2012) Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet 91:597-607
Ruderfer, D M; Kirov, G; Chambert, K et al. (2011) A family-based study of common polygenic variation and risk of schizophrenia. Mol Psychiatry 16:887-8