The objective of this project is to develop software for the analysis of data from large-scale genotyping and sequencing genetic studies, building on the existing software package PLINK. PLINK is a software tool for manipulating and analyzing whole-genome SNP datasets; it has been actively developed over the past four years and has a wide base of users.
The specific aims are to significantly upgrade core capacities, the interface, auxiliary resources and user support.
Core capacities: significantly adapt and upgrade data-storage capacities to handle a) datasets an order of magnitude larger than can fit into memory and b) a more generic, unified representation of different types of genetic variation data and meta-information.
Interface: extend the existing interface to provide a) a looser coupling between data storage and analysis components, via multiple interfaces in external languages, including standard bioinformatics tools such as R and Perl, and b) features designed to facilitate reproducible research and parallel processing.
Auxiliary resources: package standard existing resources, including the functional annotation of variants, reference genome sequences and gene assemblies, pathways and ontologies, in a manner that allows seamless integration between genomic resources and user data.
Support: create a high-quality collection of resources to support users, via online documentation and tutorials, including user-generated wiki pages, e-mail support and an annual training course.
Particular attention will be paid to ensuring interoperability with other major software, file formats and resources that are generated by the broader genetics community.

Public Health Relevance

This project will develop software for the analysis of large datasets from modern genetic studies. New high-throughput genotyping and sequencing technologies are capable of producing vast amounts of data, but there is a need for analytic tools that biomedical researchers can use. These studies have the potential to uncover genetic determinants of a large number of diseases and traits, which can be relevant for predicting risk and can give insight into novel targets for treatment.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005827-04
Application #: 8305019
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brooks, Lisa
Project Start: 2010-09-27
Project End: 2013-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 4
Fiscal Year: 2012
Total Cost: $335,610
Indirect Cost: $137,610

Name: Icahn School of Medicine at Mount Sinai
Department: Psychiatry
Type: Schools of Medicine
DUNS #: 078861598
City: New York
State: NY
Country: United States
Zip Code: 10029

Fromer, Menachem; Purcell, Shaun M (2014) Using XHMM Software to Detect Copy Number Variation in Whole-Exome Sequencing Data. Curr Protoc Hum Genet 81:7.23.1-7.23.21
Sham, Pak C; Purcell, Shaun M (2014) Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15:335-46
Purcell, Shaun M; Moran, Jennifer L; Fromer, Menachem et al. (2014) A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506:185-90
Fromer, Menachem; Pocklington, Andrew J; Kavanagh, David H et al. (2014) De novo mutations in schizophrenia implicate synaptic networks. Nature 506:179-84