The objective of this project is to develop software for the analysis of data from large-scale genotyping and sequencing genetic studies, building on the existing software package PLINK. PLINK is a software tool for manipulating and analyzing whole-genome SNP datasets; it has been actively developed over the past four years and has a wide base of users.
The specific aims are to significantly upgrade core capacities, the interface, auxiliary resources and user support.
Core capacities: significantly adapt and upgrade data-storage capacities to handle a) datasets an order of magnitude larger than can fit into memory and b) a more generic, unified representation of different types of genetic variation data and meta-information.
Interface: extend the existing interface to provide a) a looser coupling between data storage and analysis components, via multiple interfaces in external languages, including standard bioinformatics tools such as R and Perl, and b) features designed to facilitate reproducible research and parallel processing.
Auxiliary resources: package standard existing resources, including the functional annotation of variants, reference genome sequences and gene assemblies, pathways and ontologies, in a manner that allows seamless integration between genomic resources and user data.
Support: create a high-quality collection of resources to support users, via online documentation and tutorials, including user-generated wiki pages, e-mail support and an annual training course.
Particular attention will be paid to ensuring interoperability with other major software, file formats and resources generated by the broader genetics community.
This project will develop software for the analysis of large datasets from modern genetic studies. New high-throughput genotyping and sequencing technologies are capable of producing vast amounts of data, but biomedical researchers need analytic tools they can use to work with these data. Studies based on these technologies have the potential to uncover genetic determinants for a large number of diseases and traits, which can be relevant for the prediction of risk and can give insight into novel targets for treatments.
Fromer, Menachem; Purcell, Shaun M (2014) Using XHMM Software to Detect Copy Number Variation in Whole-Exome Sequencing Data. Curr Protoc Hum Genet 81:7.23.1-7.23.21
Sham, Pak C; Purcell, Shaun M (2014) Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 15:335-46
Purcell, Shaun M; Moran, Jennifer L; Fromer, Menachem et al. (2014) A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506:185-90
Fromer, Menachem; Pocklington, Andrew J; Kavanagh, David H et al. (2014) De novo mutations in schizophrenia implicate synaptic networks. Nature 506:179-84