Scalable methods for the characterization and analysis of families in large genomic datasets

Williams, Amy

Abstract

Numerous studies of common genetic diseases in humans are now analyzing very large genomic datasets with information from up to 500,000 individuals. These large studies pose challenges to traditional analysis approaches, especially in terms of computational runtime scaling, but also afford opportunities for refined inference, and necessitate the development of new computational methods. The program of research we will undertake focuses on the emerging opportunities of widespread relatedness in large studies. We are currently developing a method to efficiently infer identical by descent (IBD) sharing using an algorithm that does not require phased data. We are also finalizing a method that distinguishes among second degree relative types: half-sibling, avuncular, and grandparent-grandchild pairs. Building on these models, we will develop novel, efficient methods to: (1) identify pedigrees that define close relationships within large datasets; (2) fundamentally advance genome-wide association studies (GWAS) by inferring the genomes of parents of sets of siblings and other relatives; (3) leverage recombination patterns in men and women to infer the parent-of-origin of haplotypes in a set of close relatives; and (4) infer haplotypes by jointly modeling both family- and population-level structure. Notably, no method we are aware of enables the reconstruction of parent haplotypes without parent data, and this will enable improved GWAS power by utilizing individuals for whom more complete health history information is known. Furthermore, few studies of parent-of-origin associations have been done in humans because of the difficulty of obtaining parent data, but we will perform these analyses in large studies even without parent data. All software will be made freely available to the public and distributed under open source software licenses.

Public Health Relevance

We propose the development of new computational methods to identify families within large genomic datasets and to utilize these families for high precision analysis. Relatives are common in large studies and the opportunity to find, model, and further advance family-based methodologies holds promise for future genomic studies of human disease. Eventually, family modeling will be essential as datasets will include many millions of individuals with all study subjects having relatives in the sample.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Unknown (R35)
Project #: 5R35GM133805-02
Application #: 10003350
Study Section: Special Emphasis Panel (ZGM1)
Program Officer: Ravichandran, Veerasamy

Project Start: 2019-09-01
Project End: 2024-07-31
Budget Start: 2020-08-01
Budget End: 2021-07-31
Support Year: 2
Fiscal Year: 2020

Institution

Name: Cornell University
Department: Biostatistics & Other Math Sci
Type: Earth Sciences/Resources
DUNS #: 872612445

City: Ithaca
State: NY
Country: United States
Zip Code: 14850

Related projects


NIH 2020 R35 GM	Scalable methods for the characterization and analysis of families in large genomic datasets Williams, Amy Lynne / Cornell University
NIH 2019 R35 GM	Scalable methods for the characterization and analysis of families in large genomic datasets Williams, Amy Lynne / Cornell University
