In many genetic studies, case-control samples (probands) are recruited, and phenotypes of their relatives are collected through a family health history interview with the probands. In these designs, which combine genome-wide association study (GWAS) data in probands with family history in relatives (GWAS+FH), family members' dense genotypes are often not collected, either because in-person collection of blood samples is costly or because a relative has died. Discarding relatives' phenotypes wastes much useful information: by examining patterns of phenotypes among relatives who share a combination of genetic factors, environmental conditions, and lifestyle choices, a GWAS+FH analysis has greater power to identify individuals at risk of disease than an analysis of proband data alone. Multilevel models are powerful tools for testing association between genetic markers and correlated phenotypes because they can account for varying degrees of relatedness among individuals. Compared with proband-only analyses, including relatives is expected to improve power through increased sample size, raise the chance of detecting genuine genetic associations, and give better type I error control. However, the analysis is highly challenging because genotypes are missing in relatives and phenotypes are correlated among family members. Until recently, mixed-effects multilevel models were rarely used in genetic association studies, mainly because the available computational tools were a bottleneck and could not handle large-scale GWAS and large sample sizes. This proposal addresses these challenges by providing fast and comprehensive statistical tools that increase our ability to map genetic variants in combined data from proband GWAS and family history in relatives.
Through multilevel mixed-effects models, we will improve the power of association testing while controlling for correlation and confounding by (1) using dense genotypes in probands to estimate between-family genetic similarities and the expected values of missing relative genotypes, and (2) combining these with within-family relatedness represented by polygenic effects. We will apply our methods to data from the Washington Heights-Inwood Columbia Aging Project, which offers a valuable opportunity to discover genetic variants associated with the risk of Alzheimer's disease in multiple ethnic groups (Caucasian, African American, and Hispanic). The novel statistical methods will ultimately allow personalized disease risk estimation tailored to each individual's unique biomarkers, and will aid important decisions such as those about genetic testing and genetic counseling.
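As a rough illustration of the idea in step (1), the expected genotype of an ungenotyped relative can be computed from the proband's genotype under standard population-genetic assumptions. The sketch below is not the proposal's method. It assumes a biallelic marker in Hardy-Weinberg equilibrium, no inbreeding, additive genotype coding (0, 1, or 2 copies of the minor allele), and a known kinship coefficient; the function name and parameters are hypothetical.

```python
def expected_relative_genotype(proband_genotype, allele_freq, kinship):
    """Expected minor-allele count of a relative given the proband's genotype.

    Illustrative assumptions (not taken from the proposal): Hardy-Weinberg
    equilibrium, no inbreeding, additive 0/1/2 coding. Under these
    assumptions the genotype correlation between two relatives equals
    twice their kinship coefficient (0.25 for first-degree relatives).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    if not 0.0 <= kinship <= 0.5:
        raise ValueError("kinship must lie in [0, 0.5]")

    population_mean = 2.0 * allele_freq   # expected allele count with no family information
    relatedness = 2.0 * kinship           # genotype correlation between the two relatives
    # Shrink the proband's deviation from the population mean toward zero
    # in proportion to how distantly the relative is related.
    return population_mean + relatedness * (proband_genotype - population_mean)


# A sibling (kinship 0.25) of a proband who carries two copies of an
# allele with population frequency 0.1.
sibling_expectation = expected_relative_genotype(2, 0.1, 0.25)
```

With a kinship coefficient of zero the expectation reduces to the population mean, so unrelated individuals contribute no information about each other's genotypes.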
Efficient statistical methods for genotype-specific disease risk prediction and for analyzing the association of genetic risk variants with disease will directly impact the lives of individuals and families by providing prognostic information about disease risk or onset. The novel techniques will ultimately allow personalized disease risk estimation tailored to each patient's unique features, incorporating information from family history, and will aid important decision-making processes such as genetic counseling and genetic testing.
Lee, Annie J; Wang, Yuanjia; Alcalay, Roy N et al. (2017) Penetrance estimate of LRRK2 p.G2019S mutation in individuals of non-Ashkenazi Jewish ancestry. Mov Disord 32:1432-1438
Lee, Annie J; Marder, Karen; Alcalay, Roy N et al. (2017) Estimation of genetic risk function with covariates in the presence of missing genotypes. Stat Med 36:3533-3546