Efficient Methods for Genotype-Specific Distributions with Unobserved Genotypes.

Wang, Yuanjia

Abstract

This proposal develops a series of new semiparametric efficient methods for genetic data in which subjects' genotypes are not observed, so that phenotype data arise from a mixture of genotype-specific subpopulations. One example is data collected in a kin-cohort study, where the scientific interest is in estimating the distribution function of a trait, or of the time to developing a disease, for deleterious mutation carriers (the penetrance function). In a kin-cohort study, index subjects (probands), possibly enriched with mutation carriers, are sampled and genotyped. Disease history in relatives of the probands is collected, but the relatives are not genotyped, so it is unknown whether they carry a mutation. However, one can calculate the probability of each relative being a mutation carrier from the proband's genotype and Mendelian laws. Another example is interval mapping of quantitative trait loci (QTL). In such studies, the genotype at a QTL is unobserved, so the trait distribution takes the form of a mixture of QTL-genotype-specific distributions. The probability of the QTL having a specific genotype is computed from marker genotypes and the recombination fractions between the markers and the QTL. The interest is in estimating the QTL-genotype-specific distributions. A common feature of these examples is that the scientific interest is in inference on genotype-specific subpopulations, but it is unknown which subpopulation each observation belongs to. The probability of each observation belonging to any given subpopulation varies across observations and can be estimated. Without a prespecified, error-prone parametric assumption on these genotype-specific distributions, the only statistical methods available in the literature are two distinct nonparametric maximum likelihood estimators (NPMLE1, NPMLE2). However, we will show that NPMLE1 is not efficient and NPMLE2 is not consistent. There is therefore a great need to develop valid and efficient statistical tools for such data.
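To make the mixture structure concrete, the sketch below computes known mixing probabilities in two simplified settings and the resulting observed distribution. The assumptions are ours, not the proposal's: a dominant mutation with population allele frequency `q` under Hardy-Weinberg equilibrium, a carrier proband who is heterozygous, relatives who are the proband's offspring, and, for the QTL case, a backcross with a single marker at recombination fraction `r`. All function names are hypothetical. A subject with carrier probability p has observed distribution p F1(t) + (1 - p) F0(t), where F1 and F0 are the unknown carrier and non-carrier distributions.

```python
def offspring_carrier_prob(proband_is_carrier: bool, q: float) -> float:
    """P(offspring carries >= 1 mutant allele | proband genotype).

    Hypothetical helper. Assumes a carrier proband is heterozygous and the
    other parent is drawn from a Hardy-Weinberg population with mutant
    allele frequency q.
    """
    # a heterozygous proband transmits the mutant allele with probability 1/2
    from_proband = 0.5 if proband_is_carrier else 0.0
    # the other parent transmits the mutant allele with probability q
    from_other_parent = q
    # the offspring is a non-carrier only if neither parent transmits it
    return 1.0 - (1.0 - from_proband) * (1.0 - from_other_parent)


def qtl_genotype_prob(same_as_marker: bool, r: float) -> float:
    """P(QTL genotype | one backcross marker), recombination fraction r.

    Hypothetical helper: the QTL shares the marker's genotype unless a
    recombination occurred between them.
    """
    return 1.0 - r if same_as_marker else r


def mixture_cdf(t, p, F1, F0):
    """Observed CDF at t for a subject with carrier probability p."""
    return p * F1(t) + (1.0 - p) * F0(t)
```

For a rare mutation (`q = 0.01`), an offspring of a carrier proband has carrier probability 0.505, close to the familiar Mendelian value of 1/2, while an offspring of a non-carrier proband has probability 0.01.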
We use modern semiparametric theory to carry out a formal semiparametric analysis in which we define a rich class of estimators. We show that any least-squares-based estimator is a member of this class. We construct an optimal member of this class, which attains the minimum variance and hence reaches the semiparametric efficiency bound. For censored outcomes, we propose a semiparametric efficient estimator given an influence function for the complete (uncensored) data. We propose an inverse probability weighting estimator and add an augmentation term to obtain optimal efficiency. We also construct an imputation estimator that is easy to implement and does not require additional model assumptions for the imputation step. Furthermore, we propose methods to handle other observed covariates, such as gender, and additional residual correlation among family members. We also develop a series of tests for equality of two distributions at a single time point or at multiple time points simultaneously, and an overall test of whether two distributions are equal at all time points. We will apply the developed methods to analyze a kin-cohort study on Parkinson's disease, a large family study on Huntington's disease, and two QTL studies.
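As an illustration of the least-squares idea, note that for an uncensored outcome Y with carrier probability p, E[I(Y ≤ t) | p] = p F1(t) + (1 - p) F0(t). Regressing the indicator I(Y ≤ t) on (p, 1 - p) without an intercept therefore gives unbiased estimates of F1(t) and F0(t), provided p varies across subjects. The sketch below assumes two genotype groups, no censoring, no covariates, and independent subjects. It is a plain unweighted estimator, not the optimal or augmented estimator proposed here. It also does not enforce monotonicity or the [0, 1] range. The simulation parameters are hypothetical.

```python
import numpy as np


def least_squares_genotype_cdfs(y, p, t_grid):
    """Unweighted least-squares estimates of F1(t) and F0(t).

    y      : uncensored outcomes (e.g. age at onset), shape (n,)
    p      : known carrier probabilities, shape (n,); must vary across subjects
    t_grid : time points at which to estimate the two CDFs
    Returns an array of shape (len(t_grid), 2) with columns (F1, F0).
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    X = np.column_stack([p, 1.0 - p])  # mixing proportions as regressors
    # column j holds the indicators I(y_i <= t_j)
    Z = (y[:, None] <= t_grid[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)  # shape (2, len(t_grid))
    return coef.T


# Simulated example with hypothetical parameters, for illustration only
rng = np.random.default_rng(0)
n = 20000
p = rng.choice([0.1, 0.5, 0.9], size=n)  # carrier probabilities
carrier = rng.random(n) < p  # latent, unobserved carrier status
y = np.where(carrier, rng.exponential(60.0, n), rng.exponential(200.0, n))
est = least_squares_genotype_cdfs(y, p, [70.0])
truth = np.array([1 - np.exp(-70 / 60), 1 - np.exp(-70 / 200)])
```

Here `est[0]` should be close to `truth` (about 0.69 for carriers and 0.30 for non-carriers). Because each t is handled separately, the estimated curves can be non-monotone in finite samples.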

Public Health Relevance

This proposal develops a series of new semiparametric efficient methods for genetic data in which subjects' genotypes are not observed, so that trait data arise from a mixture of genotype-specific subpopulations. The methods can be applied to estimate the risk of developing a disease for deleterious mutation carriers.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Neurological Disorders and Stroke (NINDS)
Type: Research Project (R01)
Project #: 5R01NS073671-04
Application #: 8663321
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Gwinn, Katrina

Project Start: 2011-07-15
Project End: 2015-06-30
Budget Start: 2014-07-01
Budget End: 2015-06-30
Support Year: 4
Fiscal Year: 2014
Total Cost
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2020 R01 NS	Statistical methods for early disease prediction and treatment strategy estimation using biomarker signatures Wang, Yuanjia / Columbia University (N.Y.)
NIH 2019 R01 NS	Statistical methods for early disease prediction and treatment strategy estimation using biomarker signatures Wang, Yuanjia / Columbia University (N.Y.)
NIH 2018 R01 NS	Statistical methods for early disease prediction and treatment strategy estimation using biomarker signatures Wang, Yuanjia / Columbia University (N.Y.)
NIH 2017 R01 NS	Statistical methods for early disease prediction and treatment strategy estimation using biomarker signatures Wang, Yuanjia / Columbia University (N.Y.)	$366,940
NIH 2014 R01 NS	Efficient Methods for Genotype-Specific Distributions with Unobserved Genotypes. Wang, Yuanjia / Columbia University (N.Y.)
NIH 2013 R01 NS	Efficient Methods for Genotype-Specific Distributions with Unobserved Genotypes. Wang, Yuanjia / Columbia University (N.Y.)	$257,478
NIH 2012 R01 NS	Efficient Methods for Genotype-Specific Distributions with Unobserved Genotypes. Wang, Yuanjia / Columbia University (N.Y.)	$267,091
NIH 2011 R01 NS	Efficient Methods for Genotype-Specific Distributions with Unobserved Genotypes. Wang, Yuanjia / Columbia University (N.Y.)	$280,540

Publications

Wang, Yuanjia; Fu, Haoda; Zeng, Donglin (2018) Learning Optimal Personalized Treatment Rules in Consideration of Benefit and Risk: with an Application to Treating Type 2 Diabetes Patients with Insulin Therapies. J Am Stat Assoc 113:1-13

Liang, Liang; Carroll, Raymond; Ma, Yanyuan (2018) Dimension reduction and estimation in the secondary analysis of case-control studies. Electron J Stat 12:1782-1821

Li, Xiang; Xie, Shanghong; Zeng, Donglin et al. (2018) Efficient ℓ0-norm feature selection based on augmented and penalized minimization. Stat Med 37:473-486

Qiu, Xin; Zeng, Donglin; Wang, Yuanjia (2018) Estimation and evaluation of linear individualized treatment rules to guarantee performance. Biometrics 74:517-528

Liu, Jianxuan; Ma, Yanyuan; Wang, Lan (2018) An alternative robust estimator of average treatment effect in causal inference. Biometrics 74:910-923

Liu, Ying; Wang, Yuanjia; Kosorok, Michael R et al. (2018) Augmented outcome-weighted learning for estimating optimal dynamic treatment regimens. Stat Med 37:3776-3788

Lee, Annie J; Wang, Yuanjia; Alcalay, Roy N et al. (2017) Penetrance estimate of LRRK2 p.G2019S mutation in individuals of non-Ashkenazi Jewish ancestry. Mov Disord 32:1432-1438

Liang, Baosheng; Tong, Xingwei; Zeng, Donglin et al. (2017) Semiparametric Regression Analysis of Repeated Current Status Data. Stat Sin 27:1079-1100

Liu, Ying; Wang, Yuanjia; Huang, Chaorui et al. (2017) Estimating personalized diagnostic rules depending on individualized characteristics. Stat Med 36:1099-1117

Wang, Qianqian; Ma, Yanyuan; Wang, Yuanjia (2017) Predicting Disease Risk by Transformation Models in the Presence of Unspecified Subgroup Membership. Stat Sin 27:1857-1878

Showing the most recent 10 out of 60 publications
