Methods for Genetic Epidemiology

We developed semiparametric maximum likelihood estimates (SPMLE) for case-control studies of gene-environment interactions under the assumption that genetic and environmental factors are independent. Traditional logistic regression analysis is not efficient in this setting. We use a profile-likelihood technique to obtain the SPMLE and study its asymptotic theory. The results are extended to situations where genetic and environmental factors are independent conditional on some other factors. The method is applied to ovarian cancer case-control data to investigate the interplay of BRCA1/2 mutations and oral contraceptive use.

We studied the false positive report probability (FPRP), the probability of no true association given a significant finding. FPRP depends not only on the observed p-value but also on the prior probability that the association is real and on the power of the test. We proposed a four-step approach that uses a decision rule based on FPRP to evaluate the chance that a finding deemed noteworthy is, in fact, truly associated with disease.

We investigated current proposals for selecting single nucleotide polymorphisms (SNPs) to define haplotypes for disease association studies in independent samples of cases and controls. Current proposals based on diversity measures often lead to sub-optimal selections of subsets of SNPs. We developed score tests for associations of haplotypes with disease in cohort studies and in nested case-control studies.

We used data on lymphoma in families of Hodgkin lymphoma (HL) cases from the Swedish Family Cancer Database to develop and illustrate survival methods for detecting familial aggregation in first-degree relatives of case probands compared to first-degree relatives of control probands. Because more than one case may occur in a given family, the first-degree relatives of case probands are not necessarily independent, and we present procedures that allow for such dependence.
A bootstrap procedure also accommodates matching of case and control probands. Regarding families as independent sampling units leads to inference based on "sandwich variance estimators" and accounts for dependencies from having more than one proband in a family but not for matching. We compare these methods in an analysis of familial aggregation of HL and also present simulations to compare survival analyses with analyses of binary outcome data.

We identified a bias in case-control estimation of intervention effects among mutation carriers in populations derived from high-risk clinics. This bias can arise if a large portion of case patients were diagnosed before being seen at the clinic and all controls were persons previously seen at the clinic. In this circumstance, the intervention can appear to prevent disease if, as is likely, mutation carriers seen at the clinic were more likely to receive the intervention than mutation carriers in the general population.

Design and Analysis of Case-Control and Cohort Studies

Polytomous logistic regression is commonly used to analyze epidemiological data with disease subtype information. In this approach, effects of exposures on different disease subtypes are studied through separate exposure odds ratios comparing different case groups to the common control group. We considered the situation where disease subtypes can be defined using multiple characteristics of a disease. For efficient analysis of such data, a two-stage modeling approach is proposed. At the first stage, a standard polytomous logistic regression model is considered for all possible distinct disease subtypes that can be defined by the cross-classification of the different disease characteristics. At the second stage, the exposure odds ratio parameters for the first-stage disease subtypes are further modeled in terms of the defining characteristics of the subtypes.
When the total number of first-stage disease subtypes is small, standard maximum likelihood methods can be used for inference in the proposed model. For dealing with a large number of disease subtypes, a novel semiparametric pseudo-conditional-likelihood approach is proposed that does not require any model assumption about the baseline probabilities for the different disease subtypes. We developed the asymptotic theory for the estimator and studied its small-sample properties using simulation experiments. We applied the method to study the effect of fiber on the risk of various forms of colorectal adenoma using data available from the Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) Screening Trial.

We have developed methods to estimate survival (absolute risk) and PAR from sampled cohort studies. Prior to this research there were no practical semiparametric estimators, and no non-parametric estimators. This research also establishes the considerations for the efficient design of risk estimation in all sampled cohort studies, and provides efficient estimators. We are documenting computer programs that allow implementation of these results in the R software. Though the focus was on survival estimation, the paper contains the same results for relative risk estimation in the Cox proportional hazards model.

Using cross-sectional data, we developed a model to estimate risk of infection with human herpesvirus 8 (HHV-8) from transfusion. The data consist of age at the time of study, infection status (presence or absence), and a chronology of events possibly associated with the disease. We developed a flexible parametric approach for combining current-status data with a history of blood transfusions in Ugandan children with sickle cell anemia. We modeled heterogeneity in transfusion-associated risk by a child-specific random effect.
We present an extension of the model to accommodate the fact that there is no gold standard for HHV-8 infection and infection status was assessed by a serological assay. The parameters are estimated via maximum likelihood. Finally, we present results from applying various parameterizations of the model to the Ugandan study.

We developed a supplemented case-control design to improve the precision of estimates of main effects and interactions. The supplemented case-control design consists of a case-control sample and an additional sample of disease-free subjects who arise from a given stratum of one of the measured exposures in the case-control study. The supplemental data might, for example, arise from a population survey conducted independently of the case-control study. This design improves precision of estimates of main effects and especially of effects of joint exposures, particularly when joint exposures are uncommon and the prevalence of one of the exposures is low. We first presented a pseudo-likelihood estimator that is easy to compute. We further adapted two-phase design methods to find maximum likelihood estimates of the log odds ratios for this design and derived asymptotic variance estimators that appropriately account for the differences between the sampling scheme of this design and that of the traditional two-phase design. As an illustration of our design, we presented a study that was conducted to assess the influence of joint exposure to hepatitis B virus and hepatitis C virus infection on the risk of hepatocellular carcinoma in data from Qidong County, Jiangsu Province, China.

We developed methods to analyze multivariate recurrent events subject to right censoring. These methods were applied to the analysis of bone metastases to various bony sites, all subject to censoring by death.
Using these methods we found that there is a consistent trend towards a reduction in the cumulative mean for four types of skeletal complications with bisphosphonate therapy and a significant reduction in the need for radiation therapy for the treatment of bone.
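
As a brief illustration of the false positive report probability discussed under Methods for Genetic Epidemiology, the sketch below computes FPRP from the three quantities named there: the significance level (or observed p-value), the prior probability that the association is real, and the power of the test. It uses the standard Bayes-rule form FPRP = alpha(1 - prior) / [alpha(1 - prior) + power x prior]. The function names, the example inputs, and the noteworthiness cutoff of 0.2 are illustrative assumptions, not values taken from this report, and the sketch does not reproduce the four-step approach itself.

```python
def fprp(alpha: float, prior: float, power: float) -> float:
    """False positive report probability via Bayes' rule.

    alpha : significance level or observed p-value, P(significant | no association)
    prior : prior probability that the association is real
    power : P(significant | true association)
    """
    for name, value in (("alpha", alpha), ("prior", prior), ("power", power)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    false_positive = alpha * (1.0 - prior)  # significant, but no true association
    true_positive = power * prior           # significant, and association is real
    return false_positive / (false_positive + true_positive)


def is_noteworthy(fprp_value: float, cutoff: float = 0.2) -> bool:
    """Hypothetical decision rule: flag a finding when FPRP falls below a preset cutoff."""
    return fprp_value < cutoff


if __name__ == "__main__":
    # Illustrative inputs only: the same p-value under a high and a low prior.
    for prior in (0.5, 0.001):
        value = fprp(alpha=0.05, prior=prior, power=0.8)
        print(f"prior={prior}: FPRP={value:.3f}, noteworthy={is_noteworthy(value)}")
```

With these illustrative inputs, the same significance level gives a small FPRP when the prior probability is high and an FPRP close to 1 when the prior probability is very low, which is why the p-value alone does not determine whether a finding is noteworthy.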