There are numerous situations in which observed data are generated by an unknown mechanism and interest lies in estimating a function related to a model for the data. It is proposed to model such unknown functions in a linear space of smooth piecewise polynomials. An algorithm employing stepwise addition and deletion of basis functions is used to determine this space adaptively. In the proportional hazards model, the dependence of the survival times on the covariates is modeled fully parametrically. Hazard regression (HARE) instead employs an adaptive algorithm based on piecewise polynomials to model the conditional log-hazard function, and it does not assume a proportional hazards model. It is proposed to develop and investigate a number of extensions to HARE involving missing data, time-dependent covariates, categorical predictors with many levels, dependent data, and family studies. For problems of moderate size, the POLYCLASS method of the proposer and collaborators for polychotomous regression and classification is claimed to be competitive with other classification methods while providing reliable estimates of conditional class probabilities. An algorithm based on the stochastic gradient method makes the POLYCLASS method applicable to large data sets. It is proposed to develop a corresponding model selection algorithm. Triogram is the name given by the proposer to a function estimation method that uses piecewise linear bivariate splines based on an adaptively constructed triangulation. It is proposed to develop triogram-based methods that yield smoother estimates than the current methods and that select the basis functions more effectively. It is also proposed to investigate statistical modeling with free-knot splines, in which the knot locations are treated as parameters. It is claimed that this makes it possible to obtain standard errors that take into account the uncertainty in the knot positions, and that it should provide new insight into inference for adaptive polynomial spline methodologies. Publicly available software for the proposed methodologies will be developed.
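The stepwise addition and deletion of basis functions mentioned above can be illustrated with a minimal sketch. It is not the HARE, POLYCLASS, or triogram algorithm. It assumes ordinary least-squares regression in one variable, a truncated-linear basis (x - t)_+, a fixed grid of candidate knots, and BIC as the selection criterion, all of which are choices made here for illustration. The function names (`design`, `rss`, `bic`, `stepwise_knots`) are hypothetical.

```python
import numpy as np


def design(x, knots):
    """Design matrix: intercept, x, and one truncated-linear term (x - t)_+ per knot."""
    cols = [np.ones_like(x), x] + [np.maximum(x - t, 0.0) for t in knots]
    return np.column_stack(cols)


def rss(x, y, knots):
    """Residual sum of squares of the least-squares fit using the given knots."""
    B = design(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return float(resid @ resid)


def bic(x, y, knots):
    """Gaussian BIC up to an additive constant (an assumed selection criterion)."""
    n = len(x)
    p = 2 + len(knots)  # intercept, slope, and one coefficient per knot
    return n * np.log(rss(x, y, knots) / n) + p * np.log(n)


def stepwise_knots(x, y, candidates, max_knots=5):
    """Greedy addition, then greedy deletion; return the knot set with the lowest BIC seen."""
    knots = []
    best_score, best_knots = bic(x, y, knots), []

    # Addition: repeatedly add the candidate knot that reduces the RSS the most.
    while len(knots) < max_knots:
        remaining = [t for t in candidates if t not in knots]
        if not remaining:
            break
        t_new = min(remaining, key=lambda t: rss(x, y, knots + [t]))
        knots = knots + [t_new]
        score = bic(x, y, knots)
        if score < best_score:
            best_score, best_knots = score, list(knots)

    # Deletion: repeatedly drop the knot whose removal increases the RSS the least.
    while knots:
        t_drop = min(knots, key=lambda t: rss(x, y, [k for k in knots if k != t]))
        knots = [k for k in knots if k != t_drop]
        score = bic(x, y, knots)
        if score < best_score:
            best_score, best_knots = score, list(knots)

    return sorted(best_knots)
```

On noisy data with a single kink, for example `y = |x - 0.5|` plus small Gaussian noise, one would expect this procedure to retain a knot at or near 0.5, provided 0.5 is among the candidates. The free-knot approach mentioned in the abstract differs in that knot locations would be estimated as continuous parameters rather than chosen from a grid.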
Kooperberg, Charles; LeBlanc, Michael; Obenchain, Valerie (2010) Risk prediction using genome-wide association studies. Genet Epidemiol 34:643-52
Scharpf, Robert B; Parmigiani, Giovanni; Pevsner, Jonathan et al. (2008) Hidden Markov models for the assessment of chromosomal alterations using high-throughput SNP arrays. Ann Appl Stat 2:687-713
Etzioni, Ruth; Kooperberg, Charles; Pepe, Margaret et al. (2003) Combining biomarkers to detect disease with application to prostate cancer. Biostatistics 4:523-38
Pan, W; Kooperberg, C (1999) Linear regression for bivariate censored data via multiple imputation. Stat Med 18:3111-21