Adaptive Function Estimation for Genomic Data

Kooperberg, Charles

Abstract

The publication of the sequence of the human genome and breakthroughs in high-throughput technologies for single nucleotide polymorphism (SNP) genotyping, gene expression, and protein measurements have offered new opportunities for the study of genome complexity. New technologies are generating large amounts of high-dimensional data at an astounding speed. Relative to the high dimension of the data, the number of independent samples is often rather small, either because the techniques are too expensive or because it is hard to obtain enough independent biological samples. Clearly, new statistical techniques are required to extract useful biological information from such data. Adaptive regression methods, which combine variable selection and nonlinear modeling, are well suited for many of these problems.
The aim of this proposal is to develop and enhance these methods to address the practical problems that arise directly from several collaborative projects. In particular we focus on association studies with SNP and microarray data. For SNP association studies we plan to make use of Logic Regression. This methodology combines mostly binary predictors using rules of Boolean algebra. The proposed developments include new techniques to deal with haplotype data, new approaches to model selection that scale up to high-dimensional problems, and computational techniques that make it feasible to deal with large data sets. For the analysis of microarray association studies we plan to use polynomial splines, an approach that combines nonlinear functions of predictors and low-order interactions. Gene expression measurements usually have a large variance, and measurements for different genes are often highly correlated. This, combined with the high dimensionality, makes regularization a necessity. Therefore, another focus of this proposal is to develop methods for combining predictors or models to regularize the model selection process. In addition, we plan to develop methods to improve inference for polynomial spline methodologies.
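To make the idea of combining binary predictors with Boolean rules concrete, here is a minimal toy sketch in Python. It is not the Logic Regression algorithm itself: it simply tries every two-predictor AND/OR term (with optional negation) on made-up binary SNP indicators and keeps the term with the fewest misclassifications. The function name `fit_logic_term`, the data, and the exhaustive search are all assumptions chosen for illustration.

```python
from itertools import combinations, product

def fit_logic_term(X, y):
    """Exhaustively search two-predictor Boolean terms (hypothetical helper).

    Returns (errors, i, negate_i, op_name, j, negate_j) for the term with
    the fewest misclassifications; ties keep the first term found.
    """
    ops = {
        "and": lambda a, b: a and b,
        "or": lambda a, b: a or b,
    }
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        for neg_i, neg_j in product((False, True), repeat=2):
            for name, op in ops.items():
                # "value != negate" flips the predictor when negate is True.
                pred = [
                    int(op(bool(row[i]) != neg_i, bool(row[j]) != neg_j))
                    for row in X
                ]
                errors = sum(p != t for p, t in zip(pred, y))
                if best is None or errors < best[0]:
                    best = (errors, i, neg_i, name, j, neg_j)
    return best

# Toy data (assumed): all 8 combinations of three binary SNP indicators.
X = [list(bits) for bits in product((0, 1), repeat=3)]
# Outcome generated by the rule "SNP0 AND NOT SNP2".
y = [int(row[0] == 1 and row[2] == 0) for row in X]

best = fit_logic_term(X, y)
print(best)
```

On this noiseless toy data the search recovers the generating rule with zero errors. With real SNP data the space of Boolean expressions is far too large for exhaustive search, which is why the scalable model selection and computational techniques mentioned in the abstract matter.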

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 2R01CA074841-06A1
Application #: 6726557
Study Section: Social Sciences, Nursing, Epidemiology and Methods 4 (SNEM)
Program Officer: Feuer, Eric J

Project Start: 1998-04-01
Project End: 2007-08-31
Budget Start: 2003-09-30
Budget End: 2004-08-31
Support Year: 6
Fiscal Year: 2003
Total Cost: $285,371

Institution

Name: Fred Hutchinson Cancer Research Center
DUNS #: 078200995

City: Seattle
State: WA
Country: United States
Zip Code: 98109

Related projects


NIH 2006 R01 CA	Adaptive Function Estimation for Genomic Data	Kooperberg, Charles L. / Fred Hutchinson Cancer Research Center	$263,440
NIH 2005 R01 CA	Adaptive Function Estimation for Genomic Data	Kooperberg, Charles L. / Fred Hutchinson Cancer Research Center	$270,648
NIH 2004 R01 CA	Adaptive Function Estimation for Genomic Data	Kooperberg, Charles L. / Fred Hutchinson Cancer Research Center	$270,030
NIH 2003 R01 CA	Adaptive Function Estimation for Genomic Data	Kooperberg, Charles L. / Fred Hutchinson Cancer Research Center	$285,371

Publications

LeBlanc, Michael; Kooperberg, Charles (2010) Boosting predictions of treatment success. Proc Natl Acad Sci U S A 107:13559-60

Kooperberg, Charles; LeBlanc, Michael; Obenchain, Valerie (2010) Risk prediction using genome-wide association studies. Genet Epidemiol 34:643-52

Dai, James Y; LeBlanc, Michael; Smith, Nicholas L et al. (2009) SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association. Biostatistics 10:680-93

Rajapakse, Indika; Perlman, Michael D; Scalzo, David et al. (2009) The emergence of lineage-specific chromosomal topologies from coordinate gene regulation. Proc Natl Acad Sci U S A 106:6679-84

Kooperberg, Charles; LeBlanc, Michael; Dai, James Y et al. (2009) Structures and Assumptions: Strategies to Harness Gene × Gene and Gene × Environment Interactions in GWAS. Stat Sci 24:472-488

Dai, James Y; LeBlanc, Michael; Kooperberg, Charles (2009) Semiparametric estimation exploiting covariate independence in two-phase randomized trials. Biometrics 65:178-87

LeBlanc, Michael; Kooperberg, Charles (2009) Adaptively weighted association statistics. Genet Epidemiol 33:442-52

Strand, Andrew D; Aragaki, Aaron K; Baquet, Zachary C et al. (2007) Conservation of regional gene expression in mouse and human brain. PLoS Genet 3:e59

Strand, Andrew D; Baquet, Zachary C; Aragaki, Aaron K et al. (2007) Expression profiling of Huntington's disease models suggests that brain-derived neurotrophic factor depletion plays a major role in striatal degeneration. J Neurosci 27:11758-68

LeBlanc, Michael; Moon, James; Kooperberg, Charles (2006) Extreme regression. Biostatistics 7:71-84
