The proposed research targets two important statistical problems in protein and gene expression microarray experiments: (I) quantification of protein lysate arrays, an emerging technology for directly measuring the protein content of different lysed tissue samples simultaneously; and (II) modeling of probe-level gene expression data, in particular data from exon tiling arrays used to detect alternative splicing, an essential process that accounts for much of human diversity. The investigators provide a statistical framework that allows for unknown regressor values in a nonparametric regression model, with applications to the quantification of protein lysate array data. The investigators also develop a quantile regression approach for mixed-effect models that are appropriate for detecting treatment and/or interaction effects without parametric distributional assumptions on the model. The investigators propose to use information across genes to improve the performance of the inferential methods in small-sample problems. The new principles developed in the proposal are of statistical interest beyond their direct applications to gene and protein expression data.
Findings from the Human Genome Project highlight the intricacy of the interactions among cell regulation, proteins, and genes. It is generally understood that biological functions and activities are governed by subsets of genes interacting with proteins in a highly controlled manner. High-throughput technologies such as microarrays are valuable for studying a large number of biological components simultaneously. In particular, protein lysate arrays and exon tiling arrays have begun to play important roles in cancer research and other biomedical research. However, sound conclusions from these technologies depend on appropriate statistical analysis of the proteomic and genomic data. The statistical methods developed in the proposal are timely and important for the proper quantification of protein lysate arrays and for detecting alternative splicing with exon tiling arrays. The proposed nonparametric approach is especially appealing because of its flexibility and adaptivity in modeling probe-level gene expression data as well as protein lysate array data.