Molecular technologies for studying the genome of human cells yield large, structured sets of categorical data. Cancer researchers use these data to understand the complex and variable sequence of genetic changes that occur within the cells of evolving tumors. The primary goal of the proposed research is to develop statistical methodology that will assist oncologists in analyzing and interpreting such data. In particular, statistical methods are proposed for localizing genes associated with the cancer phenotype. A very common experiment, used in the study of diverse cancers, involves a panel of molecular markers either scattered throughout the genome or drawn from a single chromosomal region. By comparing signals from normal and tumor cells, the oncologist can score each tumor-marker combination for loss of heterozygosity. Putative tumor suppressor genes may lie in regions that are commonly inactivated, so identifying such regions is of critical importance. Inference from marker data must account for various complexities: within-tumor variation, dependence of response between nearby markers, the problem of multiple comparisons, known structural features of chromosomes such as the locations of fragile sites, the dependence of data from related cells, consequences of genetic instability such as aneuploidy and background loss, and covariate information such as levels of oncoproteins. Forgoing statistical analysis, or relying on naive methods, uses valuable data inefficiently and may even lead to erroneous conclusions. The evolutionary nature of tumor growth suggests a natural form for a stochastic model of the changing genome: one based on genetic instability and selection. Such a model creates a framework for parametrizing the distribution of loss-of-heterozygosity data.
Questions about the location and action of putative suppressor genes can be formulated as questions about components of the stochastic model, and thus classical inference procedures can be applied. Numerous technical questions arise about what to compute and how. Bayesian and profile likelihood strategies are proposed to estimate gene location given the model. Markov chain Monte Carlo methods are necessary to implement the Bayesian strategy, and predictive distributions will be studied to assess goodness of fit. Alternatively, bootstrap methods enable frequency calibration of the profile likelihood and provide methods for model testing. Asymptotic analysis will give insight into the form of the non-standard likelihood surface. Computer simulation of the model will be useful both for studying the bias and variance properties of the proposed methods and as the basis for power calculations to design marker studies.
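As a rough illustration of the simulation and bootstrap ideas mentioned above, the following Python sketch generates a binary tumor-by-marker loss-of-heterozygosity matrix and resamples tumors to see how stable the most frequently lost marker is. It is not the instability-selection model proposed here. It assumes, purely for illustration, that markers are independent within a tumor and that the loss probability is a background rate plus a term that decays with distance from a hypothetical suppressor locus. All function names, parameter names, and numerical values are hypothetical.

```python
import random


def simulate_loh(n_tumors, marker_positions, gene_position,
                 background=0.1, selection=0.5, decay=10.0, seed=0):
    """Simulate a binary tumors-by-markers matrix (1 = loss of heterozygosity).

    Assumption: markers are independent within a tumor, which ignores the
    dependence between nearby markers that a realistic model must handle.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_tumors):
        row = []
        for pos in marker_positions:
            # Background loss plus an excess that halves every `decay` units
            # of distance from the hypothetical suppressor locus.
            p = background + selection * 2 ** (-abs(pos - gene_position) / decay)
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def loss_frequencies(data):
    """Per-marker fraction of tumors showing loss."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def bootstrap_peak(data, marker_positions, n_boot=200, seed=1):
    """Resample tumors with replacement and record, for each resample,
    the position of the marker with the highest loss frequency."""
    rng = random.Random(seed)
    n = len(data)
    peaks = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        freqs = loss_frequencies(sample)
        peaks.append(marker_positions[freqs.index(max(freqs))])
    return sorted(peaks)


# Hypothetical panel of seven markers with a suppressor locus at position 30.
positions = [0, 10, 20, 30, 40, 50, 60]
data = simulate_loh(200, positions, gene_position=30)
freqs = loss_frequencies(data)
peaks = bootstrap_peak(data, positions)
# Crude 95% percentile interval for the peak marker position.
lower, upper = peaks[5], peaks[194]
print(freqs)
print(lower, upper)
```

Repeating the simulation over many seeds and panel designs would give a crude picture of how the number of tumors and the marker spacing affect the spread of the estimated location, which is the kind of question a power calculation for a marker study addresses.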
Newton, M A; Kendziorski, C M; Richmond, C S et al. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol 8:37-52
Newton, M A; Lee, Y (2000) Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics 56:1088-97
Newton, M A; Gould, M N; Reznikoff, C A et al. (1998) On the statistical analysis of allelic-loss data. Stat Med 17:1425-45