With current biotechnologies, a typical longitudinal study following a sample of subjects (or organisms, in general) can involve collection of multi-thousand-dimensional gene expression profiles (or biomarkers, in general) at one or more points in time, changes in medical treatment over time, and missing or censored data on clinical outcomes such as survival. An important issue with such data involving gene expression, in addition to the more classical issues of "censoring" and "confounding of treatment", is that each gene itself represents an important parameter, so that one typically needs to estimate thousands of parameters at once. In particular, this implies that one needs statistical inference in a setting where one estimates many more parameters than one has independent observations. In addition, visualization techniques and sophisticated supervised and unsupervised clustering techniques are required to discover the significant overall structures. This research will develop semiparametric statistical methods for the analysis of data arising in observational longitudinal studies that collect gene expression data. An important focus of this work will be to apply the proposed methods, in collaboration with subject matter experts, to longitudinal studies of 1) the causal effect of air pollution on the natural history of asthma in children, 2) the causal effect of leisure-time activity on survival and health in the elderly population, 3) causal relationships between recurrence/survival and gene expression profiles in cancer patients, and 4) gene expression in yeast data sets and its relationship to the non-coding regions. The methods will make it possible to learn how the expression of different genes (and hence the encoded proteins) interacts, providing insight into biochemical pathways and clues about the underlying causal mechanisms at work at the genomic level. In particular, it is believed that gene expression is an important indicator of cancer progression and that an understanding of which genes are active in this process will ultimately lead to better strategies for diagnosis and treatment.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM067233-02
Application #: 6604930
Study Section: Special Emphasis Panel (ZRG1-SNEM-5 (01))
Program Officer: Onken, James B
Project Start: 2002-07-01
Project End: 2006-06-30
Budget Start: 2003-07-01
Budget End: 2004-06-30
Support Year: 2
Fiscal Year: 2003
Total Cost: $222,894
Indirect Cost:
Name: University of California Berkeley
Department:
Type: Schools of Public Health
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94704