DNA copy number data, which measures gains and losses of segments of genomes, is an important data type for understanding genetic variation and for clinical research. The analysis of DNA copy number data motivates new statistical problems, especially in the areas of change-point detection and high-dimensional data analysis. This proposal identifies these problems, formulates statistical models, and proposes methods for their solution. The topics covered include model selection for irregular high-dimensional models, simultaneous change-point detection in a large number of aligned sequences, and segmentation of partially observed sequences. These developments in statistical methodology are a direct response to the current analysis needs at the Stanford Genome Technology Center and in the Cancer Genome Atlas Project, and open source software will be made available to these and broader communities.
Cancer and other genetic diseases are no strangers to genome scientists: high-throughput technologies and statistical analyses have always promised to provide a systems-level view of disease inheritance and progression. In recent years, concurrent advances in genomics and statistics, including more efficient high-throughput data collection methods, larger patient sample sets, an atmosphere of more open collaboration, and greater sophistication in study design and data analysis, have positioned us to make major new advances in studying genetic disease. Despite this promise, much remains to be done. In particular, statistical methods for the analysis of genome-wide profiling data lack the sophistication to deal with the many issues that arise in modern data collection schemes. These issues include high dimensionality, missing observations, and simultaneous inference in a large number of patient samples. In this proposal, the investigator and her colleagues formulate these new problems and put forth models with practical solutions. These developments in statistical methodology are a direct response to the current analysis needs at the Stanford Genome Technology Center and in the Cancer Genome Atlas Project, and open source software will be made available to these and broader communities.