This project addresses a key modeling issue faced by many data analysts working with genomic data. For a set of individuals or observations, many different types of high-throughput experimental datasets are generated, and the question then becomes how to model these data. In many problems, the goal is to prioritize which parts of the genome to study. While it is commonly assumed that the different data types are linearly correlated in either an unconditional or conditional sense, in many settings the nature of the correlation is unknown. This research focuses on multivariate methods of analysis for high-dimensional genomic data that relax the linearity assumption. Two classes of problems will be studied during the course of the project. The first is hidden Markov models and the second is multiple testing procedures, whose use has become commonplace with genomic datasets. This project proposes novel multivariate extensions of both types of method, with the goal that they rest on sound theoretical statistical principles while remaining computationally feasible on big datasets. The methodology will be evaluated using several real datasets as well as through simulation studies.

This work will involve an interplay between statisticians and biologists. More broadly, the work can be used to prioritize molecules for follow-up studies in any biological setting. It will be useful for biologists and scientists studying disease processes who wish to find new therapeutic targets or to advance basic etiological understanding. The educational goals of the project include new course components for graduate students at Penn State and the training of graduate students in Statistics.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1262538
Program Officer: Anne Maglia
Budget Start: 2013-06-01
Budget End: 2014-10-31
Fiscal Year: 2012
Total Cost: $358,834
Name: Pennsylvania State University
City: University Park
State: PA
Country: United States
Zip Code: 16802