To meet the challenges posed by the increasing amount of high-throughput data generated in many fields, especially in plant genome research, a one-day session entitled "Interactions Between Omics and Statistics: Analyzing High Dimensional Data" will be held as part of the 8th International Purdue Symposium on Statistics, June 20-24, 2012. The theme of the overall symposium is "Diversity in the Statistical Sciences for the 21st Century." The session will be organized as an interactive forum for prestigious researchers in plant biology and statistics, to bridge the gap between plant biologists and statisticians and to provide new insight into the issues associated with high-dimensional data analysis. The topics of the presentations include, but are not limited to, an overall introduction to high-dimensional omics data, the special statistical challenges these data pose, and newly developed statistical methods for analyzing them. In addition, future directions for further improving the analysis of high-dimensional omics data will be discussed. A companion workshop featured in the symposium, entitled "iPlant Data Store and iPlant Discovery Environment," will be organized and presented by the iPlant Collaborative. NSF funds will in part defray the costs for graduate students and postdocs to attend the symposium, session, and workshop as part of their training in the highly interdisciplinary field of statistical genomics.

Project Report

Intellectual Merit. This session has greatly enhanced the collaboration between plant biologists and statisticians, especially in addressing the biological, computational, and statistical issues of high-dimensional omics data analysis. The invited speakers came from many diverse areas, including Agronomy, Botany & Plant Sciences, Mathematics, Molecular Genetics, Plant Biology, and Statistics. The presentations covered high-dimensional data in plant biology, newly developed methods for analyzing such data, and how integrating the two can advance our understanding of plants. As an increasing amount of high-dimensional data is generated by high-throughput biotechnologies, Dr. C. Robin Buell discussed RNA-Seq data and how to use it to understand expression patterns and diversity in maize. Dr. Nathan Springer presented the use of RNA-Seq and microarray data to generate co-expression networks that can provide more meaningful biological findings. On methodology development for high-dimensional data, Dr. Dennis Cook introduced a new class of dimension reduction methods in the regression setting; these methods have nice theoretical properties, with applications to the analysis of systems biology. For sparse, high-dimensional multivariate response regression models, Dr. Marten Wegkamp developed estimators based on penalized least squares with novel penalties; these estimators are adaptive to the unknown matrix sparsity and converge at a fast rate. Dr. Ping Ma presented a new way to elucidate regulatory networks with a nonparametric method involving Bayesian inference and mixed-effect models. For interdisciplinary research, Dr. Jianming Yu gave an overview of genome-wide association studies (GWAS) and discussed the opportunities and challenges of statistical genetics, using an Arabidopsis study and a maize study as examples. Dr. Shizhong Xu showed a marker-based infinitesimal model for quantitative trait analysis that can handle an unlimited number of genetic markers without marker selection. For the post-GWAS era, Dr. Lauren McIntyre combined GWAS and gene regulatory networks using natural variation in allele-specific expression, and showed how phenotypic variation can be elucidated using a population genetics framework with whole-genome data and molecular pathways. Dr. Lu Lu employed literature search and mRNA expression to conduct expression quantitative trait loci mapping that can help determine the genetic regulatory relationships of selected candidate genes. The promise of the systems biology approach in inferring functional genetic covariance networks was shown by real data analysis. The posters from graduate students and postdoctoral researchers covered a wide range of topics in high-dimensional data, including metabolic profiling, metagenomic data, RNA-Seq data, and whole-genome sequencing analysis. In addition, some posters focused on methods development, such as Bayesian methods for sparse linear mixed models and a joint linkage analysis and GWAS method for the maize nested association mapping population. Through the many discussions during the poster sessions, the graduate students and postdoctoral researchers received valuable comments from the speakers and other attendees of the conference.

Broader Impacts. The presented statistical methods for analyzing omics data are applicable to similar types of high-dimensional data with minimal modification. The presentations in the session have motivated graduate students to work in interdisciplinary areas, especially high-dimensional data analysis. The slides of all presentations have been disseminated via the symposium website to benefit the entire research community.

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Type: Standard Grant (Standard)
Application #: 1240803
Program Officer: Diane Okamuro
Project Start:
Project End:
Budget Start: 2012-05-01
Budget End: 2013-04-30
Support Year:
Fiscal Year: 2012
Total Cost: $16,000
Indirect Cost:
Name: Purdue University
Department:
Type:
DUNS #:
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907