This research project creates new tools for large-scale multiple comparisons. In particular, the investigator develops new tools in the frequency domain to tackle problems in this field. The project has three components: (a) Introduce Fourier analysis as a tool for multiple comparisons. The investigator aims to push the boundary of the field by harnessing the power of Fourier analysis, which has repeatedly proven to be a powerful tool in many scientific areas but has seldom been used in large-scale multiple comparisons. (b) Develop practically feasible tools, and lay out theoretical frameworks for studying the optimality of those tools. (c) Extend and apply the developed methodology and theory to the analysis of massive data generated in various scientific fields, including comparative genomic hybridization (CGH), cosmology and astronomy, and gene microarrays.

Modern data acquisition routinely produces massive data sets in many scientific areas, e.g., genomics, astronomy, functional magnetic resonance imaging (fMRI), and image processing. The vision is that advances in massive data analysis will enable scientists from various fields to quickly extract the information they need and, at the same time, benefit the statistical discipline with both a broader scope of theory and methodology and a deeper understanding of nature and science. The project pushes the boundary of the field by introducing new ideas for problem solving, developing new tools and novel theory, and applying the tools to other scientific fields, including but not limited to comparative genomic hybridization (CGH), cosmology and astronomy, and gene microarrays.

Project Report

We are entering the so-called era of 'Big Data', where massive datasets consisting of millions or billions of observations and variables are mined for associations and patterns. Such activity is a driving force in numerous areas of science, technology, and business, and is an ever-increasing focus of modern intellectual life. The supported research has been in the area of analyzing 'Big Data', focusing on the most challenging regime, where the signals or useful features (e.g., genes, proteins) are rare and weak. Over a five-year period, the PI has developed new methods and new theoretical frameworks that are appropriate for analyzing rare and weak signals in 'Big Data'. The research has applications in areas including genomics, cosmology and astronomy, functional MRI, and image processing.

The supported research has trained a total of seven Ph.D. students majoring in Statistics or Mathematics, of whom three have obtained their Ph.D. degrees and found academic jobs, and four are making good progress towards their degrees. It has also educated master's and undergraduate students through topics courses, journal clubs, seminars, and conference talks. Some of these students are majoring in Statistics; the others are majoring in fields such as Engineering and Life Science. The PI has also served on the program committees of several workshops and conferences. These workshops and conferences have brought together research leaders, junior scientists, and graduate students from different fields around the theme of 'Big Data', and have provided a friendly environment that fosters communication and conversation.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
0908613
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2008-09-05
Budget End
2012-06-30
Support Year
Fiscal Year
2009
Total Cost
$342,188
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213