In many areas of biological or medical research, investigators are faced with the task of analyzing data sets that can be described as "large sample, moderate dimension". An important example, which is the specific focus of this project, is multi-parameter flow cytometry, where the number of data points is in the range of tens of thousands to several millions, and each data point can provide measurements on multiple variables (5 to 60). New statistical tools are needed to analyze and visualize these data, and to address the associated hypothesis testing and modeling challenges. The goal of this research is to develop new statistical methods for this type of multivariate data, and based on these methods, to create and provide more effective data analysis and interpretation tools for multi-parameter flow cytometry. First we will develop a new approach to multivariate density estimation based on the approximation of the density by simple functions. These estimates are essentially histograms based on data-adaptive partitions of the basic multivariate domain.
The aim is to attain effective learning of these partitions using methods with strong theoretical justification and good empirical performance. We will also implement and further develop these methods for the analysis of multi-parameter flow cytometry data. Particular attention will be paid to mass cytometry, a new cytometry modality that can greatly increase the number of variables measured per cell compared with classical polychromatic flow cytometry.
The aim is not only to improve primary analysis tasks such as cell population identification, but also to develop new methods for downstream analysis tasks such as graphical modeling of the variables being analyzed. The methods will be disseminated through distribution of computer programs and also through a web service.
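
As a rough illustration of what a histogram on a data-adaptive partition looks like, the following minimal Python sketch splits the bounding box of the data recursively at coordinate medians and assigns each resulting cell the density count / (n * volume). This is not the project's method: the median-split rule, the function names adaptive_partition and histogram_density, and the min_points parameter are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def adaptive_partition(points, lower, upper, min_points=50, depth=0):
    """Recursively split a box at the median of one coordinate.

    Illustrative rule only (an assumption, not the project's algorithm):
    cycle through the coordinates and cut at the sample median until a
    cell holds fewer than 2 * min_points points.
    Returns a list of (lower, upper, count) leaf cells.
    """
    n, d = points.shape
    if n < 2 * min_points:
        return [(lower, upper, n)]

    axis = depth % d
    cut = float(np.median(points[:, axis]))

    # Stop if the cut would create a cell of zero width.
    if not (lower[axis] < cut < upper[axis]):
        return [(lower, upper, n)]

    left_mask = points[:, axis] <= cut
    # Stop if ties put every point on one side of the cut.
    if left_mask.all() or not left_mask.any():
        return [(lower, upper, n)]

    left_upper = upper.copy()
    left_upper[axis] = cut
    right_lower = lower.copy()
    right_lower[axis] = cut

    return (
        adaptive_partition(points[left_mask], lower, left_upper,
                           min_points, depth + 1)
        + adaptive_partition(points[~left_mask], right_lower, upper,
                             min_points, depth + 1)
    )


def histogram_density(cells, n_total):
    """Piecewise-constant density: count / (n_total * cell volume)."""
    estimate = []
    for lo, hi, count in cells:
        volume = float(np.prod(hi - lo))
        estimate.append((lo, hi, count / (n_total * volume)))
    return estimate


if __name__ == "__main__":
    # Synthetic stand-in for cytometry-like data: 20,000 points, 5 variables.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(20_000, 5))
    cells = adaptive_partition(data, data.min(axis=0), data.max(axis=0))
    density = histogram_density(cells, len(data))
    print(f"{len(density)} cells in the partition")
```

Because every point falls in exactly one cell and each cell has positive volume, the estimate integrates to one over the bounding box. Cells become small where the data are dense and stay large where they are sparse, which is the property that distinguishes such estimates from fixed-grid histograms.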

Public Health Relevance

The goal of this research is to develop new statistical methods for nonparametric learning from large numbers of multi-dimensional data points. Based on these methods, we will create and provide more effective data analysis and interpretation tools for multi-parameter flow cytometry. This research has two broad impacts. First, the new analytical and computational methods will open up novel ways to use flow cytometry in many areas of current biology and medicine. Second, the density estimation methods and software resulting from this research have general applicability beyond flow cytometry analysis, and can be used as building blocks to design new statistical analysis and modeling tools useful in other areas.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM109836-01
Application #
8664732
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2014-07-15
Project End
2018-04-30
Budget Start
2014-07-15
Budget End
2015-04-30
Support Year
1
Fiscal Year
2014
Total Cost
Indirect Cost
Name
Stanford University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94304
Zamanighomi, Mahdi; Lin, Zhixiang; Daley, Timothy et al. (2018) Unsupervised clustering and epigenetic classification of single cells. Nat Commun 9:2410
Daley, Timothy P; Lin, Zhixiang; Lin, Xueqiu et al. (2018) CRISPhieRmix: a hierarchical mixture model for CRISPR pooled screens. Genome Biol 19:159
Duren, Zhana; Chen, Xi; Zamanighomi, Mahdi et al. (2018) Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations. Proc Natl Acad Sci U S A 115:7723-7728
Burns, Tyler J; Frei, Andreas P; Gherardini, Pier F et al. (2017) High-throughput precision measurement of subcellular localization in single cells. Cytometry A 91:180-189
Samusik, Nikolay; Aghaeepour, Nima; Bendall, Sean (2017) SESSION INTRODUCTION. Pac Symp Biocomput 22:557-563
Samusik, Nikolay; Good, Zinaida; Spitzer, Matthew H et al. (2016) Automated mapping of phenotype space with single-cell data. Nat Methods 13:493-6
Lin, Zhixiang; Yang, Can; Zhu, Ying et al. (2016) Simultaneous dimension reduction and adjustment for confounding variation. Proc Natl Acad Sci U S A 113:14662-14667
Anchang, Benedict; Hart, Tom D P; Bendall, Sean C et al. (2016) Visualization and cellular hierarchy inference of single-cell data using SPADE. Nat Protoc 11:1264-79
Angst, Martin S; Fragiadakis, Gabriela K; Gaudillière, Brice et al. (2016) In Reply. Anesthesiology 124:1414-5
Mohiyuddin, Marghoob; Mu, John C; Li, Jian et al. (2015) MetaSV: an accurate and integrative structural-variant caller for next generation sequencing. Bioinformatics 31:2741-4

Showing the most recent 10 out of 17 publications