In many areas of biological or medical research, investigators must analyze data sets that can be described as "large sample, moderate dimension". An important example, and the specific focus of this project, is multi-parameter flow cytometry, where the number of data points ranges from tens of thousands to several million and each data point can provide measurements on 5 to 60 variables. New statistical tools are needed to analyze and visualize these data and to address the associated hypothesis-testing and modeling challenges. The goal of this research is to develop new statistical methods for this type of multivariate data and, based on these methods, to create and provide more effective data analysis and interpretation tools for multi-parameter flow cytometry. First, we will develop a new approach to multivariate density estimation based on approximating the density by simple functions. These estimates are essentially histograms built on data-adaptive partitions of the basic multivariate domain.
The aim is to learn these partitions effectively, using methods with strong theoretical justification and good empirical performance. We will also implement and further develop these methods for the analysis of multi-parameter flow cytometry data. Particular attention will be paid to mass cytometry, a new cytometry modality that can greatly increase the number of variables measured per cell compared with classical polychromatic flow cytometry.
The aim is not only to improve primary analysis tasks such as cell population identification, but also to develop new methods for downstream analysis tasks such as graphical modeling of the variables being analyzed. The methods will be disseminated through distributed computer programs and through a web service.
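To make "histograms built on data-adaptive partitions" concrete, here is a minimal Python sketch. It is not the project's method, which the abstract does not specify. It assumes a simple kd-tree-style rule that splits each box at the sample median of one coordinate, cycling through coordinates, until a cell holds at most `min_count` points. The functions `build_partition` and `histogram_density` and the parameters `min_count` and `max_depth` are hypothetical names chosen for illustration. On each leaf cell A_j the estimate is the constant n_j / (n · vol(A_j)), so it is a simple function that integrates to one over the domain.

```python
import numpy as np


def build_partition(points, lower, upper, min_count=50, depth=0, max_depth=30):
    """Hypothetical rule: split the box [lower, upper) at the sample median.

    Coordinates are cycled by depth. Recursion stops when a cell holds at most
    `min_count` points, when `max_depth` is reached, or when the median would
    create a zero-volume cell. Returns leaf cells as (lower, upper, count).
    """
    n, d = points.shape
    if n <= min_count or depth >= max_depth:
        return [(lower, upper, n)]
    axis = depth % d
    cut = float(np.median(points[:, axis]))
    if not (lower[axis] < cut < upper[axis]):
        return [(lower, upper, n)]
    left = points[:, axis] < cut  # half-open cells: [lower, cut) and [cut, upper)
    left_upper = upper.copy()
    left_upper[axis] = cut
    right_lower = lower.copy()
    right_lower[axis] = cut
    return (
        build_partition(points[left], lower, left_upper, min_count, depth + 1, max_depth)
        + build_partition(points[~left], right_lower, upper, min_count, depth + 1, max_depth)
    )


def histogram_density(cells, n_total):
    """Piecewise-constant density: f(x) = n_j / (n_total * vol(A_j)) on cell A_j."""
    def f(x):
        x = np.asarray(x, dtype=float)
        for lo, hi, count in cells:
            if np.all(lo <= x) and np.all(x < hi):
                return count / (n_total * np.prod(hi - lo))
        return 0.0  # x lies outside the partitioned domain
    return f


# Usage: 2,000 synthetic points drawn uniformly from the unit square [0, 1)^2.
rng = np.random.default_rng(0)
data = rng.random((2000, 2))
cells = build_partition(data, np.zeros(2), np.ones(2), min_count=100)
f = histogram_density(cells, len(data))
print(len(cells), f([0.5, 0.5]))
```

Because every split falls where the data lie, dense regions (for example, abundant cell populations) end up covered by many small cells and sparse regions by a few large ones. A fixed-grid histogram cannot adapt its bins this way.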

Public Health Relevance

The goal of this research is to develop new statistical methods for nonparametric learning from large numbers of multidimensional data points. Based on these methods, we will create and provide more effective data analysis and interpretation tools for multi-parameter flow cytometry. This research has two broad impacts. First, the new analytical and computational methods will open up novel ways to use flow cytometry in many areas of current biological and medical research. Second, the density estimation methods and software resulting from this research are applicable beyond flow cytometry analysis and can serve as building blocks for new statistical analysis and modeling tools in other areas.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul
Stanford University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States