This project aims to provide biologists with new tools for understanding complex systems for which they have several sources of heterogeneous 'in situ' data. These data are heterogeneous at many levels and arrive together with spatio-temporal and prior information that needs to be incorporated into integrated data structures. The collaboration starts with the design of the collection process and provides tools for data integration and analysis built around the statistics package R and GEMEDENT, an interactive image analysis program written in Java. The project concentrates on two specific types of heterogeneous data: metagenomic data and sequence mixtures produced by the new pyrosequencing machines, and cell image data produced by automated microscopes.

The first type of heterogeneous data consists of microbial soil sample data collected by Alfred Spormann from Civil and Environmental Engineering at Stanford. The proposal focuses on applying Bayesian computations to the design of sample locations and the number of sequences collected, and then using spectral multivariate methods to analyze diversity indices as tables (instead of summaries), thus incorporating the data structure into the decompositions. These methods will also be useful in the study of mixture data from pyrosequencing of HIV, bacteria, viruses and cancer cells.

The second study focuses on the interaction between immune cells and breast cancer, in collaboration with Peter Lee, a hematologist at Stanford. We will analyze data from microscope images of stained lymph nodes. An integrated image analysis system automatically detects the location and size of many different cell types in stained images. Random forests have been incorporated into the image analysis system, and an effective interactive boosting component lets the user iterate the learning process until a desired level of accuracy is attained. These data enable us to infer the spatial and dynamic interaction between the tumors and the immune cells. A postdoctoral fellow will be in charge of combining the cell data with the clinical history and the microarray expression data from the same patient. The heterogeneity will be handled with exploratory multivariate techniques based on spectral analysis, kernel methods and graphical representations.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer
Remington, Karin A
Stanford University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Callahan, Benjamin; Proctor, Diana; Relman, David et al. (2016) Reproducible research workflow in R for the analysis of personalized human microbiome data. Pac Symp Biocomput 21:183-94
Bik, Elisabeth M; Costello, Elizabeth K; Switzer, Alexandra D et al. (2016) Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea. Nat Commun 7:10516
DiGiulio, Daniel B; Callahan, Benjamin J; McMurdie, Paul J et al. (2015) Temporal and spatial variation of the human microbiota during pregnancy. Proc Natl Acad Sci U S A 112:11060-5
Bacallado, Sergio; Diaconis, Persi; Holmes, Susan (2015) de Finetti Priors using Markov chain Monte Carlo computations. Stat Comput 25:797-808
McMurdie, Paul J; Holmes, Susan (2015) Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking. Bioinformatics 31:282-3
McMurdie, Paul J; Holmes, Susan (2014) Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol 10:e1003531
Pennings, Pleuni S; Holmes, Susan P; Shafer, Robert W (2014) HIV-1 transmission networks in a small world. J Infect Dis 209:180-2
Sahoo, Malaya K; Lefterova, Martina I; Yamamoto, Fumiko et al. (2013) Detection of cytomegalovirus drug resistance mutations by next-generation sequencing. J Clin Microbiol 51:3700-10
Pinter-Wollman, Noa; Bala, Ashwin; Merrell, Andrew et al. (2013) Harvester ants use interactions to regulate forager activation and availability. Anim Behav 86:197-207
Navas-Molina, José A; Peralta-Sánchez, Juan M; González, Antonio et al. (2013) Advancing our understanding of the human microbiome using QIIME. Methods Enzymol 531:371-444
