This project aims to provide biologists with new tools for understanding complex systems for which they have several sources of heterogeneous 'in situ' data. These data are heterogeneous at many levels and come with spatio-temporal and prior information that must be incorporated into integrated data structures. The collaboration starts with the design of the data collection process. It provides tools for data integration and analysis built around the statistical package R and around GEMEDENT, an interactive image analysis program written in Java. The project concentrates on two specific types of heterogeneous data: metagenomic data and sequence mixtures produced by the new pyrosequencing machines, and cell image data produced by automated microscopes.
The first type of heterogeneous data comes from microbial soil samples collected by Alfred Spormann of Civil and Environmental Engineering at Stanford. The proposal first applies Bayesian computations to design the sampling locations and the number of sequences collected. It then uses spectral multivariate methods to analyze diversity indices as tables rather than as summaries, thus incorporating the data structure into the decompositions. These methods will also be useful for studying mixture data from pyrosequencing of HIV, bacteria, viruses and cancer cells.
The second study focuses on the interaction between immune cells and breast cancer, in collaboration with Peter Lee, a hematologist at Stanford. We will analyze microscope images of stained lymph nodes. An integrated image analysis system automatically detects the location and size of many different cell types in the stained images. Random forests have been incorporated into the image analysis system, and an interactive boosting component lets the user iterate the learning process until the desired level of accuracy is reached.
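To illustrate what analyzing a table rather than a summary can mean, the sketch below applies correspondence analysis to a small samples-by-taxa count table through a singular value decomposition. Correspondence analysis is one common spectral method for count tables. Choosing it here is our assumption, because the proposal does not name the specific spectral method. The project's own tools are also built around R rather than Python. The helper name `correspondence_analysis` and the toy counts are hypothetical.

```python
import numpy as np

def correspondence_analysis(counts, n_axes=2):
    """Correspondence analysis of a samples x taxa count table via SVD.

    Hypothetical helper for illustration only; not the project's code.
    Returns the row (sample) principal coordinates and all singular values.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (samples)
    c = P.sum(axis=0)                    # column masses (taxa)
    expected = np.outer(r, c)            # independence model r c^T
    # Standardized residuals: departures of each cell from independence
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U Sigma, keeping n_axes axes
    rows = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    return rows, sv

# Made-up table: 4 soil samples (rows) x 5 taxa (columns)
counts = [[10, 0, 3, 7, 1],
          [8, 1, 4, 6, 0],
          [0, 12, 1, 0, 9],
          [1, 10, 0, 2, 11]]
rows, sv = correspondence_analysis(counts)
print(np.round(rows, 3))
print("inertia per axis:", np.round(sv ** 2, 4))
```

Each sample receives coordinates on a few axes. These coordinates summarize how its whole taxon profile departs from the average profile, so samples with similar compositions (here the first two and the last two) land close together. A single diversity index per sample would discard that structure.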
These data enable us to infer the spatial and dynamic interactions between tumors and immune cells. A postdoctoral fellow will be in charge of combining the cell data with the clinical history and the microarray expression data for each patient. The heterogeneity will be handled with exploratory multivariate techniques based on spectral analysis, kernel methods and graphical representations.
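Kernel methods offer one way to put heterogeneous per-patient sources on a common footing. The minimal sketch below works in four steps:

1. Compute one kernel (similarity) matrix per source.
2. Center and trace-normalize each matrix.
3. Sum the matrices.
4. Take the leading eigenvectors of the sum (kernel PCA).

The Gaussian kernels, bandwidths, equal weights, helper names and random stand-in data are all assumptions for illustration, not the project's method. The only requirement taken from the text is that every source describes the same patients, here in the same row order.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def center_kernel(K):
    """Double-center K, which centers the implicit feature vectors."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def combined_kernel_pca(blocks, gammas, weights=None, n_axes=2):
    """Kernel PCA on a weighted sum of per-source centered kernels.

    Hypothetical sketch: blocks is a list of (patients x features) arrays
    whose rows refer to the same patients in the same order.
    """
    if weights is None:
        weights = [1.0 / len(blocks)] * len(blocks)   # equal weights (assumption)
    n = blocks[0].shape[0]
    K = np.zeros((n, n))
    for X, gamma, w in zip(blocks, gammas, weights):
        Kc = center_kernel(rbf_kernel(X, gamma))
        K += w * Kc / np.trace(Kc)        # each source contributes equal total variance
    vals, vecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_axes]
    vals, vecs = vals[order], vecs[:, order]
    scores = vecs * np.sqrt(np.maximum(vals, 0.0))   # patient coordinates
    return scores, vals

# Random stand-in data for 12 patients (not real measurements)
rng = np.random.default_rng(0)
cell_features = rng.normal(size=(12, 4))    # e.g. immune-cell densities
expression = rng.normal(size=(12, 50))      # e.g. microarray profiles
scores, vals = combined_kernel_pca([cell_features, expression], gammas=[0.25, 0.01])
print(scores.shape, np.round(vals, 3))
```

Before weighting, trace normalization gives each source the same total variance in feature space. This keeps a source with a larger kernel scale from dominating the combined matrix. The weights could instead be tuned or learned.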