Research teams developing and validating taxonomic metagenomic biomarkers for gastrointestinal tract (GIT) diseases need tools for experimental design and hypothesis testing. A translational research team needs to 1) design an experiment to test a hypothesis (e.g., that the microbiota in Crohn's disease differs from that in normal controls), 2) calculate the number of subjects needed to achieve a specified power and significance level when testing for differences in taxa frequencies at a defined taxonomic level (e.g., class, genus), 3) have an objective test statistic for deciding whether the null hypothesis is rejected, possibly adjusting for patient factors such as age, gender, and diet, and 4) possess tools to calculate the diagnostic performance measures needed to validate biomarkers (e.g., sensitivity, specificity, positive/negative predictive probabilities). Current analytic approaches (e.g., mass univariate "one-taxa-at-a-time" comparisons with multiple testing adjustments, exploratory cluster analysis) are excellent for discovery, but the subjectivity in their application and interpretation (e.g., the choice of Bray-Curtis versus Jaccard distance) makes them inappropriate for translational research. Objective decision making using parametric statistical tools is necessary to move metagenomics from the discovery phase to clinical biomarker validation and qualification. This Phase I SBIR will develop and commercialize an experimental design and statistical analysis software platform for translational clinical research teams developing diagnostic and prognostic taxonomically labeled metagenomic biomarkers for GIT diseases. Existing open source software and new software will be organized into the STATISTICAL TOOLS FOR GIT TRANSLATIONAL RESEARCH platform.
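As an illustration of the diagnostic performance measures named in item 4, the following minimal sketch computes sensitivity, specificity, and the positive and negative predictive probabilities (often called predictive values) from a 2x2 table of biomarker calls against true disease status. It is not the platform's implementation; the names `ConfusionCounts` and `diagnostic_performance` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 table of biomarker call vs. true disease status."""
    tp: int  # diseased subjects called positive
    fp: int  # non-diseased subjects called positive
    fn: int  # diseased subjects called negative
    tn: int  # non-diseased subjects called negative


def diagnostic_performance(c: ConfusionCounts) -> dict:
    """Return the four standard diagnostic performance measures."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # P(test positive | diseased)
        "specificity": c.tn / (c.tn + c.fp),  # P(test negative | not diseased)
        "ppv": c.tp / (c.tp + c.fp),          # P(diseased | test positive)
        "npv": c.tn / (c.tn + c.fn),          # P(not diseased | test negative)
    }


# Hypothetical counts, for illustration only (not study data).
counts = ConfusionCounts(tp=40, fp=10, fn=10, tn=90)
metrics = diagnostic_performance(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The predictive probabilities computed this way depend on the disease prevalence in the sample, so they carry over to a clinical population only if the study prevalence is representative of it.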
An open source business model similar to those of Red Hat for Linux and Cloudera for Hadoop will be used, with revenue generated by offering an optional licensed proprietary GUI to facilitate deployment and use of these functions, a contract Data-Analysis-As-A-Service (DAAAS) model, cloud computing, and consulting services. Note that the platform will not replace the metagenomic pre-processing pipelines (e.g., data acquisition, assembly, annotation, RDP matching); rather, it takes the end product of these steps (e.g., taxa-by-sample frequency tables) and analyzes it for clinical use.
Metagenomics is seen as the next revolution in gastrointestinal research and health care. This project will develop new statistical tools for analyzing metagenomic data in biomedical research, measuring how differences in bacterial populations can be used to predict gastrointestinal disease progression and identify the best treatment options for patients. With these tools, researchers will be better able to use these big data resources to improve human health.