Our overarching hypothesis is that distorted microbial activities, including excretion of signaling molecules, can interact with the gut epithelium and contribute to the establishment of a persistent local inflammatory host milieu that can drive colorectal carcinogenesis. We hypothesize that aberrant methylation is associated with gut microbiota composition, both globally and at distinct clusters of CpG sites. Because both global and local methylation patterns can affect differentiation and proliferation of the gut epithelium, such a correlation would represent a novel mechanism through which microbiota might contribute to colorectal cancer (CRC) risk. One of the bottlenecks for testing distinct hypotheses regarding the contributions of microbiota to CRC is a lack of advanced 'Big Data' bioinformatics tools. New approaches are needed to effectively mine the wealth of microbiota sequence data generated using high-throughput platforms and to integrate it with clinical metadata and other complex data, such as methylation status at half a million CpG sites. Although this is a challenging task, general approaches borrowed from microarray and RNA-seq analysis can be adapted to such microbiota analyses. While in the past we have contributed to the development and evaluation of 16S-based microbiota analytical algorithms, here we propose to expand this work in a new direction.
The aims of this application are: Specific Aim 1: To explore associations between microbiota composition and methylation patterns in the gut epithelium. For this exploratory study we will analyze fecal and biopsy samples that have already been collected in a colonoscopy-based microbiota study. We will expand methylation analysis from the 12 samples previously analyzed to samples from all 125 participants.
Specific Aim 2: To develop a 'Big Data' analytical approach for linking multiple large datasets to facilitate the study of complex interactions between microbiota composition, biomarkers and CRC risk. Datasets that include methylation and metagenomics data, coupled with demographic and clinical indicators, are heterogeneous, sparse, and multi-dimensional. Distributed unsupervised computational learning, including interpretable association rule mining, is well suited to overcome the obstacles posed by these data characteristics, and will allow us to determine patterns of CRC risk predictors and to reveal associations between microbiota and methylation patterns despite a limited sample size.
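As an illustration of what association rule mining computes, the following is a minimal sketch under stated assumptions and not the analytical approach proposed in this application. It assumes each sample has already been reduced to a set of binary feature labels; the labels (taxonA_high, locus1_hypermethylated, adenoma), the thresholds, and the function name mine_rules are hypothetical. It enumerates itemsets by brute force on a single machine, whereas a distributed implementation would be needed at the scale described above.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.4, min_confidence=0.7, max_len=3):
    """Return rules as (antecedent, consequent, support, confidence, lift).

    Brute-force enumeration for illustration only; practical tools use
    Apriori- or FP-growth-style pruning instead.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of samples that contain every item in the itemset.
        return sum(itemset <= s for s in sets) / n

    # Collect all itemsets (up to max_len items) that meet min_support.
    frequent = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup = support(itemset)
            if sup >= min_support:
                frequent[itemset] = sup

    # Derive rules with a single-item consequent from each frequent itemset.
    # Every subset of a frequent itemset is itself frequent, so the lookups
    # below always succeed.
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for item in sorted(itemset):
            antecedent = itemset - {item}
            confidence = sup / frequent[antecedent]
            lift = confidence / frequent[frozenset({item})]
            if confidence >= min_confidence:
                rules.append((antecedent, item, sup, confidence, lift))
    return rules


# Hypothetical, hand-made samples; these are not study data.
samples = [
    {"taxonA_high", "locus1_hypermethylated", "adenoma"},
    {"taxonA_high", "locus1_hypermethylated", "adenoma"},
    {"taxonA_high", "locus1_hypermethylated"},
    {"taxonB_high"},
    {"taxonB_high", "adenoma"},
]

rules = mine_rules(samples)
for antecedent, consequent, sup, conf, lift in rules:
    print(sorted(antecedent), "->", consequent,
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Each rule reads as "samples with the antecedent features also tend to carry the consequent feature". Support, confidence and lift together indicate how common the pattern is, how reliable it is, and how much more often it occurs than would be expected if the features were independent, which is what makes the output interpretable.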
While a distorted microbiota has been observed in inflammatory bowel disease (IBD) and the associated CRC, which is thought to represent a distinct CRC pathway, correlations between gut microbiota and global or local gut epithelium methylation have not been thoroughly investigated. In this project we will develop new tools that address challenges in the analysis of the multi-dimensional 'Big Data' datasets that are frequently generated using high-throughput technologies. Identifying associations between gut microbiota, methylation and CRC risk might lead to new screening approaches and help identify subjects who would benefit from microbiota-targeting interventions.