We will use the iHMP data resource to apply novel tools and data analysis methodologies to the challenge of associating large microbiome data sets with inflammatory bowel disease (IBD) and the onset of diabetes. We will start with an annotation-free approach that uses k-mers to preprocess the IBD and diabetes cohorts. We will then apply a novel scaling technology implemented in the sourmash software to reduce the data set size by a factor of 2000, making it tractable for machine learning approaches. Next, we will use random forests to identify a subset of predictive k-mers, and will measure their accuracy on validation data sets that were not used in the initial training. Finally, we will annotate the predictive k-mers using all available genome databases, together with a novel method that infers the metagenomic presence of accessory genomes of known genomes. Our outcomes will include a catalog of microbial genomes that correlate with IBD subtype and the onset of diabetes, as well as automated workflows for applying similar approaches to other data sets.
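The 2000-fold reduction described above can be illustrated with a minimal sketch. It assumes the scaling step works like scaled MinHash-style subsampling: every k-mer is hashed, and only hashes that fall in the lowest 1/scaled fraction of the hash space are kept. The function names, the default k-mer size of 31, and the BLAKE2 hash are assumptions made for illustration; this is not the sourmash implementation or its API.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """Yield every overlapping k-mer of a DNA sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement,
    so that both strands of the DNA map to the same k-mer."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer.
    BLAKE2 is a stand-in chosen for this sketch (an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_sketch(seq, k=31, scaled=2000):
    """Keep only the k-mer hashes below MAX_HASH / scaled.
    On average this retains about 1 in `scaled` distinct k-mers."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(canonical(km)) for km in kmers(seq, k))
    return {h for h in hashes if h < threshold}
```

Because the decision to keep a hash depends only on the k-mer itself, the same k-mers are retained in every sample. Sketches from different cohorts can therefore be compared directly, or used as presence/absence features for a classifier such as a random forest.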

Public Health Relevance

We propose to use the iHMP data, a large central microbiome resource, to study how the microbiome correlates with inflammatory bowel disease and diabetes. We will work to associate specific microbial species with these disease conditions. We will also produce resources that help other researchers perform similar studies.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Type
Small Research Grants (R03)
Project #
1R03OD030596-01
Application #
10112077
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Resat, Haluk
Project Start
2020-09-15
Project End
2021-08-31
Budget Start
2020-09-15
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Name
University of California Davis
Department
Veterinary Sciences
Type
Schools of Veterinary Medicine
DUNS #
047120084
City
Davis
State
CA
Country
United States
Zip Code
95618