Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications

Sun, Fengzhu; Ahlgren, Nathan

Abstract

Viruses are ubiquitous in almost every ecological environment, including the human body, water, and soil. They play important roles in the normal function of the human microbiome, and many viruses have been shown to be associated with human diseases. However, our understanding of the roles of viruses in ecological communities is very limited. Recent technological and computational advances make it possible to gain a deep understanding of the roles of viruses in public health and the environment. Metagenomics studies from various environments, including the Human Microbiome Project (HMP), the global ocean, and the Earth Microbiome Project, have generated large amounts of short read data. Viruses are present in most of these metagenomic data sets, and their hosts are unknown. In this proposal, the investigators will develop computational approaches for the identification of viral sequences from metagenomic data sets and for the study of virus-host interactions. For the identification of viral sequences from metagenomics samples, novel statistical measures using word patterns will first be developed. Second, a unified naive Bayesian integrative approach combining information from word patterns, gene directionality, and gene annotation will be studied. Third, the identified viral sequences from metagenomes will be further assembled to construct complete viral genomes using a novel binning approach to be developed by the investigators. Finally, the remaining reads will be assigned to the corresponding bins. For the study of virus-host interactions, computational methods to estimate the reliability of virus-host interactions from high-throughput experiments will first be developed. Then machine learning approaches will be developed to predict viruses infecting certain hosts. Finally, a network logistic regression approach will be developed to predict virus-host interactions.
These computational approaches for the identification of viral sequences and for predicting virus-host interactions will be applied to a public liver cirrhosis data set and a unique metagenomics data set to understand how metagenomes change with health status, to identify viruses and virus-host interactions associated with disease status, and to accurately predict disease status using bacteria, viruses, and virus-host interactions. The developed computational methods will also be used to analyze metagenomic data from various locations, based on the Tara Oceans data and a unique time series data set, to understand how environmental factors affect virus abundance and virus-host interactions. Some of the predictions will be experimentally validated. Software derived from the proposal will be developed and freely distributed to the scientific community.
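
As a rough illustration of what a word-pattern measure can look like, the sketch below computes a $d_2^*$-style dissimilarity between two sequences from their k-mer counts. It is not the investigators' implementation. The function names (`kmer_counts`, `base_freqs`, `word_prob`, `d2star`), the default `k=3`, and the order-0 (independent bases) background model are all assumptions made for this example; published versions of such measures typically use higher-order Markov backgrounds.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def base_freqs(seq):
    """Single-base frequencies, used here as an order-0 background (assumption)."""
    counts = Counter(c for c in seq.upper() if c in ALPHABET)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {c: counts[c] / total for c in ALPHABET}


def word_prob(word, freqs):
    """Probability of a word under the independent-bases background."""
    p = 1.0
    for c in word:
        p *= freqs[c]
    return p


def d2star(seq_x, seq_y, k=3):
    """d2*-style dissimilarity in [0, 1]; values near 0 mean similar word usage."""
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    nx, ny = sum(cx.values()), sum(cy.values())
    fx, fy = base_freqs(seq_x), base_freqs(seq_y)
    num = norm_x = norm_y = 0.0
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        ex = nx * word_prob(w, fx)  # expected count of w in seq_x
        ey = ny * word_prob(w, fy)  # expected count of w in seq_y
        if ex == 0 or ey == 0:
            continue
        dx, dy = cx[w] - ex, cy[w] - ey  # centered counts
        num += dx * dy / sqrt(ex * ey)
        norm_x += dx * dx / ex
        norm_y += dy * dy / ey
    if norm_x == 0 or norm_y == 0:
        return 0.5  # no usable signal; treat as uncorrelated
    return 0.5 * (1.0 - num / (sqrt(norm_x) * sqrt(norm_y)))
```

In a host-prediction setting, one would compute such a dissimilarity between a viral contig and each candidate host genome and rank hosts by increasing value; the choice of `k` and of the background model strongly affects the results.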

Public Health Relevance

Viruses are abundant in many environments and are important to public health. New statistical and computational tools will be developed for the identification of viral sequences from metagenomics samples and for the prediction of virus-host interactions. These tools will be used to analyze microbial data sets related to liver cirrhosis and travelers' diarrhea as well as marine metagenomics data sets from various geographic locations and time series.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 3R01GM120624-03S1
Application #: 9992596
Program Officer: Ravichandran, Veerasamy

Project Start: 2017-04-15
Project End: 2021-03-31
Budget Start: 2019-04-01
Budget End: 2020-03-31
Support Year: 3
Fiscal Year: 2020

Institution

Name: University of Southern California
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 072933393

City: Los Angeles
State: CA
Country: United States
Zip Code: 90089

Related projects


NIH 2020 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2020 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2019 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2019 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2018 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2018 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California
NIH 2017 R01 GM	Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications Sun, Fengzhu; Ahlgren, Nathan / University of Southern California

Publications

Li, Han; Sun, Fengzhu (2018) Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences. Sci Rep 8:10032

Tang, Kujin; Lu, Yang Young; Sun, Fengzhu (2018) Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer. Front Microbiol 9:711

Lu, Yang Young; Tang, Kujin; Ren, Jie et al. (2017) CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic Acids Res 45:W554-W559

Lu, Yang Young; Lv, Jinchi; Fuhrman, Jed A et al. (2017) Towards enhanced and interpretable clustering/classification in integrative genomics. Nucleic Acids Res 45:e169

Ahlgren, Nathan A; Ren, Jie; Lu, Yang Young et al. (2017) Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Res 45:39-53

Wang, Ying; Wang, Kun; Lu, Yang Young et al. (2017) Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity. BMC Bioinformatics 18:425

Ren, Jie; Ahlgren, Nathan A; Lu, Yang Young et al. (2017) VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5:69
