A New Approach to Phylogenetic Analysis Using Structural Information

Rodriguez, Abel

Abstract

Phylogenetic analysis is a key tool in multiple areas, including disease monitoring and drug design; its goal is to infer evolutionary relationships among multiple species and to provide insights into the mechanisms driving molecular evolution. This proposal is informed by two recent trends in phylogenetic analysis. On the one hand, most current approaches for phylogenetic analysis require sequence alignments as input and produce reliable results only for proteins with at least a moderate degree of sequence similarity. On the other hand, the scientific community has started to realize that standard procedures for phylogenetic analysis, which first construct a sequence alignment and then use this single point estimate to guide the construction of the phylogenetic tree, can introduce serious biases and make researchers overconfident about the inferred evolutionary history. Indeed, alignment and tree construction are two interrelated problems that should be tackled jointly rather than sequentially. The proposed work represents the first attempt to include structural protein alignments in phylogenetic analysis while jointly accounting for uncertainty in both alignment and tree construction. Our approach employs Markov chain Monte Carlo algorithms to generate samples from the posterior distribution of alignments and trees given the sequences and structures, providing a straightforward procedure to compute probabilities of hypotheses of interest.
Specific aims of this project include: 1) To develop novel methods for using unaligned proteins to improve our understanding of the evolutionary relationship between protein sequence and tertiary structure. 2) To develop models for phylogenetic analysis that incorporate sequence and structure information and account for alignment uncertainty in the construction of phylogenetic trees and the estimation of evolutionary parameters. 3) To develop new computational algorithms for analyzing large numbers of unaligned proteins. 4) To train interdisciplinary scientists capable of using sophisticated statistical methods to solve complex problems in evolutionary biology.
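The abstract notes that posterior samples of alignments and trees make it straightforward to compute probabilities of hypotheses. The sketch below illustrates that general Monte Carlo idea only; it is not the project's software. It assumes each sampled tree is represented as a set of clades (frozensets of taxon names). The names `posterior_probability` and `has_clade` are hypothetical, and the four samples are toy stand-ins rather than real MCMC output. Because each MCMC draw is a joint (alignment, tree) pair, ignoring the alignment component and counting over trees automatically marginalizes over alignment uncertainty.

```python
from typing import Callable, FrozenSet, List, Set

# Hypothetical representation: a tree is the set of its clades,
# and a clade is a frozenset of taxon names.
Clade = FrozenSet[str]
Tree = Set[Clade]


def posterior_probability(samples: List[Tree],
                          hypothesis: Callable[[Tree], bool]) -> float:
    """Monte Carlo estimate of P(hypothesis | data): the fraction of
    posterior tree samples for which the hypothesis holds."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(1 for tree in samples if hypothesis(tree)) / len(samples)


def has_clade(*taxa: str) -> Callable[[Tree], bool]:
    """Hypothesis: the given taxa form a clade (a monophyletic group)."""
    target = frozenset(taxa)
    return lambda tree: target in tree


# Toy stand-in for post-burn-in MCMC tree samples (not real data).
samples = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]

print(posterior_probability(samples, has_clade("A", "B")))  # 0.75
```

Any hypothesis that can be evaluated on a single sample can be passed as a predicate in this way, such as a clade, a topology constraint, or a threshold on an evolutionary parameter. The estimate's precision improves with the number of effectively independent draws.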

Public Health Relevance

This research will generate improved methods for investigating phylogenetic relationships over longer evolutionary timescales, improving our understanding of protein function. Because phylogenies capture the biological history of, and the correlations among, living organisms, these methods will help determine the origins and infection patterns of emerging diseases such as SARS and help design more effective drugs for rapidly evolving diseases such as influenza.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM090201-04
Application #: 8310154
Study Section: Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer: Eckstrand, Irene A

Project Start: 2009-09-30
Project End: 2014-07-31
Budget Start: 2012-08-01
Budget End: 2013-07-31
Support Year: 4
Fiscal Year: 2012
Total Cost: $276,634
Indirect Cost: $31,609

Institution

Name: University of California Santa Cruz
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 125084723

City: Santa Cruz
State: CA
Country: United States
Zip Code: 95064

Related projects


NIH 2013 R01 GM	A New Approach to Phylogenetic Analysis Using Structural Information Rodriguez, Abel / University of California Santa Cruz	$266,418
NIH 2012 R01 GM	A New Approach to Phylogenetic Analysis Using Structural Information Rodriguez, Abel / University of California Santa Cruz	$276,634
NIH 2011 R01 GM	A New Approach to Phylogenetic Analysis Using Structural Information Rodriguez, Abel / University of California Santa Cruz	$277,140
NIH 2010 R01 GM	A New Approach to Phylogenetic Analysis Using Structural Information Rodriguez, Abel / University of California Santa Cruz	$282,099
NIH 2009 R01 GM	A New Approach to Phylogenetic Analysis Using Structural Information Rodriguez, Abel / University of California Santa Cruz	$299,999

Publications

Mukherjee, Chiranjit; Rodriguez, Abel (2016) GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models. J Comput Graph Stat 25:762-788

Lee, Hui-Jie; Kishino, Hirohisa; Rodrigue, Nicolas et al. (2016) Grouping substitution types into different relaxed molecular clocks. Philos Trans R Soc Lond B Biol Sci 371:

Wang, Kuangyu; Yu, Shuhui; Ji, Xiang et al. (2015) Roles of solvent accessibility and gene expression in modeling protein sequence evolution. Evol Bioinform Online 11:85-96

Rodríguez, Abel; Quintana, Fernando A (2015) On species sampling sequences induced by residual allocation models. J Stat Plan Inference 157-158:108-120

Estrada, Rolando; Tomasi, Carlo; Schmidler, Scott C et al. (2015) Tree Topology Estimation. IEEE Trans Pattern Anal Mach Intell 37:1688-701

Wang, Hao; Rodríguez, Abel (2014) Identifying pediatric cancer clusters in Florida using loglinear models and generalized lasso penalties. Stat Public Policy (Phila) 1:86-96

Daniels, Kyle G; Tonthat, Nam K; McClure, David R et al. (2014) Ligand concentration regulates the pathways of coupled protein folding and binding. J Am Chem Soc 136:822-5

Herman, Joseph L; Challis, Christopher J; Novák, Ádám et al. (2014) Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure. Mol Biol Evol 31:2251-66

Rodríguez, Abel; Martínez, Julissa C (2014) Bayesian semiparametric estimation of covariate-dependent ROC curves. Biostatistics 15:353-69

Rodriguez, Abel; Schmidler, Scott C (2014) Bayesian protein structure alignment. Ann Appl Stat 8:2068-2095

Showing the most recent 10 out of 18 publications
