The proposed research will develop new statistical methods to detect positive natural selection in genes encoding malaria antigens. It is motivated by the analysis of multiple DNA sequences encoding Apical Membrane Antigen 1 (AMA-1), the Circumsporozoite Protein (CSP) and Merozoite Surface Protein 1 (MSP-1) in the human malaria parasites P. falciparum and P. vivax. For each antigen, many DNA sequences collected in different geographical areas of Africa, Asia and South America are available. Because these sequences are numerous and very similar, a single phylogenetic structure cannot be assumed: many phylogenies are equally likely to explain their evolution. We propose new statistical methods based on a class of hierarchical generalized linear models. These methods do not require specifying a phylogenetic structure and can predict sites under positive selection relatively quickly. The approach is Bayesian and will require efficient computational methods for parameter estimation. The new models can incorporate information that may be relevant for inferring the pattern of substitutions, such as geographical location and, when available, pairwise evolutionary distances. The specific goals of the proposed research are: (1) to develop new statistical models that detect molecular adaptation in large numbers of protein-coding DNA sequences that are closely related in evolutionary time and for which little or no phylogenetic structure is available; (2) to assess the predictive performance of the new models through simulation studies on different kinds of datasets, and to compare the new and current methods for identifying sites under positive selection in DNA sequences; and (3) to analyze a large number of DNA sequences encoding malaria antigens using the new models.
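The abstract does not specify the model, so the following is only a rough illustration of what a tree-free hierarchical GLM for site-level selection could look like. It is our own simplified stand-in, not the project's method. Assume that, for each codon site i, k_i of the m_i differences observed across pairwise sequence comparisons are nonsynonymous, with k_i ~ Binomial(m_i, p_i), logit(p_i) = logit(q_i) + u_i and u_i ~ Normal(mu, tau2). Here q_i is an assumed neutral expectation for the nonsynonymous fraction at that site. A site is flagged as a candidate for positive selection when the posterior probability that u_i > 0 (that is, p_i > q_i) is high. The names `fit_sites` and `log_lik_site`, the hyperpriors and the example counts are all hypothetical. Pooling pairwise comparisons also ignores their shared ancestry, so the counts are not truly independent.

```python
import math
import random

# Hyperpriors (illustrative assumptions, not taken from the project):
PRIOR_MU_VAR = 100.0   # mu ~ Normal(0, 10^2)
A0, B0 = 2.0, 1.0      # tau2 ~ InverseGamma(shape=A0, rate=B0)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_lik_site(u, k, m, offset):
    """Binomial log-likelihood (up to a constant) of k nonsynonymous out of m
    changes at one site, with logit(p) = offset + u."""
    eta = offset + u
    # log p = -softplus(-eta), log(1 - p) = -softplus(eta)
    return -k * softplus(-eta) - (m - k) * softplus(eta)


def fit_sites(data, n_iter=4000, burn_in=1000, step=0.8, seed=1):
    """Metropolis-within-Gibbs sampler for the logistic-normal model.

    data: list of (k, m, q) per codon site; q is the hypothetical fraction of
    changes expected to be nonsynonymous under neutrality.
    Returns, per site, the posterior probability that u_i > 0 (p_i > q_i).
    """
    rng = random.Random(seed)
    n = len(data)
    offsets = [math.log(q / (1.0 - q)) for _, _, q in data]
    u = [0.0] * n
    mu, tau2 = 0.0, 1.0
    exceed = [0] * n
    kept = 0

    def log_post(i, val):
        # Site likelihood plus the Normal(mu, tau2) prior on u_i.
        k, m, _ = data[i]
        return log_lik_site(val, k, m, offsets[i]) - (val - mu) ** 2 / (2.0 * tau2)

    for it in range(n_iter):
        # 1) Random-walk Metropolis update for each site effect u_i.
        for i in range(n):
            prop = u[i] + rng.gauss(0.0, step)
            log_ratio = log_post(i, prop) - log_post(i, u[i])
            if rng.random() < math.exp(min(0.0, log_ratio)):
                u[i] = prop
        # 2) Conjugate normal update for the population mean mu.
        prec = n / tau2 + 1.0 / PRIOR_MU_VAR
        mu = rng.gauss(sum(u) / tau2 / prec, math.sqrt(1.0 / prec))
        # 3) Conjugate inverse-gamma update for the between-site variance tau2.
        rate = B0 + 0.5 * sum((x - mu) ** 2 for x in u)
        tau2 = 1.0 / rng.gammavariate(A0 + n / 2.0, 1.0 / rate)
        # Record whether each site effect is positive after burn-in.
        if it >= burn_in:
            kept += 1
            for i in range(n):
                if u[i] > 0.0:
                    exceed[i] += 1

    return [c / kept for c in exceed]


# Hypothetical per-site counts (k, m, q); not real malaria antigen data.
example = [(38, 40, 0.7), (2, 40, 0.7)] + [(28, 40, 0.7)] * 4
probs = fit_sites(example)
for i, pr in enumerate(probs):
    print(f"site {i}: P(p_i > q_i | data) = {pr:.3f}")
```

Metropolis-within-Gibbs is used here because the site effects u_i have no conjugate update under the logit link, while mu and tau2 do. A realistic version would need convergence diagnostics, and covariates such as geographical location would enter the linear predictor alongside the offset.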

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM072003-04
Application #
7215660
Study Section
Special Emphasis Panel (ZGM1-CMB-0 (MB))
Program Officer
Singh, Shiva P
Project Start
2004-04-01
Project End
2009-03-31
Budget Start
2007-04-01
Budget End
2008-03-31
Support Year
4
Fiscal Year
2007
Total Cost
$88,621
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064
Publications
Datta, Saheli; Rodriguez, Abel; Prado, Raquel (2012) Bayesian semiparametric regression models to characterize molecular evolution. BMC Bioinformatics 13:278
Datta, Saheli; Prado, Raquel; Rodríguez, Abel et al. (2010) Characterizing molecular adaptation: a hierarchical approach to assess the selective influence of amino acid properties. Bioinformatics 26:2818-25