Malaria is one of the most common human infectious diseases, with an estimated 300-500 million cases and between one and three million deaths each year. Malaria is caused by protozoan parasites, and the most serious forms of the disease in humans are caused by Plasmodium falciparum. The P. falciparum genome has been fully sequenced. Remarkably, only 55% of its identified proteins have any predicted or known functional annotation, and much of the organism's core machinery remains unidentified, which significantly hampers our understanding of this organism and of malaria. Since traditional bioinformatics approaches have had limited success in uncovering P. falciparum protein functions, the long-term goal of this research is to develop novel computational approaches that are more effective for this task. Our framework is centered on better identification of protein domains, which are the structural, functional and evolutionary units of proteins, and on linking uncovered P. falciparum protein domains to well-characterized domains associated with known protein functions. Our approaches leverage comparative genomics, graph-theoretic methods, and sensitive probabilistic profile-profile comparisons, all within a robust computational pipeline.
The specific aims of this proposal are (1) to uncover putative domains within P. falciparum protein sequences using homologous sequences in closely related genomes, and to use these to identify similarity to known, functionally characterized protein domains; (2) to increase the number of P. falciparum proteins with predicted functional motifs and domains by exploiting the tendency of certain motifs and domains to occur together within the same sequence; and (3) to experimentally test a representative set of predictions, in order to uncover new P. falciparum biology and to evaluate our computational pipeline. The proposed techniques have significant potential for expanding the number of protein functional annotations within P. falciparum, and therefore for accelerating ongoing research efforts aimed at developing anti-malarial drug targets against the causative agent of human malaria.
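
The co-occurrence idea in aim (2) can be illustrated with a toy sketch. This is not the project's pipeline: the `Hit` type, the `rescue_by_context` function, the family names, and the two score thresholds are all hypothetical, chosen only to show how a weak domain hit might be accepted when it sits in the same protein as a confident hit from a family it is known to co-occur with.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str   # domain family label (hypothetical names used below)
    score: float  # higher means stronger sequence evidence


def rescue_by_context(hits, cooccurring, strong=20.0, weak=10.0):
    """Return the set of families accepted for one protein.

    A hit is accepted if its score is at least `strong`, or if its score
    lies in [weak, strong) and its family is listed in `cooccurring`
    together with a family already accepted on the strong threshold.
    Both thresholds are illustrative assumptions, not published values.
    """
    accepted = {h.family for h in hits if h.score >= strong}
    rescued = set()
    for h in hits:
        if weak <= h.score < strong:
            # Check whether any confidently predicted family is a known partner.
            if any(frozenset((h.family, a)) in cooccurring for a in accepted):
                rescued.add(h.family)
    return accepted | rescued


# Assumed prior knowledge: FamA and FamB tend to occur in the same protein.
cooccurring = {frozenset(("FamA", "FamB"))}
hits = [Hit("FamA", 35.0), Hit("FamB", 12.5), Hit("FamC", 12.0)]
print(sorted(rescue_by_context(hits, cooccurring)))  # ['FamA', 'FamB']
```

Here FamB and FamC have similar weak scores, but only FamB is kept because it has a confident partner in the same sequence. A real system would more plausibly weigh context probabilistically rather than with hard thresholds.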

Public Health Relevance

Malaria is one of the most common human infectious diseases, with an estimated 300-500 million cases and between one and three million deaths each year. The most serious forms of the disease in humans are caused by P. falciparum. The proposed research aims to significantly expand the number of protein functional annotations for P. falciparum, in order to accelerate our understanding of the causative agent of human malaria.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21AI085415-01
Application #: 7773079
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Joy, Deirdre A
Project Start: 2010-03-01
Project End: 2012-02-28
Budget Start: 2010-03-01
Budget End: 2011-02-28
Support Year: 1
Fiscal Year: 2010
Total Cost: $239,112
Indirect Cost:

Name: Princeton University
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 002484665
City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Ochoa, Alejandro; Singh, Mona (2017) Domain prediction with probabilistic directional context. Bioinformatics 33:2471-2478
Ochoa, Alejandro; Storey, John D; Llinás, Manuel et al. (2015) Beyond the E-Value: Stratified Statistics for Protein Domain Prediction. PLoS Comput Biol 11:e1004509
Ochoa, Alejandro; Llinás, Manuel; Singh, Mona (2011) Using context to improve protein domain identification. BMC Bioinformatics 12:90