This project will develop a new computational framework to advance the understanding of epigenetic gene regulation in the human malaria parasite. Epigenetics is the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. At the core of the computational framework is the ability to solve a set of hard computational questions, which are the focus of the research plan. The computational challenges require the study of novel combinatorial optimization problems, the development of new time- and space-efficient algorithms, and ultimately the implementation and deployment of user-friendly web-based software tools. The ability to analyze the epigenome of the human malaria parasite will improve our comprehension of its biology and possibly enable molecular biologists to identify new antimalarial strategies. The proposed computational framework will also enable life scientists to make novel epigenetic discoveries and ultimately improve the understanding of the complex mechanisms that drive gene expression in other eukaryotic organisms. Software tools will be placed into the public domain, which will benefit researchers and the public worldwide, and potentially lead to new international and industrial collaborations. This project will support two graduate students and one postdoc in a highly interdisciplinary environment.

Most eukaryotic genomes have a second layer of information, which is embedded in chemical marks added to DNA and to the protruding tails of special proteins that package DNA into a complex called the nucleosome. One of the most astonishing discoveries in molecular biology of the past decades is that this "covert" layer, called the epigenome, affects a variety of cellular and metabolic processes. Epigenetic marks not only control which genes are accessible in each type of cell, but also determine when the accessible genes may be activated. Molecular biologists have also confirmed that the epigenome is affected by the interactions of the organism with the environment and that changes to the epigenetic marks induced by these interactions are inherited across cell division, despite not being encoded directly in DNA.

This project will study a set of computational challenges that will be brought about by the increasing number of epigenome projects. Specifically, the goal is to develop methods and software tools for (1) the analysis of nucleosome and methylation maps (using a modified Gaussian mixture model and expectation maximization); (2) the study of the dynamics of nucleosome positioning, histone tail modifications and DNA methylation patterns (using graph-theoretical approaches, e.g., k-partite matching); (3) the analysis of DNA motifs for stable nucleosomes and specific histone modifications (using combinatorial optimization approaches); (4) the discovery of new genes using nucleosome or methylation landscapes (using machine learning classifiers); and (5) the identification of statistically significant genome-wide correlations between nucleosome positioning, histone modifications, DNA methylation patterns and gene expression (using dynamic Bayesian networks). These five computational tasks will require the study of novel combinatorial optimization and machine learning problems, the development of new time- and space-efficient algorithms, and ultimately the implementation and deployment of user-friendly web-based software tools.
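
As an illustration of task (1) only, the following is a minimal sketch of a standard one-dimensional Gaussian mixture model fitted with expectation maximization. It is not the modified model the project proposes, whose details are not given here. The sketch assumes the input is a list of read midpoint coordinates along the genome and that each mixture component stands for one candidate nucleosome (mean as position, variance as positional spread, weight as relative occupancy). The function names `gaussian_pdf` and `fit_gmm_1d` and the parameter `min_var` are hypothetical and chosen for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(positions, k, iterations=100, min_var=1.0):
    """Fit a k-component 1D Gaussian mixture with plain EM.

    Assumption for this sketch: `positions` are read midpoints and each
    component models one nucleosome. Returns a list of
    (weight, mean, variance) tuples sorted by mean.
    """
    xs = sorted(positions)
    n = len(xs)
    # Initialise means at evenly spaced quantiles of the data.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps components from collapsing

    return sorted(zip(weights, means, variances), key=lambda c: c[1])


if __name__ == "__main__":
    # Synthetic data: two simulated nucleosomes centred at 200 and 400.
    rng = random.Random(0)
    reads = [rng.gauss(200, 20) for _ in range(300)]
    reads += [rng.gauss(400, 20) for _ in range(300)]
    for weight, mean, var in fit_gmm_1d(reads, k=2):
        print(f"weight={weight:.2f} mean={mean:.1f} sd={math.sqrt(var):.1f}")
```

In this sketch the number of components `k` is fixed in advance. A method applied to real nucleosome maps would also need to choose `k` and handle sequencing noise, which is presumably part of what the proposed modifications address.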

The "platform" on which the algorithms will be developed is P. falciparum, the parasite responsible each year for 350-500 million cases of malaria and between one and three million human deaths worldwide. There is no vaccine against malaria (one is currently in clinical trials), and the parasite is developing resistance to almost all drugs currently available. The methods and tools developed will not be malaria-specific, and will scale to a variety of other eukaryotes with much larger and more complex genomes. Updates and additional information about this project will be made available at www.cs.ucr.edu/~stelo/iis13.htm

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1302134
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-09-15
Budget End: 2017-08-31
Support Year:
Fiscal Year: 2013
Total Cost: $994,370
Indirect Cost:
Name: University of California Riverside
Department:
Type:
DUNS #:
City: Riverside
State: CA
Country: United States
Zip Code: 92521