Building on our previous work on the coevolution of interacting proteins (5), we studied the power and limitations of the mirror-tree method for predicting protein interactions. We and others have observed that the evolutionary distances of interacting proteins often display a higher level of similarity than those of noninteracting proteins. It has been difficult, however, to identify the direct cause of the observed similarities between evolutionary trees. One possible explanation is the existence of compensatory mutations between the partners' binding sites that maintain proper binding. This explanation has recently been challenged, however, and it has been suggested that the signal of correlated evolution uncovered by the mirror-tree method is unrelated to any correlated evolution between binding sites. In (5), we examined the contribution of binding sites to the correlation between evolutionary trees of interacting domains. We showed that binding neighborhoods of interacting proteins have, on average, a higher coevolutionary signal than the regions outside binding sites; however, when the binding neighborhood is removed, the remaining domain sequence still contains some coevolutionary signal.
I also continued to study the evolutionary pressure exerted on genome sequences, focusing on the optimization of codon usage. The question we asked is whether codon usage is optimized to avoid frameshifting errors in translation.
I have also expanded the scope of the systems biology research done in my group. In addition to studying properties of protein interaction networks and regulatory networks (3), we began to develop new approaches to phenotype-genotype associations. For example, in publication (1) we developed a new method for the analysis of expression quantitative trait loci (eQTL). Such analysis contributes significantly to the determination of gene regulation programs.
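The mirror-tree comparison described above can be illustrated with a small sketch. In its common form, the method scores two protein families by the Pearson correlation between their pairwise evolutionary distances over a shared set of species. The function names, the species labels, and the toy distance values below are illustrative assumptions and are not taken from publication (5).

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def make_distances(species, values):
    """Map each unordered species pair to a distance (hypothetical helper)."""
    return {frozenset(p): v for p, v in zip(combinations(species, 2), values)}


def mirror_tree_score(dist_a, dist_b, species):
    """Correlate the pairwise distances of two families over shared species."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    xs = [dist_a[p] for p in pairs]
    ys = [dist_b[p] for p in pairs]
    return pearson(xs, ys)


# Toy data (made up): family B's distances are a linear rescaling of
# family A's, while family C's distances are unrelated.
species = ["sp1", "sp2", "sp3", "sp4"]
values_a = [0.10, 0.40, 0.50, 0.45, 0.55, 0.20]
dist_a = make_distances(species, values_a)
dist_b = make_distances(species, [2 * v + 0.1 for v in values_a])
dist_c = make_distances(species, [0.50, 0.10, 0.30, 0.20, 0.60, 0.40])

score_ab = mirror_tree_score(dist_a, dist_b, species)
score_ac = mirror_tree_score(dist_a, dist_c, species)
print(score_ab, score_ac)
```

In this sketch, restricting the input alignments to binding neighborhoods or to the remaining sequence before computing distances would change only the distance tables, not the scoring step.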
To address some of the known challenges in analyzing associations between gene expression levels and their underlying sequence polymorphisms, we developed the Graph-based eQTL Decomposition method (GeD), which allowed us to model genotype and expression data using the so-called eQTL association graph. Through graph-based heuristics, GeD identifies dense subgraphs in the eQTL association graph. By identifying eQTL association cliques that expose the hidden structure of genotype and expression data, GeD effectively filters out most locus-gene pairs that are unlikely to have significant linkage. We applied GeD to the eQTL data for Plasmodium falciparum, the human malaria parasite, and demonstrated that GeD reveals the structure of the relationship between all loci and all genes at the whole-genome level. Furthermore, GeD allowed us to uncover additional eQTLs with lower FDR, providing an important complement to traditional eQTL analysis methods. We are also working on new methods to associate genotype variation with pathway-level phenotypes.
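The general idea of looking for dense subgraphs in a locus-gene association graph can be sketched as follows. This is not GeD's own heuristic, which is not specified here. It swaps in a standard greedy peeling procedure that repeatedly removes the lowest-degree node and keeps the densest node set seen. The score table, the threshold, and all names are hypothetical.

```python
def build_association_graph(scores, threshold):
    """Keep locus-gene pairs whose (hypothetical) association score passes a threshold."""
    adj = {}
    for (locus, gene), score in scores.items():
        if score >= threshold:
            u, v = ("locus", locus), ("gene", gene)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def greedy_dense_subgraph(adj):
    """Greedy peeling: drop the minimum-degree node each round and
    return the node set with the highest edges-per-node density."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n_edges = sum(len(vs) for vs in adj.values()) // 2
    best_density, best_nodes = -1.0, set()
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        u = min(adj, key=lambda node: len(adj[node]))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        del adj[u]
    return best_nodes, best_density


# Toy scores (made up): two loci linked to the same three genes,
# plus one isolated locus-gene pair and one sub-threshold pair.
scores = {
    ("locA", "g1"): 0.9, ("locA", "g2"): 0.9, ("locA", "g3"): 0.9,
    ("locB", "g1"): 0.8, ("locB", "g2"): 0.8, ("locB", "g3"): 0.8,
    ("locC", "g4"): 0.7,
    ("locC", "g1"): 0.2,
}
graph = build_association_graph(scores, threshold=0.5)
dense_nodes, dense_density = greedy_dense_subgraph(graph)
print(sorted(dense_nodes), dense_density)
```

Locus-gene pairs that fall outside the dense block are the kind of candidates that a filter of this sort would deprioritize before formal linkage testing.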