The University of Illinois at Chicago and Princeton University are awarded collaborative grants to develop efficient and scalable computational methods for comparing protein structures. Protein sequences and structures are being determined at an increasingly rapid rate. To date, there are more than 1000 sequenced genomes, with several thousand more in progress, and over 50,000 three-dimensional structures in the Protein Data Bank. Whether considering proteins at the level of sequence or structure, comparing or aligning two proteins is the fundamental technique for uncovering principles of protein structure, function and evolution. The vast majority of research efforts have focused on sequence comparisons. However, because protein structures are generally better conserved than protein sequences, identifying structural similarity between proteins can yield valuable clues to protein function and can be used to classify proteins, analyze their evolutionary histories and even help predict protein interactions. Though considerable advances have been made in recent years in comparing protein structures, key difficulties remain, including detecting shared, conserved structures between proteins when the individual structural elements occur in different orders along the two sequences. This project will develop innovative methods to enable discovery of sequence-order-independent substructure similarity, with a long-term goal of performing a large-scale comparison over all protein structures. The research team will formulate precise theoretical problems, design efficient algorithms for them, and implement and test the resulting algorithms to evaluate their accuracy and efficiency. The final software for comparing protein structures will be released to the scientific community and is expected to have a significant and demonstrable impact on further research in structural bioinformatics.
Scientifically, the methodologies to be developed for substructure comparison are general and will have broader impacts beyond structural proteomics and bioinformatics. For example, a biomedical application of the proposed project lies in guiding protein engineering and rational drug design via a systematic identification of all such conserved substructures and their underlying sequences. The project will involve undergraduates and students from under-represented minority (URM) groups in active research. A central component is to engage URM undergraduate students from the urban UIC campus and involve them in summer research at Princeton, with the goal of possible recruitment into Princeton's graduate program in Quantitative and Computational Biology. Additionally, the PIs are planning course and curriculum development, dissemination of research, mentoring of undergraduate and graduate students, and outreach and community involvement.
The outcomes of the project will be made available through the websites of all the investigators: www.cs.uic.edu/~dasgupta, http://gila.bioengr.uic.edu/lab, and www.cs.princeton.edu/~mona.