Accurate multiple sequence alignment (MSA) is the major unsolved problem in protein bioinformatics. Alignments and similarity searches are essential first steps in experimental design for all studies involving proteins, and the accuracy of these methods is crucial for success in biomedical research. In the course of this project, we plan to significantly improve the accuracy of alignments and the precision of sequence similarity detection between protein families. During the last few years, our group proposed four methods for MSA construction and two methods for remote homology inference. Presently, our latest program, PROMALS3D, is judged to be the most accurate aligner for weakly similar sequences. We also performed a comprehensive survey of kinase sequences and structures that revealed 25 homologous groups (superfamilies) in 10 structural folds. Building on these results, we propose to: 1) Improve the sensitivity of sequence profile similarity search, mainly by using known relationships between database sequences. 2) Develop software for accurate MSA of sequences with low similarity, with an emphasis on using structural features and predictions to improve MSA quality. 3) Design an easy-to-use web server for exploration of protein families. The server could be queried with a single sequence, to find, align, and analyze its homologs, or with a set of sequences. The central component of the server will be MSA construction using the software developed in this project. 4) Assemble a database of high-quality MSAs for kinases and their relatives, and make testable structure-function predictions for groups without experimental annotations. Because kinases attract considerable attention due to their medical relevance (e.g., in cancer studies), this database should be a valuable asset to researchers.
Accurate multiple sequence alignment is the major unsolved problem in protein bioinformatics. Alignments and sequence similarity searches are essential first steps in experimental design for all studies involving proteins, and the accuracy of these methods is crucial for the success of biomedical research. We will improve alignment accuracy and use the new method to analyze kinases, a medically important group of enzymes that attract high interest because of their relevance to many diseases, cancer in particular.