Computational analysis of proteins is an essential shortcut to random experimentation. Multiple sequence alignments (MSAs) reveal the evolutionary history of a protein family, govern predictions of 3D structures and functions, and guide experimental design. The accuracy of these alignments is critical for the accuracy of conclusions drawn from their analysis. With the funding from the previous round of the grant, we significantly advanced the power of sequence similarity search and improved the accuracy of MSA. Using these techniques, we aided biological discoveries in dozens of collaborations with experimentalists, analyzed medically important protein families and implemented a number of public web-servers. For the next funding period we propose to: 1) Build on our advances to perfect homology search and multiple sequence alignment. Sequence profile search will be improved by more sound statistics and by averaging scores over predicted homologs of found hits. Sequence alignment will be corrected in regions that interact less closely with the rest of the protein and in segments that require large adjustments. 2) Maintain, improve and integrate our protein sequence analysis servers. During the first funding period of the grant, in addition to improving our sequence search and alignment web-servers, we developed three new servers (for predicting a number of characteristics of a protein sequence, finding literature about a protein, and visualizing relationships between proteins as networks) and compiled a searchable database of clinical mutations. We will integrate these servers into a single one-stop sequence analysis resource, augmented with other information, such as expression patterns, protein interactions, human polymorphism and known diseases. 3) Develop an Atlas of clinical mutations in proteins, freely available for browsing and download without login requirements. Each of the 25,000 known mutations will have a dedicated web-page with the mutation's characteristics and predictions about its negative effects.

Public Health Relevance

Accurate protein sequence analysis is an essential step in planning experiments. Despite recent progress, computational methods are not precise enough to predict properties of biological molecules, explain molecular mechanisms of diseases and design drugs. We will improve the accuracy of sequence analysis methods and apply them to develop an Atlas of clinical mutations - a free online interactive database, accessible to all, with hypotheses about how each of the 25,000 known mutations affects a protein and causes disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM094575-08
Application #
9328098
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wehrle, Janna P
Project Start
2010-07-01
Project End
2018-08-31
Budget Start
2017-09-01
Budget End
2018-08-31
Support Year
8
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of Texas Sw Medical Center Dallas
Department
Physiology
Type
Schools of Medicine
DUNS #
800771545
City
Dallas
State
TX
Country
United States
Zip Code
75390
Li, Peng; Kinch, Lisa N; Ray, Ann et al. (2017) Acute Hepatopancreatic Necrosis Disease-Causing Vibrio parahaemolyticus Strains Maintain an Antibacterial Type VI Secretion System with Versatile Effector Repertoires. Appl Environ Microbiol 83:
Fédry, Juliette; Liu, Yanjie; Péhau-Arnaudet, Gérard et al. (2017) The Ancient Gamete Fusogen HAP2 Is a Eukaryotic Class II Fusion Protein. Cell 168:904-915.e10
Schaeffer, R Dustin; Liao, Yuxing; Cheng, Hua et al. (2017) ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Res 45:D296-D302
Gao, Qiang; Binns, Derk D; Kinch, Lisa N et al. (2017) Pet10p is a yeast perilipin that stabilizes lipid droplets and promotes their assembly. J Cell Biol 216:3199-3217
Zhang, Jing; Cong, Qian; Fan, Xiao-Ling et al. (2017) Mitogenomes of Giant-Skipper Butterflies reveal an ancient split between deep and shallow root feeders. F1000Res 6:222
Pei, Jimin; Grishin, Nick V (2017) Expansion of divergent SEA domains in cell surface proteins and nucleoporin 54. Protein Sci 26:617-630
Cong, Qian; Shen, Jinhui; Li, Wenlin et al. (2017) The first complete genomes of Metalmarks and the classification of butterfly families. Genomics 109:485-493
Shen, Jinhui; Cong, Qian; Borek, Dominika et al. (2017) Complete Genome of Achalarus lyciades, The First Representative of the Eudaminae Subfamily of Skippers. Curr Genomics 18:366-374
Zhang, Jing; Kinch, Lisa N; Cong, Qian et al. (2017) Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I. Hum Mutat 38:1051-1063
Cong, Qian; Shen, Jinhui; Borek, Dominika et al. (2016) Complete genomes of Hairstreak butterflies, their speciation, and nucleo-mitochondrial incongruence. Sci Rep 6:24863

Showing the most recent 10 out of 86 publications