This project will advance current capabilities in protein structure prediction. Only about a third of known protein families with no experimentally determined structures are amenable to homology modeling; that is, they have related proteins with sufficiently similar sequence profiles whose structures have been resolved experimentally. For the majority of known protein families, the so-called dark proteome, this is not the case, and structures are missing. Being able to obtain them experimentally or computationally is essential to understanding the roles of proteins in key cellular processes, obtaining a detailed view of molecular mechanisms, guiding efforts on therapeutic development, engineering proteins with specific functions, and more. This project will advance such efforts for the dark proteome with novel informatics techniques that are capable of harnessing useful signals hidden in protein sequences.

The project will evaluate the hypothesis that covariational signals in multiple sequence alignments can be harnessed to advance free modeling. Deep neural network architectures will be used for this purpose. Research activities are organized in two thrusts: (1) development of distant-homology fold recognition methods by alignment of inter-residue distance bounds predicted using 2D deep fully residual networks (FRNs); and (2) development of protein model quality estimation methods driven by per-residue distance errors predicted using 1D deep residual neural networks (ResNets). The project benefits researchers in diverse communities that are working at the interface of computing and biology. Planned activities include free dissemination of novel bioinformatics tools and research results, broadening the participation of K-12 students in computing through creative mentoring and outreach, and increasing public understanding of interdisciplinary science via the Samuel Ginn College of Engineering’s GINNing podcast series.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2020-06-01
Budget End: 2021-05-31
Fiscal Year: 2020
Total Cost: $100,276
Name: Auburn University
City: Auburn
State: AL
Country: United States
Zip Code: 36832