The complexity of eukaryotic cells cannot be explained by genes and proteins alone; it also depends on their regulation, which involves interactions based on a number of mechanisms. One important aspect of this regulation takes place after translation of mRNA to protein, when cellular proteins undergo post-translational modifications (PTMs). In all forms of life studied, these modifications affect both the structure of proteins and their functions, including their participation in regulatory mechanisms. Identifying where PTM sites occur is essential to correctly elucidating structure-function relationships. While wet-lab methods can test the modification and function of individual proteins, computational methods are a promising high-throughput alternative for characterizing PTM sites; the development of accurate and reliable PTM site prediction methods has therefore become an important area of research. The project fits well with the mission and goals of North Carolina A&T State University (NCA&T), as indicated in NCA&T's Preeminence 2020: "to make a strong commitment to education while working towards a much more-intensive research-intensive environment". This Excellence in Research (EiR) award will enable the project researchers to develop and establish an independent, high-quality research program that will impact the research experience and education of minority students at NCA&T.
This project aims to develop accurate Deep Learning (DL)-based predictors for the sites of three important post-translational protein modifications, namely phosphorylation, methylation, and SUMOylation. The project will first develop an accurate DL approach for phosphorylation site prediction (phosphorylation being one of the most widely studied PTMs) in order to address key issues in adopting DL for this task, including the amount of training data required and the required complexity of the architecture, among others. Once these factors have been characterized and the approach has been trained on the phosphorylation data, it will be used to predict two other types of PTMs that target lysine residues: methylation and SUMOylation. The project will provide insight into three important questions relevant to the use of DL architectures in bioinformatics applications: i) the required number of training examples, ii) the complexity of the architecture vs. performance, and iii) the performance of hand-crafted features vs. simple features. In addition, the project will explore issues related to the use of simple features vs. more complex features that integrate biological observations, as well as best practices for creating the negative datasets needed in computational experiments. The project will provide a novel and broadly applicable DL-based approach to predicting PTMs, producing more accurate and complete annotation that other researchers can then use to facilitate related biological studies.
New education and outreach opportunities will be created to enhance educational offerings and skills development for students, particularly in DL methods and applications; student recruitment will focus on creating opportunities for women and minority students at North Carolina A&T State University, the nation's largest HBCU. In addition, the project will establish an international research experience program in which students visit research labs in Japan.
The project will also support the "SciPhD Bootcamp", which provides professional career preparation for students from NCA&T and neighboring HBCUs. The resources developed under the project will be accessible to the community at http://bcb.ncat.edu.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.