The candidate, Dr. Arlo Randall, is performing directed research in the Institute for Genomics and Bioinformatics (IGB) at the University of California, Irvine (UCI). His mentor, Dr. Pierre Baldi, is a Professor in the Department of Computer Science, with a joint appointment in the Department of Biological Chemistry in the School of Medicine. The IGB fosters and promotes innovative basic and applied research at the intersection of the computational and life sciences, and is educating the next generation of computational biologists. Accurate models of molecular structures and their interactions are crucial for our basic understanding of life processes and for important biomedical applications. The candidate is dedicated to the development of computational methods for predicting and engineering molecular structures, from the scale of small organic molecules made up of a few tens of atoms to large protein complexes consisting of many thousands of atoms. Accurate small-molecule 3D structure prediction is fundamental for computational biochemistry, and predicted models support practical applications such as drug discovery. In the mentored-phase project, the candidate will develop a data-driven predictor that will surpass state-of-the-art commercial methods in speed, accuracy, and coverage of diverse organic and metal-organic molecules. Data-mining methods will be developed to curate libraries of structure fragments and torsion angle distributions from experimentally determined structures available in databases. To predict the structure of a new molecule, the predictor will decompose it until each component can be built from the libraries using exact or analogous structure matches, and the fragments will be assembled using a combination of physical and statistical energy terms. The predictor will be open-access so that it will benefit the broad scientific community.
In the independent-phase project, the candidate will develop methods to guide protein engineering experiments, because engineered proteins have considerable potential to positively impact human health as therapies (e.g., insulin analogues and synthetic antibodies). Coupling effective methods for selecting improved variants with synthetic DNA technologies for introducing variation is a powerful approach to engineering proteins with desirable properties; however, targeting the relevant subset of sequence space to search is critical because the vast majority of the virtually unlimited possible variants (a small protein of 100 amino acids has 20^100 possible sequences) will not produce stable proteins, let alone accomplish the engineering goals. Thus the candidate will develop computational methods that consider evolutionary and structural information for the target protein, the engineering goals, previous experimental results, and the limitations and capabilities of established and emerging synthetic DNA technologies, in order to direct the exploration of the most relevant subset of sequence space. The resulting software will be made freely available, and will include a tutorial and a user-friendly graphical user interface.
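To give a sense of scale for the sequence-space argument above, the following minimal sketch computes the number of possible sequences for a 100-residue protein over the 20 standard amino acids, and contrasts it with the number of single-site variants of one reference sequence. The function names are illustrative only and are not part of the proposed software.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet
LENGTH = 100      # residues in the small protein used as the example in the text


def sequence_space_size(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length (alphabet ** length)."""
    return alphabet ** length


def single_site_variants(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of sequences differing from a reference at exactly one position."""
    return length * (alphabet - 1)


total = sequence_space_size(LENGTH)
# Python integers are arbitrary precision, so the exact value is available.
print(f"full sequence space: about 10^{math.log10(total):.1f} sequences")
print(f"single-site variants of one reference: {single_site_variants(LENGTH)}")
```

The gap between the two numbers illustrates why exhaustive search is infeasible and why some principled way of selecting a subset of variants is needed.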

Public Health Relevance

Accurate models of molecular structures and their interactions are crucial for our basic understanding of life processes, and for important biomedical applications. In the first phase of this project a small molecule 3D structure predictor that is faster, more accurate, and covers more molecules than current state-of-the-art methods will be developed, and it will be used to populate public databases and in drug discovery research. In the second phase, a software system for optimizing protein engineering experiments will be developed, and it will be used to support the development of protein therapeutics and enzymes for producing drugs.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Career Transition Award (K99)
Project #: 5K99LM010821-02
Application #: 8142211
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2010-09-30
Project End: 2012-12-02
Budget Start: 2011-09-30
Budget End: 2012-12-02
Support Year: 2
Fiscal Year: 2011
Total Cost: $101,874
Indirect Cost:
Name: University of California Irvine
Department:
Type: Organized Research Units
DUNS #: 046705849
City: Irvine
State: CA
Country: United States
Zip Code: 92697
Nagata, Ken; Randall, Arlo; Baldi, Pierre (2014) Incorporating post-translational modifications and unnatural amino acids into high-throughput modeling of protein structures. Bioinformatics 30:1681-9
Baum, Elisabeth; Randall, Arlo Z; Zeller, Michael et al. (2013) Inferring epitopes of a polymorphic antigen amidst broadly cross-reactive antibodies using protein microarrays: a study of OspC proteins of Borrelia burgdorferi. PLoS One 8:e67445
Feher, Victoria A; Randall, Arlo; Baldi, Pierre et al. (2013) A 3-dimensional trimeric β-barrel model for Chlamydia MOMP contains conserved and novel elements of Gram-negative bacterial porins. PLoS One 8:e68934
Lee, Shih-Hui; Hoshino, Yu; Randall, Arlo et al. (2012) Engineered synthetic polymer nanoparticles as IgG affinity ligands. J Am Chem Soc 134:15765-72
Kao, Athit; Randall, Arlo; Yang, Yingying et al. (2012) Mapping the structural topology of the yeast 19S proteasomal regulatory particle using chemical cross-linking and probabilistic modeling. Mol Cell Proteomics 11:1566-77
Nagata, Ken; Randall, Arlo; Baldi, Pierre (2012) SIDEpro: a novel machine learning approach for the fast and accurate prediction of side-chain conformations. Proteins 80:142-53
Srikrishnan, Sneha; Randall, Arlo; Baldi, Pierre et al. (2012) Rationally selected single-site mutants of the Thermoascus aurantiacus endoglucanase increase hydrolytic activity on cellulosic substrates. Biotechnol Bioeng 109:1595-9
Andronico, Alessio; Randall, Arlo; Benz, Ryan W et al. (2011) Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. J Chem Inf Model 51:760-76