The candidate, Dr. Arlo Randall, is performing directed research in the Institute for Genomics and Bioinformatics (IGB) at the University of California, Irvine (UCI). His mentor, Dr. Pierre Baldi, is a Professor in the Department of Computer Science, with a joint appointment in the Department of Biological Chemistry in the School of Medicine. The IGB fosters and promotes innovative basic and applied research at the intersection of the computational and life sciences, and is educating the next generation of computational biologists. Accurate models of molecular structures and their interactions are crucial for our basic understanding of life processes and for important biomedical applications. The candidate is dedicated to the development of computational methods for predicting and engineering molecular structures, from the scale of small organic molecules made up of a few tens of atoms to large protein complexes consisting of many thousands of atoms. Accurate small-molecule 3D structure prediction is fundamental for computational biochemistry, and predicted models support practical applications such as drug discovery. In the mentored phase project, the candidate will develop a data-driven predictor that will surpass state-of-the-art commercial methods in speed, accuracy, and coverage of diverse organic and metal-organic molecules. Data-mining methods will be developed to curate libraries of structure fragments and torsion angle distributions from experimentally determined structures available in databases. To predict the structure of a new molecule, the predictor will decompose it until each component can be built from the libraries using exact or analogous structure matches, and the fragments will then be assembled using a combination of physical and statistical energy terms. The predictor will be open-access so that it will benefit the broad scientific community.
In the independent phase project, the candidate will develop methods to guide protein engineering experiments, because engineered proteins have considerable potential to positively impact human health as therapies (e.g., insulin analogues and synthetic antibodies). Coupling effective methods for selecting improved variants with synthetic DNA technologies for introducing variation is a powerful approach to engineering proteins with desirable properties; however, targeting the relevant subset of sequence space to search is critical because the vast majority of the virtually unlimited possible variants (a small protein of 100 amino acids has 20^100 possible sequences) will not produce stable proteins, let alone accomplish the engineering goals. Thus the candidate will develop computational methods that consider evolutionary and structural information for the target protein, the engineering goals, previous experimental results, and the limitations and capabilities of established and emerging synthetic DNA technologies, in order to direct the exploration of the most relevant subset of sequence space. The resulting software will be made freely available, and will include a tutorial and a user-friendly graphical user interface.
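The scale of the sequence space mentioned above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the constant names are hypothetical, and the single- and double-substitution counts are an added assumption (a fixed parent sequence with point substitutions only), included to show how quickly even a local neighborhood grows.

```python
import math

AMINO_ACIDS = 20   # size of the standard amino acid alphabet
LENGTH = 100       # residues in the example protein from the text

# Exact count of all possible sequences: 20^100.
total = AMINO_ACIDS ** LENGTH
log10_total = LENGTH * math.log10(AMINO_ACIDS)

# Assumption for illustration: variants of one fixed parent sequence
# that differ by exactly one or exactly two point substitutions.
single = LENGTH * (AMINO_ACIDS - 1)
double = math.comb(LENGTH, 2) * (AMINO_ACIDS - 1) ** 2

print(f"all sequences: about 10^{log10_total:.1f}")
print(f"single-substitution variants: {single}")
print(f"double-substitution variants: {double}")
```

Under these assumptions, the full space is on the order of 10^130 sequences, while the single- and double-substitution neighborhoods contain 1,900 and 1,786,950 variants, respectively.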
Accurate models of molecular structures and their interactions are crucial for our basic understanding of life processes and for important biomedical applications. In the first phase of this project, a small-molecule 3D structure predictor that is faster, more accurate, and covers more molecules than current state-of-the-art methods will be developed and used to populate public databases and to support drug discovery research. In the second phase, a software system for optimizing protein engineering experiments will be developed and used to support the development of protein therapeutics and of enzymes for producing drugs.