The proposal "The Crystallography of Macromolecules" addresses the limitations of diffraction data analysis methods in X-ray crystallography. The significance of this work stems from the importance of the technique, which generates uniquely detailed information about cellular processes at the atomic level. The structural results obtained with crystallography are used to explain and validate results obtained by other biophysical, biochemical and cell biology techniques, to generate hypotheses for detailed studies of cellular processes, and to guide drug design studies, all of which are highly relevant to the NIH mission. The proposal focuses on method development to address a frequent situation in which the size and order of a crystal are insufficient to obtain a structure from a single crystal. This is particularly frequent in cases of large eukaryotic complexes and membrane proteins, where the structural information is the most valuable to the NIH mission. The diffraction power of a single crystal is directly related to the microscopic order and size of that specimen. It is also one of the main correlates of structure solution success. When a single crystal yields insufficient data, the usual remedy is to collect data from multiple crystals and average them, which makes it possible to retrieve even very weak signals. However, different crystals of the same protein, even if they are very similar, i.e. have the same crystal lattice symmetry and very similar unit cell dimensions, still differ somewhat in their order. This non-isomorphism is often high enough to make structure solution with averaged data impossible. Moreover, the use of multiple data sets complicates decision making, as each data set contains different information and it is not clear when and how to combine them. The proposed solution relies on hierarchical analysis.
First, the shape of the diffraction spot profiles will be modeled using a novel approach (Aim 1). This will form the basis for the next step, the deconvolution of overlapping Bragg spot profiles from multiple lattices (Aim 2). An additional benefit of the algorithms developed in Aim 1 is that they will automatically derive the integration parameters and identify artifacts, making the whole process more robust. This is particularly significant for high-throughput and multiple-crystal analysis.
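The deconvolution step of Aim 2 can be illustrated with a deliberately simplified sketch. It is not the method proposed here: it assumes one-dimensional Gaussian spot profiles whose shapes are already known (standing in for the output of Aim 1), exactly two overlapping spots, and noise-free counts. The names `gaussian_profile` and `deconvolve_two` are hypothetical. Under those assumptions, the two intensities follow from a linear least-squares fit.

```python
import math

def gaussian_profile(n, center, sigma):
    """Normalized 1-D Gaussian sampled on n pixels (an assumed spot shape)."""
    raw = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]
    total = sum(raw)
    return [v / total for v in raw]

def deconvolve_two(observed, p1, p2):
    """Least-squares intensities (a, b) such that observed ~ a*p1 + b*p2.

    Solves the 2x2 normal equations directly; p1 and p2 are the known
    unit-area profiles of the two overlapping spots.
    """
    s11 = sum(u * u for u in p1)
    s22 = sum(v * v for v in p2)
    s12 = sum(u * v for u, v in zip(p1, p2))
    t1 = sum(u * y for u, y in zip(p1, observed))
    t2 = sum(v * y for v, y in zip(p2, observed))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("profiles are too similar to separate")
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b

# Toy example: two spots from different lattices, four pixels apart.
p1 = gaussian_profile(21, 8.0, 2.0)
p2 = gaussian_profile(21, 12.0, 2.0)
observed = [100.0 * u + 40.0 * v for u, v in zip(p1, p2)]
a, b = deconvolve_two(observed, p1, p2)
print(round(a, 6), round(b, 6))
```

In practice spot profiles are two- or three-dimensional and the counts carry noise, so a weighted fit would be needed. The sketch only shows why an accurate profile model is a prerequisite for separating overlapping spots: as the two profiles become more alike, the determinant approaches zero and the intensities can no longer be told apart.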
In Aim 3, data from multiple crystals will be compared to identify subsets of data that should be merged to produce optimal results. The critical aspect of this analysis will be the identification and assessment of non-isomorphism between datasets. The experimental decision-making strategy is the subject of Aim 4. The Support Vector Machine (SVM) method will be used to evaluate the suitability of available datasets for possible methods of structure solution. In cases of insufficient data, the method will identify the most significant factor that needs to be improved.
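The kind of comparison described for Aim 3 can be sketched as follows. This is a minimal illustration and not the proposed algorithm: it assumes that each dataset is a mapping from Miller indices to intensities that are already on a common scale, uses the Pearson correlation of shared reflections as a stand-in measure of isomorphism, and applies an arbitrary threshold of 0.9. All function names and the toy intensities are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def common_correlation(d1, d2):
    """Correlation of intensities over reflections present in both datasets."""
    shared = sorted(set(d1) & set(d2))
    return pearson([d1[h] for h in shared], [d2[h] for h in shared])

def select_isomorphous(datasets, reference, threshold=0.9):
    """Names of datasets that correlate with the reference above the threshold."""
    ref = datasets[reference]
    return [name for name, data in datasets.items()
            if name == reference or common_correlation(ref, data) >= threshold]

def merge(datasets, names):
    """Unweighted mean intensity per reflection (assumes data are pre-scaled)."""
    sums, counts = {}, {}
    for name in names:
        for hkl, intensity in datasets[name].items():
            sums[hkl] = sums.get(hkl, 0.0) + intensity
            counts[hkl] = counts.get(hkl, 0) + 1
    return {hkl: sums[hkl] / counts[hkl] for hkl in sums}

# Toy intensities keyed by Miller index; crystal_c is deliberately dissimilar.
datasets = {
    "crystal_a": {(1, 0, 0): 120.0, (1, 1, 0): 45.0, (2, 0, 0): 300.0,
                  (2, 1, 0): 10.0, (2, 1, 1): 80.0},
    "crystal_b": {(1, 0, 0): 118.0, (1, 1, 0): 47.0, (2, 0, 0): 310.0,
                  (2, 1, 0): 12.0, (2, 1, 1): 78.0},
    "crystal_c": {(1, 0, 0): 40.0, (1, 1, 0): 200.0, (2, 0, 0): 90.0,
                  (2, 1, 0): 150.0, (2, 1, 1): 20.0},
}
selected = select_isomorphous(datasets, "crystal_a")
merged = merge(datasets, selected)
print(selected)
```

A single reference crystal and a fixed cutoff are the simplest possible choices. A realistic analysis would weight reflections by their measurement uncertainties and compare all pairs of datasets instead of relying on one reference.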
Aim 5 is to simplify navigation of data reduction and to integrate the results of previous aims with other improvements in hardware and computing.

Public Health Relevance

The goal of the proposal is to develop methods for the analysis of X-ray diffraction data, with a particular focus on novel analysis of diffraction spot shape and on streamlining data analysis in multi-crystal modes. The development of such methods is essential to advance structural studies in thousands of projects, which individually are important for the NIH mission.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Edmonds, Charles G
University of Texas Sw Medical Center Dallas
Schools of Medicine
United States