This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. ABSTRACT: This TRD addresses a problem that is paramount in cryo-EM single-particle reconstruction of macromolecules and that is, in many cases, the single obstacle preventing the attainment of high resolution (better than 10 Å). This problem is the heterogeneity of molecules in the sample due to partial ligand occupancy and conformational variability. Our focus has been on the implementation and testing of a novel classification algorithm, which we published jointly with the group of Jose-Maria Carazo in Madrid in Nature Methods earlier this year (Scheres et al., 2007, previously listed as one of the highlights). We now have access to some supercomputer centers and are exploring how to implement the XMIPP programs from Madrid efficiently, so that they utilize massively parallel architectures.
Our aim is to test the performance of the program both on phantom data and on experimental data routinely encountered in applications of the single-particle reconstruction technique. Dr. William Baxter has laid the groundwork for the creation of quantitatively satisfactory phantom data that are suitable for testing classification algorithms. A large heterogeneous dataset (~195,000 particles, from an initial set of ~1,000,000) obtained by Derek Taylor in Dr. Frank's group (eRF1 and eRF3 binding to the eukaryotic ribosome, in collaboration with Dr. Tatyana Pestova, SUNY Downstate Medical Center) was used to explore various classification strategies. We were successful in identifying classes corresponding to complexes in which the ribosome was bound to either or both of the factors, thus shedding light on the termination process in eukaryotes. These results will be written up and prepared for publication.
Specific Aims
1. (Exploration phase): Explore methods of classification of single-particle projections that refine existing template-based approaches, or exploit general intrinsic mathematical relationships among projections of unchanged objects. In this phase of the project, algorithms such as self-organizing maps (SOMs) will be designed, or the utility of existing ones explored. Phantom data sets are derived from existing density maps of molecules or from X-ray structures that present different conformations or states of ligand binding. Such maps are projected systematically into a variety of directions, and the resulting projections are low-pass filtered and contaminated with noise. These data will allow a determination of which algorithm or which SOM configuration performs best at different resolutions and signal-to-noise ratios.
2. (Testing phase): Test the resulting algorithms and SOMs on well-defined experimental cryo-EM data sets from single-particle projects that are conducted within and outside the Wadsworth Center. Ideally, these should be data that have been characterized in previous publications, so that the improvements due to the new classification approaches can be easily assessed.
3. (Dissemination phase): Integrate the software with existing SPIDER software and develop comprehensive documentation. Publication of the underlying concepts in explicit form will also allow authors of other software packages, such as EMAN (Ludtke et al., 2001), to implement their own versions, for wider dissemination.
Choice of Maximum Likelihood Classification (ML3D) as standard
A collaboration with Dr. Jose-Maria Carazo's group in Spain, our main collaborator in TRD3, produced remarkable results, and this has evidently helped to popularize the maximum-likelihood method within the 3DEM community. A set of 90,000 ribosome images was classified according to EF-G binding and associated "ratcheting" changes in ribosome conformation.
Following collaborative publication of the Nature Methods paper by Scheres et al. in 2007, there has been a surge of applications by several EM groups in the field. Because of the success of this approach, we have stopped pursuing the "cluster tracking" method (Fu et al., J. Structural Biology 2007), since efforts to expand the cluster tracking globally (in the hands of BMS student Jie Fu, under Dr. Frank's mentorship, and RVBC-supported postdoc Tanvir Shaikh) were unsuccessful (details can be found in Dr. Jie Fu's dissertation). Much larger datasets may be needed to pursue this particular development in the future.
One of our collaborators, Dr. Harry Zuzan, is working on a GPU (graphics processing unit) implementation of Scheres' maximum-likelihood method. Speedups of up to 100-fold might be expected. Dr. Zuzan is doing this as a private effort, as he is now employed by a pharmaceutical company. He has promised to share the software as well as the hardware specifications with us once he succeeds.
Construction of a Phantom Dataset
To enable an objective comparison of classification methods, or of parameter settings of any particular method, we set out to construct a phantom data set based on the E. coli ribosome with and without EF-G bound. We argued that such an effort would not only serve our own optimization efforts, but would also be welcomed by the entire 3DEM community. An analysis of the noise sources showed that an important source of noise, namely structural noise, had been overlooked in all previous attempts to produce phantom data. As described in the previous report, we conducted experiments to estimate the signal-to-noise ratio (SNR) of various steps of EM image formation, including the SNR of structural noise. The method and results of the estimation have now been published in Journal of Structural Biology (Baxter et al., 2008). The article features an estimation of the SNRs along with their spectral distributions (SSNRs).
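The phantom-generation steps named in Specific Aim 1 (project a density map, low-pass filter the projection, contaminate it with noise) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the procedure used in this project: the toy volume, the restriction to three axis-aligned projection directions, the box filter standing in for a proper low-pass filter, and the names `toy_volume`, `project`, `box_lowpass`, `add_noise`, and `make_phantom_set` are all hypothetical. SNR is taken here as the ratio of signal variance to noise variance.

```python
import random
import statistics


def toy_volume(n=8, with_ligand=False):
    """Hypothetical stand-in for a density map: a cube, plus one extra voxel as 'ligand'."""
    vol = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for z in range(2, 6):
        for y in range(2, 6):
            for x in range(2, 6):
                vol[z][y][x] = 1.0
    if with_ligand:
        vol[1][3][6] = 1.0
    return vol


def project(volume, axis):
    """Sum a cubic volume[z][y][x] along one axis (0=z, 1=y, 2=x)."""
    n = len(volume)
    if axis == 0:
        return [[sum(volume[z][y][x] for z in range(n)) for x in range(n)] for y in range(n)]
    if axis == 1:
        return [[sum(volume[z][y][x] for y in range(n)) for x in range(n)] for z in range(n)]
    return [[sum(volume[z][y][x] for x in range(n)) for y in range(n)] for z in range(n)]


def box_lowpass(image, radius=1):
    """Crude low-pass filter: mean over a (2*radius+1)^2 window, clipped at the edges."""
    n = len(image)
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            window = [image[a][b]
                      for a in range(max(0, i - radius), min(n, i + radius + 1))
                      for b in range(max(0, j - radius), min(n, j + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def add_noise(image, snr, rng):
    """Add Gaussian noise rescaled so that var(signal) / var(noise) equals snr."""
    n = len(image)
    flat = [v for row in image for v in row]
    raw = [rng.gauss(0.0, 1.0) for _ in flat]
    mean = sum(raw) / len(raw)
    sd = statistics.pstdev(raw)
    scale = (statistics.pvariance(flat) / snr) ** 0.5
    noise = [(r - mean) / sd * scale for r in raw]
    noisy = [[image[i][j] + noise[i * n + j] for j in range(n)] for i in range(n)]
    return noisy, noise


def make_phantom_set(snr=0.1, seed=0):
    """Return (with_ligand, axis, noisy_image) tuples; the label is the ground truth."""
    rng = random.Random(seed)
    dataset = []
    for with_ligand in (False, True):
        volume = toy_volume(with_ligand=with_ligand)
        for axis in (0, 1, 2):
            clean = box_lowpass(project(volume, axis))
            noisy, _ = add_noise(clean, snr, rng)
            dataset.append((with_ligand, axis, noisy))
    return dataset
```

Because every phantom image carries its true class label, the output of any classification method can be scored against the known answer at each chosen SNR and filter setting, which is what makes the comparison objective.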
A phantom dataset was computed in a two-step process using the structural (i.e., pre-CTF) and post-CTF SNR values from our estimation, and deposited at the European Bioinformatics Institute (EBI) in Cambridge.
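A two-step noise model of this kind can be sketched as follows, in one dimension for brevity. This is only an illustration under stated assumptions, not the deposited procedure: noise is added to the clean projection before the contrast transfer function (CTF) is applied, and a second noise term is added afterwards. The SNR values used are placeholders rather than the published estimates, each SNR is assumed to be a variance ratio relative to the signal entering that step, and `toy_ctf` is a simplified oscillating function with a hypothetical `defocus_term` parameter and no spherical aberration or envelope. All function names are hypothetical.

```python
import cmath
import math
import random
import statistics


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def toy_ctf(freq, defocus_term=40.0):
    """Simplified CTF: sin(pi * defocus_term * freq^2). Illustrative assumption only."""
    return math.sin(math.pi * defocus_term * freq ** 2)


def apply_ctf(signal, ctf):
    """Multiply the spectrum by a real, even transfer function and return a real signal."""
    n = len(signal)
    spectrum = dft(signal)
    # min(k, n - k) / n is the absolute frequency in cycles per sample (0 to 0.5).
    filtered = [spectrum[k] * ctf(min(k, n - k) / n) for k in range(n)]
    return [v.real for v in dft(filtered, inverse=True)]


def scaled_noise(signal, snr, rng):
    """Gaussian noise rescaled so that var(signal) / var(noise) equals snr."""
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    mean = sum(raw) / len(raw)
    sd = statistics.pstdev(raw)
    scale = (statistics.pvariance(signal) / snr) ** 0.5
    return [(r - mean) / sd * scale for r in raw]


def two_step_phantom(projection, snr_structural, snr_post_ctf, seed=0):
    """Step 1: add pre-CTF (structural) noise. Step 2: apply CTF, add post-CTF noise."""
    rng = random.Random(seed)
    structural_noise = scaled_noise(projection, snr_structural, rng)
    pre_ctf = [p + s for p, s in zip(projection, structural_noise)]
    after_ctf = apply_ctf(pre_ctf, toy_ctf)
    post_ctf_noise = scaled_noise(after_ctf, snr_post_ctf, rng)
    image = [a + d for a, d in zip(after_ctf, post_ctf_noise)]
    return {
        "structural_noise": structural_noise,
        "after_ctf": after_ctf,
        "post_ctf_noise": post_ctf_noise,
        "image": image,
    }


# Usage with placeholder SNR values (not the published estimates).
clean_projection = [1.0 if 12 <= i < 20 else 0.0 for i in range(32)]
result = two_step_phantom(clean_projection, snr_structural=1.0, snr_post_ctf=0.1)
```

The point of the ordering is that structural noise passes through the CTF together with the signal, whereas the second noise term does not, so the two contributions end up with different spectral characteristics in the final image.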