Cellular cryo-electron tomography (Cryo-ET) has made possible the observation of cellular organelles and macromolecular complexes at nanometer resolution with native conformations. The rapid increasing amount of Cryo-ET data available however brings along some major challenges for analysis which we will timely ad- dress in this proposal. We will design novel data-driven machine learning algorithms for improving structural discrimination and resolution. In particular, we have the following speci?c aims: (1) We will develop a novel Autoencoder and Iterative region Matching (AIM) algorithm for marker-free alignment of image tilt-series to re- construct tomograms with improved resolution; (2) We will develop a saliency-based auto-picking algorithm for better detecting macromolecular complexes, and combine it with an innovative 2D-to-3D framework to further improve structure detection accuracy; (3) We will design an end-to-end convolutional model for pose-invariant clustering of subtomograms. This model will produce an initial clustering which will be re?ned by a new subto- mogram averaging algorithm that automatically down-weights subtomograms of noise and little contribution; (4) We will perform experimental evaluations by using previously reported bacterial secretion systems and mito- chondrial ultrastructures datasets to improve the ?nal resolution. Implementing algorithms in Aims 1-3, we will develop a user-friendly open-source graphical user interface -tom to directly bene?t the scienti?c community. -tom will be systematically compared with existing software including IMOD, EMAN2, and Relion on simulated and benchmark datasets. To facilitate distribution, -tom will be integrated into existing software platforms Sci- pion and TomoMiner. Our data-driven algorithms and software not only will facilitate and accelerate the future use of Cryo-ET, but also can be readily used on analyzing the existing large amounts of Cryo-ET data to im- prove our understanding of the structure, function, and spatial organization of macromolecular complexes in situ.

Public Health Relevance

This project will create a system of machine learning algorithms to accelerate and facilitate the use and re-use of the rapidly accumulating Cryo-ET datasets. For easy use, we will develop an open-source GUI -Tom (to be disseminated into the Scipion and TomoMiner software platforms) that streamlines the new approaches from the initial tomogram reconstruction step to the ?nal subtomogram averaging step. We will validate the performance of our system by applying it on published Cryo-ET datasets and monitor the improvement of the ?nal results.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM134020-01A1
Application #
9973462
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wu, Mary Ann
Project Start
2020-06-10
Project End
2024-05-31
Budget Start
2020-06-10
Budget End
2021-05-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Carnegie-Mellon University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
052184116
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213