Fundamental intracellular processes of immediate relevance to biomedicine, such as gene regulation and transcription, often involve large clusters of proteins dynamically assembling and disassembling within small diffraction-limited volumes at timescales approaching those of imaging data acquisition. Despite the impressive µs-ms data collection timescales achieved by many single molecule (SM) fluorescence methods, SM kinetic parameters are often instead determined from large quantities of data (millions of photons) collected and averaged over long timescales. This compromises the temporal resolution of data that, in theory, encode information on events that may unfold and be resolved within ms. Drawing insight on complex processes resolved within ms presents a profound analysis challenge. Fundamentally, this is because highly stochastic SMs are indirectly monitored by the equally stochastic measurement output to which SMs are inextricably tied: photons. Our overall objective is therefore to develop a framework to determine dynamical models, relevant downstream to complex intracellular processes, resolved at the SM level from very limited data (i.e., time traces of tens of ms or thousands of photons). For this FTRD grant, our focus is on benchmarking our framework on simple in vitro test data sets. To resolve these fast dynamics, we will rely on cutting-edge tools from Data Science and Statistics termed Bayesian nonparametrics (BNPs), largely unknown to the Natural Sciences. Here we will adapt BNP tools (some less than five years old and proposed here for the first time for Natural Science applications) to provide a fundamentally new treatment of data derived from confocal setups (Specific Aim I) and single molecule fluorescence resonance energy transfer, termed smFRET (Specific Aim II), both workhorses across Biology.
As BNPs are highly flexible, we develop strategies to rigorously constrain them with knowledge of the measurement process, e.g., the shape of the point spread function. For both Specific Aims, we will develop fully integrated and unsupervised methods to resolve SM dynamical models from ms worth of data by exploiting BNPs. In particular, for Specific Aim I, we will do so starting from single photon arrivals derived from confocal experiments. We will determine the number of diffusive species (relevant in dealing with multimeric mixtures) as well as the diffusion coefficient of each species. By resolving diffusion coefficients with the same precision as fluorescence correlation spectroscopy (FCS) from just thousands (as opposed to millions) of photons, we could collect far shorter traces, thereby dramatically minimizing sample photo-damage. Alternatively, we could use long traces to resolve previously indeterminable quantities, e.g., diffusion coefficient differences in multimeric mixtures. For Specific Aim II, we will determine the quantities normally derived from current smFRET analysis, but now while accounting for spectral cross-talk and label blinking and while determining the number of molecular states. Accounting for such photo-physics deeply influences our ultimate interpretation of smFRET data.

Public Health Relevance

Fundamental intracellular processes immediately relevant to biomedicine (e.g., gene regulation and transcription) often involve large and active protein clusters with dynamics occurring on short timescales approaching those of imaging data acquisition. Here we propose a new mathematical framework capable of unraveling fast processes down to the single molecule level from just a few milliseconds of data. To achieve this, we adapt novel tools from Data Science and Statistics ideally suited to determine complex models from minimal, and thus uncertain, data.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Sammak, Paul J
Arizona State University-Tempe Campus
Schools of Arts and Sciences
United States