Finding patterns in data is one of the most challenging open problems in information science. The number of possible relationships scales combinatorially with the size of the dataset, outpacing even the exponential growth in available computational resources. Physical insights have been instrumental in developing efficient computational heuristics. Using quantum field theory methods and rethinking three centuries of Bayesian inference, we formulated the problem in terms of finding landscapes of patterns and solved this problem exactly. We illustrate the generality of our calculus by applying it to handwritten digit images and to finding structural features in proteins from sequence alignments, in both cases without any presumptions about model priors suited to the specific dataset. We are applying this calculus to several problems besides protein structure: (1) genome-scale identification of transcription start sites from CAGE-seq data; (2) interactions among commensal bacteria; (3) identification of dynamical systems from time-course data; (4) calculation of biophysical quantitative sequence activity models directly from massively parallel reporter assays, without optimization; (5) derivation of the graph of interactions directly from expression data; (6) direct computation of SNP interactions from genome-wide association studies.
Striegel, D A; Wojtowicz, D; Przytycka, T M et al. (2016) Correlated rigid modes in protein families. Phys Biol 13:025003
Shreif, Zeina; Striegel, Deborah A; Periwal, Vipul (2015) The jigsaw puzzle of sequence phenotype inference: Piecing together Shannon entropy, importance sampling, and Empirical Bayes. J Theor Biol 380:399-413