Finding patterns in data is one of the most challenging open questions in information science. The number of possible relationships scales combinatorially with the size of the dataset, overwhelming the exponential increase in availability of computational resources. Physical insights have been instrumental in developing efficient computational heuristics. Using quantum field theory methods and rethinking three centuries of Bayesian inference, we formulated the problem in terms of finding landscapes of patterns and solved this problem exactly. The generality of our calculus is illustrated by applying it to handwritten digit images and to finding structural features in proteins from sequence alignments without any presumptions about model priors suited to specific datasets. We are applying this calculus to several problems besides protein structure: (1) Transcription start site identification at a genome scale from CAGE-seq data; (2) Commensal bacteria interactions; (3) Identifying dynamical systems from time-course data; (4) Calculating biophysical quantitative sequence activity models directly from massively parallel reporter assays without optimization; (5) Deriving the graph of interactions directly from expression data; (6) Direct computation of SNP interactions from genome-wide association studies.

Support Year: 2
Fiscal Year: 2014
Name: U.S. National Inst Diabetes/Digst/Kidney
Striegel, D A; Wojtowicz, D; Przytycka, T M et al. (2016) Correlated rigid modes in protein families. Phys Biol 13:025003
Shreif, Zeina Z; Gatti, Daniel M; Periwal, Vipul (2016) Block network mapping approach to quantitative trait locus analysis. BMC Bioinformatics 17:544
Shreif, Zeina; Striegel, Deborah A; Periwal, Vipul (2015) The jigsaw puzzle of sequence phenotype inference: Piecing together Shannon entropy, importance sampling, and Empirical Bayes. J Theor Biol 380:399-413