Information theory is a powerful tool for understanding the DNA and RNA patterns that define genetic control systems. My theoretical work is divided into several levels. Level 0 is the study of genetic sequences bound by proteins or other macromolecules, briefly described below. The success of this theory suggested that other aspects of information theory should also apply to molecular biology. Level 1 theory introduces the more general concept of the molecular machine, and the concept of a machine capacity equivalent to Shannon's channel capacity. In Level 2, the Second Law of Thermodynamics is connected to the capacity theorem. This defines the limits of Maxwell's Demon and future molecular computers. The project also has three interrelated activities: theory, computer analysis and genetic engineering experiments. In Level 0 I showed that binding sites on nucleic acids usually contain just about the amount of information needed for molecules to find the sites in the genome. Apparent exceptions to this "working hypothesis" have revealed many new phenomena. The first major anomaly was found at bacteriophage T7 promoters, which conserve twice as much information as the polymerase requires to locate them. The most likely explanation is that a second protein binds to the DNA. In another case, we discovered that the F incD region has a three-fold excess conservation, which implies that three proteins bind there. We are investigating these and other anomalies experimentally. An anomaly in the binding sites for the P1 RepA protein led to the hypothesis that the initial step of DNA replication and RNA transcription is a base flipped out from the DNA. The experimental evidence supports this hypothesis. Two graphical methods have been invented to display the structure of binding sites. A sequence logo shows the average patterns in a set of binding sites. The patented sequence walker shows individual binding sites.
Displaying many walkers simultaneously has become such a powerful tool for investigating genetic structure that it will undoubtedly replace consensus sequences. Walkers can be used to distinguish mutations from polymorphisms, and this has clinical applications. Three nanotechnology projects are in progress: a molecular computer (European Patent 1057118, United States Patent 6,774,222), a method for molecular sequencing (patent pending), and a molecular engine (patent pending). See www.lecb.ncifcrf.gov/toms/ for further information.
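
The Level 0 comparison described above, between the information conserved at a set of binding sites and the information needed to find those sites in the genome, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from this summary: the function names, the toy sequences, the toy genome size, the uniform two-bit background for the four bases, and the omission of any small-sample correction. The per-site score is offered only as a rough analogue of what a sequence walker displays for an individual site.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequencies(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        freqs.append({b: counts.get(b, 0) / len(sites) for b in BASES})
    return freqs


def r_sequence(sites):
    """Information (bits) conserved across the sites.

    Each column contributes 2 bits minus its Shannon entropy, assuming
    equiprobable background bases and no small-sample correction.
    """
    total = 0.0
    for column in position_frequencies(sites):
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total


def r_frequency(genome_size, n_sites):
    """Information (bits) needed to pick n_sites out of genome_size positions."""
    return math.log2(genome_size / n_sites)


def individual_information(site, freqs):
    """Score (bits) of a single site against the column frequencies.

    Each base contributes 2 + log2(frequency); a base never observed in
    its column gives minus infinity.
    """
    total = 0.0
    for base, column in zip(site, freqs):
        p = column[base]
        if p == 0:
            return float("-inf")
        total += 2.0 + math.log2(p)
    return total


# Hypothetical toy data, not real binding sites.
toy_sites = ["ACGT", "ACGT", "ACGA", "ACGC"]
toy_freqs = position_frequencies(toy_sites)

conserved = r_sequence(toy_sites)                      # information in the pattern
needed = r_frequency(genome_size=4096, n_sites=4)      # information to locate sites
per_site = [individual_information(s, toy_freqs) for s in toy_sites]
```

Under these assumptions, a value of `conserved` close to `needed` corresponds to the working hypothesis, while a ratio near two or three would correspond to the kind of excess conservation reported for the T7 promoters and the F incD region. The mean of `per_site` equals `conserved` for the sites used to build the frequencies, which is why the average (logo-like) and individual (walker-like) views are consistent with each other.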