The scope of this project is the exploratory application of information theory to basic and clinical research on the relationships among DNA sequences, RNA sequences, and the sequences of the related proteins. The information theory portion is largely based on the work of Dr. T. Schneider of NCI and colleagues on DNA splice site analyses. The project involves collaborative development of processing algorithms for the information content of macromolecular sequences, as well as communication of data, processing methods, and results among researchers in diverse fields. The coding and combinatorial part is based on work done with Dr. M. Eden.

Most of the effort this year has gone into linking our locally accurate and precise measures of splice site position and strength to the more global models that other groups use for exon and intron prediction. The sequences of primary interest are those associated with the ASPM gene. Linking the two kinds of model has proved very challenging. However, we feel it is most worthwhile, because our information-theoretic approach can find and predict splice site locations to single base pair precision. In fact, we have predicted, from the genomic DNA, splitting of sites involving adjacent base pairs, whereas many of the other methods cannot specify splicing locations to better than about half a dozen base pairs from the genomic sequence alone.

A question of very general interest to us is how much of alternative splicing can be predicted from the genomic DNA sequence, and how much depends on local and global environmental factors. It is clear that only some of the splicing variation associated with mutations can be explained by local factors, even allowing for considerable variation in the strengths of wild-type and cryptic sites.

Beyond this, we have recently become interested in some alternative structures. It is clear that many features found in molecular biological sequences, such as repeated structures and translocations, cannot be explained by a splicing model. However, it is also clear that the resulting molecules, such as mRNA, must satisfy some very meaningful constraints to be viable. For example, nonsense structures are degraded rapidly in all healthy, and many diseased, organisms.

We have searched the literature on unit-distance codes. Considerable progress has been made in counting the codes that exist for different numbers of bits and under selected conditions. Less has been done on determining families and groups of codes. We have started to expand our work in this area, building on our previous results on groups that are reachable with selected simple local operations that preserve global properties of code segments. We feel that this approach will allow us to incorporate biological and other constraints in a natural and effective manner.
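
To make the notion of a unit-distance code concrete, the following is a minimal illustrative sketch and not the project's own method. It builds the standard reflected binary Gray code, checks the unit-distance property (consecutive code words differ in exactly one bit), and counts cyclic unit-distance codes by brute force for small numbers of bits. The function names are hypothetical, and the counting convention (codes start at the all-zero word, and the two traversal directions are counted separately) is an assumption made for this example.

```python
def reflected_gray_code(n_bits):
    """Return the standard reflected binary Gray code on n_bits bits."""
    return [i ^ (i >> 1) for i in range(1 << n_bits)]


def is_unit_distance(code, cyclic=True):
    """True if consecutive words (and last-to-first, if cyclic) differ in one bit."""
    pairs = list(zip(code, code[1:]))
    if cyclic:
        pairs.append((code[-1], code[0]))
    return all(bin(a ^ b).count("1") == 1 for a, b in pairs)


def count_cyclic_codes(n_bits):
    """Count cyclic unit-distance orderings of all n-bit words.

    Assumed convention: every ordering starts at word 0, and a cycle
    traversed in the opposite direction counts as a different code.
    Brute-force search; only practical for very small n_bits.
    """
    size = 1 << n_bits

    def extend(current, visited, length):
        if length == size:
            # The cycle closes only if the last word is one bit away from 0.
            return 1 if bin(current).count("1") == 1 else 0
        total = 0
        for bit in range(n_bits):
            nxt = current ^ (1 << bit)  # flip one bit: a unit-distance step
            if not visited & (1 << nxt):
                total += extend(nxt, visited | (1 << nxt), length + 1)
        return total

    return extend(0, 1, 1)


if __name__ == "__main__":
    code = reflected_gray_code(3)
    print([format(word, "03b") for word in code])
    print(is_unit_distance(code))
    print(count_cyclic_codes(2), count_cyclic_codes(3))
```

Under this convention the search finds 2 cyclic codes on 2 bits and 12 on 3 bits. Additional constraints of the kind discussed above could be imposed by filtering candidate steps inside `extend`, though how biological constraints would be encoded is left open here.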

Support Year: 2
Fiscal Year: 2008
Total Cost: $10,345
Name: National Institute of Biomedical Imaging and Bioengineering
Country: United States