The biological interpretation of data generated by large-scale sequencing projects is plagued by a high rate of false-positive results concerning the location of exons and coding regions, and/or the statistical significance of similarities found with existing databases. This situation leads to unmanageably large program outputs whose size and noise-to-signal ratio obscure the truly relevant findings. The problem can be attributed to large fluctuations in the local information content of both natural and database sequences. After classifying the redundancies (repeats) into three categories, we developed two programs (XNU and Xblast) that are now routinely used in an information-enhancement step prior to the analysis of large bodies of sequence data. Following this step, the output of gene-identification and sequence-comparison programs becomes biologically and statistically interpretable without further processing. The power of this approach is illustrated by the analysis of large human genomic contigs (90 kb from the HLA class III region on chromosome 6, 67 kb from the Xp22.3 region), as well as by the analysis of large EST data sets.
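To make the information-enhancement idea concrete, the sketch below masks residues lying in windows of low compositional entropy with 'X' before a sequence is passed to a gene-identification or database-search program. This is an illustrative simplification, not the XNU or Xblast algorithm (which rely on tandem-repeat statistics and self-comparison, respectively); the window size and entropy threshold are arbitrary assumptions chosen for the example.

```python
import math

def mask_low_complexity(seq, window=12, threshold=1.5):
    """Replace residues in low-entropy windows with 'X'.

    Illustrative sketch only: window and threshold are assumed
    values, not parameters of XNU or Xblast.
    """
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        counts = {c: chunk.count(c) for c in set(chunk)}
        # Shannon entropy (bits) of the residue composition in this window;
        # repeats and homopolymer runs score far below random sequence.
        entropy = -sum((n / window) * math.log2(n / window)
                       for n in counts.values())
        if entropy < threshold:
            for j in range(i, i + window):
                masked[j] = 'X'
    return ''.join(masked)

if __name__ == "__main__":
    # The poly-A run and the PQ tandem repeat are masked;
    # the flanking high-complexity sequence is left intact.
    print(mask_low_complexity("MKTAYIAAAAAAAAAAAKLMQPQPQPQPQSGDLVRE"))
```

After masking, spurious high-scoring matches to repeats and compositionally biased regions are suppressed, which is what allows downstream program outputs to be interpreted without further post-processing.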