This is an integrated research program on a new area with a rich array of applications: the problem of recovering a signal from its noise-corrupted version. Depending on the application, recovery can operate in two major modes: noncausal, where denoising begins only once the entire signal is available, and causal, where a decision must be made immediately after each symbol is received. Our focus is on the theory and practice of denoising for the case where the alphabets of both the noiseless and the noise-corrupted signal are finite. The problem arises in a variety of situations, ranging from typing and spelling correction to hidden Markov process state estimation; from DNA sequence analysis and processing to the enhancement of facsimile and other binary images; and from blind equalization to joint source-channel decoding when a discrete source is sent unencoded through a discrete noisy channel.

Certain instances of the discrete denoising problem have been studied, particularly in the context of state estimation for hidden Markov processes, under the assumption that the signal statistics are known. The literature on the universal setting, however, where there is uncertainty regarding the distribution of the underlying noiseless signal and/or the noise-corrupting mechanism, had been sparse until this research. The compression-based approach to universal discrete denoising that preceded our research gave rise to schemes that were not implementable with reasonable complexity and were considerably suboptimal relative to the case where the statistics are known.

This research develops universal algorithms for discrete denoising that require no statistical characterization of the noiseless signal. Promising results are obtained for the case of an unknown discrete source corrupted by various types of discrete channels. We show that it is possible to achieve, universally, the same asymptotic performance under any given distortion criterion as an algorithm that knows, and is specifically tailored for, the input statistics. Furthermore, we accomplish this with computational complexity that grows linearly with the size of the data.
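To make the linear-complexity claim concrete, the sketch below illustrates a two-pass, context-counting denoiser of the kind known in the literature as DUDE. It is a minimal illustrative sketch, not the algorithm as published: it assumes a known discrete memoryless channel with an invertible transition matrix Pi, a loss matrix Lambda, and a fixed two-sided context length k; all function and variable names here are ours.

```python
import numpy as np
from collections import defaultdict

def dude(z, Pi, Lambda, k):
    """Two-pass context-counting denoiser (illustrative sketch).

    z      -- noisy sequence as a 1-D integer array over {0, ..., A-1}
    Pi     -- A x A channel matrix, Pi[x, y] = P(observe y | input x);
              assumed known and invertible
    Lambda -- A x A loss matrix, Lambda[x, xhat] = loss of reconstructing
              symbol x as xhat
    k      -- one-sided context length
    """
    z = np.asarray(z)
    n, A = len(z), Pi.shape[0]
    Pi_inv = np.linalg.inv(Pi)

    # Pass 1: for every two-sided context, count how often each symbol
    # appears at its center in the noisy sequence.
    counts = defaultdict(lambda: np.zeros(A))
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        counts[ctx][z[i]] += 1

    # Pass 2: at each position choose the reconstruction xhat minimizing
    # the estimated expected loss  m^T Pi^{-1} (Lambda[:, xhat] * Pi[:, z_i]),
    # where m is the count vector of position i's context.
    xhat = z.copy()
    for i in range(k, n - k):
        m = counts[(tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))]
        scores = [m @ Pi_inv @ (Lambda[:, a] * Pi[:, z[i]]) for a in range(A)]
        xhat[i] = int(np.argmin(scores))
    return xhat  # positions within k of either end are left unchanged
```

Both passes scan the data once, so for fixed context length and alphabet size the running time grows linearly with the data size, in line with the complexity claim above. As a quick check, one can run the sketch on a synthetic binary symmetric Markov source passed through a binary symmetric channel under Hamming loss (the parameters below are arbitrary):

```python
rng = np.random.default_rng(0)
n, p, delta = 100_000, 0.05, 0.1           # length, source flip prob., channel crossover
x = np.zeros(n, dtype=int)
for i in range(1, n):                      # binary symmetric Markov source
    x[i] = x[i - 1] ^ int(rng.random() < p)
z = x ^ (rng.random(n) < delta)            # corrupt through a BSC(delta)
Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hamming loss
xhat = dude(z, Pi, Lam, k=4)
print("channel error rate :", np.mean(z != x))
print("denoised error rate:", np.mean(xhat != x))
```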