Most current speech recognition systems model the speech signal with homogeneous observation frames, represent words as strings of phonemes, and rely heavily on statistical word-based language models to decode the underlying word sequence. This project aims to investigate an alternative approach that incorporates many more levels of linguistic information into a parsimonious hierarchical framework for speech recognition and understanding. The approach will provide new perspectives on incorporating constraints from the distinctive feature, phonetic, phonological, syllabic, morphological, lexical, syntactic, and semantic levels into a probabilistic framework. Sharing sub-word structure across words will allow phonological effects to generalize across similar environments and will increase flexibility for dynamic vocabularies and language models. Structure sharing should also yield a more efficient search with fewer parameters. The proposed hierarchical framework could also serve as a recognition kernel that takes the speech signal as input and produces a set of morpho-phonological units as output. This kernel would have a finite inventory of units for a given language, and its internals would be vocabulary- and task-independent. To test whether the proposed framework is language independent, its utility will also be investigated for languages other than English.