The objective of this research is to develop a system for similarity-based retrieval from video and pictorial databases. The system is built around an expressive query language, Hierarchical Temporal Logic (HTL), which represents the temporal and pictorial content of video data. HTL predicates are matched against the database using multidimensional similarity measures. The approach includes the development of representation and preprocessing methods for pictorial feature extraction, covering both generic objects and generic motion. The extracted features include spectral signatures of shape and motion, local and global color histograms, and texture. Articulated motion in video is decomposed into basic components using Motion Spectral Signatures (MSS). Generic objects are recognized using Affine Invariant Spectral Signatures (AISS) and indexing based on Expansion Matching (EXM) in the frequency domain. The motion and object representations are then used to construct a dictionary of HTL atomic predicates with a multidimensional hyper-regional representation. An on-line learning capability for expanding the dictionary is also planned, and improved methods for video shot and scene segmentation are to be developed as well. The results of this research will provide an efficient means of answering queries over video and pictorial databases. Such a system will have many applications in digital libraries and in the television and film industries.
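As a rough illustration of similarity-based retrieval over one of the features named above, the global color histogram, the following minimal sketch ranks stored images against a query. It is not the project's method: the abstract does not specify its similarity measures, so histogram intersection is assumed here purely as a familiar stand-in, and the function names, the bin count, and the representation of an image as a list of RGB tuples are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed representation: 8-bit (R, G, B)


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized global RGB histogram with bins**3 cells."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms (assumed measure)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(
    query: Sequence[float], database: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, histogram_intersection(query, h)) for name, h in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Tiny synthetic "images" used only to exercise the functions.
    red = [(255, 0, 0)] * 4
    blue = [(0, 0, 255)] * 4
    mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
    db = {
        "blue": color_histogram(blue),
        "mixed": color_histogram(mixed),
        "red": color_histogram(red),
    }
    for name, score in rank_by_similarity(color_histogram(red), db):
        print(name, score)
```

A system of the kind described would combine several such features (shape and motion signatures, texture, local histograms) into a multidimensional measure; this sketch covers only a single global feature.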