This Small Business Innovation Research (SBIR) Phase I research project focuses on the development of ground-breaking real-time algorithms for automatically finding and recognizing text in digital video of complex 3-D environments using machine learning of fonts and text strings. Essentially, the project takes optical character recognition (OCR) from being a technology for 2-D documents and brings it to the 3-D world. The project builds on OCR algorithms for documents where conventional OCR fails: colorful brochures, magazine covers, and other sources where photographs, line art, and arbitrarily rotated text greatly complicate the OCR process. The project aims to extend this technology to finding and recognizing text in complex 3-D real-world scenes, such as street signs and storefronts, where the text may be at any arbitrary 3-D angle to the camera. Critical to the success of this project is the algorithm's capability for machine learning of fonts.
There are a number of exciting applications that would benefit from accurate OCR of video sources. While OCR of text in video sources can be done, it usually must be on plainly obvious text, such as subtitles, and it cannot be done in real time. Real-time, accurate video OCR would enable applications that include 1) unaided indexing of digital video footage by the text contained therein, 2) helping the blind navigate independently, both indoors and outdoors, 3) automated continuous roadside or vehicle-based license plate scanning, and 4) providing ground truth for improved GPS accuracy. Markets for the technology therefore include individuals, corporations, and government agencies. The societal impacts include 1) rendering digitized video libraries searchable by more metadata tags at low cost, 2) greater independence and safety for the blind, 3) improved road safety through automatic identification of cars reported stolen or cars owned by people with suspended licenses, and 4) improved GPS navigation accuracy. Technological impacts will be in the areas of machine learning applied to video OCR, real-time OCR, and low-resolution OCR.