Visually impaired individuals have achieved impressive autonomy using a combination of traditional aids (dog guides and long canes) and more recent advances such as global positioning systems, reading devices for printed text (e.g., the Kurzweil-National Federation of the Blind text-to-speech reader), and other technologies. Still, the desire or need to read street signs, storefront banners, marquees, and other forms of text that are ubiquitous in the world cannot be met without help from another person. Our goal is to develop software for reading text in complex indoor and outdoor environments. We believe this is the key missing piece of technology in a universal reader, a device that could assist those who are blind or visually impaired in navigating and operating in natural scenes and environments. We propose the development of new algorithms and software for such a device. Specifically, the software will read text from digital camera input in highly diverse and complex environments, such as those found in street scenes or inside commercial buildings, and convert that text to speech or Braille for use by people who are blind. We focus on three central issues:

Accuracy. The most fundamental task is to increase the accuracy of the basic text detection and recognition algorithms. Current systems simply do not recognize enough words to be practically useful.

Incorporating user input and goals. We aim to develop software mechanisms whereby a user can provide essential input to the device to narrow the image analysis when appropriate. For example, if the user can specify that he or she is seeking "coffee", then the search can be tailored to the user's request, lowering the detection threshold for text that matches the user's goal and pruning out irrelevant text.

Graceful failure. Just as important as increasing accuracy, and perhaps even more important to the user of such a device, is to provide what we refer to as graceful failure, i.e., the minimization of harmful effects due to errors made by the device. This is a critical and often overlooked aspect of such systems. It is essential that the user of such a device not be misled into believing that the device is correct when it is not, for example, when crossing the street. Because the technical task of reading text in outdoor environments can be arbitrarily difficult, the device will inevitably make errors. It is a primary goal to design software that produces feedback about the confidence level of returned results, along with other cues that will mitigate the impact of errors. The user can then assess the reliability of the information provided by the device and make an intelligent decision about whether to accept the results, depending upon the specifics of the current situation.

Currently, people who are visually impaired must rely heavily on others who are sighted to travel to destinations that are important for everyday living. The goal of this project is to produce software for a device that can read (and speak) words on signs, placards, marquees, and storefronts to visually impaired users. Such a device would dramatically increase the independence and autonomy of such individuals.
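The goal-directed narrowing and confidence feedback described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the `Detection` type, the function names, and all threshold values are hypothetical, and a simple substring match stands in for whatever matching the actual software would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical recognizer output: a text string and a score in [0, 1]."""
    text: str
    confidence: float


def filter_detections(
    detections: List[Detection],
    goal: Optional[str] = None,
    base_threshold: float = 0.8,   # assumed default acceptance threshold
    goal_threshold: float = 0.5,   # assumed lowered threshold for goal matches
) -> List[Detection]:
    """Keep detections according to the user's goal, if one is given.

    With no goal, keep anything at or above the base threshold.
    With a goal, keep only text containing the goal word, accepted at the
    lower threshold; everything else is pruned as irrelevant.
    """
    kept = []
    for det in detections:
        if goal is None:
            if det.confidence >= base_threshold:
                kept.append(det)
        elif goal.lower() in det.text.lower() and det.confidence >= goal_threshold:
            kept.append(det)
    return kept


def describe(det: Detection, sure: float = 0.9) -> str:
    """Attach a confidence cue so the user can judge how far to trust a result."""
    label = "confident" if det.confidence >= sure else "uncertain"
    return f"{det.text} ({label}, score {det.confidence:.2f})"


if __name__ == "__main__":
    scene = [
        Detection("Coffee Shop", 0.60),
        Detection("Bank", 0.95),
        Detection("Shoes", 0.40),
    ]
    for det in filter_detections(scene, goal="coffee"):
        print(describe(det))
```

In this sketch, a sign reading "Coffee Shop" with a score of 0.60 would be dropped under the default threshold but reported, flagged as uncertain, once the user states the goal "coffee". Reporting the uncertainty rather than hiding it is one simple way to approximate the graceful-failure behavior the abstract calls for.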

Agency
National Institutes of Health (NIH)
Institute
National Eye Institute (NEI)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21EY018398-01
Application #
7298443
Study Section
Special Emphasis Panel (ZRG1-BDCN-F (12))
Program Officer
Oberdorfer, Michael
Project Start
2007-09-01
Project End
2009-08-31
Budget Start
2007-09-01
Budget End
2008-08-31
Support Year
1
Fiscal Year
2007
Total Cost
$228,843
Indirect Cost
Name
University of Massachusetts Amherst
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
153926712
City
Amherst
State
MA
Country
United States
Zip Code
01003
Weinman, Jerod J; Learned-Miller, Erik; Hanson, Allen R (2009) Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Trans Pattern Anal Mach Intell 31:1733-46