This application proposes new technology development and user studies aiming to facilitate the use of mobile Optical Character Recognition (OCR) for blind people. Mobile OCR systems, implemented as smartphones apps, have recently appeared on the market. This technology unleashes the power of modern computer vision algorithms to enable a blind person to hear (via synthetic speech) the content of printed text imaged by the smartphone's camera. Unlike traditional OCR, that requires scanning of a document with a flatbed scanner, mobile OCR apps enable access to text anywhere, anytime. Using their own smartphones, blind people can read store receipts, menus, flyers, business cards, utility bills, and many other printed documents of the type normally encountered in everyday life. Unfortunately, current mobile OCR systems suffer from a chicken-and-egg problem, which limits their usability. They require the user to take a well-framed snapshot of the document to be scanned, with the full text in view, and at a close enough distance that each character can be well resolved and thus readable by the machine. However, taking a good picture of a document is difficult without sight, and thus without the ability to look at the scene being imaged by the camera through the smartphone's screen. Anecdotal evidence, supported by results of preliminary studies conducted by the principal investigator's group, confirms that acquisition of an OCR-readable image of a document can indeed by very challenging for some blind users. We plan to address this problem by developing and testing a new technique of assisted mobile OCR. As the user aims the camera at the document, the system analyzes in real time the stream of images acquired by the camera, and determines how the camera position and orientation should be adjusted so that an OCR-readable image of the document can be acquired. This information is conveyed to the user via a specially designed acoustic signal. This acoustic feedback allows users to quickly adjust and reorient the camera or the document, resulting in reduced access time and in more satisfactory user experience. Multiple user studies with blind participants are planned with the purpose of selecting an appropriate acoustic interface and of evaluating the effectiveness of the proposed assisted mobile OCR modality.

Public Health Relevance

This application is concerned with the development of new technology designed to facilitate use of mobile Optical Character Recognition (OCR) systems to access printed text without sight. Specifically, this exploratory research will develop and test a novel system that, by means of a specially designed acoustic interface, will help a blind person take a well-framed, well-resolved image of a document for OCR processing using a smartphone or wearable camera. If successful, this novel approach to assisted mobile OCR will reduce access time and improve user experience of blind mobile OCR users.

National Institute of Health (NIH)
National Eye Institute (NEI)
Exploratory/Developmental Grants (R21)
Project #
Application #
Study Section
Special Emphasis Panel (ZRG1-ETTN-P (02))
Program Officer
Wiggs, Cheri
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of California Santa Cruz
Engineering (All Types)
Schools of Engineering
Santa Cruz
United States
Zip Code
Qin, Siyang; Manduchi, Roberto (2017) Cascaded Segmentation-Detection Networks for Word-Level Text Spotting. Proc Int Conf Doc Anal Recognit 2017:1275-1282
Cutter, Michael; Manduchi, Roberto (2017) Improving the Accessibility of Mobile OCR Apps Via Interactive Modalities. ACM Trans Access Comput 10:
Qin, Siyang; Manduchi, Roberto (2016) A Fast and Robust Text Spotter. Proc IEEE Workshop Appl Comput Vis 2016:
Cutter, Michael; Manduchi, Roberto (2015) Towards Mobile OCR: How To Take a Good Picture of a Document Without Sight. Proc ACM Symp Doc Eng 2015:75-84