The Internet has become the primary medium for accessing information and for conducting many types of online transactions, including shopping, paying bills, making travel plans, applying for college or employment, and participating in civic activities. The primary mode of interaction over the Internet is via graphical browsers designed for visual navigation. This seriously limits the access of people with impaired vision or blindness, a population that is large and growing ever larger. Existing assistive technology for non-visual Internet access typically forces users with visual impairments into an inefficient, sequential mode of information access. To do better, two kinds of models are needed. First, we need to build computational models to represent the structure of web pages and online transactions, and to present them effectively using non-visual modalities. Second, we need to better understand how users' mental models for online transactions are built and utilized; we then need to align the computational models with the users' mental models, so as to combine their strengths and significantly improve the efficiency of non-visual interactions. In previous work, the PI developed the HearSay non-visual web browser, which permits users to perform basic non-visual web browsing and search, contextual browsing, and online form-filling. However, HearSay does not take full advantage of the interaction context or the unique perceptual and processing strengths of people with visual impairments. In the current project, the PI seeks to combine basic computational and psychological research designed to produce accessibility technology embodying the synergy of computational modeling and users' mental models. In terms of computational research, the PI will: (i) automatically track the interaction context of user browsing actions; (ii) automatically build models for transactions that users perform online; and (iii) develop ways in which users can interact with transaction models through non-visual modalities efficiently and effectively. In terms of psychological research, user studies will be conducted to examine (i) how people build mental models for online transactions, and (ii) how they use modality-specific cues and their own short-term memory to utilize these mental models. The PI will incorporate the findings from these user studies into the computational models for online transaction processing, so as to align them with the users' mental models.

Broader Impacts: The ultimate goal of the PI's research is to empower people with visual impairments to lead completely independent lives with the help of the Internet. To this end, the PI has planned an extensive dissemination campaign involving workshops, collaborations with institutions that serve people who have visual impairments, and online dissemination of HearSay prototypes and HearSay component technologies. HearSay will also provide a means, in principle, for anyone who wishes to have non-visual Internet access (e.g., listening to Internet content while driving).

Project Report

The World Wide Web has become the primary medium for accessing information and for conducting many types of online transactions, including shopping, paying bills, making travel plans, applying for college or employment, and participating in civic activities. The primary mode of interaction with the Web is via graphical browsers designed for visual navigation. This seriously limits the access of people with impaired vision or blindness, a population that is large and growing ever larger. To give a sense of the size of the target population: in the U.S. alone there are over 10 million visually impaired people, of whom approximately 1.3 million are legally blind, as reported by the American Foundation for the Blind, and, according to the World Health Organization's 2003 report, there are more than 175 million people with visual impairment worldwide (40 to 45 million blind and 135 million with low vision).

Blind people use screen readers to access the Web. Screen readers, and non-visual web browsers more generally, typically read out the content of the screen, ignoring the graphics and layout of web pages while giving other audio or Braille feedback to help navigate them. However, a large gap remains between the ways sighted and blind users browse the Web, owing to differences in perception and mode of interaction. Sighted users can quickly process web content by visually segmenting any web page into sections, classifying the whole page and its parts, finding patterns, filtering out irrelevant information, and quickly identifying the possible steps and actions they can take on any web page (e.g., log in, add to cart). Blind users, on the other hand, are forced to process information sequentially, since screen readers read a page in the order in which its content appears. While shortcuts are provided for skipping through text or reading it more rapidly, blind users still have to listen to much irrelevant content before reaching the content of interest, because screen readers provide almost no content analysis to facilitate access to relevant information. When a blind user visits a web page for the first time, s/he cannot easily tell, without listening to all of it, how much information it contains. Navigating back and forth among pages, blind users often have to listen to redundant information, devise strategies for finding relevant content, or memorize page structure to make web browsing more efficient. All of these difficulties make non-visual web browsing slow and difficult; in short, blind users can experience considerable information overload when using assistive tools. This is especially true of web transactions such as shopping, registrations, online banking and bill payments, which often involve a number of steps spanning several web pages.

To significantly advance the state of the art in web accessibility technology for blind people, the project proposed developing a computational model based on how blind users use the non-visual interface for browsing and search, and on how they use non-visual contextual cues that are constructed and maintained during non-visual transactions. The computational model would facilitate repositioning of web content for efficient search and retrieval of information, as well as for conducting online transactions with non-visual modalities, namely keyboards and audio. Development of such a model was the principal objective of this research project.
A major outcome of the project is HearSay, a multi-modal non-visual web browser incorporating such a model. Four years in the making, HearSay is a working system. On the algorithmic side, it incorporates a number of robust and scalable techniques based on Information Retrieval and Machine Learning, including: content analysis that partitions web pages into meaningful sections for ease of navigation; context-directed browsing that exploits the content surrounding a link to find relevant information as users move from page to page; detection of actionable objects that helps users quickly perform web transactions; detection and handling of changes in web pages, helping users stay focused; statistical models for associating labels with web elements even when they are missing, as in images without alternative text; personalized speech-enabled macros for automating repetitive tasks; automatically learned accessibility models for facilitating online transactions using non-visual modalities; and statistical language detection. On the interface side, HearSay supports multiple output (audio, visual, Braille) and input (speech, keyboard, touch, phone keypad) modalities. It can be used as a desktop application or remotely via an ordinary telephone. The power and usability of HearSay have been demonstrated in a series of user studies with blind subjects. HearSay has gone a long way towards bridging the web accessibility divide between the ways sighted and blind people browse and use the Web. From a broader perspective, HearSay exemplifies the vision of a Universally Accessible Web, whose thesis is "equal access for all": anyone should be able to reap the benefits of the Web without being constrained by any disability.
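The report describes these techniques only at a high level. As an illustration of the kind of Information Retrieval machinery that context-directed browsing relies on, the Python sketch below ranks the sections of a newly loaded page by their TF-IDF similarity to the text that surrounded the link the user just followed, so that reading can begin at the most relevant section. This is a minimal sketch under that assumption; the names used here (rank_sections, link_context, page_sections) are illustrative and are not HearSay's actual interfaces.

import math
import re
from collections import Counter

def tokenize(text):
    # Lower-case word tokens; a stand-in for real text normalization.
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf_vectors(docs):
    # Build simple TF-IDF vectors (as sparse dicts) for a small list of token lists.
    df = Counter()
    for toks in docs:
        df.update(set(toks))
    n = len(docs)
    vecs = []
    for toks in docs:
        tf = Counter(toks)
        vecs.append({t: (c / len(toks)) * math.log((n + 1) / (df[t] + 1))
                     for t, c in tf.items()})
    return vecs

def cosine(u, v):
    # Cosine similarity between two sparse vectors represented as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_sections(link_context, page_sections):
    # Rank target-page sections by similarity to the text surrounding the followed
    # link, so a non-visual browser could start reading at the most relevant one.
    docs = [tokenize(link_context)] + [tokenize(s) for s in page_sections]
    vecs = tf_idf_vectors(docs)
    query, section_vecs = vecs[0], vecs[1:]
    scores = [cosine(query, v) for v in section_vecs]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)

if __name__ == "__main__":
    context = "cheap flights from New York to Boston, departing Friday"
    sections = ["Site navigation: home, deals, customer support",
                "Flight results: New York (JFK) to Boston (BOS), Friday, from $79",
                "Hotel offers and car rentals in Boston"]
    # The flight-results section (index 1) should rank first.
    print(rank_sections(context, sections))

In a full system this ranking step would be combined with the page segmentation and actionable-object detection mentioned above; the sketch shows only the core link-context matching idea.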

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0808678
Program Officer: Ephraim P. Glinert
Budget Start: 2008-09-01
Budget End: 2014-08-31
Fiscal Year: 2008
Total Cost: $1,623,540
Name: State University New York Stony Brook
City: Stony Brook
State: NY
Country: United States
Zip Code: 11794