One challenge in artificial intelligence is to enable natural interactions between people and computers via multiple modalities. It is often desirable to convert information between modalities. One example is the conversion between text and speech using speech synthesis and speech recognition. However, such conversion is rare between other modalities. In particular, relatively little research has considered the transformation from general text to pictorial representations. This project will develop general-purpose Text-to-Picture synthesis algorithms that automatically generate pictures from natural language sentences so that each picture conveys the main meaning of the text. Unlike prior systems that require hand-crafted narrative descriptions of a scene, these algorithms will generate static or animated pictures that represent the important objects, spatial relations, and actions in general text. Key components include extracting important information from text, generating a corresponding image for each piece of information, composing the images into a coherent picture, and evaluating the result. The proposed approach uses statistical machine learning and draws on ideas from machine translation, text summarization, text-to-speech synthesis, computer vision, and graphics. This research will produce computational methods as well as working systems.
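The key components listed above can be pictured as a small pipeline. The following Python sketch is illustrative only and is not the project's method: the stopword list, the `IMAGE_LIBRARY` lookup table, the frequency-based keyphrase ranking, and the left-to-right layout are all hypothetical stand-ins for the statistical models the project proposes, and the evaluation stage is omitted.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical resources, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "on", "in", "of", "and", "is", "to", "with"}
IMAGE_LIBRARY = {"cat": "cat.png", "mat": "mat.png", "dog": "dog.png"}


@dataclass
class Placement:
    phrase: str  # the piece of text the image stands for
    image: str   # file name of the chosen image
    x: int       # horizontal centre on the canvas, in pixels
    y: int       # vertical centre on the canvas, in pixels


def extract_keyphrases(text, k=3):
    """Stage 1 stand-in: keep the k most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    counts = Counter(tokens)
    # sorted() is stable, so equally frequent tokens keep text order.
    ranked = sorted(dict.fromkeys(tokens), key=lambda t: -counts[t])
    return ranked[:k]


def select_images(phrases):
    """Stage 2 stand-in: look each phrase up in a fixed image table."""
    return [(p, IMAGE_LIBRARY[p]) for p in phrases if p in IMAGE_LIBRARY]


def compose(pairs, canvas_width=600, y=100):
    """Stage 3 stand-in: spread the images evenly from left to right."""
    if not pairs:
        return []
    slot = canvas_width // len(pairs)
    return [Placement(p, img, i * slot + slot // 2, y)
            for i, (p, img) in enumerate(pairs)]


def text_to_picture(text):
    """Chain the three stages into one layout."""
    return compose(select_images(extract_keyphrases(text)))
```

For example, `text_to_picture("The cat sat on the mat.")` returns two placements, one for "cat" and one for "mat". The word "sat" is dropped because the hypothetical table has no image for it, which shows why actions and spatial relations need more than a word-to-image lookup.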
Text-to-picture synthesis is likely to have several important broad impacts. First, it has the potential to improve literacy across a range of groups, including children who need additional support in learning to read and adults who are learning a second language. Second, it may serve as an assistive communication tool for people with disabilities such as dyslexia and brain damage, and as a universal language when a message must reach many people who speak different languages at the same time. Third, it can serve as a summarization tool for rapidly browsing long text documents. This research will foster collaboration between researchers in computer science and other disciplines, including psychology and education. Results of the project will be disseminated through technical publications, public web pages and software, seminars and talks, and classroom education.