This project undertakes the first steps necessary to learn which simplifications and transformations in medical text increase understanding. A corpus of documents from sites such as WebMD, government health sites, patient educational material and patient blogs will be compared for grammatical and vocabulary features and their frequency of occurrence in both complex and simplified document styles. The goal will be to find structures, whether new or previously found by others to be associated with understanding, that appear in one set but not in the other, or that appear with significantly lower frequency. This corpus will then be used to develop a second corpus with sentences containing the difficult linguistic structures and a parallel corpus with simplified versions. A user study will help relate understanding to specific structures and vocabulary. The project will focus on seniors because they constitute a large and growing portion of health information consumers. If successful, this project will lead to the development of a metric that reflects text characteristics associated with comprehension difficulties and the development of an "intra-lingual machine translation" program to move from difficult to easier-to-understand text. The intellectual merit lies in discovery of systematic differences in linguistic features in health and medical text that can be measured and that are associated with understanding by senior readers. The project is especially suitable as a SGER project because studies are necessary to evaluate the degree to which automatic text simplification can help. Such automatic simplification of medical information must be absolutely accurate. "Simplifications" that result in a different meaning are not acceptable within the healthcare field. At the same time, simplification must be fully automatic if it is to be useful in simplifying the vast amounts of text that are already online and that continue to be produced.
High-quality, fully automatic machine translation is currently not achievable on unrestricted text, so this research goal must be classified as fairly high risk. However, limiting the domain to medical texts and applying the approach to within-language "translation" make this goal more plausible.
Millions of people read health information online, but many do not fully understand it. Such misunderstanding of health information increases the number of unwise decisions and leads to poorer health and higher healthcare costs. Even a small improvement in readers' understanding could have a significant impact because it may lead to fewer unwise decisions. The broader impact lies in computational approaches to automatic simplification of medical texts and the effect that even a small increase in understanding may have on healthcare. This research, if successful, will point the way towards structures in medical text that are suitable for automatic simplification. This has the potential to make the vast amount of web-based medical and health information more accessible to consumers, resulting in more informed patients and ultimately better outcomes. The research may also provide some guidelines for the newly emerging phenomenon of electronic communication between healthcare providers and patients.