Lifespans continue to increase, chronic disease survival rates have improved drastically, and treatments are being discovered for a variety of illnesses. This rapidly changing scenario requires patients to participate in their recovery and to understand written information and directions, which calls for increasingly sophisticated health literacy. However, the time practitioners have available, and the other resources for explaining the required information, have not increased to match. As a result, finding efficient means of improving patient health literacy is an increasingly important topic in healthcare. Increased health literacy may promote healthy lifestyle behaviors and increase access to health services by the population. It has been argued that for the Patient Protection and Affordable Care Act to be successful, more effort is needed to increase the health literacy of millions of Americans. Similarly, the Healthy People 2020 statement by the Department of Health and Human Services identified improving health literacy (HC/HIT-1) as an important national goal. The broad, long-term objectives of this project are to contribute to increasing the health literacy of patients and health information consumers and to provide caregivers with an evidence-based tool for simplifying text. Readability formulas are the most commonly used tools for estimating the difficulty of text. They are not sufficient, however, because there is no evidence to support a connection between their use and decreases in difficulty. This project addresses the problem by using modern resources and techniques to discover the traits that make health-related text difficult and by developing a tool to guide the simplification of text.
There are four specific aims of this project: 1) Identify differentiating features of easy versus difficult texts, 2) Design a simplification strategy using computer algorithms, 3) Measure the impact of simplification on perceived and actual text difficulty with online participants and a representative community sample, 4) Create free, online software that incorporates proven features algorithmically. Corpus analysis will be conducted to compare easy and difficult texts with each other and to discover the lexical, grammatical, semantic, and composition and discourse features typical of each. Then, simplification algorithms will be designed and developed, relying either on rule-based techniques that leverage available resources, e.g., vocabularies, or on machine learning approaches for discovering the best combinations of features for simplification. A representative writer will simplify text by relying on the suggestions provided by an online tool that uses the simplification algorithms. The effect of simplification will be tested in comprehensive user studies that evaluate both actual and perceived difficulty. Features successfully shown to decrease text difficulty will be incorporated in an online software program designed to reduce text difficulty.
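As a rough illustration of the rule-based approach described above, the following Python sketch flags words that are likely to be difficult and suggests a more familiar replacement for the writer to accept or reject. It is a minimal sketch under stated assumptions, not the project's algorithm: word frequency is used here as a stand-in for difficulty, and the names WORD_FREQUENCY, SIMPLER_TERMS, term_frequency, and suggest_simplifications, the frequency values, and the threshold are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical frequency table (occurrences per million words). The values are
# invented for illustration; a real tool would derive them from a large corpus.
WORD_FREQUENCY = {
    "the": 50000.0, "can": 3000.0, "high": 900.0, "cause": 400.0,
    "blood": 300.0, "pressure": 250.0, "damage": 120.0, "kidney": 40.0,
    "hypertension": 3.0, "renal": 1.5,
}

# Hypothetical vocabulary resource mapping a term to candidate alternatives.
SIMPLER_TERMS = {
    "hypertension": ["high blood pressure"],
    "renal": ["kidney"],
}


def term_frequency(term):
    """Return the frequency of a term; unknown words count as 0.

    A multi-word term is treated as being as familiar as its rarest word.
    """
    return min(WORD_FREQUENCY.get(word, 0.0) for word in term.lower().split())


def suggest_simplifications(text, threshold=10.0):
    """Return (original word, suggested replacement) pairs for rare words.

    Assumption: a word whose frequency is below `threshold` is treated as
    difficult. Only alternatives more frequent than the original are offered.
    """
    suggestions = []
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if term_frequency(key) >= threshold:
            continue  # familiar enough, leave it alone
        candidates = [
            candidate
            for candidate in SIMPLER_TERMS.get(key, [])
            if term_frequency(candidate) > term_frequency(key)
        ]
        if candidates:
            # Offer the most familiar candidate to the writer.
            suggestions.append((word, max(candidates, key=term_frequency)))
    return suggestions


if __name__ == "__main__":
    for original, replacement in suggest_simplifications(
        "Hypertension can cause renal damage."
    ):
        print(f"Consider replacing '{original}' with '{replacement}'")
```

The sketch only produces suggestions and leaves the rewriting to a person, mirroring the abstract's description of a writer who simplifies text with the help of a tool. Whether any such substitution actually reduces difficulty is the kind of question the planned user studies are meant to answer.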

Public Health Relevance

Improving health literacy is an important national goal, and health literacy is a necessary trait for a healthy population. Providing understandable information is critical, but few tools exist to help write understandable text. We aim to discover features indicative of difficult text, design translation algorithms, and create a free, online software tool for rewriting health-related text with demonstrated impact on perceived and actual text difficulty.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Vanbiervliet, Alan
University of Arizona
Sch of Business/Public Admin
United States
Kloehn, Nicholas; Leroy, Gondy; Kauchak, David et al. (2018) Improving Consumer Understanding of Medical Text: Development and Validation of a New SubSimplify Algorithm to Automatically Generate Term Explanations in English and Spanish. J Med Internet Res 20:e10779
Kauchak, David; Leroy, Gondy; Hogue, Alan (2017) Measuring Text Difficulty Using Parse-Tree Frequency. J Assoc Inf Sci Technol 68:2088-2100
Mukherjee, Partha; Leroy, Gondy; Kauchak, David et al. (2017) The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study. AMIA Annu Symp Proc 2017:1322-1331
Gu, Yang; Leroy, Gondy; Kauchak, David (2017) When synonyms are not enough: Optimal parenthetical insertion for text simplification. AMIA Annu Symp Proc 2017:810-819
Mukherjee, Partha; Leroy, Gondy; Kauchak, David et al. (2017) NegAIT: A new parser for medical text simplification using morphological, sentential and double negation. J Biomed Inform 69:55-62
Kauchak, David; Leroy, Gondy (2016) Moving Beyond Readability Metrics for Health-Related Text Simplification. IT Prof 18:45-51
Leroy, Gondy; Kauchak, David; Hogue, Alan (2016) Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. J Health Commun 21 Suppl 1:18-26