Adobe’s Portable Document Format (PDF) has become the standard for electronic documents. Academic and collegiate papers, business write-ups and fact sheets, advertisements for print, and other material meant to be viewed as a final product all use PDF. Because end users cannot easily change the text of a PDF document, most have come to expect a degree of integrity in every PDF document they encounter. Nevertheless, content masking attacks have been discovered that undermine the content integrity of PDF documents. Specifically, these attacks cause human readers to see a masked version of the content, which differs from what computer systems read from the same document. This project will create techniques and tools to defend against content masking attacks on document integrity. Because digital documents are widely used by government and commercial entities, the project has the potential to significantly influence research and engineering efforts in the field of document security. By substantially strengthening PDF content integrity, the project could also promote wide adoption of PDF content verification tools in today's data loss prevention systems.
This project will improve the state of the art in document security by creating a set of defense tools that assure the integrity of documents. The research team aims to (i) design a lightweight PDF font verification tool that can effectively verify the integrity of font files embedded in documents; (ii) understand the impact of adversarial machine learning on document integrity; (iii) create an advanced content verification tool, which integrates random partition, feature extraction, and decision aggregation, to address content masking attacks that leverage adversarial machine learning techniques; and (iv) perform a comprehensive evaluation to ensure the efficiency, reliability, and security of the designed content integrity verification tools.
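The three stages named in aim (iii) can be illustrated with a minimal sketch. It is not the project's tool: the abstract does not say what is partitioned, which features are extracted, or how decisions are combined. The sketch assumes the input is two aligned word lists, the words a program extracts from the PDF and the words a human would see on the rendered page (for example, obtained by OCR). It also assumes a single feature (the mismatch rate within a partition) and a majority vote as the aggregation rule. All function names, the threshold, and the partition count are hypothetical.

```python
import random


def random_partition(items, n_parts, rng):
    """Shuffle the items and deal them into n_parts groups (hypothetical helper)."""
    indices = list(range(len(items)))
    rng.shuffle(indices)
    return [[items[i] for i in indices[k::n_parts]] for k in range(n_parts)]


def mismatch_rate(pairs):
    """Feature (assumed): fraction of (extracted, rendered) word pairs that disagree."""
    if not pairs:
        return 0.0
    return sum(1 for extracted, rendered in pairs if extracted != rendered) / len(pairs)


def is_suspicious(extracted_words, rendered_words, n_parts=5, threshold=0.1, seed=0):
    """Flag a document when most partitions show a high mismatch rate.

    Assumptions: the two word lists are already aligned (zip drops any
    unmatched tail), and a simple majority vote aggregates the decisions.
    """
    rng = random.Random(seed)
    pairs = list(zip(extracted_words, rendered_words))
    parts = [p for p in random_partition(pairs, n_parts, rng) if p]
    votes = [mismatch_rate(p) > threshold for p in parts]
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    seen = "the quick brown fox jumps over the lazy dog".split()
    print(is_suspicious(seen, seen))               # False: extracted text matches the rendering
    print(is_suspicious(["x"] * len(seen), seen))  # True: every extracted word differs
```

A majority vote is only one possible aggregation rule. A stricter policy that flags the document when any single partition exceeds the threshold would catch attacks that mask only a few words, at the cost of more false alarms.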
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.