This project will develop and empirically evaluate methods for creating subjective knowledge bases: databases of opinions and viewpoints as they are asserted by individuals in books, web forums, and social media. While most knowledge base research seeks to extract real-world truth from text, many assertions in text either concern inherently subjective propositions (such as "apples are delicious") or are non-subjective claims that contradict the beliefs of other people or even consensus reality (such as "the Earth is flat"). This project pioneers new methods to automatically extract expressions of opinions and viewpoints from a textual corpus and to use those assertions to build a subjective knowledge base that can accommodate contradictory and conflicting statements from different authors. Such a subjective knowledge base will help researchers answer a range of questions: What contradictory claims are being made in historical books or in contemporary social media? What propositions does a particular ideological community hold, and are they compatible with, or contradictory to, those held by other communities? This project lays the foundation for understanding a broad range of phenomena that can be seen as conflicts between coherent viewpoints. The resulting computational models will support intelligent systems that are robust to the way propositions are actually used in the real world; as artificial intelligence applications are increasingly deployed in social contexts, this research will inform them with more nuanced information about the diversity of human viewpoints. This work will also include a substantial educational component, incorporating human context into algorithm design in undergraduate STEM education and broadening the use of natural language processing and machine learning across a range of disciplines.
While previous work has focused on identifying degrees of certainty (belief, viewpoints) in text, the primary contribution of this project will be modeling the structure of individual extracted viewpoints through two variables: the viewpoint holders and the viewpoint communities to which they belong. Models for building subjective knowledge bases accept subjective claims as fully semantic relational propositions, as in recent research in open information extraction. However, instead of relying on the typical assumption of cross-document consensus, these models will embrace the simultaneous presence of contradictory claims across different author groups or even within the writings of the same individual. Major project components include: developing and refining broad-domain part-of-speech tagging and syntactic parsing to be effective across both social media and historical books; using these tools to support author-centric latent-variable models of structured knowledge, which infer latent positions for both propositions and their viewpoint holders; and improving the models with linguistic analysis of factuality and viewpoint (belief) commitment.
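As an illustration only, the idea of a knowledge base that keeps contradictory relational claims attributed to their holders, rather than resolving them into a single consensus, might be sketched in Python as follows. This is a minimal sketch and not the project's models: the names `Claim`, `SubjectiveKB`, `holders`, and `contested` are hypothetical, the example holders are invented, and the sketch covers only storage and lookup, not the latent-variable inference or the linguistic analysis described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A relational proposition attributed to one viewpoint holder (hypothetical schema)."""
    holder: str      # who asserts or denies the proposition
    subject: str
    relation: str
    obj: str
    polarity: bool   # True = asserted, False = denied


class SubjectiveKB:
    """Stores every stance per holder instead of keeping one 'true' value."""

    def __init__(self):
        # (subject, relation, obj) -> {holder -> set of polarities expressed}
        self._stances = defaultdict(lambda: defaultdict(set))

    def add(self, claim):
        key = (claim.subject, claim.relation, claim.obj)
        self._stances[key][claim.holder].add(claim.polarity)

    def holders(self, subject, relation, obj, polarity=True):
        """Holders who expressed the given polarity toward a proposition."""
        by_holder = self._stances.get((subject, relation, obj), {})
        return sorted(h for h, seen in by_holder.items() if polarity in seen)

    def contested(self):
        """Propositions both asserted and denied, across holders or by one holder."""
        result = []
        for key, by_holder in self._stances.items():
            polarities = set().union(*by_holder.values())
            if polarities == {True, False}:
                result.append(key)
        return sorted(result)


# Invented example data, using the propositions quoted in the abstract.
kb = SubjectiveKB()
kb.add(Claim("author_a", "the Earth", "is", "flat", True))
kb.add(Claim("author_b", "the Earth", "is", "flat", False))
kb.add(Claim("author_a", "apples", "are", "delicious", True))
```

Here `kb.contested()` reports only the flat-Earth proposition, because it is the only one that is both asserted and denied, and `kb.holders("the Earth", "is", "flat")` lists who asserts it. Keying stances by holder is one simple way to let conflicting claims coexist; grouping holders into viewpoint communities would need an additional layer that this sketch omits.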
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.