Big data brings big challenges, not only in the volume of data but also in its dynamics and variety. Multiple descriptions of the same set of objects or events from different sources unavoidably lead to inconsistent data or information. Among conflicting pieces of data or information, it is therefore crucial to tell which data source is reliable and which piece of information is correct. Accurate information is referred to as the truth, and the chance that a source provides accurate information is referred to as source reliability or trustworthiness. The objective of this project is to detect truths without supervision by integrating source reliability estimation and truth finding. A unified framework is developed to model complex trustworthiness factors, heterogeneous data types, incremental and parallel computation, and source and data dependencies, so that truth and trustworthiness can be inferred from multiple conflicting sources of heterogeneous, disparate, correlated, gigantic, scattered, and streaming data.
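To make the idea of integrating source reliability estimation with truth finding concrete, the following is a minimal sketch of a generic iterative weighted-voting scheme for categorical claims. It is not the project's unified framework; the function name `discover_truths`, the smoothing constants, the log-based weight update, and the toy claims are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Alternate between truth finding and source reliability estimation.

    claims maps (source, object) -> claimed value (hypothetical input format).
    Returns (truths, weights): the estimated truth per object and a
    reliability weight per source.
    """
    sources = {s for s, _ in claims}
    objects = {o for _, o in claims}
    weights = {s: 1.0 for s in sources}  # start with equal trust, no supervision
    truths = {}

    for _ in range(iterations):
        # Truth finding: each object takes the value with the largest
        # total weight among the sources that claim it.
        for o in objects:
            votes = defaultdict(float)
            for (s, obj), value in claims.items():
                if obj == o:
                    votes[value] += weights[s]
            # sorted() makes tie-breaking deterministic (alphabetical order).
            truths[o] = max(sorted(votes), key=votes.get)

        # Reliability estimation: a source that disagrees with the current
        # truths more often gets a lower weight. The error rate is smoothed
        # (assumed constants) so it stays strictly between 0 and 1.
        for s in sources:
            made = [(o, v) for (src, o), v in claims.items() if src == s]
            wrong = sum(1 for o, v in made if truths[o] != v)
            error_rate = (wrong + 0.5) / (len(made) + 1.0)
            weights[s] = -math.log(error_rate)

    return truths, weights


# Toy example: sources A and B agree on two objects, C disagrees;
# only A and C report on the third object, so a plain vote would tie.
claims = {
    ("A", "o1"): "Paris", ("B", "o1"): "Paris", ("C", "o1"): "Lyon",
    ("A", "o2"): "Berlin", ("B", "o2"): "Berlin", ("C", "o2"): "Munich",
    ("A", "o3"): "Rome", ("C", "o3"): "Milan",
}
truths, weights = discover_truths(claims)
```

In this toy example the third object starts as a tie between sources A and C, and the reliability learned from the first two objects breaks the tie in favor of A. Handling heterogeneous data types, streaming data, or source dependencies, as the project describes, would require going well beyond this sketch.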
This project makes tangible contributions to data integration, information understanding, and decision making, and it benefits many applications in which critical decisions must be based on correct information extracted from diverse sources. Research results of this project are integrated into course materials and projects and into the training of students and a new generation of researchers, especially female and minority students. For further information about this project, please refer to the project website: www.cse.buffalo.edu/~jing/truth.htm