Unstructured data is spread across a gamut of media, including electronic communications such as emails, chat transcripts, and telephone transcripts. This variety makes deriving intelligence from unstructured text a time-consuming, labor-intensive, and at times impossible task. The technology solutions proposed here will have a significant impact in many different areas. For example, because unstructured data represents up to 80 percent of all data in the banking and finance industry, the proposed solutions will provide a means of sifting through this data much faster than existing tools, helping to identify fraud and compliance breaches and reducing costs for taxpayers and financial firms. Similarly, in the healthcare domain, more efficient representations of digitized medical data can help physicians better tailor treatments to patient history, or identify health trends to benefit public health.
The team is leveraging its research in natural language processing (NLP), machine learning (ML), and social network analysis (specifically, extracting social networks from narrative text using the notion of a 'social event') to build a product suite called eExplorer. Rather than trying to convert unstructured text into a traditional structured database design, eExplorer overlays an adaptable, flexible, and dynamic soft structure on the unstructured text. eExplorer significantly reduces the time between data collection and analysis by supporting interaction between an analyst and the data. While exploring the data, an analyst may provide examples of the kind of structure they want to impose on it. Using NLP and ML techniques, eExplorer learns the type of structure the analyst is trying to impose and applies a flexible soft structure to the entire data set. This has the additional advantage that each analyst may build their own view (or soft structure) of the data.
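The example-driven loop described above can be illustrated with a small sketch. It is not eExplorer's implementation, which this document does not specify; it assumes, purely for illustration, that an analyst's examples are short labeled text snippets and that a "soft structure" can be approximated by a probability per label, computed here with a plain naive Bayes classifier. The class name `SoftLabeler`, the labels, and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class SoftLabeler:
    """Hypothetical stand-in: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def add_example(self, text, label):
        """Record one analyst-provided example of the desired structure."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def soft_labels(self, text):
        """Return a probability for every known label (the 'soft' part)."""
        if not self.doc_counts:
            return {}
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Normalize the log scores into probabilities that sum to 1.
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


# An analyst supplies a handful of examples (invented for this sketch).
labeler = SoftLabeler()
labeler.add_example("Please wire the funds before the audit notices", "possible_fraud")
labeler.add_example("Move the money offshore and delete this email", "possible_fraud")
labeler.add_example("Lunch meeting moved to noon on Friday", "routine")
labeler.add_example("Attached is the agenda for the team meeting", "routine")

# The learned structure is then applied to the rest of the data.
corpus = [
    "Delete the email after you wire the money",
    "The agenda for Friday lunch is attached",
]
view = [(doc, labeler.soft_labels(doc)) for doc in corpus]
for doc, labels in view:
    print(doc, {k: round(v, 2) for k, v in labels.items()})
```

Because each analyst would hold a separate `SoftLabeler` instance trained on their own examples, two analysts could derive different views of the same corpus without changing the underlying text.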