Graph data mining and learning via text analytics over financial documents can be a powerful tool for enhancing financial risk measurement and management and for enabling regulatory oversight and compliance. This proposal seeks to develop novel knowledge graph mining methods for the vast set of financial news and event streams and disclosure documents (filings, call reports, etc.) from financial institutions, for systemic risk assessment of the banking and financial system. The methods developed in this proposal will help both financial institutions, such as banks and insurance firms, and the regulators of those institutions. Risk indicators based on text analytics can be used for a range of asset and risk management decisions by the former, and for identifying specific risk types for regulatory oversight by the latter. Since textual data is ubiquitous, and extracting valuable insights from text remains a challenge, the risk analysis techniques developed in this project will also help analyze other domains where textual data can offer valuable insights. Examples include international crises, natural disasters, and humanitarian efforts, where an assessment of risk is essential for an appropriate response.

In this research project, the PI will develop innovative data mining and learning methods to create a "financial risk" knowledge graph from textual and semantic features mined from the publicly available annual and quarterly reports filed with the SEC. The PI will also use textual data from news articles and credit assessment reports. The key underlying methods rely on developing innovative, state-of-the-art text and graph mining methods that yield effective domain-specific approaches for dealing with financial text. In particular, the PI plans to define, extract, and track nuance and sentiment topics corresponding to different risk exposures of financial institutions, such as credit, interest rate, liquidity, and exchange rate risks, as well as the impact of operational, regulatory, and reputation risks, to improve risk prediction and monitoring. The outcome of this task will be a rich knowledge graph of risk based on text and graph mining. The knowledge graph will comprise risk-nuanced sentiment words and phrases for use in risk analytics, together with a set of financial risk concepts and relationships among different entities.
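
As a rough illustration of the kind of structure described above, the following minimal sketch links entities to risk types by counting lexicon phrases in their text. It is not the project's method: the lexicon `RISK_LEXICON`, the function `build_risk_graph`, the phrases, and the sample sentences are all hypothetical, and simple phrase counting stands in for the text and graph mining methods the project proposes to develop.

```python
from collections import defaultdict

# Hypothetical lexicon (assumption): risk type -> indicative phrases.
RISK_LEXICON = {
    "credit": ["loan losses", "default", "charge-offs"],
    "liquidity": ["funding shortfall", "deposit outflows"],
    "interest_rate": ["rate increases", "net interest margin"],
}


def build_risk_graph(documents):
    """Return {(entity, risk_type): phrase_hits} from (entity, text) pairs.

    Each key is an edge from an entity to a risk type; the value is the
    number of lexicon phrase occurrences found in that entity's text.
    """
    edges = defaultdict(int)
    for entity, text in documents:
        lowered = text.lower()
        for risk_type, phrases in RISK_LEXICON.items():
            hits = sum(lowered.count(phrase) for phrase in phrases)
            if hits:
                edges[(entity, risk_type)] += hits
    return dict(edges)


# Invented sample text (assumption), standing in for filings or news.
docs = [
    ("Bank A", "Loan losses rose and default rates climbed amid rate increases."),
    ("Bank B", "Deposit outflows created a funding shortfall."),
]
graph = build_risk_graph(docs)
```

Here `graph` maps each (entity, risk type) pair to a weight, which could serve as an edge weight in a larger graph; a realistic pipeline would replace the fixed lexicon with learned, risk-nuanced sentiment terms.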

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer
Sylvia Spengler
Rensselaer Polytechnic Institute
United States