The innovation is a linguistically driven machine learning system that will extract financial data from financial text, such as 10-Q documents, with an accuracy of over 85%. To be useful to analysts, each extracted data point must be a triple of "Financial Concept", "Numeric Value" and "Date Range." Because sentences in the financial domain are complex, detecting the Financial Concept and attaching it to the correct Numeric Value and Date Range remains a challenge: current financial extraction systems achieve an accuracy of less than 50%. The proposed method will combine Financial Named Entity Recognition, Semantic Nearest Neighbor location and Support Vector Machines to raise the accuracy of Financial Concept detection, attachment and semantic tagging to 85%. At the end of Phase I, the method will demonstrate the feasibility of financial data extraction on the Notes section of 10-Q documents. In Phase II, these methods will be combined into an end-to-end 'Automatic Extraction of Financial Data from Text' system whose output can be used directly by computerized systems. The Phase II system will be designed to scale to very large data sets, including documents in non-American English, and to process them in near real time.
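The attachment problem described above (linking each Financial Concept to the right Numeric Value and Date Range) can be pictured with a minimal sketch. It assumes an upstream Financial Named Entity Recognition step has already tagged concept, value and date spans with their token positions, and it substitutes plain token distance for the proposal's Semantic Nearest Neighbor location; the Support Vector Machine tagging step is omitted. The names `FinancialTriple`, `nearest` and `attach` are hypothetical and are not taken from BCL Technologies' system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A mention is (surface text, token position in the sentence).
# Hypothetical representation: assumes an upstream NER step already found these spans.
Mention = Tuple[str, int]


@dataclass(frozen=True)
class FinancialTriple:
    concept: str               # Financial Concept, e.g. "revenue"
    value: str                 # Numeric Value, e.g. "$1.2 million"
    date_range: Optional[str]  # Date Range, or None if no date was found


def nearest(anchor: int, candidates: List[Mention]) -> Optional[Mention]:
    """Return the candidate whose token position is closest to `anchor`."""
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs(m[1] - anchor))


def attach(concepts: List[Mention],
           values: List[Mention],
           dates: List[Mention]) -> List[FinancialTriple]:
    """Attach each concept to its nearest numeric value and date range.

    Token distance is a simplified stand-in for semantic nearest-neighbor
    location; it ignores syntax and can mis-attach in long sentences.
    """
    triples = []
    for text, pos in concepts:
        value = nearest(pos, values)
        if value is None:
            continue  # no numeric value to attach this concept to
        date = nearest(pos, dates)
        triples.append(FinancialTriple(text, value[0], date[0] if date else None))
    return triples


if __name__ == "__main__":
    # "Revenue increased to $1.2 million for the three months ended
    #  March 31, 2013, while operating expenses were $0.8 million."
    concepts = [("revenue", 0), ("operating expenses", 14)]
    values = [("$1.2 million", 3), ("$0.8 million", 17)]
    dates = [("three months ended March 31, 2013", 7)]
    for triple in attach(concepts, values, dates):
        print(triple)
```

In this example both concepts share the single date range, while each picks up the numeric value closest to it. Sentences where the nearest number belongs to a different concept are exactly the cases that a purely positional heuristic like this gets wrong.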

The broader/commercial impact of the 'Automatic Extraction of Financial Data from Text' system is that relevant financial information becomes available in computer-readable format, with high accuracy, in near real time. Currently, data embedded in financial text are extracted manually by hundreds of people working for data warehouses. This manual effort takes on the order of weeks, so the bulk of the data is not available in an easily computer-usable form in real time. The benefits of the system will fall into three areas: 1. algorithmic trading programs will be able to use all data published worldwide immediately after publication; 2. financial data warehouses will be able to offer a far wider range of data concepts (there are 18,498 concepts in the US Generally Accepted Accounting Principles taxonomy, versus fewer than 180 available in commercial data warehouses); 3. the financial market will become more transparent as financial information embedded in text becomes computer readable. Algorithmic trading was estimated to exceed $5 trillion in 2012, with 750 billion shares traded and profits of over $600 million. Greater financial transparency is an intangible benefit that will improve financial market efficiency.

Project Report

In its SBIR Phase I and Phase IB research, BCL Technologies performed a multilingual feasibility study on tagging and extracting financial concepts, together with their corresponding monetary and temporal entities, from financial text in English, Spanish, Japanese and Chinese. We employed machine learning and natural language processing methods to identify financial concepts and link them to numerical entities. During Phase I, we implemented a financial data extraction and tagging system called 'Automatic Extraction of Financial Data from Text' for English-language documents. The Phase I English-language system extracted financial concepts, linked them with numerical values and dates, and applied semantic tags with an average accuracy of 89%.

The broader/commercial impact of the 'Automatic Extraction of Financial Data from Text' system is that relevant financial information becomes available in computer-readable format, with high accuracy, in near real time. Currently, data embedded in financial text are extracted manually by hundreds of people working at data warehouses. This manual effort takes on the order of weeks, so the bulk of the data is not available in an easily computer-usable form in real time. The benefits of the system will fall into three areas: 1. algorithmic trading programs will be able to use all data published worldwide immediately after publication; 2. financial data warehouses will be able to offer a far wider range of data concepts (there are over 18,000 concepts in the US Generally Accepted Accounting Principles taxonomy, versus fewer than 180 available in commercial data warehouses); 3. the financial market will become more transparent as financial information embedded in text becomes computer readable. Algorithmic trading was estimated to exceed $5 trillion in 2012, with 750 billion shares traded and profits of over $600 million. Greater financial transparency is an intangible benefit that will improve financial market efficiency.

Budget Start: 2013-07-01
Budget End: 2014-06-30
Fiscal Year: 2013
Total Cost: $179,999
Name: BCL Technologies
City: San Jose
State: CA
Country: United States
Zip Code: 95128