The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project will be improved effectiveness of document management systems in U.S. businesses. The project integrates novel approaches to unsupervised machine learning, concept identification, and ontology construction to create a sustainable content management system that will allow companies to find and associate information more accurately and efficiently. Corporations run on information, and routine operations depend on finding information efficiently. For example, corporate acquisitions require filing information quickly in the acquiring company's systems; employee turnover necessitates intelligent analysis to enable continuing operations; and regulatory compliance and legal retention requirements demand consistent categorization and correct retention of records. While creating electronic documents is easy, finding and analyzing them remain difficult tasks. The proposed project is intended to provide effective assistance to companies, within everyday business practices, without requiring major investments in change. Distribution of information between corporate data centers and the cloud further necessitates tools to help with classification consistency and searchability. If successful, this project will provide an encompassing framework within which company workflows are integrated and corporate workers can more easily and efficiently extract usable information from corporate IT systems.

This Small Business Innovation Research (SBIR) Phase I project provides new software tools for knowledge workers. Industrial information technology requires the integration of proven methods in a robust, sustainable framework. The investigators' prior work in artificial intelligence demonstrated that a well-designed framework, with open source packages and interstitial software, can provide an effective knowledge management system. In this project, the company intends to mine and extend research ideas from knowledge management, artificial intelligence, natural language processing, machine learning, information retrieval, and human-computer interfaces. Work in artificial intelligence has shown that domain knowledge is necessary for high-performance problem solving. The company intends to leverage corporate knowledge to augment keyword search with semantics of the domain. Concept identification methods developed for natural language processing will be used to augment the powerful statistical tools provided by unsupervised machine learning and information retrieval technology.

Project Report

PD/PI Name: Bruce G Buchanan, Principal Investigator
Recipient Organization: i2k Connect LLC
Project/Grant Period: 07/01/2014 - 12/31/2014
Reporting Period: 07/01/2014 - 12/31/2014

In this SBIR Phase I project, we were able to launch a new start-up company, i2k Connect. The product we are developing is software that will classify documents and send personal alerts about new documents of interest. Large companies store many millions of documents, mostly on their computers. Finding information is problematic with simple keyword searching. Our software adds information that describes what each document is about, not just the words that are mentioned. It is also difficult for people to learn about new developments that make a difference to them. Our software sends alerts when new documents of interest appear on the web or in the company's computers.

We accomplished our three main goals: (a) we interviewed several dozen potential clients to verify that they would find our software useful in their business environments; (b) we established a joint development project with one paying client; and (c) we met our milestones for technical development of the prototype software. In effect, our software acts like an intelligent assistant who is providing information to people.

(a) The people we interviewed mentioned several different ways our software could be helpful. For example, they expressed a need to manage many millions of documents: to classify them according to their company's policies about saving or discarding various types, to eliminate duplicates, and to analyze what was there. They also mentioned a need to be alerted to new information: to act on breaking news as soon as it is posted on the Internet, and to know about new developments being discussed within internal company documents.

(b) In our joint development project, we are providing an alerting service to managers in a Fortune-500 company who ordinarily spend considerable time monitoring online news services for breaking news that may impact the company's business. Our intelligent assistant continually tracks over a hundred news sources and selects the few stories that fit the managers' criteria for what is interesting. We also work with their technical staff to tailor our software to their needs.

(c) Our milestones for technical development were scheduled to improve the software to the point that it could provide a useful service. We had started with software that classifies documents for one non-profit scientific society with a specific focus. We improved the program's abilities to classify documents within new classification systems, to recognize duplicate versions of the same document, to learn new classifiers, to identify key locations mentioned in documents, and to display results in an understandable way.

For decades, information in text records has not been readily accessible unless it was read and interpreted manually. Until very recently, keyword searching was essentially the only automated way to search or analyze text documents. We are part of a growing trend to change that. Although considerable development remains to turn our software into a commercial product, it is proving to be useful in the commercial world.

Project Start:
Project End:
Budget Start: 2014-07-01
Budget End: 2014-12-31
Support Year:
Fiscal Year: 2014
Total Cost: $143,800
Indirect Cost:
Name: i2k Connect LLC
Department:
Type:
DUNS #:
City: Missouri City
State: TX
Country: United States
Zip Code: 77459