This Small Business Innovation Research Phase I project will test the commercial feasibility of "Custom Computer Programming Services" supporting human and machine text classification. Initially, the company's research, tools, and methods will target public and private sector markets where the ability to review, analyze, and summarize public communications in e-mail, blogs, Web sites, Twitter feeds, and many other forms of digitized documents is crucial. The effort will support programming and evaluation activities for the proposed "Sifter" document classification software. The SBIR-funded research will focus on three key technical challenges that must be met to ensure commercial success: usability, reliability, and scalability.

Texifter's technology addresses a widespread government and private sector problem: e-mail and other Web-based public comment channels currently overload government rule-writers and other text analysts. In focus group sessions held between 2003 and 2008, involving over 200 federal agency end-users from more than a dozen of the 180 major federal rule-writing agencies, this overload was identified as among the most significant concerns facing federal agency and congressional personnel today. If successful, this effort will improve the democratic underpinning of the public comment process. The Sifter tool will enable the voice of the public to be heard more clearly by agency officials by amplifying genuinely unique utterances while suppressing the stifling effect of duplicate and insubstantial public comments.

Project Report

During the Phase I/IB period, Texifter's research, tools, and methods were planned to target public sector markets where there is a growing need for the ability to search, filter, review, code, analyze, and summarize public communications found in mass email campaigns, blog commentaries, Web sites, databases, Twitter feeds, and many other forms of digitized documents. The Phase IB supplement was intended as a follow-on to the commercialization efforts of Phase I, integrating live feed imports from partner and other third-party APIs to further ease ingestion of documents into the system and to open up possibilities for a private sector commercialization strategy. The Phase I and IB SBIR awards supported research, development, programming, and evaluation activities for Texifter's text analysis software. During the Phase I period, SBIR-funded technical research focused on three key challenges that must be met to ensure commercial success: scalability, reliability, and usability. During the last two months under the Phase IB supplement, Texifter explored alternative ingestion methods using external partner and third-party APIs. Concurrently, Texifter has been developing channels and implementing strategic plans targeting distinct market verticals beyond public communication with government officials, for example, eDiscovery in social media, market research, and educational licensing. These experiments led to a decision to prepare to cross the market chasm by entering the legal hold segment of the eDiscovery market during 2011. Texifter has a disruptive product for faster, more efficient, cheaper, highly accurate, and reusable application of retention policies to large email collections. Much of the usability research during the Phase I and IB projects consisted of informal user studies and feedback from current and beta users of both PCAT and DiscoverText. We also conducted some informal tests of the scalability of both PCAT and DiscoverText.
To provide some background, the proof of concept platform CAT has close to 1,900 user accounts but, according to Google Analytics, rarely more than 50-60 users in a single day. Planned Phase I activities included systematic PCAT load-testing experiments. Although no formal scalability studies were conducted, we did perform some informal testing of the system under various data loads. During the first quarter of 2011, we executed a small formal user study in conjunction with the University of Washington’s department of Human and Computer Interaction. From this small user study, we elicited a substantial "To Do" list of feedback from the UI students and experts at UW, and we have since gone through the list and completed tasks on it. As for formal testing, we plan to conduct load-testing experiments once the usability pieces are in place. As a small business with ever-changing requirements and a pressing need for good user interaction, we will not have adequate resources for formal scalability testing until after we settle the usability side of the software. Doing so will free up personnel from their work on the user experience to run formal scalability tests. Although the technical research was informal, we have gained much insight into the voice of our own customers, as well as into potential markets for our software. Moving past the Phase I and IB periods and into an SBIR Phase II, we intend to further enhance our DiscoverText product to make it more attractive to the eDiscovery and educational markets. We believe that the social media focus of our software will be a huge boost for eDiscovery and, in turn, will make our software more attractive to potential investors, a much-needed source of funding for Texifter in the months to come.
By also marketing our software to educational markets, we intend to promote our research and software through word-of-mouth advertising, to familiarize end users with the functions available in DiscoverText, and to let them find uses for the software in their own classrooms, research, and beyond.

March 3, 2011 - NSF Award 1014000 - Texifter, LLC. - Phase 1 / 1B Final Report

Project Start:
Project End:
Budget Start: 2010-07-01
Budget End: 2011-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $180,000
Indirect Cost:
Name: Texifter, LLC
Department:
Type:
DUNS #:
City: Amherst
State: MA
Country: United States
Zip Code: 01002