This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
This Small Business Innovation Research (SBIR) Phase II project involves development of real-time algorithms for Optical Character Recognition (OCR) from documents. This real-time recognition (RT/OCR) system, to be fully developed under this SBIR award, performs recognition an order of magnitude faster than current commercial systems and will allow recognition to be embedded on a device and performed at the time of capture. The RT/OCR system will also have no loss in recognition accuracy and will, in fact, be more accurate for complex documents that include color, graphics, and multiple fonts. This technology, when successfully commercialized within Phase II of the SBIR award, could be deployed on every corporate MFP and digital copier device, converting corporate paper to searchable electronic files and bringing us one step closer to the paperless office. The technology we intend to use in developing this real-time OCR system is based on methods using Intra- and Inter-Frame Machine Learning. The algorithms to be developed are not, in any way, language specific and can run on virtually any platform (e.g., a server or handheld device). The basic technology is completely different from the recognition kernels of current commercial OCR systems.
This project is focused on developing revolutionary technology that will take OCR to a new level. This technology is designed to bridge the gap between paper and digital media, providing a much-needed engine for the Bill Payment Machine (BMP), document capture, and document processing industries. The capture industry will grow to $2.42 billion in 2010, a CAGR of 16.4%. Real-time OCR for automated and semi-automated field coding addresses the needs of an industry that spends $14.5 billion per year on manual labor in the US alone. RT/OCR will be part of a solution that addresses manual paper-based indexing for complex documents, potentially saving the industry and the government billions of dollars every year. This recognition technology, after being successfully developed and commercialized within the context of the Phase II research and development, can be generalized and extended to handle real-time video recognition, with application to autonomous vehicle navigation, aids for the visually impaired, and robotic factory automation.
Real-Time OCR
Millions of digital images and documents are created every day. Business documents, such as invoices and reports, are printed and scanned, and smartphones and cameras are used to take digital photos. However, it is often not possible to access the information in these documents, or to find the images using search engines, because the documents and images typically remain in their original scanned form. That is, they are stored as image and graphics files. Optical Character Recognition (OCR) technology exists today that can convert the text in documents and images into a searchable form, like the text in a word processing document or spreadsheet, but the time required to perform OCR on every document and image is prohibitive. This is because current OCR technology requires considerable computing power, and scanner devices and mobile phones have neither sufficiently powerful processors (CPUs) nor enough memory to perform OCR rapidly.
CVISION Technologies, with funding from NSF SBIR Phase I and Phase II grants, was successful in leveraging its experience in smart image compression to build a real-time OCR system (RT/OCR) that converts all the text in scanned documents or camera images to searchable form nearly instantaneously. At the core of the RT/OCR system is the ability to compress a captured image or document and make it searchable at the same time. Smart image compression allows the text in an image or document to be analyzed in a form that requires less memory and less computing power, greatly accelerating the OCR process. The RT/OCR system’s compression process (which uses machine learning and maps to the industry standard known as JB2) includes the ability to learn from the fonts and sizes of text on one part of a page in order to very quickly recognize text on other parts of the page and on other pages.
This is in contrast to conventional OCR software, which starts the learning process virtually "from scratch" with every page. The benefits of this approach are dramatic, with RT/OCR recognition rates as high as 10 images per second per CPU core. Modern desktop and laptop PCs often have 2 or 4 CPU cores (for example, Intel® Core™ i5 and Core™ i7 CPUs). Therefore, OCR rates of 20 to 40 pages per second are possible with RT/OCR, which exceeds the capture speed of current document capture systems and cameras. The RT/OCR system is both ultra-fast and highly portable, and it will be further developed to directly support mobile phones, tablets, and popular multifunction printer/scanner/fax devices.
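The idea of reusing what was learned from text on one part of a page to recognize text elsewhere can be illustrated with a small sketch. This is not the RT/OCR implementation, which this report does not describe in detail. It is a minimal Python illustration that assumes glyphs arrive as small binary bitmaps and that an expensive classifier is available. The names GlyphCache, hamming, toy_classifier, and max_diff are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Tuple

# A glyph is a binary bitmap stored as nested tuples of 0/1 pixels.
Glyph = Tuple[Tuple[int, ...], ...]

def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

class GlyphCache:
    """Reuse labels of previously seen glyph shapes (hypothetical sketch).

    The expensive classifier runs only when no cached prototype is
    within max_diff pixels of the incoming glyph.
    """

    def __init__(self, classify: Callable[[Glyph], str], max_diff: int = 1):
        self.classify = classify
        self.max_diff = max_diff
        self.entries: List[Tuple[Glyph, str]] = []
        self.classifier_calls = 0

    def recognize(self, glyph: Glyph) -> str:
        for proto, label in self.entries:
            same_shape = (len(proto) == len(glyph)
                          and len(proto[0]) == len(glyph[0]))
            if same_shape and hamming(proto, glyph) <= self.max_diff:
                return label  # cache hit: no classifier call needed
        self.classifier_calls += 1
        label = self.classify(glyph)
        self.entries.append((glyph, label))
        return label

# Toy 3x3 templates standing in for real character shapes.
I = ((0, 1, 0), (0, 1, 0), (0, 1, 0))
L = ((1, 0, 0), (1, 0, 0), (1, 1, 1))
I_NOISY = ((0, 1, 0), (0, 1, 0), (0, 1, 1))  # "I" with one flipped pixel
TEMPLATES = {I: "I", L: "L"}

def toy_classifier(glyph: Glyph) -> str:
    """Stand-in for a slow recognizer: nearest template by pixel distance."""
    return TEMPLATES[min(TEMPLATES, key=lambda t: hamming(t, glyph))]

def demo() -> Tuple[str, int]:
    cache = GlyphCache(toy_classifier, max_diff=1)
    text = "".join(cache.recognize(g) for g in [I, L, I_NOISY, L, I])
    return text, cache.classifier_calls
```

In this toy run, five glyphs are recognized with only two classifier calls, because repeated and near-identical shapes are served from the cache. A production system would need a more robust shape-matching criterion than a fixed pixel-difference threshold.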