The civil litigation system of the United States serves as the ultimate arbiter for commercial and personal disputes. Under this system, plaintiffs and defendants are entitled to request relevant evidence from each other. Although digital records seem easier to find than their older paper counterparts, rapid growth in the volume, diversity, and possible locations of these records has actually made it harder to find the proverbial needles within the digital haystacks. The resulting rapid increase in the cost of discovery and exchange of relevant evidence, if left unchecked, raises concerns about access to justice. Hence, there is an urgent need for demonstrably accurate and cost-effective technologies to support "e-discovery" of the relevant records.

Professor Douglas W. Oard and colleagues at the University of Maryland are developing techniques that can automatically determine, within minutes, the responsiveness of more documents than one person could examine in a lifetime. These techniques use "semi-supervised learning" algorithms to "train" the software to replicate the kinds of decisions that people make on representative examples. Using Finite Population Annotation, a new framework for integrating learning with evaluation, novel methods are being developed to achieve and measure the highest possible effectiveness for any specified level of human effort. These learning methods draw on rich approaches to representing the content of both born-digital structured documents and scanned paper. Measures for rigorously assessing the effectiveness of the resulting automated review techniques are being developed both to support decisions by legal professionals and the courts about which methods to use and to help developers further improve their algorithms.

The legal system demands technology whose effectiveness has been demonstrated on collections that are representative of what is actually expected in a real case. For that reason, this project is creating real-world benchmarks in collaboration with the National Institute of Standards and Technology's Text Retrieval Conference (TREC). The project's results are expected to help shape professional practice through workshops for legal and technical stakeholders, and through university courses that prepare the next generation of attorneys and information professionals to employ these new capabilities. "E-discovery" technologies resulting from this effort are likely to be broadly applicable in domains beyond legal practice, including preparation of systematic reviews of scientific literature, scholarly access to digital archives, and government responses to public information requests from citizens. Additional information is available at http://ediscovery.umiacs.umd.edu.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1065250
Program Officer: Sylvia Spengler
Budget Start: 2011-06-01
Budget End: 2017-05-31
Fiscal Year: 2010
Total Cost: $1,199,996
Name: University of Maryland College Park
City: College Park
State: MD
Country: United States
Zip Code: 20742