This project proposes a novel unified model to help software developers license software and (re)use components in compliance with legal requirements. The solution will investigate new combinations of information retrieval, internet-scale source code search, repository mining, and static analysis approaches to detect the origins of software components. The research will also rely on a feedback-driven hybrid blend of information retrieval and machine learning techniques to identify components' licenses with high accuracy. In addition, the proposed model will unify these building blocks for license compliance analysis and verification, reasoning about the given software, its components, dependencies, and licenses, as well as their trustworthiness, constraints, and existing or potential legal compliance issues.

The proposed research will lead to both theoretical foundations and practical solutions for the comprehensive analysis of complex legal compliance concerns, enabling lawful software development and evolution. Among the broader impacts, the project will develop educational course content, involve underrepresented student groups, produce software tools under open source licenses, collaborate with industry to transfer technology and empirically evaluate the proposed research, and conduct K-12 outreach activities.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 1253837
Program Officer: Sol Greenspan
Budget Start: 2013-09-01
Budget End: 2018-08-31
Fiscal Year: 2012
Total Cost: $478,010
Name: College of William and Mary
City: Williamsburg
State: VA
Country: United States
Zip Code: 23187