This project proposes a novel unified model to help software developers license their software and (re)use components in compliance with legal requirements. The research will investigate novel combinations of information retrieval, internet-scale source code search, repository mining, and static analysis approaches to detect the origins of software components. It will also rely on a feedback-driven blend of information retrieval and machine learning techniques to identify components' licenses with high accuracy. In addition, the proposed model will unify these building blocks for license compliance analysis and verification, reasoning about the given software, its components, dependencies, and licenses, as well as their trustworthiness, constraints, and existing or potential legal compliance issues.
The proposed research will lead to both theoretical foundations and practical solutions for the comprehensive analysis of complex legal compliance concerns, enabling lawful software development and evolution. Among the broader impacts, the project will develop educational course content, involve underrepresented student groups, produce software tools under open source licenses, collaborate with industry to transfer technology and empirically evaluate the proposed research, and conduct K-12 outreach activities.