Software developers rely on reusing source code snippets from existing libraries or applications to develop software features on time and within budget. In reality, most previously implemented features are embedded in billions of lines of scattered source code. State-of-the-art code search engines provide no guarantee that retrieved code snippets implement these features. Even when relevant code fragments are located, developers face the rather complex task of selecting these fragments and moving them into their applications. Finally, synthesizing new functionality by composing selected code fragments requires sophisticated reasoning about the behavior of these fragments and the resulting code. The result of this process is overwhelming complexity, a steep learning curve, and a significant cost of building customized software.

This research program proposes an integrated model for addressing fundamental problems of searching, selecting, and synthesizing (S3) source code. The S3 model relies on integrating program analysis and information retrieval to produce transformative models to automatically search, select, and synthesize relevant source code fragments. The S3 model will directly support new methodologies for software change and automated tools that assist programmers with various development, reuse, and maintenance activities. The project's broader impacts include collaboration with industry to transfer technology.

Project Report

Under this grant we conducted research to address the fundamental problems of searching, selecting, and synthesizing software. The main research idea rests on a common abstraction and behavior-specific compositional mechanisms. The abstraction is derived from the fact that programmers rely heavily on well-known third-party Application Programming Interface (API) calls to implement high-level requirements. The underlying idea is to use these abstractions to unify searching, selecting, and synthesizing applications in a novel way: searching returns software applications containing API calls that implement the requirements described in search queries; selecting source code fragments centers on retrieving API calls from relevant applications; and synthesizing exploits static program analysis to guide programmers in using and composing relevant code fragments effectively. A number of research papers have been published as a result of this research project.
The most notable papers are the following:

Moritz, E., Linares-Vásquez, M., Poshyvanyk, D., McMillan, C., Grechanik, M., Gethers, M., "ExPort: Detecting and Visualizing API Usages in Large Source Code Repositories", in Proceedings of 28th IEEE/ACM International Conference on Automated Software Engineering (ASE'13), New Ideas Paper Track, Palo Alto, CA, November 11-15, 2013.
McMillan, C., Grechanik, M., Poshyvanyk, D., Xie, Q., and Fu, C., "Searching for Relevant Functions and Their Usages in Millions of Lines of Code", ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 22, no. 4, October 2013.
Dit, B., Revelle, M., and Poshyvanyk, D., "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software", Empirical Software Engineering (EMSE), vol. 18, no. 2, April 2013, pp. 277-309.
McMillan, C., Grechanik, M., and Poshyvanyk, D., "Detecting Similar Software Applications", in Proceedings of 34th IEEE/ACM International Conference on Software Engineering (ICSE'12), Zurich, Switzerland, June 2-9, 2012, pp. 364-374.
McMillan, C., Hariri, N., Poshyvanyk, D., Cleland-Huang, J., and Mobasher, B., "Recommending Source Code for use in Rapid Software Prototypes", in Proceedings of 34th IEEE/ACM International Conference on Software Engineering (ICSE'12), Zurich, Switzerland, June 2-9, 2012, pp. 848-858.
McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., and Xie, Q., "Exemplar: A Source Code Search Engine For Finding Highly Relevant Applications", IEEE Transactions on Software Engineering (TSE), vol. 38, no. 5, Sept.-Oct. 2012, pp. 1069-1087.
Poshyvanyk, D., Gethers, M., and Marcus, A., "Concept Location using Formal Concept Analysis and Information Retrieval", ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 21, no. 4, November 2012.
McMillan, C., Grechanik, M., Poshyvanyk, D., Xie, Q., and Fu, C., "Portfolio: Finding Relevant Functions And Their Usages", in Proceedings of 33rd IEEE/ACM International Conference on Software Engineering (ICSE'11), Honolulu, HI, USA, May 21-28, 2011, pp. 111-120.

The results of this research have been evaluated with industrial collaborators. The Portfolio search engine, for instance, was evaluated in a user study with 50 professional C/C++ programmers from industry and was shown to provide more relevant results, with higher precision and confidence, than widely used commercial-grade source code search engines such as Google Code Search and Koders.

We have also conducted outreach efforts on this project. For instance, we supervised a number of M.S. and undergraduate students; these part-time students were supported via teaching assistantships and NSF REU supplements. We have also trained and graduated several Ph.D. students under this research project. All the Ph.D. students involved in this project co-authored several papers.

We anticipate that our work will lay a foundation for a new direction of research, and we will support it with a set of software development and maintenance tools. The short-term impact of our work is already visible in the open-source and research software engineering communities, as attested by the number of visits to the Portfolio and Exemplar servers and by the publications.
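
The API-call abstraction described in this report can be illustrated with a toy sketch. The code below is not the algorithm used by Exemplar or Portfolio; it is a minimal illustration that assumes a hypothetical table of API descriptions (API_DOCS) and a hypothetical table of applications with the API calls they contain (APPS), as static analysis might extract them. It scores API calls by plain word overlap with the query, then ranks applications by the API calls they use.

```python
# Hypothetical API documentation: call name -> short description (assumed data).
API_DOCS = {
    "javax.crypto.Cipher.doFinal": "encrypt or decrypt data with a cipher",
    "java.util.zip.ZipOutputStream.write": "compress bytes into a zip archive",
    "java.net.Socket.connect": "connect a socket to a remote server",
}

# Hypothetical applications and the API calls found in their source code.
APPS = {
    "SecureBackup": [
        "javax.crypto.Cipher.doFinal",
        "java.util.zip.ZipOutputStream.write",
    ],
    "ChatClient": ["java.net.Socket.connect"],
    "Archiver": ["java.util.zip.ZipOutputStream.write"],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (no stop-word removal)."""
    return [word for word in text.lower().split() if word.isalpha()]


def api_relevance(query):
    """Score each API call by the number of query words in its description."""
    query_words = set(tokenize(query))
    scores = {}
    for call, description in API_DOCS.items():
        overlap = len(query_words & set(tokenize(description)))
        if overlap:
            scores[call] = overlap
    return scores


def search(query):
    """Rank applications by the total relevance of the API calls they contain.

    Returns (application, score, matched API calls) tuples, best first.
    The matched calls correspond to the "selecting" step: they point at the
    fragments of each application worth inspecting.
    """
    relevance = api_relevance(query)
    ranked = []
    for app, calls in APPS.items():
        matched = [call for call in calls if call in relevance]
        if matched:
            score = sum(relevance[call] for call in matched)
            ranked.append((app, score, matched))
    # Sort by descending score, breaking ties by application name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    for app, score, calls in search("encrypt and compress data"):
        print(app, score, calls)
```

In this sketch an application that never mentions "encrypt" in its own text can still be retrieved, because the match goes through the descriptions of the API calls it uses. A realistic system would replace the word-overlap score with a proper information retrieval model and would obtain the API calls by analyzing real code.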

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0916260
Program Officer: Sol J. Greenspan
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $216,141
Indirect Cost:

Name: College of William and Mary
Department:
Type:
DUNS #:
City: Williamsburg
State: VA
Country: United States
Zip Code: 23187