This project enhances an existing infrastructure that supports software engineering research on very large bodies of source code. The srcML infrastructure currently includes a generic XML representation for a variety of programming languages, a robust and efficient parser for C/C++, and a limited set of tools to support analysis and manipulation of large bodies of source code. The infrastructure is used by a large number of researchers in software engineering and programming languages, and it is also applied directly to practical problems in a variety of industrial settings. The srcML infrastructure is open source and freely available to the public under a GPL license. Documentation is available online, and other resources, including online tutorials, are under development. Additionally, tutorials on using the infrastructure to support various research efforts are planned for a number of software engineering conferences.

The PIs plan to extend the current efficient and robust parsing and markup to a broader variety of widely used programming languages (namely Java and C#) and to allow new languages to be added via a plugin grammar architecture. The current toolkit is being greatly expanded to support the exploration, analysis, and manipulation of very large code bases. The tools include a static slicer, metrics computation, various static analysis tools, a fact extractor, a call graph generator, syntactic querying tools, and a syntactic differencing tool. Additionally, a set of tools to support the construction and application of transformation rules is being developed. Extending the infrastructure to a broader set of widely used languages enables researchers to investigate more production and commercial software. The enhancements to the srcML infrastructure can drastically reduce the entry cost of conducting research by letting individuals explore, analyze, and manipulate software easily and flexibly, leaving them more time to pursue novel and transformative research on software, software engineering, and software languages. The addition of analysis, transformation, and syntactic differencing tools enables straightforward and flexible exploration of large code bases written in widely used programming languages.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1305292
Program Officer: Almadena Chtchelkanova
Budget Start: 2013-07-01
Budget End: 2018-06-30
Fiscal Year: 2013
Total Cost: $618,877

Name: Kent State University
City: Kent
State: OH
Country: United States
Zip Code: 44242