Over the last decade, software engineering has seen rapid and widespread adoption of automated refactoring tools: tools that analyze source code under the direction of the programmer and make systematic changes that improve the program's internal structure without affecting its behavior. For the C programming language, one of the most popular languages in use, only a limited portfolio of refactorings is available, with limited scalability and limited applicability to real-world programs. This research will address the technical problems that make it difficult to build automated refactoring tools (and other program transformation tools) for C: the ability to "configure" C programs using preprocessor macros, the need to perform sophisticated analyses in the presence of many such configurations, and the need to analyze and transform C when it is mixed with other programming languages. Solving these problems, and producing a tool that incorporates the solutions, will provide much-needed tool improvements for C programmers.
The research will culminate in a prototype refactoring and program transformation tool for C that addresses these problems. Handling multiple preprocessor configurations will involve exploring both a parsing algorithm and a program representation: the parsing algorithm extends the LALR(1) algorithm to handle preprocessor directives, while the program representation accommodates multiple configurations in a single abstract syntax tree (AST). Semantic information from various static analyses will be superimposed on the AST; this will require extending those analyses to handle the complications presented by multiple preprocessor configurations. The tool will also support transforming mixed-language C programs, in particular C programs mixed with Fortran or Yacc, two languages that are commonly combined with C. Handling multiple languages may be treated as an extension of the multiple-configurations problem, in which declarations in one language and definitions in a different language are treated, at least conceptually, as different configurations. The tool will be available under an open source license.
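To make the idea of a single AST covering multiple configurations concrete, the following is a minimal sketch in C. It is an illustration under stated assumptions, not the representation the prototype will use: the node kinds, the `struct node` layout, and the functions `rename_all` and `count_in_config` are all hypothetical names introduced here. The sketch models a fragment in which `#ifdef USE_FAST` selects between the calls `fast_sum(buf)` and `slow_sum(buf)`, and shows that a rename applied to the combined tree reaches both branches, whereas a tool that sees only one preprocessed configuration would reach only one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for a configuration-aware AST (illustrative only). */
enum node_kind { NODE_IDENT, NODE_SEQ, NODE_IFDEF };

struct node {
    enum node_kind kind;
    const char *text;     /* NODE_IDENT: identifier name; NODE_IFDEF: macro name */
    struct node *first;   /* NODE_SEQ: first child; NODE_IFDEF: branch taken when the macro is defined */
    struct node *second;  /* NODE_SEQ: second child; NODE_IFDEF: #else branch (may be NULL) */
};

/* Returns 1 if `macro` appears in the list of defined macros, else 0. */
static int is_defined(const char *macro, const char *const *defined, size_t ndefined)
{
    for (size_t i = 0; i < ndefined; i++)
        if (strcmp(defined[i], macro) == 0)
            return 1;
    return 0;
}

/* Counts uses of `name` visible in ONE configuration, which is what a tool
 * that runs the preprocessor before parsing would see. */
static int count_in_config(const struct node *n, const char *name,
                           const char *const *defined, size_t ndefined)
{
    if (n == NULL)
        return 0;
    switch (n->kind) {
    case NODE_IDENT:
        return strcmp(n->text, name) == 0;
    case NODE_SEQ:
        return count_in_config(n->first, name, defined, ndefined)
             + count_in_config(n->second, name, defined, ndefined);
    case NODE_IFDEF:
        return count_in_config(is_defined(n->text, defined, ndefined) ? n->first : n->second,
                               name, defined, ndefined);
    }
    return 0;
}

/* Renames `from` to `to` in EVERY configuration by visiting both branches of
 * each conditional node. Returns the number of identifiers changed. */
static int rename_all(struct node *n, const char *from, const char *to)
{
    assert(from != NULL && to != NULL);
    if (n == NULL)
        return 0;
    if (n->kind == NODE_IDENT) {
        if (strcmp(n->text, from) == 0) {
            n->text = to;
            return 1;
        }
        return 0;
    }
    return rename_all(n->first, from, to) + rename_all(n->second, from, to);
}

/* Example tree for:
 *   #ifdef USE_FAST
 *     fast_sum(buf)
 *   #else
 *     slow_sum(buf)
 *   #endif
 */
static struct node fast_call = { NODE_IDENT, "fast_sum", NULL, NULL };
static struct node fast_arg  = { NODE_IDENT, "buf", NULL, NULL };
static struct node slow_call = { NODE_IDENT, "slow_sum", NULL, NULL };
static struct node slow_arg  = { NODE_IDENT, "buf", NULL, NULL };
static struct node fast_seq  = { NODE_SEQ, NULL, &fast_call, &fast_arg };
static struct node slow_seq  = { NODE_SEQ, NULL, &slow_call, &slow_arg };
static struct node example_root = { NODE_IFDEF, "USE_FAST", &fast_seq, &slow_seq };
```

Calling `rename_all(&example_root, "buf", "data")` on this tree changes two identifiers, one per branch, while `count_in_config` with any single set of defined macros finds only one use of `buf`. That gap is the reason a refactoring tool for C cannot simply preprocess first and transform the result.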