Successful software systems continue to change. Most programmers work on projects that they did not start, and most companies spend more on maintaining old systems than on building new ones. The goal of this research is to make programs easier to change by developing better software tools and by studying how programmers change software. The project is extending Photran, an open-source programming environment for FORTRAN, so that it better supports the way FORTRAN programmers change their software to make it run on next-generation supercomputers. The new version of Photran has the potential to make porting high-performance software much less expensive, and the ideas have the potential to reduce the cost of software development in general.
The new system will record each change that a programmer makes and will represent these changes at a high level, i.e. not just as textual changes, but as more meaningful units of change, such as refactorings or optimizations. It will let programmers modify these changes after the fact, making it possible to change the portable version of a program and then replay the hand-crafted optimizations. Programmers will be able to port a program to a new architecture by starting with a portable version and then choosing optimizations that were useful for similar machines, that were discovered by an auto-tuner, or that were invented as needed. They will be able to think of a program as a sequence of program transformations, and to generate a new program by reusing sub-sequences from other programs. Thus, a sequence of changes will be just as valid a representation of a program as a set of modules.
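The idea of treating a program as a replayable sequence of changes can be illustrated with a minimal sketch. It is not Photran's implementation: the `Change` type, the `replay` function, and the two toy transformations are hypothetical names introduced here, and a program is modeled simply as a list of source lines.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Assumption for brevity: a program is just a list of source lines.
Program = List[str]


@dataclass(frozen=True)
class Change:
    """A named, replayable program transformation (hypothetical type)."""
    name: str
    apply: Callable[[Program], Program]


def replay(base: Program, changes: List[Change]) -> Program:
    """Generate a program by applying the recorded changes in order."""
    program = list(base)
    for change in changes:
        program = change.apply(program)
    return program


def rename(old: str, new: str) -> Change:
    """Toy refactoring: rename an identifier by whole-word replacement."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return Change(f"rename {old} to {new}",
                  lambda p: [pattern.sub(new, line) for line in p])


def add_directive(marker: str, directive: str) -> Change:
    """Toy optimization: put a directive line before each line containing marker."""
    def apply(program: Program) -> Program:
        out: Program = []
        for line in program:
            if marker in line:
                out.append(directive)
            out.append(line)
        return out
    return Change(f"annotate {marker}", apply)


# Changes recorded while tuning the first portable version.
recorded = [rename("a", "result"),
            add_directive("do i", "!$omp parallel do")]

portable_v1 = ["do i = 1, n", "  a(i) = b(i) + c(i)", "end do"]
tuned_v1 = replay(portable_v1, recorded)

# The portable version is edited later; the same changes are replayed on it.
portable_v2 = ["do i = 1, n", "  a(i) = b(i) * c(i)", "end do"]
tuned_v2 = replay(portable_v2, recorded)
```

In this sketch the tuned program is never edited directly: it is regenerated from the portable version plus the recorded sequence, which is what allows the portable version to change independently of the optimizations.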
A program is usually thought of as a document or a collection of documents. But a program can also be thought of as a sequence of changes starting with an original, possibly empty, program. This is an important point of view, since more programmers work at changing existing programs than at making new ones. One problem with thinking of a program as a sequence of changes is deciding on the level of detail of a change. Some changes are high-level, such as "add a feature" or "fix a bug". Others are low-level, such as changing some characters. High-level changes can be represented as a series of low-level changes, but the converse is not always true. We need a way of representing change that allows both low-level and high-level changes, and that makes it easy to convert between them.
To represent program changes at multiple levels, we built an Eclipse plugin called CodingTracker. The lowest level is a series of text edits, i.e. character changes. The second level is a series of changes to the abstract syntax, i.e. the logical program structure. The third level consists of high-level changes, mostly from a class of program transformations called "refactorings". Eclipse has built-in tools for automatically performing refactorings. CodingTracker records these automatic refactorings, and can also detect when a series of low-level changes could have been performed by one of the refactoring tools.
We used CodingTracker to study the changes made by a few dozen programmers, comparing the refactorings they performed using the built-in tools to the ones they could have performed with the tools but instead performed by editing text. The programmers used the tools for a little less than half of these refactorings. We interviewed some of the programmers to learn why they avoided the tools. They thought the tools were complex and hard to use. The tools had lots of options, and programmers couldn't remember (or never got around to learning) all of them.
Because the refactoring tools were complex, programmers were not sure what the tools would do. One way to make these tools easier to use is to provide parameters using direct manipulation, i.e. drag and drop. This works primarily for program transformations based on moving code. For example, to indicate that a part of a program should be split into pieces, a programmer could select one piece and drag it outside the part of the program that contains it. We developed a plugin, DNDRefactor, that changes the Eclipse refactoring tools for Java to use drag and drop.
Another way to simplify these tools is to replace options with defaults. A program transformation will usually have parameters or choices that have to be made, and often a programmer has to fill in a large dialog box to finish a program transformation. The tool seems easier to use if most of the options are replaced by defaults. Instead of setting an option before performing a refactoring, a programmer performs another program transformation afterwards to change the default. This replaces a single complex transformation with several smaller ones. We call this way of designing a tool "compositional" because programmers compose several transformations to achieve the same effect. We changed some of the Eclipse tools to be more compositional, and ran experiments to confirm that these changes improved usability.
Some program transformations are complex because the tool designers were trying to make them as automated as possible. They can be simpler if they are less automated and rely on programmers to compose transformations. A good example is type inference. Most programming languages require each variable's type to be declared, and some require no type declarations at all. A few do not require declarations but can still calculate a type for each variable; this is called "type inference".
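The compositional design described above, in which a refactoring applies defaults and later transformations adjust them, can be sketched on a toy model. This is only an illustration under stated assumptions: it does not use the Eclipse refactoring API, and `Method`, `extract_method`, `rename_method`, and `change_visibility` are hypothetical names. The default method name `extracted` and default visibility `private` are likewise assumptions.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Method:
    """Toy model of a method: a visibility and a tuple of statements."""
    visibility: str
    body: Tuple[str, ...]


# Assumption: a class is modeled as a mapping from method name to Method.
Methods = Dict[str, Method]


def extract_method(methods: Methods, source: str, start: int, end: int) -> Methods:
    """Extract body[start:end] of `source` using defaults only, with no dialog.

    The new method always gets the default name 'extracted' and the
    default visibility 'private'.
    """
    src = methods[source]
    remaining = src.body[:start] + ("extracted();",) + src.body[end:]
    result = dict(methods)
    result[source] = replace(src, body=remaining)
    result["extracted"] = Method("private", src.body[start:end])
    return result


def rename_method(methods: Methods, old: str, new: str) -> Methods:
    """Follow-up transformation: rename a method and its zero-argument calls."""
    call_old, call_new = f"{old}();", f"{new}();"
    result: Methods = {}
    for name, method in methods.items():
        body = tuple(call_new if stmt == call_old else stmt for stmt in method.body)
        result[new if name == old else name] = replace(method, body=body)
    return result


def change_visibility(methods: Methods, name: str, visibility: str) -> Methods:
    """Follow-up transformation: override the default visibility."""
    result = dict(methods)
    result[name] = replace(methods[name], visibility=visibility)
    return result


before = {"report": Method("public",
                           ("open();", "printHeader();", "printRows();", "close();"))}

# Three small steps replace one refactoring with a name and a visibility option.
step1 = extract_method(before, "report", 1, 3)
step2 = rename_method(step1, "extracted", "printTable")
after = change_visibility(step2, "printTable", "protected")
```

Each step here takes at most a couple of arguments, so a programmer who is happy with the defaults stops after the first step, while one who is not composes further steps instead of filling in options up front.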
Cascade is a tool that solves the type inference problem by having a programmer compose transformations, instead of by complete automation. Cascade can infer type qualifiers, which are a new feature of Java that allows programmers to define new kinds of type systems so they can catch type errors. Because using a type qualifier system requires a programmer to add a lot of information to a program, several such systems come with an inference tool that adds this information automatically. However, each type qualifier system needs its own inference tool. These inference tools usually work well for simple programs, but when a program has type errors they produce error messages and stop. Cascade works with any type qualifier system. It relies on the programmer to make choices. This makes it slower than the completely automated type inference tools for programs with no type errors, but it seems to be faster for programs with type errors, since such programs have to be changed before types can be inferred.