The PI has two hypotheses about software engineers' information needs during code reviews. The first hypothesis is that different roles in code review, such as author and reviewer, lead to different information needs in terms of abstraction levels; thus, existing static and dynamic program analyses that do not distinguish the roles of information producer (code author) and consumer (code reviewer) may not be effective in supporting peer reviews. The second hypothesis is that existing communication, awareness, and management support features in collaborative development tools, such as instant messengers, email, and work-flow management, provide high-level yet shallow information, because these tools lack the ability to provide code-centric information. To test these hypotheses, the PI will use several empirical study methods, including focus groups, semi-structured interviews, case studies, and surveys, to acquire a comprehensive and systematic understanding of engineers' information needs during peer code reviews.
The outcome of this study will guide the construction of innovative software analyses that can satisfy programmers' information needs, improve the effectiveness of peer code review tasks, and ultimately improve programmer productivity and software quality. Furthermore, this study will serve as a basis for identifying what types of information, at which abstraction level, can best support developers in examining software modifications. The findings from this study will also contribute to developing the program delta representations, inference algorithms, and infrastructures needed to enable engineers to reason about software modifications at a high level.
Awareness Interests about Software Modifications
Our focus group and surveys with professional developers found that developers have daily information needs about code changes that affect or interfere with their code, yet it is extremely challenging for them to identify relevant events among a large number of change events. The study also found that different stakeholders often reason about software modifications at different abstraction levels and that users' awareness interests evolve rapidly as their tasks change. Users are left to filter out irrelevant code modifications, such as changes that do not semantically affect their own changes, and to ignore insignificant changes, such as renaming or indentation changes, in a large volume of check-in notifications. Currently, developers must spend substantial effort on this filtering to identify and analyze the software modifications relevant to their tasks, focus, and interests. While these results are aligned with the findings from prior work on change impact analysis, awareness, and coordination, our study makes a new contribution by producing a prioritized list of awareness interests about others' software modifications. The study results are reported at the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, co-located with the 33rd International Conference on Software Engineering (CHASE workshop at ICSE 2011).

Empirical Studies on Verilog Code Reviews
To understand the benefit of using advanced program differencing tools during peer code reviews, we designed Vdiff, a program-differencing algorithm that is aware of Verilog syntax and semantics, and conducted a user study with eight hardware design experts. The study found that Vdiff's syntactic change classification is better aligned with the experts' classification of Verilog changes than that of an existing textual program differencing tool, diff.
Study participants reported that Vdiff robustly recognizes re-arranged code blocks and filters out non-semantic differences, and that Vdiff helps them grasp the high-level structure of design changes. This indicates that the effectiveness of code review tasks can be significantly improved by an advanced program differencing algorithm (Journal of Automated Software Engineering, 2012, Volume 19, Number 4, Pages 459-490, ASE Journal 2012, and the 25th IEEE/ACM International Conference on Automated Software Engineering, ASE 2010).

Refactoring-Aware Code Reviews
To understand refactoring practices, we conducted a field study at Microsoft using three complementary study methods: a survey, semi-structured interviews with professional software engineers, and quantitative analysis of version history data. The study found that the definition of refactoring in practice is not confined to semantics-preserving code transformations and that developers perceive refactoring as involving substantial cost and risk. During peer code reviews, developers want to see a higher-level summary of refactoring edits, such as the types and locations of refactorings (the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE 2012). To help developers recognize refactoring edits during peer code reviews, we designed and implemented RefFinder, which takes two program versions as input and automatically identifies the locations and types of refactoring edits. Its precision and recall are 79% and 95%, respectively. A live demonstration of RefFinder was presented at the 18th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2010 DEMO), and the evaluation results were presented at the 26th IEEE International Conference on Software Maintenance (ICSM 2010).

Empirical Studies on Refactoring Edits
Our analysis of open source project version histories shows that API-level refactorings and bug fixes are correlated temporally and spatially, and that refactoring both facilitates bug fixes and introduces bugs (ACM and IEEE 33rd International Conference on Software Engineering, ICSE 2011). Another of our studies, on the impact of refactoring edits on regression tests, shows that only 22% of refactored methods and fields are tested by existing regression tests. While refactorings constitute only 8% of atomic changes, 38% of affected tests are relevant to refactorings. Refactorings are involved in almost half of the failed test cases. These results call for new techniques for validating refactoring edits (28th IEEE International Conference on Software Maintenance, ICSM 2012).

In addition to these activities, during this grant we presented an automated API usage migration approach (OOPSLA 2010), code clone evolution analysis (FASE 2010), modularity violation detection analysis (ICSE 2011), cross-system porting analysis (FSE 2012, FSE 2012 Demo), and analysis of omission errors (MSR 2012). We also invented a novel program transformation approach that learns abstract, context-aware syntactic edits from an example to automate repetitive edits (PLDI 2011, FSE Demo 2011), and presented position papers on technical debt and on validity concerns raised by using open source projects at the FSE/SDP Workshop on the Future of Software Engineering Research 2010.