This planning project brings together the community to begin addressing the challenges of using text analysis (i.e., the analysis of the information extracted from the natural language used in software artifacts and program code) in software engineering. A set of workshops will be organized with researchers in the software engineering community interested in the use of text analysis, in order to elicit requirements for the design of the necessary infrastructure for the research community. The common infrastructure of foundational text analyses for the software domain will include software libraries, data, and educational materials.

Because the textual information embedded in software artifacts, program code, and documentation encodes the software's domain concepts and developers' knowledge about them, it is critical in helping developers maintain today's software effectively, and it is essential to the other stakeholders in the software engineering process, including managers, clients, and users. A study of recent literature shows that over 25 distinct software engineering tasks, ranging from requirements analysis to program comprehension, use tools based on text analysis. This project's outcomes will have a broad impact, enabling researchers to: combine different types of text analysis techniques; transfer results of successful research to industry and ease entry for software engineering researchers applying text analysis techniques; integrate new components to explore novel approaches to specific analyses; gain continual user evaluation of text-analysis-based software engineering tools; and enable other researchers and practitioners to easily leverage text analysis to solve new software engineering problems. The long-term goal is to create such an infrastructure, which will allow text analysis to integrate seamlessly with the technology and environments software engineers currently use. The increase in software size and complexity, together with larger development teams, has made it significantly more challenging for humans to maintain software without supporting tools.

Project Report

Intellectual Merit. The community now has a clearer understanding of the existing infrastructure for research in text analysis for software engineering, and of the kinds of information that need to be shared with other researchers to enable effective reuse of that infrastructure.

Broader Impacts. The project produced a community portal, available at https://textse.wikispaces.com/, which brings together resources for researchers, practitioners, and educators who use text analysis in support of software engineering tasks. Anyone can request access to the portal and subsequently post relevant materials, which are curated by the organizers. The portal will evolve over time as the community contributes data sets, software packages, and educational material. It will also host a calendar of events related to this area of research and contact information for other people active in the field.

The use of text analysis in software engineering is an emerging research area that relies on background from two distinct computing sub-fields with little previous interaction. People new to the field, such as students or software developers, cannot rely on existing textbooks focused on this area of research. As a result, people entering the field inevitably repeat similar experiences and investments in research and education infrastructure, such as bibliographies, software packages, data, and educational material. The portal will facilitate the reuse of such materials, saving people time and effort.

During the project, several meetings were organized, which brought together researchers to exchange ideas and artifacts. These meetings are the seeds for future ones, which will be co-located with software engineering conferences. Such meetings, as well as the portal itself, will lead to new collaborations and new research projects. The infrastructure materials shared among these researchers will help bring better tools to software developers more quickly and effectively. Better software tools have been shown to positively impact software that is used by individuals, institutions, and society.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Type
Standard Grant (Standard)
Application #
1205321
Program Officer
Anindya Banerjee
Project Start
Project End
Budget Start
2012-06-15
Budget End
2014-05-31
Support Year
Fiscal Year
2012
Total Cost
$38,307
Indirect Cost
Name
Montclair State University
Department
Type
DUNS #
City
Montclair
State
NJ
Country
United States
Zip Code
07043