Analyzing and Relating Bug Report Data for Feature Tracking

By Michael Fischer, Martin Pinzger and Harald Gall

Thoughts/Things of note:

Concept Lattice – visual representation of analysis for objects and attributes. Can this be useful elsewhere as a simple process for giving relative strengths to relationships that are found by data extraction?

Combining the modifications tracked in CVS with the information tracked in Bugzilla makes sense, though the approach of essentially scanning the CVS versioning information and log files for a bug report ID number seems hit-and-miss. It implies that a more integrated toolset for CVS and bug tracking is needed to track evolution.

The final graph is hard to comprehend. Almost too much information is depicted visually in the final picture: circles of different sizes, colors and proximity, with other symbols overlaid on top of the circles whose size, color and proximity also carry information. I had to re-read the explanation of what each symbol, size and color meant several times before I began to understand some of what the graph represented. Once I started to get the idea, it became clear that this picture would also need to be ‘moved’ through time (i.e. several snapshots showing changes in the information as the process progressed) before patterns of evolution would start to emerge and trends could be seen.

Summary:

Understanding software evolution and utilizing the information produced during the software development lifecycle is key to constructing better software development processes. This paper proposes a way to analyze and relate the information to enhance comprehension.

Understanding software evolution requires information gathering

  • To store this information a Release History Database (RHDB) is constructed
  • Two main sources of input into the RHDB:
      • Versioning information from CVS
      • Bug tracking information from Bugzilla
  • To create a relational basis for the information, use features
  • Feature – an observable and relatively closed behavior
  • Using features, relate/cluster the data and observe the changes
  • Looking for file dependencies both in a specific version and changes across versions

Versioning information comes in Modification Reports (MR)

Bug data and patch information comes in Problem Reports (PR)

Once both of the above are extracted into the RHDB, a way is required to link the information, since there is nothing in CVS to do this.

  • Use the PR ID number
  • Whenever a PR ID is found in an MR assume a link
  • A problem is that the number found may not be an actual PR ID, so a confidence level (high, medium or low) is established based on a textual analysis
  • Focus is on PRs with a ‘fixed’ status, so it can be assumed that the code changes are already checked into the CVS repository and that tracking information associated with the PR exists somewhere in an MR
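The linking step above can be sketched roughly as follows. This is only an illustration of the idea. These notes do not record the paper's actual textual rules, so the regular expressions, the three-way split into high/medium/low, and the name `link_mr_to_prs` are all assumptions made for the example.

```python
import re

# Hypothetical rules (not taken from the paper):
#   high   - number directly preceded by a keyword such as "bug", "PR", "fix"
#   medium - number directly preceded by "#"
#   low    - bare number that merely happens to match a known PR ID
NUMBER = re.compile(r"(?<!\d)\d{3,6}(?!\d)")
KEYWORD_BEFORE = re.compile(
    r"\b(?:bug|pr|fix(?:es|ed)?)\s*(?:id)?\s*[:#=]?\s*$", re.IGNORECASE
)

def link_mr_to_prs(mr_message, fixed_pr_ids):
    """Return {pr_id: confidence} for numbers found in one MR log message."""
    links = {}
    for match in NUMBER.finditer(mr_message):
        pr_id = int(match.group())
        if pr_id not in fixed_pr_ids:
            continue  # not the ID of a known 'fixed' PR, so no link is assumed
        prefix = mr_message[:match.start()]
        if KEYWORD_BEFORE.search(prefix):
            links[pr_id] = "high"
        elif prefix.endswith("#"):
            links[pr_id] = "medium"
        else:
            links[pr_id] = "low"
    return links

message = "Fix for bug 12345: null check; see also #67890 and rev 11111"
links = link_mr_to_prs(message, {12345, 67890, 11111, 22222})
```

Restricting matches to the IDs of ‘fixed’ PRs mirrors the focus described above; everything else in the sketch is a guess at what a textual analysis might look like.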

The goal of feature definition is to create an association between the feature under observation and the files involved with implementing that feature.

  • This is accomplished by profiling different runs of the program which exercise specific feature(s), and then creating a call graph to associate files with the feature
  • From the information gathered a Concept lattice is produced. The purpose of the lattice is to justify/indicate the associations between files and features.
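To make the concept lattice idea more concrete, here is a small brute-force sketch of formal concept analysis. It treats features as the objects and files as the attributes, which is my reading of the notes and not necessarily the exact setup in the paper. The feature and file names are made up, and the exhaustive enumeration is only practical for tiny inputs.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a small formal context.

    context maps each object (here: a feature) to its set of
    attributes (here: the files touched when exercising it).
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            # Intent: attributes shared by every object in the group.
            intent = set(all_attrs)
            for obj in group:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts

# Hypothetical feature-to-file data, as profiling runs might produce it.
context = {
    "http": {"net.c", "core.c"},
    "https": {"net.c", "ssl.c", "core.c"},
    "mail": {"smtp.c", "core.c"},
}
concepts = formal_concepts(context)
```

Each resulting pair groups a set of features with exactly the files they all share; ordering these pairs by set inclusion gives the lattice. In this toy data, `core.c` ends up shared by every feature while `ssl.c` is specific to `https`.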

Given the information stored in the RHDB and the feature-file relationship identified by the concept lattice, multidimensional scaling is applied to create a visualization of the information

  • All code identified as related by the PR-MR links in the RHDB is placed as circles on a graph
  • The size of a circle reflects the number of modifications made to the set
  • The distance between circles reflects the number of shared files
  • Color codes are applied and symbols for the feature sets are overlaid on the picture of circles to produce a visual representation
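Multidimensional scaling needs a matrix of pairwise dissimilarities as input. The sketch below shows one possible way to derive such a matrix from shared files. The Jaccard-style measure is my own assumption, since these notes do not record how the paper turns shared-file counts into distances, and the set names and file names are invented.

```python
def dissimilarity_matrix(file_sets):
    """Build a symmetric dissimilarity matrix from named sets of files.

    Assumed measure: 1 - |shared files| / |all files of both sets|,
    so sets that share more files end up closer together.
    """
    names = sorted(file_sets)
    matrix = []
    for a in names:
        row = []
        for b in names:
            union = file_sets[a] | file_sets[b]
            shared = file_sets[a] & file_sets[b]
            row.append(1.0 - len(shared) / len(union) if union else 0.0)
        matrix.append(row)
    return names, matrix

# Hypothetical file sets, e.g. the files touched by linked PR-MR groups.
file_sets = {
    "A": {"x.c", "y.c"},
    "B": {"y.c", "z.c"},
    "C": {"w.c"},
}
names, matrix = dissimilarity_matrix(file_sets)
```

A matrix like this could then be handed to any MDS routine that accepts precomputed dissimilarities in order to obtain 2-D positions for the circles.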

The end result was a visualization of the dependencies between features introduced through PRs and the resulting files modified.