Professor Serge Demeyer – Joris Van Geet – Bart Van Rompaey / Last modified: 6 March 2009
Dynamic Analysis Lab Session
These grey boxes are teacher notes. They are marked as hidden text, so when you print or give the document to the students, uncheck the "hidden text" boxes in both the "View" and "Print" preferences.
*** This is intended to be a 2.5 hour lecture.***
In this session, we will use dynamic analysis for two purposes:
- Program Understanding – Collecting information to understand the inner working of the system and to identify refactoring opportunities.
- Coverage Measurement – Collecting information about the quality of a test suite, and the confidence we can have during regression testing.
Note: OORP - p.xx refers to a page in the book "Object-Oriented Reengineering Patterns".
Using Dynamic Analysis to gain Program Understanding
Extravis is a dynamic analysis tool for visualizing and navigating through (large) execution traces. Extravis runs on Windows. The next page contains a reference chart for Extravis’ usage.
Our system under study remains Checkstyle, a code style checker for Java. For this exercise, we have a trace obtained through the instrumentation of Checkstyle’s core functionalities, which means that calls to and from external libraries were not registered. The trace is the result of Checkstyle’s execution with Simple.java and several_checks.xml as its parameters.
Figure 1 ExtraVis tutorial
Gaining a general understanding (OORP p65-84)
The first thing the developer might desire is a first impression of how Checkstyle works, especially in case domain knowledge is lacking.
· Having glanced through the resources at your disposal for several minutes, which do you think are the main stages in a typical Checkstyle scenario? Formulate your answer from a high-level perspective: do not get into details (e.g., method identifiers) and stick to a maximum of six main stages.
Checkstyle’s purpose is the application of checks on its input source file. These checks each have their own class and are located in the checks-package. They can be written by a developer and contributed to the Checkstyle package: For example, one could write a check to impose a limit on the number of methods in a class. If our developer wants to write a new check, one way to gain the necessary knowledge is to study existing checks (i.e., learning by example).
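As a concrete (and heavily simplified) illustration of what such a check looks like, the sketch below is in the spirit of the MethodLimitCheck example from Checkstyle’s writing-checks documentation. The `Check` and `DetailAST` types here are small stand-ins defined for self-containedness; the real, much richer classes live in `com.puppycrawl.tools.checkstyle.api`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Checkstyle's API; the real base class and AST
// type live in com.puppycrawl.tools.checkstyle.api and are far richer.
class DetailAST {
    final String type;                                // e.g. "CLASS_DEF", "METHOD_DEF"
    final List<DetailAST> children = new ArrayList<>();
    DetailAST(String type) { this.type = type; }
    int childCount(String t) {
        int n = 0;
        for (DetailAST c : children) if (c.type.equals(t)) n++;
        return n;
    }
}

abstract class Check {
    final List<String> messages = new ArrayList<>();
    void log(String msg) { messages.add(msg); }       // the real API reports via Checkstyle's audit chain
    abstract void visitToken(DetailAST ast);
}

// A check imposing a limit on the number of methods per class.
class MethodLimitCheck extends Check {
    private final int max;
    MethodLimitCheck(int max) { this.max = max; }
    @Override
    void visitToken(DetailAST ast) {
        if (ast.type.equals("CLASS_DEF") && ast.childCount("METHOD_DEF") > max) {
            log("too many methods: " + ast.childCount("METHOD_DEF") + " (max " + max + ")");
        }
    }
}

public class CheckSketch {
    public static void main(String[] args) {
        DetailAST clazz = new DetailAST("CLASS_DEF");
        for (int i = 0; i < 3; i++) clazz.children.add(new DetailAST("METHOD_DEF"));
        MethodLimitCheck check = new MethodLimitCheck(2);
        check.visitToken(clazz);
        System.out.println(check.messages);           // one warning: three methods, max two
    }
}
```

The essential shape carries over to real checks: a check is registered for certain AST tokens, is visited with each matching node, and reports violations through a logging call rather than printing directly.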
· In general terms, describe the activities of a check during execution. Use no more than five sentences.
The TreeWalker class plays an important role in Checkstyle’s inner workings and interacts extensively with the checks. We now take a closer look at the protocol between TreeWalker and the various checks.
· List the identifiers of all method/constructor calls that typically occur between TreeWalker and a checks.whitespace.TabCharacterCheck instance, and the order in which they are called.
· Now do the same for TreeWalker and checks.coding.IllegalInstantiationCheck. Can you explain the differences?
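To make the notion of a "protocol" concrete, here is a hypothetical, much-simplified model of how a walker might drive its checks. The lifecycle call names (beginTree, visitToken, leaveToken, finishTree) mirror Checkstyle’s check lifecycle, but all classes below are stand-ins invented for this sketch, not the real implementation; in particular, note the difference between a check that registers for tokens and one that does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A much-simplified, hypothetical model of the TreeWalker/check protocol.
abstract class LifecycleCheck {
    final List<String> calls = new ArrayList<>();
    Set<String> tokens() { return Set.of(); }        // token types the check registers for
    void beginTree()          { calls.add("beginTree"); }
    void visitToken(String t) { calls.add("visitToken(" + t + ")"); }
    void leaveToken(String t) { calls.add("leaveToken(" + t + ")"); }
    void finishTree()         { calls.add("finishTree"); }
}

// Registers for no tokens: a line-oriented check can do all its work per tree.
class TabCharacterLikeCheck extends LifecycleCheck { }

// Registers for "LITERAL_NEW", so it is visited for every `new` expression.
class IllegalInstantiationLikeCheck extends LifecycleCheck {
    @Override Set<String> tokens() { return Set.of("LITERAL_NEW"); }
}

public class TreeWalkerSketch {
    // Walk a (flattened) token stream and drive each registered check.
    static void walk(List<String> tokenStream, LifecycleCheck... checks) {
        for (LifecycleCheck c : checks) c.beginTree();
        for (String t : tokenStream)
            for (LifecycleCheck c : checks)
                if (c.tokens().contains(t)) { c.visitToken(t); c.leaveToken(t); }
        for (LifecycleCheck c : checks) c.finishTree();
    }

    public static void main(String[] args) {
        TabCharacterLikeCheck tab = new TabCharacterLikeCheck();
        IllegalInstantiationLikeCheck inst = new IllegalInstantiationLikeCheck();
        walk(List.of("CLASS_DEF", "LITERAL_NEW", "METHOD_DEF"), tab, inst);
        System.out.println(tab.calls);   // only the per-tree lifecycle calls
        System.out.println(inst.calls);  // plus visitToken/leaveToken for LITERAL_NEW
    }
}
```

Comparing the two `calls` lists hints at the kind of difference the exercise asks you to explain from the trace: which calls every check receives, and which depend on the tokens a check registered for.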
Once the developer has written a new check, he/she would like to know if it works, i.e., whether it reports warnings when appropriate.
· How is a check’s warning handled, i.e., what happens when a violation is found and how is it communicated to the user?
· Given Simple.java as the input source and several_checks.xml as the configuration, does checks.whitespace.WhitespaceAfterCheck report warnings? Specify how your answer was obtained.
Refactoring opportunities (OORP p84-93)
In certain cases (not necessarily Checkstyle) it is desirable to modify the program’s package hierarchy. Examples include the placement of tightly coupled classes in the same package, and the movement of classes with high fan-in and (almost) no fan-out to a utility package.
The fan-in of a class is defined as the number of distinct method/constructor calls directed toward that class, not counting self-calls. Its fan-out is defined as the number of distinct method/constructor calls it directs toward other classes.
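The definitions can be sketched in code. The fragment below computes fan-in and fan-out over a hypothetical list of observed call edges; all class and method names in it are made up for illustration and do not come from the Checkstyle trace.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A small sketch of the fan-in/fan-out definitions, computed over a
// hypothetical list of call edges (caller class, callee class, callee method).
public class FanMetrics {
    record Call(String from, String to, String method) {}

    // Fan-in: distinct methods of `clazz` invoked from other classes (self-calls ignored).
    static int fanIn(String clazz, List<Call> calls) {
        Set<String> distinct = new HashSet<>();
        for (Call c : calls)
            if (c.to().equals(clazz) && !c.from().equals(clazz))
                distinct.add(c.method());
        return distinct.size();
    }

    // Fan-out: distinct methods of other classes that `clazz` invokes.
    static int fanOut(String clazz, List<Call> calls) {
        Set<String> distinct = new HashSet<>();
        for (Call c : calls)
            if (c.from().equals(clazz) && !c.to().equals(clazz))
                distinct.add(c.to() + "." + c.method());
        return distinct.size();
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("TreeWalker", "Utils", "getLine"),
            new Call("Checker",    "Utils", "getLine"),  // same target method: counted once
            new Call("Checker",    "Utils", "baseName"),
            new Call("Utils",      "Utils", "helper"));  // self-call: ignored
        System.out.println(fanIn("Utils", calls));       // 2: getLine and baseName
        System.out.println(fanOut("Utils", calls));      // 0: the utility-class profile
    }
}
```

A class with high fan-in and (almost) zero fan-out, as in this toy example, is exactly the kind of candidate for a utility package that the exercise asks you to find.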
· Name three classes in Checkstyle that have a high fan-in and (almost) no fan-out.
· Assume that a tight coupling is characterized by a relatively large number of different method calls between two structural entities (e.g., classes or packages). Name a class in the default package (i.e., classes not in any package) that could be a candidate for movement to the api package because of its tight coupling with classes therein.
Measuring and Assessing Test Coverage (OORP p121-144)
The presence of a test suite can be an important support during reengineering. Tests help to:
- Reveal unwanted side effects of refactoring. Frequent execution of a regression test suite can reveal defects early and fast, thereby forming a harness that protects the developer.
- Understand the inner workings of (part of) a system. In particular, unit tests show typical usage scenarios as well as scenarios in which the system cannot or must not function.
- Give developers trust in the quality of their work. Running the test suite confirms or refutes the assumptions the developer makes about the system.
- Write new tests. Existing tests serve as examples or even as the basis for further tests. You can systematically extend tests based on criteria important for the project (testing old bugs, increasing the tests for a critically important part, testing new functionality, etc).
The presence of automated tests does, however, not offer any guarantee about their quality. Do the tests cover the whole system, or are some parts left untested? Which parts are covered, and to what extent? Hence, measuring test coverage is a useful, even necessary, way to assess the quality and usefulness of a test suite in the context of reengineering.
Measuring Test Coverage
Measuring test coverage requires information about which code is executed during a test run. To obtain this information, some additional code is typically injected. This can be done in a pre-processing step of the build process, before compilation. For virtual machine technology, this manipulation can also be performed at the bytecode level. In both cases, the injected code reports, during test execution, to a given test coverage measurement and reporting tool.
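The following toy fragment illustrates the idea of injected code (this is an invented scheme for illustration, not how Emma or any particular tool actually rewrites classes): each basic block sets a flag in a probe array, and the coverage report is derived from which flags were set during the test run.

```java
// Toy illustration of coverage instrumentation: each basic block flips a
// flag in a probe array; the report is derived from which flags were set.
public class ProbeDemo {
    static final boolean[] HITS = new boolean[3];   // one probe per basic block

    static int abs(int x) {
        HITS[0] = true;          // probe: method entry
        if (x < 0) {
            HITS[1] = true;      // probe: negative branch
            return -x;
        }
        HITS[2] = true;          // probe: non-negative branch
        return x;
    }

    static double blockCoverage() {
        int hit = 0;
        for (boolean b : HITS) if (b) hit++;
        return (double) hit / HITS.length;
    }

    public static void main(String[] args) {
        abs(5);                  // a "test" exercising only one branch
        System.out.println("block coverage: " + blockCoverage());  // 2 of 3 blocks hit
    }
}
```

Running only `abs(5)` leaves the negative branch unexercised, which is exactly the kind of gap a coverage report makes visible.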
Practically speaking, there exist three ways to obtain the measurement. We discuss these approaches using the test coverage tool Emma (http://emma.sourceforge.net) for Java.
Firstly, the command-line approach uses bytecode instrumentation. First check whether the tests execute, e.g. by executing
java -cp [my class path] myTestSuite.testAll
Then, you can make Emma inject code and report in HTML using:
java -cp [my class path]:emma.jar emmarun -f -r html -sp [path/to/src] -cp . myTestSuite.testAll
The “-r html” option asks Emma to generate an HTML report such as in Figure X, reporting coverage at the class, method, block and statement level. The interface allows you to gradually refine the coverage report up to the level of coverage-colored source code.
Secondly, you can inspect the build system of the project and investigate whether it already supports coverage measurements or whether you can modify it to do so. The Emma Ant documentation provides an example (http://emma.sourceforge.net/userguide/ar01s03.html):
<!-- directory that contains emma.jar and emma_ant.jar: -->
<property name="emma.dir" value="${basedir}/../lib" />
<path id="emma.lib" >
<pathelement location="${emma.dir}/emma.jar" />
<pathelement location="${emma.dir}/emma_ant.jar" />
</path>
<taskdef resource="emma_ant.properties" classpathref="emma.lib" />
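Building on that snippet, the build could then gain targets that instrument the compiled classes and generate a report. The sketch below loosely follows the Emma user guide; the target names, property names (out.instr.dir, coverage.dir) and path reference (run.classpath) are placeholders you would adapt to the project’s build file.

```xml
<!-- Sketch only: target names, properties and path references are placeholders. -->
<target name="instrument" depends="compile">
  <emma enabled="true">
    <instr instrpathref="run.classpath"
           destdir="${out.instr.dir}"
           metadatafile="${coverage.dir}/metadata.emma"
           merge="true" />
  </emma>
</target>

<target name="report" depends="run-tests">
  <emma enabled="true">
    <report sourcepath="${src.dir}">
      <fileset dir="${coverage.dir}">
        <include name="*.emma" />
      </fileset>
      <html outfile="${coverage.dir}/coverage.html" />
    </report>
  </emma>
</target>
```

Making coverage a first-class build target like this means every developer can regenerate the report locally, instead of coverage being a one-off measurement.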
Thirdly, you can use a coverage plugin for your favourite IDE. Eclemma (http://www.eclemma.org) is an Emma plugin for Eclipse. Given that you can configure Eclipse to compile the system under study, you can launch a “Coverage run” and obtain coverage results reported as tables and colored source code within the IDE.
· Choose an appropriate way to apply Emma to your own project.
Evaluating Test Coverage
Open the resulting report and browse through the results.
· How do the results coincide with how you would intuitively test the system? Do more complex or critical parts have higher coverage? Discuss.
· What would you propose as a good level of code coverage?
· Do you think 100% code coverage is feasible?
· Suppose your program contains assertions: does it make sense to switch these on during regression testing? What will be the impact on coverage (better or worse)? Why?
The essence of this discussion should target the inherent benefits of both kinds of tools:
- Metric tools: Are good at summarizing quantitative properties of code parts. However, much of the analysis required on top of the numbers (e.g. spotting minima, maxima, outliers) is more easily done using visualizations.
- Visualization tools: Are good at verifying encapsulation and (de)localization. This allows them to be used for analysing coupling and cohesion, which are excellent guidelines for redistributing responsibilities.
In contrast to metric tools, visualizations do not rely on thresholds; instead, they present images that are relative to the software system at hand, using different kinds of layout mechanisms. This is an advantage in patterns like “Study the Exceptional Entities”.