BNSI / Bulgarian National Statistical Institute

ESSNET on SDMX II

WP4 “Harmonization of existing tools

SDMX RI Source Code Analysis – Java

Modules:

SDMX Information Model

Query Parser

Version 1.3

September 2012

1

Type of Document / Project deliverable
Reference: / SDMX RI Source Code Analysis – Java, version 1.3
Issue: / Revision: / 6 / Status: / Final
Created by: / Emil Hurmuzov (EH) / Date: / 15 Nov 2011
Updated by: / Emil Hurmuzov (EH) / 10 May 2012
Updated by: / Emil Hurmuzov (EH) / 10 Sept 2012
Approved by:

Document Change Record

Issue/ Revision / Date / Change
1.1 / 15 Nov 2011 / Draft first version
1.2 / 10 May 2012 / Updated version, submitted to ESTAT for review
1.3 / 19 Sep 2012 / Final version

Table of contents Page

1Introduction......

1.1Purpose......

1.2Scope......

1.3References......

2Code Quality Analysis......

2.1Introduction......

2.2Quality characteristics......

2.3Automated Code Analysis......

2.3.1Static Code Analysis......

2.3.2Tools......

2.3.3Rules......

2.3.4Results – SDMX Information Model......

2.3.5Results – Query Parser......

2.3.6Conclusions and Recommendations (and impact of Refactoring)......

2.4Manual Code Analysis......

2.4.1Results – SDMX Information Model......

2.4.1.1Wrong import class (com.agilis.sdmx.model.structure.CodeRefBean.java, line 11)...

2.4.1.2Local Maven repository in the pom.xml (pom.xml, line 129)......

2.4.1.3Non-UTF-8 symbols in the header of all java class files......

2.4.2Results – Query Parser......

2.4.2.1Using “main” class for unit testing (QueryParser.java, line 464)......

2.4.2.2Missing test data (QueryParser.java, line 468)......

2.4.2.3Incorrect Utility class (QueryDefs.java)......

2.4.2.4Direct use of String values (Magic values) (QueryParser.java)......

2.4.3Conclusions and Recommendations (and impact of Refactoring)......

3Final Words......

1Introduction

1.1Purpose

The goal of this document is to provide information about the quality of the two modules from SDMX Reference Infrastructure – SDMX Information Model version 1.4.1 and Query Parser 1.0.1 and to provide some recommendation on the way of refactoring particular problems in the source code.

This information could be used to increase the quality of the code and reducing the risk of potential errors when these modules are using in a production deployments in National Statistics.

1.2Scope

The document provides analysis information only for two modules from SDMX Reference Infrastructure - SDMX Information Model version 1.4.1 and Query Parser 1.0.1.

The analysis is focused only on improvement of the existing implementation of two modules but not to suggest another approach for design or environment.

1.3References

Reference / Document/Resource Name
SDMX Standard /
Java Programming Style / Java Programming Style Guidelines
NSIWebServices Java /
SDMX Information Model version 1.4.1 / sdmx_model module from above application
SDMX Query Parser /

2Code Quality Analysis

2.1Introduction

Two general methods are used in analysing the quality of the code for selected modules.

The first method is automated check by using a specialized software tool (Sonar in this case). The results are summarized in a table and later for each major type of errors/alerts are provided an explanation of the problem and suggested correction. This can be a good starting point for refactoring the code.

The second method is “manually” looking into the code, trying to build, deploy and test the NSIWebService 2.3.0 application which contains the two selected modules. The results of this test can be useful for improving the distribution of modules to National Statistics or third parties of developers.

2.2Quality characteristics

The both modules - SDMX Information Model version 1.4.1 and Query Parser 1.0.1 are pretty simple and very specialized.

SDMX Information Model is a simple POJO representation of the SDMX Model from SDMX Standard. This module does not provide any functionality and is used only to transfer SDMX objects between other SDMX modules. That’s why there are no way to test it for scalability, reliability and performance.

Query Parser is a just “one-class” module and its functionality is to parse the incoming XML messages. It uses a SAX based parser which is stream based parsing and provides a better resource utilization than the DOM based parsers. The simplicity of the module does not allow it to influence the performance of the application.

2.3Automated Code Analysis

2.3.1Static Code Analysis

The term “static analysis” has indeed been around for decades. The primary purpose of the first-generation tools such as lint was to do stronger type checking, but they could also do some style checking and lightweight bug-finding. Nowadays there are three classes of tool:

◦Lightweight tools like lint, and others that find violations of coding standards are in one class. They typically use superficial techniques that examine the syntax of the program to find issues. A common complaint about them is their excessive rate of false alarms and that the warnings they issue do not correlate very well with real defects.

◦Advanced static-analysis tools such as Sonar are bug hunters—their primary purpose is to find serious programming defects.

◦Tools that support formal methods can prove the absence of certain classes of error. These are heavyweight tools that require programs conform to a strict subset of the language, so have limited applicability and are not widely used.

There is of course some overlap between these classes. The lightweight tools can be useful if there is a requirement for the code to conform to coding standards; they can also find some kinds of bugs, albeit not nearly as effectively as the other classes of tools. Similarly the advanced tools can also be used to check lightweight properties.

Advanced static-analysis tools, or bug hunters are those whose primary purpose is to find serious bugs. It is wrong to state that these tools are little different from those that came before.

They are simultaneously interprocedural, flow sensitive, context sensitive, whole program, and path sensitive. They scale to tens of millions of lines of code and yield results with a low level of false positives.

These techniques give them the capability to find deep semantic errors in huge code bases. There is no contradiction in observing that they can also subsume the capabilities of the earlier generation of tools. The commercial success and wide adoption of these tools indicates that many users find that these tools deliver real value.

Testing is very good at finding ways in which the software does not meet its requirements. However testing is only as good as the test cases, and it is expensive to create test cases and time-consuming to execute them. It is not uncommon for half of a software development effort to be consumed by testing activities.

The reason advanced static-analysis tools are successful is that they change the economics for several reasons:

◦They can find bugs early in the development cycle, and bugs found earlier are less expensive to fix.

◦They do not require program inputs, so bugs can be found and eliminated without incurring the expense of developing test cases.

◦They can make it easier and less expensive to develop dynamic test cases. The consequence is they can eliminate more bugs for less expense.

Static analysis can be used as soon as the code can be compiled. The code does not have to be complete, or integrated with the rest of the program in order for the technique to be able to begin to find bugs. It can be run on a single file at a time, or a complete codebase, or anything in between. The results are better when the entire program is analyzed, but an analysis of small parts can also be useful. Thus developers can get very quick feedback on their code quality.

2.3.2Tools

Sonar is an open platform to manage code quality. As such, it covers the 7 axes of code quality:


Java language is built in. Open Source and commercial plugins enable to cover C, C#, Flex, Natural, PHP, PL/SQL, Cobol and Visual Basic 6.

An open source code quality management software, combines the expertise of Check Style, Find Bugs and PMD as well as provides a graphical way of analyzing and reporting code quality.

Link to installation instructions:

There are several ways Sonar analysis can be performed:

▪using the Maven plugin

▪using the Ant Task

▪using the Java Runner

▪using a CI engine

We’ll start with using the Maven Plugin. The final goal is to build a complete development environment – Version control (Subversion), Continuous Integration Server (Hudson / Jenkins), Code Analysis (Sonar).

2.3.3Rules

The rules are classified according to the type of irregularity in the code – conventions rules (Checkstyle), bad practices (PMD) and potential bugs (FindBugs). The Sonar way rule set checks for java language conventions and bad coding practices. Every rule has a severity level according to a classification of violations – blocker, critical, major, minor, info.

The following 30 conventions rules were used in the source code analysis:

Anon Inner Length

Checks for long anonymous inner classes.

Boolean Expression Complexity

Restricts nested boolean operators (&, || and ^) to a specified depth (default = 3).

Constant Name

Checks that constant names conform to the specified format

Cyclomatic Complexity

Checks cyclomatic complexity of methods against a specified limit. The complexity is measured by the number of if, while, do, for, ?:, catch, switch, case statements, and operators & and || (plus one) in the body of a constructor, method, static initializer, or instance initializer. It is a measure of the minimum number of possible paths through the source and therefore the number of required tests. Generally 1-4 is considered good, 5-7 ok, 8-10 consider re-factoring, and 11+ re-factor now !

Default Comes Last

Check that the default is after all the cases in a switch statement.

Double Checked Locking

Detect the double-checked locking idiom, a technique that tries to avoid synchronization overhead but is incorrect because of subtle artifacts of the java memory model.

Empty Statement

Detects empty statements (standalone ';').

Equals Hash Code

Checks that classes that override equals() also override hashCode().

Final Class

Checks that class which has only private constructors is declared as final.

Hidden Field

Checks that a local variable or a parameter does not shadow a field that is defined in the same class.

Hide Utility Class Constructor

Make sure that utility classes (classes that contain only static methods) do not have a public constructor.

Illegal Throws

Throwing java.lang.Error or java.lang.RuntimeException is almost never acceptable.

Inner Assignment

Checks for assignments in subexpressions, such as in String s = Integer.toString(i = 2);.

Local Final Variable Name

Checks that local final variable names, including catch parameters, conform to the specified format

Local Variable Name

Checks that local, non-final variable names conform to the specified format

Magic Number

Checks for magic numbers.

Member name

Checks that name of non-static fields conform to the specified format

Method Name

Checks that method names conform to the specified format

Modifier Order

Checks that the order of modifiers conforms to the suggestions in the Java Language specification, sections 8.1.1, 8.3.1 and 8.4.3. The correct order is : public, protected, private, abstract, static, final, transient, volatile, synchronized, native, strictfp.

Package name

Checks that package names conform to the specified format. The default value of format has been chosen to match the requirements in the Java Language specification and the Sun coding conventions. However both underscores and uppercase letters are rather uncommon, so most configurations should probably assign value ^[a-z]+(\.[a-z][a-z0-9]*)*$ to format

Parameter Assignment

Disallow assignment of parameters.

Parameter Name

Checks that parameter names conform to the specified format

Redundant Modifier

Checks for redundant modifiers in interface and annotation definitions.

Redundant Throws

Checks for redundant exceptions declared in throws clause such as duplicates, unchecked exceptions or subclasses of another declared exception.

Simplify Boolean Expression

Checks for overly complicated boolean expressions.

Simplify Boolean Return

Checks for overly complicated boolean return statements.

Static Variable Name

Checks that static, non-final fields conform to the specified format

String Literal Equality

Checks that string literals are not used with == or !=.

Unused Imports

Checks for unused import statements.

Visibility Modifier

Checks visibility of class members. Only static final members may be public; other class members must be private unless property protectedAllowed or packageAllowed is set.

2.3.4Results – SDMX Information Model

Violated Rules detected by software

Statistics of the project.

Complexity of the project.

Violations / Compliance of the project.

Severity / Rule / Description / Violations
Major / Visibility Modifier / Checks visibility of class members. Only static final members may be public; other class members must be private. / 79
Major / If Stmts Must Use Braces / Avoid using if statements without using curly braces. / 13
Major / Loose coupling / Avoid using implementation types (i.e., HashSet); use the interface (i.e, Set) instead / 12
Major / System Println / System.(out|err).print is used, consider using a logger. / 12
Major / Avoid Throwing Raw Exception Types / Avoid throwing certain exception types. Rather than throw a raw RuntimeException, Throwable, Exception, or Error, use a subclassed exception or error instead. / 5
Major / If Else Stmts Must Use Braces / Avoid using if..else statements without using curly braces. / 4
Major / Signature Declare Throws Exception / It is unclear which exceptions that can be thrown from the methods. It might be difficult to document and understand the vague interfaces. Use either a class derived from RuntimeException or a checked exception. / 4
Major / Member name / Checks that name of non-static fields conform to the specified format / 3
Major / Illegal Throws / Throwing java.lang.Error or java.lang.RuntimeException is almost never acceptable. / 1
Major / Parameter Name / Checks that parameter names conform to the specified format / 1
Major / Empty Finalizer / If the finalize() method is empty, then it does not need to exist. / 1
Major / Hide Utility Class Constructor / Make sure that utility classes (classes that contain only static methods) do not have a public constructor. / 1
Major / Simplify Boolean Return / Checks for overly complicated boolean return statements. / 1
Info / Unused Imports / Checks for unused import statements. / 1
Rule: Visibility Modifier
Defining a class member without modifier (default modifier) makes this member direct accessible by all classes in the same package.
If other programmers use your class, you want to ensure that errors from misuse cannot happen. Access levels can help you do this.
Use the most restrictive access level that makes sense for a particular member. Use private unless you have a good reason not to.
Avoid public fields except for constants. (Many of the examples in tutorials use public fields. This may help to illustrate some points concisely, but is not recommended for production code.) Public fields tend to link you to a particular implementation and limit your flexibility in changing your code.
List of classes with violations
Class Name / No / Rows with violations
RegistryInterfaceBean / 11 / 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
StructureBean / 10 / 21, 22, 24, 25, 26, 28, 29, 30, 31, 32
KeyFamilyBean / 10 / 21, 22, 24, 25, 26, 28, 29, 30, 31, 32
QueryProvisioningResponseBean / 5 / 17, 18, 19, 20, 22
RefBean / 4 / 13, 14, 15, 16
QueryProvisioningRequestBean / 4 / 17, 18, 19, 20, 22
RefBean / 4 / 12, 13, 14, 15
QueryStructureRequestBean / 3 / 17, 25, 26
AssociationBean / 3 / 12, 13, 14
CodelistMapBean / 3 / 15, 16, 17
QueryStructureResponseBean / 2 / 25, 26
CrossSectionalMeasureBean / 2 / 10, 11
GroupBean / 2 / 15, 16
QueryBean / 2 / 15, 16
SubmissionResultBean / 2 / 14, 15
SubmitProvisioningRequestBean / 1 / 19
MessageBean / 1 / 10
FullTargetIdentifierBean / 1 / 15
SubmitSubscriptionRequestBean / 1 / 19
SubmitProvisioningResponseBean / 1 / 17
ItemSchemeBean / 1 / 24
StructureSetBean / 1 / 15
MetadataAttributeBean / 1 / 14
SubmitStructureResponseBean / 1 / 17
SubmitStructureRequestBean / 1 / 22
ProvisioningStatusBean / 1 / 18
PartialTargetIdentifierBean / 1 / 15
Source code with violation
List codelistMaps = new ArrayList();
Suggested Correction
private List codelistMaps = new ArrayList();
/** Only if access from other classes is required */
public getCodelistMaps() {
return (codelistMaps);
}
Rule: If Statement Must Use Braces
It is good programming style to always write curly braces {}, although they are not needed if the clause contains only a single statement. There are two reasons this is good:
Reliability: When code is modified, the indentation is such a strong indicator of structure that the programmer may not notice that the addition of a statement at the "correct" indentation level really isn't included in the scope of the if statement. This is a surprisingly common error.
Readability: It is faster to read code with the braces because the reader doesn't have to keep in mind whether they are dealing with an un-braced single statement or a braced block.
List of classes with violations
Class Name / No / Rows with violations
KeyFamilyBean / 4 / 156, 239, 241, 243
QueryBean / 2 / 48, 50
CategorySchemeBean / 2 / 38, 43
IdentifierComponentBean / 1 / 79
ReportStructureBean / 1 / 77
DataMessage / 1 / 62
MetadataStructureDefinitionBean / 1 / 53
StatusMessageBean / 1 / 32
Source code with violation
com.agilis.sdmx.model.structure.CategorySchemeBean – row 41
while (cit.hasNext()) {
CategoryBean category = (CategoryBean) cit.next();
if (parentCategory != null)
category.setParentId(parentCategory.getId());
categories.add(category);
iterateDeapthFirst(category, categories, category.getCategories());
}
Suggested Correction
while (cit.hasNext()) {
CategoryBean category = (CategoryBean) cit.next();
if (parentCategory != null) {
category.setParentId(parentCategory.getId());
}
categories.add(category);
iterateDeapthFirst(category, categories, category.getCategories());
}
Rule: Loose coupling
A common programming philosophy is to declare variables with a type that makes the least assumptions. This allows flexibility to make changes in the underlying type if this latter becomes necessary. For example, the most obvious way to declare x is
ArrayList x = new ArrayList(); // OK
But if x only uses the methods defined in the List interface, it would be better to do the following.