ON INTER-METHOD AND INTRA-METHOD OBJECT-ORIENTED CLASS COHESION

Frank Tsui, Orlando Karam, Sheryl Duggins, Challa Bonja

School of Computing and Software Engineering

Southern Polytechnic State University

Marietta, Georgia, USA 30060

KEYWORDS: Object-Oriented Design, Software Metrics, Software Quality, Systems Evaluation

Abstract

Cohesion has been a topic of interest since structured design in the 1970s. Cohesion may also be viewed as a characterization of a system attribute. Today, numerous researchers are continuing this work in object-oriented design. Most of the current research has focused on the interaction of methods within a class, the inter-method cohesion. In this paper, we consider both the inter-method cohesion and the intra-method cohesion of a class. We have utilized the concept of program slice (Weiser, 1981) and have extended Functional Cohesion (Bieman & Ott, 1994) to devise a new intra-method cohesion metric, ITRA-C, for measuring the cohesion of each method within the class. This intra-method cohesion is based on the notion of effects and chaining in an effect-slice. We further combine the (inter-method, intra-method)-tuple into one combined Class Cohesion, which provides a quick view of bands of cohesion for categorizing classes.

Introduction

Developing high-quality software continues to be a difficult task. Many attributes may be studied to understand software. Since software engineering is still at a relatively young stage, applying the “systems approach” as defined by R. L. Ackoff (Ackoff, 1971), where the complete software system is studied in a holistic manner, is still a challenge. In this paper, we focus on a specific software attribute, cohesion, and study it further by measuring this attribute from an object-oriented class perspective. Cohesion has been shown to be an important attribute of good-quality software (Bansiya & Davis, 2002; Bieman & Ott, 1994; Briand, Morasca, & Basili, 1995). In this paper, instead of the complete software, the object-oriented class itself is viewed as the system. Cohesion is an attribute that characterizes connectedness and thus allows us to view a system as a set of connected elements (Checkland, 1981) rather than as separate parts. We pursue an in-depth analysis of this single attribute of the system through the various views of inter-method and intra-method cohesion metrics. We will also show how the cohesion metrics may be used to help us design better object-oriented classes. Thus, the value of the paper lies not only in extending the concepts of cohesion and the various associated metrics, but also in applying these metrics to guide improvements to our class, or system, design.

This emphasis on engineering software has led to research into measurements for evaluating the quality of software. Low coupling and high cohesion have been identified as attributes of good software design (Bansiya & Davis, 2002; Briand et al., 1994), and a large number of metrics have been developed to measure these quality attributes. The notion of cohesion has been in existence for several decades (Stevens et al., 1974; Yourdon & Constantine, 1979). These early papers introduced the concept of “functional relatedness” of modules. The relatedness among modules was called coupling, and the relatedness within a module was called cohesion. Relatedness itself is an abstract concept that asks whether items belong together. Intuitively, those that “belong” together ought to be designed into one entity. This made sense especially for the follow-on maintenance people who had to understand and modify the design and the code. That is, if the “related” entities are spread across the system, then it is more difficult to find them. As Checkland (1981) advocates, a system should be thought of as a connected set of elements rather than as separate parts. Other than the now well-known seven levels of cohesion (coincidental, logical, temporal, procedural, communicational, sequential, and functional), which defined ordered categories of cohesion, there was no numeric metric for modular cohesion in those early days. Bieman and Ott (1994) and Bieman and Kang (1995; 1998) introduced numeric metrics based on program slices to gauge “relatedness,” or cohesion.

Following the same concept of relatedness, several metrics have been designed to measure the cohesion of an object-oriented class. Briand et al. (1994; 1998), Hitz and Montazeri (1995), Chidamber and Kemerer (1994), Bansiya and Davis (2002), Counsel et al. (2006), Henderson-Sellers (1996), Bonja and Kidanmariam (2006), Chae et al. (2004), and Zhou et al. (2002; 2004) have proposed different approaches to measuring cohesion in an object-oriented class. For the most part, these metrics all revolve around the notion of relatedness of the methods in a class. The relatedness of the methods is primarily gauged by the amount and the type of sharing of the attributes, or data. The methods in a class are considered more cohesive if the amount or type (or both) of attribute sharing is higher. Also, the amount of interaction among methods, in the form of methods invoking other methods in the class, is considered an important factor for cohesion among methods in a class. That is, the connectedness of the methods is considered important. Still, whether each individual method is itself cohesive is not clearly accounted for. In this paper, we will consider class cohesiveness to be composed of two attributes:

- Relatedness, and

- Singularity in function or purpose.

If one views a class as a system, then the relatedness concept of methods in that system is similar to the concept of coupling of the methods in the class. The more “coupled” the methods within a class are, the higher the cohesion of that class is. Thus, inter-method cohesion is captured by the notion of coupling of methods in the class. In such a context, one is led to ask what the cohesion of an individual method is. That is, the singularity of function for each method in a class is important. Thus, intra-method cohesion must also be considered. The intra-method cohesion should answer how singular in purpose the method is. Intuitively, the more singular the method’s functional purpose is, the higher its intra-method cohesion is. The ideal situation for a class is to maximize single-purpose methods (intra-method cohesion) and also have these cohesive methods be strongly related in a class (inter-method cohesion). Both inter-method cohesion and intra-method cohesion need to be included when discussing the cohesion of a class. Furthermore, one may want to consider which of the two sub-attributes, inter-method or intra-method, is more important.

Design quality metrics for object-oriented systems can be categorized as either static or dynamic. Dynamic metrics measure object-level coupling and dynamic complexity (Yacoub et al., 1999). This paper will address static metrics, which measure the static cohesion of a class. The static structure of a class is considered to have only three main parts: the class name, the instance variables, and the methods. Class cohesion is analyzed by utilizing the instance variables and the methods of the class and their interplay within the class. We will first discuss the traditional relatedness of methods in a class, the inter-method cohesion metric. The notion of inter-method coupling will be studied through a set of evolving scenarios in which instance variables and methods are added to an “ideally” inter-method-wise cohesive class. We will then explore the concept of intra-method cohesion and introduce an intra-method cohesion metric. In the process of extending the metric definition, we also expand the notion of intra-method cohesion. Finally, the combination of inter-method and intra-method cohesion will be considered. Here, the difficulties involving multi-attribute metrics, as pointed out by Fenton and Pfleeger (1997), are explored. A potential combination metric will be proposed, and its characteristics will be discussed.

Inter-Method Cohesion

Cohesion of an entity is based on several basic and similar concepts (Stevens et al., 1974; Yourdon & Constantine, 1979). These range from how much the entity serves a common goal to how related the parts of the entity are. These are intuitively similar in that if an entity has many unrelated parts, then chances are it is serving more than a singular purpose. Here, we will use a very simple and contrived example for illustration purposes. Consider a Class Math that is designed to perform the single service of providing the sum of a set of integer numbers. This Class Math may be expanded to include more services in the form of methods that provide the maximum, the minimum, and the average of the set of integers. As Class Math matures and enters maintenance mode, it is further expanded to also accept floating point numbers. Further enhancement of Class Math may include a method that performs an input check and restricts the input to integers and floating point numbers only. In a way, these enhancements are not atypical of a Class that evolves through its post-release enhancements. We can easily see how a very limited single-purpose Class Math can be expanded into a broader multi-purpose Class Math.

Using this Class Math example, let us examine how the various inter-method metrics would treat the change in single purposefulness and relatedness of a Class. The inter-method metrics that define cohesion based on the interaction of methods with the instance variables will all treat the above Class Math in a similar way. That is, the methods are all interacting with the same set of instance variables, the input integers and the input floating point numbers. In Table 1, we have summarized the cohesion metrics of interest, whose values we will trace as Class Math evolves.

Table 1: Some Major Inter-Method Cohesion Metrics

Metric / Metric Explanation
Briand et al. (1998):
RCI / RCI = |CI(C)| / |Max(C)|, where CI(C) is the set of all data declaration (DD) interactions and data-method (DM) interactions in Class C, and Max(C) is the set of all possible DD and DM interactions.
Bieman and Kang (1995; 1998):
TCC and LCC / TCC = NDC / NP, where NDC is the number of directly connected pairs of methods, that is, pairs of methods that directly or indirectly use common attributes, and NP is the maximum possible number of such pairs.
LCC = (NDC + NIC) / NP, where NIC is the number of pairs of methods that are indirectly connected.
Bonja and Kidanmariam (2006):
CC / CC = ( ∑ (|IVC| / |IVT|) ) / |Max Pairs|, where IVC is the set of common instance variables used by a pair of methods and IVT is the set of all instance variables used by that pair of methods. The numerator sums these ratios over all pairs of methods in the Class. Max Pairs is the maximum possible number of pairs of methods, which is n!/(2*(n-2)!) for a class with n methods.
Chidamber and Kemerer (1994):
LCOM / LCOM = |P| - |Q| if |P| > |Q|; otherwise 0. If there are n methods, then Ii is the set of instance variables used by method Mi. Then P = {(Ii, Ij) where Ii ∩ Ij = Ø} and Q = {(Ii, Ij) where Ii ∩ Ij ≠ Ø}. If Ii = Ø for all i, then P = Ø.
Hitz and Montazeri (1995):
LCOM4 / LCOM4 = number of connected components in a class, where method a and method b are connected if 1) they share an instance variable or 2) either method a invokes method b or vice versa.
Henderson-Sellers (1996):
LCOM5 / LCOM5 = [ (1/a) ( ∑ u(Aj) ) - m ] / (1 - m), where a = number of attributes or instance variables, u(Aj) = number of methods accessing attribute Aj, m = number of methods, and ∑ u(Aj) is summed over all the attributes j = 1, ..., a.
Bansiya et al. (2002):
CACM / CACM = ( ∑∑ Oij ) / (KL), where Oij is the (i,j)th entry in the parameter occurrence matrix. Oij = 1 if the jth data type occurs as a parameter in the ith method, and Oij = 0 otherwise. K is the number of columns, or number of data types, in the parameter occurrence matrix, and L is the number of rows, or number of methods, in the parameter occurrence matrix. ∑∑ Oij is summed over all the parameter data types, K, and over all the methods, L.
Counsel et al. (2006):
NHD / NHD = ( ∑∑ Aij ) / [ L * (K(K-1)/2) ], where Aij is the (i,j)th entry of the parameter agreement matrix. Aij = number of parameter agreements between methods mi and mj. K = number of methods and L = number of attribute types. NHD is the ratio of methods agreeing on parameter types to the maximum potential of every method agreeing with every other method in parameter types. The denominator is L attributes times the number of pairs of methods out of K methods.
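As an illustration of how two of the definitions in Table 1 can be evaluated mechanically, the following Python sketch computes LCOM and LCOM4 from a map of methods to the instance variables they use. The function names, the dictionary encoding, and the small example class (sum_ints, max_ints, log_event, flush_log) are assumptions made for this sketch and are not taken from the cited papers.

```python
from itertools import combinations


def lcom(uses):
    """LCOM as stated in Table 1: |P| - |Q| if |P| > |Q|, otherwise 0.

    `uses` maps each method name to the set of instance variables it uses.
    """
    if not any(uses.values()):
        return 0  # no method uses any instance variable, so P is taken as empty
    p = q = 0
    for a, b in combinations(uses, 2):
        if uses[a] & uses[b]:
            q += 1  # the pair shares at least one instance variable
        else:
            p += 1  # the pair shares nothing
    return p - q if p > q else 0


def lcom4(uses, calls=()):
    """LCOM4 as stated in Table 1: number of connected components, where two
    methods are connected if they share an instance variable or one invokes
    the other. `calls` is an iterable of (caller, callee) pairs.
    """
    parent = {m: m for m in uses}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(uses, 2):
        if uses[a] & uses[b]:
            parent[find(a)] = find(b)
    for a, b in calls:
        parent[find(a)] = find(b)
    return len({find(m) for m in uses})


# Hypothetical class: two numeric methods and two logging methods.
example = {
    "sum_ints": {"ints"},
    "max_ints": {"ints"},
    "log_event": {"log"},
    "flush_log": set(),
}
example_calls = [("flush_log", "log_event")]

print(lcom(example))                   # 5 disjoint pairs, 1 sharing pair
print(lcom4(example, example_calls))   # components when invocations count
print(lcom4(example))                  # components from variable sharing only
```

In this hypothetical class, the invocation of log_event by flush_log is the only thing that joins the two logging methods, which is why LCOM4 depends on whether method invocations are supplied.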

Consider how the value of each metric in Table 1 evolves from the most ideal cohesive situation to a less ideal case, as described through the following five scenarios.

a) All the methods within the Class use/share the single instance variable (e.g. integers).

b) Add another instance variable (e.g. floating type) that is shared by all the methods.

c) Add one more method that also uses/shares the same instance variables.

d) Add an instance variable that disturbs the “uniformity” of all the methods sharing all the instance variables. That is, in the software maintenance or evolution mode, we often will introduce an additional instance variable into a Class without fully considering the erosion to cohesiveness of that Class.

e) Add a method that similarly disturbs the “uniformity” of all the methods sharing all the instance variables. Again, during software evolution we often will introduce an additional method into a Class without considering how it might erode the cohesiveness of that Class.

In Table 2, we have summarized the evolution of the metric values as the Class Math evolves from the above condition (a) through condition (e). Class Math starts with a set of input integers as an instance variable. There are four methods in Class Math that compute the sum, min, max, and average of the integers, respectively. Then Class Math is expanded to accept floating point numbers, and the same four methods are enhanced to compute the sum, min, max, and average of the floating point numbers. Then a fifth method is included to check that both the integer and the floating point input numbers are between -10,000 and +10,000. Then Class Math may evolve to include either an instance variable that only some of the methods use or a method that uses only some of the instance variables. Let us pick a sample metric in Table 2, Henderson-Sellers’ (1996) LCOM5, as we consider these scenarios. As we go through the scenarios, it will become evident that even this simple evolution is not as clear-cut as it looks.

LCOM5 Computations

LCOM5 was defined by Henderson-Sellers (1996). It predominantly looks at the number of methods that access each of the attributes, or data, specifically only the instance variables. Thus, LCOM5 does not deal with data-to-data interactions or with non-instance variables. It focuses on interactions between instance variables and methods. For LCOM5, a value of 0 is considered perfect cohesion.

For scenario (a), there are 5 data elements. One is an instance variable, I1, which is accessed by all 4 methods. Let I1 be represented as A1, and the other 4 data elements be A2 through A5. Recall that these 4 data elements are the variables sum, min, max, and average. They are defined within each of the methods and accessed only by their respective methods, m1 through m4. We realize that one may argue for declaring these as instance variables, but for this example we purposely choose not to do so. Since these 4 data elements are not instance variables, they do not enter into the LCOM5 computation. We only have u(A1) = 4. LCOM5 = [(1/1) * (4) - 4] / (1 - 4) = 0. For scenario (a), LCOM5 = 0 is considered perfect cohesion.

For scenario (b), we introduce another instance variable, a floating type I2, to be accessed by all 4 methods again. In this case, u(A1) = 4 and u(A2) = 4. LCOM5 = [ (1/2)* (4+4) – 4 ] / ( 1- 4) = 0. Thus for scenario (b), LCOM5 = 0 indicates that the Class cohesion remains perfect.

Consider scenario (c) where a fifth method, input check method, is introduced to check both of the instance variables. Therefore, u(A1) = 5 and u(A2) = 5. LCOM5 = [(1/2) * (5 + 5) – 5]/ (1-5) = 0. For scenario (c), LCOM5 continues to indicate that the Class cohesion remains perfect.

For scenario (d), we introduce a third instance variable, I3, that is only accessed by 1 of the 5 methods. In this case, u(A1) and u(A2) both remain the same as before, and u(A3) = 1. LCOM5 = [(1/3) * (5 + 5 + 1) - 5] / (1 - 5) = (-4/3) / (-4) = 1/3. This indicates that the Class cohesion has deteriorated, as it is moving from the perfect 0 case towards 1, the worst case.

The final scenario (e) is to introduce a sixth method that accesses only 1 of the 3 existing instance variables. We will arbitrarily pick that 1 instance variable to be I1. Now, u(A1) = 6, u(A2) = 5, and u(A3) = 1. LCOM5 = [(1/3)* (6+5+1) – 6] / (1-6) = (-2)/(-5) = 2/5. This time LCOM5 has slightly increased in value, indicating further deterioration of Class cohesion.

The LCOM5 metric showed perfect cohesion for scenarios (a) through (c). Intuitively, this makes sense when one is only considering the instance variables. As scenarios (d) and (e) show, the class cohesion eroded, and LCOM5 increased in value, as we introduced an instance variable that is only utilized by one method, followed by a method that only uses I1. The only inconvenient part is that LCOM5 starts at a perfect 0 and increases in value towards the worst case, 1, as cohesion deteriorates.
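The five LCOM5 computations above can be reproduced with a short Python sketch that applies the Table 1 formula using exact fractions. The function name lcom5 and the encoding of each scenario as a list of access counts u(Aj) plus a method count m are assumptions made for this illustration.

```python
from fractions import Fraction


def lcom5(access_counts, num_methods):
    """LCOM5 = [ (1/a) * sum(u(Aj)) - m ] / (1 - m).

    access_counts: list of u(Aj), the number of methods accessing each
                   instance variable Aj.
    num_methods:   m, the number of methods in the class.
    """
    a = len(access_counts)
    m = num_methods
    if a == 0 or m == 1:
        raise ValueError("LCOM5 is undefined for a = 0 or m = 1")
    return (Fraction(sum(access_counts), a) - m) / (1 - m)


# Scenarios (a) through (e) for Class Math, as described in the text.
scenarios = {
    "a": ([4], 4),        # I1 accessed by all 4 methods
    "b": ([4, 4], 4),     # I2 added, also accessed by all 4 methods
    "c": ([5, 5], 5),     # fifth method accesses both I1 and I2
    "d": ([5, 5, 1], 5),  # I3 added, accessed by only 1 method
    "e": ([6, 5, 1], 6),  # sixth method accesses only I1
}

results = {name: lcom5(u, m) for name, (u, m) in scenarios.items()}
for name, value in results.items():
    print(name, value)
# prints:
# a 0
# b 0
# c 0
# d 1/3
# e 2/5
```

Using Fraction rather than floating point keeps the results in the same form as the hand computation, so 1/3 and 2/5 can be compared directly.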

Summarizing Scenario (a) through (e):

In Table 2, we have summarized the evolution of the metric values as the Class Math evolves from condition (a) through condition (e) for all the metrics listed in Table 1.

Table 2: Summarizing Scenarios (a) through (e)

Metric / (a) / (b) / (c) / (d) add an instance variable / (e) add a method
Briand et al. (1998):
RCI / .6 / .6 / .65 / - adding an instance variable that only interacts with some methods decreases RCI / - adding a method that does not access all instance variables also further decreases the value of RCI
Bieman and Kang (1995; 1998):
TCC / 1 / 1 / 1 / - adding an instance variable that only interacts with some methods creates no change to TCC / - adding a method that does not access all the instance variables decreases TCC
Bonja and Kidanmariam (2006):
CC (X) / 1 / 1 / 1 / - adding an instance variable that only interacts with some methods decreases CC (X) / - adding a method that does not interact with all the instance variables decreases CC (X)
Chidamber and Kemerer (1994):
LCOM / 0 / 0 / 0 / - adding an instance variable that creates a “non-uniform” situation that affects |P| and |Q|; if |P| < |Q| then LCOM = 0, and if |P| > |Q| then LCOM increases in value from 0 / - adding a method that creates a “non-uniform” situation that affects |P| and |Q|; if |P| < |Q| then LCOM = 0, and if |P| > |Q| then LCOM increases in value from 0
Hitz and Montazeri (1995):
LCOM4 / 1 / 1 / 1 / - adding an instance variable that is not accessed by all methods creates a non-uniformity and increases LCOM4 / - adding a method that does not access all instance variables creates a non-uniformity and increases LCOM4
Henderson-Sellers (1996):
LCOM5 / 0 / 0 / 0 / - adding an instance variable that is accessed by some methods increases LCOM5 above 0 towards 1, deteriorating cohesion / - adding a method that accesses some instance variables also increases LCOM5 towards 1 and further deteriorates cohesion
Bansiya et al. (2002):
CACM / 1 / 1 / 1 / - adding an instance variable that is not used by all the methods decreases CACM from 1 / - adding a method that does not use all instance variables decreases CACM from 1
Counsel et al. (2006):
NHD / 1 / 1 / 1 / - adding an instance variable that is not used by all methods decreases NHD from 1 / - adding a method that does not access all the instance variables decreases NHD from 1
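To make one row of Table 2 concrete, the following Python sketch evaluates the CC metric of Table 1 over scenarios (a) through (e). It rests on several assumptions that are ours rather than the original authors': IVT is read as the union of the instance variables used by a pair of methods, a pair that uses no instance variables contributes 0, the method touching I3 in scenario (d) is taken to be m5, and the method added in scenario (e) uses only I1 (as in the LCOM5 walk-through).

```python
from fractions import Fraction
from itertools import combinations


def cc(uses):
    """CC = sum over method pairs of |IVC| / |IVT|, divided by the number of pairs.

    `uses` maps each method name to the set of instance variables it uses.
    IVC is taken as the intersection and IVT as the union for each pair
    (an assumed reading of the Table 1 definition).
    """
    pairs = list(combinations(uses.values(), 2))
    if not pairs:
        raise ValueError("CC needs at least two methods")
    total = Fraction(0)
    for x, y in pairs:
        union = x | y
        if union:  # a pair using no variables contributes 0 (assumption)
            total += Fraction(len(x & y), len(union))
    return total / len(pairs)


# Scenario encodings for Class Math; method names m1..m6 are illustrative.
a = {f"m{i}": {"I1"} for i in range(1, 5)}
b = {f"m{i}": {"I1", "I2"} for i in range(1, 5)}
c = {f"m{i}": {"I1", "I2"} for i in range(1, 6)}
d = {**c, "m5": {"I1", "I2", "I3"}}  # I3 used by one method only
e = {**d, "m6": {"I1"}}              # new method uses only I1

cc_results = {name: cc(s) for name, s in zip("abcde", (a, b, c, d, e))}
for name, value in cc_results.items():
    print(name, value)
# prints:
# a 1
# b 1
# c 1
# d 13/15
# e 11/15
```

Under these assumptions the values stay at 1 through scenario (c) and then fall in (d) and again in (e), which matches the direction of change recorded for CC in Table 2.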

This plethora of cohesion metrics of a Class shows that the ideal value for cohesion varies. Some start at 0 and increase in value as cohesion is compromised, and others start at 1 and decrease in value as cohesion erodes. While each has its own strength as a metric, it is nevertheless difficult to keep track of all the details behind these metrics.