Action-rules, re-classification of objects andecommerce

Zbigniew W. Ras<1,2> and Shishir Gupta<1>

<1> UNC-Charlotte, Computer Science Dept., Charlotte, NC 28223, USA <2>Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01237 Warsaw, Poland e-mail: and

Abstract: Consumers use electronic commerce to purchase a variety of goods and services online, including books, flowers, cars, food, banking, music and other forms of entertainment. Businesses also use these models to improve internal communication, help manage supply chains, conduct technical and market research, and locate potential customers and business partners. The U.S. Census Bureau of the U.S. Department of Commerce Web site ( is an example of an organization using many of these electronic commerce business models. Clearly, consumers decision to purchase some services or goods online is often dependent on satisfaction of previous buyers or features which for a current buyer are essential but they are not listed in the electronic commerce business model the customer wants to use. We claim that the query answering system for DKS proposed in [5] jointly with action rules proposed in [3] and extended in this paper to distributed agents framework can be effectively used to address this problem.

Keywords: knowledge discovery, distributed information systems, e-commerce.

1. Introduction

In the paper by Ras and Wieczorkowska (see [3]), the notion of an action rule was introduced. The main idea there was to generate, from a decision table, special type of rules which basically form a hint to business users showing a way to re-classify objects with respect to the decision attribute. In e-commerce application, this re-classification may just mean that a consumer not interested in a certain product, now may buy it. Decision table schema has a list of classification attributes used to classify consumers. Values of some of these attributes, for a given consumer, can be changed and this change can be influenced and controlled by a business user. However, some of these changes (for instance to the attribute “profit”) can not be done directly to a chosen attribute. In this case, definitions of such an attribute in terms of other attributes have to be learned. These new definitions are used to construct action rules showing what changes in values of some attributes (called here flexible attributes), for a given consumer, are needed in order to re-classify this consumer the way business user wants. But, business user may be either unable or unwilling to proceed with actions leading to such changes. In all such cases we may search for definitions of these flexible attributes looking at either local or remote sites for help. If needed, this process can be continued leading to a chain of definitions of all these flexible attributes. This justifies that the distributed knowledge system framework, introduced in [4], can be a right vehicle for business users to generate action rules linking flexible attributes in several remote sites. These action rules are called in this paper global action rules.

2. Information Systems and Decision Tables

An information system is used for representing business knowledge. Its definition, given here, is due to Pawlak [2].

By an information system we mean a pair S = (U, A), where:

1. U is a nonempty, finite set of objects (called customer identifiers),

2. A is a nonempty, finite set of attributes i.e. a:U Va for a A, where Va is called the domain of a.

Information systems can be seen as generalizations of decision tables [2]. In any decision table together with the set of attributes a partition of that set into conditions and decisions is given. Additionally, we assume that the set of conditions is partitioned into stable conditions and flexible conditions [3]. Attribute a in A is called stable for the set U if its values assigned to objects from U are time dependent in a deterministic way. Otherwise, it is called flexible. Date of Birth is an example of a stable attribute. Interest rate on any customer account is an example of a flexible attribute. For simplicity reason, we will consider decision tables with only one decision. We adopt the following definition of a decision table:

By a decision table we mean any information system of the form S = (U, A1A2 {d}), where d A1A2 is a distinguished attribute called the decision. Elements of A1 are called stable conditions, whereas the elements of A2 {d} are called flexible conditions.

The assumption that attribute d is flexible is quite essential. We may want to re-classify an object x in U from the point of view of attribute d. In this case we may have to change values of some of its attributes from A2, so the value of its attribute d might be changed as well. Before we proceed, certain relationships between values of attributes from A2 and values of the attribute d will have to be discovered first.

3. Action Rules

In this section we recall a method proposed by Ras & Wieczorkowska (see [3]) to construct action rules from a decision table containing both stable and flexible attributes.

Before we introduce new definitions, assume that for any two collections of sets X, Y, we write, X  Y if (xX)(yY)[ x y ]. Let S = (U, A1A2 {d}) be a decision table and B  A1A2. We say that attribute d depends on B if CLASSS(B)  CLASSS(d), where CLASSS(B) is a partition of U generated by B (see [2]). Assume now that attribute d depends on B where B  A1A2. The set B is called d-reduct in S if there is no proper subset C of B such that d depends on C. The concept of d-reduct in S was introduced in rough sets theory (see [2,6]) to identify minimal subsets of A1A2 such that rules describing the attribute d in terms of these subsets are the same as the rules describing d in terms of A1A2. Saying another words, it was shown that in order to induce rules in which THEN part consists of the decision attribute d and IF part consists of attributes belonging to A1A2, only subtables (U, B {d}) of S where B is a d-reduct in S can be used for rules extraction.

By Dom(r) we mean all attributes listed in IF part of rule r. For example, if r = [ (a1,3)*(a2,4)  (d,3)] is a rule then Dom(r) = {a1,a2}. By d(r) we denote the decision value of a rule r. In our example d(r) = 3.

If r1, r2 are rules and B  A1A2 is a set of attributes, then r1/B = r2/B means that the conditional parts of rules r1, r2 restricted to attributes B are the same. For example if r1 = [(a1,3)  (d,3)], then r1/{a1} = r/{a1}.

Example 1. Assume that S = ({x1,x2,x3,x4,x5,x6,x7,x8}, {a,c} {b} {d}) be a decision table represented by Figure 1. The set {a,c} contains stable attributes, b is a flexible attribute and d is a decision attribute.

It can be easily checked that {b,c}, {a,b} are the only two d-reducts in S.

Applying for instance LERS discovery system (developed by J. Grzymala-Busse), the following definitions are extracted from S:

(a,0)  (d,L),(c,0)  (d,L),

(b,R)  (d,L),(c,1)  (d,L),

(b,P)  (d,L), (a,2)(b,S)  (d,H),(b,S)(c,2)  (d,H).

Now, let us assume that (a, v  w) denotes the fact that the value of attribute a has been changed from v to w. Similarly, the term (a, v  w)(x) means that a(x)=v has been changed to a(x)=w. Saying another words, the property (a,v) of object x has been changed to property (a,w).

a / b / c / d
x1 / 0 / S / 0 / L
x2 / 0 / R / 1 / L
x3 / 0 / S / 0 / L
x4 / 0 / R / 1 / L
x5 / 2 / P / 2 / L
x6 / 2 / P / 2 / L
x7 / 2 / S / 2 / H
x8 / 2 / S / 2 / H

Figure 1

Assume now that S = (U, A1A2 {d}) is a decision table, where A1 is the set of stable attributes and A2 is the set of flexible attributes. Assume that rules r1, r2 have been extracted from S and r1/A1 = r2/A1, d(r1)=k1, d(r2)=k2 and k1 k2. Also, assume that (b1, b2,…, bp) is a list of all attributes in Dom(r1)  Dom(r2)  A2 on which r1, r2 differ and r1(b1)= v1, r1(b2)= v2,…, r1(bp)= vp, r2(b1)= w1, r2(b2)= w2,…, r2(bp)= wp.

By (r1,r2)-action rule on (x,y)UU we mean a formula r(x,y):

[ (b1, v1 w1)  (b2, v2  w2) … (bp, vp  wp)](x)  [(d, k1  k2)](y).

If the value of the rule r on (x,y) is true then the rule r is valid for (x,y). Otherwise it is false. For simplicity reason, if x=y, we will say:

“value of the rule on x” instead of “value of the rule on (x,x)”, and

“rule is valid for x” instead of “rule is valid for (x,x)”.

Let us denote by U<r1> the set of objects in U supporting the rule r1. If (r1,r2)-action rule r is valid for x U<r1> then we say that the action rule r supports value k2  Dom(d) in object x.

Example 2. Let S = (U, A1A2{d}) be a decision table from Example 1, A2={b}, A1 ={a, c}. It can be checked that rules r1=[(b,P)  (d,L)], r2=[(a, 2)(b, S)  (d,H)], r3=[(b, S)(c, 2)  (d,H)] can be extracted from S. Clearly x5, x6 U<r1>. Now, we can construct (r1,r2)-action rule r executed on x:

[ (b, PS)](x)  [(d, LH)](x).

Action rule r supports value H Dom(d) in objects x5 and x6.

Example 3. Let S = (U, A1A2{d}) be a decision table represented by Figure 2. Assume that A1= {c, b}, A2 = {a}.

Clearly rules r1=[(a,1)(b,1)  (d,L)], r2=[(c,2)(a,2)  (d,H)] extracted from S are optimal. Also, U<r1>= {x1, x4}.

It can be checked that (r1,r2)-action rule

[ (a, 12)](x)  [(d, LH)](x)

certainly supports new profit ranking H for d in object x1 but only possibly in x4.

c / a / b / d
x1 / 2 / 1 / 1 / L
x2 / 1 / 2 / 2 / L
x3 / 2 / 2 / 1 / H
x4 / 1 / 1 / 1 / L

Figure 2

Algorithm for Constructing Action Rules was presented in [3].

4 Action Rules and Distributed Knowledge Systems

In this section we discuss the process of discovering action rules in distributed knowledge systems introduced in [4].

Let us assume that S = (U,A) and d  A is a flexible attribute with Dom(d)= {d1,d2,d3}. Also, assume that A – {d} = A1A2, where A1 are stable attributes, A2 are flexible attributes and,

U1= {x : d1 = d(x)}, U2= {x : d2 = d(x)}, U3= {x : d3 = d(x)}.

Our goal, if possible, is to change the classification of objects in U1 from d1 to either d2 or d3. To achieve this goal, first we extract rules from S = (U,A) describing values of attribute d in terms of attributes from A1A2. Several situations can occur:

1. All d-reducts in S are subsets of A2.

2. There is a reduct in S which is a subset of A2.

3. All d-reducts have non-empty intersection with A1.

In the first case we have the highest chance to change the classification of objects from U1. The second case is similar assuming that the only reduct we want to consider is the one, which is a subset of A2. In e-commerce scenario, the second case may describe situation when only one reduct is a subset of A2 and it contains attributes which correspond to offers business user is willing to send out to customers. In the last case, we have the lowest chance to extract action rules from S because of the existence of stable attributes in a reduct. For instance, let us assume that age is a necessary condition for classification of objects from U with respect to the decision d. Now, if every person in U1 is minimum 30 years old and every person in U2 is maximum 20 years old, then objects from U1 can not be re-classified to a group U2 because the value of attribute age can not be changed.

In [4], we introduced the notion of a Distributed Knowledge Systems (DKS) framework. DKS is seen as a collection of knowledge systems where each knowledge system is initially defined as an information system coupled with a set of rules (called a knowledge base) extracted from that system. These rules are transferred between sites due to the requests of local query answering systems. Each rule transferred from one site of DKS to another remains at both sites so changes in the contents of knowledge bases may easily be handled. Useless rules (either of very low confidence or not used by query answering system for a long time) are removed from a knowledge base. Assume now that S, defined before, represents one of DKS sites. If rules extracted from S describing values of some flexible attribute d in terms of attributes from A1A2 do not lead us to any useful action rules (user is not willing to undertake actions leading to required changes of values of flexible attributes B2 A2 within the classification part of a rule), we may:

1. find definitions of attributes from B2 in terms of other local flexible attributes (local mining for rules),

2. find definitions of attributes from B2 in terms of flexible attributes from other sites (mining for rules at a remote site),

3. find definitions of the decision attribute d in terms of flexible attributes from other sites (mining for rules at a remote site).

Now, we will discuss all three cases.

Example 4. Assume that S1 = (U1, {a}{b, e, d}) is an information system, represented by Figure 3, where {a} is a set of stable attributes and {b, e, d} is a set of flexible attributes. Information system S = (U, A1A2) where A2={b, d }, A1 ={a, c} is represented by Figure 1. These two systems represent two sites of DKS.

a / b / e / d
y1 / 0 / P / 2 / L
y2 / 0 / P / 2 / H
y3 / 0 / S / 4 / H
y4 / 2 / S / 4 / H
y5 / 2 / P / 3 / L
y6 / 2 / P / 3 / H

Figure 3

From the rules describing attribute d and extracted from system S, the following action rule can be discovered: [(b, PS)](x)  [(d, LH)](x), where x  U.

Now, assume that the user would like to re-classify objects x5, x6 in the information system S with respect to attribute d but he does not know any appropriate action which can be taken to re-classify them with respect to attribute b. Attribute b is the only attribute in A2 which is flexibleso we can not search for a definition of b inside S. In this case, we have to look for another system S’ in DKS in which attribute b is both local and it can be defined in terms of other flexible attributes in S’. In our example, it can be checked that the following action rule can be extracted from S1 :

[(e, 24) ](y)  [b, P S)](y).

This rule is referring to objects y1, y2. The question is, how similar these two objects are to objects x5, x6? Clearly, higher similarity between objects x and y, where x  {x5, x6} and y  {y1, y2}, will guarantee higher probabilty of success for the following action rule (called a global action rule because its classification attribute is at a remote site):

If (x,y)  , then [(e, 24) ](y) [(d, LH)](x)

where (x,y)  U  U1, (x,y) describes the similarity between objects x, y, and  is some threshold value.

Locally, for the system S, this global action rule can be re-written as:

[(e, ?4) ](x) [(d, LH)](x) with confidence k, where k is calculated on the basis of (x,y) from the paragraph above.

The interpretation of the last rule is quite interesting. It is referring to re-classification of objects x5, x6 with respect to a remote attribute e. If this step is successful, these objects should be re-classified with respect to the decision attribute d. Two situations can still happen:

  1. The user knows and is willing to undertake an appropriate action needed to verify and eventually change the value of the remote attribute e for objects x5, x6.

2.The user does not know such actions. In this case we have to search for a definition of attribute e either in terms of local or new global attributes for S (see [4]).

Finally, the last case, represents situation when the decision attribute d belongs to both local and remote site. It happens in the current example. The global action rule, we already learned, has the following local form for S:

r1 = [[(e, ?4) ](x) [(d, LH)](x)].

As we have stated before, this rule is a possible rule and its confidence is calculated on the basis of (x,y).

At site S1, this rule is referring to objects y1, y2. The following two certain rules can be extracted from S1:

(b,S)  (d,H) , (e,4)  (d,H).

Clearly, the second rule “supports” our global action rule r1 when it is seen as a local rule in S1 and the same will increase our confidence in r1 when applied in S.

References

1.Chmielewski M. R., Grzymala-Busse J. W., Peterson N. W., Than S., “The Rule Induction System LERS - a Version for Personal Computers”, in Foundations of Computing and Decision Sciences, Vol. 18, No. 3-4, 1993, Institute of Computing Science, Technical University of Poznan, Poland, 181-212.

2.Pawlak Z., “Rough Sets and Decision Tables”, in Lecture Notes in Computer Science 208, Springer-Verlag, 1985, 186-196.

3.Ras, Z., Wieczorkowska, A., “Action Rules: how to increase profit of a company”, in "Principles of Data Mining and Knowledge Discovery", (Eds. D.A. Zighed, J. Komorowski, J. Zytkow), Proceedings of PKDD'00, Lyon, France, LNCS/LNAI, No. 1910, Springer-Verlag, 2000, 587-592

4.Ras, Z., Zytkow, J.M., "Mining for attribute definitions in a distributed two-layered DB system", in Journal of Intelligent Information Systems, Kluwer, Vol. 14, No. 2/3, 2000, 115-130

5.Ras, Z., “Query Answering based on Distributed Knowledge Mining”, Invited paper, in Proceedings of IAT’2001, Maebashi City, Japan, October 23-26

6.Skowron A., Grzymala-Busse J., “From the Rough Set Theory to the Evidence Theory”, in ICS Research Reports 8/91, Warsaw University of Technology, October, 1991