Chapter-3
ROUGH SET THEORY AND REDUCT
BASED RULE GENERATION
3.1 Introduction
Rough Set Theory (RST), proposed by Zdzislaw Pawlak in 1982, has been in constant
development ever since. Its methodology is concerned with the classification and
analysis of imprecise, uncertain or incomplete information and knowledge, and it is
considered one of the first non-statistical approaches to data analysis [107]. The
fundamental concept behind RST is the approximation of a set by lower and upper
approximation spaces, these spaces being a formal classification of knowledge about
the domain of interest. The lower approximation is characterized by objects that
definitely belong to the subset of interest, whereas the upper approximation is
characterized by objects that possibly belong to it. Every subset defined through its
upper and lower approximations is known as a rough set [107]. Over the years RST has
become a valuable tool in the resolution of various problems, such as representation
of uncertain or imprecise knowledge; knowledge analysis; evaluation of the quality and
availability of information with respect to consistency and the presence of data
patterns; identification and evaluation of data dependencies; and reasoning based on
uncertain data and reduced information [105].
The extent of rough set applications used today is much wider than in the past, principally
in the areas of medicine, analysis of database attributes and process control. RST has
some overlaps with other methods of data analysis, e.g., statistics, cluster analysis, fuzzy
sets, evidence theory and others but it can be viewed in its own rights as an independent
discipline [114]. The rough set approach seems to be of fundamental importance to AI
and cognitive sciences, especially in the areas of machine learning, knowledge
acquisition, decision analysis, knowledge discovery from databases, expert systems,
inductive reasoning and pattern recognition. It seems of particular importance to decision
support systems and data mining [103]. This theory has been successfully applied in
many real-life problems in medicine, pharmacology, engineering, banking, financial and
market analysis and others [115].
An early application of this theory was the classification of imprecise and incomplete
information. Reduct and core are the two important concepts in rough set theory. RST is
elegant when applied to small data sets because it can always find the minimal reduct
and generate minimal rule sets. However, finding the minimal reduct in the general case
is NP-hard [78]: no polynomial-time algorithm is known for it. In other words, as data
sets grow large both in dimension and volume, finding the minimal reduct becomes
computationally infeasible.
The rough set approach shows many advantages. The most important ones are [115]:
• Synthesis of efficient algorithms for finding hidden patterns in data;
• Identification of relationships that would not be found using statistical methods;
• Representation and processing of both qualitative and quantitative parameters, and mixing of user-defined and measured data;
• Reduction of data to a minimal representation (data reduction);
• Evaluation of the significance of data;
• Synthesis of classification or decision rules from data;
• Legibility and straightforward interpretation of the synthesized models and obtained results;
• Ease of understanding.
So rough set theory can be very useful in many intelligent industrial applications as an
independent approach or combined together with other areas of soft computing, e.g.
fuzzy sets, neural networks, logistic regression etc.
3.2 Basic Philosophy of Rough Set
In this section, all the important concepts related to rough set theory are defined
[106].
3.2.1 Equivalence Relation
Let U be a non-empty set and let p, q, and r be elements of U. Consider a binary
relation R on U, writing pRq if and only if (p, q) is in R. R is an equivalence
relation if it satisfies the following three properties:
i) Reflexive Property: (p, p) is in R for all p in U.
ii) Symmetric Property: if (p, q) is in R, then (q, p) is in R.
iii) Transitive Property: if (p, q) and (q, r) are in R, then (p, r) is in R.
3.2.2 Decision Table or Information System and Indiscernibility Relation
Let T = (U, A, Q, ρ) be an information system, where U is a non-empty finite set of
objects called the universe, A is a set of attributes, Q is the union of the domains of
the attributes in A, and ρ : U × A → Q is a total description function. For the
classification of objects, the set of attributes A is divided into condition attributes,
denoted CON, and decision attributes, denoted DEC. In the context of classification the
information table is known as a decision table. The elements of U are called objects,
cases, instances or observations [110]. Attributes are interpreted as features,
variables or characteristic conditions. Each feature a is a function
a : U → Va for a ∈ A, where Va is called the value set of a.
Let P ⊆ A. The indiscernibility relation IND(P) is defined as:
IND(P) = {(x, y) ∈ U × U : for all a ∈ P, a(x) = a(y)}
In simple words, two objects are indiscernible if we cannot differentiate between them,
because they do not differ on the subset P of attributes.
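Computed directly, IND(P) simply groups objects by their value vectors on P. A minimal sketch in Python (the toy table and attribute names are hypothetical, not from this chapter):

```python
from collections import defaultdict

def ind_partition(objects, attrs):
    """Partition objects into IND(P)-equivalence classes: two objects fall in
    the same class iff they agree on every attribute in `attrs`."""
    classes = defaultdict(list)
    for name, row in objects.items():
        key = tuple(row[a] for a in attrs)
        classes[key].append(name)
    return [sorted(c) for c in classes.values()]

# Toy table (hypothetical values, not from Table 3.1):
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 0},
}
print(ind_partition(table, ["a"]))        # [['x1', 'x2', 'x3']]
print(ind_partition(table, ["a", "b"]))   # [['x1', 'x3'], ['x2']]
```

Refining the attribute subset P can only split classes further, never merge them, which is why larger P yields a finer partition.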
3.2.3 Lower Approximation of a Subset
Let B ⊆ C and X ⊆ U. The B-lower approximation of X is the set of all elements
of U which can be classified with certainty as elements of X:
B̲(X) = {x ∈ U : [x]_B ⊆ X}
where [x]_B denotes the equivalence class of x under IND(B). This is the B-lower
approximation of the subset X.
3.2.4 Upper Approximation of a Subset
The B-upper approximation of X is the set of all elements of U that can possibly
belong to the subset of interest X [111]:
B̄(X) = {x ∈ U : [x]_B ∩ X ≠ ∅}
This represents the B-upper approximation of the subset X.
3.2.5 Boundary Region of a Subset
It is the collection of elementary sets defined by:
BN_B(X) = B̄(X) − B̲(X)
The boundary region consists of those objects that cannot be decisively classified
into X on the basis of B.
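These three regions can be computed directly from the IND(B) partition. A minimal sketch (the partition and target set below are hypothetical):

```python
def approximations(partition, X):
    """Given the IND(B) partition (list of equivalence classes) and a target
    set X, return the B-lower and B-upper approximations of X."""
    X = set(X)
    lower, upper = set(), set()
    for cls in partition:
        cls = set(cls)
        if cls <= X:          # class entirely inside X -> certainly in X
            lower |= cls
        if cls & X:           # class overlaps X -> possibly in X
            upper |= cls
    return lower, upper

# Hypothetical partition of U = {1..6} and target set X:
part = [{1, 2}, {3}, {4, 5}, {6}]
lo, up = approximations(part, {1, 2, 3, 4})
print(lo)        # {1, 2, 3}
print(up)        # {1, 2, 3, 4, 5}
print(up - lo)   # boundary region {4, 5}
```

Here the class {4, 5} straddles X, so it enters the upper approximation only, producing a non-empty boundary: X is rough with respect to this partition.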
3.2.6. Rough Set
A subset defined through its lower and upper approximations is called a rough set.
When the boundary region is non-empty, that is, B̲(X) ≠ B̄(X), the set is called a
rough set.
3.2.7 Crisp Set
A set is called a crisp set when its boundary region is empty, that is, B̲(X) = B̄(X).
3.2.8 Positive Region of a Subset
It is the set of all objects from the universe U which can be classified with certainty to
classes of U / D employing attributes from C.
POS_C(D) = ∪_{X ∈ U/D} C̲(X)
where C̲(X) denotes the lower approximation of the set X with respect to C. The
positive region of the subset X belonging to the partition U/D is also called the
lower approximation of X. The positive region of the decision attribute with respect
to a subset C approximately represents the quality of C. The union of the positive
and boundary regions constitutes the upper approximation [109].
3.2.9 Negative Region of a Subset
The negative region consists of those elementary sets that have no predictive power
for a subset X given a concept R. It consists of all classes that have no overlap
with the concept. That is,
NEG_R(X) = U − R̄(X)
3.2.10 Reduct
A system T = (U, A, C, D) is independent if all c in C are indispensable. A set of
features R ⊆ C is called a reduct of C if T' = (U, A, R, D) is independent and
POS_R(D) = POS_C(D). Furthermore, there is no T ⊂ R such that
POS_T(D) = POS_C(D).
A reduct is a minimal set of features that preserves the indiscernibility relation
produced by a partition of C. There can be several such subsets R. Similar or
indiscernible objects may be represented several times in an information table, and
some of the attributes may be superfluous or irrelevant; these can be removed without
loss of classification performance [140].
3.2.11 Core
The set of all the features indispensable in C is denoted by CORE(C). We have
CORE(C) = ∩ RED(C)
where RED(C) is the set of all reducts of C. Thus, the core is the intersection of
all reducts of an information system [112]. The core contains no dispensable
features, and it can be expanded into a reduct.
3.2.12 The Dependency Coefficient
Let T = (U, A, C, D) be a decision table. The dependency coefficient between the
condition attributes C and the decision attribute D is given by
γ(C,D) = |POS_C(D)| / |U|
The dependency coefficient varies between 0 and 1, since it expresses the proportion
of objects correctly classified with respect to the total, considering the conditional
feature set. If γ = 1, D depends totally on C; if 0 < γ < 1, D depends partially on C;
and if γ = 0, then D does not depend on C [113]. The decision attribute depends on the
set of conditional features if all values of the decision feature D are uniquely
determined by values of the conditional attributes; that is, there exists a dependency
between the values of the decision and conditional features.
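The coefficient can be computed directly from the two partitions. A minimal sketch (the toy partitions are hypothetical, not the patient data):

```python
def dependency(partition_C, partition_D, universe):
    """gamma(C, D) = |POS_C(D)| / |U|: the fraction of objects whose
    C-equivalence class falls entirely inside a single D-class."""
    pos = set()
    for c_cls in partition_C:
        if any(set(c_cls) <= set(d_cls) for d_cls in partition_D):
            pos |= set(c_cls)
    return len(pos) / len(universe)

# Toy partitions of U = {1..5}; the class {4, 5} straddles two decision classes
U = {1, 2, 3, 4, 5}
part_C = [{1, 2}, {3}, {4, 5}]
part_D = [{1, 2, 3}, {4}, {5}]
print(dependency(part_C, part_D, U))   # 3/5 = 0.6 -> partial dependency
```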
3.2.13 Accuracy of the Approximation
The accuracy of the approximation of the set X from the elementary subsets is measured
as the ratio of the lower and the upper approximation size. The ratio is equal to 1, if no
boundary region exists, which indicates a perfect classification. In this case, deterministic
rules for the data classification can be generated [112].
α(X) = |B̲(X)| / |B̄(X)|
Thus, a set X with accuracy equal to 1 is crisp; otherwise X is rough. Obviously
0 ≤ α(X) ≤ 1.
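As a quick numeric illustration of the definition (the approximation sets below are hypothetical):

```python
def accuracy(lower, upper):
    """alpha(X) = |lower| / |upper|: 1 means X is crisp, below 1 means rough."""
    return len(lower) / len(upper)

# Hypothetical lower and upper approximations of two sets:
print(accuracy({1, 2, 3}, {1, 2, 3, 4, 5}))   # 0.6 -> X is rough
print(accuracy({7, 8}, {7, 8}))               # 1.0 -> X is crisp
```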
3.2.14 Significance of Attributes
One of the first ideas was to consider the relevant features in the core of an information
system, i.e. the features that belong to the intersection of all reducts of the information
system [108]. It can be easily checked that several definitions of relevant features that are
used by machine learning community can be interpreted by choosing a relevant decision
system corresponding to the given information system [9].
It is also possible to find the relevant features from some approximate reducts of
sufficiently high quality. In attribute reduction some of the attributes can be
eliminated from the information table without losing relevant information contained
in the table [8].
The idea of attribute reduction can be generalized by introducing the concept of
significance of attributes, which enables an evaluation of attributes not only on a
two-valued scale, {dispensable, indispensable}, but by associating with an attribute a
real number from the closed interval [0, 1]; this number expresses the importance of
the attribute in the information table.
Let C and D be the sets of condition and decision attributes respectively, and let a
be a condition attribute, i.e., a ∈ C. As indicated earlier, the number γ(C, D)
expresses the degree of consistency of the decision table, or the degree of dependency
between attributes C and D, or the accuracy of the approximation of U/D by C. We can
ask how the coefficient γ(C, D) changes when the attribute a is removed, i.e., what is
the difference between γ(C, D) and γ(C − {a}, D). We can normalize the difference and
define the significance of the attribute a as
σ_(C,D)(a) = (γ(C, D) − γ(C − {a}, D)) / γ(C, D) = 1 − γ(C − {a}, D) / γ(C, D)
where C and D are the condition and decision attributes respectively. In general we
have 0 ≤ σ_(C,D)(a) ≤ 1 for any sets C and D.
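The significance of each attribute can be computed from γ by dropping the attribute and re-evaluating the dependency. A self-contained sketch (the four-object table with attributes c1, c2 and decision d is hypothetical):

```python
from collections import defaultdict

def partition(table, attrs):
    """IND(attrs)-equivalence classes of the table's objects."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def gamma(table, cond, dec):
    """Dependency degree gamma(cond, dec) = |POS_cond(dec)| / |U|."""
    d_classes = partition(table, dec)
    pos = set()
    for c_cls in partition(table, cond):
        if any(c_cls <= d_cls for d_cls in d_classes):
            pos |= c_cls
    return len(pos) / len(table)

def significance(table, cond, dec, a):
    """sigma(a) = (gamma(C, D) - gamma(C - {a}, D)) / gamma(C, D)."""
    g = gamma(table, cond, dec)
    reduced = [c for c in cond if c != a]
    return (g - gamma(table, reduced, dec)) / g

# Hypothetical 4-object table with conditions c1, c2 and decision d:
t = {
    1: {"c1": 0, "c2": 0, "d": 0},
    2: {"c1": 0, "c2": 1, "d": 1},
    3: {"c1": 1, "c2": 0, "d": 1},
    4: {"c1": 1, "c2": 1, "d": 1},
}
print(significance(t, ["c1", "c2"], ["d"], "c1"))   # 0.5
print(significance(t, ["c1", "c2"], ["d"], "c2"))   # 0.5
```

In this toy table γ(C, D) = 1, but dropping either attribute leaves half of the objects outside the positive region, so each attribute has significance 0.5.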
3.3 Information System of a Patient Dataset
A data set is represented as a decision table, where each row represents a case, an event, a
patient, or simply an object. Every column represents an attribute (a variable, an
observation, a property, etc.) and its value for different objects. The attribute values may
be supplied by human expert or user. This table is called an information system or
decision table [159].
A decision table is used to specify which conditions lead to which decisions. A
decision table is defined as T = (U, A, Q, ρ), where U is the set of objects in the
table, A is a set of attributes, Q is the union of the domains of the attributes in A,
and ρ : U × A → Q is a total description function. For the classification of objects,
the set of attributes A is divided into condition attributes, denoted CON, and decision
attributes, denoted DEC, with A = CON ∪ DEC and CON ∩ DEC = ∅ [58]. As an example, we
have collected the data of 15 patients from a hospital, as shown in Table 3.1. A
patient may or may not have a heart problem depending on the values of the condition
attributes. There are seven condition attributes and one decision attribute. Each
condition attribute takes some value from the domain of that particular attribute.
Let B = { P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, P15} be the set
of 15 patients.
The set of Condition attributes of Information System C = {Heart Palpitation, Blood
Pressure, Chest Pain, Cholesterol, Fatigue, Shortness of Breath, Rapid Weight Gain}
The set of Decision attribute of information system D = {Heart Problem}
Patient  H.P.  B.P.  C.P.  Cholesterol  Fatigue  S.O.B.  Rapid Weight Gain  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P3 Low High Low Low Normal Low Low No
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P6 Low High Low High Low Normal Low No
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P13 High Low Low Low High Low Low No
P14 Low Low High High Low High Low No
P15 High V_high V_high Normal Low Normal High Yes
Table 3.1 Information System or Patient Data Set of 15 Patients
Table 3.2 represents the set of values for each condition attribute of information
Table 3.1; it lists the nominal values of the patient dataset.
Attributes              Nominal Values
Condition Attributes:
Heart Palpitation       High, Low, V_high
Blood Pressure          High, Low, V_high
Chest Pain              High, Low, V_high
Cholesterol             Normal, High, Low
Fatigue                 Normal, High, Low
Shortness of Breath     Normal, High, Low
Rapid Weight Gain       High, Low
Decision Attribute:
Heart Problem           Yes, No
Table 3.2 Values of Patient Dataset
The Heart Problem of a patient is related to the Heart Palpitation, Blood Pressure, Chest
Pain, Rapid Weight Gain, Fatigue, Shortness of Breath, and Cholesterol. Table 3.1 can
be used to decide whether a patient has heart problem or not according to various values
of its condition attributes (Heart Palpitation, Blood Pressure, Chest Pain, Rapid Weight
Gain, Fatigue, Shortness of Breath, and Cholesterol). For example, the first row of
this table specifies that for a patient with Heart Palpitation = High, Blood Pressure
= High, Chest Pain = High, Cholesterol = Normal, Fatigue = Low, Shortness of Breath =
Low and Rapid Weight Gain = High, the decision is Heart Problem = Yes.
The rows in this table are called the objects, and the columns in this table are called
attributes [107]. Condition attributes are the features of a patient related to its Heart
Problem.
There are 15 patients in this dataset, and there are no missing attribute values. The
situation when some of the condition attributes have missing values has not been
discussed here. The reduct and the core are important concepts in rough set theory. A
reduct is a subset of condition attributes that is sufficient to classify the decision
table, and a reduct need not be unique. The core is contained in all the reduct sets
and represents the indispensable part of the data: no reduct generated from the
original dataset can exclude the core attributes.
3.4 Indiscernible Relation of the Patient Dataset
The indiscernibility relation is a central concept in RST. It relates two or more
objects whose values are identical with respect to a subset of the considered
attributes. The indiscernibility relation is an equivalence relation, and its classes
of identical objects are the elementary sets [112].
It can be observed from Table 3.1 that the set of attributes directly related to a
patient's symptoms is {Heart Palpitation, Blood Pressure, Chest Pain, Cholesterol,
Fatigue, Shortness of Breath, Rapid Weight Gain}; the indiscernibility relation over
the condition attributes is denoted IND(C). When Table 3.1 is broken down, the
indiscernibility relation can be given with respect to each conditional attribute. The
indiscernibility relations for all the condition attributes and the decision attribute
are shown in Tables 3.3 to 3.10.
The Heart Palpitation attribute generates three indiscernible elementary sets because this
condition attribute has three values ‘High’, ‘Low’ and ‘V-high’.
IND ({Heart Palpitation}) = {{P1, P5, P7, P8, P9, P13, P15}, {P4, P10, P11, P12},
{P2, P3, P6, P14}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P13 High Low Low Low High Low Low No
P15 High V_high V_high Normal Low Normal High Yes
P4 V_high Low V_high Low Low Normal High Yes
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P3 Low High Low Low Normal Low Low No
P6 Low High Low High Low Normal Low No
P14 Low Low High High Low High Low No
Table 3.3 Indiscernible Relation of Heart Palpitation
- The Blood Pressure attribute generates three indiscernible elementary sets according
to its values:
IND ({Blood Pressure}) = {{P1, P3, P5, P6, P7, P8, P9, P11}, {P2, P4, P10, P12, P13,
P14}, {P15}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P3 Low High Low Low Normal Low Low No
P5 High High V_high Low Low Low High Yes
P6 Low High Low High Low Normal Low No
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P11 V_high High High Low Normal Normal High Yes
P2 Low Low High Normal Low Normal Low No
P4 V_high Low V_high Low Low Normal High Yes
P10 V_high Low V_high Low Low Normal High Yes
P12 V_high Low High Low Low Low High Yes
P13 High Low Low Low High Low Low No
P14 Low Low High High Low High Low No
P15 High V_high V_high Normal Low Normal High Yes
Table 3.4 Indiscernible Relation of Blood Pressure
- The Chest Pain attribute generates three indiscernible elementary sets:
IND ({Chest Pain}) = {{P1, P2, P8, P9, P11, P12, P14}, {P3, P6, P13}, { P4, P5, P7,
P10, P15}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P14 Low Low High High Low High Low No
P3 Low High Low Low Normal Low Low No
P6 Low High Low High Low Normal Low No
P13 High Low Low Low High Low Low No
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P10 V_high Low V_high Low Low Normal High Yes
P15 High V_high V_high Normal Low Normal High Yes
Table 3.5 Indiscernible Relation of Chest Pain
- The Cholesterol attribute generates three indiscernible elementary sets:
IND ({Cholesterol}) = {{P1, P2, P15}, {P3, P4, P5, P7, P8, P9, P10, P11, P12,
P13},{ P6, P14}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P15 High V_high V_high Normal Low Normal High Yes
P3 Low High Low Low Normal Low Low No
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P13 High Low Low Low High Low Low No
P6 Low High Low High Low Normal Low No
P14 Low Low High High Low High Low No
Table 3.6 Indiscernible Relation of Cholesterol
- The Fatigue attribute generates three indiscernible elementary sets because this
condition attribute assumes three values.
IND ({Fatigue}) = {{P1, P2, P4, P5, P6, P7, P8, P9, P10, P12, P14, P15}, {P3,
P11},{ P13}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P6 Low High Low High Low Normal Low No
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P10 V_high Low V_high Low Low Normal High Yes
P12 V_high Low High Low Low Low High Yes
P14 Low Low High High Low High Low No
P15 High V_high V_high Normal Low Normal High Yes
P3 Low High Low Low Normal Low Low No
P11 V_high High High Low Normal Normal High Yes
P13 High Low Low Low High Low Low No
Table 3.7 Indiscernible Relation of Fatigue
- The Shortness of Breath attribute generates three indiscernible elementary sets:
IND ({Shortness of Breath}) = {{P1, P3, P5, P7, P8, P9, P12, P13}, {P2, P4, P6,
P10, P11, P15}, {P14}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P3 Low High Low Low Normal Low Low No
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P12 V_high Low High Low Low Low High Yes
P13 High Low Low Low High Low Low No
P2 Low Low High Normal Low Normal Low No
P4 V_high Low V_high Low Low Normal High Yes
P6 Low High Low High Low Normal Low No
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P15 High V_high V_high Normal Low Normal High Yes
P14 Low Low High High Low High Low No
Table 3.8 Indiscernible Relation of Shortness of Breath
- The Rapid Weight Gain attribute generates two indiscernible elementary sets:
IND ({Rapid Weight Gain}) = {{P1, P4, P5, P7, P8, P9, P10, P11, P12, P15}, { P2,
P3, P6, P13, P14}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P9 High High High Low Low Low High No
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P15 High V_high V_high Normal Low Normal High Yes
P2 Low Low High Normal Low Normal Low No
P3 Low High Low Low Normal Low Low No
P6 Low High Low High Low Normal Low No
P13 High Low Low Low High Low Low No
P14 Low Low High High Low High Low No
Table 3.9 Indiscernible Relation of Rapid Weight Gain
- The Decision attribute generates two indiscernible elementary sets one is for decision
value ‘Yes’ and other is for decision value ‘No’.
IND ({Heart Problem}) = {{P1, P4, P5, P7, P8, P10, P11, P12, P15},{ P2, P3, P6, P9,
P13, P14}}
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B.  R.W.G.  Heart Problem (Decision)
P1 High High High Normal Low Low High Yes
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P7 High High V_high Low Low Low High Yes
P8 High High High Low Low Low High Yes
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P15 High V_high V_high Normal Low Normal High Yes
P2 Low Low High Normal Low Normal Low No
P3 Low High Low Low Normal Low Low No
P6 Low High Low High Low Normal Low No
P9 High High High Low Low Low High No
P13 High Low Low Low High Low Low No
P14 Low Low High High Low High Low No
Table 3.10 Indiscernible Relation of the Decision/ Heart Problem
3.5 Approximations of the Patient Dataset
The starting point of RST is the indiscernibility relation, generated by information
concerning the objects of interest. The indiscernibility relation is intended to
express the fact that, due to a lack of knowledge, we are unable to discern some
objects using the available information [111]. Approximation is also an important
concept in RST; topological operations are associated with the meaning of the
approximations. The lower and upper approximations of a set are the interior and
closure operations in the topology generated by the indiscernibility relation, as
described below.
a. Lower Approximation (B”)
Lower Approximation is a description of the domain objects that are known with
certainty to belong to the subset of interest. The Lower Approximation Set of a set X,
with regard to R is the set of all of objects, which certainly can be classified in X
regarding R, and is denoted by B”.
b. Upper Approximation (B*)
Upper Approximation is a description of the objects that possibly belong to the subset of
interest. The Upper Approximation Set of a set X regarding R is the set of all the objects
which can possibly be classified in X regarding R, and is denoted by B*.
c. Boundary Region (BR)
The Boundary Region of a set X regarding R contains all the objects which can be
classified neither as X nor as ¬X regarding R. If the boundary region is a null set,
then the set X is considered "crisp", that is, exact in relation to R; otherwise, if
the boundary region is not empty, the set X is considered a "rough set". In that case
the boundary region is BR = B* − B".
Lower Approximation of Patient Dataset (B”)
Lower Approximation set (B”) is the set of the patients that definitely have Heart
Problem. These patients are identified as
B” = {P1, P4, P5, P7, P10, P11, P12, P15}
Lower Approximation set (B”) of patients that certainly do not have Heart Problem are
identified as
B” = {P2, P3, P6, P13, P14}
Upper Approximation of Patient Dataset B*
Upper Approximation set (B*) of the patients that possibly have Heart Problem are
identified as
B* = {P1, P4, P5, P7, P8, P9, P10, P11, P12, P15}
Upper Approximation set (B*) of the patients that possibly do not have Heart Problem
are identified as
B* = {P2, P3, P6, P8, P9, P13, P14}
Boundary Region of Patient Dataset (BR)
The Boundary Region (BR) of the patients that do not have a Heart Problem is:
BR = {P2, P3, P6, P8, P9, P13, P14} − {P2, P3, P6, P13, P14} = {P8, P9}
The Boundary Region (BR) of the patients that have a Heart Problem is:
BR = {P1, P4, P5, P7, P8, P9, P10, P11, P12, P15} − {P1, P4, P5, P7, P10, P11, P12,
P15} = {P8, P9}
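The approximations above can be verified mechanically by re-entering Table 3.1 and partitioning the patients on all seven condition attributes:

```python
from collections import defaultdict

# Table 3.1: (H.P., B.P., C.P., Cholesterol, Fatigue, S.O.B., R.W.G., Heart Problem)
rows = {
    "P1":  ("High",   "High",   "High",   "Normal", "Low",    "Low",    "High", "Yes"),
    "P2":  ("Low",    "Low",    "High",   "Normal", "Low",    "Normal", "Low",  "No"),
    "P3":  ("Low",    "High",   "Low",    "Low",    "Normal", "Low",    "Low",  "No"),
    "P4":  ("V_high", "Low",    "V_high", "Low",    "Low",    "Normal", "High", "Yes"),
    "P5":  ("High",   "High",   "V_high", "Low",    "Low",    "Low",    "High", "Yes"),
    "P6":  ("Low",    "High",   "Low",    "High",   "Low",    "Normal", "Low",  "No"),
    "P7":  ("High",   "High",   "V_high", "Low",    "Low",    "Low",    "High", "Yes"),
    "P8":  ("High",   "High",   "High",   "Low",    "Low",    "Low",    "High", "Yes"),
    "P9":  ("High",   "High",   "High",   "Low",    "Low",    "Low",    "High", "No"),
    "P10": ("V_high", "Low",    "V_high", "Low",    "Low",    "Normal", "High", "Yes"),
    "P11": ("V_high", "High",   "High",   "Low",    "Normal", "Normal", "High", "Yes"),
    "P12": ("V_high", "Low",    "High",   "Low",    "Low",    "Low",    "High", "Yes"),
    "P13": ("High",   "Low",    "Low",    "Low",    "High",   "Low",    "Low",  "No"),
    "P14": ("Low",    "Low",    "High",   "High",   "Low",    "High",   "Low",  "No"),
    "P15": ("High",   "V_high", "V_high", "Normal", "Low",    "Normal", "High", "Yes"),
}

# IND(C)-classes over all seven condition attributes
classes = defaultdict(set)
for p, r in rows.items():
    classes[r[:7]].add(p)

X = {p for p, r in rows.items() if r[7] == "Yes"}   # patients with heart problem
lower = set().union(*(c for c in classes.values() if c <= X))
upper = set().union(*(c for c in classes.values() if c & X))
print(sorted(lower, key=lambda p: int(p[1:])))  # B" = P1, P4, P5, P7, P10, P11, P12, P15
print(sorted(upper - lower))                    # boundary = ['P8', 'P9']
```

P8 and P9 share identical condition values but opposite decisions, so their class enters the upper approximation only, which is exactly the boundary region computed above.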
Fig 3.1 Lower and Upper approximations of the Patients with Heart Problem
[Figure: within the universe U, the upper approximation contains the set X, which in
turn contains the lower approximation {P1, P4, P5, P7, P10, P11, P12, P15}.]
3.6 Quality of Patient Dataset
Quality of Patient dataset can be obtained numerically using its own elements,
specifically the lower and upper approximations. The coefficient used in measuring the
quality is represented by αB(X), where X is a set of objects in B. The quality of
approximation uses two coefficients that are defined as under [115]:
• Imprecision coefficient αB(X), or Accuracy of Approximation
αB(X) is the quality of the approximation of X, given by:
αB(X) = |B"(X)| / |B*(X)|    (1)
where |B"(X)| and |B*(X)| represent the cardinalities of the lower and upper
approximations, and the upper approximation of the set is non-empty (B*(X) ≠ ∅).
Therefore, 0 ≤ αB(X) ≤ 1. If αB(X) = 1, X is a definable set with regard to the
attributes B, that is, X is a crisp set. If αB(X) < 1, X is a rough set with regard
to the attributes B.
• Quality Coefficients of the upper and lower approximations [104]
The quality coefficient of the upper approximation, αB(B*(X)), is defined as the
percentage of the elements that are possibly classified as belonging to X among all
the elements in A, and is given by
αB(B*(X)) = |B*(X)| / |A|    (2)
The quality coefficient of the lower approximation, αB(B"(X)), is defined as the
percentage of the elements that are certainly classified as belonging to X among all
the elements in A, and is given by
αB(B"(X)) = |B"(X)| / |A|    (3)
where |A| represents the cardinality of the given set A of objects.
Applying these definitions to the dataset under consideration, we find that the
imprecision coefficients are:
• For the patients with possibility of Heart Problem αB(X) = 8/10 = 0.80.
• For the patients with possibility of no Heart Problem αB(X) = 5/7 = 0.71.
Quality Coefficient of upper and lower approximation, using Eq. 2 and 3:
• αB(B*(X)) = 10/15, for the patients that have the possibility of Heart Problem.
• αB(B*(X)) = 7/15, for the patients that have the possibility of no Heart Problem.
• αB(B”(X)) = 8/15, for the patients that certainly have Heart Problem.
• αB(B”(X)) = 5/15, for the patients that certainly do not have Heart Problem.
Observations:
1. Patients certainly with heart problem: αB(B”(X)) = 8/15, that is, 53% of patients
certainly have Heart problem.
2. Patients that certainly do not have heart problem: αB(B”(X)) = 5/15, that is,
approximately 33% of patients certainly don't have heart problem.
3. Patients possibly with heart problem: αB(B*(X)) = 10/15, that is, approximately 67%
of patients have the possibility of a heart problem.
4. Patients that possibly do not have a heart problem: αB(B*(X)) = 7/15, that is,
approximately 47% of the patients have the possibility of no heart problem.
5. 13% of patients (P8 and P9) cannot be classified as either with heart problem or with
no heart problem.
3.7 Dependency of Patient Dataset
For the analysis of data, it is always important to discover the dependencies between
various attributes. In rough set data analysis also, it is important to discover dependencies
between condition and decision attributes. Intuitively, the set of decision attributes
D depends totally on a set of condition attributes C, denoted C ⇒ D, provided all
values of the attributes of D are uniquely determined by the values of the attributes
of C. In other words, D depends totally on C provided there exists a functional
dependency between the values of D and C. Formally, dependency can be defined in the
following way. Let D and C be subsets of A. We say that D depends on C to a degree k
(0 ≤ k ≤ 1), denoted C ⇒_k D, where k = γ(C, D) [131] (see Section 3.2.12).
γ(C,D) = |POSc(D)| / |U|
In the patient dataset there are 8 elements in the lower approximation for patients
that have a heart problem and 5 elements in the lower approximation for patients that
do not, so the positive region contains 13 elements and the dependency coefficient is
γ(C, D) = 13/15 ≈ 0.87
So D depends partially (to a degree k ≈ 0.87) on C.
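The value follows directly from the lower approximations of Section 3.5:

```python
# |POS_C(D)| = 8 patients certainly ill + 5 patients certainly not ill (Section 3.5)
pos_size = 8 + 5
gamma = pos_size / 15      # |U| = 15 patients
print(gamma)               # 13/15 = 0.8666...
```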
3.8 Reduction of Attributes of Patient Dataset
For many applications, it is often necessary to maintain a concise form of the
information system. In every dataset there exists data that can be ignored without
altering the basic properties and, more importantly, the consistency of the system.
The form in which the data is presented within an information system should ensure
that redundancy is avoided, as this minimizes the computational complexity of creating
rules to extract knowledge. If, after removing some of the attributes from the given
dataset, the resultant dataset is equivalent to the original in terms of approximation
and dependency, it is considered a reduct, since it has the same approximation
precision and the same dependency degree as the original set of attributes, with one
fundamental difference: the set of attributes to be considered is smaller [104].
A reduct is obtained by reducing an information system such that the set of attributes
of the reduced system is independent and no further attribute can be eliminated
without losing some information from the system. If a subset B ⊆ A preserves the
indiscernibility relation R_A, then the attributes A − B are dispensable. Reducts are
minimal subsets that do not contain any dispensable attributes. Therefore, a reduct
retains the capacity to classify objects without altering the form of representation
of the knowledge.
Applying this definition to the information system presented in table 3.1, let B be a
subset of A and let the attribute ‘a’ belong to B. Then:
• The attribute ‘a’ is dispensable in B if I(B) = I(B − {a}); otherwise ‘a’ is an
indispensable attribute in B;
• The set B is independent if all of its attributes are indispensable;
• A subset B' of B is a reduct of B if B' is independent and I(B') = I(B).
So a reduct is a set of attributes that preserves the basic characteristics of the original
dataset; the attributes that do not belong to the reduct are superfluous with regard to
the classification of the elements of the universe.
To find the reducts of table 3.1, first remove the inconsistent or inconclusive data
from the table. Data is called inconsistent or inconclusive when all the condition
attribute values are the same but they lead to different decisions. For Patient8 and
Patient9 in table 3.1, all the condition attribute values are the same but the decision
attribute values differ: for Patient8 the decision value is ‘Yes’ whereas for Patient9 it
is ‘No’.
After removing Patient8 and Patient9 from table 3.1, only the consistent part of the
table is left, as given in table 3.11.
Finding all the reducts of a table is an NP-hard problem, since there can be up to
2^n − 1 reducts of a dataset with n attributes. For the above table there may be as many
as 127 reducts, since there are seven condition attributes. Some of the reducts of table
3.11 have been found using the Rose2 software [123]; we obtain three reduct sets for the
patient information system, as shown in table 3.12.
Patient  H.P.  B.P.  C.P.  Cholesterol  FT  S.O.B  R.W.G  Heart Problem Decision
P1 High High High Normal Low Low High Yes
P2 Low Low High Normal Low Normal Low No
P3 Low High Low Low Normal Low Low No
P4 V_high Low V_high Low Low Normal High Yes
P5 High High V_high Low Low Low High Yes
P6 Low High Low High Low Normal Low No
P7 High High V_high Low Low Low High Yes
P10 V_high Low V_high Low Low Normal High Yes
P11 V_high High High Low Normal Normal High Yes
P12 V_high Low High Low Low Low High Yes
P13 High Low Low Low High Low Low No
P14 Low Low High High Low High Low No
P15 High V_high V_high Normal Low Normal High Yes
Table 3.11 Consistent Part of Patient Dataset
No Reduct Sets
1 {Blood Pressure, Chest Pain, Shortness of Breath, Cholesterol}
2 {Heart Palpitation, Chest Pain, Cholesterol}
3 {Blood Pressure, Chest Pain, Fatigue, Cholesterol}
Table 3.12 Multiple Reducts for the Patient Data Set
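The exponential search implied by the 2^n − 1 bound can be carried out exhaustively for small attribute sets. The sketch below enumerates all non-empty attribute subsets, keeps those that preserve the decision (no two rows agree on the subset yet differ on the decision), and filters for minimal ones; the toy rows and attribute names are assumptions for illustration, not the patient dataset.

```python
from itertools import combinations

def preserves_decision(rows, attrs, decision):
    """True if no two rows agree on `attrs` but disagree on `decision`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True

def find_reducts(rows, condition_attrs, decision):
    """Brute-force search: all decision-preserving subsets, then keep minimal ones."""
    candidates = []
    for r in range(1, len(condition_attrs) + 1):
        for subset in combinations(condition_attrs, r):
            if preserves_decision(rows, subset, decision):
                candidates.append(set(subset))
    # A reduct is a candidate with no proper subset among the candidates.
    return [s for s in candidates if not any(t < s for t in candidates)]

# Toy table: attribute 'a' alone conflicts on the decision, 'b' alone does not,
# so the single reduct is {'b'}.
rows = [
    {"a": 1, "b": 1, "d": "Y"},
    {"a": 1, "b": 2, "d": "N"},
    {"a": 2, "b": 1, "d": "Y"},
]
find_reducts(rows, ["a", "b"], "d")  # [{'b'}]
```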
For example, a reduct can be the set of condition attributes {Blood Pressure,
Chest Pain, Shortness of Breath, Cholesterol} of a patient. With this reduct, all 13
patients of table 3.11 can be correctly classified according to their Heart Problem
decision. The subset {Blood Pressure, Chest Pain} is not a reduct of this patient data,
because with only these two attributes one cannot fully classify all the patients of the
dataset; redundancy and contradictions remain. For example, in table 3.11, with the
subset {Blood Pressure, Chest Pain} we cannot correctly classify Patient12 and
Patient14: they have the same values for the Blood Pressure and Chest Pain attributes
but different values for the Heart Problem decision attribute, that is, they have
different decisions.
A reduct is often used in the attribute selection process to reduce the number of
attributes by eliminating those superfluous to decision making. We can generate rules
using any of the reducts of table 3.12; for example, a patient with Blood Pressure =
High, Chest Pain = High, Shortness of Breath = Low and Cholesterol = High will have a
heart problem. Core attributes form the essential information in a dataset and are
contained in all the reducts. From table 3.12, the intersection of all the reducts is
Core = {Chest Pain, Cholesterol}
Hence Chest Pain and Cholesterol represent the indispensable attributes of the patient
dataset. We cannot remove either of these condition attributes from the patient dataset,
because doing so adversely affects the classification.
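Since the core is simply the intersection of all reducts, it can be computed in one step from the reduct sets of table 3.12:

```python
# Reduct sets from table 3.12; the core is their intersection.
reducts = [
    {"Blood Pressure", "Chest Pain", "Shortness of Breath", "Cholesterol"},
    {"Heart Palpitation", "Chest Pain", "Cholesterol"},
    {"Blood Pressure", "Chest Pain", "Fatigue", "Cholesterol"},
]
core = set.intersection(*reducts)
print(sorted(core))  # ['Chest Pain', 'Cholesterol']
```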
3.9 Reduct Based Rule Generation
An association rule [6] is a rule of the form α → β, where α and β are itemsets that
share no common items. The association rule α → β holds in the transaction set
D with confidence c if c% of the transactions in D that contain α also contain β;
confidence can be written as c = supp(α ∪ β) / supp(α). The rule α → β has support s in
the transaction set D if s% of the transactions in D contain α ∪ β; support can be
written as s = |{t ∈ D : α ∪ β ⊆ t}| / |D|. Here α is called the antecedent and β the
consequent. Confidence gives the ratio of the number of transactions in which the
antecedent and the consequent appear together to the number of transactions in which the
antecedent appears, while support measures how often the antecedent and the
consequent appear together in the transaction set [137].
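Under these definitions, support and confidence can be computed as follows; the grocery-style transactions are an illustrative assumption.

```python
def support(D, itemset):
    """Fraction of transactions in D that contain every item of `itemset`."""
    return sum(1 for t in D if itemset <= t) / len(D)

def confidence(D, alpha, beta):
    """supp(alpha ∪ beta) / supp(alpha) for the rule alpha -> beta."""
    return support(D, alpha | beta) / support(D, alpha)

# Toy transaction set: bread and milk co-occur in 2 of 4 transactions,
# and 2 of the 3 transactions containing bread also contain milk.
D = [{"bread", "milk"}, {"bread", "butter"},
     {"bread", "milk", "butter"}, {"milk"}]
support(D, {"bread", "milk"})       # 0.5
confidence(D, {"bread"}, {"milk"})  # 2/3
```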
A decision rule is always of the form φ → ψ, read as “if φ then ψ”, where φ is built
from the condition attributes and ψ from the decision attributes. We can generate rules
from any reduct of the reduct set presented in table 3.12, since each reduct is a true
representation of the whole dataset. We choose reduct 2 of the reduct set, i.e. {Heart
Palpitation, Chest Pain, Cholesterol}, for generating reduct rules. The rules generated
from a reduct are comparatively fewer in number and are true representatives of the
domain. The rules generated from a reduct are called ‘Reduct Rules’, and decisions based
on these rules are generally more precise and accurate.
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P2 Low High Low No
P3 Low Low Low No
P4 V_high V_high High Yes
P5 High V_high High Yes
P6 Low Low Low No
P7 High V_high High Yes
P10 V_high V_high High Yes
P11 V_high High High Yes
P12 V_high High High Yes
P13 High Low Low No
P14 Low High Low No
P15 High V_high High Yes
Table 3.13 Reduct of Patient Dataset
The first step towards ‘Reduct Rule’ generation is the removal of redundancy. For rule
generation the emphasis is on reducing table 3.13 by removing duplicate rows, because
duplicate rows or cases add no information. Cases that have identical values for all the
condition and decision attributes are called duplicate rows. For patients P3 and P6 in
table 3.13 all the condition and decision attribute values are the same, so we can
eliminate either P3 or P6. Similarly, for patients P5 and P7 all the condition and
decision attribute values are the same, so we can remove P7 from the table. Further, the
condition and decision attribute values of P4 and P10 are the same, so P10 can be
eliminated from table 3.13. Likewise P11 and P12, P2 and P14, and P5 and P15 agree on
all their condition and decision attribute values. Eliminating P6, P7, P10, P12, P14
and P15 from table 3.13 leaves table 3.14.
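The duplicate-row elimination described above amounts to keeping the first occurrence of each distinct tuple of condition and decision values. A minimal sketch, using a few rows shaped like table 3.13:

```python
def drop_duplicates(rows, attrs):
    """Keep the first row for each distinct value tuple over `attrs`."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Three rows modelled on table 3.13: P6 duplicates P3 on every condition
# and decision attribute, so only P3 and P1 survive.
rows = [
    {"Patient": "P3", "HP": "Low",  "CP": "Low",  "Cho": "Low",  "D": "No"},
    {"Patient": "P6", "HP": "Low",  "CP": "Low",  "Cho": "Low",  "D": "No"},
    {"Patient": "P1", "HP": "High", "CP": "High", "Cho": "High", "D": "Yes"},
]
[r["Patient"] for r in drop_duplicates(rows, ["HP", "CP", "Cho", "D"])]  # ['P3', 'P1']
```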
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P2 Low High Low No
P3 Low Low Low No
P4 V_high V_high High Yes
P5 High V_high High Yes
P11 V_high High High Yes
P13 High Low Low No
Table 3.14 Reduced Table
Rearranging table 3.14 according to Decision attribute values, we get table 3.15.
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P4 V_high V_high High Yes
P5 High V_high High Yes
P11 V_high High High Yes
P2 Low High Low No
P3 Low Low Low No
P13 High Low Low No
Table 3.15 Rearranging the Reduced Table as per Decision Values
The next step towards the removal of redundancy, or reduction, is to analyze each
condition attribute one by one, independently, together with the decision attribute.
Considering the condition attribute Heart Palpitation with the decision attribute Heart
Problem of table 3.15 gives table 3.16.
Patient  H.P.  Heart Problem Decision
P1 High Yes
P4 V_high Yes
P5 High Yes
P11 V_high Yes
P2 Low No
P3 Low No
P13 High No
Table 3.16 Heart Palpitation with Decision
Further reduction of table 3.16 is possible, because the condition and decision
attribute values of patients P1 and P5 are the same; likewise for P4 and P11, and for
P2 and P3. Removing P3, P5 and P11 from table 3.16 results in table 3.17.
Patient  H.P.  Heart Problem Decision
P1 High Yes
P4 V_high Yes
P2 Low No
P13 High No
Table 3.17 Reduced Heart Palpitation with Decision
Now considering the condition attribute Chest Pain with the decision attribute Heart
Problem, and removing the duplicate cases P5, P11 and P13, a new table 3.18 is formed.
Patient  C.P.  Heart Problem Decision
P1 High Yes
P4 V_high Yes
P2 High No
P3 Low No
Table 3.18 Reduced Chest Pain with Decision
Similarly, considering the condition attribute Cholesterol with the decision attribute
Heart Problem and removing the duplicate rows P3, P4, P5, P11 and P13, we are left with
table 3.19.
Patient  Cho  Heart Problem Decision
P1 High Yes
P2 Low No
Table 3.19 Reduced Cholesterol with Decision
In the next step, taking the intersection of the data records in tables 3.17, 3.18 and
3.19 (i.e. keeping the records common to all three) and joining their attribute values,
we get table 3.20, whose records constitute the reduced information in relation to
table 3.1.
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P2 Low High Low No
Table 3.20 Joining of Information
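The intersection step above can be sketched with the patient ids of the three reduced single-attribute tables; only the patients present in every table survive into table 3.20.

```python
# Patient ids surviving in each reduced single-attribute table.
t_hp  = {"P1", "P4", "P2", "P13"}  # table 3.17 (Heart Palpitation)
t_cp  = {"P1", "P4", "P2", "P3"}   # table 3.18 (Chest Pain)
t_cho = {"P1", "P2"}               # table 3.19 (Cholesterol)

# The join keeps only the patients common to all three tables.
common = t_hp & t_cp & t_cho
print(sorted(common))  # ['P1', 'P2']
```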
The next step of the analysis is to consider the condition attributes in pairs with the
decision attribute and then eliminate the duplicate rows. We first consider the
condition attributes Heart Palpitation and Chest Pain of table 3.15 with the decision
attribute Heart Problem, giving table 3.21.
Patient  H.P.  C.P.  Heart Problem Decision
P1 High High Yes
P4 V_high V_high Yes
P5 High V_high Yes
P11 V_high High Yes
P2 Low High No
P3 Low Low No
P13 High Low No
Table 3.21 Heart Palpitation, Chest pain with Decision
No further reduction of table 3.21 is possible, because all of its rows are different.
Considering the condition attributes Heart Palpitation and Cholesterol of table 3.15
with the decision attribute, we get table 3.22. Reduction of this table is possible.
Patient  H.P.  Cholesterol  Heart Problem Decision
P1 High High Yes
P4 V_high High Yes
P5 High High Yes
P11 V_high High Yes
P2 Low Low No
P3 Low Low No
P13 High Low No
Table 3.22 Heart Palpitation, Cholesterol with Decision
After removing the duplicate cases P3, P5 and P11 from table 3.22, a new table 3.23 is
formed.
Patient H.P. Cho Heart
Problem
Decision
P1 High High Yes
P4 V_high High Yes
P2 Low Low No
P13 High Low No
Table 3.23 Reduced Heart Palpitation, Cholesterol with Decision
Considering the condition attributes Chest Pain and Cholesterol of table 3.15 with the
decision attribute Heart Problem gives table 3.24.
Patient  C.P.  Cho  Heart Problem Decision
P1 High High Yes
P4 V_high High Yes
P5 V_high High Yes
P11 High High Yes
P2 High Low No
P3 Low Low No
P13 Low Low No
Table 3.24 Chest Pain, Cholesterol with Decision
In the above table there are three duplicate cases, namely P5, P11 and P13. Removing
this redundancy from table 3.24 leaves table 3.25.
Patient  C.P.  Cho  Heart Problem Decision
P1 High High Yes
P4 V_high High Yes
P2 High Low No
P3 Low Low No
Table 3.25 Reduced Chest Pain, Cholesterol with Decision
Taking the intersection of the records in the reduced tables 3.21, 3.23 and 3.25 leads
to table 3.26.
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P4 V_high V_high High Yes
P2 Low High Low No
Table 3.26 Join of Information
In the next step, taking the intersection of the records in tables 3.20 and 3.26, we get
table 3.27, which provides the minimum number of significant rules.
Patient  H.P.  C.P.  Cho  Heart Problem Decision
P1 High High High Yes
P2 Low High Low No
Table 3.27 Rule Generation Table
Decision rules are often presented as implications and are called “if ... then ...”
rules. A set of decision rules is called a decision algorithm; thus with each decision
table we can associate a decision algorithm consisting of all the decision rules
occurring in the table. We must, however, make a distinction between decision tables and
decision algorithms. A decision table is a collection of data, whereas a decision
algorithm is a collection of implications, i.e. logical expressions. To deal with data
we use various mathematical methods, such as statistics [78], but to analyze
implications we must employ logical tools [103]. These two approaches are therefore not
equivalent; however, for simplicity we will often present decision rules in the form of
implications, without referring more deeply to their logical nature, as is common
practice in AI.
The decision rules according to table 3.27 are:
Rule 1
If (Heart Palpitation = High) and (Chest Pain = High) and (Cholesterol = High)
Then Heart Problem = Yes
Rule 2
If (Heart Palpitation = Low) and (Chest Pain = High) and (Cholesterol = Low)
Then Heart Problem = No
These are the most significant rules and are true representatives of the patient
dataset; they are sufficient to characterize it.
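The two rules above can be applied as a minimal classifier; returning None for uncovered cases reflects that the rule set need not cover every combination of attribute values. A sketch, with attribute names as in the rules:

```python
def classify(patient):
    """Apply Rule 1 and Rule 2; return None when no rule fires."""
    if (patient["Heart Palpitation"] == "High"
            and patient["Chest Pain"] == "High"
            and patient["Cholesterol"] == "High"):
        return "Yes"   # Rule 1
    if (patient["Heart Palpitation"] == "Low"
            and patient["Chest Pain"] == "High"
            and patient["Cholesterol"] == "Low"):
        return "No"    # Rule 2
    return None        # case not covered by the rule set

classify({"Heart Palpitation": "High", "Chest Pain": "High",
          "Cholesterol": "High"})  # 'Yes'
```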
The rules generated by the Rose2 software [123] for the patient dataset are shown in fig. 3.2.
Fig. 3.2 Rules Generated by Rose2 Software for Patient Dataset
The number of rules generated by our method is smaller than the number of rules
generated by the Rose2 software. In this chapter we have demonstrated the use of Rough
Set Theory for the generation of rules with a minimum number of condition attributes
that provide sound inference. The rule set may not be complete, as it may not define a
decision for certain combinations of attribute values.
But the aim here is to find the minimum number of rules that can represent the
knowledge provided by the dataset and that are sound in nature as per AI nomenclature.
The rules generated by the reduct-based method have been validated and proved to be
sound. They are also relatively fewer in number compared with the rules generated by the
Rose2 software available in the literature.