
An Overview of Rough Set Theory from the Point of View of Relational Databases

T.Y. [email protected] Berkeley Initiative in Soft Computing,

Department of Electrical Engineering and Computer Science,University of California, Berkeley, California 94720

Abstract: Both relational database theory (RDB) and rough set theory (RS) are formal theories derived from the study of tables. However, they are based on different philosophies and have developed in different directions. RDB assumes the semantics of data is known and focuses on organizing data through that semantics. RS, in its table format, assumes data semantics is defined by the given data and focuses on discovering patterns, rules and data semantics through the available data - a data mining theory. In this paper, the fundamental notions of the two theories are compared and summarized. Further, RS can also take an abstract format, in which it can be used to analyze imprecise, uncertain or incomplete information in data. It is a new set theory complementary to fuzzy set theory. In this paper, this aspect is only lightly touched.
Keywords: Rough sets, relational databases.

1 Introduction

Rough set theory (RS) is a formal theory derived from fundamental research on the logical properties of information tables, also known as (Pawlak) information systems or knowledge representation systems, carried out at the Polish Academy of Sciences and the University of Warsaw, Poland, around the 1970s. Developed independently of and differently from relational databases, RS in its table format is another theory of extensional relational databases (ERDB) - snapshots of relational databases. However, unlike the usual ERDB, which focuses on storing and retrieving data, RS focuses on discovering patterns, rules and knowledge in data - a modern data mining theory. Fundamentally, RS and ERDB are very different theories, even though the entities of their respective studies, namely tables, are the same. In this paper we compare and summarize their fundamental notions. Further, RS in its abstract format can be used to analyze imprecise, uncertain or incomplete information in data - a new set theory complementary to fuzzy set theory. In this paper, this aspect is only lightly touched.


2 Information Table

The syntax of information tables in RS is very similar to that of relations in RDB. Entities in RS are also represented by tuples of attribute values. However, the representation may not be faithful; namely, entities and tuples may not be in one-to-one correspondence. A relation R consists of

1. U = {x, y, ...} is a set of entities.

2. T is a set of attributes A1, A2, ..., An.

3. dom(Ai) is the set of values of attribute Ai, and Dom = dom(A1) ∪ dom(A2) ∪ ... ∪ dom(An).

4. Each entity in U is represented uniquely by a map t : T → Dom, where t(Ai) ∈ dom(Ai) for each Ai ∈ T.

Informally, one can view a relation as a table consisting of rows of elements. Each row represents an entity uniquely.

An information table (also known as an information system or knowledge representation system) consists of

1. U = {u, v, ...} is a set of entities.

2. T is a set of attributes A1, A2, ..., An.

3. dom(Ai) is the set of values of attribute Ai, and Dom = dom(A1) ∪ dom(A2) ∪ ... ∪ dom(An).

4. ρ : U × T → Dom, called the description function, is a map such that ρ(u, Ai) ∈ dom(Ai) for all u ∈ U and Ai ∈ T.

Note that ρ induces a set of maps t = ρ(u, •) : T → Dom. Each map is a tuple:

t = (ρ(u, A1), ρ(u, A2), ..., ρ(u, Ai), ..., ρ(u, An))

Note that the tuple t is not necessarily associated with an entity uniquely. In an information

table, two distinct entities could have the same tuple representation, which is not permissible in relational databases.

A decision table (DT) is an information table (U, T, V, ρ) in which the attribute set T = C ∪ D is a union of two non-empty sets, C and D, of attributes. The elements in C are called conditional attributes; the elements in D are called decision attributes.
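To make these definitions concrete, here is a minimal sketch in Python; the encoding (nested dicts for ρ, string entity names) is my own and not part of the paper.

```python
from typing import Any, Dict

# A toy information table (U, T, Dom, rho); all names are illustrative.
U = ["u1", "u2", "u3"]
T = ["A1", "A2"]

# Description function rho: U x T -> Dom, stored as nested dicts.
rho: Dict[str, Dict[str, Any]] = {
    "u1": {"A1": "x", "A2": 1},
    "u2": {"A1": "x", "A2": 1},  # same tuple as u1: allowed in an information
    "u3": {"A1": "y", "A2": 2},  # table, but not in a relation
}

def tuple_of(u: str) -> tuple:
    """The induced tuple t = (rho(u, A1), ..., rho(u, An))."""
    return tuple(rho[u][a] for a in T)

# Two distinct entities may share one tuple representation:
assert tuple_of("u1") == tuple_of("u2")
```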


3 Knowledge Dependencies

Mathematically, a classification or partition is a decomposition of the domain U of interest into mutually disjoint subsets, called equivalence classes. Such a partition defines and is defined by an equivalence relation, that is, a reflexive, symmetric and transitive binary relation. In this section, we will compare the functional dependencies in RDB with Pawlak's theory of equivalence relations derived from the structure of information tables.

3.1 Attributes and Equivalence Relations

In an information table, any subset of attributes induces an equivalence relation as follows: Let B be a non-empty subset of T. Two entities u, v are equivalent (or indiscernible by B) in U, denoted by

u ≅ v (mod B) if ρ(u, Ai) = ρ(v, Ai) for every attribute Ai in B.

It is not difficult to verify that ≅ is indeed an equivalence relation; ≅ is called the indiscernibility relation and denoted by IND(B). We will denote the equivalence class containing u by [u]IND(B), or simply by [u]B. Note that B can be a singleton {Ai}; in this case, we simply write IND(Ai). It is easy to see that the following is valid:

IND(B) = ∩{IND(Ai) : Ai in B}

As observed earlier, the equivalence classes of IND(B) consist of all possible intersections of the equivalence classes of the IND(Ai)'s. A set B of attribute names gives us a finite collection of equivalence relations,

RCol(B) = {IND(Ai) : Ai in B}

In particular, the set of all attributes gives us the following set of equivalence relations,

RCol(T) = {IND(A1), IND(A2), ..., IND(Ai), ..., IND(An)}
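The indiscernibility classes are easy to compute; the following is a small sketch under the dict-based table encoding used above (the function name is mine). Grouping entities by their B-tuple realizes exactly the intersection IND(B) = ∩{IND(Ai) : Ai in B}.

```python
from collections import defaultdict
from typing import FrozenSet, List

def ind_classes(U, rho, B) -> List[FrozenSet]:
    """Equivalence classes of IND(B): u ~ v iff rho(u, A) = rho(v, A)
    for every attribute A in B."""
    groups = defaultdict(list)
    for u in U:
        groups[tuple(rho[u][a] for a in B)].append(u)
    return [frozenset(g) for g in groups.values()]
```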

3.2 Pawlak Knowledge Bases

Let RCol be a collection of equivalence relations, often a finite collection, over U. An ordered pair

K = (U,RCol)

is called a Pawlak knowledge base [?]. Pawlak calls a collection of equivalence relations knowledge, because our knowledge about a domain is often represented by its classifications. Let P and Q be two equivalence relations or classifications in K. If every Q-equivalence class is a union of P-equivalence classes, then we say that Q depends on P, Q is coarser than P, or P is finer than Q. The dependency is called knowledge dependency (KD) [?]. It is easy to verify that the intersection P ∩ Q of two equivalence relations is another equivalence relation whose


partition consists of all possible intersections of P- and Q-equivalence classes. More generally, the intersection of all the equivalence relations in RCol, denoted by IND(RCol), is another equivalence relation. IND(RCol) is referred to as the indiscernibility relation over RCol. Let PCol and QCol be two sub-collections of RCol. Following Pawlak, we define the following [?]:

1. QCol depends on PCol iff IND(QCol) is coarser than IND(PCol). This dependency is denoted by PCol ⇒ QCol.

2. PCol and QCol are equivalent iff PCol ⇒ QCol and QCol ⇒ PCol. It is easy to see that PCol and QCol are equivalent iff IND(PCol) = IND(QCol).

3. PCol and QCol are independent iff neither PCol ⇒ QCol nor QCol ⇒ PCol.

Now we can treat an information table as a Pawlak knowledge base (U, RCol(T)) and apply the notion of knowledge dependencies to information tables. Though a Pawlak knowledge base appears to be an abstract notion, it is in fact a very concrete object.

Proposition. There is a one-to-one correspondence between information tables and Pawlak knowledge bases.
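A coarseness test makes these definitions operational. The sketch below identifies a sub-collection with the attribute set that induces it (legitimate, since IND(RCol(B)) = IND(B)); the function names are my own.

```python
def ind_pairs(U, rho, B):
    """IND(B) as a set of ordered pairs (u, v)."""
    return {(u, v) for u in U for v in U
            if all(rho[u][a] == rho[v][a] for a in B)}

def depends_on(U, rho, P, Q):
    """Q depends on P (P => Q) iff IND(P) is finer than IND(Q),
    i.e. IND(P) is a subset of IND(Q) as sets of pairs."""
    return ind_pairs(U, rho, P) <= ind_pairs(U, rho, Q)
```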

3.3 Knowledge and Functional Dependencies

In relational databases, a finite collection of attribute names

A1, A2, ..., Ai, ..., An

is called a relational scheme [?]. A relation instance, or simply relation, R on a relation scheme R is a finite set of tuples,

t = (t1, t2, ..., tn)

as defined in Section 2. A functional dependency states that the values of one set of attributes uniquely determine the values of another set of attributes. Formally, let X and Y be two subsets of T. A relation R satisfies an extensional functional dependency EFD: X → Y if for every X-value there is a uniquely determined Y-value in the relation instance R. An intensional functional dependency FD: X → Y exists on a relation scheme R if it is satisfied by all relation instances R of the scheme R. In the database community, FD always refers to the intensional FD. One should note that at any given moment, a relation instance may satisfy some family of extensional functional dependencies (EFDs); however, the same family may not be satisfied by other relation instances. A family that is satisfied by all relation instances is an intensional functional dependency FD. In this paper we will be interested in extensional functional dependencies, so the notation "X → Y" denotes an EFD. As pointed out earlier, attributes induce equivalence relations on information tables, so an EFD can be interpreted as a knowledge dependency (KD). The following proposition is immediate from the definitions.

Proposition. An EFD X → Y between two sets of attributes X and Y is equivalent to the KD RCol(X) ⇒ RCol(Y) of the equivalence relations induced by the attributes X and Y.
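An EFD can be checked directly from a table instance. Here is a sketch under the same encoding as before (names mine); by the proposition, this test agrees with the depends_on check of Section 3.2.

```python
def satisfies_efd(U, rho, X, Y):
    """Does the instance satisfy the extensional FD X -> Y?
    Every X-value must determine a unique Y-value."""
    seen = {}
    for u in U:
        x = tuple(rho[u][a] for a in X)
        y = tuple(rho[u][a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False  # two entities agree on X but differ on Y
    return True
```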


4 Decision Tables and Decision Rules

Rough set theory is an effective methodology for extracting rules from information tables, more precisely, from decision tables [?]. In this section we will introduce some fundamental concepts and illustrate the procedure by an example.

4.1 Reducts and Candidate Keys

In RDB, an attribute or a set of attributes K is called a candidate key if all attributes are functionally dependent on K and K is a minimal such set. We export these notions to the extensional world. The "extensional candidate key" is a special form of reduct [?]. Let S = (U, T = C ∪ D, V, ρ) be a decision table, where

C = {A1, A2, ..., Ai, ..., An},

D = {B1, B2, ..., Bi, ..., Bm}.

Then there are two Pawlak knowledge bases on U:

RCol(C) = {IND(A1), IND(A2), ..., IND(Ai), ..., IND(An)}

RCol(D) = {IND(B1), IND(B2), ..., IND(Bi), ..., IND(Bm)}

S is a consistent decision table if RCol(C) ⇒ RCol(D). B is called a reduct of S if B is a minimal subset of C such that RCol(B) ⇒ RCol(D). It is clear that such a B is not necessarily unique. If we choose D = T, then a reduct is an extensional candidate key.
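For the small tables considered in this paper, reducts can be found by brute force; the following is a self-contained sketch (exponential in |C|, and the names are mine):

```python
from itertools import combinations

def consistent(U, rho, B, D):
    """RCol(B) => RCol(D): entities that agree on B also agree on D."""
    seen = {}
    for u in U:
        b = tuple(rho[u][a] for a in B)
        d = tuple(rho[u][a] for a in D)
        if seen.setdefault(b, d) != d:
            return False
    return True

def reducts(U, rho, C, D):
    """All minimal subsets B of C with RCol(B) => RCol(D)."""
    found = []
    for r in range(1, len(C) + 1):
        for B in combinations(C, r):
            if consistent(U, rho, B, D) and \
               not any(set(f) <= set(B) for f in found):
                found.append(B)
    return found
```

With D = T, the same search returns the extensional candidate keys.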

4.2 Decision Rules and Value Reducts

Rough set theory is an effective methodology for extracting rules from information tables (IT), more precisely, from decision tables (DT); each IT can be viewed as many DTs. Each entity u in a DT can be interpreted as a decision rule. Let X → Y be an EFD (or equivalently a KD); that is, for any X-value c, there is a unique Y-value d. We can rephrase it as a decision rule:

If t(X) = c, then t(Y) = d.

Rough set theorists are interested in simplifying these decision rules [?, Pawlak91].

4.3 Illustration

We will illustrate, without losing the general idea, using the following table:


U      Location   TEST  NEW  CASE  RESULT
ID-1   Houston    10    92   03    10
ID-2   San Jose   10    92   03    10
ID-3   Palo Alto  10    90   02    10
ID-4   Berkeley   11    91   04    50
ID-5   New York   11    91   04    50
ID-6   Atlanta    20    93   70    99
ID-7   Chicago    20    93   70    99
ID-8   Baltimore  20    93   70    99
ID-9   Seattle    20    93   70    99
=========================================
ID-10  Chicago    51    95   70    94
ID-11  Chicago    51    95   70    95

The two rows below the double line will be cut off.

4.3.1 Select a DT

We will consider a decision table from the table given above: U is the universe, RESULT is the decision attribute, and C = {TEST, NEW, CASE} is the set of conditional attributes.

4.3.2 Split DT

ID-10 and ID-11 form two inconsistent rules, so we split the table. One part, consisting of entities ID-1 to ID-9, is called the consistent table; the other, consisting of ID-10 and ID-11, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.

4.3.3 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(RESULT) classifies entities into three equivalence classes, called decision classes:

DECISION1 = {ID-1, ID-2, ID-3} = [10]RESULT,
DECISION2 = {ID-4, ID-5} = [50]RESULT,
DECISION3 = {ID-6, ID-7, ID-8, ID-9} = [99]RESULT

4.3.4 Condition Classes

Let C = {TEST, NEW, CASE} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into four equivalence classes, called condition classes:


CASE1 = {ID-1, ID-2}, CASE2 = {ID-3},
CASE3 = {ID-4, ID-5}, CASE4 = {ID-6, ID-7, ID-8, ID-9}

4.3.5 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION1;
CASE3 ⊆ DECISION2; CASE4 ⊆ DECISION3

These inclusions imply that the equivalence relation IND(RESULT) depends on IND(C); equivalently, RESULT is knowledge-dependent on C.

4.3.6 Inference Rules

The relationship induces the following inference rules:

1. If TEST = 10, NEW = 92, CASE = 03, then RESULT = 10,

2. If TEST = 10, NEW = 90, CASE = 02, then RESULT = 10,

3. If TEST = 11, NEW = 91, CASE = 04, then RESULT = 50,

4. If TEST = 20, NEW = 93, CASE = 70, then RESULT = 99.

One can rewrite these four rules into one universal inference rule: "(TEST, NEW, CASE) implies RESULT."

4.3.7 Reducts

It is clear that {TEST}, {NEW} and {CASE} are each reducts. Using the reduct {TEST}, the conditions on NEW and CASE can be deleted from the rules given above.

4.3.8 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition TEST = 10 alone, called a value reduct, i.e.,

CASE1 = {u : u.TEST = 10}

By similar arguments, we can simplify the four rules to:

1. If TEST = 10, then RESULT = 10,


2. If TEST = 11, then RESULT = 50,

3. If TEST = 20, then RESULT = 99.

We will have another set of rules if another reduct is used.
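As a sanity check, one can encode the consistent table (ID-1 to ID-9) and verify the inclusions of 4.3.5 and the reduct claim of 4.3.7 mechanically. The encoding below is hypothetical (leading zeros in CASE are dropped), and the helper is mine.

```python
from collections import defaultdict

ATTRS = ("TEST", "NEW", "CASE", "RESULT")
rows = {
    "ID-1": (10, 92, 3, 10),  "ID-2": (10, 92, 3, 10),
    "ID-3": (10, 90, 2, 10),  "ID-4": (11, 91, 4, 50),
    "ID-5": (11, 91, 4, 50),  "ID-6": (20, 93, 70, 99),
    "ID-7": (20, 93, 70, 99), "ID-8": (20, 93, 70, 99),
    "ID-9": (20, 93, 70, 99),
}

def classes(attrs):
    """Equivalence classes of IND(attrs)."""
    part = defaultdict(set)
    for u, t in rows.items():
        part[tuple(t[ATTRS.index(a)] for a in attrs)].add(u)
    return list(part.values())

decisions = classes(("RESULT",))
# Every condition class lies inside some decision class (4.3.5):
assert all(any(c <= d for d in decisions)
           for c in classes(("TEST", "NEW", "CASE")))
# Each single conditional attribute already refines RESULT,
# so {TEST}, {NEW} and {CASE} are all reducts (4.3.7):
for a in ("TEST", "NEW", "CASE"):
    assert all(any(c <= d for d in decisions) for c in classes((a,)))
```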

5 My Example 1

We will illustrate, without losing the general idea, using the following table:

U   City  a   b    c   d
1   H     30  112  23  30
2   S     30  112  23  30
3   P     30  110  22  30
4   B     31  111  24  70
5   N     31  111  24  70
6   A     40  113  90  119
7   C     40  113  90  119
8   Ba    40  113  90  119
9   Se    40  113  90  119
=========================
10  C     71  115  90  114
11  C     71  115  90  115

The two rows below the double line will be cut off.

5.0.9 Select a DT

We will consider a decision table from the table given above: U is the universe, d is the decision attribute, and C = {a, b, c} is the set of conditional attributes.

5.0.10 Split DT

Entities 10 and 11 form two inconsistent rules, so we split the table. One part, consisting of entities 1 to 9, is called the consistent table; the other, consisting of 10 and 11, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.

5.0.11 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(d) classifies entities into three equivalence classes, called decision classes:

DECISION1 = {1, 2, 3} = [30]d,
DECISION2 = {4, 5} = [70]d,
DECISION3 = {6, 7, 8, 9} = [119]d


5.0.12 Condition Classes

Let C = {a, b, c} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into four equivalence classes, called condition classes:

CASE1 = {1, 2}, CASE2 = {3},
CASE3 = {4, 5}, CASE4 = {6, 7, 8, 9}

5.0.13 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION1;
CASE3 ⊆ DECISION2; CASE4 ⊆ DECISION3

These inclusions imply that the equivalence relation IND(d) depends on IND(C); equivalently, d is knowledge-dependent on C.

5.0.14 Inference Rules

The relationship induces the following inference rules:

1. If a = 30, b = 112, c = 23, then d = 30,

2. If a = 30, b = 110, c = 22, then d = 30,

3. If a = 31, b = 111, c = 24, then d = 70,

4. If a = 40, b = 113, c = 90, then d = 119.

One can rewrite these four rules into one universal inference rule: "(a, b, c) implies d."

5.0.15 Reducts

It is clear that {a}, {b} and {c} are each reducts. Using the reduct {a}, the conditions on b and c can be deleted from the rules given above.

5.0.16 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition a = 30 alone, called a value reduct, i.e.,

CASE1 = {u : u.a = 30}


By similar arguments, we can simplify the four rules to:

1. If a = 30, then d = 30,

2. If a = 31, then d = 70,

3. If a = 40, then d = 119.

We will have another set of rules if another reduct is used.

6 My Example 2

We will illustrate, without losing the general idea, using the following table:

U   Area  a   b    c   d
1   t     47  100  89  90
2   tt    47  100  89  90
3   ttt   47  120  99  90
4   g     59  78   98  100
5   gg    59  78   98  100
6   ggg   23  66   80  170
7   e     23  66   80  170
=========================
8   ee    14  38   72  75
9   eee   14  38   72  76
=========================
10  r     23  66   80  170
11  rr    23  66   80  170

The two rows between the double lines will be cut off.

6.0.17 Select a DT

We will consider a decision table from the table given above: U is the universe, d is the decision attribute, and C = {a, b, c} is the set of conditional attributes.

6.0.18 Split DT

Entities 8 and 9 form two inconsistent rules, so we split the table. One part, consisting of entities 1 to 7, 10 and 11, is called the consistent table; the other, consisting of 8 and 9, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.

6.0.19 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(d) classifies entities into three equivalence classes, called decision classes:


DECISION1 = {1, 2, 3} = [90]d,
DECISION2 = {4, 5} = [100]d,
DECISION3 = {6, 7, 10, 11} = [170]d

6.0.20 Condition Classes

Let C = {a, b, c} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into five equivalence classes, called condition classes:

CASE1 = {1, 2}, CASE2 = {3},
CASE3 = {4, 5}, CASE4 = {6, 7}, CASE5 = {10, 11}

6.0.21 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION1;
CASE3 ⊆ DECISION2; CASE4 ⊆ DECISION3;
CASE5 ⊆ DECISION3

These inclusions imply that the equivalence relation IND(d) depends on IND(C); equivalently, d is knowledge-dependent on C.

6.0.22 Inference Rules

The relationship induces the following inference rules:

1. If a = 47, b = 100, c = 89, then d = 90,

2. If a = 47, b = 120, c = 99, then d = 90,

3. If a = 59, b = 78, c = 98, then d = 100,

4. If a = 23, b = 66, c = 80, then d = 170.

One can rewrite these four rules into one universal inference rule: "(a, b, c) implies d."

6.0.23 Reducts

It is clear that {a}, {b} and {c} are each reducts. Using the reduct {a}, the conditions on b and c can be deleted from the rules given above.


6.0.24 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition a = 47 alone, called a value reduct, i.e.,

CASE1 = {u : u.a = 47}

By similar arguments, we can simplify the four rules to:

1. If a = 47, then d = 90,

2. If a = 59, then d = 100,

3. If a = 23, then d = 170.

We will have another set of rules if another reduct is used.

7 My Example 3

We will illustrate, without losing the general idea, using the following table:

U  Food  a   b   d
1  x     10  20  30
2  y     10  20  31
==================
3  z     40  50  60
4  x1    44  54  64
5  y1    40  50  60

The two rows above the double line will be cut off.

7.0.25 Select a DT

We will consider a decision table from the table given above: U is the universe, d is the decision attribute, and C = {a, b} is the set of conditional attributes.

7.0.26 Split DT

Entities 1 and 2 form two inconsistent rules, so we split the table. One part, consisting of entities 3, 4 and 5, is called the consistent table; the other, consisting of 1 and 2, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.


7.0.27 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(d) classifies entities into two equivalence classes, called decision classes:

DECISION1 = {3, 5} = [60]d,
DECISION2 = {4} = [64]d

7.0.28 Condition Classes

Let C = {a, b} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into two equivalence classes, called condition classes:

CASE1 = {3, 5}, CASE2 = {4}

7.0.29 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION2;

These inclusions imply that the equivalence relation IND(d) depends on IND(C); equivalently, d is knowledge-dependent on C.

7.0.30 Inference Rules

The relationship induces the following inference rules:

1. If a = 40, b = 50, then d = 60,

2. If a = 44, b = 54, then d = 64.

One can rewrite these two rules into one universal inference rule: "(a, b) implies d."

7.0.31 Reducts

It is clear that {a} and {b} are each reducts. Using the reduct {a}, the condition on b can be deleted from the rules given above.


7.0.32 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition a = 40 alone, called a value reduct, i.e.,

CASE1 = {u : u.a = 40}

By similar arguments, we can simplify the two rules to:

1. If a = 40, then d = 60,

2. If a = 44, then d = 64.

We will have another set of rules if another reduct is used.

8 My Example 4

We will illustrate, without losing the general idea, using the following table:

U  Article  a   b    d
1  Sports   30  50   70
=======================
2  News     60  90   120
3  Ad       60  90   122
=======================
4  Weather  90  100  130
5  Cooking  30  50   70

The two rows between the double lines will be cut off.

8.0.33 Select a DT

We will consider a decision table from the table given above: U is the universe, d is the decision attribute, and C = {a, b} is the set of conditional attributes.

8.0.34 Split DT

Entities 2 and 3 form two inconsistent rules, so we split the table. One part, consisting of entities 1, 4 and 5, is called the consistent table; the other, consisting of 2 and 3, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.

8.0.35 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(d) classifies entities into two equivalence classes, called decision classes:

DECISION1 = {1, 5} = [70]d,
DECISION2 = {4} = [130]d


8.0.36 Condition Classes

Let C = {a, b} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into two equivalence classes, called condition classes:

CASE1 = {1, 5}, CASE2 = {4}

8.0.37 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION2;

These inclusions imply that the equivalence relation IND(d) depends on IND(C); equivalently, d is knowledge-dependent on C.

8.0.38 Inference Rules

The relationship induces the following inference rules:

1. If a = 30, b = 50, then d = 70,

2. If a = 90, b = 100, then d = 130.

One can rewrite these two rules into one universal inference rule: "(a, b) implies d."

8.0.39 Reducts

It is clear that {a} and {b} are each reducts. Using the reduct {a}, the condition on b can be deleted from the rules given above.

8.0.40 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition a = 30 alone, called a value reduct, i.e.,

CASE1 = {u : u.a = 30}

By similar arguments, we can simplify the two rules to:

1. If a = 30, then d = 70,

2. If a = 90, then d = 130.

We will have another set of rules if another reduct is used.


9 My Example 5

We will illustrate, without losing the general idea, using the following table:

U  Sales      a    b    d
1  Furniture  100  100  200
2  Groceries  100  100  201
3  Reading    100  100  202
===========================
4  Gardening  60   70   80
5  Vehicle    81   91   100

The three rows above the double line will be cut off.

9.0.41 Select a DT

We will consider a decision table from the table given above: U is the universe, d is the decision attribute, and C = {a, b} is the set of conditional attributes.

9.0.42 Split DT

Entities 1, 2 and 3 form three inconsistent rules, so we split the table. One part, consisting of entities 4 and 5, is called the consistent table; the other, consisting of 1, 2 and 3, is called the totally inconsistent table. From now on, the term "decision table" in this section refers to the consistent table.

9.0.43 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(d) classifies entities into two equivalence classes, called decision classes:

DECISION1 = {4} = [80]d,
DECISION2 = {5} = [100]d

9.0.44 Condition Classes

Let C = {a, b} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into two equivalence classes, called condition classes:

CASE1 = {4}, CASE2 = {5}

9.0.45 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:

CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION2;


These inclusions imply that the equivalence relation IND(d) depends on IND(C); equivalently, d is knowledge-dependent on C.

9.0.46 Inference Rules

The relationship induces the following inference rules:

1. If a = 60, b = 70, then d = 80,

2. If a = 81, b = 91, then d = 100.

One can rewrite these two rules into one universal inference rule: "(a, b) implies d."

9.0.47 Reducts

It is clear that {a} and {b} are each reducts. Using the reduct {a}, the condition on b can be deleted from the rules given above.

9.0.48 Value Reducts

Note that rule 1 is derived from the inclusion CASE1 ⊆ DECISION1. We can sharpen the description of CASE1 by the condition a = 60 alone, called a value reduct, i.e.,

CASE1 = {u : u.a = 60}

By similar arguments, we can simplify the two rules to:

1. If a = 60, then d = 80,

2. If a = 81, then d = 100.

We will have another set of rules if another reduct is used.

10 My Example 6

We will illustrate, without losing the general idea, using the following table:

U  a  b  c  d  e
1  1  0  0  1  1
2  1  0  0  0  1
3  0  0  0  0  0
4  1  1  0  1  0
5  1  1  0  2  2
6  2  1  0  2  2
7  2  1  0  2  2
8  2  2  2  2  2


10.0.49 Select a DT

We will consider a decision table from the table given above: U is the universe, e is the decision attribute, and C = {a, b, c, d} is the set of conditional attributes.

U  a  b  d  e
1  1  0  1  1
2  1  0  0  1
3  0  0  0  0
4  1  1  1  0
5  1  1  2  2
6  2  1  2  2
7  2  1  2  2
8  2  2  2  2

10.0.50 Split DT

Drop attribute c, since it is e-dispensable: removing it does not change the condition classes. The resulting table, shown above, is still consistent, so no entities need to be cut off.

10.0.51 Decision Classes

Using the notation of Section 3.1, the equivalence relation IND(e) classifies entities into three equivalence classes, called decision classes:

DECISION1 = {1, 2} = [1]e,
DECISION2 = {3, 4} = [0]e,
DECISION3 = {5, 6, 7, 8} = [2]e

10.0.52 Condition Classes

Let C = {a, b, d} be the set of conditional attributes. The equivalence relation IND(C) classifies entities into seven equivalence classes, called condition classes:

CASE1 = {1}, CASE2 = {2}, CASE3 = {3}, CASE4 = {4},
CASE5 = {5}, CASE6 = {6, 7}, CASE7 = {8}

10.0.53 Knowledge Dependencies

It is not difficult to verify that entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes; namely, we have the following inclusions:


CASE1 ⊆ DECISION1; CASE2 ⊆ DECISION1;
CASE3 ⊆ DECISION2; CASE4 ⊆ DECISION2;
CASE5 ⊆ DECISION3; CASE6 ⊆ DECISION3; CASE7 ⊆ DECISION3

These inclusions imply that the equivalence relation IND(e) depends on IND(C); equivalently, e is knowledge-dependent on C.

10.0.54 Inference Rules

The relationship induces inference rules; for example, entity 4 gives:

1. If a = 1, b = 1, d = 1, then e = 0.

One can rewrite such rules into one universal inference rule: "(a, b, d) implies e."

10.0.55 Reducts

It is clear that {a, b, d} is the reduct: c is dispensable, so the condition on c can be deleted from the rules, while none of a, b or d can be dropped without losing consistency.

10.0.56 Value Reducts

Consider the rule derived from entity 1: if a = 1, b = 0, d = 1, then e = 1. The core value is b(1) = 0, and there are two value reducts: b(1) = 0 and d(1) = 1, or a(1) = 1 and b(1) = 0. Using the first value reduct, the rule simplifies to:

1. If b = 0 and d = 1, then e = 1.

We will have another rule if the other value reduct is used.
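The core and value reducts claimed above can be verified mechanically. Below is a sketch over the reduced (a, b, d, e) table; the helper name is mine, and the value reducts of an entity u are computed as the minimal sets of u's condition values that still force its decision.

```python
from itertools import combinations

table = {
    1: {"a": 1, "b": 0, "d": 1, "e": 1},
    2: {"a": 1, "b": 0, "d": 0, "e": 1},
    3: {"a": 0, "b": 0, "d": 0, "e": 0},
    4: {"a": 1, "b": 1, "d": 1, "e": 0},
    5: {"a": 1, "b": 1, "d": 2, "e": 2},
    6: {"a": 2, "b": 1, "d": 2, "e": 2},
    7: {"a": 2, "b": 1, "d": 2, "e": 2},
    8: {"a": 2, "b": 2, "d": 2, "e": 2},
}

def value_reducts(u, cond=("a", "b", "d"), dec="e"):
    """Minimal subsets B of u's condition values that still force e(u)."""
    found = []
    for r in range(1, len(cond) + 1):
        for B in combinations(cond, r):
            same = [v for v in table
                    if all(table[v][c] == table[u][c] for c in B)]
            if all(table[v][dec] == table[u][dec] for v in same) and \
               not any(set(f) <= set(B) for f in found):
                found.append(B)
    return found

vr = value_reducts(1)
print(vr)                               # [('a', 'b'), ('b', 'd')]
print(set.intersection(*map(set, vr)))  # {'b'}: the core value is b(1) = 0
```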

11 Partial Dependencies

The discussion above assumes that the data contain no noise. In practice, noise is unavoidable, so we next examine partial dependency. As explained, D and C may induce two distinct partitions of the universe. Consider the problem of approximating the equivalence classes of IND(D) using the equivalence classes of IND(C). For an equivalence class X of IND(D), the lower approximation is given by:

CL(X) = {u | [u]C ⊆ X} = ∪{[u]C | [u]C ⊆ X}

The upper approximation of X is defined as,


CH(X) = {u | [u]C ∩ X ≠ ∅} = ∪{[u]C | [u]C ∩ X ≠ ∅}

With the lower approximations of every equivalence class of IND(D), the C-positive region of D, denoted by POSC(D), is defined by:

POSC(D) = ∪{CL(X) : X is a decision class}.

The number k = |POSC(D)|/|U| is called the degree of dependency of D on C, where |·| denotes the cardinality of a set. If k = 1, we have full knowledge dependency. By applying k-dependency, one can convert the previous example into an example of approximate rules or soft rules [Lin93], [Ziarko93], [Lin96c]. In RDB, there is no such concept of partial dependency.
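These formulas translate directly into code. The sketch below (names mine, same dict-of-dicts encoding as earlier) computes lower approximations, the positive region and the degree k; on the full 11-row table of Section 4.3 it would give k = 9/11, since ID-10 and ID-11 share a condition class but differ on RESULT.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of IND(attrs); rows maps entity -> attribute dict."""
    part = defaultdict(set)
    for u, t in rows.items():
        part[tuple(t[a] for a in attrs)].add(u)
    return list(part.values())

def lower_approx(cond_classes, X):
    """CL(X): the union of the condition classes contained in X."""
    return {u for c in cond_classes if c <= X for u in c}

def degree_of_dependency(rows, C, D):
    """k = |POS_C(D)| / |U|; k = 1 recovers full knowledge dependency."""
    cond = partition(rows, C)
    pos = set()
    for X in partition(rows, D):       # X ranges over the decision classes
        pos |= lower_approx(cond, X)   # POS_C(D) = union of the CL(X)
    return len(pos) / len(rows)
```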

12 Conclusion

In this paper, we use relational databases to explain some aspects of rough set theory. Rough set theory, in its table form, has been an effective method for rule mining in small to medium-size data. The theory and methodology have been extended to very large databases [?]. Some applications to intelligent control design have been examined [?], [?], [?], [?]. RS, in its abstract format, is a new set theory complementary to fuzzy set theory [?]. Its approximation aspect is closely related to modal logic and (pre-)topological spaces (neighborhood systems) [?], [?]. Rough set theory provides a new point of view on the theory of relational databases. We believe some profound applications in databases may soon flourish through this view.

Tsau Young (T. Y.) Lin received his Ph.D. from Yale University, and is now a Professor at San Jose State University and Visiting Scholar at BISC, University of California, Berkeley. He is the president of the International Rough Set Society. He has been the chair and a member of the program committees of various conferences and workshops. He is also serving as an associate editor and editorial board member of several international journals. He is interested in approximate retrieval, approximate reasoning, data mining, data security, fuzzy sets, intelligent control, Petri nets, and rough sets (alphabetical order).

Glossary

1. Rough Set Theory (RS): It assumes data semantics is defined by the given data and focuses on discovering patterns, rules, and data semantics through the available data. Knowledge engineering.

2. Relational Database Theory (RDB): It assumes the semantics of data is known and focuses on organizing data through its semantics. Data engineering.


3. Data Mining Theory: The process of analyzing data from different perspectives and summarizing it into useful information. Knowledge of data from databases.

4. Fuzzy Set Theory: Fuzzy set theory defines set membership as a possibility distribution.

5. Indiscernibility Relation: The intersection of all the equivalence relations (knowledge) in the collection.

6. Knowledge Base: A family of classifications over U.

7. Reduct: The essential part of knowledge; an extensional candidate key.

8. Core: The most important part of knowledge; the intersection of all the reducts.

9. Information Table: Also known as an information system or knowledge representation system; a table consisting of a set of entities, a set of attributes, a set of attribute values, and a description function.

10. Decision Table: An information table in which the attribute set is the union of non-empty sets of conditional and decision attributes.

11. Decision Attribute: The attribute in a table that shows the decision derived from all possible conditions.

12. Conditional Attributes: A set of attributes that provides the conditions in the table.

13. Decision Classes: The equivalence classes of entities that produce the same decision; different combinations of conditional attributes may fall in the same decision class.


14. Condition Classes: The equivalence classes of entities that share the same combination of conditional attribute values, and hence produce the same decision.

15. Knowledge Dependency: Entities that are indiscernible by the conditional attributes are also indiscernible by the decision attributes.
