Towards Using Grid Services for Mining Fuzzy Association Rules. Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu. Faculty of Mathematics and Computer Science, University of Craiova, ROMANIA. {mihaiug, mirelc, nikyc}@central.ucv.ro, i iancu@yahoo.com


Page 1: PPT

Towards Using Grid Services for Mining Fuzzy Association Rules

Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu

Faculty of Mathematics and Computer Science, University of Craiova, ROMANIA

{mihaiug, mirelc, nikyc}@central.ucv.ro, i [email protected]

Page 2: PPT

Introduction

• In this paper we show how the Knowledge Grid infrastructure can be used to implement a distributed algorithm for mining fuzzy association rules from distributed databases over a Grid network.

[Figure: fuzzy mining combined with a Grid network]

Page 3: PPT

Outline

• Knowledge Grid services

• Distributed fuzzy association rules mining

• Distributed problem definition

• The distributed algorithm

• Rules mining implementation over the Grid

• Conclusion

Page 4: PPT

Knowledge Grid Services-1

• The Knowledge Grid ([4], [5], [6]) defines an integrating architecture for distributed data mining and knowledge discovery.

• It uses basic grid services to build specific knowledge services, organized in two layers:

• the Core K-grid layer, which offers services implemented directly on top of generic grid services;

• the High level K-grid layer, which is used to describe, develop and execute distributed knowledge discovery computations.

Page 5: PPT

Knowledge Grid Services-2

Knowledge Directory Service (KDS). This service extends the basic Globus MDS service and is responsible for maintaining a description of all the data and tools used in the Knowledge Grid.

The metadata it uses is stored in a Knowledge Metadata Repository (KMR).

The Knowledge Base Repository (KBR) is used to maintain discovered knowledge.

Another important repository is the Knowledge Execution Plan Repository (KEPR). It stores the execution plans of data mining processes.

Resource Allocation and Execution Management Service (RAEMS). These services are used to find the best mapping between an execution plan and the available resources, with the goal of satisfying the application requirements.

Page 6: PPT

Knowledge Grid Services-2

Data Access Service (DAS). This service is responsible for the search and selection (data search services) and for the extraction, transformation and delivery (data extraction service) of the data to be mined.

Tools and Algorithms Access Service (TASS). This service is responsible for the search, selection, and downloading of data mining tools and algorithms.

Execution Plan Management Service (EPMS). This service is a semi-automatic tool that takes the data and programs selected by the user and generates a set of different possible plans that meet the user, data and algorithm requirements and constraints.

Results Presentation Service (RPS). This service specifies how to generate, present and visualize the extracted models.

Page 7: PPT

Distributed fuzzy association rules mining-1

DB = {t1, . . . , tn}: a database of n transactions

I = {i1, . . . , im}: the set of attributes (items)

Ex: I = {Age, Income, Weight}

Page 8: PPT

Distributed fuzzy association rules mining-2

For example, for the attribute Weight we can consider the following three fuzzy sets: "thin", "middle" and "fat".

FWeight = {thin, middle, fat}

Page 9: PPT

Distributed fuzzy association rules mining-3

〈X, FX〉 = 〈{Age, Income}, {young, high}〉

Page 10: PPT

Distributed fuzzy association rules mining-4

X ={Age, Income}, Y = {Weight}, FX = { middle, high }, FY = { fat }

"If Age is middle and Income is high then Weight is fat"

〈X, FX〉 ⇒ 〈Y, FY〉

〈{Age, Income}, {middle, high}〉 ⇒ 〈{Weight}, {fat}〉

Page 11: PPT

Distributed fuzzy association rules mining-4

T1 = 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {0.5, 1}〉

T2 = 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {1, 1}〉

The fuzzy support value of the itemset 〈X, FX〉 = 〈{Age, Income}, {middle, high}〉 is:

(0.5 * 1 + 1 * 1) / 2 = 1.5 / 2 = 0.75
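
The arithmetic above can be reproduced with a minimal sketch. It assumes, as the worked numbers suggest, that the fuzzy support is the product of the membership degrees within each transaction, averaged over all transactions. The function name fuzzy_support is illustrative and does not come from the slides.

```python
from math import prod


def fuzzy_support(memberships):
    """Average, over all transactions, of the product of membership degrees.

    `memberships` holds one tuple per transaction; each tuple contains the
    membership degrees of that transaction in the fuzzy sets of the itemset.
    """
    if not memberships:
        return 0.0
    return sum(prod(degrees) for degrees in memberships) / len(memberships)


# Degrees of (Age is middle, Income is high) for T1 and T2, as on the slide.
transactions = [(0.5, 1.0), (1.0, 1.0)]
print(fuzzy_support(transactions))  # 0.75
```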

Page 12: PPT

Distributed fuzzy association rules mining-5

An association rule is considered interesting if it has enough support and a high confidence value. Such a rule is also called a strong rule.
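
The slides do not spell out the confidence formula. The sketch below assumes the usual definition, confidence(X ⇒ Y) = support(X ∪ Y) / support(X), together with the product-and-average support used in the previous example. The Weight degrees and both thresholds are made-up values for illustration only.

```python
from math import prod


def fuzzy_support(rows, items):
    """Mean over rows of the product of the membership degrees of `items`."""
    return sum(prod(row[i] for i in items) for row in rows) / len(rows)


def is_strong(rows, antecedent, consequent, minsup, minconf):
    """Return (strong?, support, confidence) for the rule antecedent => consequent."""
    supp_rule = fuzzy_support(rows, antecedent + consequent)
    supp_ante = fuzzy_support(rows, antecedent)
    conf = supp_rule / supp_ante if supp_ante > 0 else 0.0
    return supp_rule >= minsup and conf >= minconf, supp_rule, conf


# Age and Income degrees follow T1 and T2; the Weight degrees are hypothetical.
rows = [
    {("Age", "middle"): 0.5, ("Income", "high"): 1.0, ("Weight", "fat"): 1.0},
    {("Age", "middle"): 1.0, ("Income", "high"): 1.0, ("Weight", "fat"): 0.5},
]
antecedent = (("Age", "middle"), ("Income", "high"))
consequent = (("Weight", "fat"),)

# minsup and minconf are illustrative thresholds, not values from the slides.
strong, supp, conf = is_strong(rows, antecedent, consequent, minsup=0.4, minconf=0.6)
print(strong, supp, round(conf, 3))
```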

Page 13: PPT

Distributed fuzzy association rules mining-6

• The problem of sequential mining of fuzzy association rules can be decomposed into two subproblems:

1. find all large fuzzy itemsets.

2. generate the fuzzy association rules from the large fuzzy itemsets found.
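
Step 2 can be sketched as follows, assuming step 1 has already produced a table of supports for every large fuzzy itemset. Each large itemset is split into an antecedent and a consequent, and a rule is kept when its confidence reaches minconf. The representation (frozensets of (attribute, fuzzy set) pairs), the function name and the support values are all hypothetical.

```python
from itertools import combinations


def generate_rules(supports, minconf):
    """Derive rules from large itemsets.

    `supports` maps a frozenset of (attribute, fuzzy_set) pairs to its support.
    Returns a list of (antecedent, consequent, confidence) triples.
    """
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                supp_ante = supports.get(ante)
                if not supp_ante:
                    continue  # antecedent not large, or zero support
                conf = supp / supp_ante
                if conf >= minconf:
                    rules.append((ante, itemset - ante, conf))
    return rules


a = ("Age", "middle")
w = ("Weight", "fat")
# Hypothetical supports of the large itemsets found in step 1.
supports = {frozenset({a}): 0.75, frozenset({w}): 0.5, frozenset({a, w}): 0.45}

for ante, cons, conf in generate_rules(supports, minconf=0.8):
    print(sorted(ante), "=>", sorted(cons), round(conf, 2))
```

With these made-up numbers only the rule "Weight is fat ⇒ Age is middle" passes the threshold, because 0.45 / 0.5 = 0.9 while 0.45 / 0.75 = 0.6.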

Page 14: PPT

Example

Raw data:

  age   weight
  15    40
  30    70

Membership degrees:

       age            weight
  young   old     thin    fat
  1       0       0.5     0
  0       0.5     0.5     1

Support counts:

〈{Age, Weight}, {young, thin}〉 => 1*0.5 + 0*0.5

〈{Age, Weight}, {young, fat}〉 => 1*0 + 0*1

〈{Age, Weight}, {old, thin}〉 => 0*0.5 + 0.5*0.5

〈{Age, Weight}, {old, fat}〉 => 0*0 + 0.5*1

Support count > Minsup => large fuzzy itemsets
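
The example can be checked with a short sketch that enumerates every combination of one fuzzy set per attribute and sums the products of the membership degrees over the two records. The threshold minsup = 0.4 is a hypothetical value chosen only to show the filtering step; the slide does not give one.

```python
from itertools import product

# Membership degrees from the example table (one dict per record).
records = [
    {"Age": {"young": 1.0, "old": 0.0}, "Weight": {"thin": 0.5, "fat": 0.0}},
    {"Age": {"young": 0.0, "old": 0.5}, "Weight": {"thin": 0.5, "fat": 1.0}},
]


def support_counts(records, attributes):
    """Support count of every combination of one fuzzy set per attribute."""
    fuzzy_sets = [sorted(records[0][attr]) for attr in attributes]
    counts = {}
    for labels in product(*fuzzy_sets):
        total = 0.0
        for rec in records:
            degree = 1.0
            for attr, label in zip(attributes, labels):
                degree *= rec[attr][label]
            total += degree
        counts[labels] = total
    return counts


counts = support_counts(records, ["Age", "Weight"])
minsup = 0.4  # hypothetical threshold, not taken from the slides
large = {labels: c for labels, c in counts.items() if c > minsup}

for labels, c in sorted(counts.items()):
    print(labels, c)
print("large:", sorted(large))
```

The counts come out as 0.5 for {young, thin}, 0 for {young, fat}, 0.25 for {old, thin} and 0.5 for {old, fat}, so with this threshold {young, thin} and {old, fat} are the large fuzzy itemsets.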

Page 15: PPT

Distributed problem definition-1

• Let DB = { DB1,DB2, . . . ,DBn } be a distributed database over n sites S1, S2, . . . , Sn.

[Figure: the partitions DB1, DB2, ..., DBn of the distributed database]

Page 16: PPT

Distributed problem definition-2

Page 17: PPT

Distributed problem definition-3

Page 18: PPT

Distributed problem definition-4

Page 19: PPT

Distributed problem definition-5

Distributed Mining of Fuzzy Association Rules. Given the set of items I, the distributed database DB = {DB1, DB2, . . . , DBn}, the fuzzy sets associated with the attributes from I, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules.

1. find all global large fuzzy itemsets.

2. generate the global fuzzy association rules from the global large fuzzy itemsets found.
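
The formal definitions of local and global support are not reproduced in this transcript. The sketch below assumes the usual count-distribution convention: each site reports its local fuzzy support counts and its number of transactions, and the global support of an itemset is the sum of the local counts divided by the total number of transactions. The function name and all numbers are hypothetical.

```python
def global_large_itemsets(local_counts, local_sizes, minsup):
    """Combine per-site fuzzy support counts into global supports.

    `local_counts` holds one dict per site (itemset -> local support count);
    `local_sizes` holds the number of transactions stored at each site.
    Returns the itemsets whose global support is at least `minsup`.
    """
    total_size = sum(local_sizes)
    totals = {}
    for counts in local_counts:
        for itemset, count in counts.items():
            totals[itemset] = totals.get(itemset, 0.0) + count
    return {
        itemset: count / total_size
        for itemset, count in totals.items()
        if count / total_size >= minsup
    }


# Two hypothetical sites with two transactions each.
site1 = {("Age", "young"): 1.5, ("Weight", "fat"): 0.5}
site2 = {("Age", "young"): 0.5, ("Weight", "fat"): 0.25}

print(global_large_itemsets([site1, site2], [2, 2], minsup=0.4))
```

Here ("Age", "young") has global support (1.5 + 0.5) / 4 = 0.5 and is kept, while ("Weight", "fat") has (0.5 + 0.25) / 4 = 0.1875 and is dropped.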

Page 20: PPT
Page 21: PPT

Fuzzy Count Distribution Algorithm

• First, the globally large fuzzy 1-itemsets L(1) are generated.

[Figure: the local large fuzzy 1-itemsets of each site, the global candidate 1-itemsets CA(1), and the globally large fuzzy 1-itemsets L(1)]

• CA(k) = Fuzzy_Apriori_Gen(L(k−1)).
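
The slides name Fuzzy_Apriori_Gen but do not show its body. The following is only a sketch of what such a step could look like, assuming it follows the classic Apriori join-and-prune scheme with one extra restriction: a candidate may not contain two fuzzy sets of the same attribute. The representation of itemsets as frozensets of (attribute, fuzzy set) pairs is also an assumption.

```python
from itertools import combinations


def fuzzy_apriori_gen(large_prev):
    """Build candidate k-itemsets from the large (k-1)-itemsets.

    `large_prev` is a set of frozensets of (attribute, fuzzy_set) pairs,
    all of the same size k-1.
    """
    candidates = set()
    for a, b in combinations(large_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # join step: the two itemsets must differ in one item
        attributes = [attr for attr, _ in union]
        if len(set(attributes)) != len(attributes):
            continue  # assumed rule: one fuzzy set per attribute
        # Prune step: every (k-1)-subset must itself be large.
        if all(frozenset(s) in large_prev for s in combinations(union, len(a))):
            candidates.add(union)
    return candidates


large_1 = {
    frozenset({("Age", "young")}),
    frozenset({("Age", "old")}),
    frozenset({("Weight", "thin")}),
}
for candidate in sorted(fuzzy_apriori_gen(large_1), key=sorted):
    print(sorted(candidate))
```

In this example the pair {young, old} is rejected because both fuzzy sets belong to Age, leaving {old, thin} and {young, thin} as candidate 2-itemsets.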

Page 22: PPT

Rules mining implementation over the Grid-1

Distributed Rules Mining Scenario

Page 23: PPT

Rules mining implementation over the Grid-2

In order to present the implementation of this process in a Grid network we shall consider that:

• the database DB is stored on K-grid node NodeA.

• the tools needed for mining association rules (the partitioner P, the frequent itemset mining tool and the association rules extractor) are available as multiplatform executables on K-grid node NodeS.

• the results will be stored in the Knowledge Base Repository (KBR) on NodeU.

Page 24: PPT

Rules mining implementation over the Grid-3

• Let's suppose that a Grid User (GU) needs to extract all association rules from database DB using the tools available on K-grid node NodeS.

• Step 1. The GU starts, from his K-grid node NodeU, the search for the computational resources needed to execute the data mining process. The KDS (Knowledge Directory Service) is used to locate these resources.

Page 25: PPT

Rules mining implementation over the Grid-4

• Step 2. The GU builds an execution plan for the data mining task, specifying strategies for tool and data movements. The execution plan is constructed using the EPMS (Execution Plan Management Service). This plan is stored in the local KEPR (Knowledge Execution Plan Repository).

• Step 3. The GU sends the execution plan to the RAEMS (Resource Allocation and Execution Management Service), which starts the application.

• Step 4. The GU visualizes and evaluates the result of the computation stored in the KBR by means of the RPS (Results Presentation Service) tools.
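
To make the idea of an execution plan more concrete, here is a purely illustrative sketch of a plan as an ordered list of tool movements, executions and data movements between the nodes named above. The class ExecutionPlan, its method and the action names are hypothetical stand-ins and not part of any Knowledge Grid API. The chosen strategy (moving the tools to the data) is just one possible plan among those the EPMS could propose.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPlan:
    """Hypothetical stand-in for a plan built with the EPMS."""

    steps: list = field(default_factory=list)

    def add(self, action, source, target):
        """Append one step and return the plan so calls can be chained."""
        self.steps.append({"action": action, "source": source, "target": target})
        return self


# One possible strategy (an assumption): bring the tools to the data on NodeA,
# run the mining there, then store the rules in the KBR on NodeU.
plan = (
    ExecutionPlan()
    .add("transfer_tools", "NodeS", "NodeA")
    .add("run_mining", "NodeA", "NodeA")
    .add("store_results", "NodeA", "NodeU")
)

for number, step in enumerate(plan.steps, start=1):
    print(number, step["action"], step["source"], "->", step["target"])
```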

Page 26: PPT

Conclusion

• In this article, we propose an implementation of a distributed algorithm for mining fuzzy association rules from distributed databases in a Knowledge Grid environment.

• The proposed algorithm exploits properties of global large fuzzy itemsets and local large fuzzy itemsets, on which the reduction of computations relies heavily.

Page 27: PPT
Page 28: PPT

Knowledge Grid Services-2