
Towards Using Grid Services for Mining Fuzzy Association Rules

Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu

Faculty of Mathematics and Computer Science, University of Craiova, ROMANIA

{mihaiug, mirelc, nikyc}@central.ucv.ro, i [email protected]

Introduction

• In this paper we show how the Knowledge Grid infrastructure can be used to implement a distributed algorithm for mining fuzzy association rules from distributed databases over a Grid network.

[Figure: Grid network + fuzzy mining]

Outline

• Knowledge Grid services

• Distributed fuzzy association rules mining

• Distributed problem definition

• The distributed algorithm

• Rules mining implementation over the Grid

• Conclusion

Knowledge Grid Services

• The Knowledge Grid ([4], [5], [6]) defines an integrating architecture for distributed data mining and knowledge discovery.

• It uses basic grid services to build specific knowledge services, organized in two layers:

• the Core K-grid layer, which offers services implemented directly on top of generic grid services;

• the High level K-grid layer, which is used to describe, develop and execute distributed knowledge discovery computations.


Knowledge directory service (KDS). This service extends the basic Globus MDS service and is responsible for maintaining a description of all the data and tools used in the Knowledge Grid.

It uses metadata stored in a Knowledge Metadata Repository (KMR).

The Knowledge Base Repository (KBR) is used to maintain discovered knowledge.

Another important repository is the Knowledge Execution Plan Repository (KEPR). It stores the execution plans of data mining processes.

Resource allocation and execution management service (RAEMS). This service is used to find the best mapping between an execution plan and the available resources, with the goal of satisfying the application requirements.


Data Access Service (DAS). This service is responsible for the search and selection (data search services) and for the extraction, transformation and delivery (data extraction service) of the data to be mined.

Tools and algorithms access service (TASS). This service is responsible for the search, selection, and downloading of data mining tools and algorithms.

Execution plan management service (EPMS). This service is a semi-automatic tool that takes the data and programs selected by the user and generates a set of different possible plans that meet the user, data and algorithm requirements and constraints.

Results presentation service (RPS). This service specifies how to generate, present and visualize the extracted models.

Distributed fuzzy association rules mining

DB = {t1, . . . , tn}

I = {i1, . . . , im}

Ex: I = {Age, Income, Weight}


For example, for the attribute Weight we can take into consideration the following three fuzzy sets: "thin", "middle" and "fat".

FWeight = { thin, middle, fat }


〈X, FX〉 = 〈{Age, Income}, {young, high}〉


X ={Age, Income}, Y = {Weight}, FX = { middle, high }, FY = { fat }

“ If Age is middle and Income is high then Weight is fat ”

〈X, FX〉 ⇒ 〈Y, FY〉

〈{Age, Income}, {middle, high}〉 ⇒ 〈{Weight}, {fat}〉


T1 = 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {0.5, 1}〉

T2 = 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {1, 1}〉

The fuzzy support value of the itemset 〈X, FX〉 = 〈{Age, Income}, {middle, high}〉 is:

(0.5 * 1 + 1 * 1) / 2 = 1.5 / 2 = 0.75
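The computation above is consistent with the following general form of the fuzzy support, given here as a sketch. It assumes that the membership degrees of a transaction are combined by product and that the sum is divided by the number of transactions, as in the T1/T2 example. The membership notation \mu_{f_j}(t[x_j]) is introduced here for illustration and does not appear on the slides.

```latex
\[
FS_{\langle X, F_X \rangle}
  = \frac{1}{|DB|} \sum_{t \in DB} \; \prod_{x_j \in X} \mu_{f_j}\bigl(t[x_j]\bigr),
\qquad
FS_{\langle \{Age,\, Income\},\, \{middle,\, high\} \rangle}
  = \frac{0.5 \cdot 1 + 1 \cdot 1}{2} = 0.75 .
\]
```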


An association rule is considered interesting if it has enough support and a high confidence value. Such a rule is called a strong rule.


• The problem of sequential mining of fuzzy association rules can be decomposed into two subproblems:

1. find all large fuzzy itemsets.

2. generate the fuzzy association rules from the large fuzzy itemsets found.

Example

Original data:

age     weight
15      40
30      70

Membership degrees:

age             weight
young   old     thin    fat
1       0       0.5     0
0       0.5     0.5     1

Support counts:

〈{Age, Weight}, {young, thin}〉 => 1*0.5 + 0*0.5
〈{Age, Weight}, {young, fat}〉 => 1*0 + 0*1
〈{Age, Weight}, {old, thin}〉 => 0*0.5 + 0.5*0.5
〈{Age, Weight}, {old, fat}〉 => 0*0 + 0.5*1

Support count > minsup => large fuzzy itemsets
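The support counts of this example can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary layout, the function name `support_count` and the threshold `MINSUP_COUNT = 0.3` are assumptions made for illustration, since the slides do not give a concrete minsup value.

```python
from itertools import product

# Membership degrees copied from the example table (one dict per transaction).
transactions = [
    {("Age", "young"): 1.0, ("Age", "old"): 0.0,
     ("Weight", "thin"): 0.5, ("Weight", "fat"): 0.0},
    {("Age", "young"): 0.0, ("Age", "old"): 0.5,
     ("Weight", "thin"): 0.5, ("Weight", "fat"): 1.0},
]


def support_count(itemset, transactions):
    """Sum, over all transactions, of the product of the membership degrees."""
    total = 0.0
    for t in transactions:
        degree = 1.0
        for item in itemset:
            degree *= t[item]
        total += degree
    return total


MINSUP_COUNT = 0.3  # hypothetical threshold, not taken from the slides

# The four candidate 2-itemsets over Age and Weight.
candidates = [(("Age", a), ("Weight", w))
              for a, w in product(["young", "old"], ["thin", "fat"])]
counts = {c: support_count(c, transactions) for c in candidates}
large = [c for c, n in counts.items() if n > MINSUP_COUNT]

for c, n in counts.items():
    print(c, n)
print("large fuzzy itemsets:", large)
```

With the assumed threshold, only the itemsets {young, thin} and {old, fat} are kept as large.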

Distributed problem definition

• Let DB = { DB1,DB2, . . . ,DBn } be a distributed database over n sites S1, S2, . . . , Sn.

[Figure: the partitions DB1, DB2, ..., DBn of the distributed database]





Distributed Mining of Fuzzy Association Rules

Given the set of items I, the distributed database DB = {DB1, DB2, . . . , DBn}, the fuzzy sets associated with the attributes from I, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules.

This problem can again be decomposed into two subproblems:

1. find all global large fuzzy itemsets.

2. generate the global fuzzy association rules from the global large fuzzy itemsets found.
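A minimal Python sketch of how a global support could be obtained from per-site counts is given below. It assumes that the database is partitioned horizontally (each transaction lives on exactly one site), so that the global support count is the sum of the local support counts. The function names and the split of the earlier transactions T1 and T2 over two sites are hypothetical and serve only as an illustration.

```python
def local_support_count(itemset, partition):
    """Support count of an itemset computed on one site's partition only."""
    count = 0.0
    for t in partition:
        degree = 1.0
        for item in itemset:
            degree *= t.get(item, 0.0)
        count += degree
    return count


def global_support(itemset, partitions):
    """Sum the local counts and divide by the total number of transactions."""
    total_count = sum(local_support_count(itemset, p) for p in partitions)
    total_size = sum(len(p) for p in partitions)
    return total_count / total_size


itemset = (("Age", "middle"), ("Income", "high"))

# Hypothetical split: T1 is stored on site S1, T2 on site S2.
DB1 = [{("Age", "middle"): 0.5, ("Income", "high"): 1.0}]
DB2 = [{("Age", "middle"): 1.0, ("Income", "high"): 1.0}]

support = global_support(itemset, [DB1, DB2])
print(support)
```

Under this assumption the distributed result equals the support computed on the whole database, so each site only needs to communicate its counts and its partition size, not its transactions.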

Fuzzy Count Distribution Algorithm

First, the globally large fuzzy 1-itemsets L(1) are generated.

[Figure: local large fuzzy 1-itemsets, global large candidate 1-itemsets CA(1), globally large fuzzy 1-itemsets L(1)]

CA(k) = Fuzzy_Apriori_Gen(L(k−1)).
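The slides only name Fuzzy_Apriori_Gen and do not show its body. The sketch below is therefore an assumption modelled on the classical Apriori candidate generation: two large (k−1)-itemsets are joined when they differ in exactly one item, a candidate is rejected if it uses two fuzzy sets of the same attribute, and it is pruned if any of its (k−1)-subsets is not large. The function name `fuzzy_apriori_gen` and the representation of items as (attribute, fuzzy set) pairs are hypothetical.

```python
from itertools import combinations


def fuzzy_apriori_gen(large_prev):
    """Build candidate k-itemsets from the large (k-1)-itemsets.

    large_prev: iterable of frozensets of (attribute, fuzzy_set) pairs,
    all of the same size k-1.
    """
    large_prev = set(large_prev)
    candidates = set()
    for a, b in combinations(large_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # join step: the two itemsets must differ in one item
        attributes = [attr for attr, _ in union]
        if len(set(attributes)) != len(attributes):
            continue  # assumption: at most one fuzzy set per attribute
        # prune step: every (k-1)-subset must itself be large
        subsets = combinations(union, len(union) - 1)
        if all(frozenset(s) in large_prev for s in subsets):
            candidates.add(union)
    return candidates


L1 = {
    frozenset({("Age", "young")}),
    frozenset({("Age", "old")}),
    frozenset({("Weight", "thin")}),
}
CA2 = fuzzy_apriori_gen(L1)
print(CA2)
```

In this usage example the pair {young, old} is not generated because both fuzzy sets belong to the attribute Age.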

Rules mining implementation over the Grid

Distributed Rules Mining Scenario


In order to present the implementation of this process in a Grid network we shall consider that:

• the database DB is stored on K-grid node NodeA.

• the tools needed for mining association rules (the partitioner P, the frequent itemsets mining tool and the association rules extractor) are available as multiplatform executables on K-grid node NodeS.

• the results will be stored in the Knowledge Base Repository (KBR) on NodeU.


• Let’s suppose that a Grid User (GU) needs to extract all association rules from database DB using tools available on K-grid node NodeS.

• Step 1. The GU starts the search for computational resources for executing the data mining process from his K-grid node NodeU. The KDS (Knowledge Directory Service) is used to locate the computational resources needed to execute the mining process.


• Step 2. The GU builds an execution plan for the data mining task, specifying strategies for tool and data movements. The execution plan is constructed by using the EPMS (Execution Plan Management Service). This plan is stored in the local KEPR (Knowledge Execution Plan Repository).

• Step 3. The GU sends the execution plan to the RAEMS (Resource Allocation and Execution Management Service), which starts the application.

• Step 4. The GU visualizes and evaluates the result of the computation stored in the KBR by means of the RPS (Results Presentation Service) tools.

Conclusion

• In this article we propose an implementation of a distributed algorithm for mining fuzzy association rules from distributed databases in a Knowledge Grid environment.

• The proposed algorithm uses some properties of global large fuzzy itemsets and local large fuzzy itemsets, and relies heavily on them to reduce the amount of computation.

