17
Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王王王 王王王

Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Embed Size (px)

Citation preview

Page 1: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Characteristic Relational Patterns

Arne Koopman & Arno SiebesKDD 2009

Speaker: 王思元黎引得

Page 2: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Outline

2

Introduction Method Experiment Results Conclusion

Page 3: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Introduction

3

Relational database models Local models: frequent pattern mining Global models: probabilistic relational model

Charactering the database Combine patterns to form a global model

Page 4: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Relational Databases

4

KDD cup relational databases Genes Hepatitis Financial

ORDERorderID

loanID

accountID

Bank-ToAmountK-Symbol

LOANloanID

accountID

DateAmountDurationPaymentStatus

ACCOUNTloanID

accountID

FrequencyDate

CLIENTloanID

clientID

BirthyearSex

CARDcardID

loanID

dispositionID

TypeIssueDate

DISPOSITION

loanID

accountID

dispositionID

clientID

Type

1513 records

682 records

682 records 36 records

827 records

827 records

Page 5: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Local Model

5

Frequent pattern mining: too many patterns

ACCOUNTaccountID Date

10 200911 2009

LOANloanID accountID Duration100 10 12101 11 12102 11 24

ACCOUNTDate=2009

Patterns

ACCOUNTDate=2009

ACCOUNTDate=2009

LOANDuration=12

LOANDuration=12

LOANDuration=24

Page 6: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Global Model

6

Probabilistic Relational Models: local co-occurrence information lost

ACCOUNTaccountID Date

10 200911 2009

LOANloanID accountID Duration100 10 12101 11 12102 11 24

Probabilistic Relational Models

ACCOUNTDate=2009

LOANDuration=12

LOANDuration=24

0.67

0.33

Page 7: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Combined Model

7

Relational Code Table: compact and lossless description of the complete database

ACCOUNTaccountID Date

10 200911 2009

LOANloanID accountID Duration100 10 12101 11 12102 11 24

Code Table

ACCOUNTDate=2009

ACCOUNTDate=2009

LOANDuration=12

LOANDuration=12

LOANDuration=24

Page 8: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Outline

8

Introduction Method Experiments Conclusion

Page 9: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

RDB-KRIMP RDB-KRIMP selects

patterns that describe the database well

Candidates are frequent relational patterns

Describing patterns are placed in a code table shortness code length

proportional to usagecode table

emptycodetable

database

database

select pattern

codetable

patterns

add to table

MDL

accept / reject

compress database

9

Page 10: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Outline

10

Introduction Method Experiments Conclusion

Page 11: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Experiments

11

Page 12: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Result for Different Minimum Support Value

12

100%

Page 13: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

13

Page 14: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Comparison to Other Pattern Types

14

Page 15: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

15

Page 16: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Outline

16

Introduction Method Experiments Conclusion

Page 17: Characteristic Relational Patterns Arne Koopman & Arno Siebes KDD 2009 Speaker: 王思元 黎引得

Conclusion Code tables describe the database while

preserving local information Code tables stay compact

Stay compact for low minimum support values Reductions up to 4 orders of magnitude

Richer patterns lead to better models Smaller encodings Better descriptions for database

17