FLAIRS 2001 1 Graph-Based Concept Learning Jesus Gonzalez, Lawrence Holder and Diane Cook Department of Computer Science and Engineering The University of Texas at Arlington

FLAIRS 2001 1

Graph-Based Concept Learning

Jesus Gonzalez, Lawrence Holder and Diane Cook

Department of Computer Science and Engineering
The University of Texas at Arlington

FLAIRS 2001 2

Outline
  • Relational concept learning
  • Graph-based concept learning
    • Conceptual graphs and Galois lattice
    • Graph-based discovery in Subdue
    • SubdueCL
  • Empirical results
  • Conclusions

FLAIRS 2001 3

Relational Concept Learning
  • Inductive Logic Programming (ILP)
    • FOIL
    • Progol
  • First-order logic vs. graphs
    • Expressiveness
    • Interpretability
  • Conceptual graphs

FLAIRS 2001 4

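Conceptual Graphs
  • Logic-based knowledge representation

[Figure: conceptual graph with node labels Object, On, Object, Shape, Triangle, Shape, Square, shown with the equivalent logic expression]

shape(X,triangle), shape(Y,square), on(X,Y)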
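
The graph-to-logic correspondence on this slide can be illustrated with a minimal sketch. The triple layout `(source, relation, target)` and the helper name `to_literals` are assumptions made for illustration; they are not taken from the slides or from any conceptual-graph library.

```python
# Hypothetical encoding of the slide's example: one labeled edge per triple.
# The variable names X and Y follow the logic expression on the slide.
edges = [
    ("X", "shape", "triangle"),
    ("Y", "shape", "square"),
    ("X", "on", "Y"),
]


def to_literals(edges):
    """Render each labeled edge as a relation(source,target) literal."""
    return [f"{rel}({src},{dst})" for src, rel, dst in edges]


literals = to_literals(edges)
print(", ".join(literals))
```

Each edge of the graph becomes one binary literal, which is why the two representations carry the same information for this example.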

FLAIRS 2001 5

Conceptual Graphs
  • Graph ↔ Logic
  • PAC-learning CGs [Jappy & Nock]
    • Size of CG class and generalization (projection) operator polynomial in
      • Number of relations
      • Number of concepts
      • Number of labels

FLAIRS 2001 6

Galois Lattice
  • Each node consists of a description graph and set of subsumed examples
  • Begins with positive examples
  • Generalization operator
    • Most specific generalization
    • Union of example sets
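
A rough sketch of the lattice construction described above, under strong simplifying assumptions: each description graph is a set of `(source, relation, target)` edges over fixed vertex names, and set intersection stands in for the most specific generalization (a real conceptual-graph learner would use projection or graph matching instead). The names `stack` and `build_lattice` are hypothetical, and the result is not meant to reproduce the lattice drawn in the slides.

```python
from itertools import combinations


def stack(top, mid, bot):
    """Three shapes stacked with two 'on' edges (vertex names are assumed)."""
    return frozenset({
        ("top", "shape", top), ("mid", "shape", mid), ("bot", "shape", bot),
        ("top", "on", "mid"), ("mid", "on", "bot"),
    })


examples = {
    1: stack("triangle", "square", "rectangle"),
    2: stack("triangle", "square", "circle"),
    3: stack("triangle", "circle", "rectangle"),
    4: stack("circle", "rectangle", "triangle"),
}


def subsumed(desc, examples):
    """Ids of the examples that contain every edge of the description."""
    return frozenset(i for i, e in examples.items() if desc <= e)


def build_lattice(examples):
    """Start from the examples and close under pairwise generalization."""
    nodes = {desc: subsumed(desc, examples) for desc in examples.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(nodes), 2):
            general = a & b  # stand-in for the most specific generalization
            if general not in nodes:
                # This set always contains the union of both parents' examples.
                nodes[general] = subsumed(general, examples)
                changed = True
    return nodes


lattice = build_lattice(examples)
for desc, ids in sorted(lattice.items(), key=lambda kv: (len(kv[1]), sorted(kv[1]))):
    print(sorted(ids), len(desc), "edges")
```

Each node pairs a description with the examples it subsumes, so more general descriptions (fewer edges) sit above larger example sets.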

FLAIRS 2001 7

Galois Lattice

[Figure: Galois lattice built over four example graphs, each a chain of three shapes linked by "on" edges: (1) triangle on square on rectangle, (2) triangle on square on circle, (3) triangle on circle on rectangle, (4) circle on rectangle on triangle. Lattice nodes are labeled with example sets [1], [2], [3], [4]; [1, 2], [3, 4]; [1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]; [1, 2, 3, 4]. The node [1, 2] is highlighted together with its generalized description graph.]

FLAIRS 2001 8

Galois Lattice
  • Galois lattice creation O(n^3 p)
    • n examples
    • p nodes in lattice
  • Tractable for poly-time generalization
  • GRAAL system

FLAIRS 2001 9

Graph-Based Discovery
  • Finding “interesting” and repetitive substructures (connected subgraphs) in data represented as a graph

[Figure: three panels. Input Database: nodes R1, C1, T1 to T4 and S1 to S4. Substructure S1 (graph form): object nodes with shape triangle and shape square, joined by an on edge. Compressed Database: R1, C1 and four instances of S1.]

FLAIRS 2001 10

Graph-Based Discovery
  • “Interesting” defined according to the Minimum Description Length principle
    • min_S [DL(S) + DL(G|S)]
  • General-to-specific beam search through substructure space
  • Poly-time inexact graph match
  • Subdue system
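
To make the objective DL(S) + DL(G|S) concrete, here is a minimal sketch that replaces the bit-level description length with a crude size count (vertices plus edges). That substitution, the function names, and the graph sizes in the example are all assumptions for illustration; they are not Subdue's actual encoding or data.

```python
def size(num_vertices, num_edges):
    """Crude stand-in for description length: one unit per vertex and edge."""
    return num_vertices + num_edges


def compressed_size(g_vertices, g_edges, s_vertices, s_edges, instances):
    """Size of G after each non-overlapping instance of S becomes one vertex.

    Assumes edges between an instance and the rest of G are kept as they are.
    """
    vertices = g_vertices - instances * s_vertices + instances
    edges = g_edges - instances * s_edges
    return size(vertices, edges)


def total_dl(g_vertices, g_edges, s_vertices, s_edges, instances):
    """DL(S) + DL(G|S) under the size-based stand-in."""
    return size(s_vertices, s_edges) + compressed_size(
        g_vertices, g_edges, s_vertices, s_edges, instances)


# Hypothetical input graph: 20 vertices and 19 edges.
G_V, G_E = 20, 19

# Candidate substructures as (vertices, edges, instances found in G).
candidates = {
    "triangle_on_square": (4, 3, 4),
    "triangle_only": (2, 1, 4),
}

scores = {name: total_dl(G_V, G_E, *c) for name, c in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Under these made-up sizes the larger substructure wins because replacing each of its instances removes more of the graph than its own description costs.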

FLAIRS 2001 11

Subdue System
  • Graph-based…
    • Discovery
    • Concept learning
    • Hierarchical conceptual clustering
  • Background knowledge
  • Parallel/distributed capability
  • http://cygnus.uta.edu/subdue

FLAIRS 2001 12

Graph-Based Concept Learning

[Figure: example graph with three object nodes, two on edges, and shape edges to triangle and square]

FLAIRS 2001 13

Graph-Based Concept Learning
  • Extension to graph-based discovery
  • Input now a set of positive graphs and a set of negative graphs
  • Set-covering approach
    • Iterate until all positive graphs and no negative graphs covered
  • Result is a substructure DNF

FLAIRS 2001 14

Graph-Based Concept Learning
  • Solution 1
    • Find substructure compressing positive graphs, but not negative graphs
    • Compress graphs and iterate until no further compression
  • Problem
    • Compressing, instead of removing, partially-covered positive graphs leads to overly-specific hypotheses

FLAIRS 2001 15

Graph-Based Concept Learning
  • Solution 2
    • Find substructure covering positive graphs, but not negative graphs
    • Remove covered positive graphs and iterate until all covered
    • Substructure value = 1 - Error
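
The set-covering loop of Solution 2 can be sketched as follows. Several things here are assumptions rather than content of the slides: Error is taken to be (uncovered positives + covered negatives) divided by the total number of graphs, a subset test over edge triples stands in for subgraph matching, candidates are enumerated exhaustively instead of by beam search, and all function names and example graphs are hypothetical.

```python
from itertools import combinations


def covers(sub, graph):
    """Subset test standing in for a subgraph match (an assumption)."""
    return sub <= graph


def value(sub, pos, neg):
    """1 - Error, with Error = (missed positives + covered negatives) / all."""
    pos_missed = sum(not covers(sub, g) for g in pos)
    neg_hit = sum(covers(sub, g) for g in neg)
    return 1 - (pos_missed + neg_hit) / (len(pos) + len(neg))


def candidates(pos, max_edges=2):
    """Enumerate small edge sets drawn from the positive graphs."""
    seen = set()
    for g in pos:
        for k in range(1, max_edges + 1):
            for combo in combinations(sorted(g), k):
                seen.add(frozenset(combo))
    return sorted(seen, key=sorted)  # deterministic order


def learn_dnf(pos, neg):
    """Pick the best-valued substructure, drop covered positives, repeat."""
    remaining, rules = list(pos), []
    while remaining:
        best = max(candidates(remaining), key=lambda s: value(s, remaining, neg))
        rules.append(best)
        # Every candidate comes from a remaining graph, so this list shrinks.
        remaining = [g for g in remaining if not covers(best, g)]
    return rules


def pair(a_shape, b_shape):
    """Hypothetical example graph: object a on object b, each with a shape."""
    return frozenset({("a", "on", "b"),
                      ("a", "shape", a_shape), ("b", "shape", b_shape)})


pos = [pair("triangle", "square"), pair("triangle", "circle"),
       pair("circle", "rectangle")]
neg = [pair("square", "triangle"), pair("square", "circle")]

rules = learn_dnf(pos, neg)
for rule in rules:
    print(sorted(rule))
```

The returned list reads as a disjunction of substructures: a graph is classified positive if any one of them occurs in it. Note that this sketch only penalizes negative coverage through the value; it does not forbid it outright.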

FLAIRS 2001 16

Empirical Results
  • Comparison with ILP systems
  • Non-relational domains from UCI repository

            Golf     Vote     Diabetes   Credit   TicTacToe
  FOIL      66.67    93.02    70.66      66.16    100.00
  Progol    33.33    76.98    51.97      44.55    100.00
  SubdueCL  66.67    94.88    64.21      71.52    100.00

FLAIRS 2001 17

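Empirical Results
  • Comparison with ILP systems
  • Relational domains: Chess endgame

[Figure: chess endgame example. A board with rows and columns labeled 0 to 2 holding WK, WR and BK, and its graph representation with nodes WKC, WKR, WRR, WRC, BKC, BKR and edges labeled pos, adj, eq and lt.]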

FLAIRS 2001 18

Empirical Results: Chess
  • FOIL: 11 rules, 99.34%
  • Progol: 5 rules, 99.74%
  • SubdueCL: 7 rules, 99.74%

[Figure: two learned substructures over nodes WKC, WKR, BKC and BKR, with edges labeled pos, adj and lt]

FLAIRS 2001 19

Empirical Results
  • Relational domain: Cancer
  • SubdueCL: 62%
  • Progol: 64% [72%]

[Figure: three learned substructures. The first is a compound that contains two atoms (each with element c, type 22, charge -13), with further labels 7, six_ring, in_group, halide10, ashby_alert, p, 6, positive, ames, di227 and cytogen_ca. The second has labels compound, has_group, amine, p and chromaberr. The third has labels compound, p and drosophila_slrl.]

FLAIRS 2001 20

Empirical Results
  • Relational domain: Web
    • Professor (+) vs. student (-) websites
    • Hyperlink structure and page content

[Figure: learned substructure with labels page, word and box]

FLAIRS 2001 21

Empirical Results
  • Relational domain: Web
    • Computer store (+) vs. professor (-) websites
    • Hyperlink structure only

[Figure: learned substructure of seven page nodes (page 1, page 8, page 17, page 18, page 19, page 21, page 22) connected by eight link edges]

FLAIRS 2001 22

Conclusions
  • Theoretical analysis of graph-based concept learning
    • PAC-learning conceptual graphs
    • Galois lattice
    • Next step: Relax graph constraints
  • Empirical analysis
    • Competitive with other relational concept learners (ILP)
    • Next step: More relational domains
  • SubdueCL (http://cygnus.uta.edu/subdue)