Characterizing and mining numerical patterns: an FCA point of view
Declarative Approaches for Enumerating Interesting Patterns (http://liris.cnrs.fr/dag/)
Mehdi Kaytoue, postdoctoral researcher, DAG, June 2012
[email protected]
http://liris.cnrs.fr/mehdi.kaytoue


Talk given in June 2012 for the DAG project (Declarative Approaches for Enumerating Interesting Patterns), liris.cnrs.fr/dag


Page 1: Characterizing and mining numerical patterns, an FCA point of view

Characterizing and mining numerical patterns

An FCA point of view

Declarative Approaches for Enumerating Interesting Patterns

http://liris.cnrs.fr/dag/

Mehdi Kaytoue, postdoctoral researcher, DAG

June 2012

[email protected]

http://liris.cnrs.fr/mehdi.kaytoue

Page 2: Characterizing and mining numerical patterns, an FCA point of view

Outline

1 Introducing Formal Concept Analysis

2 Main research question

3 Elements of answer
   Interval pattern structures
   Towards condensed representations
   Introducing similarity
   Extracting biclusters of similar values
   Triadic Concept Analysis for biclustering

4 Conclusion and perspectives


Page 3: Characterizing and mining numerical patterns, an FCA point of view

Introducing Formal Concept Analysis

A binary table as a formal context

Given by (G, M, I) with

G a set of objects

M a set of attributes

I a binary relation between objects and attributes: (g, m) ∈ I means that “object g owns attribute m”

     m1  m2  m3
g1   ×   .   ×
g2   ×   ×   .
g3   .   ×   ×
g4   .   ×   ×
g5   ×   ×   ×

G = {g1, ..., g5}
M = {m1, m2, m3}

(g1, m3) ∈ I

B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.


Page 4: Characterizing and mining numerical patterns, an FCA point of view

Introducing Formal Concept Analysis

A maximal rectangle as a formal concept

A Galois connection to characterize formal concepts

A′ = {m ∈ M | ∀g ∈ A ⊆ G : (g ,m) ∈ I}

B ′ = {g ∈ G | ∀m ∈ B ⊆ M : (g ,m) ∈ I}

(A,B) is a concept with extent A = B ′ and intent B = A′

{g3}′ = {m2,m3}

{m2,m3}′ = {g3, g4, g5}

     m1  m2  m3
g1   ×   .   ×
g2   ×   ×   .
g3   .   ×   ×
g4   .   ×   ×
g5   ×   ×   ×

({g3, g4, g5}, {m2,m3}) is a formal concept
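The two derivation operators can be sketched in a few lines of Python. This is only an illustration: the names `CONTEXT`, `intent`, `extent` and `is_concept` are hypothetical, and the cross table is the one implied by the statements on the slides ({g3}′ = {m2, m3}, {m2, m3}′ = {g3, g4, g5}, (g1, m3) ∈ I).

```python
# Formal context assumed from the slides: object -> attributes it owns.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object of A (all of M if A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : objects owning every attribute of B."""
    return {g for g, owned in CONTEXT.items() if set(attributes) <= owned}


def is_concept(objects, attributes):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)


print(sorted(intent({"g3"})))        # ['m2', 'm3']
print(sorted(extent({"m2", "m3"})))  # ['g3', 'g4', 'g5']
```

Applying `extent` after `intent` is a closure: it returns the largest set of objects sharing the same attributes, which is why ({g3, g4, g5}, {m2, m3}) is a maximal rectangle.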


Page 5: Characterizing and mining numerical patterns, an FCA point of view

Introducing Formal Concept Analysis

Concept lattice

Ordered set of concepts...

(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)

({g1, g5}, {m1,m3}) ≤ ({g1, g2, g5}, {m1})

... with interesting properties

Maximality of concepts as rectangles

Overlapping of concepts

Specialization/generalisation hierarchy

Synthetic representation of the data without loss of information
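As a rough illustration of the ordered set of concepts, the sketch below enumerates every concept of the small context by closing each subset of attributes, then compares concepts by extent inclusion. It is a brute-force approach that only suits tiny examples; the context values are those implied by the slides, and all function names are hypothetical.

```python
from itertools import combinations

# Context assumed from the slides: object -> attributes it owns.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = ["m1", "m2", "m3"]


def extent(attributes):
    """Objects owning every attribute of the given set."""
    return frozenset(g for g, owned in CONTEXT.items() if set(attributes) <= owned)


def intent(objects):
    """Attributes shared by every object of the given set."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def all_concepts():
    """Close every attribute subset B into the concept (B', B'')."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, size):
            objects = extent(attrs)
            concepts.add((objects, intent(objects)))
    return concepts


def leq(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


CONCEPTS = all_concepts()
print(len(CONCEPTS))
```

The ordering on extents is mirrored, in reverse, on intents, which is the equivalence stated on the slide.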


Page 7: Characterizing and mining numerical patterns, an FCA point of view

Main research question

General objectives

Mining numerical data with formal concept analysis

Turning data into binary (?)

Bringing the problem into well-known settings

Allowing a mathematically well-defined approach for a correct, exact and non-redundant extraction of numerical patterns

Exploiting existing algorithms and “tools”

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   5   6
g3   2   2   1   7   6
g4   8   9   2   6   7

m1 m2 m3 m4 m5
g1 ×
g2 ×
g3 × ×
g4 × × × ×

Can we work with FCA directly on numerical data?


Page 10: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

First elements...


Page 11: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

How to handle complex descriptions

An intersection as a similarity operator

∩ behaves as a similarity operator

{m1, m2} ∩ {m1, m3} = {m1}

∩ induces an ordering relation ⊆

N ∩ O = N ⇔ N ⊆ O
{m1} ∩ {m1, m2} = {m1} ⇔ {m1} ⊆ {m1, m2}

∩ has the properties of a meet ⊓ in a semi-lattice: a commutative, associative and idempotent operation

c ⊓ d = c ⇔ c ⊑ d

A. Tversky. Features of similarity. Psychological Review, 84(4), 1977.


Page 12: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

Pattern structure

Given by (G, (D, ⊓), δ)

G a set of objects

(D, ⊓) a semi-lattice of descriptions or patterns

δ a mapping such that δ(g) ∈ D describes object g

A Galois connection

A□ = ⨅_{g ∈ A} δ(g) for A ⊆ G

d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)

B. Ganter and S. O. Kuznetsov. Pattern Structures and their Projections. In International Conference on Conceptual Structures, 2001.


Page 13: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

Ordering descriptions in numerical data

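(D, ⊓) as a meet-semi-lattice with ⊓ as a “convexification”

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

[Diagram: meet-semi-lattice over the values 4, 5, 6, with 4 ⊓ 5 = [4, 5], 5 ⊓ 6 = [5, 6] and [4, 5] ⊓ [5, 6] = [4, 6].]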


Page 14: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

Numerical data are pattern structures

Interval pattern structures

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g1, g2}□ = ⨅_{g ∈ {g1, g2}} δ(g)
= 〈5, 7, 6〉 ⊓ 〈6, 8, 4〉
= 〈[5, 6], [7, 8], [4, 6]〉

〈[5, 6], [7, 8], [4, 6]〉□ = {g ∈ G | 〈[5, 6], [7, 8], [4, 6]〉 ⊑ δ(g)}
= {g1, g2, g5}

({g1, g2, g5}, 〈[5, 6], [7, 8], [4, 6]〉) is a (pattern) concept
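The computation above can be reproduced with a small sketch of an interval pattern structure. Intervals are encoded as `(low, high)` pairs, which is an illustrative choice, and the names `delta`, `meet`, `description` and `image` are hypothetical stand-ins for δ, ⊓ and the two □ operators.

```python
from functools import reduce

# Numerical table of the slides: object -> values of (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g):
    """Description of an object: each value w becomes the interval [w, w]."""
    return tuple((v, v) for v in DATA[g])


def meet(c, d):
    """Component-wise convex hull of two interval vectors."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d))


def subsumed(c, d):
    """c ⊑ d iff c ⊓ d = c, i.e. each interval of d lies inside that of c."""
    return meet(c, d) == c


def description(objects):
    """A□ for a non-empty set of objects: meet of their descriptions."""
    return reduce(meet, (delta(g) for g in objects))


def image(pattern):
    """d□ : objects whose description is subsumed by the pattern."""
    return {g for g in DATA if subsumed(pattern, delta(g))}


pattern = description({"g1", "g2"})
print(pattern, sorted(image(pattern)))
```

Note the direction of the order: a pattern with larger intervals is lower in the semi-lattice and covers more objects.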


Page 15: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

Interval pattern concept lattice

Existing algorithms

Lowest concepts: few objects, small intervals

Highest concepts: many objects, large intervals

Overwhelming number of concepts/patterns


Page 16: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Interval pattern structures

Links with conceptual scaling

Interordinal scaling [Ganter & Wille]

A scale to encode intervals of attribute values

     m1 ≤ 4   m1 ≤ 5   m1 ≤ 6   m1 ≥ 4   m1 ≥ 5   m1 ≥ 6
4    ×        ×        ×        ×        .        .
5    .        ×        ×        ×        ×        .
6    .        .        ×        ×        ×        ×

Equivalent concept lattice
({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ... , ... })
({g1, g2, g5}, 〈[5, 6], ... , ... 〉)

Why use pattern structures when we already have scaling?

Processing a pattern structure is more efficient
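To make the link with scaling concrete, here is a minimal sketch of interordinal scaling restricted to attribute m1. The attribute labels such as `"m1 <= 5"` and the helper names are assumptions made for the example; the point is only that the binary intent of {g1, g2, g5} encodes the interval [5, 6].

```python
# Values of attribute m1 in the running example.
M1 = {"g1": 5, "g2": 6, "g3": 4, "g4": 4, "g5": 5}
VALUES = sorted(set(M1.values()))  # [4, 5, 6]


def scale(value):
    """Binary attributes of the interordinal scale owned by a value."""
    lower = {f"m1 <= {w}" for w in VALUES if value <= w}
    upper = {f"m1 >= {w}" for w in VALUES if value >= w}
    return lower | upper


# Scaled (binary) context restricted to m1.
SCALED = {g: scale(v) for g, v in M1.items()}


def intent(objects):
    """Binary attributes shared by a non-empty set of objects."""
    return set.intersection(*(SCALED[g] for g in objects))


def as_interval(attributes):
    """Read a shared attribute set back as the interval it encodes."""
    low = max(int(a.split(">= ")[1]) for a in attributes if ">=" in a)
    high = min(int(a.split("<= ")[1]) for a in attributes if "<=" in a)
    return (low, high)


shared = intent({"g1", "g2", "g5"})
print(sorted(shared), as_interval(shared))
```

Each numerical attribute with n distinct values produces 2n binary attributes, which gives an intuition for why working directly on intervals can be cheaper.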

M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, special issue on Lattices (Elsevier), 181(10): 1989-2001, 2011.


Page 18: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Towards condensed representations

Interval pattern search space

Counting all possible interval patterns

〈[a_m1, b_m1], [a_m2, b_m2], ...〉 where a_mi, b_mi ∈ W_mi

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

∏_{i ∈ {1, ..., |M|}} ( |W_mi| × (|W_mi| + 1) ) / 2

360 possible interval patterns in our small example
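The count can be checked directly: an attribute with n distinct values admits n(n + 1)/2 intervals with bounds in its value set, and the counts multiply across attributes. The function name below is hypothetical.

```python
from math import prod

# Numerical table of the slides: object -> values of (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def count_interval_patterns(data):
    """Product over attributes of |W| * (|W| + 1) / 2, W being the value set."""
    columns = zip(*data.values())
    sizes = [len(set(column)) for column in columns]
    return prod(n * (n + 1) // 2 for n in sizes)


print(count_interval_patterns(DATA))  # 6 * 6 * 10 = 360
```

Here m1 and m2 each take 3 distinct values (6 intervals each) and m3 takes 4 (10 intervals).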

M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Revisiting Numerical Pattern Mining with Formal Concept Analysis. In International Joint Conference on Artificial Intelligence (IJCAI), 2011.


Page 19: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Towards condensed representations

Semantics for interval patterns

Interval patterns as (hyper) rectangles

     m1  m3
g1   5   6
g2   6   4
g3   4   5
g4   4   8
g5   5   5

〈[4, 5], [5, 6]〉□ = {g1, g3, g5}
〈[4, 5], [4, 6]〉□ = {g1, g3, g5}
〈[4, 6], [5, 6]〉□ = {g1, g3, g5}

[Figure: the points δ(g1), ..., δ(g5) plotted in the (m1, m3) plane; each interval pattern is a rectangle.]

Page 23: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Towards condensed representations

A condensed representation

Equivalence classes of interval patterns

Two interval patterns with the same image are said to be equivalent

c ≅ d ⇔ c□ = d□

Equivalence class of a pattern d

[d] = {c | c ≅ d}

with a unique closed pattern: the smallest rectangle

and one or several generators: the largest rectangles

Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, 2000.

In our example: 360 patterns ; 18 closed ; 44 generators
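The number of closed patterns can be recovered by brute force on the small example: the closed pattern of a set of objects is its bounding box, and two sets with the same box contents fall in the same equivalence class. The sketch below is an exhaustive check for illustration only, not the mining algorithm of the cited work; it assumes that only non-empty extents are counted.

```python
from itertools import combinations

# Numerical table of the slides: object -> values of (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def closure(objects):
    """Smallest rectangle around the objects and every object it contains."""
    rows = [DATA[g] for g in objects]
    box = tuple((min(col), max(col)) for col in zip(*rows))
    extent = frozenset(
        g for g, row in DATA.items()
        if all(low <= v <= high for v, (low, high) in zip(row, box))
    )
    return extent, box


def closed_patterns():
    """Map each closed extent to its closed pattern (non-empty extents only)."""
    closed = {}
    names = sorted(DATA)
    for size in range(1, len(names) + 1):
        for objects in combinations(names, size):
            extent, box = closure(objects)
            closed[extent] = box
    return closed


CLOSED = closed_patterns()
print(len(CLOSED))
```

On this table the sketch finds 18 closed patterns, which agrees with the figure quoted on the slide.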


Page 24: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Towards condensed representations

A condensed representation

Remarks

[Diagram: meet-semi-lattice over the values 4, 5, 6 and the intervals [4, 5], [5, 6], [4, 6].]

Compression rate varies between 10^7 and 10^9

Interordinal scaling: encodes ≈ 30,000 binary patterns

not efficient even with the best algorithms (e.g. LCMv2)

a redundancy problem rules out its use for generator extraction

MDL, quantitative association rule mining, k-anonymisation

Need of fault-tolerant condensed representations


Page 26: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Introducing similarity

Introducing a similarity relation

How to group objects having similar values in the same concept?

A natural similarity relation on numbers

a ≃_θ b ⇔ |a − b| ≤ θ, e.g. 4 ≃_1 5 and 4 ≄_1 6

Similarity operator ⊓ in pattern structures

[Diagram, shown successively for θ = 2, θ = 1 and θ = 0: meet-semi-lattice over the values 4, 5, 6 and the intervals [4, 5], [5, 6], [4, 6].]

How to consider a similarity relation w.r.t. a distance?

Page 30: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Introducing similarity

Towards a similarity between values

Introduce an element ∗ ∈ (D, ⊓) denoting dissimilarity

c ⊓ d = ∗ iff c ≄_θ d
c ⊓ d ≠ ∗ iff c ≃_θ d

Example with θ = 1

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g3, g4}□ = 〈[4, 4], [8, 9], ∗〉
〈[4, 4], [8, 9], ∗〉□ = {g3, g4}

({g3, g4}, 〈[4, 4], [8, 9], ∗〉) is a pattern concept: g3 and g4 have similar values for attributes m1 and m2 only

Is {g3, g4} maximal w.r.t. similarity? We can add g5...

Page 32: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Introducing similarity

Classes of tolerance in numerical data

Towards maximal sets of similar values

≃_θ is a tolerance relation: reflexive, symmetric, not necessarily transitive

Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5

8 ≃_5 11, 11 ≃_5 16 but 8 ≄_5 16

A class of tolerance as a maximal set of pairwise similar values

{6, 8, 11}   {11, 16}   {16, 17}
[6, 11]      [11, 16]   [16, 17]

S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis, Foundations and Applications, 2005.
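For numbers, the classes of tolerance are easy to compute: a set of values is pairwise similar exactly when its maximum and minimum differ by at most θ, so the classes are the maximal windows of width θ over the sorted values. The following sketch relies on that observation; the function names are hypothetical.

```python
def similar(a, b, theta):
    """Tolerance relation on numbers: a and b differ by at most theta."""
    return abs(a - b) <= theta


def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values, in increasing order."""
    ordered = sorted(set(values))
    classes = []
    for start in ordered:
        # All values within theta of `start`, looking upwards only.
        block = [w for w in ordered if start <= w <= start + theta]
        # Keep the window only if no earlier window already contains it.
        if not any(set(block) <= set(other) for other in classes):
            classes.append(block)
    return classes


print(tolerance_classes([6, 8, 11, 16, 17], 5))
# [[6, 8, 11], [11, 16], [16, 17]]
```

The value 11 belongs to two classes, which is where the lack of transitivity shows: 8 and 16 are both similar to 11 but not to each other.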


Page 33: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Introducing similarity

Tolerance in pattern structures

Projecting the pattern structure

Each value is replaced by the interval characterizing its class of tolerance (if unique)

Each pattern d is projected with a mapping ψ(d) ⊑ d (pre-processing)

Example with θ = 1

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g3, g4}□ = ψ(〈[4, 4], [8, 9], ∗〉) = 〈[4, 5], [8, 9], ∗〉

〈[4, 5], [8, 9], ∗〉□ = {g3, g4, g5}


Page 34: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Introducing similarity

Similarity and scaling

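     m1   m2   m3
g1   6    0    [1, 2]
g2   8    4    [2, 5]
g3   11   8    [4, 5]
g4   16   8    [6, 9]
g5   17   12   [7, 10]

Tolerance relation ≃_5 on the values of m1:

≃_5   6   8   11  16  17
6     ×   ×   ×   .   .
8     ×   ×   ×   .   .
11    ×   ×   ×   ×   .
16    .   .   ×   ×   ×
17    .   .   .   ×   ×

Attributes of the scaled context: (m1, 11), (m1, 16), (m1, [6, 11]), (m1, [11, 16]), (m1, [16, 17]), (m2, 4), (m2, 8), (m2, [0, 4]), (m2, [4, 8]), (m2, [8, 12]), (m3, [1, 5]), (m3, [4, 9]), (m3, [6, 10]), (m3, [4, 5]), (m3, [6, 9])

g1: (m1, [6, 11]), (m2, [0, 4]), (m3, [1, 5])
g2: (m1, [6, 11]), (m2, 4), (m2, [0, 4]), (m2, [4, 8]), (m3, [1, 5])
g3: (m1, 11), (m1, [6, 11]), (m1, [11, 16]), (m2, 8), (m2, [4, 8]), (m2, [8, 12]), (m3, [1, 5]), (m3, [4, 9]), (m3, [4, 5])
g4: (m1, 16), (m1, [11, 16]), (m1, [16, 17]), (m2, 8), (m2, [4, 8]), (m2, [8, 12]), (m3, [4, 9]), (m3, [6, 10]), (m3, [6, 9])
g5: (m1, [16, 17]), (m2, [8, 12]), (m3, [6, 10])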


Page 36: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Extracting biclusters of similar values

Another type of bicluster

Going back to the similarity relation

w1 ≃_θ w2 ⇔ |w1 − w2| ≤ θ with θ ∈ R, w1, w2 ∈ W

Bicluster of similar values

A bicluster (A, B) is a bicluster of similar values if

mi(gj) ≃_θ mk(gl), ∀gj, gl ∈ A, ∀mi, mk ∈ B

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

θ = 1

and maximal if no object/attribute can be added
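The definition translates into a short check: all values of the rectangle A × B are pairwise similar exactly when their maximum and minimum differ by at most θ, and the bicluster is maximal when no single object or attribute can be added without breaking that property. The helper names below are hypothetical.

```python
# Numerical table of the slide: object -> {attribute: value}.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
ATTRIBUTES = {"m1", "m2", "m3", "m4", "m5"}


def is_similar_bicluster(objects, attributes, theta):
    """True iff all values in the rectangle differ pairwise by at most theta."""
    values = [DATA[g][m] for g in objects for m in attributes]
    return max(values) - min(values) <= theta


def is_maximal(objects, attributes, theta):
    """True iff the bicluster is valid and no object/attribute can be added."""
    objects, attributes = set(objects), set(attributes)
    if not is_similar_bicluster(objects, attributes, theta):
        return False
    for g in set(DATA) - objects:
        if is_similar_bicluster(objects | {g}, attributes, theta):
            return False
    for m in ATTRIBUTES - attributes:
        if is_similar_bicluster(objects, attributes | {m}, theta):
            return False
    return True


print(is_maximal({"g1", "g2", "g3"}, {"m1", "m2", "m3"}, 1))  # True
print(is_maximal({"g2", "g3"}, {"m1", "m2", "m3"}, 1))        # False: g1 can be added
```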

J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut. Mining Bi-sets in Numerical Data. In KDID 2006: 11-23.


Page 37: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Extracting biclusters of similar values

Can we use the interval pattern lattice?

Concept example ({g2, g3}, 〈[2, 2], [1, 2], [1, 1], [0, 7], [6, 6]〉)

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

θ = 1

3 statements to verify

Some intervals have a “size” larger than θ

Some values in two different columns may not be similar

Rectangle may not be maximal

M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. In International Conference on Formal Concept Analysis (ICFCA), 2011.


Page 38: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Extracting biclusters of similar values

First statement

Avoiding intervals with size larger than θ

[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)] if |max(b1, b2) − min(a1, a2)| ≤ θ, and ∗ otherwise

Going back to our example, with θ = 1

({g2, g3}, 〈[2, 2], [1, 2], [1, 1], ∗, [6, 6]〉)

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7
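The θ-bounded meet can be sketched as follows. Intervals are `(low, high)` pairs and `STAR` stands for ∗; treating ∗ as absorbing (∗ ⊓ d = ∗) is an assumption of this sketch, since the slide only defines the meet of two intervals.

```python
STAR = "*"  # stands for the dissimilarity element


def meet(c, d, theta):
    """Convex hull of two intervals, or STAR if the hull is wider than theta."""
    if c == STAR or d == STAR:
        return STAR  # assumption: STAR is absorbing
    low, high = min(c[0], d[0]), max(c[1], d[1])
    return (low, high) if high - low <= theta else STAR


def pattern_meet(p, q, theta):
    """Component-wise bounded meet of two pattern vectors."""
    return tuple(meet(c, d, theta) for c, d in zip(p, q))


# Descriptions of g2 and g3 in the slide's table, as degenerate intervals.
g2 = tuple((v, v) for v in (2, 1, 1, 0, 6))
g3 = tuple((v, v) for v in (2, 2, 1, 7, 6))

print(pattern_meet(g2, g3, 1))
# ((2, 2), (1, 2), (1, 1), '*', (6, 6))
```

Only m4 is turned into ∗, because 0 and 7 differ by more than θ = 1.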


Page 39: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Extracting biclusters of similar values

Second statement

Values from two columns should be similar

From ({g2, g3}, 〈[2, 2], [1, 2], [1, 1], ∗, [6, 6]〉)

we group attributes such that their values form a class of tolerance:

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

({g2, g3}, {m1, m2, m3}) and ({g2, g3}, {m5})


Page 40: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Extracting biclusters of similar values

Third statement

Maximal bicluster of similar values

({g1}, 〈1, 2, 2, 1, 6〉)
({g2}, 〈2, 1, 1, 0, 6〉)
({g3}, 〈2, 2, 1, 7, 6〉)
({g4}, 〈8, 9, 2, 6, 7〉)

({g1, g2}, 〈[1, 2], [1, 2], [1, 2], [0, 1], 6〉)
({g1, g3}, 〈[1, 2], 2, [1, 2], ∗, 6〉)
({g2, g3}, 〈2, [1, 2], 1, ∗, 6〉)
({g3, g4}, 〈∗, ∗, [1, 2], [6, 7], [6, 7]〉)

({g1, g2, g3}, 〈[1, 2], [1, 2], [1, 2], ∗, 6〉)

({g1, g2, g3, g4}, 〈∗, ∗, [1, 2], ∗, [6, 7]〉)

Constructing maximal biclusters: bottom-up/top-down


Page 42: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic Concept Analysis

“Extension” of FCA to ternary relations

An object has an attribute for a given condition

Triadic context (G, M, B, Y)

Several derivation operators allow characterizing “triadic concepts” as maximal cubes of ×

b1
m1 m2 m3
g1 ×
g2 × ×
g3 × ×
g4 × ×
g5 × ×

b2
m1 m2 m3
g1 × × ×
g2 × ×
g3 × × ×
g4 × ×
g5 × ×

b3
m1 m2 m3
g1 × ×
g2 ×
g3 × × ×
g4 × ×
g5 × × ×

({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a triadic concept

F. Lehmann and R. Wille. A Triadic Approach to Formal Concept Analysis. In International Conference on Conceptual Structures (ICCS), 1995.


Page 43: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Basic idea

Principle

Start from a numerical dataset (G, M, W, I)

Build a triadic context (G, M, B, Y) with the same objects, the same attributes, and a discretized dimension

Extract triadic concepts

Interordinal scaling

B and all its intersections characterize any interval over W

We show interesting links between biclusters of similar values and triadic concepts

M. Kaytoue, S. O. Kuznetsov, J. Macko, A. Napoli and W. Meira Jr. Mining biclusters of similar values with triadic concept analysis. In International Conference on Concept Lattices and their Applications (CLA), 2011.


Page 44: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Discretization method

Interordinal scaling (existing discretization scale)

Let (G, M, W, I) be a numerical dataset (with W the set of data values).

Now consider the set T = {[min(W), w], ∀w ∈ W} ∪ {[w, max(W)], ∀w ∈ W}.

Known fact: T and all its intersections characterize any interval of values on W.

Example

With W = {0, 1, 2, 6, 7, 8, 9}, one has

T = {[0, 0], [0, 1], [0, 2], ..., [0, 9], [1, 9], [2, 9], ..., [9, 9]}

and for example [0, 8] ∩ [2, 9] = [2, 8]


Page 45: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Building a triadic context

Transformation procedure

From a numerical dataset (G, M, W, I), build a triadic context (G, M, T, Y) such that (g, m, t) ∈ Y ⇔ m(g) ∈ t
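The transformation can be sketched as follows, using the 4 × 5 table of the bicluster slides as input (an assumption of this example). Intervals of T are `(low, high)` pairs and the ternary relation Y is stored as a set of triples; the function names are hypothetical.

```python
# Numerical dataset assumed from the bicluster slides.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def interordinal_scale(values):
    """T = {[min(W), w]} united with {[w, max(W)]} for every w in W."""
    low, high = min(values), max(values)
    return sorted({(low, w) for w in values} | {(w, high) for w in values})


def triadic_context(data):
    """Y = {(g, m, t) : the value m(g) lies in the interval t}."""
    values = {v for row in data.values() for v in row.values()}
    scale = interordinal_scale(values)
    return {
        (g, m, t)
        for g, row in data.items()
        for m, v in row.items()
        for t in scale
        if t[0] <= v <= t[1]
    }


W = {v for row in DATA.values() for v in row.values()}
T = interordinal_scale(W)
Y = triadic_context(DATA)
print(len(T), len(Y))
```

With W = {0, 1, 2, 6, 7, 8, 9}, the interval [0, 9] arises from both families, so T holds 13 distinct intervals rather than 14.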


Page 46: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

First contribution

We proved that there is a 1-1 correspondence between

(i) Triadic concepts of the resulting triadic context
(ii) Biclusters of similar values maximal for some θ ≥ 0

Interesting facts

Efficient algorithm for concept extraction (Data-Peeler, handling several constraints)

L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut. Closed patterns meet n-ary relations. TKDD, 3(1), 2009.

Top-k biclusters: a concept (A, B, C) with high |A|, |B|, and |C| corresponds to a bicluster (A, B) as a large rectangle of close values (by properties of the interordinal scale)

This formalization allows us to design a new algorithm to extract maximal biclusters for a given parameter θ


Page 47: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic diagram

Quasi-order ≲_i and equivalence relation ∼_i for i = 1, 2, 3

(A1, A2, A3) ≲_i (B1, B2, B3) ⇔ Ai ⊆ Bi

(A1, A2, A3) ∼_i (B1, B2, B3) ⇔ Ai = Bi

Anti-ordinal dependencies

With (A1, A2, A3) ≲_i (B1, B2, B3) and (A1, A2, A3) ≲_j (B1, B2, B3), then (A1, A2, A3) ≳_k (B1, B2, B3), for {i, j, k} = {1, 2, 3}

A concept is uniquely determined by two of its components


Page 48: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic diagram

Equivalence and factor sets, i = 1, 2, 3

[(A1, A2, A3)]_i is the equivalence class of concepts w.r.t. ∼_i

≲_i induces an order ≤_i on the factor set I(K)/∼_i s.t.

[(A1, A2, A3)]_i ≤_i [(B1, B2, B3)]_i ⇔ Ai ⊆ Bi

(I(K)/∼_i, ≤_i) is the ordered set of all extents (i = 1), intents (i = 2) or modus (i = 3) of K


Page 49: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic diagrams

Triadic diagram I(K)

Geometric structure: (I(K), ∼_1, ∼_2, ∼_3)

Ordered structures: (I(K)/∼_i, ≤_i)

Three systems of parallel lines, one for each ∼_i, in which equivalence classes meet in at most one element: a triangular pattern


Page 50: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic diagrams


Page 51: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Triadic diagrams

Such a representation is not always possible...

the tetrahedron case:

a = (A, y, C)
b = (A, B, z)
c = (x, B, C)
d = (x, y, z)

The “Thomsen condition” is violated (?)

Ongoing work

Prove that in our case, such a representation is possible

Alternative visualization, navigation


Page 52: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Second contribution

Compute all maximal biclusters for a given θ

Use another (but similar) discretization procedure to build the triadic context based on tolerance blocks

Standard algorithms output biclusters of similar values that are not necessarily maximal

We design a new algorithm TriMax for that task

TriMax is flexible, uses standard FCA algorithms in its core, seems better than its competitors, and can be extended to n-ary relations and distributed.


Page 53: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

New transformation procedure

Tolerance blocks based scaling

Compute the set C of all blocks of tolerance over W

From the numerical dataset (G, M, W, I), build the triadic context (G, M, C, Z) such that (g, m, c) ∈ Z ⇔ m(g) ∈ c

Actually, we remove “useless information”

θ = 1


Page 54: Characterizing and mining numerical patterns, an FCA point of view

Elements of answer – Triadic Concept Analysis for biclustering

Second contribution

Algorithm TriMax

Any triadic concept corresponds to a bicluster of similar values, but not necessarily a maximal one!

This led us to the algorithm TriMax, which:

Processes each formal context (one for each block of tolerance) with any existing FCA algorithm
Any resulting concept is a maximal bicluster candidate
Each context can be processed separately

TriMax allows a complete, correct and non-redundant extraction of all maximal biclusters of similar values for a user-defined similarity parameter θ
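The idea of processing one formal context per block of tolerance can be illustrated with a brute-force sketch. This is not TriMax itself: concepts are enumerated exhaustively instead of with a dedicated FCA algorithm, and maximality is checked by a naive pairwise comparison of candidates, both of which are simplifications assumed for the example. All names are hypothetical.

```python
from itertools import combinations

# Numerical dataset assumed from the bicluster slides.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def tolerance_blocks(values, theta):
    """Maximal intervals (low, high) of width at most theta over the values."""
    ordered = sorted(set(values))
    blocks = []
    for start in ordered:
        members = [w for w in ordered if start <= w <= start + theta]
        block = (members[0], members[-1])
        # Window ends never decrease, so a new maximal block must end later.
        if not blocks or block[1] > blocks[-1][1]:
            blocks.append(block)
    return blocks


def concepts_of_block(block):
    """Concepts of the context 'm(g) lies in block', with non-empty intents."""
    low, high = block
    ctx = {g: {m for m, v in row.items() if low <= v <= high}
           for g, row in DATA.items()}
    objects = sorted(DATA)
    found = set()
    for size in range(1, len(objects) + 1):
        for subset in combinations(objects, size):
            shared = set.intersection(*(ctx[g] for g in subset))
            if shared:
                extent = frozenset(g for g in objects if shared <= ctx[g])
                found.add((extent, frozenset(shared)))
    return found


def maximal_biclusters(theta):
    """Candidates from every block, keeping those not inside a larger one."""
    values = [v for row in DATA.values() for v in row.values()]
    candidates = set()
    for block in tolerance_blocks(values, theta):
        candidates |= concepts_of_block(block)
    return {c for c in candidates
            if not any(c != d and c[0] <= d[0] and c[1] <= d[1]
                       for d in candidates)}


for objects, attributes in sorted(maximal_biclusters(1), key=lambda c: sorted(c[0])):
    print(sorted(objects), sorted(attributes))
```

Because each block is handled independently, the loop over blocks could be run in parallel, which matches the remark that each context can be processed separately.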


Page 56: Characterizing and mining numerical patterns, an FCA point of view

Conclusion and perspectives

Conclusion

A new insight into mining numerical data

Our main tools...

Formal Concept Analysis and conceptual scaling

Pattern structures and projections

Tolerance relation

Triadic Concept Analysis

... to deal with numerical data

Conceptual representations of numerical data

Bi-clustering

Information fusion

Applications: GED analysis and agricultural practice assessment
