Characterizing and mining numerical patterns
An FCA point of view
Declarative Approaches for Enumerating Interesting Patterns
http://liris.cnrs.fr/dag/
Mehdi Kaytoue – Postdoctoral researcher, DAG
June 2012
http://liris.cnrs.fr/mehdi.kaytoue
Outline
1 Introducing Formal Concept Analysis
2 Main research question
3 Elements of answer
    Interval pattern structures
    Towards condensed representations
    Introducing similarity
    Extracting biclusters of similar values
    Triadic Concept Analysis for biclustering
4 Conclusion and perspectives
Introducing Formal Concept Analysis
A binary table as a formal context
Given by (G, M, I) with
G a set of objects
M a set of attributes
I a binary relation between objects and attributes: (g, m) ∈ I means that “object g owns attribute m”

     m1  m2  m3
g1   ×       ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×

G = {g1, ..., g5}
M = {m1, m2, m3}
(g1, m3) ∈ I
B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.
A maximal rectangle as a formal concept
A Galois connection to characterize formal concepts
A′ = {m ∈ M | ∀g ∈ A ⊆ G : (g ,m) ∈ I}
B ′ = {g ∈ G | ∀m ∈ B ⊆ M : (g ,m) ∈ I}
(A,B) is a concept with extent A = B ′ and intent B = A′
{g3}′ = {m2,m3}
{m2,m3}′ = {g3, g4, g5}
     m1  m2  m3
g1   ×       ×
g2   ×   ×
g3       ×   ×
g4       ×   ×
g5   ×   ×   ×
({g3, g4, g5}, {m2,m3}) is a formal concept
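A minimal sketch of the two derivation operators, run on the example context. The incidence is an assumption read off from the worked examples on the slides, and the names `CONTEXT`, `intent`, `extent` and `is_concept` are illustrative only.

```python
# Example context: object -> set of attributes it owns.
# Assumption: cell positions are inferred from the examples on the slides.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : attributes shared by every object of A (all of M if A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : objects owning every attribute of B."""
    wanted = set(attributes)
    return {g for g, owned in CONTEXT.items() if wanted <= owned}


def is_concept(objects, attributes):
    """(A, B) is a concept iff A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)


print(sorted(intent({"g3"})))
print(sorted(extent({"m2", "m3"})))
print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))
```

Applying `extent` after `intent` closes a set of objects: starting from {g3} one reaches the extent {g3, g4, g5}.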
Concept lattice
Ordered set of concepts...
(A1,B1) ≤ (A2,B2)⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
({g1, g5}, {m1,m3}) ≤ ({g1, g2, g5}, {m1})
... with interesting properties
Maximality of concepts as rectangles
Overlapping of concepts
Specialization/generalisation hierarchy
Synthetic representation of the data without loss of information
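The ordered set of concepts can be sketched by closing every subset of objects. This is a naive enumeration for illustration, not one of the dedicated FCA algorithms; the context is the same assumed reading of the example table, and all names are hypothetical.

```python
from itertools import combinations

# Assumption: same example context as on the earlier slides.
CONTEXT = {
    "g1": frozenset({"m1", "m3"}),
    "g2": frozenset({"m1", "m2"}),
    "g3": frozenset({"m2", "m3"}),
    "g4": frozenset({"m2", "m3"}),
    "g5": frozenset({"m1", "m2", "m3"}),
}
ATTRIBUTES = frozenset({"m1", "m2", "m3"})


def intent(objects):
    shared = ATTRIBUTES
    for g in objects:
        shared = shared & CONTEXT[g]
    return shared


def extent(attributes):
    return frozenset(g for g, owned in CONTEXT.items() if attributes <= owned)


def all_concepts():
    """Close every subset of objects; exponential, fine for 5 objects."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            b = intent(subset)
            concepts.add((extent(b), b))
    return concepts


def leq(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


for a, b in sorted(all_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(a), sorted(b))
```

Ordering by extent inclusion and ordering by reverse intent inclusion give the same hierarchy, which is what the equivalence in the definition of ≤ states.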
Main research question
General objectives
Mining numerical data with formal concept analysis
Turning data into binary (?)
Bringing the problem into well-known settings
Allowing a mathematically well-defined approach for a correct, exact and non-redundant extraction of numerical patterns
Exploiting existing algorithms and “tools”
     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   5   6
g3   2   2   1   7   6
g4   8   9   2   6   7

⇒ [binary context over g1, ..., g4 and m1, ..., m5 obtained from the numerical table]
Can we work with FCA directly on numerical data?
Elements of answer – Interval pattern structures
First elements...
How to handle complex descriptions
An intersection as a similarity operator
∩ behaves as similarity operator
{m1,m2} ∩ {m1,m3} = {m1}
∩ induces an ordering relation ⊆
N ∩ O = N ⇐⇒ N ⊆ O
{m1} ∩ {m1, m2} = {m1} ⇐⇒ {m1} ⊆ {m1, m2}
∩ has the properties of a meet ⊓ in a semi-lattice: a commutative, associative and idempotent operation
c ⊓ d = c ⇐⇒ c ⊑ d
A. Tversky. Features of similarity. Psychological Review, 84(4), 1977.
Pattern structure
Given by (G, (D, ⊓), δ)
G a set of objects
(D, ⊓) a semi-lattice of descriptions or patterns
δ a mapping such that δ(g) ∈ D describes object g
A Galois connection
A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
B. Ganter and S. O. Kuznetsov. Pattern Structures and their Projections. In International Conference on Conceptual Structures, 2001.
Ordering descriptions in numerical data
(D, ⊓) as a meet-semi-lattice with ⊓ as a “convexification”

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

Intervals over the values of m1, ordered by ⊑ (larger intervals are lower):

  4       5       6
    [4,5]   [5,6]
        [4,6]
Numerical data are pattern structures
Interval pattern structures

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g1, g2}□ = ⊓_{g ∈ {g1, g2}} δ(g)
          = 〈5, 7, 6〉 ⊓ 〈6, 8, 4〉
          = 〈[5, 6], [7, 8], [4, 6]〉
〈[5, 6], [7, 8], [4, 6]〉□ = {g ∈ G | 〈[5, 6], [7, 8], [4, 6]〉 ⊑ δ(g)}
                           = {g1, g2, g5}
({g1, g2, g5}, 〈[5, 6], [7, 8], [4, 6]〉) is a (pattern) concept
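The computation on this slide can be sketched as follows. Intervals are encoded as (low, high) pairs, a number v as (v, v); the names `DATA`, `delta`, `meet`, `description` and `image` are illustrative choices, not taken from the cited papers.

```python
from functools import reduce

# Numerical table of the slide: object -> values of (m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g):
    """Description of object g: each value v becomes the interval [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(c, d):
    """Component-wise convex hull of two interval vectors."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d))


def subsumed(c, d):
    """c is subsumed by d iff meet(c, d) == c (c has the larger intervals)."""
    return meet(c, d) == c


def description(objects):
    """Common description of a non-empty set of objects."""
    return reduce(meet, (delta(g) for g in objects))


def image(pattern):
    """Objects whose description lies inside the pattern."""
    return {g for g in DATA if subsumed(pattern, delta(g))}


d = description({"g1", "g2"})
print(d)
print(sorted(image(d)))
```

Applying `description` to the resulting image returns the same pattern again, which is why the pair is a pattern concept.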
Interval pattern concept lattice
Existing algorithms
Lowest concepts: few objects, small intervals
Highest concepts: many objects, large intervals
Overwhelming number of concepts/patterns
Links with conceptual scaling
Interordinal scaling [Ganter & Wille]
A scale to encode intervals of attribute values
     m1 ≤ 4   m1 ≤ 5   m1 ≤ 6   m1 ≥ 4   m1 ≥ 5   m1 ≥ 6
4      ×        ×        ×        ×
5               ×        ×        ×        ×
6                        ×        ×        ×        ×

Equivalent concept lattice
({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ... , ... })
({g1, g2, g5}, 〈[5, 6], ... , ... 〉)
Why use pattern structures when we have scaling?
Processing a pattern structure is more efficient
M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. Information Sciences, Special Issue: Lattices (Elsevier), 181(10): 1989-2001, 2011.
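A small sketch of interordinal scaling for a single attribute, showing how a set of binary scale attributes decodes back to an interval. Scale attributes are encoded as (operator, bound) pairs; this encoding and the function names are assumptions made for illustration.

```python
def scale(values):
    """Interordinal scale: value v owns '<= w' for all w >= v and '>= w' for all w <= v."""
    ws = sorted(set(values))
    return {
        v: {("<=", w) for w in ws if v <= w} | {(">=", w) for w in ws if v >= w}
        for v in ws
    }


def common_attributes(values, domain):
    """Scale attributes shared by all the given values (values must be non-empty)."""
    table = scale(domain)
    shared = None
    for v in values:
        shared = set(table[v]) if shared is None else shared & table[v]
    return shared


def as_interval(attrs):
    """Decode a set of scale attributes into the interval it encodes."""
    lower = max(w for op, w in attrs if op == ">=")
    upper = min(w for op, w in attrs if op == "<=")
    return (lower, upper)


# m1 takes the values 5, 6 and 5 on g1, g2 and g5.
shared = common_attributes([5, 6, 5], domain=[4, 5, 6])
print(sorted(shared))
print(as_interval(shared))
```

The binary intent carries three attributes for one attribute column where the interval pattern carries a single interval, which illustrates the redundancy of the scaled encoding.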
Elements of answer – Towards condensed representations
Interval pattern search space
Counting all possible interval patterns
〈[a_m1, b_m1], [a_m2, b_m2], ...〉 where a_mi, b_mi ∈ W_mi

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

∏_{i ∈ {1, ..., |M|}} |W_mi| × (|W_mi| + 1) / 2

360 possible interval patterns in our small example
M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Revisiting Numerical Pattern Mining with Formal Concept Analysis. In International Joint Conference on Artificial Intelligence (IJCAI), 2011.
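The count can be checked directly: an attribute with k distinct values admits k(k+1)/2 intervals [a, b] with a ≤ b. The function name below is illustrative.

```python
from math import prod

# Rows of the example table, columns m1, m2, m3.
ROWS = [
    (5, 7, 6),
    (6, 8, 4),
    (4, 8, 5),
    (4, 9, 8),
    (5, 8, 5),
]


def count_interval_patterns(rows):
    """Product over attributes of |W| * (|W| + 1) / 2, W being the set of column values."""
    sizes = [len(set(column)) for column in zip(*rows)]
    return prod(k * (k + 1) // 2 for k in sizes)


# Here |W_m1| = 3, |W_m2| = 3, |W_m3| = 4, hence 6 * 6 * 10.
print(count_interval_patterns(ROWS))
```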
Semantics for interval patterns
Interval patterns as (hyper) rectangles
     m1  m3
g1   5   6
g2   6   4
g3   4   5
g4   4   8
g5   5   5

〈[4, 5], [5, 6]〉□ = {g1, g3, g5}
〈[4, 5], [4, 6]〉□ = {g1, g3, g5}
〈[4, 6], [5, 6]〉□ = {g1, g3, g5}

[Figure: the descriptions δ(g1), ..., δ(g5) plotted as points in the plane with axes m1 and m3; each interval pattern above is a rectangle in that plane]
A condensed representation
Equivalence classes of interval patterns
Two interval patterns with the same image are said to be equivalent
c ≅ d ⇐⇒ c□ = d□
Equivalence class of a pattern d
[d] = {c | c ≅ d}
with a unique closed pattern: the smallest rectangle
and one or several generators: the largest rectangles
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, 2000.
In our example: 360 patterns; 18 closed; 44 generators
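A sketch of the closure on the two-attribute example (m1, m3): the closed pattern of a class is the smallest rectangle around the objects of the image. The function names are hypothetical, and returning `None` for an empty image is a simplification of this sketch.

```python
# Projection of the example table on (m1, m3).
DATA = {
    "g1": (5, 6),
    "g2": (6, 4),
    "g3": (4, 5),
    "g4": (4, 8),
    "g5": (5, 5),
}


def image(pattern):
    """Objects falling inside the rectangle described by the pattern."""
    return frozenset(
        g
        for g, values in DATA.items()
        if all(a <= v <= b for v, (a, b) in zip(values, pattern))
    )


def closure(pattern):
    """Smallest rectangle containing the image (None if the image is empty)."""
    objects = image(pattern)
    if not objects:
        return None
    columns = zip(*(DATA[g] for g in objects))
    return tuple((min(col), max(col)) for col in columns)


def equivalent(c, d):
    """Two patterns are equivalent iff they have the same image."""
    return image(c) == image(d)


larger_1 = ((4, 5), (4, 6))
larger_2 = ((4, 6), (5, 6))
print(equivalent(larger_1, larger_2))
print(closure(larger_1), closure(larger_2))
```

Both larger rectangles close to the same smallest rectangle, so the three patterns of the previous slide belong to one equivalence class.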
A condensed representation
Remarks
4 5 6
[4,5] [5,6]
[4,6]
Compression rate varies between 10^7 and 10^9
Interordinal scaling: encodes ≃ 30,000 binary patterns
not efficient even with the best algorithms (e.g. LCMv2)
redundancy problem that rules out its use for generator extraction
MDL, quantitative association rule mining, k-anonymisation
Need for fault-tolerant condensed representations
Elements of answer – Introducing similarity
Introducing a similarity relation
Grouping in a same concept objects having similar values?
A natural similarity relation on numbers
a ≃θ b ⇔ |a − b| ≤ θ, e.g. 4 ≃1 5 but 4 ≄1 6
Similarity operator ⊓ in pattern structures
4 5 6
[4,5] [5,6]
[4,6]
How to consider a similarity relation w.r.t. a distance?
Towards a similarity between values
Introduce an element ∗ ∈ (D, ⊓) denoting dissimilarity
c ⊓ d = ∗ iff c ≄θ d
c ⊓ d ≠ ∗ iff c ≃θ d
Example with θ = 1

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g3, g4}□ = 〈[4, 4], [8, 9], ∗〉
〈[4, 4], [8, 9], ∗〉□ = {g3, g4}
({g3, g4}, 〈[4, 4], [8, 9], ∗〉) is a pattern concept: g3 and g4 have similar values for attributes m1 and m2 only
Is {g3, g4} maximal w.r.t. similarity? We can add g5...
Classes of tolerance in numerical data
Towards maximal sets of similar values
≃θ is a tolerance relation: reflexive, symmetric, not transitive
Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5
8 ≃5 11, 11 ≃5 16 but 8 ≄5 16
A class of tolerance as a maximal set of pairwise similar values
{6, 8, 11}   {11, 16}   {16, 17}
[6, 11]      [11, 16]   [16, 17]
S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis, Foundations and Applications, 2005.
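For numbers, the classes of tolerance are the maximal windows of width at most θ over the sorted values. The sliding-window sketch below is one possible way to compute them, under that observation; the function name is illustrative.

```python
def tolerance_classes(values, theta):
    """Maximal sets of pairwise similar values, for |a - b| <= theta."""
    vs = sorted(set(values))
    classes = []
    last_end = -1  # right end (index) of the last window kept
    for start, low in enumerate(vs):
        end = start
        # Extend the window while the farthest value stays within theta of `low`.
        while end + 1 < len(vs) and vs[end + 1] - low <= theta:
            end += 1
        # Window ends never decrease, so a window is maximal
        # exactly when it reaches further right than the previous one.
        if end > last_end:
            classes.append(vs[start:end + 1])
            last_end = end
    return classes


print(tolerance_classes([6, 8, 11, 16, 17], theta=5))
```

The value 11 belongs to two classes, which is the effect of ≃θ not being transitive.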
Tolerance in pattern structures
Projecting the pattern structure
Each value is replaced by the interval characterizing its class of tolerance (if unique)
Each pattern d is projected with a mapping ψ(d) ⊑ d (pre-processing)
Example with θ = 1

     m1  m2  m3
g1   5   7   6
g2   6   8   4
g3   4   8   5
g4   4   9   8
g5   5   8   5

{g3, g4}□ = ψ(〈[4, 4], [8, 9], ∗〉) = 〈[4, 5], [8, 9], ∗〉
〈[4, 5], [8, 9], ∗〉□ = {g3, g4, g5}
Similarity and scaling
     m1  m2  m3
g1   6   0   [1, 2]
g2   8   4   [2, 5]
g3   11  8   [4, 5]
g4   16  8   [6, 9]
g5   17  12  [7, 10]

≃5   6   8   11  16  17
6    ×   ×   ×
8    ×   ×   ×
11   ×   ×   ×   ×
16           ×   ×   ×
17               ×   ×

Scaled context (transposed: scale attributes as rows, objects as columns)

                 g1  g2  g3  g4  g5
(m1, 11)                 ×
(m1, 16)                     ×
(m1, [6, 11])    ×   ×   ×
(m1, [11, 16])           ×   ×
(m1, [16, 17])               ×   ×
(m2, 4)              ×
(m2, 8)                  ×   ×
(m2, [0, 4])     ×   ×
(m2, [4, 8])         ×   ×   ×
(m2, [8, 12])            ×   ×   ×
(m3, [1, 5])     ×   ×   ×
(m3, [4, 9])             ×   ×
(m3, [6, 10])                ×   ×
(m3, [4, 5])             ×
(m3, [6, 9])                 ×
Elements of answer – Extracting biclusters of similar values
Another type of bicluster
Going back to similarity relation
w1 ≃θ w2 ⇐⇒ |w1 − w2| ≤ θ with θ ∈ ℝ, w1, w2 ∈ W
Bicluster of similar values
A bicluster (A, B) is a bicluster of similar values if
mi(gj) ≃θ mk(gl), ∀gj, gl ∈ A, ∀mi, mk ∈ B

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

θ = 1
and maximal if no object/attribute can be added
J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut. Mining Bi-sets in Numerical Data. In KDID 2006: 11-23.
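The definition translates into a direct check: all values of the rectangle are pairwise similar exactly when their range is at most θ. The helper names below are illustrative.

```python
# Numerical table of the slide.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
ATTRIBUTES = ["m1", "m2", "m3", "m4", "m5"]


def is_similar_bicluster(objects, attributes, theta):
    """All values of the rectangle are pairwise within theta of each other."""
    values = [DATA[g][m] for g in objects for m in attributes]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(objects, attributes, theta):
    """No single object or attribute can be added while keeping similarity."""
    if not is_similar_bicluster(objects, attributes, theta):
        return False
    for g in DATA:
        if g not in objects and is_similar_bicluster(set(objects) | {g}, attributes, theta):
            return False
    for m in ATTRIBUTES:
        if m not in attributes and is_similar_bicluster(objects, set(attributes) | {m}, theta):
            return False
    return True


print(is_similar_bicluster({"g2", "g3"}, {"m1", "m2", "m3"}, theta=1))
print(is_maximal({"g2", "g3"}, {"m1", "m2", "m3"}, theta=1))
print(is_maximal({"g1", "g2", "g3"}, {"m1", "m2", "m3"}, theta=1))
```

Testing single additions is enough because any sub-rectangle of a bicluster of similar values is itself one.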
Can we use the interval pattern lattice?
Concept example ({g2, g3}, 〈[2, 2], [1, 2], [1, 1], [0, 7], [6, 6]〉)
     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

θ = 1
3 statements to verify
Some intervals have a “size” larger than θ
Some values in two different columns may not be similar
Rectangle may not be maximal
M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. In International Conference on Formal Concept Analysis (ICFCA), 2011.
First statement
Avoiding intervals with size larger than θ
[a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)]   if |max(b1, b2) − min(a1, a2)| ≤ θ
                    = ∗                             otherwise
Going back to our example, with θ = 1
({g2, g3}, 〈[2, 2], [1, 2], [1, 1], ∗, [6, 6]〉)

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7
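A sketch of the thresholded meet on the example. The marker `STAR` stands for ∗; treating it as absorbing (∗ ⊓ x = ∗) is an assumption of this sketch, as are the function names.

```python
from functools import reduce

STAR = "*"  # stands for the dissimilarity element

DATA = {
    "g1": (1, 2, 2, 1, 6),
    "g2": (2, 1, 1, 0, 6),
    "g3": (2, 2, 1, 7, 6),
    "g4": (8, 9, 2, 6, 7),
}


def meet_interval(x, y, theta):
    """Convex hull of two intervals, or STAR if the hull is wider than theta."""
    if x == STAR or y == STAR:
        return STAR  # assumption: STAR absorbs every other description
    low, high = min(x[0], y[0]), max(x[1], y[1])
    return (low, high) if high - low <= theta else STAR


def meet(c, d, theta):
    return tuple(meet_interval(x, y, theta) for x, y in zip(c, d))


def description(objects, theta):
    """Common description of a non-empty set of objects."""
    vectors = [tuple((v, v) for v in DATA[g]) for g in objects]
    return reduce(lambda c, d: meet(c, d, theta), vectors)


print(description(["g2", "g3"], theta=1))
print(description(["g1", "g2", "g3", "g4"], theta=1))
```

With θ = 1 the fourth component of {g2, g3} collapses to ∗ because the values 0 and 7 are too far apart.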
Second statement
Values from two columns should be similar
From ({g2, g3}, 〈[2, 2], [1, 2], [1, 1], ∗, [6, 6]〉)
we group attributes such that their values form a class of tolerance:

     m1  m2  m3  m4  m5
g1   1   2   2   1   6
g2   2   1   1   0   6
g3   2   2   1   7   6
g4   8   9   2   6   7

({g2, g3}, {m1, m2, m3}) and ({g2, g3}, {m5})
Third statement
Maximal bicluster of similar values
[Diagram: pattern concept lattice for θ = 1; nodes listed from ⊥ upward]
⊥
({g1}, 〈1, 2, 2, 1, 6〉)   ({g2}, 〈2, 1, 1, 0, 6〉)   ({g3}, 〈2, 2, 1, 7, 6〉)   ({g4}, 〈8, 9, 2, 6, 7〉)
({g1, g2}, 〈[1, 2], [1, 2], [1, 2], [0, 1], 6〉)
({g1, g3}, 〈[1, 2], 2, [1, 2], ∗, 6〉)
({g2, g3}, 〈2, [1, 2], 1, ∗, 6〉)
({g3, g4}, 〈∗, ∗, [1, 2], [6, 7], [6, 7]〉)
({g1, g2, g3}, 〈[1, 2], [1, 2], [1, 2], ∗, 6〉)
({g1, g2, g3, g4}, 〈∗, ∗, [1, 2], ∗, [6, 7]〉)
Constructing maximal biclusters: bottom-up/top-down
Elements of answer – Triadic Concept Analysis for biclustering
Triadic Concept Analysis
“Extension” of FCA to ternary relation
An object has an attribute for a given condition
Triadic context (G, M, B, Y)
Several derivation operators allowing to characterize “triadic concepts” as maximal cubes of ×
[Figure: three binary contexts over objects g1, ..., g5 and attributes m1, m2, m3, one for each condition b1, b2, b3]
({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a triadic concept
F. Lehmann and R. Wille. A Triadic Approach to Formal Concept Analysis. In International Conference on Conceptual Structures (ICCS), 1995.
Basic idea
Principle
Start from a numerical dataset (G, M, W, I)
Build a triadic context (G, M, B, Y) with same objects, same attributes, and discretized dimension
Extract triadic concepts
Interordinal scaling
B and all its intersections characterize any interval over W
We show interesting links between biclusters of similar values and triadic concepts
M. Kaytoue, S. O. Kuznetsov, J. Macko, A. Napoli and W. Meira Jr. Mining biclusters of similar values with triadic concept analysis. In International Conference on Concept Lattices and their Applications (CLA), 2011.
Discretization method
Interordinal scaling (existing discretization scale)
Let (G, M, W, I) be a numerical dataset (with W the set of data values).
Now consider the set
T = {[min(W), w], ∀w ∈ W} ∪ {[w, max(W)], ∀w ∈ W}.
Known fact: T and all its intersections characterize any interval of values on W.
Example
With W = {0, 1, 2, 6, 7, 8, 9}, one has
T = {[0, 0], [0, 1], [0, 2], ..., [0, 9], [1, 9], [2, 9], ..., [9, 9]}
and for example [0, 8] ∩ [2, 9] = [2, 8]
Building a triadic context
Transformation procedure
From a numerical dataset (G, M, W, I), build a triadic context (G, M, T, Y) such that (g, m, t) ∈ Y ⇐⇒ m(g) ∈ t
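The transformation can be sketched on the running numerical table, whose values are W = {0, 1, 2, 6, 7, 8, 9}. Intervals are (low, high) pairs and the triadic relation Y is a plain set of triples; all names are illustrative.

```python
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def interordinal_intervals(values):
    """T = {[min W, w]} union {[w, max W]} for all w in W."""
    ws = sorted(set(values))
    lo, hi = ws[0], ws[-1]
    return sorted({(lo, w) for w in ws} | {(w, hi) for w in ws})


def intersect(t1, t2):
    """Intersection of two intervals (None if they do not overlap)."""
    low, high = max(t1[0], t2[0]), min(t1[1], t2[1])
    return (low, high) if low <= high else None


def triadic_context(data):
    """Y contains (g, m, t) iff the value m(g) lies in the interval t."""
    scale = interordinal_intervals(v for row in data.values() for v in row.values())
    return {
        (g, m, t)
        for g, row in data.items()
        for m, v in row.items()
        for t in scale
        if t[0] <= v <= t[1]
    }


T = interordinal_intervals(v for row in DATA.values() for v in row.values())
Y = triadic_context(DATA)
print(len(T), intersect((0, 8), (2, 9)))
print(("g1", "m1", (0, 1)) in Y)
```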
First contribution
We proved that there is a 1-1 correspondence between
(i) Triadic concepts of the resulting triadic context
(ii) Biclusters of similar values maximal for some θ ≥ 0
Interesting facts
Efficient algorithm for concept extraction (Data-Peeler, handling several constraints)
L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut. Closed patterns meet n-ary relations. TKDD 3(1), 2009.
Top-k biclusters: a concept (A, B, C) with high |A|, |B|, and |C| corresponds to a bicluster (A, B) as a large rectangle of close values (by properties of the interordinal scale)
This formalization allows us to design a new algorithm to extract maximal biclusters for a given parameter θ
Triadic diagram
Quasi-order ≲i and equivalence relation ∼i for i = 1, 2, 3
(A1, A2, A3) ≲i (B1, B2, B3) ⇐⇒ Ai ⊆ Bi
(A1, A2, A3) ∼i (B1, B2, B3) ⇐⇒ Ai = Bi
Anti-ordinal dependencies
If (A1, A2, A3) ≲i (B1, B2, B3) and (A1, A2, A3) ≲j (B1, B2, B3), then (A1, A2, A3) ≳k (B1, B2, B3), for {i, j, k} = {1, 2, 3}
A concept is uniquely determined by two of its components
Triadic diagram
Equivalence and factor sets, i = 1, 2, 3
[(A1,A2,A3)]i is the equivalence class of concepts w.r.t. ∼i
≲i induces an order ≤i on the factor set I(K)/∼i s.t.
[(A1, A2, A3)]i ≤i [(B1, B2, B3)]i ⇐⇒ Ai ⊆ Bi
(I(K)/∼i, ≤i) is the ordered set of all extents (i = 1), intents (i = 2), modi (i = 3) of K
Triadic diagrams
Triadic diagram I(K)
Geometric structure: (I(K),∼1,∼2,∼3)
Ordered structures: (I(K)/ ∼i ,≤i )
Three systems of parallel lines, one for each ∼i, in which classes of equivalence meet in at most one element: a triangular pattern
Triadic diagrams
Such representation is not always possible...
the tetrahedron case:
a = (A, y, C)
b = (A, B, z)
c = (x, B, C)
d = (x, y, z)
The “Thomsen condition” is violated (?)
Ongoing work
Prove that in our case, such a representation is possible
Alternative visualization, navigation
Second contribution
Compute all max. biclusters for a given θ
Use another (but similar) discretization procedure to build the triadic context, based on tolerance blocks
Standard algorithms output biclusters of similar values, but not necessarily maximal ones
We design a new algorithm, TriMax, for that task
TriMax is flexible, uses standard FCA algorithms at its core, appears to outperform its competitors, and can be extended to n-ary relations and distributed.
New transformation procedure
Tolerance blocks based scaling
Compute the set C of all blocks of tolerance over W
From the numerical dataset (G, M, W, I), build the triadic context (G, M, C, Z) such that (g, m, c) ∈ Z ⇐⇒ m(g) ∈ c
Actually, we remove “useless information”
θ = 1
Second contribution
Algorithm TriMax
Any triadic concept corresponds to a bicluster of similar values, but not necessarily a maximal one!
This led us to the algorithm TriMax, which:
Processes each formal context (one for each block of tolerance) with any existing FCA algorithm
Any resulting concept is a maximal bicluster candidate
Each context can be processed separately
TriMax allows a complete, correct and non-redundant extraction of all maximal biclusters of similar values for a user-defined similarity parameter θ
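The principle can be illustrated with a naive sketch: one formal context per block of tolerance, brute-force concept enumeration, then a containment filter to keep the maximal candidates. This is not the TriMax implementation; the final filter and every function name are assumptions made for this illustration.

```python
from itertools import combinations

DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def tolerance_blocks(values, theta):
    """Maximal windows of width <= theta over the sorted values, as (low, high)."""
    vs = sorted(set(values))
    blocks, last_end = [], -1
    for start, low in enumerate(vs):
        end = start
        while end + 1 < len(vs) and vs[end + 1] - low <= theta:
            end += 1
        if end > last_end:
            blocks.append((low, vs[end]))
            last_end = end
    return blocks


def concepts(context):
    """Naive enumeration of the concepts with non-empty extent and intent."""
    found = set()
    names = sorted(context)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            intent = frozenset.intersection(*(context[g] for g in subset))
            if intent:
                extent = frozenset(g for g, owned in context.items() if intent <= owned)
                found.add((extent, intent))
    return found


def maximal_biclusters(data, theta):
    values = [v for row in data.values() for v in row.values()]
    candidates = set()
    for low, high in tolerance_blocks(values, theta):
        # One formal context per block: g owns m iff m(g) falls in the block.
        context = {
            g: frozenset(m for m, v in row.items() if low <= v <= high)
            for g, row in data.items()
        }
        candidates |= concepts(context)
    # Assumption of this sketch: drop candidates contained in another candidate.
    return {
        c for c in candidates
        if not any(c != d and c[0] <= d[0] and c[1] <= d[1] for d in candidates)
    }


for extent, intent in sorted(maximal_biclusters(DATA, theta=1), key=lambda c: sorted(c[0])):
    print(sorted(extent), sorted(intent))
```

Each block yields an independent context, so the loop over blocks could be run in parallel, in line with the remark that each context can be processed separately.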
Conclusion and perspectives
Conclusion
A new insight into mining numerical data
Our main tools...
Formal Concept Analysis and conceptual scaling
Pattern structures and projections
Tolerance relation
Triadic Concept Analysis
... to deal with numerical data
Conceptual representations of numerical data
Bi-clustering
Information fusion
Applications: gene expression data (GED) analysis and agricultural practice assessment