48
Dempster-Shafer theory Application to clustering Dempster-Shafer theory Introduction, connections with rough sets and application to clustering Thierry Denœux 1 1 Université de Technologie de Compiègne HEUDIASYC (UMR CNRS 6599) http://www.hds.utc.fr/˜tdenoeux RSKT 2014 Shanghai, China October 25, 2014 Thierry Denœux Dempster-Shafer theory. Application to clustering 1/ 48

Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

  • Upload
    others

  • View
    23

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Dempster-Shafer theoryIntroduction, connections with rough sets and application to

clustering

Thierry Denœux1

1Université de Technologie de CompiègneHEUDIASYC (UMR CNRS 6599)

http://www.hds.utc.fr/˜tdenoeux

RSKT 2014Shanghai, ChinaOctober 25, 2014

Thierry Denœux Dempster-Shafer theory. Application to clustering 1/ 48

Page 2: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Theories of uncertainty

DS#theory#

Fuzzy#sets#&#Possibility#theory#

Imprecise##probability#

Rough#sets#

Probability##theory#

Thierry Denœux Dempster-Shafer theory. Application to clustering 2/ 48

Page 3: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Focus of this talk

Dempster-Shafer (DS) theory (evidence theory, theory of belieffunctions):

A formal framework for reasoning with partial (uncertain, imprecise)information.Has been applied to statistical inference, expert systems,information fusion, classification, clustering, etc.

Purpose of these talk:Brief introduction or reminder on DS theory, emphasizing someconnections with rough sets;Review the application of belief functions to clustering, showingsome connections with fuzzy and rough approaches.

Thierry Denœux Dempster-Shafer theory. Application to clustering 3/ 48

Page 4: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 4/ 48

Page 5: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 5/ 48

Page 6: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Mass function

Let Ω be a finite set called a frame of discernment.A mass function is a function m : 2Ω → [0,1] such that∑

A⊆Ω

m(A) = 1.

The subsets A of Ω such that m(A) 6= 0 are called the focal setsof Ω.If m(∅) = 0, m is said to be normalized (usually assumed).

Thierry Denœux Dempster-Shafer theory. Application to clustering 6/ 48

Page 7: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Source

A mass function is usually induced by a source, defined a 4-tuple(S,2S,P, Γ), where

S is a finite set;P is a probability measure on (S, 2S);Γ is a multi-valued-mapping from S to 2Ω.

(S,$2S,P)$ Ω"

Γ$

s$Γ(s)$

Γ carries P from S to 2Ω: for all A ⊆ Ω,

m(A) = P(s ∈ S|Γ(s) = A).

Thierry Denœux Dempster-Shafer theory. Application to clustering 7/ 48

Page 8: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Interpretation

(S,$2S,P)$ Ω"

Γ$

s$Γ(s)$

Ω is a set of possible states of the world, about which we collectsome evidence. Let ω be the true state.S is a set of interpretations of the evidence.If s ∈ S holds, we know that ω belongs to the subset Γ(s) of Ω,and nothing more.m(A) is then the probability of knowing only that ω ∈ A.In particular, m(Ω) is the probability of knowing nothing.

Thierry Denœux Dempster-Shafer theory. Application to clustering 8/ 48

Page 9: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Example

A murder has been committed. There are three suspects:Ω = Peter, John,Mary.A witness saw the murderer going away, but he is short-sightedand he only saw that it was a man. We know that the witness isdrunk 20 % of the time.

(S,$2S,P)$ Ω"Γ$drunk$(0.2)$

not$drunk$(0.8)$Peter$

John$

Mary$

We have Γ(¬drunk) = Peter, John and Γ(drunk) = Ω, hence

m(Peter, John) = 0.8, m(Ω) = 0.2

Thierry Denœux Dempster-Shafer theory. Application to clustering 9/ 48

Page 10: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Special cases

A mass function m is said to be:logical if it has only one focal set; it is then equivalent to a set.Bayesian if all focal sets are singletons; it is equivalent to aprobability distribution.

A mass function can thus be seen asa generalized set, or asa generalized probability distribution.

Thierry Denœux Dempster-Shafer theory. Application to clustering 10/ 48

Page 11: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 11/ 48

Page 12: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Belief functionDegrees of support and consistency

Let m be a normalized mass function on Ω induced by a source(S,2S,P, Γ).Let A be a subset of Ω.One may ask:

1 To what extent does the evidence support the proposition ω ∈ A?2 To what extent is the evidence consistent with this proposition?

Ω!A!

B1!

B2!

B3!

B4!

(S,2S,P)! Γ!

s3!

s2!

s1! s4!

Thierry Denœux Dempster-Shafer theory. Application to clustering 12/ 48

Page 13: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Belief functionDefinition and interpretation

For any A ⊆ Ω, the probability that the evidence implies(supports) the proposition ω ∈ A is

Bel(A) = P(s ∈ S|Γ(s) ⊆ A) =∑B⊆A

m(B).

Ω!A!

B1!

B2!

B3!

B4!

(S,2S,P)! Γ!

s3!

s2!

s1! s4!

The function Bel : A→ Bel(A) is called a belief function.

Thierry Denœux Dempster-Shafer theory. Application to clustering 13/ 48

Page 14: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Belief functionCharacterization

Function Bel : 2Ω → [0,1] is a completely monotone capacity: itverifies Bel(∅) = 0, Bel(Ω) = 1 and

Bel

(k⋃

i=1

Ai

)≥

∑∅6=I⊆1,...,k

(−1)|I|+1Bel

(⋂i∈I

Ai

).

for any k ≥ 2 and for any family A1, . . . ,Ak in 2Ω.Conversely, to any completely monotone capacity Belcorresponds a unique mass function m such that:

m(A) =∑∅6=B⊆A

(−1)|A|−|B|Bel(B), ∀A ⊆ Ω.

Thierry Denœux Dempster-Shafer theory. Application to clustering 14/ 48

Page 15: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Plausibility function

The probability that the evidence is consistent with (does notcontradict) the proposition ω ∈ A

Pl(A) = P(s ∈ S|Γ(s) ∩ A 6= ∅) = 1− Bel(A)

Ω!A!

B1!

B2!

B3!

B4!

(S,2S,P)! Γ!

s3!

s2!

s1! s4!

The function Pl : A→ Pl(A) is called a plausibility function.

Thierry Denœux Dempster-Shafer theory. Application to clustering 15/ 48

Page 16: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Special cases

If m is Bayesian, then Bel = Pl and it is a probability measure.If the focal sets of m are nested (A1 ⊂ A2 ⊂ . . . ⊂ An ), m is saidto be consonant. Pl is then a possibility measure:

Pl(A ∪ B) = max (Pl(A),Pl(B))

for all A,B ⊆ Ω and Bel is the dual necessity measure.DS theory thus subsumes both probability theory and possibilitytheory.

Thierry Denœux Dempster-Shafer theory. Application to clustering 16/ 48

Page 17: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Summary

A probability measure is precise, in so far as it represents theuncertainty of the proposition ω ∈ A by a single number P(A).In contrast, a mass function is imprecise (it assigns probabilitiesto subsets).As a result, in DS theory, the uncertainty about a subset A isrepresented by two numbers (Bel(A),Pl(A)), withBel(A) ≤ Pl(A).This model is thus reminiscent of rough set theory, in which a setis approximated by lower and upper approximations, due tocoarseness of a knowledge base.

Thierry Denœux Dempster-Shafer theory. Application to clustering 17/ 48

Page 18: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 18/ 48

Page 19: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Interval rough setsBelief and plausibility functions induced by an interval relation

Let S and Ω be two finite sets and R ⊆ S × Ω. R is called aninterval relation (Yao and Lingras, 1998) if

ΓR(s) = ω ∈ Ω|(s, ω) ∈ R 6= ∅,

for all s ∈ S.Any A ⊆ Ω may be approximated in S by an interval rough setdefined by:

R(A) = s ∈ S|ΓR(s) ⊆ A

R(A) = s ∈ S|ΓR(s) ∩ A 6= ∅

Let P be a probability measure on (S,2S). Then, functions Beland Pl defined, for all A ⊆ Ω, by

Bel(A) = P(R(A)), Pl(A) = P(R(A))

are belief and plausibility functions.

Thierry Denœux Dempster-Shafer theory. Application to clustering 19/ 48

Page 20: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Interval rough setsEquivalence with belief functions

Conversely, let m be a normalized mass function on a finite setΩ, induced by a source (S,2S,P, Γ). The relation

R = (s, ω) ∈ S × Ω|ω ∈ Γ(s)

is an interval relation, and

Bel(A) = P(R(A)), Pl(A) = P(R(A)), ∀A ⊆ Ω.

Equivalence result

Belief function on Ω = interval relation between S and Ω+ probability measure on (S,2S)

Thierry Denœux Dempster-Shafer theory. Application to clustering 20/ 48

Page 21: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Mass functionBelief and plausibility functionsConnection with rough sets

Rough mass functions

Let Ω be the frame of discernment and let R be an equivalencerelation on Ω defining a partition of Ω.Any A ⊆ Ω may be approximated by a (Pawlak) rough set definedby:

R(A) = ω ∈ Ω|[ω]R ⊆ A

R(A) = ω ∈ Ω|[ω]R ∩ A 6= ∅Given a mass function m with focal sets A1, . . . ,An, we candefine:

Its lower approximation m with focal sets R(A1), . . . ,R(An);Its upper approximation m with focal sets R(A1), . . . ,R(An).

The pair (m,m) may be called a rough mass function. Thisnotion extends that of rough set.Remark: these notions was introduced by Shafer (1976) with adifferent terminology, before the introduction of rough sets!

Thierry Denœux Dempster-Shafer theory. Application to clustering 21/ 48

Page 22: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 22/ 48

Page 23: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Clustering

n objects described byAttribute vectors x1, . . . , xn (attributedata) orDissimilarities (proximity data).

Goal: find a meaningful structure in thedata set, usually a partition into c crispor fuzzy subsets.Belief functions may allow us to expressricher information about the datastructure.

Thierry Denœux Dempster-Shafer theory. Application to clustering 23/ 48

Page 24: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Different clustering concepts

Hard clustering: each object belongs to one and only one group.Group membership is expressed by binary variables uik such thatuik = 1 if object i belongs to group k and uik = 0 otherwise.Fuzzy clustering: each object has a degree of membershipuik ∈ [0,1] to each group, with

∑ck=1 uik = 1.

Possibilistic clustering: the condition∑c

k=1 uik = 1 is relaxed.Each number uik can be interpreted as a degree of possibilitythat object i belongs to cluster k .Rough clustering: the membership of object i to cluster k isdescribed by a pair (uik ,uik ) ∈ 0,12 indicating its membershipto the lower and upper approximations of cluster k .

Thierry Denœux Dempster-Shafer theory. Application to clustering 24/ 48

Page 25: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Evidential clustering

In Evidential clustering, the group membership of each object isdescribed by a (not necessarily normalized) mass function miover Ω.Example:

-5 0 5 10-2

0

2

4

6

8

10

1

2

3

4

5 6 7

8

9

10

11

12

Evidential partition∅ ω1 ω2 ω1, ω2

m3 0 1 0 0m5 0 0.5 0 0.5m6 0 0 0 1m12 0.9 0 0.1 0

Thierry Denœux Dempster-Shafer theory. Application to clustering 25/ 48

Page 26: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Relationship with other clustering structures

Hard par''on

Fuzzy par''on Possibilis'c par''on Rough par''on

Eviden'al par''on

mi certain

mi Bayesian mi consonant mi logical

mi general More general

Less general

Thierry Denœux Dempster-Shafer theory. Application to clustering 26/ 48

Page 27: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Rough clustering as a special case

m(ω1)=1( m(ω1, ω2)=1( m(ω2)=1(

Lower(approxima4ons(

Upper(approxima4ons(

ω1L( ω2

L( ω2U(ω1

U(

Thierry Denœux Dempster-Shafer theory. Application to clustering 27/ 48

Page 28: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

From evidential to hard/fuzzy/possibilistic clustering

Let (m1, . . . ,mn) be an evidential partition.Induced hard partition:

uik =

1 if Pli (ωk) = max` Pli (ω`)0 otherwise.

Induced fuzzy partition:

uik =Pli (ωk)∑` Pli (ω`)

Induced possibilistic partition:

uik = Pli (ωk)

Thierry Denœux Dempster-Shafer theory. Application to clustering 28/ 48

Page 29: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

From evidential to rough clustering

Let (m1, . . . ,mn) be an evidential partition.For each i , let Ai ⊆ Ω such that

mi (Ai ) = maxA⊆Ω

mi (A).

Lower approximations:

uik =

1 if Ai = ωk0 otherwise.

Upper approximations:

uik =

1 if ωk ∈ Ai

0 otherwise.

Thierry Denœux Dempster-Shafer theory. Application to clustering 29/ 48

Page 30: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Algorithms

EVCLUS (Denoeux and Masson, 2004):Proximity (possibly non metric) data,Multidimensional scaling approach.

Evidential c-means (ECM): (Masson and Denoeux, 2008):Attribute data,HCM, FCM family (alternate optimization of a cost function).

Relational Evidential c-means (RECM): (Masson and Denoeux,2009): ECM for proximity data.Constrained Evidential c-means (CECM) (Antoine et al., 2011):ECM with pairwise constraints.Constrained EVCLUS (CEVCLUS) (Antoine et al., 2014):EVCLUS with pairwise constraints.

Thierry Denœux Dempster-Shafer theory. Application to clustering 30/ 48

Page 31: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Outline

1 Dempster-Shafer theoryMass functionBelief and plausibility functionsConnection with rough sets

2 Application to clusteringEvidential partitionEvidential c-means

Thierry Denœux Dempster-Shafer theory. Application to clustering 31/ 48

Page 32: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Principle

Problem: generate an evidential partition M = (m1, . . . ,mn) fromattribute data X = (x1, ...,xn), xi ∈ Rp.Generalization of hard and fuzzy c-means algorithms:

Each class represented by a prototype;Alternate optimization of a cost function with respect to theprototypes and to the evidential partition.

Thierry Denœux Dempster-Shafer theory. Application to clustering 32/ 48

Page 33: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Fuzzy c-means (FCM)

Minimize

JFCM(U,V ) =n∑

i=1

c∑k=1

uβik d2ik

with dik = ||xi − vk || under the constraints∑

k uik = 1, ∀i .Alternate optimization algorithm:

vk =

∑ni=1 uβik xi∑n

i=1 uβik∀k = 1, . . . , c,

uik =d−2/(β−1)

ik∑c`=1 d−2/(β−1)

i`

.

Thierry Denœux Dempster-Shafer theory. Application to clustering 33/ 48

Page 34: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

ECM algorithmPrinciple

v1

v2

v3

v1

v2

v3

v4

Each class ωk represented by a prototype vk .Each non empty set of classes Aj representedby a prototype vj defined as the center of massof the vk for all ωk ∈ Aj .Basic ideas:

For each non empty Aj ∈ Ω, mij = mi (Aj )should be high if xi is close to vj .The distance to the empty set is defined as afixed value δ.

Thierry Denœux Dempster-Shafer theory. Application to clustering 34/ 48

Page 35: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

ECM algorithmObjective criterion

Criterion to be minimized:

JECM(M,V ) =n∑

i=1

∑j/Aj 6=∅,Aj⊆Ω

|Aj |αmβij d2

ij +n∑

i=1

δ2mβi∅,

Parameters:α controls the specificity of mass functions;β controls the hardness of the evidential partition;δ controls the amount of data considered as outliers.

JECM(M,V ) can be iteratively minimized with respect to M and Vusing an alternate optimization scheme.

Thierry Denœux Dempster-Shafer theory. Application to clustering 35/ 48

Page 36: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Butterfly dataset

−6 −4 −2 0 2 4 6 8 10−2

0

2

4

6

8

10

1

2

3

4

5 6 7

8

9

10

11

12

x1

x 2

0 2 4 6 8 10 120

0.2

0.4

0.6

0.8

1

m(2)m(

1)

m()m(emptyset)

point number

bbas

Thierry Denœux Dempster-Shafer theory. Application to clustering 36/ 48

Page 37: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

4-class data set

−8 −6 −4 −2 0 2 4 6 8 10−8

−6

−4

−2

0

2

4

6

8

10

x1

x 2

Thierry Denœux Dempster-Shafer theory. Application to clustering 37/ 48

Page 38: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

4-class data setHard evidential partition

−8 −6 −4 −2 0 2 4 6 8 10−8

−6

−4

−2

0

2

4

6

8

10

1

2

3

4

12

23

34

14

124

123

234

134

13

24

Thierry Denœux Dempster-Shafer theory. Application to clustering 38/ 48

Page 39: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

4-class data setLower approximations

−8 −6 −4 −2 0 2 4 6 8 10−8

−6

−4

−2

0

2

4

6

8

10

1L

2L

3L

4L

Thierry Denœux Dempster-Shafer theory. Application to clustering 39/ 48

Page 40: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

4-class data setUpper approximations

−8 −6 −4 −2 0 2 4 6 8 10−8

−6

−4

−2

0

2

4

6

8

10

1U

2U

3U

4U

Thierry Denœux Dempster-Shafer theory. Application to clustering 40/ 48

Page 41: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Brain dataProblem

(a) (b)

Magnetic resonance imaging of pathological brain, 2 sets ofparameters.Three regions: normal tissue (Norm), ventricles + cerebrospinalfluid (CSF/V) and pathology (Path).Image 1 highlights CSF/V (dark), image 2 highlights pathology(bright).

Thierry Denœux Dempster-Shafer theory. Application to clustering 41/ 48

Page 42: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Brain dataResults in grey level space

0 20 40 60 80 100 120 140 160 180 2000

50

100

150

Grey levels in image 1

Gre

y le

vels

in im

age

2

ω3

ω12

ω123

ω23

ω13

ω1

ω2

Thierry Denœux Dempster-Shafer theory. Application to clustering 42/ 48

Page 43: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Brain dataImage segmentation

(a) (b) (c)

Pathology (left); CSF and ventricles (center); normal brain tissues(right). The lower approximations of the clusters are represented bylight grey areas, the upper approximations by the union of light and

dark grey areas.

Thierry Denœux Dempster-Shafer theory. Application to clustering 43/ 48

Page 44: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Determining the number of groups

If a proper number of classes is chosen, the prototypes will coverthe clusters and most of the mass will be allocated to singletonsof Ω.On the contrary, if c is too small or too high, the mass will bedistributed to subsets with higher cardinality or to ∅.Nonspecificity of a mass function:

N(m) ,∑

A∈2Ω\∅

m(A) log2 |A|+ m(∅) log2 |Ω|,

Proposed validity index of an evidential partition:

N∗(c) ,1

n log2(c)

n∑i=1

∑A∈2Ω\∅

mi (A) log2 |A|+ mi (∅) log2(c)

,Thierry Denœux Dempster-Shafer theory. Application to clustering 44/ 48

Page 45: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

Dempster-Shafer theoryApplication to clustering

Evidential partitionEvidential c-means

Determining the number of groupsResult with the 4-class dataset

2 3 4 5 6

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

α=1

α=2

α=3

Number of clusters

N*(

c)

Thierry Denœux Dempster-Shafer theory. Application to clustering 45/ 48

Page 46: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

ConclusionsReferences

ConclusionDS theory vs. Rough set theory

Dempster-Shafer theory and Rough set theory have differentagendas:

DS theory formalizes reasoning with uncertainty;Rough set theory is a tool for knowledge extraction from databases.

However, they are both concerned with coarseness ofrepresentation, and they have strong connections from a formalpoint of view:

A belief function Ω can be seen as being generated from aprobability measure on some underlying space S and an intervalrelation between S and Ω.The notions of lower and upper approximations of a set induced byan equivalence relation can be extended to mass functions.

Thierry Denœux Dempster-Shafer theory. Application to clustering 46/ 48

Page 47: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

ConclusionsReferences

ConclusionEvidential vs. rough clustering

When applied to clustering, DS theory leads to the notion ofevidential partition, which generalizes most previous clusteringstructures, including rough clustering.Several algorithms have been proposed to generate an evidentialpartition from proximity or attribute data:

EVCLUS;Evidential c-means and its variants (proximity data, optimizeddistance measure, etc.)

These algorithms may also be used to generate a roughclustering structure.A detailed comparison with, e.g., the rough c-means algorithm(Lingras and West, 2004) remains to be done (see a firstapproach in Joshi and Lingras, 2012).

Thierry Denœux Dempster-Shafer theory. Application to clustering 47/ 48

Page 48: Dempster-Shafer theory - Introduction, connections with ...tdenoeux/dokuwiki/_media/en/rskt2014.pdf · Brief introduction or reminder on DS theory, emphasizing some connections with

ConclusionsReferences

Referencescf. http://www.hds.utc.fr/˜tdenoeux

T. Denœux and M.-H. Masson.EVCLUS: Evidential Clustering of Proximity Data.IEEE Transactions on SMC B, 34(1):95-109, 2004.

M.-H. Masson and T. Denœux.ECM: An evidential version of the fuzzy c-means algorithm.Pattern Recognition, 41(4):1384-1397, 2008.

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux.CECM: Constrained Evidential C-Means algorithm.Computational Statistics and Data Analysis, 56(4):894-914, 2012.

B. Lelandais, S. Ruan, T. Denoeux, P. Vera, I. Gardin.Fusion of multi-tracer PET images for Dose Painting.Medical Image Analysis, 18(7):1247-1259, 2014.

Thierry Denœux Dempster-Shafer theory. Application to clustering 48/ 48