
Computational Intelligence for Data Mining


Page 1: Computational Intelligence  for Data Mining

Computational Intelligence for Data Mining

Włodzisław Duch
Department of Informatics
Nicholas Copernicus University, Torun, Poland

With help from R. Adamczak, K. Grąbczewski, K. Grudziński, N. Jankowski, A. Naud

http://www.phys.uni.torun.pl/kmk

WCCI 2002, Honolulu, HI

Page 2: Computational Intelligence  for Data Mining

Group members

Page 3: Computational Intelligence  for Data Mining

Plan
What is this tutorial about?

• How to discover knowledge in data;
• how to create comprehensible models of data;
• how to evaluate new data.

1. AI, CI & Data Mining2. Forms of useful knowledge3. GhostMiner philosophy4. Exploration & Visualization5. Rule-based data analysis 6. Neurofuzzy models7. Neural models8. Similarity-based models9. Committees of models

Page 4: Computational Intelligence  for Data Mining

AI, CI & DMAI, CI & DMArtificial Intelligence: symbolic models of knowledge. • Higher-level cognition: reasoning, problem solving,

planning, heuristic search for solutions.• Machine learning, inductive, rule-based methods.• Technology: expert systems.

Computational Intelligence, Soft Computing:methods inspired by many sources: • biology – evolutionary, immune, neural computing• statistics, patter recognition• probability – Bayesian networks• logic – fuzzy, rough … Perception, object recognition.Data Mining, Knowledge Discovery in Databases.• discovery of interesting patterns, rules, knowledge. • building predictive data models.

Page 5: Computational Intelligence  for Data Mining

Forms of useful knowledge
AI/Machine Learning camp: neural nets are black boxes. Unacceptable! Symbolic rules forever.

But ... knowledge accessible to humans is in: • symbols, • similarity to prototypes, • images, visual representations.

What type of explanation is satisfactory? Interesting question for cognitive scientists. Different answers in different fields.

Page 6: Computational Intelligence  for Data Mining

Forms of knowledge

Types of explanation:

• exemplar-based: prototypes and similarity;
• logic-based: symbols and rules;
• visualization-based: maps, diagrams, relations ...

• Humans remember examples of each category and refer to such examples – as similarity-based or nearest-neighbors methods do.

• Humans create prototypes out of many examples – as Gaussian classifiers, RBF networks, neurofuzzy systems do.

• Logical rules are the highest form of summarization of knowledge.

Page 7: Computational Intelligence  for Data Mining

GhostMiner Philosophy

• There is no free lunch – provide different types of tools for knowledge discovery: decision tree, neural, neurofuzzy, similarity-based, committees.

• Provide tools for visualization of data.
• Support the process of knowledge discovery/model building and evaluation, organizing it into projects.

GhostMiner, data mining tools from our lab.
• Separate the process of model building and knowledge discovery from model use => GhostMiner Developer & GhostMiner Analyzer.

Page 8: Computational Intelligence  for Data Mining

Wine data example

Chemical analysis of wine from grapes grown in the same region in Italy, but derived from three different cultivars.
Task: recognize the source of a wine sample.
13 quantities measured, continuous features:
• alcohol content
• malic acid content
• ash content
• alkalinity of ash
• magnesium content
• total phenols content
• flavanoids content
• nonanthocyanins phenols content
• proanthocyanins phenols content
• color intensity
• hue
• OD280/D315 of diluted wines
• proline.

Page 9: Computational Intelligence  for Data Mining

Exploration and visualization
General info about the data.

Page 10: Computational Intelligence  for Data Mining

Exploration: data
Inspect the data.

Page 11: Computational Intelligence  for Data Mining

Exploration: data statistics
Distribution of feature values.

Proline has very large values, the data should be standardized before further processing.

Page 12: Computational Intelligence  for Data Mining

Exploration: data standardized
Standardized data: unit standard deviation; about 2/3 of all data should fall within [mean-std, mean+std].

Other options: normalize to fit in [-1,+1], or normalize rejecting some extreme values.
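As a quick illustration of these preprocessing options, a small sketch using scikit-learn's copy of the Wine data and its standard scalers (not the GhostMiner tools):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X, y = load_wine(return_X_y=True)

# Standardize: zero mean, unit standard deviation per feature.
X_std = StandardScaler().fit_transform(X)

# Alternative: rescale each feature to fit in [-1, +1].
X_minmax = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# Roughly 2/3 of the standardized values should fall within one std of the mean.
frac = np.mean(np.abs(X_std) <= 1.0)
print(f"Fraction of values within [mean-std, mean+std]: {frac:.2f}")
```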

Page 13: Computational Intelligence  for Data Mining

Exploration: 1D histograms
Distribution of feature values in classes.

Some features are more useful than the others.

Page 14: Computational Intelligence  for Data Mining

Exploration: 1D/3D histograms
Distribution of feature values in classes, 3D.

Page 15: Computational Intelligence  for Data Mining

Exploration: 2D projections
Projections on selected 2D feature pairs.

Page 16: Computational Intelligence  for Data Mining

Visualize data
Hard to imagine relations in more than 3D.

SOM mappings: popular for visualization, but rather inaccurate, no measure of distortions.

Measure of topographical distortions: map all X_i points from R^n to x_i points in R^m, m < n, and ask: how well are the distances R_ij = D(X_i, X_j) reproduced by the distances r_ij = d(x_i, x_j)?

Use m = 2 for visualization, use higher m for dimensionality reduction.

Page 17: Computational Intelligence  for Data Mining

Visualize data: MDS
Multidimensional scaling: invented in psychometry by Torgerson (1952), re-invented by Sammon (1969) and myself (1994) … Minimize a measure of topographical distortions by moving the x coordinates.

Three measures of topographical distortion (R_ij – original distances, r_ij – distances in the target space):

$$S_1(x) = \sum_{i>j} \big(R_{ij} - r_{ij}(x)\big)^2 \qquad \text{MDS}$$

$$S_2(x) = \frac{1}{\sum_{i>j} R_{ij}} \sum_{i>j} \frac{\big(R_{ij} - r_{ij}(x)\big)^2}{R_{ij}} \qquad \text{Sammon}$$

$$S_3(x) = \sum_{i>j} \left(1 - \frac{r_{ij}(x)}{R_{ij}}\right)^2 \qquad \text{MDS, more local}$$
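A small numerical sketch of this idea: minimize a Sammon-style stress over the 2D coordinates with NumPy/SciPy (the exact stress function and optimizer used on the slide may differ):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.optimize import minimize

def sammon_stress(x_flat, R, n_points):
    """Sammon-style stress: weighted mismatch between original and mapped distances."""
    x = x_flat.reshape(n_points, 2)
    r = squareform(pdist(x))
    iu = np.triu_indices(n_points, k=1)
    Rij, rij = R[iu], r[iu]
    return np.sum((Rij - rij) ** 2 / Rij) / np.sum(Rij)

# Toy data: X are the original high-dimensional points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 13))
R = squareform(pdist(X))                      # original distances R_ij
x0 = rng.normal(scale=1e-2, size=(50, 2))     # initial 2D coordinates

res = minimize(sammon_stress, x0.ravel(), args=(R, 50), method="L-BFGS-B")
x_mapped = res.x.reshape(50, 2)               # 2D MDS-style embedding
print("final stress:", res.fun)
```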

Page 18: Computational Intelligence  for Data Mining

Visualize data: Wine
3 clusters are clearly distinguished, 2D is fine.

The green outlier can be identified easily.

Page 19: Computational Intelligence  for Data Mining

Decision trees
Simplest things first: use a decision tree to find logical rules.
Test a single attribute, find a good point to split the data, separating vectors from different classes.
DT advantages: fast, simple, easy to understand, easy to program, many good algorithms.
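For illustration, a shallow univariate tree on the Wine data using scikit-learn's CART-style tree (a stand-in for the SSV tree discussed on the following slides):

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_wine(return_X_y=True)
feature_names = load_wine().feature_names

# A shallow univariate tree: each node tests a single attribute against a threshold.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```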

Page 20: Computational Intelligence  for Data Mining

Decision borders
Univariate trees: test the value of a single attribute, x < a.
Multivariate trees: test combinations of attributes.
Result: the feature space is divided into hyperrectangular areas.

Page 21: Computational Intelligence  for Data Mining

SSV decision tree
Separability Split Value tree: based on the separability criterion.

Define the left and right sides of the splits:

$$LS(s, f, D) = \begin{cases} \{x \in D : f(x) < s\} & f \text{ continuous} \\ \{x \in D : f(x) \in s\} & f \text{ discrete} \end{cases}$$

$$RS(s, f, D) = D - LS(s, f, D)$$

SSV criterion: separate as many pairs of vectors from different classes as possible; minimize the number of separated pairs from the same class.

$$SSV(s) = 2 \sum_{c \in C} \big|LS(s,f,D) \cap D_c\big| \cdot \big|RS(s,f,D) \cap (D - D_c)\big| \;-\; \sum_{c \in C} \min\big(|LS(s,f,D) \cap D_c|,\, |RS(s,f,D) \cap D_c|\big)$$
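A minimal sketch of evaluating this criterion for one candidate threshold on one continuous feature, reimplemented directly from the formula above (not the GhostMiner code):

```python
import numpy as np
from sklearn.datasets import load_wine

def ssv(feature, labels, s):
    """Separability Split Value of threshold s on a single continuous feature."""
    left = feature < s                      # LS(s, f, D)
    right = ~left                           # RS(s, f, D)
    separated_pairs = 0
    same_class_split = 0
    for c in np.unique(labels):
        in_c = labels == c
        # pairs of vectors from different classes separated by the split
        separated_pairs += np.sum(left & in_c) * np.sum(right & ~in_c)
        # vectors of the same class that end up on opposite sides
        same_class_split += min(np.sum(left & in_c), np.sum(right & in_c))
    return 2 * separated_pairs - same_class_split

# Example: best threshold for the proline feature (last feature in scikit-learn's ordering).
X, y = load_wine(return_X_y=True)
proline = X[:, 12]
best = max(np.unique(proline), key=lambda s: ssv(proline, y, s))
print("best proline split:", best)
```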

Page 22: Computational Intelligence  for Data Mining

SSV – complex tree
Trees may always learn to achieve 100% accuracy.

Very few vectors are left in the leaves!

Page 23: Computational Intelligence  for Data Mining

SSV – simplest tree
Pruning finds the nodes that should be removed to increase generalization – accuracy on unseen data.

Trees with 7 nodes left: 15 errors/178 vectors.

Page 24: Computational Intelligence  for Data Mining

SSV – logical rules
Trees may be converted to logical rules. The simplest tree leads to 4 logical rules:

1. if proline > 719 and flavanoids > 2.3 then class 12. if proline < 719 and OD280 > 2.115 then class 23. if proline > 719 and flavanoids < 2.3 then class 34. if proline < 719 and OD280 < 2.115 then class 3

How accurate are such rules? Not 15/178 errors, i.e. 91.5% accuracy! Run 10-fold CV and average the results: 85±10%? Run it 10 times!
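A sketch of such an estimate: repeated stratified 10-fold cross-validation of a depth-2 tree on the Wine data (scikit-learn's tree as a stand-in for SSV):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Repeat stratified 10-fold CV ten times to get a stable accuracy estimate.
scores = []
for seed in range(10):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    model = DecisionTreeClassifier(max_depth=2, random_state=0)
    scores.extend(cross_val_score(model, X, y, cv=cv))

print(f"10x10-fold CV accuracy: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```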

Page 25: Computational Intelligence  for Data Mining

SSV – optimal trees/rules
Optimal: estimate how well rules will generalize. Use stratified crossvalidation for training; use beam search for better results.
1. if OD280/D315 > 2.505 and proline > 726.5 then class 1
2. if OD280/D315 < 2.505 and hue > 0.875 and malic-acid < 2.82 then class 2
3. if OD280/D315 > 2.505 and proline < 726.5 then class 2
4. if OD280/D315 < 2.505 and hue > 0.875 and malic-acid > 2.82 then class 3
5. if OD280/D315 < 2.505 and hue < 0.875 then class 3

Note 6/178 errors, or 96.6% accuracy! Run 10-fold CV: results are 90.4±6.1%? Run it 10 times!

Page 26: Computational Intelligence  for Data Mining

Logical rules
Crisp logic rules: for continuous x use linguistic variables (predicate functions).

s_k(x) ≡ True[X_k ≤ x ≤ X'_k], for example:
small(x) = True{x | x < 1}
medium(x) = True{x | x ∈ [1,2]}
large(x) = True{x | x > 2}

Linguistic variables are used in crisp (propositional, Boolean) logic rules:

IF small-height(X) AND has-hat(X) AND has-beard(X)

THEN (X is a Brownie) ELSE IF ... ELSE ...

Page 27: Computational Intelligence  for Data Mining

Crisp logic decisions

Crisp logic is based on rectangular membership functions: True/False values jump from 0 to 1.

Step functions are used for partitioning of the feature space.

Very simple hyper-rectangular decision borders.

Severe limitation on the expressive power of crisp logical rules!

Page 28: Computational Intelligence  for Data Mining

Logical rules - advantages
Logical rules, if simple enough, are preferable.

• Rules may expose limitations of black box solutions.

• Only relevant features are used in rules.
• Rules may sometimes be more accurate than NN and other CI methods.
• Overfitting is easy to control, rules usually have a small number of parameters.
• Rules forever!?

A logical rule about logical rules is:
IF the number of rules is relatively small
AND the accuracy is sufficiently high
THEN rules may be an optimal choice.

Page 29: Computational Intelligence  for Data Mining

Logical rules - limitations
Logical rules are preferred but ...

• Only one class is predicted, p(Ci|X,M) = 0 or 1; this black-and-white picture may be inappropriate in many applications.

• A discontinuous cost function allows only non-gradient optimization.

• Sets of rules are unstable: small change in the dataset leads to a large change in structure of complex sets of rules.

• Reliable crisp rules may reject some cases as unclassified.

• Interpretation of crisp rules may be misleading.

• Fuzzy rules are not so comprehensible.

Page 30: Computational Intelligence  for Data Mining

How to use logical rules?
Data has been measured with unknown error. Assume Gaussian distribution:

$$x \to G_x = G(y; x, s_x)$$

x – a fuzzy number with Gaussian membership function.
A set of logical rules R is used for fuzzy input vectors: Monte Carlo simulations for an arbitrary system => p(Ci|X).
Analytical evaluation of p(C|X) is based on the cumulant:

$$p(x < a) = \int_{-\infty}^{a} G(y; x, s_x)\, dy = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{a - x}{s_x \sqrt{2}}\right)\right] \approx \sigma\big(\beta(a - x)\big)$$

For $\beta = 2.4/(\sqrt{2}\, s_x)$ the error function is nearly identical to the logistic function; the difference is < 0.02.
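A quick numerical check of this approximation; the slope constant 2.4/(√2·s_x) follows the reconstruction above and should be treated as an assumption:

```python
import numpy as np
from scipy.special import erf, expit

s = 1.0                               # assumed Gaussian uncertainty s_x
z = np.linspace(-6, 6, 2001)          # z = a - x
cumulative = 0.5 * (1 + erf(z / (s * np.sqrt(2))))    # p(x < a) for Gaussian input
logistic = expit(2.4 / (np.sqrt(2) * s) * z)          # logistic approximation
print("max |erf-based cumulative - logistic|:", np.max(np.abs(cumulative - logistic)))
```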

Page 31: Computational Intelligence  for Data Mining

Rules - choices
Simplicity vs. accuracy. Confidence vs. rejection rate.

Confusion matrix (rows: true class, columns: predicted class, r = rejected):
  true +:  p++   p+-   p+r
  true -:  p-+   p--   p-r

Accuracy (overall)   A(M) = p++ + p--
Error rate           L(M) = p+- + p-+
Rejection rate       R(M) = p+r + p-r = 1 - L(M) - A(M)
Sensitivity          S+(M) = p+|+ = p++ / p+
Specificity          S-(M) = p-|- = p-- / p-

p++ is a hit; p-+ a false alarm; p+- a miss.
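A small sketch computing these quantities from a confusion matrix with a reject option; the counts are hypothetical, for illustration only:

```python
# Counts n[(true, predicted)] for classes '+', '-' and the reject option 'r'
# (hypothetical numbers, for illustration only).
n = {('+', '+'): 80, ('+', '-'): 5, ('+', 'r'): 3,
     ('-', '+'): 4, ('-', '-'): 100, ('-', 'r'): 8}
total = sum(n.values())
p = {k: v / total for k, v in n.items()}          # joint probabilities p(true, predicted)

A = p[('+', '+')] + p[('-', '-')]                 # overall accuracy
L = p[('+', '-')] + p[('-', '+')]                 # error rate
R = p[('+', 'r')] + p[('-', 'r')]                 # rejection rate = 1 - A - L
p_plus = sum(v for (t, _), v in p.items() if t == '+')
p_minus = 1 - p_plus
S_plus = p[('+', '+')] / p_plus                   # sensitivity
S_minus = p[('-', '-')] / p_minus                 # specificity
print(f"A={A:.3f} L={L:.3f} R={R:.3f} S+={S_plus:.3f} S-={S_minus:.3f}")
```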

Page 32: Computational Intelligence  for Data Mining

Rules – error functions
The overall accuracy is equal to a combination of sensitivity and specificity weighted by the a priori probabilities:
A(M) = p+ S+(M) + p- S-(M)

Optimization of rules for the C+ class; a large γ means no errors but a high rejection rate:
E(M) = L(M) - A(M) = (p+- + p-+) - (p++ + p--)
min_M E(M; γ)  <=>  min_M {(1 + γ) L(M) + R(M)}

Optimization with different costs of errors:
min_M E(M; α) = min_M {p+- + α p-+}
             = min_M {p+ [1 - S+(M)] - p+r(M) + α (p- [1 - S-(M)] - p-r(M))}

ROC (Receiver Operating Characteristic) curve: p++ vs p-+, hit rate vs false-alarm rate.

Page 33: Computational Intelligence  for Data Mining

Fuzzification of rules
Rule R_a(x) = {x > a} is fulfilled by G_x with probability:

$$p\big(R_a(G_x) = T\big) = \int_a^{\infty} G(y; x, s_x)\, dy \approx \sigma\big(\beta(x - a)\big)$$

The error function is approximated by the logistic function; assuming the error distribution σ(x)(1 − σ(x)), for s² = 1.7 it approximates a Gaussian to better than 3.5%.

Rule R_ab(x) = {b > x ≥ a} is fulfilled by G_x with probability:

$$p\big(R_{ab}(G_x) = T\big) = \int_a^{b} G(y; x, s_x)\, dy \approx \sigma\big(\beta(x - a)\big) - \sigma\big(\beta(x - b)\big)$$

Page 34: Computational Intelligence  for Data Mining

Soft trapezoids and NN
The difference between two sigmoids makes a soft trapezoidal membership function.

Conclusion: fuzzy logic with σ(x − a) − σ(x − b) membership functions is equivalent to crisp logic + Gaussian uncertainty.
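A minimal sketch of such a soft trapezoidal membership function built from two logistic sigmoids; the slope parameter beta is an illustrative choice:

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid

def soft_trapezoid(x, a, b, beta=4.0):
    """Soft trapezoidal membership function: difference of two sigmoids.
    Approaches a crisp rectangular window on [a, b] as beta grows."""
    return expit(beta * (x - a)) - expit(beta * (x - b))

x = np.linspace(-2, 5, 7)
print(soft_trapezoid(x, a=0.0, b=3.0))            # high inside [0, 3], low outside
print(soft_trapezoid(x, a=0.0, b=3.0, beta=50))   # nearly rectangular (crisp) window
```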

Page 35: Computational Intelligence  for Data Mining

Optimization of rules
Fuzzy: large receptive fields, rough estimations. G_x – uncertainty of inputs, small receptive fields.
Minimization of the number of errors – difficult, non-gradient, but now Monte Carlo or analytical p(C|X;M) is available.

$$E(\{s_x\}; R) = \frac{1}{2}\sum_{X}\sum_{i}\Big(p(C_i \mid X; M) - \delta\big(C(X), C_i\big)\Big)^2$$

• Gradient optimization works for a large number of parameters.
• Parameters s_x are known for some features; use them as optimization parameters for the others!
• Probabilities instead of 0/1 rule outcomes.
• Vectors that were not classified by crisp rules now have non-zero probabilities.

Page 36: Computational Intelligence  for Data Mining

Mushrooms
The Mushroom Guide: no simple rule for mushrooms; no rule like 'leaflets three, let it be' for Poisonous Oak and Ivy.

8124 cases, 51.8% are edible, the rest non-edible. 22 symbolic attributes, up to 12 values each, equivalent to 118 logical features, or 2^118 ≈ 3·10^35 possible input vectors.

Odor: almond, anise, creosote, fishy, foul, musty, none, pungent, spicy.
Spore print color: black, brown, buff, chocolate, green, orange, purple, white, yellow.
Safe rule for edible mushrooms:
odor = (almond ∨ anise ∨ none) ∧ spore-print-color = ¬green

48 errors, 99.41% correct

This is why animals have such a good sense of smell! What does it tell us about odor receptors?

Page 37: Computational Intelligence  for Data Mining

Mushroom rules
To eat or not to eat, that is the question! Not any more ...

A mushroom is poisonous if:
R1) odor = ¬(almond ∨ anise ∨ none); 120 errors, 98.52%
R2) spore-print-color = green; 48 errors, 99.41%
R3) odor = none ∧ stalk-surface-below-ring = scaly ∧ stalk-color-above-ring = ¬brown; 8 errors, 99.90%
R4) habitat = leaves ∧ cap-color = white; no errors!

R1 + R2 are quite stable, found even with 10% of data; R3 and R4 may be replaced by other rules, ex:

R'3) gill-size = narrow ∧ stalk-surface-above-ring = (silky ∨ scaly)
R'4) gill-size = narrow ∧ population = clustered

Only 5 of 22 attributes used! Simplest possible rules? 100% in CV tests – the structure of this data is completely clear.

Page 38: Computational Intelligence  for Data Mining

Recurrence of breast cancer
Institute of Oncology, University Medical Center, Ljubljana.

286 cases, 201 no (70.3%), 85 recurrence cases (29.7%)

9 symbolic features: age (9 bins), tumor-size (12 bins), nodes involved (13 bins), degree-malignant (1,2,3), area, radiation, menopause, node-caps.
Example record: no-recurrence, 40-49, premeno, 25-29, 0-2, ?, 2, left, right_low, yes

Many systems tried, 65-78% accuracy reported. Single rule:

IF (nodes-involved ∉ [0,2]) ∧ (degree-malignant = 3) THEN recurrence, ELSE no-recurrence

77% accuracy, only trivial knowledge in the data: highly malignant cancer involving many nodes is likely to strike back.

Page 39: Computational Intelligence  for Data Mining

Neurofuzzy system

Feature Space Mapping (FSM) neurofuzzy system.
Neural adaptation, estimation of the probability density distribution (PDF) using a single hidden layer network (RBF-like) with nodes realizing separable functions:

$$G(X; P) = \prod_{i=1}^{N} G_i(X_i; P_i)$$

Fuzzy: the crisp (no/yes) membership is replaced by a degree of membership. Triangular, trapezoidal, Gaussian or other membership functions.

Membership functions in many dimensions:

Page 40: Computational Intelligence  for Data Mining

FSM

Rectangular functions: simple rules are created, many nearly equivalent descriptions of this data exist.

If proline > 929.5 then class 1 (48 cases, 45 correct + 2 recovered by other rules).

If color < 3.79285 then class 2 (63 cases, 60 correct)

Interesting rules, but overall accuracy is only 88±9%

Initialize using clusterization or decision trees.
Triangular & Gaussian functions for fuzzy rules.
Rectangular functions for crisp rules.

Between 9-14 rules with triangular membership functions are created; accuracy in 10xCV tests about 96±4.5%. Similar results are obtained with Gaussian functions.

Page 41: Computational Intelligence  for Data Mining

Prototype-based rules

P-rules have the form:
IF P = arg min_R D(X,R) THEN Class(X) = Class(P)

C-rules (crisp rules) are a special case of F-rules (fuzzy rules).
F-rules (fuzzy rules) are a special case of P-rules (prototype rules).

D(X,R) is a dissimilarity (distance) function, determining decision borders around prototype P.
P-rules are easy to interpret!
IF X = You are most similar to P = Superman THEN You are in the Super-league.
IF X = You are most similar to P = Weakling THEN You are in the Failed-league.
"Similar" may involve different features or D(X,P).
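A minimal nearest-prototype (P-rule) classifier sketch, using class means as prototypes and a choice of distance function; this illustrates the rule form only, not the actual prototype-selection method:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

# One prototype per class: the class mean (a simple choice for illustration).
prototypes = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def p_rule(x, prototypes, metric="euclidean"):
    """IF P = argmin_R D(X, R) THEN Class(X) = Class(P)."""
    if metric == "euclidean":
        d = np.linalg.norm(prototypes - x, axis=1)
    elif metric == "manhattan":
        d = np.abs(prototypes - x).sum(axis=1)
    else:  # Chebyshev: gives rectangular decision contours
        d = np.abs(prototypes - x).max(axis=1)
    return np.argmin(d)

pred = np.array([p_rule(x, prototypes) for x in X])
print("errors:", np.sum(pred != y), "of", len(y))
```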

Page 42: Computational Intelligence  for Data Mining

P-rules
Euclidean distance leads to Gaussian fuzzy membership functions + product as the T-norm.

Manhattan function => (X;P)=exp{|X-P|}Various distance functions lead to different MF.Ex. data-dependent distance functions, for symbolic data:

Weighted distance functions and the corresponding membership functions:

$$D(X, P)^2 = \sum_i d(X_i, P_i)^2 = \sum_i W_i (X_i - P_i)^2$$

$$\mu(X; P) = e^{-D(X,P)^2} = \prod_i e^{-W_i (X_i - P_i)^2}$$

Value Difference Metric (VDM) and PDF-based distance functions for symbolic features:

$$D_{VDM}(X, Y) = \sum_j \sum_i \big|\, p(C_i \mid X_j) - p(C_i \mid Y_j) \,\big|$$

$$D_{PDF}(X, Y) = \sum_j \sum_i \big|\, p(X_j \mid C_i) - p(Y_j \mid C_i) \,\big|$$
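A sketch of the Value Difference Metric for symbolic data, estimating the conditional class probabilities by counting; the toy data are made up for illustration:

```python
import numpy as np

def vdm_distance(x, y, X, labels):
    """Value Difference Metric between two symbolic vectors x and y,
    estimated from the data X (N samples x features) and class labels."""
    classes = np.unique(labels)
    d = 0.0
    for j in range(X.shape[1]):
        for c in classes:
            # p(C=c | feature j takes the given value), estimated by counting
            p_x = np.mean(labels[X[:, j] == x[j]] == c) if np.any(X[:, j] == x[j]) else 0.0
            p_y = np.mean(labels[X[:, j] == y[j]] == c) if np.any(X[:, j] == y[j]) else 0.0
            d += abs(p_x - p_y)
    return d

# Toy symbolic data: 2 features taking values {0, 1, 2}, binary labels.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 2))
labels = (X[:, 0] == X[:, 1]).astype(int)
print(vdm_distance(X[0], X[1], X, labels))
```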

Page 43: Computational Intelligence  for Data Mining

Promoters
DNA strings, 57 nucleotides, 53 + and 53 - samples, e.g.:
tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt

Euclidean distance, symbolic s =a, c, t, g replaced by x=1, 2, 3, 4

PDF distance, symbolic s=a, c, t, g replaced by p(s|+)

Page 44: Computational Intelligence  for Data Mining

P-rules
New distance functions from info theory => interesting MF.

MF => new distance function, with local D(X,R) for each cluster.

Crisp logic rules: use the L∞ norm:

D_Ch(X, P) = ||X - P||_∞ = max_i W_i |X_i - P_i|

D_Ch(X, P) = const => rectangular contours.

Chebyshev distance with a threshold θ_P:

IF D_Ch(X, P) ≤ θ_P THEN C(X) = C(P)

is equivalent to a conjunctive crisp rule

IF X_1 ∈ [P_1 - θ_P/W_1, P_1 + θ_P/W_1] ∧ … ∧ X_N ∈ [P_N - θ_P/W_N, P_N + θ_P/W_N] THEN C(X) = C(P)
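A small numerical check of this equivalence between the weighted Chebyshev-ball rule and the conjunctive interval rule, as reconstructed above (prototype, weights and threshold are arbitrary illustrative values):

```python
import numpy as np

def chebyshev_rule(x, p, w, theta):
    """IF weighted Chebyshev distance D_Ch(x, p) <= theta THEN class of p."""
    return np.max(w * np.abs(x - p)) <= theta

def interval_rule(x, p, w, theta):
    """Equivalent conjunctive crisp rule: every X_i inside [P_i - theta/W_i, P_i + theta/W_i]."""
    return np.all((x >= p - theta / w) & (x <= p + theta / w))

rng = np.random.default_rng(0)
p = np.array([1.0, -0.5, 2.0])      # prototype
w = np.array([1.0, 2.0, 0.5])       # feature weights
theta = 1.0                         # threshold
for _ in range(1000):
    x = rng.uniform(-4, 4, size=3)
    assert chebyshev_rule(x, p, w, theta) == interval_rule(x, p, w, theta)
print("Chebyshev-ball rule and conjunctive interval rule agree on all samples.")
```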

Page 45: Computational Intelligence  for Data Mining

Decision borders

Euclidean distance from 3 prototypes, one per class.

Minkowski (α = 20) distance from 3 prototypes.

D(P,X)=const and decision borders D(P,X)=D(Q,X).

Page 46: Computational Intelligence  for Data Mining

P-rules for Wine
Manhattan distance: 6 prototypes kept, 4 errors, f2 removed.
Chebyshev distance: 15 prototypes kept, 5 errors, f2, f8, f10 removed.
Euclidean distance: 11 prototypes kept, 7 errors.

Many other solutions.

Page 47: Computational Intelligence  for Data Mining

Neural networks
• MLP – Multilayer Perceptrons, the most popular NN models. Use soft hyperplanes for discrimination. Results are difficult to interpret, complex decision borders. Prediction, approximation: infinite number of classes.

• RBF – Radial Basis Functions. RBF with Gaussian functions are equivalent to fuzzy systems with Gaussian membership functions, but …
No feature selection => complex rules.
Other radial functions => not separable!
Use separable functions, not radial => FSM.

• Many methods to convert MLP NN to logical rules.

Page 48: Computational Intelligence  for Data Mining

Rules from MLPs
Why is it difficult?
Multi-layer perceptron (MLP) networks: stack many perceptron units, performing threshold logic:

M-of-N rule: IF (M conditions of N are true) THEN ...

Problem: for N inputs the number of subsets is 2^N. Exponentially growing number of possible conjunctive rules.
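A tiny sketch of a threshold-logic unit implementing an M-of-N rule (unit weights, threshold M); an illustration only, not the MLP2LN procedure:

```python
import numpy as np

def m_of_n(conditions, m):
    """Perceptron-style threshold unit: fire if at least m of the binary conditions hold.
    Equivalent to a weighted sum with unit weights compared to the threshold m."""
    conditions = np.asarray(conditions, dtype=int)
    return int(conditions.sum() >= m)

# IF at least 2 of the 3 conditions are true THEN ...
print(m_of_n([1, 0, 1], m=2))   # 1 (rule fires)
print(m_of_n([1, 0, 0], m=2))   # 0 (rule does not fire)
```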

Page 49: Computational Intelligence  for Data Mining

MLP2LN
Converts MLP neural networks into a network performing logical operations (LN).

Input layer

Aggregation: better features

Output: one node per class.

Rule units: threshold logic

Linguistic units: windows, filters

Page 50: Computational Intelligence  for Data Mining

MLP2LN training
Constructive algorithm: add as many nodes as needed.

Optimize the cost function:
minimize errors +
enforce zero connections +
leave only +1 and -1 weights; this makes interpretation easy.

$$E(W) = \frac{1}{2}\sum_{p}\sum_{i}\big(F_i(X^p; W) - t_i^p\big)^2 + \frac{\lambda_1}{2}\sum_{i,j} W_{ij}^2 + \frac{\lambda_2}{2}\sum_{i,j} W_{ij}^2\,(W_{ij} - 1)^2\,(W_{ij} + 1)^2$$
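A sketch of the two regularization terms of this cost function, showing how they favor weights near 0, +1 and -1 (the lambda values are arbitrary illustrative choices):

```python
import numpy as np

def mlp2ln_penalty(W, lam1=1e-3, lam2=1e-2):
    """Regularization terms of the MLP2LN cost function (sketch):
    lam1 pushes weights toward zero, lam2 pushes the remaining ones toward +1 or -1."""
    W = np.asarray(W)
    l1_term = 0.5 * lam1 * np.sum(W ** 2)
    l2_term = 0.5 * lam2 * np.sum(W ** 2 * (W - 1) ** 2 * (W + 1) ** 2)
    return l1_term + l2_term

# Weights near {-1, 0, +1} are barely penalized; intermediate values are.
print(mlp2ln_penalty([0.0, 1.0, -1.0]))   # close to the minimum of the penalty
print(mlp2ln_penalty([0.5, -0.4, 0.7]))   # noticeably larger
```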

Page 51: Computational Intelligence  for Data Mining

L-units
Create linguistic variables.

$$L(x) = S_1\,\sigma(W_1 x + b) + S_2\,\sigma(W_2 x + b')$$

Numerical representation for R-nodes:
V_sk = (...) for s_k = low
V_sk = (...) for s_k = normal

L-units: 2 thresholds as adaptive parameters; logistic σ(x) or tanh(x) ∈ [-1, +1]. Soft trapezoidal functions change into rectangular filters (Parzen windows). 4 types, depending on the signs S_i.

A product of bi-central functions is a logical rule, used by the IncNet NN.

Page 52: Computational Intelligence  for Data Mining

Iris example
Network after training:

iris setosa: q=1 (0,0,0; 0,0,0; +1,0,0; +1,0,0)
iris versicolor: q=2 (0,0,0; 0,0,0; 0,+1,0; 0,+1,0)
iris virginica: q=1 (0,0,0; 0,0,0; 0,0,+1; 0,0,+1)

Rules:

If (x3 = s ∧ x4 = s) then setosa
If (x3 = m ∧ x4 = m) then versicolor
If (x3 = l ∧ x4 = l) then virginica

3 errors only (98%).

Page 53: Computational Intelligence  for Data Mining

Learning dynamics
Decision regions shown every 200 training epochs in x3, x4 coordinates; borders are optimally placed with wide margins.

Page 54: Computational Intelligence  for Data Mining

Thyroid screening

Garavan Institute, Sydney, Australia.
15 binary, 6 continuous features.
Training: 93+191+3488; Validate: 73+177+3178.

Determine important clinical factors. Calculate the probability of each diagnosis.

[Network diagram: clinical findings (age, sex, …) and lab tests (TSH, T4U, T3, TT4, TBG) feed hidden units producing the final diagnoses: normal, hyperthyroid, hypothyroid.]

Page 55: Computational Intelligence  for Data Mining

Thyroid – some results
Accuracy of diagnoses obtained with several systems – rules are accurate.

Method                    Rules/Features   Training %   Test %
MLP2LN optimized          4/6              99.9         99.36
CART/SSV Decision Trees   3/5              99.8         99.33
Best Backprop MLP         -/21             100          98.5
Naïve Bayes               -/-              97.0         96.1
k-nearest neighbors       -/-              -            93.8

Page 56: Computational Intelligence  for Data Mining

Psychometry
Use CI to find knowledge, create Expert System.

MMPI (Minnesota Multiphasic Personality Inventory) psychometric test. Printed forms are scanned, or a computerized version of the test is used.

• Raw data: 550 questions, e.g.: "I am getting tired quickly": Yes - Don't know - No.

• Results are combined into 10 clinical scales and 4 validity scales using fixed coefficients.

• Each scale measures tendencies towards hypochondria, schizophrenia, psychopathic deviations, depression, hysteria, paranoia etc.

Page 57: Computational Intelligence  for Data Mining

Psychometry: goal
• There is no simple correlation between single values and the final diagnosis.
• Results are displayed in the form of a histogram, called 'a psychogram'. Interpretation depends on the experience and skill of an expert and takes into account correlations between peaks.

Goal: an expert system providing evaluation and interpretation of MMPI tests at an expert level.

Problem: agreement between experts only about 70% of the time; alternative diagnosis and personality changes over time are important.

Page 58: Computational Intelligence  for Data Mining

Psychometric data
1600 cases for women, the same number for men.
27 classes: norm, psychopathic, schizophrenia, paranoia, neurosis, mania, simulation, alcoholism, drug addiction, criminal tendencies, abnormal behavior due to ...

Extraction of logical rules: 14 scales = features.

Define linguistic variables and use FSM, MLP2LN, SSV - giving about 2-3 rules/class.

Page 59: Computational Intelligence  for Data Mining

Psychometric results

10-CV for FSM is 82-85%, for C4.5 it is 79-84%. Input uncertainty +Gx around 1.5% (best ROC) improves FSM results to 90-92%.

Method   Data   N. rules   Accuracy   +Gx %
C4.5     ♀      55         93.0       93.7
C4.5     ♂      61         92.5       93.1
FSM      ♀      69         95.4       97.6
FSM      ♂      98         95.9       96.9

Page 60: Computational Intelligence  for Data Mining

Psychometric Expert
Probabilities for different classes. For greater uncertainties more classes are predicted.

Fitting the rules to the conditions: typically 3-5 conditions per rule; Gaussian distributions around measured values that fall into the rule interval are shown in green.

Verbal interpretation of each case, rule and scale dependent.

Page 61: Computational Intelligence  for Data Mining

Visualization
Probability of classes versus input uncertainty.

Detailed input probabilities around the measured values vs. change in the single scale; changes over time define the 'patient's trajectory'.

Interactive multidimensional scaling: zooming on the new case to inspect its similarity to other cases.

Page 62: Computational Intelligence  for Data Mining

Summary
Computational intelligence methods: neural, decision trees, similarity-based & others help to understand the data.
Understanding data is achieved by rules, prototypes, visualization.

Small is beautiful => simple is the best!Simplest possible, but not simpler - regularization of models; accurate but not too accurate - handling of uncertainty; high confidence, but not paranoid - rejecting some cases.

• Challenges: hierarchical systems, discovery of theories rather than data models, integration with image/signal analysis, reasoning in complex domains/objects, applications in bioinformatics, text analysis ...

Page 63: Computational Intelligence  for Data Mining

References

Many papers and comparisons of results for numerous datasets are kept at:

http://www.phys.uni.torun.pl/kmk

See also my homepage at:

http://www.phys.uni.torun.pl/~duch

for this and other presentations and some papers.

We are slowly getting there. All this and more is included in GhostMiner, data mining software (in collaboration with Fujitsu) just released …

http://www.fqspl.com.pl/ghostminer/