
Attribute Interactions in Medical Data Analysis



A. Jakulin (1), I. Bratko (1,2), D. Smrke (3), J. Demšar (1), B. Zupan (1,2,4)

1. University of Ljubljana, Slovenia.
2. Jožef Stefan Institute, Ljubljana, Slovenia.
3. Dept. of Traumatology, University Clinical Center, Ljubljana, Slovenia.
4. Dept. of Human and Mol. Genetics, Baylor College of Medicine, USA.

Overview

1. Interactions: correlation can be generalized to more than 2 attributes, to capture interactions, i.e. higher-order regularities.
2. Information theory: a non-parametric approach for measuring ‘association’ and ‘uncertainty’.
3. Applications:
   • Automatic selection of informative visualizations uncovers previously unseen structure in medical data.
   • Automatic constructive induction of new features.
4. Results:
   • Better predictive models for hip arthroplasty.
   • Better understanding of the data.

Attribute Dependencies

[Diagram: the label C (outcome, diagnosis) and two attributes A and B (features). The 2-way interactions are the importance of attribute A, the importance of attribute B, and the correlation between the attributes. The 3-way interaction is what is common to A, B and C together, and cannot be inferred from pairs of attributes.]

Shannon’s Entropy

[Diagram of Shannon’s information measures:]

• H(C): entropy, given C’s empirical probability distribution (here p = [0.2, 0.8]).
• H(A): the information which came with knowledge of A.
• H(AB): joint entropy.
• I(A;C) = H(A) + H(C) − H(AC): mutual information or information gain; how much do A and C have in common?
• H(C|A) = H(C) − I(A;C): conditional entropy; the remaining uncertainty in C after knowing A.
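A minimal sketch of these quantities in Python (an illustration, not the authors' code): empirical entropy and mutual information estimated from discrete samples.

    import numpy as np
    from collections import Counter

    def entropy(*columns):
        # H of the joint empirical distribution of one or more discrete columns, in bits.
        counts = np.array(list(Counter(zip(*columns)).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def mutual_information(a, c):
        # I(A;C) = H(A) + H(C) - H(AC)
        return entropy(a) + entropy(c) - entropy(a, c)

    # Toy samples for illustration; p = [0.2, 0.8] gives H(C) of about 0.72 bits.
    c = [0, 1, 1, 1, 1]
    a = [0, 1, 1, 0, 1]
    print(entropy(c))               # ~0.722
    print(mutual_information(a, c))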

Interaction Information

Interaction information can be:

• NEGATIVE – redundancy among the attributes (a negative interaction)
• NEGLIGIBLE – no interaction
• POSITIVE – synergy between the attributes (a positive interaction)

I(A;B;C) := I(AB;C) − I(B;C) − I(A;C) = I(A;B|C) − I(A;B)
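A sketch of the definition above, reusing the entropy() helper from the previous snippet (again an illustration, not the authors' implementation):

    def interaction_information(a, b, c):
        # I(A;B;C) := I(AB;C) - I(B;C) - I(A;C)
        i_ab_c = entropy(a, b) + entropy(c) - entropy(a, b, c)
        i_a_c = entropy(a) + entropy(c) - entropy(a, c)
        i_b_c = entropy(b) + entropy(c) - entropy(b, c)
        return i_ab_c - i_b_c - i_a_c
    # Positive: synergy. Negative: redundancy. Near zero: no interaction.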

History of Interaction Information

(Partial) history of independent reinventions:

• McGill ‘54 (Psychometrika) – interaction information
• Han ‘80 (Information & Control) – multiple mutual information
• Yeung ‘91 (IEEE Trans. Inf. Theory) – mutual information
• Grabisch & Roubens ‘99 (game theory) – Banzhaf interaction index
• Matsuda ‘00 (Physical Review E) – higher-order mutual information
• Brenner et al. ‘00 (Neural Computation) – average synergy
• Demšar ’02 (machine learning) – relative information gain
• Bell ‘03 (NIPS02, ICA2003) – co-information
• Jakulin ’03 (machine learning) – interaction gain

Utility of Interaction Information

1. Visualization of interactions in data
   • Interaction graphs, dendrograms
2. Construction of predictive models
   • Feature construction, combination, selection

Case studies:
• Predicting the success of hip arthroplasty (HHS).
• Predicting the contraception method used from demographic data (CMC).

Predictive modeling helps us focus only on interactions that involve the outcome.

Interaction Matrix for CMC Domain

[Matrix plot illustrating the interaction information for all pairs of attributes: red – positive, blue – negative, green – independent; also marked is each attribute’s information gain.]

Interaction Graphs

• Information gain, 100% × I(A;C)/H(C): the attribute “explains” 1.98% of the label entropy.
• A positive interaction, 100% × I(A;B;C)/H(C): the two attributes are in a synergy; treating them holistically may result in 1.85% extra uncertainty explained.
• A negative interaction, 100% × I(A;B;C)/H(C): the two attributes are slightly redundant; 1.15% of the label uncertainty is explained by each of the two attributes.
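The percentages on the graph's nodes and edges follow directly from the helpers above; a sketch (the function names are mine, not from the talk):

    def node_label(a, c):
        # Node: information gain as a share of label entropy, 100% * I(A;C) / H(C).
        return 100.0 * mutual_information(a, c) / entropy(c)

    def edge_label(a, b, c):
        # Edge: 100% * I(A;B;C) / H(C); the sign separates synergy from redundancy.
        return 100.0 * interaction_information(a, b, c) / entropy(c)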

Interaction Dendrogram

[Dendrogram legend: cluster “tightness” runs from loose (weakly interacting attributes) to tight (strongly interacting attributes); attributes are marked by information gain, from uninformative to informative.]

Interpreting the Dendrogram

[Annotated dendrogram, pointing out: an unimportant interaction, a positive interaction, a cluster of negatively interacting attributes, a weakly negative interaction, and a useless attribute.]
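One plausible way to compute such a dendrogram (the talk does not spell out its dissimilarity measure, so the one below, derived from interaction information, is an assumption):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.spatial.distance import squareform

    def interaction_dendrogram(attributes, names, c):
        # Turn pairwise interaction strength into a distance:
        # strongly interacting attributes end up in tight clusters.
        n = len(attributes)
        d = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                s = abs(interaction_information(attributes[i], attributes[j], c))
                d[i, j] = d[j, i] = 1.0 / (1.0 + s)
        dendrogram(linkage(squareform(d), method="average"), labels=names)
        plt.show()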

Application to the Harris Hip Score Prediction (HHS)

Attribute Structure for HHS

[Two attribute structures shown side by side: one discovered from the data, one designed by the physician; among the concepts are late complications and rehabilitation.]

“Bipolar endoprosthesis and short duration of operation significantly increase the chances of a good outcome.”

“Presence of neurological disease is a high risk factor only in the presence of other complications during the operation.”

A Positive Interaction

Both attributes are useless alone, but useful together. They should be combined into a single feature, e.g. with a classification tree, a rule, or a Cartesian product attribute, as sketched below.

These two attributes are also correlated: correlation does not imply redundancy.
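A minimal sketch of the Cartesian product option (the helper is hypothetical, not from the talk): every observed value pair becomes one value of a new, single feature.

    def cartesian_product_feature(a, b):
        # Map each observed (a, b) value pair to one value of a new feature.
        values = {pair: k for k, pair in enumerate(sorted(set(zip(a, b))))}
        return [values[pair] for pair in zip(a, b)]

    # A myopic learner (e.g. naive Bayes) sees the synergy only through
    # the combined feature, not through a and b separately.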

A Negative Interaction

Once we know the wife’s or the husband’s education, the other attribute will not provide much new information.

But they do provide some, if you know how to use it! Feature combination may work; feature selection throws data away. (Beware: some value combinations have very few instances!)

Prediction of HHS

Brier score - probabilistic evaluation (K classes, N instances):

$$\mathrm{BS}(p,\hat{p}) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{K}\left(p_{i,j}-\hat{p}_{i,j}\right)^{2}$$

Models:
• Tree-Augmented NBC: 0.227 ± 0.018
• Naïve Bayesian classifier: 0.223 ± 0.014
• General Bayesian net: 0.208 ± 0.006
• Simple feature selection with NBC: 0.196 ± 0.012
• FSS with background concepts: 0.196 ± 0.011
• 10 top interactions → FSS: 0.189 ± 0.011
  – Tree-Augmented NB: 0.207 ± 0.017
  – Search for feature combinations: 0.185 ± 0.012
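A sketch of the Brier score as reconstructed above, with p the observed class indicators and p_hat the predicted class probabilities, both of shape (N, K):

    import numpy as np

    def brier_score(p, p_hat):
        # Mean over instances of the squared error summed over the K classes.
        return float(np.mean(np.sum((np.asarray(p, dtype=float) - np.asarray(p_hat)) ** 2, axis=1)))

    # Two instances, two classes:
    p = [[1, 0], [0, 1]]
    p_hat = [[0.8, 0.2], [0.3, 0.7]]
    print(brier_score(p, p_hat))  # 0.13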

The Best Model

[Diagram of the best model, annotated with each attribute’s information.]

These two (not very logical) combinations of features are only worth a 0.2% loss in performance.

The endoprosthesis and operation duration interaction provides little information that wouldn’t already be provided by these attributes: it interacts negatively with the model.

A Causal Diagram

[Causal diagram for HHS: nodes for pulmonary disease, loss of consciousness, sitting ability, diabetes, neurological disease, hospitalization duration, injury, operation time, luxation and late luxation, with arrows distinguishing causes, effects and moderators.]

Orange

Summary

1. Visualization methods attempt to:
   • Summarize the relationships between attributes in data (interaction graph, interaction dendrogram, interaction matrix).
   • Assist the user in exploring the domain and constructing classification models (interactive interaction analysis).
2. What to do with interactions:
   • Do make use of interactions! (rules, trees, dependency models)
     – Myopia: naïve Bayesian classifier, linear SVM, perceptron, feature selection, discretization.
   • Do not assume an interaction when there isn’t one!
     – Fragmentation: classification trees, rules, general Bayesian networks, TAN.