Attribute Interactions in Medical Data Analysis

A. Jakulin¹, I. Bratko¹,², D. Smrke³, J. Demšar¹, B. Zupan¹,²,⁴

1. University of Ljubljana, Slovenia.
2. Jožef Stefan Institute, Ljubljana, Slovenia.
3. Dept. of Traumatology, University Clinical Center, Ljubljana, Slovenia.
4. Dept. of Human and Mol. Genetics, Baylor College of Medicine, USA.
Overview

1. Interactions:
   – Correlation can be generalized to more than 2 attributes, to capture interactions: higher-order regularities.
2. Information theory:
   – A non-parametric approach to measuring 'association' and 'uncertainty'.
3. Applications:
   – Automatic selection of informative visualizations that uncover previously unseen structure in medical data.
   – Automatic constructive induction of new features.
4. Results:
   – Better predictive models for hip arthroplasty.
   – Better understanding of the data.
Attribute Dependencies

[Diagram: two attributes A and B (features) and the label C (outcome, diagnosis). The edges A–C and B–C mark the importance of attribute A and of attribute B; the edge A–B marks attribute correlation. These pairwise edges are the 2-way interactions.]

3-way interaction: what is common to A, B and C together, and cannot be inferred from any pair of attributes.
Shannon's Entropy

• H(C): entropy of C, given C's empirical probability distribution (e.g. p = [0.2, 0.8]).
• H(A): information which came with knowledge of A.
• H(A,C): joint entropy of A and C.
• I(A;C) = H(A) + H(C) - H(A,C): mutual information or information gain --- how much do A and C have in common?
• H(C|A) = H(C) - I(A;C): conditional entropy --- the uncertainty remaining in C after knowing A.
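As a sketch of these definitions, entropy, mutual information and conditional entropy can be estimated from empirical frequencies in pure Python (the toy arrays below are made up to match the p = [0.2, 0.8] example on the slide):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Empirical Shannon entropy H(X), in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(a, c):
    """I(A;C) = H(A) + H(C) - H(A,C)."""
    return entropy(a) + entropy(c) - entropy(list(zip(a, c)))

def conditional_entropy(c, a):
    """H(C|A) = H(C) - I(A;C): uncertainty left in C once A is known."""
    return entropy(c) - mutual_information(a, c)

# C with the slide's empirical distribution p = [0.2, 0.8];
# A is a copy of C, so knowing A removes all uncertainty about C.
C = [1, 0, 0, 0, 0] * 2
A = list(C)
print(round(entropy(C), 3))                # 0.722 bits
print(round(mutual_information(A, C), 3))  # 0.722: everything in common
print(round(conditional_entropy(C, A), 3)) # 0.0
```

With a less informative A, I(A;C) shrinks toward 0 and H(C|A) grows back toward H(C).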
Interaction Information

I(A;B;C) := I(AB;C) - I(A;C) - I(B;C)
          = I(A;B|C) - I(A;B)

• Interaction information can be:
  – NEGATIVE: redundancy among attributes (negative interaction)
  – NEGLIGIBLE: no interaction
  – POSITIVE: synergy between attributes (positive interaction)
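The definition expands into joint entropies, which makes it easy to compute directly. A minimal sketch on synthetic data: XOR yields a positive interaction, while three copies of one attribute yield a negative one.

```python
from collections import Counter
from math import log2

def H(*columns):
    """Joint Shannon entropy of one or more columns, in bits."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())

def interaction_information(a, b, c):
    """I(A;B;C) = I(AB;C) - I(A;C) - I(B;C), written out as entropies."""
    return (-H(a) - H(b) - H(c)
            + H(a, b) + H(a, c) + H(b, c)
            - H(a, b, c))

# XOR: A and B are each independent of C, but together determine it.
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
C = [a ^ b for a, b in zip(A, B)]
print(interaction_information(A, B, C))  # 1.0: synergy (positive)
print(interaction_information(A, A, A))  # -1.0: pure redundancy (negative)
```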
History of Interaction Information

(Partial) history of independent reinventions:
• McGill '54 (Psychometrika): interaction information
• Han '80 (Information & Control): multiple mutual information
• Yeung '91 (IEEE Trans. Inf. Theory): mutual information
• Grabisch & Roubens '99 (game theory): Banzhaf interaction index
• Matsuda '00 (Physical Review E): higher-order mutual information
• Brenner et al. '00 (Neural Computation): average synergy
• Demšar '02 (machine learning): relative information gain
• Bell '03 (NIPS02, ICA2003): co-information
• Jakulin '03 (machine learning): interaction gain
Utility of Interaction Information

1. Visualization of interactions in data:
   • Interaction graphs, dendrograms.
2. Construction of predictive models:
   • Feature construction, combination, selection.

Case studies:
• Predicting the success of hip arthroplasty (HHS).
• Predicting the contraception method used from demographic data (CMC).

Predictive modeling helps us focus only on the interactions that involve the outcome.
Interaction Matrix for CMC Domain

[Matrix plot] Illustrates the interaction information for all pairs of attributes: red marks positive, blue negative, and green independent pairs; the diagonal shows each attribute's information gain.
Interaction Graphs

Information gain:
  100% × I(A;C) / H(C) --- the attribute "explains" 1.98% of the label entropy.

A positive interaction:
  100% × I(A;B;C) / H(C) --- the two attributes are in a synergy: treating them holistically may result in 1.85% extra uncertainty explained.

A negative interaction:
  100% × I(A;B;C) / H(C) --- the two attributes are slightly redundant: 1.15% of the label uncertainty is explained by each of the two attributes.
Interaction Dendrogram

[Dendrogram legend: cluster "tightness" ranges from loose to tight; color encodes information gain; attributes range from uninformative to informative, and clusters from weakly interacting to strongly interacting.]
Interpreting the Dendrogram

[Annotated dendrogram, marking:
• an unimportant interaction
• a positive interaction
• a cluster of negatively interacting attributes
• a weakly negative interaction
• a useless attribute]
Application to the Harris hip score prediction (HHS)
Attribute Structure for HHS

[Two attribute structures side by side: one discovered from data, one designed by the physician; both cover concepts such as late complications and rehabilitation.]

"Bipolar endoprosthesis and short duration of operation significantly increase the chances of a good outcome."

"Presence of neurological disease is a high risk factor only in the presence of other complications during the operation."
A Positive Interaction

Both attributes are useless alone, but useful together. They should be combined into a single feature (e.g. with a classification tree, a rule, or a Cartesian product attribute).

These two attributes are also correlated: correlation does not imply redundancy.
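As a sketch of such feature construction, the Cartesian product of two individually uninformative attributes can carry all the label information (synthetic XOR-style data, not the actual CMC attributes):

```python
from collections import Counter
from math import log2

def info_gain(attr, label):
    """I(A;C) = H(A) + H(C) - H(A,C), in bits."""
    def H(vals):
        n = len(vals)
        return -sum((c / n) * log2(c / n) for c in Counter(vals).values())
    return H(list(attr)) + H(list(label)) - H(list(zip(attr, label)))

A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
C = [a ^ b for a, b in zip(A, B)]   # label depends on A and B only jointly

AB = list(zip(A, B))                # Cartesian product attribute
print(info_gain(A, C))              # 0.0: useless alone
print(info_gain(B, C))              # 0.0: useless alone
print(info_gain(AB, C))             # 1.0: together they fully explain C
```

A feature-selection method scoring A and B one at a time would discard both; only the combined feature reveals their joint value.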
A Negative Interaction

Once we know the wife's or the husband's education, the other attribute will not provide much new information. But it does provide some, if you know how to use it! Feature combination may work; feature selection throws data away. (Note that some attribute-value combinations have very few instances.)
Prediction of HHS

Brier score, a probabilistic evaluation (K classes, N instances):
Models:
• Tree-Augmented NBC: 0.227 ± 0.018
• Naïve Bayesian classifier: 0.223 ± 0.014
• General Bayesian net: 0.208 ± 0.006
• Simple feature selection with NBC: 0.196 ± 0.012
• FSS with background concepts: 0.196 ± 0.011
• 10 top interactions →
  – FSS: 0.189 ± 0.011
  – Tree-Augmented NB: 0.207 ± 0.017
  – Search for feature combinations: 0.185 ± 0.012
BS(p, p̂) = (1/N) · Σ_{i=1..N} Σ_{j=1..K} (p_{i,j} - p̂_{i,j})²
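The Brier score is a direct transcription of its definition, assuming the common 1/N normalization; the two toy instances below are made up.

```python
def brier_score(true_dists, pred_dists):
    """BS = (1/N) * sum_i sum_j (p_ij - phat_ij)^2 over N instances
    and K classes; 0 is perfect, larger is worse."""
    n = len(true_dists)
    return sum(
        sum((p - q) ** 2 for p, q in zip(pi, qi))
        for pi, qi in zip(true_dists, pred_dists)
    ) / n

# Two instances, three classes; the true distributions are one-hot labels.
truth = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
preds = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]]
print(round(brier_score(truth, preds), 3))  # 0.15
```

Unlike accuracy, the score rewards well-calibrated probabilities, which is why it is used to compare the probabilistic models above.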
The Best Model

[Model diagram; node size indicates an attribute's information.]

These two (not very logical) combinations of features are only worth a 0.2% loss in performance. The endoprosthesis and operation-duration interaction provides little information that would not already be provided by these attributes: it interacts negatively with the model.
A Causal Diagram

[Causal diagram for HHS, with nodes: pulmonary disease, loss of consciousness, sitting ability, diabetes, neurological disease, hospitalization duration, injury-operation time, luxation, late luxation. Legend: moderator, effect, cause.]
Summary

1. Visualization methods attempt to:
   • Summarize the relationships between attributes in the data (interaction graph, interaction dendrogram, interaction matrix).
   • Assist the user in exploring the domain and constructing classification models (interactive interaction analysis).
2. What to do with interactions:
   • Do make use of interactions! (rules, trees, dependency models)
     – Myopia: the naïve Bayesian classifier, linear SVM, perceptron, feature selection and discretization miss them.
   • Do not assume an interaction when there isn't one!
     – Fragmentation: classification trees, rules, general Bayesian networks and TAN needlessly split the data.