“To Fuse or Not to Fuse: Cognitive Diversity for Combining Multiple Scoring Systems”

Frank Hsu

Fordham University

IBM Cognitive System Institute Group (CSIG),

Dec. 17, 2015

To rank a list of choices (subjects, objects, items, options, …)

Areas: Biomedical and Health; STEM Areas; Society and Social Choices; Business and Finance

• Genes, ligands, or DNA fragments in Biomedical Science
• Targets, documents, trajectories, or host names in Technology or Engineering
• Movies, books, apartments, skaters, or sports teams in Social Network or Social Choices
• Customers, vendors, corporate risks, or stocks in Business and Finance
• Labels and degree of stress in classification and affective computing, respectively

Each choice (or option) has (or can be described by) a set of variables: attributes, criteria, cues, features, indicators, judges, parameters, …

Variables A and B, with combinations C and D:

C = SC(A, B) (score combination)
D = RC(A, B) (rank combination)

Scoring Systems: each of A, B, C, and D assigns a score and a rank (sA, rA; sB, rB; sC, rC; sD, rD) to every choice d1, d2, …, di, …, dn.

Domain Examples:

• Active Search in Chemical Space
• Internet Search Strategy
• Figure Skating Judgment
• Crossing the Street

Combining Multiple Scoring Systems (MSS) to rank a group of skaters:

(a) Scores and Score Combination (SC)

      J1     J2     J3      SC       Final Rank
d1    8.5    7      9.7     25.2     4
d2    7.6    8.4    9.6     25.6     3
d3    8.3    5.6    9.75    23.65    7
d4    6.4    5.4    9.81    21.61    8
d5    9.4    7.8    9.68    26.88    2
d6    9.5    8.5    9.2     27.2     1
d7    7.9    6.3    10      24.2     6
d8    10     10     5.1     25.1     5

(b) Ranks and Rank Combination (RC)

      J1     J2     J3      RC       Final Rank
d1    4      5      4       13       4.5
d2    7      3      6       16       7
d3    5      7      3       15       6
d4    8      8      2       18       8
d5    3      4      5       12       3
d6    2      2      7       11       2
d7    6      6      1       13       4.5
d8    1      1      8       10       1
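The two tables can be reproduced with a short sketch. It assumes that rank 1 goes to the highest score, that tied items share the average rank (as with the 4.5 given to d1 and d7), and that d4's J2 score is 5.4, the value that matches both the listed sum 21.61 and d4's J2 rank of 8. The helper name `average_ranks` is illustrative and does not come from the talk.

```python
def average_ranks(values, descending=True):
    """Rank the values (1 = best); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Scores from judges J1, J2, J3 for skaters d1..d8.
scores = {
    "d1": (8.5, 7.0, 9.7),
    "d2": (7.6, 8.4, 9.6),
    "d3": (8.3, 5.6, 9.75),
    "d4": (6.4, 5.4, 9.81),  # J2 taken as 5.4 (assumption, see lead-in)
    "d5": (9.4, 7.8, 9.68),
    "d6": (9.5, 8.5, 9.2),
    "d7": (7.9, 6.3, 10.0),
    "d8": (10.0, 10.0, 5.1),
}
names = list(scores)

# Score combination: sum the scores, then rank the sums (higher is better).
sc = [sum(scores[name]) for name in names]
sc_rank = average_ranks(sc, descending=True)

# Rank combination: rank within each judge, sum the ranks,
# then rank the sums (lower is better).
per_judge = [list(column) for column in zip(*scores.values())]
judge_ranks = [average_ranks(column) for column in per_judge]
rc = [sum(r[i] for r in judge_ranks) for i in range(len(names))]
rc_rank = average_ranks(rc, descending=False)

for name, s_rank, r_rank in zip(names, sc_rank, rc_rank):
    print(name, s_rank, r_rank)
```

The two final rankings disagree sharply for some skaters: d8 is first under rank combination but fifth under score combination, because J3's very low score outweighs the two perfect scores when scores are summed.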

Similarity between two scoring systems, d(A, B):

(a) Data correlation (1885 - )

Pearson’s correlation coefficient (P).
Spearman’s footrule (F).
Kendall’s rank correlation tau (T).
Spearman’s rank correlation rho (R).

(b) Information Diversity

■ Cognitive Diversity d(A, B) between two scoring systems A and B is based on the rank-score characteristic (RSC) functions of A and B (fA and fB).

■ RSC Functions fJ1, fJ2, fJ3

Rank   fJ1    fJ2    fJ3
1      1      1      1
2      0.86   0.75   0.97
3      0.71   0.63   0.93
4      0.57   0.5    0.9
5      0.43   0.38   0.86
6      0.28   0.25   0.83
7      0.14   0.13   0.8
8      0      0      0

[Figure: RSC functions fJ1, fJ2, and fJ3 for judges J1, J2, and J3; score (0 to 1) plotted against rank (0 to 8).]
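For comparison with cognitive diversity, the rank-based correlation measures listed under (a) can be sketched as follows. The formulas are the standard textbook definitions for rankings without ties, not something specified in the talk, and the function names are illustrative. The input uses the J1 and J3 ranks of the eight skaters.

```python
from itertools import combinations


def footrule(r_a, r_b):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r_a, r_b))


def spearman_rho(r_a, r_b):
    """Spearman's rho for two untied rankings of the same n items."""
    n = len(r_a)
    d2 = sum((a - b) ** 2 for a, b in zip(r_a, r_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(r_a, r_b):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(r_a)), 2):
        sign = (r_a[i] - r_a[j]) * (r_b[i] - r_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(r_a) * (len(r_a) - 1) // 2
    return (concordant - discordant) / pairs


# Ranks given by judges J1 and J3 to skaters d1..d8.
j1 = [4, 7, 5, 8, 3, 2, 6, 1]
j3 = [4, 6, 3, 2, 5, 7, 1, 8]

print(footrule(j1, j3), spearman_rho(j1, j3), kendall_tau(j1, j3))
```

These measures compare how two systems order the same items. The RSC-based diversity in (b) instead compares how each system's scores fall off as rank increases.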

Combinatorial Fusion Algorithm (CFA):

D = the set of classes, documents, genes, or molecules, with |D| = n.
N = the set {1, 2, …, n}.
R = the set of real numbers.

(a) Multiple Scoring Systems (MSS)

Each scoring system A has a score function sA (from D to R), a rank function rA (from D to N), and a rank-score characteristic (RSC) function fA (from N to R), where

f(i) = (s ∘ r^{-1})(i) = s(r^{-1}(i))

(b) Diversity (or similarity) between two scoring systems A and B, d(A, B), can be defined using score functions, rank functions, or rank-score characteristic (RSC) functions:

d(A, B) = d(sA, sB), or d(rA, rB), or d(fA, fB).

Ref: Hsu et al. in Advanced Data Mining Technologies in Bioinformatics, Idea Group Inc., 2006.
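A minimal sketch of the RSC function and of one possible d(fA, fB) follows. The talk does not state which distance is used between two RSC functions. The root-mean-square difference below is an assumption made for illustration, and the names `rsc` and `rsc_distance` are hypothetical. The tabulated values are the fJ1, fJ2, fJ3 columns from the skating example.

```python
import math


def rsc(scores):
    """RSC function f = s o r^{-1}: entry i-1 is the score at rank i."""
    return sorted(scores, reverse=True)


def rsc_distance(f_a, f_b):
    """Assumed diversity measure: RMS difference of two RSC functions."""
    n = len(f_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)) / n)


# Raw J1 scores for skaters d1..d8; rsc() reorders them by rank.
j1_scores = [8.5, 7.6, 8.3, 6.4, 9.4, 9.5, 7.9, 10]
print(rsc(j1_scores))

# RSC values for the three judges as tabulated in the slides.
f_j1 = [1, 0.86, 0.71, 0.57, 0.43, 0.28, 0.14, 0]
f_j2 = [1, 0.75, 0.63, 0.5, 0.38, 0.25, 0.13, 0]
f_j3 = [1, 0.97, 0.93, 0.9, 0.86, 0.83, 0.8, 0]

d_12 = rsc_distance(f_j1, f_j2)
d_13 = rsc_distance(f_j1, f_j3)
print(d_12, d_13)
```

Under this assumed measure, J1 and J2 have similar scoring behavior (small distance), while J3, whose scores stay high until the last rank, is far more diverse from J1.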

Combining MSS for structure-based virtual screening:

(I) Combining 2 to 5 scoring systems (by rank or by score) with performance comparisons

Combinations of different methods improve the performances.
The combination of B and D works best on thymidine kinase (TK).

Ref: Yang et al. Journal of Chemical Information and Modeling 45 (2005), pp. 1134-1146.

[Figure: "The Performance of Thymidine Kinase (TK)". Score (0.00 to 1.00) plotted against rank (0 to 1000) for GEMDOCK-Binding, GEMDOCK-Pharma, GOLD-GoldScore, GOLD-Goldinter, and GOLD-ChemScore.]

[Figure: "TK". Average GH score (0.00 to 0.70) under rank combination and score combination for the combinations E, D, C, A, B, DE, CE, AE, BE, CD, AD, AC, BC, AB, BD, CDE, ACE, ABE, ADE, BCE, BDE, ACD, ABD, BCD, ABC, ACDE, BCDE, ABCE, ABDE, ABCD, ABCDE.]
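Enumerating every combination of 2 to 5 systems, by score and by rank, can be sketched as below. The labels A to E mirror the figure, but the scores are hypothetical toy values for four compounds, and combining by averaging is an assumption. The performance measure (GH score) is not implemented here.

```python
from itertools import combinations


def to_ranks(scores):
    """Rank 1 = highest score (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


def average_columns(rows):
    """Average several equal-length lists item by item."""
    return [sum(column) / len(column) for column in zip(*rows)]


# Hypothetical normalized scores of four compounds under systems A..E.
scores = {
    "A": [0.9, 0.4, 0.7, 0.1],
    "B": [0.8, 0.6, 0.5, 0.2],
    "C": [0.3, 0.9, 0.6, 0.4],
    "D": [0.7, 0.2, 0.8, 0.5],
    "E": [0.1, 0.5, 0.9, 0.6],
}
ranks = {name: to_ranks(values) for name, values in scores.items()}

# One score combination and one rank combination per subset of size 2..5.
fused = {}
for size in range(2, len(scores) + 1):
    for combo in combinations(sorted(scores), size):
        label = "".join(combo)
        fused[label] = {
            "score": average_columns([scores[name] for name in combo]),
            "rank": average_columns([ranks[name] for name in combo]),
        }

print(len(fused), fused["BD"])
```

Five systems give 26 combinations of size 2 or more. Each would then be evaluated against known actives to compare rank combination with score combination.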

Combining MSS for structure-based virtual screening: (II) Positive cases (o) vs. negative cases (x) for 80 2-combinations in terms of performance ratio (x-coordinate) and cognitive diversity (y-coordinate)

It was shown in the information retrieval domain that, under certain conditions (one of which is higher cognitive diversity), rank combination can be better than score combination.

Ref: Hsu, D.F., Taksa, I. Information Retrieval 8(3), pp. 449–480, 2005.

Target Tracking with Three Features:

We use three features:

• Color – average normalized RGB color

• Position – location of the target region centroid

• Shape – area of the target region

[Figure panels: Color, Position, Shape]
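The three features can be sketched for a single target region as follows. This is an illustration under stated assumptions, not the implementation from the cited paper: the region is given as a list of pixels, "normalized RGB" is taken to mean each channel divided by the sum of the three channels, and the function name `region_features` is hypothetical.

```python
def region_features(pixels):
    """Compute color, position, and shape features for one region.

    pixels: non-empty list of (x, y, (r, g, b)) tuples in the region.
    """
    area = len(pixels)  # shape: area in pixels

    # Position: centroid of the pixel coordinates.
    cx = sum(p[0] for p in pixels) / area
    cy = sum(p[1] for p in pixels) / area

    # Color: average of per-pixel normalized RGB (assumed definition).
    normalized = []
    for _, _, (r, g, b) in pixels:
        total = r + g + b
        if total:
            normalized.append((r / total, g / total, b / total))
        else:
            normalized.append((0.0, 0.0, 0.0))  # black pixel
    color = tuple(sum(c[k] for c in normalized) / area for k in range(3))

    return {"color": color, "position": (cx, cy), "shape": area}


# Hypothetical 4-pixel gray region at the corners of a 2x2 square.
region = [
    (0, 0, (10, 10, 10)),
    (2, 0, (30, 30, 30)),
    (0, 2, (20, 20, 20)),
    (2, 2, (40, 40, 40)),
]
features = region_features(region)
print(features)
```

Each feature can then act as its own scoring system over candidate regions, which is what makes score and rank fusion applicable to tracking.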

Ref: Lyons, D.M., Hsu, D.F. Information Fusion 10(2): pp. 124-136, 2009.

Target Tracking

RUN2: score fusion.
RUN3: score and rank fusion, using ground truth to select.
RUN4: score and rank fusion, using the rank-score function to select.

        RUN2                  RUN3                  RUN4
Seq.    MSSD Avg.  MSSD Var.  MSSD Avg.  MSSD Var.  MSSD Avg.  MSSD Var.
1       1537.22    694.47     1536.65    695.49     1536.9     694.24
2       816.53     8732.13    723.13     3512.19    723.09     3511.41
3       108.89     61.61      108.34     60.58      108.89     61.61
4       23.14      2.39       23.04      2.30       23.14      2.39
5       334.13     120.11     332.89     119.39     334.138    120.11
6       96.40      119.22     66.9       12.91      67.28      13.38
7       577.78     201.29     548.6      127.78     577.78     201.29
8       538.35     605.84     500.9      57.91      534.3      602.85
9       143.04     339.73     140.18     297.07     142.33     294.94
10      260.24     86.65      252.17     84.99      258.64     85.94
11      520.13     2991.17    440.98     2544.69    470.27     2791.62
12      1188.81    745.01     1188.81    745.01     1188.81    745.01

RUN4 is as good as or better than RUN2 in all cases (highlighted in gray).
RUN4 is, predictably, not always as good as RUN3 (‘best case’).
Note: Lower MSSD implies better tracking performance.

Cognitive Informatics: Combining Two Visual Perception Systems

Ref: A. Batallones et al. On the combination of two visual cognition systems using combinatorial fusion. Brain Informatics 2 (2015), pp. 21-32.

Cognitive Diversity (CD) provides information diversity (complementary to and in contrast with the statistical data correlation):

■ In Similarity measurement between two scoring systems (or data distributions): CD vs. Pearson, footrule, Kendall tau, Spearman rho.

■ In Goodness of Fit between two models (or hypotheses): CD vs. Chi-square test, Kolmogorov-Smirnov test.

■ In Cognitive Computing between two hypotheses (or scoring systems) in order to decide when and how To Fuse (or to combine) multiple scoring systems: NLP, ML, DM, IR, ensemble, MADM & SC, RC, majority voting, weighted SC, weighted RC, POSet, max, min, ave., …

Cognitive Systems that are capable of combining a group of diverse and good-performance scoring systems from a variety of sensors, sources, and software can serve as a resilient engine and effective telescope for the new scientific discovery paradigm (integration vs. reduction) in the era of data-driven human-interactive knowledge discovery.

D. F. Hsu; IBM CSIG seminar, Dec. 17, 2015