
Aggregating Multiple Dimensions for Computing Document Relevance


Page 1: Aggregating Multiple Dimensions for Computing Document Relevance

1

Aggregating Multiple Dimensions for Computing Document Relevance

Mauro Dragoni
Fondazione Bruno Kessler (FBK), Shape and Evolving Living Knowledge Unit (SHELL)

2nd KEYSTONE Summer School
Santiago de Compostela, July 21st 2016

Page 2: Aggregating Multiple Dimensions for Computing Document Relevance

2

How will we spend time today? Our Goal:

to understand how documents can be evaluated by adopting a multi-criteria framework

Presentation of the theoretical framework

Case Study 1

Representing documents through different layers

Case Study 2

Combining user profiles, queries, and document content for computing relevance

Case Study 3

Merge and explode Case Study 1 and Case Study 2…

Page 3: Aggregating Multiple Dimensions for Computing Document Relevance

3

Why is this topic interesting? Indexing documents and querying repositories is not only a matter of weighting terms.

At the end of this lesson you should be able to: consider a document from different perspectives; understand why YOU can be part of the document score; know how to treat different types of information content.

What might I expect from you? To see a paper on this topic published in the near future… To get new ideas, proposed by you…

Page 4: Aggregating Multiple Dimensions for Computing Document Relevance

4

Some Background

The main idea behind this topic is “multi-criteria decision making”.

What does it mean? Suppose we have an entity E and a set C of n criteria. We need to evaluate, for each criterion Ci, how much E satisfies Ci.

We have to aggregate all satisfaction degrees to evaluate E.

Some suggested papers:
Ronald R. Yager. Modeling prioritized multicriteria decision making. IEEE Trans. Systems, Man, and Cybernetics, Part B 34(6): 2396-2404 (2004)
Ronald R. Yager. Prioritized aggregation operators. Int. J. Approx. Reasoning 48(1): 263-274 (2008)
Célia da Costa Pereira, Mauro Dragoni, Gabriella Pasi. Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting. Inf. Process. Manage. 48(2): 340-357 (2012)
Francesco Corcoglioniti, Mauro Dragoni, Marco Rospocher, Alessio Palmero Aprosio. Knowledge Extraction for Information Retrieval. ESWC 2016: 317-333

Page 5: Aggregating Multiple Dimensions for Computing Document Relevance

5

Further Readings

Fuzzy Logic: Zadeh’s books and papers

Knowledge Extraction Semantic Web (ISWC conference series, KBS and JWS journals, …) Knowledge Management (KR, IJCAI, AAAI, …) Natural Language Processing (ACL, COLING, …)

User Modeling and Interaction UMAP proceedings HCI papers

Page 6: Aggregating Multiple Dimensions for Computing Document Relevance

6

Introductory Example

John is looking for a bicycle for his little son. John considers two criteria, “safety” and “inexpensiveness”, and he considers “safety” > “inexpensiveness”.

We may have two scenarios:
1. John is not able to find a “safe” bicycle that is also “cheap”.
2. John has a low budget. Thus, he has to find a trade-off between the two criteria.

[Diagram: entity E evaluated against criteria C1 and C2]

Page 7: Aggregating Multiple Dimensions for Computing Document Relevance

7

Problem Representation Components

the set C of the n considered criteria: C = {C1, …, Cn};

the collection D of entities (documents in the specific case of IR);

an aggregation function F computing the score F(C1(d),…, Cn(d)) of each document d contained in D;

a priority model P defined by… someone (user, system maintainer, etc.);

a weighting schema W.
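
As a rough sketch of how these components fit together (illustrative Python; the names and structure are my assumptions, not something prescribed by the talk):

```python
# Illustrative sketch of the components above; names are assumptions, not from the talk.
from dataclasses import dataclass
from typing import Callable, List

Criterion = Callable[[str], float]   # maps a document d to a satisfaction degree in [0, 1]
Aggregator = Callable[[List[float], List[float]], float]   # F over criteria scores and weights


@dataclass
class MultiCriteriaScorer:
    criteria: List[Criterion]        # C = {C1, ..., Cn}, listed in priority order (priority model P)
    weights: List[float]             # weighting schema W, one weight per criterion
    aggregate: Aggregator            # aggregation function F

    def score(self, document: str) -> float:
        satisfactions = [c(document) for c in self.criteria]
        return self.aggregate(satisfactions, self.weights)
```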

Page 8: Aggregating Multiple Dimensions for Computing Document Relevance

8

Weighting Schema – Expert-based choice Weights are arbitrarily chosen by an expert.

No rules for computing them.

For example: C1 λ1 = 0.7, C2 λ2 = 0.5, C3 λ3 = 0.6, C4 λ4 = 0.3

You need to justify the values you choose.

Page 9: Aggregating Multiple Dimensions for Computing Document Relevance

9

Weighting Schema – Priority-based choice

Weights are computed “automatically” based on the priority between criteria.

For each document d, the weight of the most important criterion C1 is set to 1.0 by definition.

The weights of the other criteria are computed as follows:
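
The formula itself is missing from the extracted slide; reconstructed from the worked example on slide 18, the priority-based weights appear to be

$$w_1 = 1.0, \qquad w_i = w_{i-1} \cdot C_{i-1}(d) \quad (i = 2, \dots, n),$$

where criteria are indexed in decreasing order of priority, so a poorly satisfied high-priority criterion shrinks the weights of everything below it.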

Page 10: Aggregating Multiple Dimensions for Computing Document Relevance

10

Weighting Schema – Considerations

A weighting schema can be decided a priori but…

We can learn a new weighting schema: from a learning-to-rank dataset, or from IR system usage.

The choice of the weighting schema, obviously, affects the effectiveness of your information retrieval system.

Where can we apply such a weighting schema?

Page 11: Aggregating Multiple Dimensions for Computing Document Relevance

11

Three (not exhaustive) Operators

As you can imagine… there are different ways for combining weights and criteria

Operator 1: “Scoring” weighted criteria scores are summed

Operator 2: “Min” or “And” among weighted criteria scores, minimum score is selected

Operator 3: “Max” or “Or” among weighted criteria scores, maximum score is selected

Page 12: Aggregating Multiple Dimensions for Computing Document Relevance

12

The “Scoring” Operator

The overall document score is computed by summing the weighted scores computed for all criteria.

The score computed on the most important criterion drives the overall document score.

Less important criteria help in refining the overall document score.
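
In symbols, consistently with the worked example on slide 19 (the w_i are the criterion weights):

$$F_{\text{scoring}}(d) = \sum_{i=1}^{n} w_i \cdot C_i(d)$$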

Page 13: Aggregating Multiple Dimensions for Computing Document Relevance

13

The “And” (or “Min”) Operator

The document score is strongly dependent on the degree of satisfaction of the least satisfied criterion

Very restrictive operator

Suggestion: consider criteria that are really relevant for a user!!!
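
Consistently with the numeric example on slide 19, where the weights appear as exponents:

$$F_{\text{and}}(d) = \min_{i=1,\dots,n} C_i(d)^{\,w_i}$$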

Page 14: Aggregating Multiple Dimensions for Computing Document Relevance

14

The “Or” (or “Max”) Operator Dangerous operator!

Recommendation: criteria with a satisfaction degree of zero do not have to be considered.

It is useful only when priority between criteria is not used: the weighting schema is manually defined, and the weights of less important criteria are not based on the values of the most important ones.
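
Again consistently with the numeric example on slide 19:

$$F_{\text{or}}(d) = \max_{i=1,\dots,n} C_i(d)^{\,w_i}$$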

Page 15: Aggregating Multiple Dimensions for Computing Document Relevance

15

Operators’ Properties

Boundary Conditions

Continuity

Monotonicity (just for Scoring)

Absorbing Element (“0”, for Scoring and Min operators)

Page 16: Aggregating Multiple Dimensions for Computing Document Relevance

16

The Operators in Action Assume we have a document D composed as follows:

[Diagram: document fields mapped to criteria — Title → C1, Abstract → C2, Introduction → C3, Content → C4]

Page 17: Aggregating Multiple Dimensions for Computing Document Relevance

17

The Operators in Action

Suppose we perform a query as follows: Q = {qt1, qt2, qt3}

Assume that, for each document field, you have normalized similarity values:
sim(Q, DTitle) = 0.5
sim(Q, DAbstract) = 0.4
sim(Q, DIntroduction) = 0.2
sim(Q, DContent) = 0.7

As you can imagine, by using different priorities and different aggregations, the document score will be different.

Page 18: Aggregating Multiple Dimensions for Computing Document Relevance

18

The Operators in Action

Criteria score: C1 = 0.5; C2 = 0.8; C3 = 0.2; C4 = 0.7

Priority schemas:
P1: C1 > C2 > C3 > C4
P2: C1 > C2 > C4 > C3

Weights:
for P1: w1 = 1.0; w2 = 1.0 * 0.5 = 0.5; w3 = 0.5 * 0.8 = 0.4; w4 = 0.4 * 0.2 = 0.08
for P2: w1 = 1.0; w2 = 1.0 * 0.5 = 0.5; w3 = 0.5 * 0.8 = 0.4; w4 = 0.4 * 0.7 = 0.28

Page 19: Aggregating Multiple Dimensions for Computing Document Relevance

19

The Operators in Action – Document score

“Scoring” operator:
• DP1 = (0.5 * 1.0) + (0.8 * 0.5) + (0.2 * 0.4) + (0.7 * 0.08) = 1.036
• DP2 = (0.5 * 1.0) + (0.8 * 0.5) + (0.7 * 0.4) + (0.2 * 0.28) = 1.236

“And” operator:
• DP1 = min(0.5^1.0, 0.8^0.5, 0.2^0.4, 0.7^0.08) = min(0.5, 0.89, 0.53, 0.97) = 0.5
• DP2 = min(0.5^1.0, 0.8^0.5, 0.7^0.4, 0.2^0.28) = min(0.5, 0.89, 0.87, 0.64) = 0.5

“Or” operator:
• DP1 = max(0.5^1.0, 0.8^0.5, 0.2^0.4, 0.7^0.08) = max(0.5, 0.89, 0.53, 0.97) = 0.97
• DP2 = max(0.5^1.0, 0.8^0.5, 0.7^0.4, 0.2^0.28) = max(0.5, 0.89, 0.87, 0.64) = 0.89
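
A minimal Python sketch (my own code, not the talk's implementation) that reproduces the numbers above:

```python
# Reproduces the slide's numbers; criteria scores are listed in priority order.
def priority_weights(scores):
    weights = [1.0]                       # most important criterion gets weight 1.0
    for s in scores[:-1]:
        weights.append(weights[-1] * s)   # w_i = w_{i-1} * C_{i-1}(d)
    return weights

def scoring(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

def and_min(scores, weights):
    return min(s ** w for s, w in zip(scores, weights))

def or_max(scores, weights):
    return max(s ** w for s, w in zip(scores, weights))

# P1: C1 > C2 > C3 > C4   |   P2: C1 > C2 > C4 > C3
for label, scores in (("P1", [0.5, 0.8, 0.2, 0.7]), ("P2", [0.5, 0.8, 0.7, 0.2])):
    w = priority_weights(scores)
    print(label, round(scoring(scores, w), 3),
          round(and_min(scores, w), 2), round(or_max(scores, w), 2))
# P1 1.036 0.5 0.97
# P2 1.236 0.5 0.89
```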

Page 20: Aggregating Multiple Dimensions for Computing Document Relevance

20

Any question so far?

Timeout…

Page 21: Aggregating Multiple Dimensions for Computing Document Relevance

21

Case Study 1 – The Scenario

Keyword search over a multi-layer representation of documents. Document and query structure:
Textual layer: natural language text
Metadata layers:
• Entity Linking
• Predicates
• Roles/Types
• Timing Information

Problems: How to compute the score for each layer? How to aggregate such scores? How to weight each layer?

Page 22: Aggregating Multiple Dimensions for Computing Document Relevance

22

Case Study 1 – The Scenario

Natural language content is enriched with four metadata/semantic layers:
URI Layer: links to entities detected in the text and mapped to DBpedia entities
TYPE Layer: conceptual classification of the named entities detected in the text and mapped to both the DBpedia and YAGO knowledge bases
TIME Layer: metadata related to the temporal mentions found in the text by a temporal expression recognizer (e.g. “the eighteenth century”, “2015-18-12”, etc.)
FRAME Layer: output of the application of semantic role labeling techniques. Generally, this output includes predicates and their arguments describing a specific role in the context of the predicate.
Example: “He has been influenced by Carl Gauss” [framebase:Subjective_influence; dbpedia:Carl_Friedrich_Gauss]

Page 23: Aggregating Multiple Dimensions for Computing Document Relevance

23

Case Study 1 – Example Text: “astronomers influenced by Gauss”

Layers:
URI Layer: “dbpedia:Carl_Friedrich_Gauss”
TYPE Layer: “yago:GermanMathematicians”, “yago:NumberTheorists”, “yago:FellowsOfTheRoyalSociety”
TIME Layer: “day:1777-04-30”, “day:1855-02-23”, “century:1700”
FRAME Layer: “Subjective_influence.v_Carl_Friedrich_Gauss”

Annotations provided by PIKES (https://pikes.fbk.eu)

Page 24: Aggregating Multiple Dimensions for Computing Document Relevance

24

Case Study 1 - Evaluation

331 documents, 35 queries
Jörg Waitelonis, Claudia Exeler, Harald Sack. Enabled Generalized Vector Space Model to Improve Document Retrieval. NLP-DBPEDIA@ISWC 2015: 33-44

Multi-value relevance (1 = irrelevant, 5 = relevant)

Diverse queries: from keyword-based search to queries requiring semantic capabilities

Page 25: Aggregating Multiple Dimensions for Computing Document Relevance

25

Case Study 1 – Evaluation

2 baselines:
Google custom search API
Textual layer only (~Lucene)

Measures: Prec1, Prec5, Prec10, MAP, MAP10, NDCG, NDCG10

Equal total weight for the textual and semantic layers: TEXTUAL (50%), URI (12.5%), TYPE (12.5%), FRAME (12.5%), TIME (12.5%)
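
A small sketch of this weighting (illustrative code; the layer similarity values below are made up, not taken from the evaluation):

```python
# 50% to the textual layer, 12.5% to each semantic layer, as on the slide.
LAYER_WEIGHTS = {"textual": 0.50, "uri": 0.125, "type": 0.125, "frame": 0.125, "time": 0.125}

def document_score(layer_similarities):
    """layer_similarities: normalized query/document similarity per layer (assumed input)."""
    return sum(LAYER_WEIGHTS[layer] * sim for layer, sim in layer_similarities.items())

# Made-up example values:
print(document_score({"textual": 0.6, "uri": 0.8, "type": 0.4, "frame": 0.0, "time": 0.2}))  # 0.475
```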

Page 26: Aggregating Multiple Dimensions for Computing Document Relevance

26

Case Study 1 - Evaluation

Approach/System     Prec1   Prec5   Prec10   NDCG    NDCG10   MAP     MAP10
Google              0.543   0.411   0.343    0.434   0.405    0.255   0.219
Textual             0.943   0.669   0.453    0.832   0.782    0.733   0.681
KE4IR               0.971   0.680   0.474    0.854   0.806    0.758   0.713
KE4IR vs. Textual   3.03%   1.71%   4.55%    2.64%   2.99%    3.50%   4.74%

Page 27: Aggregating Multiple Dimensions for Computing Document Relevance

27

Case Study 1 - Evaluation

Layers (TEXTUAL+)       Prec1   Prec5   Prec10   NDCG    NDCG10   MAP     MAP10
URI,TYPE,FRAME,TIME     0.971   0.680   0.474    0.854   0.806    0.758   0.713
URI,TYPE,FRAME          0.971   0.680   0.474    0.853   0.804    0.757   0.712
URI,TYPE,TIME           0.971   0.680   0.474    0.851   0.802    0.757   0.712
URI,TYPE                0.971   0.680   0.474    0.849   0.801    0.755   0.710
URI,FRAME,TIME          0.971   0.674   0.465    0.844   0.796    0.750   0.702
URI,FRAME               0.971   0.674   0.465    0.842   0.795    0.749   0.702
URI,TIME                0.971   0.674   0.465    0.840   0.791    0.747   0.700
TYPE,FRAME,TIME         0.943   0.674   0.471    0.848   0.799    0.745   0.700
TYPE,TIME               0.943   0.674   0.471    0.843   0.794    0.743   0.697
TYPE,FRAME              0.943   0.674   0.468    0.847   0.797    0.743   0.695
FRAME,TIME              0.943   0.674   0.462    0.842   0.793    0.741   0.693

Page 28: Aggregating Multiple Dimensions for Computing Document Relevance

28

Case Study 1 - Evaluation

Page 29: Aggregating Multiple Dimensions for Computing Document Relevance

29

Case Study 1 – What We Learnt

How the effectiveness of a system can be affected if we change weights.

In this specific case, the use of an expert-based weighting schema helps you in balancing the importance of the semantic information…

… however, we are using learning to rank for identifying potential priorities between used layers.

Further lessons are more related to the use of the semantic layers.

Future work: to apply the approach to larger collections.

Page 30: Aggregating Multiple Dimensions for Computing Document Relevance

30

Any question onCase Study 1?

Timeout…

Page 31: Aggregating Multiple Dimensions for Computing Document Relevance

31

Case Study 2 – The Scenario

Combine document information with user profiles. Assumption: you have already computed user profiles.

Which information can you use?
RELIABILITY: How much a user trusts the document source.
COVERAGE: How strongly a user profile is represented in a document (inclusion of a user profile into a document).
APPROPRIATENESS: How much a document satisfies a user profile (similarity between user profile and document).
ABOUTNESS: Trivial criterion, how much a document matches the performed query.

Page 32: Aggregating Multiple Dimensions for Computing Document Relevance

32

Case Study 2 – Reliability

Why do I trust information sources differently?

How much do you trust an information source? You might fix such values; you might infer them.

Page 33: Aggregating Multiple Dimensions for Computing Document Relevance

33

Case Study 2 – Coverage

The “coverage” criterion computes how strongly a user profile is contained in the document.

Suppose we have a profile of a user interested in the following topics: c = {sports, economics}
Suppose we have a document talking about the following topics: d = {violence, politics, economics, sports}

c = {0, 0, 1, 1}, d = {1, 1, 1, 1} → Coverage(c,d) = 1.0
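
One simple instantiation consistent with this example (my reading, not necessarily the paper's exact fuzzy definition) is the fraction of the user's topics that appear in the document:

```python
# Coverage sketched as the fraction of profile topics present in the document (binary vectors).
def coverage(profile, doc):
    interested = sum(profile)
    hits = sum(p * d for p, d in zip(profile, doc))
    return hits / interested if interested else 0.0

print(coverage([0, 0, 1, 1], [1, 1, 1, 1]))  # 1.0, as on the slide
```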

Page 34: Aggregating Multiple Dimensions for Computing Document Relevance

34

Case Study 2 – Appropriateness

The “appropriateness” criterion computes how much a document satisfies a user profile.

Suppose we have a profile of a user interested in the following topics: c = {sports, economics}
Suppose we have a document talking about the following topics: d = {violence, politics, economics, sports}

c = {0, 0, 1, 1}, d = {1, 1, 1, 1} → Appropriateness(c,d) = 0.5
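
One similarity that reproduces the 0.5 above is the Jaccard coefficient between the two binary topic vectors; this is an assumption made for illustration, and the published definition may differ:

```python
# Appropriateness sketched as Jaccard similarity between profile and document topic vectors.
def appropriateness(profile, doc):
    inter = sum(1 for p, d in zip(profile, doc) if p and d)
    union = sum(1 for p, d in zip(profile, doc) if p or d)
    return inter / union if union else 0.0

print(appropriateness([0, 0, 1, 1], [1, 1, 1, 1]))  # 0.5, as on the slide
```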

Page 35: Aggregating Multiple Dimensions for Computing Document Relevance

35

Case Study 2 – Aboutness

The “classic” similarity between a query and the documents contained in a repository.

Many models are available… and various adaptations based on the context.

Page 36: Aggregating Multiple Dimensions for Computing Document Relevance

36

Case Study 2 – Validation

The Reuters RCV1 Collection has been used for creating user profiles and for generating user queries.
20 users have been involved in the evaluation campaign.
Different aggregation schemas have been tested.

Page 37: Aggregating Multiple Dimensions for Computing Document Relevance

37

Case Study 2 – Validation (Ab > Ap > C > R)

Page 38: Aggregating Multiple Dimensions for Computing Document Relevance

38

Case Study 2 – What We Learnt

When users are involved, it is very difficult to define an aggregation schema.

The same occurs for the priority between criteria.

Creating (or learning) a user profile is already a big problem in itself.

The quality of user profiles significantly affects the effectiveness of the retrieval algorithm.

If you start playing with criteria and weighting schemas, you will never end!!!

Page 39: Aggregating Multiple Dimensions for Computing Document Relevance

39

Any question onCase Study 2?

Timeout…

Page 40: Aggregating Multiple Dimensions for Computing Document Relevance

40

Case Study 3 Let’s get back to the first simple example…

[Diagram: document fields mapped to criteria — Title → C1, Abstract → C2, Introduction → C3, Content → C4]

Page 41: Aggregating Multiple Dimensions for Computing Document Relevance

41

Case Study 3 – Suppose that…

Each field has been annotated with different ontologies, but belonging to the same domain: this means that you have, for the same field, many layers with different annotations… one for each used ontology.

Your repository contains documents coming from different sources: is the reliability of each source the same?

Your users have a history: user profiles need to be updated. This aspect is out of the scope of this talk… but you should be aware of it…

Any other idea?

Page 42: Aggregating Multiple Dimensions for Computing Document Relevance

42

Exploding Fields

You have something to think about… Good luck!!!

Page 43: Aggregating Multiple Dimensions for Computing Document Relevance

43

So… for concluding

Considering retrieval as a multi-criteria decision making problem is interesting to explore.

There is room for investigating a lot of stuff.

Do not be scared of using user profiles. I invite you to consider recent works on simulating user interactions with IR systems:
• David Maxwell, Leif Azzopardi. Simulating Interactive Information Retrieval: SimIIR: A Framework for the Simulation of Interaction. SIGIR 2016: 1141-1144 (+ the tutorial he gave)

My suggestion: try to combine content, semantic metadata, and user history.

Page 44: Aggregating Multiple Dimensions for Computing Document Relevance

44

It’s time for questions…

Mauro Dragoni
Fondazione Bruno Kessler

https://shell.fbk.eu/index.php/Mauro_Dragoni [email protected]