
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Page 1: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

http://bit.ly/rguCabanac2012

Guillaume Cabanac, [email protected]

March 28th, 2012

Page 2: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Outline of these Musings


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Page 3: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 4: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Question DL-1

How to transpose paper-based annotations into digital documents?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for information retrieval improvement.” RIAO’07: Proceedings of the 8th Conference on Information Retrieval and its Applications, pages 529–548. CID, May 2007.

Page 5: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Characteristics of paper annotation. A centuries-old practice: more than four hundred years of evidence. Numerous application contexts: theology, science, literature… Personal use: “active reading” (Adler & van Doren, 1972)

Collective use: review process, opinion exchange …

From Individual Paper-based Annotation …

[Timeline] 1541: annotated Bible (Lortsch, 1910) · 1630: Fermat’s last theorem in a margin (Kleiner, 2000) · 1790–1830: annotations by Blake, Keats… (Jackson, 2001) · 1881: Victor Hugo’s Les Misérables · 1998: US students (Marshall, 1998)


Page 6: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


… to Collective Digital Annotations

[Figure] From ComMentor (1993) to iMarkup, Yawas, Amaya… (2005): more than 20 annotation systems (Cabanac et al., 2005). Annotations are stored on annotation servers alongside Web servers (Ovsiannikov et al., 1999) and can anchor a discussion thread. Paper annotations, by contrast, are hard to share and get ‘lost’ with the hardcopy. Author vs. reader annotations: 87% vs. 13%.


Page 7: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Document Annotation: Examples

[Screenshots] W3C Annotea / Amaya (Kahan et al., 2002): a reader’s comment, a discussion thread. Arakne, featuring “fluid annotations” (Bouvin et al., 2002).


Page 8: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Collective Annotations: 64 systems designed during 1989–2008 reviewed

A collective annotation carries objective data: owner, creation date; anchoring point within the document; granularity: whole document, words…

It also carries subjective information: comments; various marks: stars, underlined text…; annotation types: support/refutation, question…; visibility: public, private, group…

Purpose-oriented annotation categories: remark, reminder, argumentation, gathered in a Personal Annotation Space.


Page 9: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Question DL-2

How to measure the social validity of a statement according to the argumentative discussion it sparked off?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective annotations: Definition and experiment.” Journal of the American Society for Information Science and Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255

Page 10: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Social Validation

Scalability issue: which annotations should I read?

Social validation = degree of consensus of the group

Social Validation of Argumentative Debates


Page 11: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Social Validation of Argumentative Debates

Before: annotation magma. After: filtered display. Informing readers about how validated each annotation is.


Page 12: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Overview

Two proposed algorithms: the Empirical Recursive Scoring Algorithm (Cabanac et al., 2005), and a Bipolar Argumentation Framework extension based on Artificial Intelligence research (Cayrol & Lagasquie-Schiex, 2005).

Social Validation Algorithms

[Figure] Social validity ranges from –1 (socially refuted) through 0 (socially neutral) to +1 (socially confirmed), illustrated on four cases of debates between annotations A and B.
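The exact scoring rule is not spelled out on the slide; below is a minimal sketch of the idea, assuming each reply either supports (+1) or refutes (–1) its parent and that a reply's weight is damped by its own degree of validation. The `Annotation` class and the recursion are illustrative, not Cabanac et al.'s exact formula.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A node in an argumentative discussion thread."""
    polarity: int                              # +1 supports its parent, -1 refutes it
    replies: list["Annotation"] = field(default_factory=list)

def social_validity(a: Annotation) -> float:
    """Return a consensus score in [-1, 1] for annotation `a`.

    Illustrative recursive rule: each reply contributes its polarity,
    damped by how validated the reply itself is, and the contributions
    are averaged. This only sketches the spirit of the Empirical
    Recursive Scoring Algorithm, not its exact formula.
    """
    if not a.replies:
        return 0.0                             # no debate: socially neutral
    contributions = [
        r.polarity * (1 + social_validity(r)) / 2   # a refuted reply weighs less
        for r in a.replies
    ]
    return sum(contributions) / len(a.replies)

# A statement with two supports, one of which is itself refuted:
root = Annotation(0, [Annotation(+1, [Annotation(-1)]), Annotation(+1)])
print(round(social_validity(root), 3))         # 0.375: moderately confirmed
```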


Page 13: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Example

Computing the social validity of a debated annotation

Social Validation Algorithm


Page 14: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Validation with a User Study

Design. Corpus: 13 discussion threads = 222 annotations + answers. Task of a participant: label each opinion type, then infer the overall opinion. Volunteer subjects (figure): 53, 119.

Aim: social validation vs. human perception of consensus


Page 15: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Q1: Do people agree when labeling opinions? Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003): inter-rater agreement among n > 2 raters.

Weak agreement, with variability: labeling opinions is a subjective task.
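For reference, Fleiss' kappa is straightforward to compute; a sketch with NumPy, on a made-up rating matrix:

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for N items, each rated by n raters into k categories.

    counts[i, j] = number of raters who put item i in category j;
    every row must sum to the same number of raters n.
    """
    N, k = counts.shape
    n = counts.sum(axis=1)[0]
    p_j = counts.sum(axis=0) / (N * n)                     # category prevalence
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()              # observed vs. chance
    return (P_bar - P_e) / (1 - P_e)

# Made-up matrix: 4 annotations labeled by 5 raters as support/neutral/refute.
ratings = np.array([[5, 0, 0], [2, 2, 1], [0, 4, 1], [1, 1, 3]])
print(round(fleiss_kappa(ratings), 3))
```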

Experiments on the Social Validation of Debates

[Figure] Value of kappa per debate id, against the “poor” and “fair to good” agreement bands.

Page 16: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Q2: How well does SV approximate HP? HP = human perception of consensus; SV = social validation algorithm.

1. Test whether HP and SV are different: Student’s paired t-test gives (p = 0.20) > (α = 0.05), so no significant difference.

2. Correlate HP and SV: Pearson’s correlation coefficient r(HP, SV) = 0.48 shows a weak correlation.
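Both checks take a few lines with SciPy; the per-debate scores below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Invented per-debate scores; the study itself used 13 debated threads.
hp = np.array([0.6, 0.1, -0.3, 0.8, 0.4, -0.1, 0.5])  # human perception
sv = np.array([0.5, 0.3, -0.5, 0.6, 0.1, 0.2, 0.4])   # algorithm output

# 1. Paired t-test: are HP and SV significantly different?
t_stat, p_value = stats.ttest_rel(hp, sv)
print(f"paired t-test: p = {p_value:.2f}")  # here p > 0.05: no significant difference

# 2. Pearson correlation between HP and SV.
r, _ = stats.pearsonr(hp, sv)
print(f"r(HP, SV) = {r:.2f}")
```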

Experiments on the Social Validation of Debates

[Figure] Density of HP – SV; for example, HP = SV in 24% of all cases.


Page 17: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Question DL-3

How to harness a quiescent capital present in any community: its documents?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an original facet for exploring the quiescent information capital of a community.” International Journal on Digital Libraries, 11(4):239–261, Dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6

Page 18: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Personal documents: filtered, validated, organized information… relevant to activities in the organization.

Paradox: profitable, but under-exploited. Reason 1: folders and files are private. Reason 2: manual sharing. Reason 3: automated sharing.

Consequences: people resort to resources available outside of the community. Weak ROI: why look outside when it is already there?

Documents as a Quiescent Wealth


Page 19: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Mapping the documents of the community: SOM [Kohonen, 2001], Umap [Triviumsoft], TreeMap [Fekete & Plaisant, 2001]…

Limitations: find the documents with the same topics as D; find the documents that colleagues use with D.

Concept of usage: grouping documents ⇆ keeping stuff in common

How to Benefit from Documents in a Community?


Page 20: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Organization-based similarities: inter-folder, inter-document, inter-user

How to Benefit from Documents in a Community?

Page 21: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


How to Help People to Discover/Find/Use Documents?

Purpose: offering a global view of the community, its people and their documents, based on document contents and on document usage/organization. Requirement: non-intrusiveness and confidentiality.

Operational needs: find documents with related or complementary materials; seeking people ⇆ seeking documents.

Managerial needs: visualize the global/individual activity; relate work positions to required documents.


Page 22: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


4 views = {documents, people} × {group, unit}

1. A group of documents: main topics, usage groups. 2. A single document: who to liaise with? what to read? 3. A group of people: community of interest, community of use. 4. A single person: interests, similar users (potential help).

Proposed System: Static Aspect


Page 23: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 24: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question IR-1

Does document tie-breaking affect the evaluation of Information Retrieval systems?


Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation.” M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F. Smeaton (Eds.) CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sep. 2010. DOI:10.1007/978-3-642-15998-5_13

Page 25: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Measuring the Effectiveness of IR Systems: user-centered vs. system-focused [Spärck Jones & Willett, 1997]

Evaluation campaigns: 1958 Cranfield, UK; 1992 TREC (Text Retrieval Conference), USA; 1999 NTCIR (NII Test Collection for IR Systems), Japan; 2001 CLEF (Cross-Language Evaluation Forum), Europe…

“Cranfield” methodology [Voorhees, 2007]: a task, a test collection (corpus, topics, qrels), and measures (MAP, P@X…) computed with trec_eval.


Page 26: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Runs are Reordered Prior to Their Evaluation

Qrels = (qid, iter, docno, rel); Run = (qid, iter, docno, rank, sim, run_id)

Reordering by trec_eval: qid asc, sim desc, docno desc

Effectiveness measure = f(intrinsic_quality, tie-breaking), for MAP, P@X, MRR…
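A sketch of that reordering in Python, using two stable sorts; the run rows are made up:

```python
# Sketch of trec_eval's reordering: ties on the similarity score are broken
# by docno, not by the rank the system submitted. Run rows are invented.
run = [
    # (qid, docno, rank, sim)
    (301, "WSJ910405-0121", 1, 0.83),
    (301, "AP890712-0033", 2, 0.83),   # same sim as above: a tie
    (301, "FT934-1122", 3, 0.75),
]

# Two stable sorts: docno descending first, then qid ascending / sim descending;
# the first ordering survives as the tie-breaker of the second.
reordered = sorted(run, key=lambda row: row[1], reverse=True)
reordered = sorted(reordered, key=lambda row: (row[0], -row[3]))
for row in reordered:
    print(row)   # "WSJ…" now precedes "AP…" whatever ranks were submitted
```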


Page 27: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Consequences of Run Reordering: measures of effectiveness for an IRS s

RR(s, t): 1/rank of the 1st relevant document, for topic t
P(s, t, d): precision at document d, for topic t
AP(s, t): average precision for topic t
MAP(s): mean average precision
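These are the standard definitions; a minimal reference implementation:

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """AP(s, t): average of the precision values at each relevant document."""
    hits, precisions = 0, []
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            precisions.append(hits / rank)     # P(s, t, d) at this document
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """RR(s, t): 1 / rank of the 1st relevant document (0 if none retrieved)."""
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            return 1 / rank
    return 0.0

# MAP(s) is then the mean of AP(s, t) over all topics t of the test collection.
print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # 0.5
print(reciprocal_rank(["d3", "d1", "d7", "d2"], {"d1", "d2"}))    # 0.5
```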

Tie-breaking bias: is the Wall Street Journal collection more relevant than Associated Press?

Problem 1, comparing 2 systems: AP(s1, t) vs. AP(s2, t). Problem 2, comparing 2 topics: AP(s, t1) vs. AP(s, t2).

[Figure] Chris and Ellen: sensitive to document rank.


Page 28: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


What we Learnt: Beware of Tie-breaking for AP

Small effect on MAP, larger effect on AP. Measure bounds: AP_Realistic, AP_Conventional, AP_Optimistic.

Failure analysis for the ranking process: the error bar = element of chance = potential for improvement. [Figure: run padre1, adhoc’94]


Page 29: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question IR-2

How to retrieve documents matching keywords and spatiotemporal constraints?


Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries, 11(2):91–109, June 2010, Springer. DOI:10.1007/s00799-011-0070-z

Page 30: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Geographic Information Retrieval. Query = “Road trip around Aberdeen summer 1982”

Standard search engines only see the terms {road, trip, Aberdeen, summer}; a geographic engine also resolves the spatial dimension {Aberdeen City, Aberdeen County…} and the temporal dimension [21-JUN-1982 .. 22-SEP-1982].

1 in 6 queries is geographic: Excite (Sanderson et al., 2004), AOL (Gan et al., 2008), Yahoo! (Jones et al., 2008). A current issue worth studying.


Page 31: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


The Internals of a Geographic IR System: 3 dimensions to process (topical, spatial, temporal)

One index per dimension. Topical: bag of words, stemming, weighting, comparison with the VSM… Spatial: spatial entity detection, spatial relation resolution… Temporal: temporal entity detection…

Query processing with sequential filtering, e.g., priority to the theme, then filtering according to the other dimensions (see the sketch below).

Issue: effectiveness of GIRSs vs. state-of-the-art IRSs? Hypothesis: GIRSs beat state-of-the-art IRSs.
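A minimal sketch of sequential filtering, with invented document ids and constraint sets:

```python
def sequential_filter(topical: dict[str, float],
                      spatial_ok: set[str],
                      temporal_ok: set[str]) -> list[tuple[str, float]]:
    """Rank by theme first, then keep only the documents that also satisfy
    the spatial and temporal constraints (hypothetical document ids)."""
    kept = {d: s for d, s in topical.items()
            if d in spatial_ok and d in temporal_ok}
    return sorted(kept.items(), key=lambda pair: -pair[1])

print(sequential_filter({"d1": 0.9, "d2": 0.8, "d3": 0.4},
                        spatial_ok={"d1", "d3"},
                        temporal_ok={"d1", "d2", "d3"}))   # d2 filtered out
```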


Page 32: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Case Study: the PIV GIR System

Indexing: one index per dimension. Topical = the Terrier IRS; spatial = tiling; temporal = tiling.

Retrieval: identification of the 3 dimensions in the query, routing towards each index, combination of results with CombMNZ [Fox & Shaw, 1993; Lee, 1997].


Page 33: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Case Study: the PIV GIR System. Principle of CombMNZ and Borda count [figure].
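A sketch of both fusion schemes, assuming scores already normalized to [0, 1]; the result lists are invented:

```python
from collections import defaultdict

def comb_mnz(result_lists: list[dict[str, float]]) -> dict[str, float]:
    """CombMNZ (Fox & Shaw, 1993): sum of a document's scores across lists,
    multiplied by the number of lists that retrieved it."""
    total, hits = defaultdict(float), defaultdict(int)
    for scores in result_lists:
        for doc, score in scores.items():
            total[doc] += score
            hits[doc] += 1
    return {doc: total[doc] * hits[doc] for doc in total}

def borda(result_lists: list[list[str]]) -> dict[str, float]:
    """Borda count: rank r in a list of n documents earns n - r points."""
    points = defaultdict(float)
    for ranking in result_lists:
        n = len(ranking)
        for r, doc in enumerate(ranking):
            points[doc] += n - r
    return dict(points)

topical = {"d1": 0.9, "d2": 0.4}
spatial = {"d1": 0.7, "d3": 0.6}
temporal = {"d1": 0.2, "d2": 0.8}
print(comb_mnz([topical, spatial, temporal]))  # d1 boosted: in all 3 lists
print(borda([["d1", "d2"], ["d1", "d3"], ["d2", "d1"]]))
```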


Page 34: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Case Study: the PIV GIR System. Gain in effectiveness [figure].


Page 35: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question IR-3

Do operators in search queries improve the effectiveness of search results?


Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query Operators Shown Beneficial for Improving Search Results.” S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.) TPDL’11: Proceedings of the 1st International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129. Springer, Sep. 2011. DOI:10.1007/978-3-642-24469-8_14

Page 36: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Search Engines Offer Query Operators

Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…

Information need: “I’m looking for research projects funded in the DL domain”, expressed as a regular query vs. a query with operators.
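For illustration, one information need phrased as a regular query and as hypothetical operator-laden variants (Lucene-style syntax; the actual variants built in the paper may differ):

```python
# One information need, as a regular query and as hypothetical variants.
regular = "research projects funded digital libraries"

variants = [
    '"digital libraries" research projects funded',      # quotation marks: exact phrase
    '+funded research projects "digital libraries"',     # +: the term must appear
    'research projects funded^2 "digital libraries"^3',  # ^: boost term weights
]
```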

Musings at the Crossroads of DL, IR, and SCIM

Guillaume Cabanac

Page 37: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Our Research Questions


Page 38: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Our Methodology in a Nutshell

[Figure] A regular query is derived into variants with operators: V1, V2, V3, V4 … VN.


Page 39: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Effectiveness of Query Operators. TREC-7 per-topic analysis: boxplots for ‘+’ and ‘^’ [figure].


Page 40: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Effectiveness of Query Operators. Per-topic analysis: box plot.

[Figure] For each of the 32 topics, the AP (Average Precision) of TREC’s regular query compared with the highest-AP and lowest-AP query variants.


Page 41: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Effectiveness of Query Operators. TREC-7 per-topic analysis, ‘+’ and ‘^’ operators: MAP = 0.1554 for the regular queries vs. MAP = 0.2099 for the best query variants, i.e. +35.1%.


Page 42: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 43: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question SCIM-1

How to recommend researchers according to their research topics and social clues?


Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.” Scientometrics, 87(3):597–620, June 2011, Springer. DOI:10.1007/s11192-011-0358-1

Page 44: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Recommendation of Literature (McNee et al., 2006)

Collaborative filtering. Principle: mining the preferences of researchers (“those who liked this paper also liked…”). Risks: snowball effect / fad; what about innovation and relevance of theme?

Cognitive filtering. Principle: mining the contents of articles; profiles of resources (researchers, articles), citation graph.

Hybrid approach: combine both.


Page 45: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Foundations: Similarity Measures Under Study

Model: a coauthor graph (authors ↔ authors) and a venues graph (authors ↔ conferences / journals).

Social similarities: inverse degree of separation (length of the shortest path), strength of the tie (number of shortest paths), shared conferences (number of shared conference editions).

Thematic similarity: cosine on the Vector Space Model, di = (wi1, …, win), built on titles (document / researcher).
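Two of these measures in sketch form, on a toy coauthor graph and hypothetical term-weight vectors:

```python
import math
from collections import deque

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Thematic similarity: cosine of two term-weight vectors built on titles."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def inverse_degree_of_separation(graph: dict[str, set[str]], a: str, b: str) -> float:
    """Social similarity: 1 / length of the shortest coauthorship path (0 if none)."""
    frontier, dist = deque([a]), {a: 0}
    while frontier:
        x = frontier.popleft()
        if x == b:
            return 1 / dist[x] if dist[x] else 1.0
        for y in graph.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                frontier.append(y)
    return 0.0

coauthors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(inverse_degree_of_separation(coauthors, "A", "C"))          # path A-B-C: 0.5
print(cosine({"retrieval": 2.0, "social": 1.0}, {"retrieval": 1.0, "tagging": 1.0}))
```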


Page 46: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Computing Similarities with Social Clues. Task: literature review. Requirement: topical relevance. Preference: social proximity (meetings, projects…), hence re-ranking topical results with social clues. Combination with CombMNZ (Fox & Shaw, 1993). Final result: a list of recommended researchers.

[Figure] CombMNZ fuses the degree of separation, the strength of ties, and the shared conferences into a social list; a second CombMNZ fuses the social list with the topical list into the final TS list.


Page 47: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Evaluation Design: comparing recommendations with researchers’ perception

Q1: Effectiveness of topical (only) recommendations? Q2: Gain due to integrating social clues?

IR experiments follow the Cranfield paradigm (TREC…): does the search engine retrieve relevant documents?

[Figure] A topic and a corpus are input to a search engine x; an assessor judges “is this document relevant?”, producing relevance judgments ({0, 1} binary or [0, N] gradual) stored as qrels; trec_eval then computes effectiveness measures: Mean Average Precision, Normalized Discounted Cumulative Gain.

topic   S1      S2
1       0.5687  0.6521
…       …       …
50      0.7124  0.7512
avg     0.6421  0.7215

improvement: +12.3%, significance: p < 0.05 (paired t-test)
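NDCG in miniature, normalizing against the ideal ordering of the same judged items (a common simplification):

```python
import math

def ndcg(gains: list[int], k=None) -> float:
    """Normalized Discounted Cumulative Gain over graded judgments in [0, N],
    normalized against the ideal ordering of the same judged items."""
    gains = gains[:k] if k else list(gains)
    dcg = sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(r + 1) for r, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

# Made-up graded judgments of a Top-5 recommendation list:
print(round(ndcg([3, 0, 2, 1, 0]), 3))
```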



Page 48: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Evaluating Recommendations

[Figure] The same Cranfield-style setup, transposed to recommendation: the topic becomes the name of a researcher, the search engine becomes the recommender system (topical vs. topical + social), and the assessor is the researcher him/herself, asked “With whom would you like to chat for improving your research?”. Each subject judges the Top 25 recommendations; the relevance judgments ({0, 1} binary or [0, N] gradual) feed trec_eval, which computes Mean Average Precision and Normalized Discounted Cumulative Gain.


Page 49: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Experiment Features

Data: dblp.xml (713 MB = 1.3M publications by 811,787 researchers). Subjects: 90 researchers contacted by mail; 74 began to fill the questionnaire, 71 completed it.

[Screenshot] Interface for assessing recommendations


Page 50: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Experiments: Profile of the Participants

[Figures] Two histograms of the number of participants: experience of the 71 subjects (seniority, Mdn = 13 years) and productivity of the 71 subjects (number of publications, Mdn = 15 publications).


Page 51: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Empirical Validation of our Hypothesis

Strong baseline: an already effective approach based on the VSM. +8.49% = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues.

[Bar chart] NDCG of topical vs. topical + social recommendations, between 0.5 and 1: +8.49% globally, +10.39% for subjects with < 15 publications, +7.03% for ≥ 15 publications, +6.50% for < 13 years of experience, +10.22% for ≥ 13 years.

Page 52: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question SCIM-2

What is the landscape of research in Information Systems from the perspective of gatekeepers?


Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609

Page 53: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: the gatekeepers of science [figure]


Page 54: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: the 77 core peer-reviewed IS journals in the WoS [figure]


Page 55: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: exploratory data analysis [figure]


Page 56: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: exploratory data analysis [figure]


Page 57: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: topical map of the IS field [figure]


Page 58: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: most influential gatekeepers [figure]


Page 59: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: number of gatekeepers per country [figure]


Page 60: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Landscape of Research in Information Systems: geographic and gender diversity [figure]


Page 61: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Question SCIM-3

What if submission date influenced the acceptance of conference papers?


Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to the Journal of the American Society for Information Science and Technology, Wiley.

Page 62: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


Conferences Affected by a Submission-Date Bias? Peer review [figure]


Page 63: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


The Submission-Date Bias: dataset from the ConfMaster conference management system [figure]


Page 64: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


The Submission-Date Bias: influence of submission date on bids [figure]


Page 65: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics


The Submission-Date Bias: influence of submission date on average marks [figure]


Page 66: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Conclusion


Digital Libraries: Collective annotations · Social validation of discussion threads · Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation · Geographic IR · Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues · Landscape of research in Information Systems · The submission-date bias in peer-reviewed conferences

Page 67: Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

Thank you

http://www.irit.fr/~Guillaume.Cabanac

Twitter: @tafanor