Diversity in search: what, how, and what for? Bettina Berendt Dept. Computer Science, KU Leuven

Page 1

Diversity in search: what, how, and what for?

Bettina Berendt

Dept. Computer Science, KU Leuven

Page 2

Thanks to

Sebastian Kolbe-Nusser, Anett Kralisch, Siegfried Nijssen, Ilija Subašić, Mathias Verbeke, Hugo Zaragoza ...

Page 3

Diversity in natural language

diverse (s#2), various: distinctly dissimilar or unlike

..., diversity (s#1), ..., variety: noticeable heterogeneity

(WordNet)

“the fact that members of a set are different from one another”

Page 4

Why is diversity interesting for search?

“People like to see a range of different, non-redundant things/views/etc.”

“Different people search differently.”

How?
When / under what conditions?
(What) can we do?

Page 5

What is diverse?

Documents
– The relevance of a document must be determined considering the documents appearing before it (Goffman, 1964)
– E.g. MMR (Carbonell & Goldstein, 1998)
– Many further developments, e.g. for images
– Presentation choices, e.g. re-ranking or clustering?
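
The MMR idea cited above (score each candidate by its relevance to the query minus its similarity to what has already been shown) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mmr_rerank`, the weight `lam`, and the toy similarity values are all assumptions made for the example.

```python
def mmr_rerank(query_sim, doc_sim, k, lam=0.7):
    """Greedy MMR: repeatedly pick the candidate with the best trade-off
    between relevance to the query and redundancy with earlier picks.

    query_sim: dict mapping doc id -> similarity to the query
    doc_sim:   function (doc_a, doc_b) -> similarity between two documents
    lam:       1.0 = pure relevance ranking, 0.0 = pure novelty
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def mmr_score(doc):
            # Redundancy is the similarity to the closest already-selected doc.
            redundancy = max((doc_sim(doc, s) for s in selected), default=0.0)
            return lam * query_sim[doc] - (1 - lam) * redundancy
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (hypothetical): "a" and "b" are near-duplicates, "c" is different.
query_sim = {"a": 0.9, "b": 0.85, "c": 0.5}
pair_sim = {frozenset("ab"): 0.95, frozenset("ac"): 0.1, frozenset("bc"): 0.1}

def doc_sim(x, y):
    return pair_sim[frozenset((x, y))]

print(mmr_rerank(query_sim, doc_sim, k=3, lam=0.5))  # diversity-aware order
print(mmr_rerank(query_sim, doc_sim, k=3, lam=1.0))  # plain relevance order
```

With `lam=0.5` the near-duplicate "b" is pushed behind the less relevant but novel "c"; with `lam=1.0` the ranking falls back to relevance alone.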

Page 6

What is diverse?

Documents
People
– “The term diversity is a form of euphemistic shorthand to describe differences in racial or ethnic classifications, age, gender, religion, philosophy, physical abilities, socioeconomic background, sexual orientation, gender identity, intelligence, mental health, physical health, genetic attributes, behavior, attractiveness, place of origin, cultural values, or political view as well as other identifying features.”
  http://en.wikipedia.org/wiki/Diversity_(politics)

Page 7

What is diverse?

Documents
People
Knowledge and its articulations (= documents in a wider sense?!)
– “Knowledge and its articulations are strongly influenced by diversity in, e.g., cultural backgrounds, schools of thought, geographical contexts.”
– “LivingKnowledge will study the effect of diversity and time on opinions and bias.”
– “The goal [is] to improve navigation and search in very large multimodal datasets (e.g., the Web itself).”

Page 8

How we got here

The impact of language and culture on Web usage behaviour

Diversity of users

Page 9

How we got here

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

Diversity of users

Diversity of documents

Page 10

How we got here

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Diversity of users

Diversity of documents

Page 11

How we got here

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Collaborative re-use of literature search results

Diversity of users

Diversity of diversity

Diversity of documents

Page 12

Why this talk?

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Collaborative re-use of literature search results

Diversity of users

Diversity of diversity

Diversity of documents

Page 13

Why this talk?

The impact of language and culture on Web usage behaviour

Tools for sense-making in literature search

PORPOISE, STORIES tools for graphical news summarization and understanding

Collaborative re-use of literature search results

e.g. Information Retrieval J. 2009

Proceedings Living Web WS@ISWC 2009

Inf. Processing & Management 2010

e.g. Knowledge and Information Systems J. 2009

Towards an integrated understanding of diversity

Page 14

The impact of linguistic diversity on Web usage and thereby on the Web

Or: Why are non-English languages under-represented on the Web?

A web-analysis approach asking for underlying
– cognitive-linguistic
– behavioural
– attitude
factors

Page 15

A simple expectation of how much content exists in which language

Page 16

But: Dynamics of content creation, link setting, link following, attitudes, and use

Page 17

But: Dynamics of content creation, link setting, link following, attitudes, and use

People create less content

People link less to content

People use links less

People think the content is bad ... and use it less

Page 18

But: Dynamics of content creation, link setting, link following, attitudes, and use

Under-representation!

Page 19

Underlying data and methods

Database of countries and official languages
Distribution comparisons between
– worldwide proportions of native speakers of different languages
– worldwide distribution of servers registered by country
– crawler analysis of links to a multilingual site S
– log analysis assigning each session a native language
– log analysis of (user native language) – (S-entry-page language)
Questionnaire/TAM analysis of native and non-native users of S:
– usability, ease of use, competence in English, beliefs about availability of content in native language

Page 20

Some questions

Does one find such dynamics also in search engines?
What factors stop or reverse such language-marginalisation trends?
– Critical mass?
– Laws?
– Volunteers?
Did / can Web 2.0/3.0 change this?
(When) is it better to work without pre-defined labels for users?

Page 21

Part 2: An approach that ...

Does one find such dynamics also in search engines?
What factors stop or reverse such language-marginalisation trends?
– Critical mass?
– Laws?
– Volunteers?
Did / can Web 2.0/3.0 change this?
(When) is it better to work without pre-defined labels for users?

Page 22

Motivation (1): Diversity of people is ...

Speaking different languages (etc.) → localisation / internationalisation
Having different abilities → accessibility
Liking different things → collaborative filtering
Structuring the world in different ways → ?

Page 23

Motivation (2): Diversity-aware applications ...

Must have a (formal) notion of diversity
Can follow a
– “personalization approach”
  adapt to the user’s value on the diversity variable(s)
  transparently? Is this paternalistic?
– “customization approach”
  show the space of diversity
  allow choice / raise awareness / semi-automatic!

Page 24

Measuring grouping diversity

Diversity = 1 - similarity = 1 - normalized mutual information (NMI)

[Figure: example groupings annotated “NMI = 0” and “NMI = 0.35”; surviving label fragment: “By colour &”]
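
A small sketch of how the measure above could be computed for two groupings of the same items. The slide does not say which NMI normalisation is used; dividing by the geometric mean of the two entropies is an assumption here, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a grouping given as one label per item."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two groupings of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b):
    """NMI normalised by sqrt(H(a) * H(b)) (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # A single-group grouping carries no information; treat two such
        # groupings as identical and any other pairing as unrelated.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


def grouping_diversity(a, b):
    """Diversity = 1 - similarity = 1 - NMI, as on the slide."""
    return 1.0 - nmi(a, b)


# Four objects grouped once by colour and once by shape (toy data).
by_colour = ["red", "red", "blue", "blue"]
by_shape = ["square", "circle", "square", "circle"]

print(grouping_diversity(by_colour, by_shape))   # independent groupings
print(grouping_diversity(by_colour, by_colour))  # identical groupings
```

Grouping labels only matter up to renaming: two users who form the same groups under different names get a diversity of (numerically) zero.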

Page 25

Measuring user diversity

“How similarly do two users group documents?”

For each query q, consider their groupings gr:

For various queries: aggregate
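
The formula on this slide did not survive extraction. One plausible way to write the two steps, reusing the 1 - NMI definition from the previous slide and the names gdiv(A,B,q) and gdiv(A,B) that appear on later slides, is given below. The subscripted groupings gr_A(q) and gr_B(q), the query set Q, and in particular the arithmetic mean as the aggregate are assumptions; the slide only says "aggregate".

```latex
\mathrm{gdiv}(A,B,q) = 1 - \mathrm{NMI}\bigl(gr_A(q),\, gr_B(q)\bigr)
\qquad
\mathrm{gdiv}(A,B) = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{gdiv}(A,B,q)
```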

Page 26

... and now: the application domain

... that’s only the 1st step!

Page 27

Workflow

1. Query
2. Automatic clustering
3. Manual regrouping
4. Re-use
   1. Learn + present way(s) of grouping
   2. Transfer the constructed concepts

Page 28

Concepts

Extension
– the instances in a group
Intension
– Ideally: “squares vs. circles”
– Pragmatically: defined via a classifier

Page 29

Step 1: Retrieve

CiteseerX via OAI
Output: set of
– document IDs
– document details
– their texts

Page 30

Step 2: Cluster

“the classic bibliometric solution”
CiteseerCluster:
– Similarity measure: co-citation, bibliometric coupling, word or LSA similarity, combinations
– Clustering algorithm: k-means, hierarchical
Damilicious: phrases → Lingo
How to choose the “best”?
– Experiments: Lingo better than k-means at reconstruction and extension-over-time
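
To make two of the similarity measures listed above concrete, here is a minimal sketch of co-citation counts and of bibliographic coupling (which the slide's "bibliometric coupling" presumably refers to). It is not CiteseerCluster's implementation; the function names and the toy citation data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_citation(refs):
    """Count, for each pair of cited papers, how many papers cite both.

    refs: dict mapping a citing paper -> set of papers it cites
    """
    counts = Counter()
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def bibliographic_coupling(refs):
    """Count, for each pair of citing papers, how many references they share."""
    counts = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            counts[(a, b)] = shared
    return counts


# Toy citation data (hypothetical): p1..p3 cite x, y, z.
refs = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"z"},
}

cc = co_citation(refs)
bc = bibliographic_coupling(refs)
print(cc[("x", "y")])    # x and y are cited together by p1 and p2
print(bc[("p1", "p2")])  # p1 and p2 share the references x and y
```

Either count matrix can then be normalised and handed to a clustering algorithm such as k-means or hierarchical clustering.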

Page 31

Step 3 (a): Re-organise & work on document groups

Page 32

Step 3 (b): Visualising document groups

Page 33

Steps 4+5: Re-use

Basic idea:
1. learn a classifier from the final grouping (Lingo phrases)
2. apply the classifier to a new search result
“re-use semantics”
Whose grouping?
– One’s own
– Somebody else’s
Which search result?
– “the same” (same query, structuring by somebody else)
– “More of the same” (same query, later time → more docs.)
– “related” (... Measured how? ...)
– arbitrary
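
The two-step idea (learn a classifier from a finished grouping, then apply it to a new result set) can be illustrated with a deliberately simple stand-in. The slides say the tool learns from Lingo phrases; the sketch below swaps in plain word counts and a nearest-centroid rule instead, so it shows the workflow rather than the actual method. All function names and the toy documents are hypothetical; the fallback label "Other topics" is the remainder group mentioned elsewhere in the slides.

```python
from collections import Counter, defaultdict
from math import sqrt


def tokenize(text):
    return text.lower().split()


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def learn_grouping(grouped_docs):
    """Step 1: build one term-count centroid per group label.

    grouped_docs: dict mapping group label -> list of document texts
    """
    return {
        label: sum((Counter(tokenize(t)) for t in texts), Counter())
        for label, texts in grouped_docs.items()
    }


def apply_grouping(model, new_docs, fallback="Other topics"):
    """Step 2: assign each new document to its most similar learned group."""
    result = defaultdict(list)
    for doc in new_docs:
        vec = Counter(tokenize(doc))
        label, score = max(
            ((lab, cosine(vec, centroid)) for lab, centroid in sorted(model.items())),
            key=lambda pair: pair[1],
        )
        # Documents that match no group at all go to the remainder group.
        result[label if score > 0 else fallback].append(doc)
    return dict(result)


# A user's final grouping for one query (toy data).
final_grouping = {
    "Web mining": ["web usage mining of server logs", "mining web link structure"],
    "RFID": ["rfid tag data cleaning", "rfid sensor streams"],
}
model = learn_grouping(final_grouping)

# A later search result to be structured the same way.
new_result = ["clustering web server logs", "rfid reader streams", "quantum chemistry"]
regrouped = apply_grouping(model, new_result)
print(regrouped)
```

The same `model` could come from one's own earlier grouping or from somebody else's, which is the choice the slide raises under "Whose grouping?".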

Page 34

Visualising user diversity (1)

Simulated users with different strategies
U0: did not change anything (“System”)
U1: tried to produce a better fit of the document groups to the cluster intensions; 5 regroupings
U2: attempted to move everything that did not fit well into the remainder group “Other topics”, & better fit; 10 regroupings
U3: attempted to move everything from “Other topics” into matching real groups; 5 regroupings
U4: regrouping by author and institution; 5 regroupings
5×5 matrix of diversities gdiv(A,B,q) → multidimensional scaling

Page 35

Visualising user diversity (2)

aggregated using gdiv(A,B)

Web mining, Data mining, RFID

Page 36

Evaluating the application

Clustering only: Does it generate meaningful document groups?
– yes (tradition in bibliometrics) – but: data?
– Small expert evaluation of CiteseerCluster
Clustering & regrouping
– End-user experiment with CiteseerCluster
– 5-person formative user study of Damilicious

Page 37

The Damilicious tool: Summary and (some) open questions

A tool that helps users in sense-making, exploring diversity, and re-using semantics
Diversity measures when queries and result sets are different?
How to best present diversity?
– How to integrate into an environment supporting user and community contexts?
Incentives to use the functionalities?
How to find the best balance between similarity and diversity?
Which measures of grouping diversity are most meaningful?
– Extensional?
– Intensional? Structure-based? Hybrid? (cf. ontology matching)
Which other sources of user diversity?
Diversity and relevance: can we learn from user-dependent relevance judgements?

Page 38

Some lessons learned (or questions raised?)

We need to embrace diversity.
We need to take into account
– The diversity of documents / knowledge
– The diversity of people
– The diversity of diversity.
We need to be clear about what we mean.
We need to ask whether / when “striving for diversity” is in itself A Good Thing.
We need to ask whether / when “raising awareness of diversity” is in itself A Good Thing.

Thanks!
