Diversity in search: what, how, and what for?
Bettina Berendt
Dept. Computer Science, KU Leuven
Thanks to
Sebastian Kolbe-Nusser, Anett Kralisch, Siegfried Nijssen, Ilija Subašić, Mathias Verbeke, Hugo Zaragoza ...
Diversity in natural language
diverse (s#2), various: distinctly dissimilar or unlike
..., diversity (s#1), ..., variety: noticeable heterogeneity
(WordNet)
"the fact that members of a set are different from one another"
Why is diversity interesting for search?
"People like to see a range of different, non-redundant things/views/etc."
"Different people search differently."
How? When / under what conditions? (What) can we do?
What is diverse?
Documents
– the relevance of a document must be determined considering the documents appearing before it (Goffman, 1964)
– E.g. MMR (Carbonell & Goldstein, 1998)
– Many further developments, e.g. for images
– Presentation choices, e.g. re-ranking or clustering?
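MMR (Maximal Marginal Relevance) greedily picks, at each step, the document that best balances relevance to the query against redundancy with the documents already selected, which is one concrete way of judging a document "considering the documents appearing before it". The following is a minimal sketch, not code from the slides; the word-overlap (Jaccard) similarity, the weight `lam=0.7` and the toy documents are illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence

Sim = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a real text similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_rerank(query: str, docs: Sequence[str], sim_q: Sim, sim_d: Sim,
               lam: float = 0.7, k: Optional[int] = None) -> List[int]:
    """Return document indices in greedy MMR order.

    score(d) = lam * sim_q(d, query) - (1 - lam) * max_{s in selected} sim_d(d, s)
    """
    remaining = list(range(len(docs)))
    selected: List[int] = []
    k = len(docs) if k is None else min(k, len(docs))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = sim_q(docs[i], query)
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((sim_d(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)  # ties go to the earlier index
        selected.append(best)
        remaining.remove(best)
    return selected


query = "jaguar"
docs = ["jaguar car speed", "jaguar car price", "jaguar cat habitat"]
print(mmr_rerank(query, docs, jaccard, jaccard))  # [0, 2, 1]
```

All three documents are equally relevant here, so pure relevance ranking (`lam=1.0`) keeps the input order, whereas MMR promotes the "cat" document over the second, near-duplicate "car" document.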
People
– "The term diversity is a form of euphemistic shorthand to describe differences in racial or ethnic classifications, age, gender, religion, philosophy, physical abilities, socioeconomic background, sexual orientation, gender identity, intelligence, mental health, physical health, genetic attributes, behavior, attractiveness, place of origin, cultural values, or political view as well as other identifying features."
http://en.wikipedia.org/wiki/Diversity_(politics)
Knowledge and its articulations
(= documents in a wider sense?!)
– "Knowledge and its articulations are strongly influenced by diversity in, e.g., cultural backgrounds, schools of thought, geographical contexts."
– "LivingKnowledge will study the effect of diversity and time on opinions and bias."
– "The goal [is] to improve navigation and search in very large multimodal datasets (e.g., the Web itself)."
How we got here
The impact of language and culture on Web usage behaviour
Tools for sense-making in literature search
PORPOISE, STORIES: tools for graphical news summarization and understanding
Collaborative re-use of literature search results
Diversity of users
Diversity of diversity
Diversity of documents
Why this talk?
The impact of language and culture on Web usage behaviour
Tools for sense-making in literature search
PORPOISE, STORIES: tools for graphical news summarization and understanding
Collaborative re-use of literature search results
e.g. Information Retrieval J. 2009
Proceedings Living Web WS@ISWC 2009
Inf. Processing & Management 2010
e.g. Knowledge and Information Systems J. 2009
Towards an integrated understanding of diversity
The impact of linguistic diversity on Web usage and thereby on the Web
Or: Why are non-English languages under-represented on the Web?
A web-analysis approach asking for underlying cognitive-linguistic, behavioural, and attitude factors
A simple expectation of how much content exists in which language
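One way to make such a "simple expectation" concrete is a representation ratio: a language's observed share of content divided by its expected share. This sketch assumes, which the slide does not state explicitly, that the expected share equals the language's share of native speakers; the language labels and counts are invented placeholders, not data from the study.

```python
from typing import Dict


def representation_ratios(speakers: Dict[str, float],
                          content: Dict[str, float]) -> Dict[str, float]:
    """Observed content share / expected share.

    The expected share is ASSUMED to be the language's share of native speakers.
    A ratio below 1 means less content than this simple expectation predicts.
    """
    total_speakers = sum(speakers.values())
    total_content = sum(content.values())
    return {
        lang: (content.get(lang, 0.0) / total_content) / (n / total_speakers)
        for lang, n in speakers.items()
    }


# Hypothetical figures, for illustration only.
speakers = {"lang_A": 50, "lang_B": 30, "lang_C": 20}
content = {"lang_A": 80, "lang_B": 15, "lang_C": 5}
ratios = representation_ratios(speakers, content)
print(ratios)  # lang_A ≈ 1.6, lang_B ≈ 0.5, lang_C ≈ 0.25
```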
But: dynamics of content creation, link setting, link following, attitudes, and use
People create less content
People link less to content
People use links less
People think the content is bad ... and use it less
Under-representation!
Underlying data and methods
Database of countries and official languages
Distribution comparisons between
– worldwide proportions of native speakers of different languages
– worldwide distribution of servers registered by country
– crawler analysis of links to a multilingual site S
– log analysis assigning each session a native language
– log analysis of (user native language) – (S entry-page language)
Questionnaire/TAM analysis of native and non-native users of S:
– usability, ease of use, competence in English, beliefs about availability of content in native language
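The log analysis that pairs each session's native language with the language of its entry page on S can be tabulated as a simple cross-tab. This sketch assumes that both labels have already been assigned to every session (how the native language is inferred is not described here); the helper names and the session list are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def entry_language_table(sessions: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Cross-tab: native language -> Counter of entry-page languages."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for native, entry in sessions:
        table[native][entry] += 1
    return dict(table)


def native_entry_share(table: Dict[str, Counter]) -> Dict[str, float]:
    """Per native language: fraction of sessions entering S on a page in that language."""
    return {native: counts[native] / sum(counts.values())
            for native, counts in table.items()}


# Hypothetical sessions: (user native language, entry-page language).
sessions = [("de", "de"), ("de", "en"), ("de", "en"), ("en", "en")]
table = entry_language_table(sessions)
shares = native_entry_share(table)
print(shares)  # de ≈ 0.33, en = 1.0
```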
Some questions
Does one find such dynamics also in search engines?
What factors stop or reverse such language-marginalisation trends?
– Critical mass?
– Laws?
– Volunteers?
Did / can Web 2.0/3.0 change this?
(When) is it better to work without pre-defined labels for users?
Part 2: An approach that ...
Motivation (1): Diversity of people is ...
Speaking different languages (etc.) → localisation / internationalisation
Having different abilities → accessibility
Liking different things → collaborative filtering
Structuring the world in different ways → ?
Motivation (2): Diversity-aware applications ...
Must have a (formal) notion of diversity
Can follow a
– "personalization approach": adapt to the user's value on the diversity variable(s); transparently? Is this paternalistic?
– "customization approach": show the space of diversity; allow choice / raise awareness / semi-automatic!
Measuring grouping diversity
Diversity = 1 – similarity = 1 – normalized mutual information (NMI)
[Figure: two example pairs of groupings ("by colour & ..."), with NMI = 0 and NMI = 0.35]
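NMI comes in several normalisations and the slides do not say which one is meant; this sketch assumes I(X;Y) / sqrt(H(X)·H(Y)). Each grouping is given as one group label per document, with both lists in the same document order. The function names are illustrative, not taken from the tools described here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (nats) of a grouping given as per-document labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A;B) = sum p(i,j) * log(p(i,j) / (p(i) p(j))), from co-occurrence counts."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nij / n) * math.log(nij * n / (ca[i] * cb[j]))
               for (i, j), nij in cab.items())


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI with geometric-mean normalisation (an assumption)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # A single-group grouping carries no information; identical only if both are.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def grouping_diversity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Diversity = 1 - similarity = 1 - NMI."""
    return 1.0 - nmi(a, b)


same = grouping_diversity(["x", "x", "y", "y"], ["p", "p", "q", "q"])   # ≈ 0.0
cross = grouping_diversity(["x", "x", "y", "y"], ["p", "q", "p", "q"])  # ≈ 1.0
```

Group names do not matter, only which documents end up together: the first pair is the same partition under different labels, the second pair cuts across each other completely.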
Measuring user diversity
"How similarly do two users group documents?"
For each query q, consider their groupings gr: gdiv(A,B,q) = 1 – NMI(gr(A,q), gr(B,q))
For various queries: aggregate into gdiv(A,B)
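To go from one query to many, this sketch computes gdiv(A,B,q) for every query both users have grouped and then averages over queries. The averaging, the data layout `{query: {document: group label}}`, and the geometric-mean NMI normalisation are all assumptions; the source only says the per-query values are aggregated.

```python
import math
from collections import Counter
from statistics import mean
from typing import Dict, Hashable, List

Grouping = Dict[str, Hashable]       # document id -> group label
UserGroupings = Dict[str, Grouping]  # query -> that user's grouping


def _nmi(a: List[Hashable], b: List[Hashable]) -> float:
    """Normalised mutual information, geometric-mean normalisation (assumed)."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)

    def h(counts: Counter) -> float:
        return -sum(v / n * math.log(v / n) for v in counts.values())

    ha, hb = h(ca), h(cb)
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    mi = sum(v / n * math.log(v * n / (ca[i] * cb[j]))
             for (i, j), v in Counter(zip(a, b)).items())
    return mi / math.sqrt(ha * hb)


def gdiv_query(ga: Grouping, gb: Grouping) -> float:
    """gdiv(A,B,q) = 1 - NMI over the documents both users grouped for q."""
    shared_docs = sorted(ga.keys() & gb.keys())
    return 1.0 - _nmi([ga[d] for d in shared_docs], [gb[d] for d in shared_docs])


def gdiv(user_a: UserGroupings, user_b: UserGroupings) -> float:
    """gdiv(A,B): mean of gdiv(A,B,q) over shared queries (assumed aggregation)."""
    shared_queries = user_a.keys() & user_b.keys()
    return mean(gdiv_query(user_a[q], user_b[q]) for q in shared_queries)


# Hypothetical users: they agree on q1 and group q2 orthogonally.
A = {"q1": {"d1": "x", "d2": "x", "d3": "y", "d4": "y"},
     "q2": {"d1": "x", "d2": "x", "d3": "y", "d4": "y"}}
B = {"q1": {"d1": "p", "d2": "p", "d3": "q", "d4": "q"},
     "q2": {"d1": "p", "d2": "q", "d3": "p", "d4": "q"}}
print(gdiv(A, B))  # ≈ 0.5
```

`statistics.mean` raises an error if the two users share no query; a real implementation would have to decide what diversity means in that case.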
... and now: the application domain
... that's only the 1st step!
Workflow
1. Query  2. Automatic clustering  3. Manual regrouping  4. Re-use
1. Learn + present way(s) of grouping  2. Transfer the constructed concepts
Concepts
Extension
– the instances in a group
Intension
– Ideally: "squares vs. circles"
– Pragmatically: defined via a classifier
Step 1: Retrieve
CiteSeerX via OAI. Output: set of
– document IDs
– document details
– their texts
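Assuming "OAI" here means the OAI-PMH harvesting protocol, retrieval amounts to `ListRecords` requests that return Dublin Core metadata in pages linked by a `resumptionToken`. This standard-library sketch builds the request URL and parses one response page; the base URL is a placeholder (the actual CiteSeerX endpoint is not given here), the helper names are hypothetical, and full texts would have to be fetched separately, since OAI-PMH carries metadata.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL; a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text: str) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Extract (id, title, description) per record plus the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        title = dc.findtext("dc:title", default="", namespaces=NS) if dc is not None else ""
        desc = dc.findtext("dc:description", default="", namespaces=NS) if dc is not None else ""
        records.append({"id": ident, "title": title, "description": desc})
    # An empty or missing token means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return records, token or None
```

A harvester would loop: fetch `list_records_url(BASE_URL)`, parse it, and repeat with the returned token until it is `None`.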
Step 2: Cluster
"The classic bibliometric solution"
CiteseerCluster:
– Similarity measure: co-citation, bibliographic coupling, word or LSA similarity, combinations
– Clustering algorithm: k-means, hierarchical
Damilicious: phrases (Lingo)
How to choose the "best"?
– Experiments: Lingo better than k-means at reconstruction and extension-over-time
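Co-citation counts how many documents cite both i and j; bibliographic coupling counts how many references i and j have in common. The following is a small sketch over a hypothetical citation map; it is not CiteseerCluster's code and omits the normalisation a real system would likely apply before clustering.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def cocitation_and_coupling(citations: Dict[str, Set[str]],
                            docs: Iterable[str]) -> Tuple[Dict[Pair, int], Dict[Pair, int]]:
    """citations maps each citing document to the set of documents it cites."""
    cocitation: Dict[Pair, int] = {}
    coupling: Dict[Pair, int] = {}
    for i, j in combinations(list(docs), 2):
        # Bibliographic coupling: shared references of i and j.
        coupling[(i, j)] = len(citations.get(i, set()) & citations.get(j, set()))
        # Co-citation: documents whose reference lists contain both i and j.
        cocitation[(i, j)] = sum(1 for refs in citations.values() if i in refs and j in refs)
    return cocitation, coupling


# Hypothetical citation graph: A and B share references C, D; F cites both A and B.
citations = {"A": {"C", "D"}, "B": {"C", "D", "E"}, "F": {"A", "B"}}
cocit, coup = cocitation_and_coupling(citations, ["A", "B"])
print(cocit[("A", "B")], coup[("A", "B")])  # 1 2
```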
Step 3 (a): Re-organise & work on document groups
Step 3 (b): Visualising document groups
Steps 4+5: Re-use
Basic idea:
1. Learn a classifier from the final grouping (Lingo phrases)
2. Apply the classifier to a new search result
→ "re-use semantics"
Whose grouping?
– One's own
– Somebody else's
Which search result?
– "the same" (same query, structuring by somebody else)
– "more of the same" (same query, later time → more docs)
– "related" (... measured how? ...)
– arbitrary
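The slides do not say which classifier is learned from the final grouping. As a stand-in, this sketch uses a nearest-centroid classifier over bag-of-words vectors with cosine similarity: each final group (labelled, e.g., by its Lingo phrase) becomes a centroid, and each document of a new search result goes to the most similar one. The tokenisation, the classifier, the helper names and the example texts are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict


def bag(text: str) -> Counter:
    """Lower-cased whitespace tokenisation into term counts."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def learn_centroids(grouping: Dict[str, str], texts: Dict[str, str]) -> Dict[str, Counter]:
    """grouping: doc id -> group label (e.g. a Lingo phrase); texts: doc id -> text."""
    sums: Dict[str, Counter] = defaultdict(Counter)
    sizes: Counter = Counter()
    for doc, label in grouping.items():
        sums[label].update(bag(texts[doc]))
        sizes[label] += 1
    # Average term counts per group = the group's centroid.
    return {label: Counter({t: c / sizes[label] for t, c in s.items()})
            for label, s in sums.items()}


def classify(centroids: Dict[str, Counter], text: str) -> str:
    vec = bag(text)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical final grouping of one user's result set.
grouping = {"d1": "web usage mining", "d2": "web usage mining", "d3": "rfid privacy"}
texts = {"d1": "web log sessions", "d2": "web log clickstream", "d3": "rfid tags privacy"}
centroids = learn_centroids(grouping, texts)
new_result = ["clickstream of web sessions", "privacy of rfid tags"]
print([classify(centroids, t) for t in new_result])  # ['web usage mining', 'rfid privacy']
```

The grouping passed to `learn_centroids` can be one's own or somebody else's, which is exactly the "whose grouping?" choice above.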
Visualising user diversity (1)
Simulated users with different strategies:
U0: did not change anything ("System")
U1: tried to produce a better fit of the document groups to the cluster intensions; 5 regroupings
U2: attempted to move everything that did not fit well into the remainder group "Other topics", and to produce a better fit; 10 regroupings
U3: attempted to move everything from "Other topics" into matching real groups; 5 regroupings
U4: regrouping by author and institution; 5 regroupings
5×5 matrix of diversities gdiv(A,B,q) → multidimensional scaling
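Multidimensional scaling turns a matrix of pairwise dissimilarities, such as the 5×5 gdiv matrix, into 2-D coordinates for plotting. The slides do not say which MDS variant was used; this numpy sketch uses classical (Torgerson) MDS, which reproduces Euclidean distances exactly and only approximates a dissimilarity like 1 – NMI (negative eigenvalues are clipped to zero). The matrix values below are invented for illustration.

```python
import numpy as np


def classical_mds(d, dims: int = 2) -> np.ndarray:
    """Classical MDS: double-centre the squared dissimilarities, keep top eigenvectors."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                   # doubly centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]      # largest first
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale              # one row of coordinates per user


# Invented gdiv values for users U0..U4 (not the study's results).
gdiv_matrix = [
    [0.0, 0.2, 0.4, 0.3, 0.8],
    [0.2, 0.0, 0.3, 0.3, 0.8],
    [0.4, 0.3, 0.0, 0.5, 0.9],
    [0.3, 0.3, 0.5, 0.0, 0.8],
    [0.8, 0.8, 0.9, 0.8, 0.0],
]
coords = classical_mds(gdiv_matrix)
for name, (x, y) in zip(["U0", "U1", "U2", "U3", "U4"], coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

The axes themselves carry no meaning (the result is only defined up to rotation and reflection); what matters is which users end up close together.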
Visualising user diversity (2)
aggregated using gdiv(A,B)
[Figure panels: Web mining · Data mining · RFID]
Evaluating the application
Clustering only: Does it generate meaningful document groups?
– Yes (tradition in bibliometrics), but: data?
– Small expert evaluation of CiteseerCluster
Clustering & regrouping
– End-user experiment with CiteseerCluster
– 5-person formative user study of Damilicious
The Damilicious tool: Summary and (some) open questions
A tool that helps users with sense-making, exploring diversity, and re-using semantics
– Diversity measures when queries and result sets are different?
– How best to present diversity?
– How to integrate into an environment supporting user and community contexts?
– Incentives to use the functionalities?
– How to find the best balance between similarity and diversity?
– Which measures of grouping diversity are most meaningful? Extensional? Intensional? Structure-based? Hybrid? (cf. ontology matching)
– Which other sources of user diversity?
– Diversity and relevance: can we learn from user-dependent relevance judgements?
Some lessons learned (or questions raised?)
We need to embrace diversity.
We need to take into account
– the diversity of documents / knowledge
– the diversity of people
– the diversity of diversity.
We need to be clear about what we mean.
We need to ask whether / when "striving for diversity" is in itself A Good Thing.
We need to ask whether / when "raising awareness of diversity" is in itself A Good Thing.
Thanks!