Kshitij: A Search and Page Recommendation System for Wikipedia
Center for E-Business Technology
Seoul National University
Seoul, Korea
Nam, Kwang-hyun
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
Phanikumar Bhamidipati, Kamalakar Karlapalem
Center for Data Engineering International Institute of Information Technology,
Hyderabad, India
COMAD 2008
Copyright 2009 by CEBT
Contents
Motivation
Problem statement
Kshitij
Overview
Graph Model
Architecture
Algorithms
– CBR, LBR, YBR, AR
Results
Conclusion & Future Work
Discussion
IDS Lab Seminar - 2
Motivation
New paradigms in search
Increased interest after the PageRank and HITS (Hyperlink-Induced Topic Search) algorithms
Wikipedia
Powerful online collaborative encyclopedia
Vast knowledge, available in a structured format
The links in each page represent some kind of relation with the base page
– Both the semantics and the data can be mined from Wikipedia
Need for systems that leverage Wikipedia knowledge in recommendations
Kshitij
A generic recommendation system based on Wikipedia semantics
Provides two services
Search Recommendations
Page Recommendations
Uses Yago as the stored knowledge base
Extracts additional knowledge dynamically from the Wiki pages
Search Recommendations
[Figure: a keyword as input, with the result from the search engine and the Kshitij recommendations]
Page Recommendations
When the user visits a page, its identifier is sent as input to the algorithms to obtain recommendations
The most relevant aggregated results
Displayed as hyperlinks
Kshitij - Overview
Leverages the structured model powered by Wikis
Categories
Links
YAGO
An ontology compiled from Wikipedia
The static source of knowledge
The Graph Structure
[Figure: graph whose nodes include Search, Jaguar, Jaguar Cars, Atari Jaguar, Atari Jaguar II, Atari 7800, Felidae, Black Panther, Mammal, William Lyons, Automobile]
Kshitij - Algorithms
Three individual recommendation algorithms that explore different semantics
CBR
LBR
YBR
A link-based aggregator (AR)
Combines the three into a single set of recommendations
Category Based Recommendations (CBR)
Key idea
If two pages belong to multiple categories together, the probability that they belong to the same topic increases
– London and Berlin are both in Capitals in Europe and Host cities of the Summer Olympic Games
Algorithm
Start with a set of pages (search output)
Explore the category structure to obtain candidate pages
Prune the list based on similarity values calculated from shared categories, using thresholds T1 and T2
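The CBR steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not define the similarity measure or the roles of T1 and T2, so the sketch assumes Jaccard similarity over category sets, treats T1 as a minimum number of shared categories, and treats T2 as a minimum similarity. The function name `cbr` and the toy category data are hypothetical.

```python
from collections import defaultdict

# Toy data for illustration only; not actual Wikipedia category assignments.
PAGE_CATEGORIES = {
    "London": {"Capitals in Europe", "Host cities of the Summer Olympic Games", "Cities in England"},
    "Berlin": {"Capitals in Europe", "Host cities of the Summer Olympic Games", "Cities in Germany"},
    "Vienna": {"Capitals in Europe", "Cities in Austria"},
}


def cbr(seed_pages, page_categories, t1=2, t2=0.3):
    """Return candidate pages ranked by category similarity to the seeds.

    t1: minimum number of shared categories (assumed meaning of T1).
    t2: minimum Jaccard similarity of category sets (assumed meaning of T2).
    """
    # Invert page -> categories so that categories can be explored.
    category_pages = defaultdict(set)
    for page, cats in page_categories.items():
        for cat in cats:
            category_pages[cat].add(page)

    scores = {}
    for seed in seed_pages:
        seed_cats = page_categories.get(seed, set())
        # Candidates share at least one category with the seed page.
        candidates = set().union(*(category_pages[c] for c in seed_cats)) - set(seed_pages)
        for cand in candidates:
            cand_cats = page_categories[cand]
            shared = seed_cats & cand_cats
            if len(shared) < t1:
                continue  # pruned by the shared-category count
            sim = len(shared) / len(seed_cats | cand_cats)
            if sim >= t2:
                scores[cand] = max(scores.get(cand, 0.0), sim)
    # Highest similarity first; ties broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))
```

With the toy data, `cbr(["London"], PAGE_CATEGORIES)` keeps Berlin (two shared categories) and prunes Vienna (one shared category).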
Link Based Recommendations (LBR)
Key idea
If two pages are referred together from the same set of pages, they could be considered as related
– Competing sports persons, countries in same alliance
Algorithm
Start with search results and output of CBR
Identify frequent item sets
Support by search results is high over CBR output
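A minimal sketch of the co-reference idea, assuming each referring page's outgoing links form one transaction and restricting the frequent item sets to pairs. The slides do not say how support is computed or how search results are weighted against CBR output, so that weighting is left out here. The function name `lbr`, the `min_support` parameter, and the toy link data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy link data for illustration only; not actual Wikipedia links.
OUTLINKS = {
    "Big cat": ["Jaguar", "Black panther", "Felidae"],
    "Panthera": ["Jaguar", "Black panther"],
    "Amazon Rainforest": ["Jaguar", "Amazon River"],
}


def lbr(seed_pages, outlinks, min_support=2):
    """Return pages that are frequently referred to together with a seed page."""
    seeds = set(seed_pages)

    # Count how many referring pages link to both pages of each pair.
    pair_support = Counter()
    for links in outlinks.values():
        for a, b in combinations(sorted(set(links)), 2):
            pair_support[(a, b)] += 1

    # Keep frequent pairs that connect a seed page to a non-seed page.
    related = Counter()
    for (a, b), support in pair_support.items():
        if support < min_support:
            continue
        if a in seeds and b not in seeds:
            related[b] += support
        elif b in seeds and a not in seeds:
            related[a] += support
    return sorted(related, key=lambda p: (-related[p], p))
```

In the toy data only "Black panther" is linked together with "Jaguar" from two referring pages, so it is the only page that passes `min_support=2`.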
Yago Based Recommendations (YBR)
Set of facts in triplet form <E1, R, E2>
<New Delhi, Is Capital Of, India>
Prune the relation types
Key idea
To find a prioritized set of entities that are related to a given set of Wikipedia pages
Algorithm
Start with search output
Retrieve entities related to these pages based on the weight measure
Merge the lists and identify the related pages
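The YBR steps might look like the sketch below. The slides do not specify the weight measure or which relation types are pruned, so this assumes a hypothetical per-relation weight table in which any relation absent from the table counts as pruned. The function name `ybr`, the weights, and all facts other than the New Delhi example are illustrative.

```python
from collections import defaultdict

# Toy facts in <E1, R, E2> form; only the first triple comes from the slide.
FACTS = [
    ("New Delhi", "Is Capital Of", "India"),
    ("New Delhi", "Is Located In", "India"),
    ("Mumbai", "Is Located In", "India"),
    ("India", "Has Label", "Bharat"),
]

# Hypothetical weights; relation types missing from this table are pruned.
RELATION_WEIGHTS = {"Is Capital Of": 1.0, "Is Located In": 0.5}


def ybr(seed_pages, facts, relation_weights):
    """Return entities related to the seed pages, highest total weight first."""
    seeds = set(seed_pages)
    scores = defaultdict(float)
    for e1, relation, e2 in facts:
        weight = relation_weights.get(relation)
        if weight is None:
            continue  # pruned relation type
        # Credit the entity on the other side of a fact that touches a seed.
        if e1 in seeds and e2 not in seeds:
            scores[e2] += weight
        elif e2 in seeds and e1 not in seeds:
            scores[e1] += weight
    # Merging the per-page lists is done by summing weights per entity.
    return sorted(scores, key=lambda e: (-scores[e], e))
```

For the seed "India", New Delhi accumulates 1.5 and Mumbai 0.5, while the "Has Label" fact is ignored because its relation type has no weight.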
Diversity of the algorithms
Each explores a different knowledge space
The graph is explored along edges of a specific color
Recommendations of individual algorithms differ
Need for aggregation
Combines and prioritizes the results
Aggregated Recommendations (AR)
Groups the recommendations based on the topic each result belongs to
A link based approach
Algorithm
Start with search results and an aggregated list of CBR, LBR and YBR (Cumulative List (CL))
Explore the neighborhood for each search result to find how many in CL are reachable
Apply a threshold T on the nearness value to filter the related pages
Represent each result page as a point in k-dimensional space (one dimension per page in CL)
Run Agglomerative Nesting (AGNES, a hierarchical clustering algorithm) to obtain clusters of result pages
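The aggregation steps can be sketched as below, with several assumptions that the slides do not confirm: nearness is taken to be 1 / (number of hops) within a two-hop neighborhood, values below T are set to zero, and the agglomerative step uses average linkage with a distance cutoff as the stopping rule. All function names, the `cutoff` parameter, and the toy graph are hypothetical.

```python
import math
from collections import deque
from itertools import combinations

# Toy link graph for illustration only; not actual Wikipedia links.
GRAPH = {
    "Jaguar": ["Felidae"],
    "Jaguar Cars": ["Automaker", "William Lyons"],
    "Jaguar X-Type": ["Jaguar Cars", "Automaker"],
}
RESULTS = ["Jaguar", "Jaguar Cars", "Jaguar X-Type"]
CUMULATIVE_LIST = ["Felidae", "Automaker", "William Lyons"]


def nearness(graph, source, max_hops=2):
    """Breadth-first search; nearness is assumed to be 1 / hop count."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the neighborhood
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {page: 1.0 / d for page, d in dist.items() if d > 0}


def vectors(graph, results, cumulative_list, t=0.4):
    """One k-dimensional point per result page, one dimension per CL page."""
    points = {}
    for page in results:
        near = nearness(graph, page)
        point = []
        for cl_page in cumulative_list:
            value = near.get(cl_page, 0.0)
            point.append(value if value >= t else 0.0)  # threshold T
        points[page] = point
    return points


def agnes(points, cutoff):
    """Bottom-up clustering: merge the closest pair until it exceeds cutoff."""
    clusters = [[name] for name in points]

    def linkage(a, b):
        # Average Euclidean distance between members of two clusters.
        total = sum(math.dist(points[x], points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

On the toy data, `agnes(vectors(GRAPH, RESULTS, CUMULATIVE_LIST), cutoff=1.0)` groups the two car pages together and leaves the animal page in its own cluster. A production system would more likely rely on an existing hierarchical clustering library than on this quadratic loop.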
Results: Evaluation
Mean Absolute Error (MAE)
To evaluate the effectiveness of a recommendation system
N: the total number of result pages
K: the total number of recommendations
r_ij: the actual relevance of a given recommendation
[…]_ij: the relevance given by the system
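These definitions are consistent with the usual form of mean absolute error. As a sketch only, assuming the error is averaged over all N·K pairs and writing \hat{r}_{ij} as a stand-in symbol for the relevance given by the system:

```latex
\[
\mathrm{MAE} = \frac{1}{N K} \sum_{i=1}^{N} \sum_{j=1}^{K} \left| r_{ij} - \hat{r}_{ij} \right|
\]
```

Under this form a lower value means the system's relevance scores are closer to the actual relevance.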
Results: Search Recommendations
A value of 0.4 for T balances fetching a moderate number of recommendations against keeping good quality
Results: Search Recommendations
Keyword: jaguar
Result from Search Engine | Kshitij Recommendations
Jaguar | Felidae, Animal, Big cat, Black panther
Jaguar Cars | Browns Lane Plant, Automaker, William Lyons
SEPECAT Jaguar | Flight altitude record, Flight airspeed record, Aircraft manufacturer, Aviation
HMS Jaguar (F34) | HMS Kelvin (F37)
Atari Jaguar, Atari Jaguar CD | Atari 7800, Atari Jaguar II
Jaguar X-Type, Jaguar XK | Jaguar XJS, Car classification
Results: Search Recommendations
Keyword: amazon
Result from Search Engine | Kshitij Recommendations
Amazon.com | Public company, Industry, NASDAQ
Amazon Rainforest, Amazon River, Amazon Basin | Brazil, Peru, Colombia, Bird, South America
Survivor: The Amazon | Survivor: All-Stars, Brazil, Survivor: Africa, Survivor: Pearl Islands
HMS Amazon, HMS Amazon (F169) | Royal Navy, HMS Alacrity (F174), HMS Ambuscade (F172)
Volvo Amazon | Car classification, Automaker, Car body style
Results: Page Recommendations
Page | Recommendations | MAE
Hyderabad State | Kolhapur, Delhi Sultanate, List of Indian Princely States, British India | 0.17
DAX | Stock market index, List of stock market indices, Germany, BMW, Allianz | 0.19
Godavari River | Krishna River, Kaveri River, Beas River, Eastern Ghats, Ganges, Bay of Bengal, Chilka lake | 0.2
Salzburg | Vienna, Archbishopric of Salzburg, Augsburg, Austria | 0.17
Horlicks | Ovaltine, Hot chocolate, Nestle Milo, Maxim's, World War II, Malted milk, GlaxoSmithKline | 0.18
Conclusion & Future Work
Good quality recommendations can be obtained from annotated knowledge bases using only semantic information
More Wikipedia structures
Templates, References, Info-Boxes, History
Currently, recommendations are calculated on demand
Plan to develop a strategy that pre-calculates and stores the recommendation sets
Discussion
Pros
Presents a generic recommendation system that utilizes both stored and dynamically extracted semantics from Wikipedia
Good examples
Cons
The figures and tables are not sequentially located.
No comparison with other recommendation systems
– But, the authors mention that there is no existing recommendation system with which they can directly compare theirs.