Exploiting the Web of Data for cross-domain
information retrieval and recommendation
VII Jornadas MAVIR
Avances en Tecnologías de la Lengua y Acceso a la Información Multimedia
Escuela Politécnica Superior, Universidad Carlos III de Madrid
26-27 November 2012
Ignacio Fernández-Tobías
under the supervision of
Iván Cantador
Grupo de Recuperación de Información
Universidad Autónoma de Madrid
Exploiting the Web of Data for cross-domain information retrieval and recommendation
VII Jornadas MAVIR - Avances en Tecnologías de la Lengua y Acceso a la Información Multimedia
Madrid, 26-27 November 2012
Contents
• Introduction: Cross-domain item recommendation
• Case study: Linking music with places of interest
• A semantic-based framework for linking domains
• Cross-domain semantic networks from Wikipedia
• Cross-domain semantic networks from Open Information Extraction
• A social tag-based emotion-oriented approach for linking domains
Introduction: Cross-domain item recommendation
• Recommender systems help users make choices by proactively finding relevant items or services, taking into account or predicting the users’ tastes, priorities and goals
• The vast majority of currently available recommender systems predict the relevance of items for a user within a single, limited domain
Introduction: Cross-domain item recommendation
• In some applications, it could be useful to offer the user joint personalized
recommendations of items belonging to multiple domains
• In an e-commerce site, we may suggest movies or videogames based on a particular book bought by a customer
• In a travel application, we may suggest cultural events that may interest a person who has booked a hotel in a particular place
• In an e-learning system, we may suggest educational websites with topics
related to a video documentary a student has seen
• Potential benefits
• Offering diversity and serendipity
• Addressing the cold-start problem (on a target domain)
• Mitigating the sparsity problem
Fernández-Tobías, I., Cantador, I., Kaminskas, M., Ricci, F. 2012. Cross-domain Recommender Systems: A Survey of the State of the Art. 2nd Spanish Conference on Information Retrieval.
Introduction: Cross-domain item recommendation
• Some real applications (e.g. Amazon) already recommend items from different domains, but
• their recommendations rely on statistical analysis of popular items, without any personalization strategy, or
• most of them only exploit information about user preferences in the target domain
Introduction: Cross-domain item recommendation
• Context
• User and item profiles are distributed across multiple systems, so there are few or no user profiles with preferences for items in different domains
• Goal
• Automatically establishing links or transferring knowledge between domains
Case study: Linking music with places of interest
• Case study: Suggesting music / musicians highly related to a particular point of interest (POI)
• Relations between music and places
‐ Based on common emotions evoked by listening to music and visiting POIs → social tags
‐ Based on explicit semantic associations between musicians and POIs → information available in the (Semantic) Web
Kaminskas, M., Ricci, F. 2011. Location-Adapted Music Recommendation Using Tags. 19th Intl. Conference on User Modeling, Adaptation and Personalization, 183-194.
[Figure: Vienna State Opera linked to Gustav Mahler, Wolfgang Amadeus Mozart and Arnold Schoenberg through concepts such as Classical music, Austrian musicians, Opera composers, 19th century and Romanticism]
Case study: Linking music with places of interest
• Semantic relations between musicians and POIs
• Location relations
‐ Arnold Schoenberg was born in Vienna, the city where the Vienna State Opera is located
• Time relations
‐ Gustav Mahler was born in 1860, in the same decade (the 1860s) in which the Vienna State Opera was built
• Architecture-History/Art-Music “category” relations
‐ Wolfgang A. Mozart was a classical music composer, and classical compositions are performed in opera houses, the building type of the Vienna State Opera
• Arbitrary relations
‐ Gustav Mahler was the director of the Vienna State Opera
‐ Ana Belén (a famous Spanish singer) performed a song about La Puerta de Alcalá (a well-known POI in Madrid)
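The location and time relations above can be tested with simple rules once each musician and POI is described by a few attributes. A minimal sketch, assuming hypothetical `Poi` and `Musician` records (the field names are illustrative, not the framework's schema). The sample values are the opera's 1869 opening year, Mahler's 1860 birth in Kaliště and Schoenberg's 1874 birth in Vienna:

```python
from dataclasses import dataclass


@dataclass
class Poi:
    name: str
    city: str
    built_year: int  # hypothetical field: year the POI was built or opened


@dataclass
class Musician:
    name: str
    birth_city: str
    birth_year: int


def location_relation(m: Musician, p: Poi) -> bool:
    """Location relation: the musician was born in the city where the POI is located."""
    return m.birth_city == p.city


def time_relation(m: Musician, p: Poi) -> bool:
    """Time relation: the musician was born in the decade in which the POI was built."""
    return m.birth_year // 10 == p.built_year // 10


def explain(m: Musician, p: Poi) -> list:
    """Return one human-readable sentence per relation that holds between m and p."""
    reasons = []
    if location_relation(m, p):
        reasons.append(f"{m.name} was born in {m.birth_city}, where {p.name} is located")
    if time_relation(m, p):
        decade = m.birth_year // 10 * 10
        reasons.append(f"{m.name} was born in the {decade}s, when {p.name} was built")
    return reasons


# Sample data taken from the examples above.
opera = Poi("Vienna State Opera", "Vienna", 1869)
mahler = Musician("Gustav Mahler", "Kaliště", 1860)
schoenberg = Musician("Arnold Schoenberg", "Vienna", 1874)

print(explain(mahler, opera))
print(explain(schoenberg, opera))
```

Category and arbitrary relations cannot be checked this way, since they depend on category links or free-text facts rather than on single attributes.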
Cross-domain semantic networks from Wikipedia
[Figure: information extracted from Wikipedia for a POI: city, building type, (architecture) categories and date]
Cross-domain semantic networks from Wikipedia
• Linking Wikipedia’s architecture and music categories
[Figure: network of Wikipedia categories linking architecture (Visitor attractions, Arts venues, Music venues, Opera houses, Architectural styles, 19th century architecture), history (Historical eras, Modern history, Romanticism, 18th century, 19th century) and music (Music genres, Classical music, Opera, Music people, Musicians, Composers, Opera composers, Romantic composers, Classical composers, 19th century in music, 19th century musicians, 19th century composers)]
Kaminskas, M., Fernández-Tobías, I., Ricci, F., Cantador, I. 2013. Ontology-based Identification of Music for Places. 13th Intl. Conference on Information and Communication Technologies in Tourism.
Cross-domain semantic networks from Wikipedia
• Cross-domain taxonomies from Wikipedia
• Architecture
• History / Art
• Music
[Figure: category taxonomies. Architecture: Visitor attractions, Arts venues, Music venues, Opera houses, Architectural styles, Centuries in architecture, 19th century architecture. History / Art: Historical eras, Modern history, Romanticism, Centuries, 18th century, 19th century. Music: Music genres, Classical music, Opera, Music people, Musicians, Composers, Romantic composers, Classical composers, Opera composers, 19th century musicians, 19th century composers]
[Figure: ontology of the semantic framework. Classes: POI, City, Date (Year, Decade, Century), Building type, Architectural style, Architectural era, Historical era, Musical era, Music genre, Musician type, Musician. Relations: located_in, has_type, has_style, genre_of, type_of, subcategory_of, birth_place_of, death_place_of, residence_place_of, birth_date_of, death_date_of, activity_date_of, building_start_date_of, building_end_date_of, opening_date_of]
[Figure: example semantic network linking Vienna State Opera and Gustav Mahler. Nodes include Vienna, Austria; 1869; 1860s; 19th century; Opera houses; Opera houses in Austria; Opera houses in Vienna; 19th century architecture; 1869 architecture; Romanticism; 19th century in music; Opera; Classical music; Romantic music; Romantic composers; Classical composers; 19th century composers. Node types: City, Date, Building types, Architectural styles, Architectural eras, Historical eras, Musical eras, Music genres, Musician types. Relations shown include death_place_of, birth_decade_of and activity_century_of]
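Given such a network, candidate explanations for a POI-musician pair are the semantic paths that connect them. A minimal sketch of path enumeration by depth-first search, following edges in either direction. The triples are a simplified, hypothetical version of the example above, and the `has_style` and `subcategory_of` links used to connect dates and styles are assumptions:

```python
from collections import defaultdict

# Simplified triples inspired by the Vienna State Opera / Gustav Mahler example.
# Relation names come from the framework's vocabulary; the has_style link and the
# two subcategory_of links are assumptions added to connect the example.
TRIPLES = [
    ("Vienna State Opera", "located_in", "Vienna"),
    ("Vienna", "death_place_of", "Gustav Mahler"),
    ("1869", "opening_date_of", "Vienna State Opera"),
    ("1869", "subcategory_of", "1860s"),
    ("1860s", "birth_decade_of", "Gustav Mahler"),
    ("Vienna State Opera", "has_style", "19th century architecture"),
    ("19th century architecture", "subcategory_of", "19th century"),
    ("19th century", "activity_century_of", "Gustav Mahler"),
]


def build_index(triples):
    """Index each triple under both endpoints so edges can be followed in either direction."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((s, r, o))
        index[o].append((s, r, o))
    return index


def semantic_paths(triples, source, target, max_len=3):
    """Enumerate simple paths (no repeated node) of at most max_len triples from source to target."""
    index = build_index(triples)
    paths = []

    def dfs(node, visited, path):
        if node == target and path:
            paths.append(list(path))
            return
        if len(path) == max_len:
            return
        for s, r, o in index[node]:
            nxt = o if s == node else s  # walk the edge forwards or backwards
            if nxt in visited:
                continue
            visited.add(nxt)
            path.append((s, r, o))
            dfs(nxt, visited, path)
            path.pop()
            visited.remove(nxt)

    dfs(source, {source}, [])
    return paths


for p in semantic_paths(TRIPLES, "Vienna State Opera", "Gustav Mahler"):
    print(" / ".join(f"{s} {r} {o}" for s, r, o in p))
```

Each returned path can be verbalised into an explanation such as the location and time relations listed for the Vienna State Opera.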
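Cross-domain semantic networks from Wikipedia
• Weight Spreading Activation
$$\mathrm{score}(i) \leftarrow S(i) = (1-d) \cdot \mathrm{rel}(i) + d \cdot \sum_{j \to i} w_{ji}\, S(j)$$
• PageRank
$$\mathrm{score}(i) \leftarrow PR(i) = (1-d) \cdot \frac{1}{N} + d \cdot \sum_{j \to i} \frac{PR(j)}{L(j)}$$
• HITS
$$\mathrm{score}(i) \leftarrow A(i), \qquad A(i) = \sum_{j \to i} H(j), \qquad H(i) = \sum_{i \to j} A(j)$$
where d is the damping factor, N the number of nodes, L(j) the number of outgoing links of node j, w_{ji} the weight of the edge j → i, and A and H the authority and hub scores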
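The three ranking strategies can be compared on a toy directed graph. A minimal sketch, assuming edges given as a dict `(j, i) -> w_ji` with hypothetical weights, an initial relevance `rel` that activates only the POI node, and a fixed number of iterations instead of a convergence test. HITS scores are L2-normalised at each round, as in the usual formulation, which the slide omits:

```python
import math

# Toy directed graph: a POI activates intermediate concepts, which point to musicians.
# Edge weights w_ji are hypothetical; each node's outgoing weights sum to at most 1.
NODES = ["opera", "vienna", "19th_century", "mahler", "schoenberg", "mozart"]
EDGES = {
    ("opera", "vienna"): 0.5,
    ("opera", "19th_century"): 0.5,
    ("vienna", "mahler"): 0.5,
    ("vienna", "schoenberg"): 0.5,
    ("19th_century", "mahler"): 1.0,
}


def spreading_activation(nodes, edges, rel, d=0.85, iters=50):
    """S(i) = (1 - d) * rel(i) + d * sum_{j -> i} w_ji * S(j), iterated a fixed number of times."""
    s = {n: rel.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        s = {i: (1 - d) * rel.get(i, 0.0)
             + d * sum(w * s[j] for (j, k), w in edges.items() if k == i)
             for i in nodes}
    return s


def pagerank(nodes, edges, d=0.85, iters=50):
    """PR(i) = (1 - d) / N + d * sum_{j -> i} PR(j) / L(j), with L(j) the out-degree of j."""
    n = len(nodes)
    out_deg = {j: sum(1 for (a, _) in edges if a == j) for j in nodes}
    pr = {i: 1.0 / n for i in nodes}
    for _ in range(iters):
        pr = {i: (1 - d) / n + d * sum(pr[j] / out_deg[j] for (j, k) in edges if k == i)
              for i in nodes}
    return pr


def hits(nodes, edges, iters=50):
    """A(i) = sum_{j -> i} H(j); H(i) = sum_{i -> j} A(j); both L2-normalised each round."""
    hub = {n: 1.0 for n in nodes}
    auth = dict(hub)
    for _ in range(iters):
        auth = {i: sum(hub[j] for (j, k) in edges if k == i) for i in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {i: v / norm for i, v in auth.items()}
        hub = {i: sum(auth[k] for (j, k) in edges if j == i) for i in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {i: v / norm for i, v in hub.items()}
    return auth, hub


musicians = ["mahler", "schoenberg", "mozart"]
scores = {
    "Spreading": spreading_activation(NODES, EDGES, rel={"opera": 1.0}),
    "PageRank": pagerank(NODES, EDGES),
    "HITS": hits(NODES, EDGES)[0],  # score(i) <- A(i)
}
for name, sc in scores.items():
    print(name, sorted(musicians, key=sc.get, reverse=True))
```

Only spreading activation takes the POI into account through rel(i); PageRank and HITS score nodes from the graph structure alone, which is one way to read the gap between them in the results table.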
Cross-domain semantic networks from Wikipedia
• 97 users, 17 cities, 25 POIs, 356 POI-musician pairs, 1155 assessments
Cross-domain semantic networks from Wikipedia
Average precision values for the top 5 musicians ranked for each POI
             P@1     P@2     P@3     P@4     P@5
Random       0.355*  0.391*  0.363*  0.435*  0.413*
HITS         0.688   0.706   0.711*  0.700*  0.694
PageRank     0.753   0.728   0.707*  0.660*  0.646*
Spreading    0.810   0.804   0.828   0.847   0.837
Values marked with * differ significantly from those of the Spreading algorithm (Wilcoxon signed-rank test, p < 0.05)
Fernández-Tobías, I., Kaminskas, M., Cantador, I., Ricci, F. 2013. A semantic framework for supporting cross-domain recommendation: Suggesting music for places of interest. Submitted.
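The table reports precision at cutoff k averaged over POIs. A minimal sketch of that computation, assuming binary relevance judgments per POI-musician pair; the judgments below are hypothetical, not the study's data:

```python
def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k ranked items that were judged relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_precision_at_k(runs, k):
    """Average P@k over POIs; runs maps a POI to (ranked musicians, set of relevant musicians)."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs.values()) / len(runs)


# Hypothetical judgments for two POIs.
runs = {
    "poi_a": (["m1", "m2", "m3", "m4", "m5"], {"m1", "m3", "m4"}),
    "poi_b": (["m6", "m7", "m8", "m9", "m10"], {"m7", "m8"}),
}
print([round(mean_precision_at_k(runs, k), 3) for k in range(1, 6)])
```

Per-POI P@k values of two algorithms could then be compared pairwise with `scipy.stats.wilcoxon(x, y)`, the Wilcoxon signed-rank test behind the significance markers.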
Cross-domain semantic networks from Wikipedia
[Figure: average number of semantic paths per POI]
Percentages of interesting and obvious musicians recommended by the Spreading algorithm
              Interesting  Non interesting  Non obvious  Obvious
Related       78.3%        21.7%            58.9%        41.1%
Non-related   8.2%         91.8%            84.2%        15.8%
Fernández-Tobías, I., Kaminskas, M., Cantador, I., Ricci, F. 2013. A semantic framework for supporting cross-domain recommendation: Suggesting music for places of interest. Submitted.
Cross-domain semantic networks from Open Information Extraction
• TextRunner (openie.cs.washington.edu) and ReVerb (reverb.cs.washington.edu):
• Automatic identification and extraction of binary relations from English sentences
‐ Linked to Freebase
Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam. 2011. Open Information Extraction: The Second Generation. 22nd International Joint Conference on Artificial Intelligence, pp. 3-10.
Cross-domain semantic networks from Open Information Extraction
• Filtering relations based on a TF-IDF heuristic
$$w(e_1, r, e_2) = \lambda \frac{c(e_1, e_2)}{\sum_{e_i, e_j} c(e_i, e_j)} + (1 - \lambda)\, \mathrm{tfidf}(r)$$
$$\mathrm{tfidf}(r) = \frac{|\{(e_i, r, e_j) \in G\}|}{\max_s |\{(e_i, s, e_j) \in G\}|} \cdot \log \frac{N}{|\{(e_i, r, e_j) \in \mathcal{C}\}|}$$
• Ranking entities according to node categories and graph structure
$$w(e) = \alpha_1 w_T(e) + \alpha_2 w_P(e) + \alpha_3 w_D(e)$$
$$w_T(e) = |T(e) \cap D| \cdot \frac{|T(e) \cap D|}{|T(e)|}, \qquad w_P(e) = |s \to e|, \qquad w_D(e) = \mathrm{dist}(s, e)$$
Fernández-Tobías, I., Cantador, I. 2013. Open Cross-domain Semantic Networks: Application to Item-to-item Recommendation. To be submitted.
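The TF-IDF filtering heuristic can be sketched on toy triples. This is a minimal sketch under stated assumptions: G is the extracted cross-domain graph, 𝒞 the full collection of Open IE extractions (so every relation in G also occurs in 𝒞), N is taken to be the number of triples in 𝒞, c(e1, e2) is a pair count supplied by the caller, and λ and the filtering threshold are hypothetical values:

```python
import math
from collections import Counter


def relation_tfidf(graph, corpus):
    """tfidf(r): frequency of r in G normalised by the most frequent relation in G,
    times log(N / frequency of r in C). N is assumed to be len(corpus), and every
    relation in G is assumed to also occur in C."""
    tf = Counter(r for _, r, _ in graph)
    df = Counter(r for _, r, _ in corpus)
    max_tf = max(tf.values())
    n = len(corpus)
    return {r: tf[r] / max_tf * math.log(n / df[r]) for r in tf}


def triple_weights(graph, corpus, pair_counts, lam=0.5):
    """w(e1, r, e2) = lam * c(e1, e2) / sum c + (1 - lam) * tfidf(r)."""
    tfidf = relation_tfidf(graph, corpus)
    total = sum(pair_counts.values())
    return {(e1, r, e2): lam * pair_counts.get((e1, e2), 0) / total + (1 - lam) * tfidf[r]
            for e1, r, e2 in graph}


def filter_triples(weights, threshold):
    """Keep only the triples whose weight reaches the (hypothetical) threshold."""
    return {t for t, w in weights.items() if w >= threshold}


# Toy extractions: C is the whole collection, G the subset kept for the domains.
corpus = [("a", "born_in", "x"), ("b", "born_in", "y"), ("e", "born_in", "v"),
          ("c", "plays", "z"), ("d", "likes", "w")]
graph = [("a", "born_in", "x"), ("b", "born_in", "y"), ("c", "plays", "z")]
pair_counts = {("a", "x"): 3, ("b", "y"): 1, ("c", "z"): 1}

weights = triple_weights(graph, corpus, pair_counts)
print(filter_triples(weights, 0.5))
```

The IDF factor favours relations that are rare in the whole collection, so a frequent generic relation such as `born_in` is down-weighted relative to `plays`.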
A social tag-based emotion-oriented approach for linking domains
• Mining social tagging systems to create linked emotion-oriented
folksonomies
A social tag-based emotion-oriented approach for linking domains
• Generic emotion lexicon
• Automatically created by mining online thesauri (e.g. thesaurus.com)
• 16 main emotions: alert, excited, elated, happy, content, serene, relaxed, calm, fatigued, bored, depressed, sad, upset, stressed, nervous, tense
• Emotion = synonym & antonym vector
‐ Synonyms: positive weights
‐ Antonyms: negative weights
Example (happy): happy:+66, cheerful:+21, merry:+19, felicitous:+17, …
unhappy:-11, sad:-10, depressed:-6, serious:-4, …
Fernández-Tobías, I., Plaza, L., Cantador, I. 2013. Cross-domain Emotion Folksonomies. To be submitted.
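To illustrate how such a vector might be used, one hypothetical option is to score an item's social tags against an emotion by cosine similarity. This is an assumption, not necessarily the authors' method; only the eight `happy` entries listed above are used (the list is truncated on the slide), and the item tag counts are invented:

```python
import math

# Entries of the "happy" emotion vector shown above; the list is truncated
# on the slide, so only these eight entries are used here.
HAPPY = {"happy": 66, "cheerful": 21, "merry": 19, "felicitous": 17,
         "unhappy": -11, "sad": -10, "depressed": -6, "serious": -4}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical tag counts for two items.
cheerful_item = {"cheerful": 3, "merry": 1, "funny": 2}
gloomy_item = {"sad": 2, "depressed": 1}
print(round(cosine(cheerful_item, HAPPY), 3), round(cosine(gloomy_item, HAPPY), 3))
```

Because antonyms carry negative weights, items tagged mainly with antonyms of an emotion receive a negative score rather than just a low one.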
A social tag-based emotion-oriented approach for linking domains
• Generic emotion lexicon
• In accordance with Russell’s emotion model (1980)
‐ Emotion representation in 2 dimensions: pleasure & arousal
[Figure, left: Russell’s emotion model, with axes pleasure/misery and arousal/sleepiness and the regions excitement, contentment, depression and distress, placing the 16 emotions (alert, excited, elated, happy, content, serene, relaxed, calm, fatigued, bored, depressed, sad, upset, stressed, nervous, tense). Right: obtained emotion vectors projected into 2 dimensions (PCA), both axes ranging from -0.15 to 0.15]
Russell, J. A. 1980. A Circumplex Model of Affect. Journal of Personality and Social Psychology 39(6), pp. 1161-1178.
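The 2D projection in the figure is a principal component analysis of the emotion vectors. A minimal pure-Python sketch, assuming the 16 emotion vectors are first aligned over a shared vocabulary (one column per word, 0 where a word is absent); it uses power iteration with deflation on the covariance matrix, and the toy input is hypothetical. In practice a library routine such as NumPy's SVD would be the usual choice:

```python
import math


def pca_2d(vectors, iters=500):
    """Project row vectors onto their first two principal components.

    Power iteration with deflation on the sample covariance matrix; a sketch for
    small dense inputs (assumes at least two rows and a non-degenerate start vector).
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    X = [[v[k] - mean[k] for k in range(dim)] for v in vectors]
    # Sample covariance matrix C = X^T X / (n - 1)
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(dim)] for a in range(dim)]
    components = []
    for _component in range(2):
        v = [float(k + 1) for k in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        # Rayleigh quotient gives the eigenvalue; deflate before finding the next component.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(dim)) for a in range(dim))
        components.append(v)
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(dim)] for a in range(dim)]
    return [[sum(x[k] * comp[k] for k in range(dim)) for comp in components] for x in X]


# Hypothetical toy vectors over a 3-word vocabulary.
points = pca_2d([[1, 1, 0], [2, 2, 0.1], [3, 3, 0], [4, 4, 0.1]])
for p in points:
    print([round(c, 3) for c in p])
```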
A social tag-based emotion-oriented approach for linking domains
• Domain-dependent emotion folksonomies
• Each domain has its own emotional categories
• Each category is composed of a set of co-occurring tags in the domain folksonomy
• Movies (MovieLens, Jinni, IMDb)
• bittersweet, emotional, feel good, scary, …
• Music (Last.fm, GEMS)
• wonder, tenderness, nostalgia, peacefulness, …
• Books (BookCrossing, LibraryThing, Whichbook)
• funny, unpredictable, disgusting, violent, …
Case study: Linking music with places of interest
• Vienna State Opera → Arnold Schoenberg
• Arnold Schoenberg was born in Vienna, where the Vienna State Opera is located
• Arnold Schoenberg was born in the 19th century, when the Vienna State Opera was built
• Arnold Schoenberg was a Classical music composer; the Classical music genre is related to opera houses, the building type of the Vienna State Opera
• Las Ventas → Antonio Flores
• Antonio Flores was born in Madrid, where Las Ventas is located
• Antonio Flores died in the 20th century, when Las Ventas was built
• Antonio Flores was a Flamenco singer; Flamenco is a Romani music genre related to Moorish architecture, and Moorish Revival architecture is the architectural style of Las Ventas
Cross-domain semantic networks from Wikipedia
Average precision values obtained by the Spreading algorithm for the top 5 ranked musicians for each POI type (number of POIs in parentheses)
                          P@1    P@2    P@3    P@4    P@5
Music venues (4)          0.838  0.688  0.838  0.829  0.870
Religious buildings (8)   0.721  0.965  0.844  0.795  0.781
Castles and palaces (6)   0.794  0.704  0.792  0.900  0.825
Other POIs (7)            0.908  0.772  0.836  0.872  0.893
Cross-domain semantic networks from Wikipedia
• Evaluating whether tracks by the retrieved musicians are relevant to the POIs