420
cataldo musto and pasquale lops dept. of computer science university of bari aldo moro”, italy semantics-aware techniques for social media analysis user modelling and recommender systems tutorial@UMAP 2016 Halifax, Canada July 16, 2015

Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Embed Size (px)

Citation preview

Page 1: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

cataldo musto and pasquale lops

dept. of computer science

university of bari “aldo moro”, italy

semantics-aware techniques for

social media analysis

user modelling

and recommender systems

tutorial@UMAP 2016 Halifax, Canada – July 16, 2015

Page 3: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

in this tutorial

how to represent content

to improve information access and build a

new generation of services for social media

analysis, user modeling and

recommender systems?

Page 4: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Agenda

Why?

How?

What?

Why do we need intelligent information access?

Why do we need content?

Why do we need semantics?

How to introduce semantics?

Basics of Natural Language Processing

Encoding exogenous semantics (top-down approaches)

Encoding endogenous semantics (bottom-up approaches)

Semantics-aware Recommender Systems

Cross-lingual Recommender Systems

Explaining Recommendations

Semantic User Profiles based on Social Data

Semantic Analysis of Social Streams

Page 5: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why?

Why do we need intelligent information access?

Page 6: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016
Page 7: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

physiologically

impossible

to follow the information flow

in real time

Page 8: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

(Source: Adrian C.Ott,

The 24-hour

customer)

we can handle

126 bits of

information/day

we deal with

393 bits of

information/day

ratio: 3x

Page 9: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Information overload (Appeared for the first time in «Future Shock» by Alvin Toffler, 1970)

Page 10: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Information overload (Appeared for the first time in «Future Shock» by Alvin Toffler, 1970)

Page 11: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Information overload (Appeared for the first time in «Future Shock» by Alvin Toffler, 1970)

Page 12: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Information overload

“It is not

information

overload.

It is filter failure”

Clay Shirky

talk @Web2.0 Expo

Page 13: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Challenge

To effectively cope with

information overload

we need to filter the information flow

We need technologies and algorithms for intelligent information access

… and we already have some evidence!

Page 14: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Intelligent Information Access

Information Retrieval (Search Engines)

Page 15: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Intelligent Information Access

Information Filtering (Recommender Systems)

Page 16: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why? Why do we need content?

Page 17: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Search engines need content

Why do we need content?

Trivial: search engines can’t work without content

Page 18: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Recommender Systems: not trivial!

Page 19: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Recommender Systems can work without content

Page 20: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Several Recommender Systems

perfectly work without using any

content! (e.g.Amazon)

Collaborative Filtering and Matrix

Factorization are state of the art

techniques for implementing

Recommender Systems

(ACM RecSys 2009,

by Neflix Challenge winners)

Page 21: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Content can tackle some issues of collaborative filtering

Page 22: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Collaborative Filtering issues: sparsity

Page 23: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Collaborative Filtering issues: new item problem

Page 24: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Collaborative Filtering: lack of transparency!

Page 25: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Collaborative Filtering: poor explanations!

Who knows the «customers who bought…»?

Page 26: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

User Modeling based on simple graph-based representation of social

connections is quite poor.

User Models can benefit of information about the items the user has

consumed (news content, hashtag contained in the Tweets she liked,

etc.)

To enrich and improve user modeling

Page 27: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need content?

Because a relevant part of the information spread

on social media is content!

And social media really matter

Page 28: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Because a relevant part of the information spread

on social media is content!

Page 29: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

can be considered as novel data silos

Social Media

Page 30: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Social Media

information about preferences

Page 31: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

information about

People feelings and connections

Social Media

Page 32: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

changed the rule for

user modeling and

personalization

Social Media

Page 33: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Recap #1

Why do we need content?

In general: to extend and improve user modeling

To exploit the information spread on social media

To overcome typical issues of collaborative filtering

and matrix factorization

Because search engines can’t simply work without

content

Page 34: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why? Why do we need semantics?

Page 35: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need semantics?

A deep comprehension of the information conveyed by

textual content is crucial to improve the quality of user

profiles and the effectiveness of intelligent information

access platforms.

Page 36: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why do we need semantics?

…some scenarios can be more convincing

(But we need some basics, before)

Page 37: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basics: Content-based RecSys (CBRS)

Suggest items similar to those the

user liked in the past

Recommendations generated by matching

the description of items with the

profile of the user’s interests

use of specific features

Recommender Systems Handbook,

The Adaptive Web

Page 38: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basics: Content-based RecSys (CBRS)

user profile items

Page 39: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basics: Content-based RecSys (CBRS)

user profile items

Recommendation are

generated by

matching the features

stored in the user

profile with those

describing the items

to be recommended.

Page 40: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basics: Content-based RecSys (CBRS)

user profile items

Recommendation are

generated by

matching the features

stored in the user

profile with those

describing the items

to be recommended. X

Page 41: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in User Models

“I love turkey. It’s my choice

for these #holidays!

Social Media can be helpful to avoid cold start

Page 42: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in User Models

“I love turkey. It’s my choice

for these #holidays!

..but pure content-based representations

can’t handle polysemy

Page 43: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in User Models

“I love turkey. It’s my choice

for these #holidays!

Pure Content-based Representation can easily drive a recommender systems towards failures!

?

Page 44: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in Social Media Analysis

?

What are people worried about?

Are they worried about the eagle

or about the city of L’Aquila?

Page 45: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in User Models

AI

Artificial

Intelligence

apple

multi-word concepts

?

Book recommendation

Page 46: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

AI

Artificial

Intelligence

apple

synonymy

Lack of Semantics in User Models

…is not only about polysemy

?

Book recommendation

Most of the preferences regard AI,

but due to synonymy «apple» is the

most relevant feature in the profile

Page 47: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

italian

english

Lack of Semantics in CBRS

Page 48: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in CBRS

user profile items

Page 49: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in CBRS

user profile items

It is likely that the

algorithm is not able

to suggest a

(relevant) english

news since no

overlap between

the features

occurs!

X

Page 50: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Lack of Semantics in CBRS

user profile items

It is likely that the

algorithm is not able

to suggest a

(relevant) english

news since no

overlap between

the features

occurs!

X

Page 51: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Recap #2

In general: to improve

content representation in

intelligent information

access platforms

To avoid typical issues of

natural language

representations (polysemy,

synonymy, etc.)

To better model user

preferences

To better understand the

information spread on social

media

To provide multilingual

recommendations

Why do we need semantics?

Becuase language is

inherently ambiguous

Page 52: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How?

How to introduce semantics?

Page 53: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Information Retrieval and Filtering

Two sides of the same coin (Belkin&Croft,1992)

Information

Retrieval

information need expressed

through a query

goal: retrieve information which

might be relevant to a

user

Information

Filtering

information need expressed

through a

user profile

goal: expose users to only the

information that is

relevant to them,

according to personal profiles

[Belkin&Croft, 1992] Belkin, Nicholas J., and W. Bruce Croft.

"Information filtering and information retrieval: Two sides of the same

coin?." Communications of the ACM 35.12 (1992): 29-38.

It’s all about searching!

Page 54: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Search (and Content-based Recommendation)

is not so simple as it might seem

Meno: and how will you enquire, Socrates, into that

which you do not know? What will you put forth

as the subject of enquiry? And if you find what

you want, how will you know that this is the

thing you did not know?

Socrates: I know, Meno, what you mean; but just

see what a tiresome dispute you are introducing. You argue that a man cannot search either

for what he knows or for what he does not

know; if he knows it, there is no need to search;

and if not, he cannot; he does not know the very

subject about which he is to search.

Plato Meno 80d-81a

http://www.gutenberg.org/etext/1643

60

Meno’s Paradox of Inquiry

Page 55: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Meno’s question at our times:

the “vocabulary mismatch” problem (revisited)

How to discover the concepts that connect us to the

the information we are seeking (search task) or we want

to be exposed to (recommendation and user modeling

tasks) ?

61

Page 56: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Meno’s question at our times:

the “vocabulary mismatch” problem (revisited)

How to discover the concepts that connect us to the

the information we are seeking (search task) or we want

to be exposed to (recommendation and user modeling

tasks) ?

62

We need some «intelligent» support

(as intelligent information access

technologies)

Page 57: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Meno’s question at our times:

the “vocabulary mismatch” problem (revisited)

How to discover the concepts that connect us to the

the information we are seeking (search task) or we want

to be exposed to (recommendation and user modeling

tasks) ?

63

We need to better understand

and represent the content

We need some «intelligent» support

(as intelligent information access

technologies)

Page 58: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Meno’s question at our times:

the “vocabulary mismatch” problem (revisited)

How to discover the concepts that connect us to the

the information we are seeking (search task) or we want

to be exposed to (recommendation and user modeling

tasks) ?

64

We need to better understand

and represent the content

We need some «intelligent» support

(as intelligent information access

technologies)

Page 59: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

…before semantics

some basics

of Natural Language Processing (NLP)

Page 60: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How?

basics of NLP and keyword-based representations

Page 61: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scenario

Pasquale really loves the movie «The Matrix», and he asks a content-based

recommender system for some suggestions.

How can we feed the algorithm with some textual features related to the movie

to build a (content-based) profile and provide recommendations?

?

Question

Recommendation

Engine

Page 62: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scenario

the plot can be a rich source of content-based features

Page 63: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scenario

…but we need to properly process it through a pipeline of

Natural Language Processing techniques

the plot can be a rich source of content-based features

Page 64: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basic NLP operations

o normalization strip unwanted characters/markup (e.g.

HTML/XML tags, punctuation, numbers, etc.)

o tokenization break text into tokens

o stopword removal exclude common words having

little semantic content

o lemmatization reduce inflectional/variant forms to base

form (lemma in the dictionary), e.g. am, are, is be

o stemming reduce terms to their “roots”, e.g. automate(s),

automatic, automation all reduced to automat

vocabulary

Page 65: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

The Matrix is a 1999 American-Australian neo-noir

science fiction action film written and directed by the

Wachowskis, starring Keanu Reeves, Laurence

Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe

Pantoliano. It depicts a dystopian future in which reality

as perceived by most humans is actually a simulated

reality called "the Matrix", created by sentient machines

to subdue the human population, while their bodies' heat

and electrical activity are used as an energy source.

Computer programmer "Neo" learns this truth and is

drawn into a rebellion against the machines, which

involves other people who have been freed from the

"dream world".

Page 66: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

X

X

X

X

X

X

X

X X

X X

X

X X X

X

X

X X

The Matrix is a 1999 American-Australian neo-noir

science fiction action film written and directed by the

Wachowskis, starring Keanu Reeves, Laurence

Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe

Pantoliano. It depicts a dystopian future in which reality

as perceived by most humans is actually a simulated

reality called "the Matrix", created by sentient machines

to subdue the human population, while their bodies' heat

and electrical activity are used as an energy source.

Computer programmer "Neo" learns this truth and is

drawn into a rebellion against the machines, which

involves other people who have been freed from the

"dream world".

Example

normalization

Page 67: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

The Matrix is a 1999 American Australian neo noir

science fiction action film written and directed by the

Wachowskis starring Keanu Reeves Laurence Fishburne

Carrie Anne Moss Hugo Weaving and Joe Pantoliano It

depicts a dystopian future in which reality as perceived

by most humans is actually a simulated reality called the

Matrix created by sentient machines to subdue the

human population while their bodies heat and electrical

activity are used as an energy source Computer

programmer Neo learns this truth and is drawn into a

rebellion against the machines which involves other

people who have been freed from the dream world

tokenization

Page 68: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Tokenization issues

compound words

o science-fiction: break up hyphenated sequence?

o Keanu Reeves: one token or two? How do you decide it is one

token?

numbers and dates

o 3/20/91 Mar. 20, 1991 20/3/91

o 55 B.C.

o (800) 234-2333

Page 69: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Tokenization issues

language issues

o German noun compounds not segmented

Lebensversicherungsgesellschaftsangestellter means

life insurance company employee

o Chinese and Japanese have no spaces between words (not always

guaranteed a unique tokenization)

莎拉波娃现在居住在美国东南部的佛罗里达

o Arabic (or Hebrew) is basically written right to left, but with certain items like

numbers written left to right

Algeria achieved its independence in 1962 after 132 years of French

occupation

Page 70: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

X

X X X

X X

X

X

X

X

X

X

X X

X X

X

X X X

X X

X

X

X

X

X

X

X

X X X

X

X

X

X

X X

X

X

X

Example

stopword removal

The Matrix is a 1999 American Australian neo noir

science fiction action film written and directed by the

Wachowskis starring Keanu Reeves Laurence Fishburne

Carrie Anne Moss Hugo Weaving and Joe Pantoliano It

depicts a dystopian future in which reality as perceived

by most humans is actually a simulated reality called the

Matrix created by sentient machines to subdue the

human population while their bodies heat and electrical

activity are used as an energy source Computer

programmer Neo learns this truth and is drawn into a

rebellion against the machines which involves other

people who have been freed from the dream world

Page 71: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

The Matrix is a 1999 American Australian neo noir

science fiction action film written and directed by the

Wachowskis starring Keanu Reeves Laurence Fishburne

Carrie Anne Moss Hugo Weaving and Joe Pantoliano It

depicts a dystopian future in which reality as perceived

by most humans is actually a simulated reality called the

Matrix created by sentient machines to subdue the

human population while their bodies heat and electrical

activity are used as an energy source Computer

programmer Neo learns this truth and is drawn into a

rebellion against the machines which involves other

people who have been freed from the dream world

stopword removal

Page 72: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

The Matrix is a 1999 American Australian neo noir

science fiction action film written and directed by the

Wachowskis starring Keanu Reeves Laurence Fishburne

Carrie Anne Moss Hugo Weaving and Joe Pantoliano It

depicts a dystopian future in which reality as perceived

by most humans is actually a simulated reality called the

Matrix created by sentient machines to subdue the

human population while their bodyies heat and electrical

activity are used as an energy source Computer

programmer Neo learns this truth and is drawn into a

rebellion against the machines which involves other

people who have been freed from the dream world

lemmatization

Page 73: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

Matrix 1999 American Australian neo noir science fiction

action film write direct Wachowskis star Keanu Reeves

Laurence Fishburne Carrie Anne Moss Hugo Weaving

Joe Pantoliano depict dystopian future reality perceived

human simulate reality call Matrix create sentient

machine subdue human population body heat electrical

activity use energy source Computer programmer Neo

learn truth draw rebellion against machine involve people

free dream world

next step: to give a weight to each feature

(e.g. through TF-IDF)

Page 74: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Weighting features: TF-IDF

terms frequency – inverse document

frequency best known weighting scheme in information retrieval.

Weight of a term as product of tf weight and idf weight

tf number of times the term occurs in the document

idf depends on rarity of a term in a collection

tf-idf increases with the number of occurrences within a

document, and with the rarity of the term in the collection.

)df/log()tflog1(w ,, tdt Ndt

Page 75: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Example

Matrix 1999 American Australian neo noir science fiction

action film write direct Wachowskis star Keanu Reeves

Laurence Fishburne Carrie Anne Moss Hugo Weaving

Joe Pantoliano depict dystopian future reality

perceived human simulate reality call Matrix create

sentient machine subdue human population body heat

electrical activity use energy source Computer

programmer Neo learn truth draw rebellion against

machine involve people free dream world

green=high IDF

red=low IDF

Page 76: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The Matrix representation

Matrix

1999

American

Australian

fiction

world

keywords

a portion of Pasquale’s

content-based profile

given a content-based profile, we can easily build a basic

recommender system through

Vector Space Model and

similarity measures

science

Hugo

Page 77: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Vector Space Model (VSM)

given a set of n features (vocabulary)

f = {f1, f

2 ,..., f

n}

given a set of M items, each item I

represented as a point in a n-dimensional vector space

I = (wf1

,.....wfn

)

wfi is the weight of feature i in the item

Page 78: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Similarity between vectors

cosine similarity

V

i i

V

i i

V

i ii

JI

JI

J

J

I

I

JI

JIJI

1

2

1

2

1),cos(

dot product unit vectors

Page 79: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Basic Content-based Recommendations

o documents represented as vectors o features identified through NLP operations

o features weigthed using tf-idf

o cosine measure for computing similarity

between vectors

Page 80: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Drawbacks

a portion of Pasquale’s

content-based profile

Recommendation:

Notre Dame de Paris,

by Victor Hugo

Basic Content-based Recommendations

Why?

Entities as «Hugo

Weaving» were not

modeled

Matrix

1999

American

Australian

fiction

world

science

Hugo

Page 81: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Drawbacks

Basic Content-based Recommendations

Why?

More complex concepts

as «science fiction» were

not modeled as single

features

Recommendation:

The March of Penguins

Matrix

1999

American

Australian

fiction

world

science

Hugo

a portion of Pasquale’s

content-based profile

Page 82: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Vision

Basic Content-based Recommendations

Page 83: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Vision

Basic Content-based Recommendations

Bad recommendations

Page 84: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Recap #3

Natural Language Processing

techniques necessary to build a

content-based profile

basic content-based

algorithms can be easily built

through TF-IDF

keyword-based representation

too poor and can drive to bad

modeling of preferences (and

bad recommendations)

we need to shift from keywords to concepts

basics of NLP and keyword-based representation

Page 85: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How?

Semantics-aware Content Representation

Page 86: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Page 87: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Page 88: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

top-down

approaches based on the

integration of external

knowledge for

representing content. Able to provide the linguistic,

cultural and backgroud

knowledge in the

content representation

Page 89: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

top-down

approaches based on the

integration of external

knowledge for

representing content. Able to provide the linguistic,

cultural and backgroud

knowledge in the

content representation

bottom-up

approaches that determine

the meaning of a word

by analyzing the rules of its usage in the context of

ordinary and concrete

language behavior

Page 90: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 91: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Word Sense

Disambiguation

Entity

Linking …….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the item to

a knowledge graph

Page 92: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking

the Item to

a knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 93: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 94: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 95: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How? Encoding exogenous semantics

(top-down approaches)

Page 96: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Word Sense

Disambiguation

Entity

Linking …….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking the

Item to a

knowledge graph

Page 97: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word Sense Disambiguation (WSD)

using linguistic ontologies

WSD selects the proper meaning, i.e. sense, for a word in

a text by taking into account the context in which it occurs

Page 98: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word Sense Disambiguation (WSD)

using linguistic ontologies

WSD selects the proper meaning, i.e. sense, for a word in

a text by taking into account the context in which it occurs

Page 99: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Sense Repository

WordNet groups words into sets of synonyms called synsets

It contains nouns, verbs, adjectives, adverbs

Word Meanings

Word Forms

F1 F2 F3 … … Fn

M1 V(1,1) V(2,1)

M2 V(2,2) V(3,2)

M3

M…

Mm V(m,n)

Synonym

word forms

(synset)

polysemous word:

disambiguation needed

WordNet linguistic ontology [*]

[*] Miller, George A. "WordNet: a lexical database for

English." Communications of the ACM 38.11 (1995): 39-41.

Page 100: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

an example of synset

Sense Repository

WordNet linguistic ontology

Page 101: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Sense Repository

WordNet Hierarchies

WordNet linguistic ontology

Page 102: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word Sense Disambiguation

State of the art: JIGSAW algorithm [*]

Input

o D = {w1, w

2, …. , w

h} document

Output

o X = {s1, s

2, …. , s

k} (kh)

Each si obtained by disambiguating wi based on the context

of each word

Some words not recognized by WordNet

Groups of words recognized as a single concept

[*] Basile, P., de Gemmis, M., Gentile, A. L., Lops, P., & Semeraro, G. (2007, June). UNIBA:

JIGSAW algorithm for word sense disambiguation. InProceedings of the 4th International

Workshop on Semantic Evaluations (pp. 398-401). Association for Computational Linguistics.

Page 103: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How to use WordNet for WSD?

Semantic similarity between synsets inversely proportional to their distance in the WordNet IS-A hierarchy

Path length similarity between synsets used to assign scores to synsets of a polysemous word in order to choose the correct sense

Placental mammal

Carnivore Rodent

Feline, felid

Cat (feline mammal)

Mouse (rodent)

1

2

3 4

5

JIGSAW WSD algorithm

Page 104: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

SINSIM(cat,mouse) =

-log(5/32)=0.806

Placental mammal

Carnivore Rodent

Feline, felid

Cat (feline mammal)

Mouse (rodent)

1

2

3 4

5

Synset semantic similarity

Page 105: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

w

C

JIGSAW WSD algorithm

Page 106: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

w

C

0.107

0.0

0.0

0.806 0.8060.806

JIGSAW WSD algorithm

Page 107: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

through WSD can we obtain a semantics-aware representation

of textual content

Page 108: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Synset-based representation

{09596828} American -- (a native or inhabitant of the United States)

{06281561} fiction -- (a literary work based on the imagination and not necessarily on fact)

{06525881} movie, film, picture, moving picture, moving-picture show, motion picture,

motion-picture show, picture show, pic, flick -- (a form of entertainment that enacts a story…

{02605965} star -- (feature as the star; "The movie stars Dustin

Hoffman as an autistic man")

Page 109: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

through WSD we process the textual

description of the item and we obtain a semantics-aware

representation of the item as

output

keyword-based features replaced

with the concepts (in this

case WordNet synsets) they refer to

Matrix

1999

American

Australian

fiction

world

keywords

science

Hugo

Page 110: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Word Sense Disambiguation recap

polysemy and synonymy

effectively handled

classical NLP techniques helpful to

remove further noise (e.g.

stopwords)

potentially language-independent

(later)

entities (e.g. Hugo Weaving)

still not recognized

Matrix

1999

American

Australian

fiction

world

keywords

science

Hugo

Page 111: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Word Sense

Disambiguation

Entity

Linking

Introduce semantics

by linking the item

itself to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 112: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

• Basic Idea

• Input: free text

• e.g. Wikipedia

abstract

• Output:

identification of

the entities

mentioned in the

text.

Entity Linking Algorithms

Page 113: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Why Entity Linking?

because we need to identify the entities mentioned in the textual description

to better catch user preferences and information needs.

… and many more

Several state-of-the-art implementations are already available

Page 115: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords entities

Page 116: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords entities

entities are correctly

recognized and modeled

partially multilingual

(entities are inherently multilingual,

but other concepts aren’t)

common sense and abstract

concepts now ignored.

Page 117: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

very transparent and human-readable content representation

non-trivial NLP tasks automatically performed

(stopwords removal, n-grams identification, named entities recognition and

disambiguation)

Entity Linking Algorithms

Tag.me

Output

Page 118: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

each entity identified in the content can be a feature of a

semantics-aware content representation

based on entity linking

Entity Linking Algorithms

Tag.me

Output

Page 119: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Advantage #1: several common sense

concepts are now identified

Entity Linking Algorithms

Tag.me

Output

Page 120: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Output

Advantage #2: each entity is a reference

to a Wikipedia page

http://en.wikipedia.org/wiki/The_Wachowskis

not a simple textual feature!

Entity Linking Algorithms

Tag.me

Page 121: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

We can enrich this entity-based representation

by exploiting the Wikipedia categories’ tree

Entity Linking Algorithms

Tag.me + Wikipedia Categories

Page 122: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

final representation of items obtained by

merging entities identified in the text with

the (most relevant)

Wikipedia

categories each

entity is linked to

+ entities wikipedia categories features =

Entity Linking Algorithms

Tag.me + Wikipedia Categories

Page 123: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords Wikipedia pages

Page 124: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords Wikipedia pages

entities recognized and

modeled (as in OpenCalais)

Wikipedia-based representation:

some common sense terms

included, and new interesting

features (e.g. «science-fiction fil

director») can be generated

terms without a Wikipedia

mapping are ignored

Page 125: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

traditional

resources

collaborative

resources

o manually curated by experts

o available for a few languages

o difficult to maintain and update

o collaboratively built by the crowd

o highly multilingual

o up-to-date

Entity Linking Algorithms

Babelfy

Page 126: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016
Page 127: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Entity Linking Algorithms

Babelfy

we have both Named Entities and Concepts!

Page 128: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Entity Linking Algorithms

Babelfy

Page 129: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Entity Linking Algorithms

Babelfy

Page 130: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords Babel synsets

Page 131: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

Matrix

1999

American

Australian

neo

science

fiction

world

keywords Babel synsets

entities recognized and

modeled (as in OpenCalais

and Tag.me)

Wikipedia-based representation:

some common sense terms

included, and new interesting

features (e.g. «science-fiction

director) can be generated

includes linguistic knowledge

and is able to disambiguate terms

also multilingual!

Page 132: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Recap #4

o «Exogenous» techniques use

external knowledge sources to inject

semantics

o Word Sense Disambiguation

algorithms process the textual

description and replace keywords with

semantic concepts (as synsets)

o Entity Linking algorithms focus on

the identification of the entities. Some

recent approaches also able to identify

common sense terms

o Combination of both

approaches is potentially the

best strategy

encoding exogenous semantics

by processing textual descriptions

Page 133: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 134: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Ontologies

o used to describe domain-specific

knowledge

o hierarchies of

concepts with

attributes and relations

o “An ontology is a formal,

explicit specification of

a shared conceptualization”

Guarino, Nicola. "Understanding, building and using ontologies." International Journal of Human-Computer

Studies 46.2 (1997): 293-310.

Page 135: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

why do we need an ontology?

to share common understanding of the structure of information

o among people

o among software agents

to enable reuse of domain knowledge

o to avoid “re-inventing the wheel”

o to introduce standards to allow interoperability

Page 136: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

why do we need an ontology?

to share common understanding of the structure of information

o among people

o among software agents

to enable reuse of domain knowledge

o to avoid “re-inventing the wheel”

o to introduce standards to allow interoperability

…let’s have an example!

Page 137: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

Page 138: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

(a small portion, actually)

we formally encode relevant aspects and the relationships among them

Page 139: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

(a small portion, actually)

every item formally modeled by following this structure

and encoded through a Semantic Web language (e.g. OWL, RDF)

Page 140: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

(a small portion, actually)

every item formally modeled by following this structure

and encoded through a Semantic Web language (e.g. OWL, RDF)

Page 141: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

(a small portion, actually)

why is it useful?

Page 142: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

why is it useful? each feature has a non-ambiguous «meaning»

Page 143: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

why is it useful?

we don’t need to process unstructured content

Page 144: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

A Movie Ontology

(a small portion, actually)

why is it useful? we can perform some «reasoning» on user preferences. How?

Page 145: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Semantics through Ontologies

The Movie Ontology

We can reason on the preferences and infer

that a user interested in The Matrix

(SciFi_and_Fantasy genre) is interested in

Imaginational_Entertainment and potentially

in Logical_Thrilling

Page 146: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The Matrix representation

from a flat representation

toward a graph-based

representation

Page 147: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The Matrix representation

from a flat representation

toward a graph-based

representation

semantics explicitly encoded

explicit relations between

concepts exist: reasoning to infer

interesting information

ontologies typically

domain-dependant

huge effort to build and

mantain an ontology

very huge effort to

populate an ontology!

Page 148: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Vision

Page 149: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 150: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data

the giant global graph

Page 151: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data (cloud)

what is it?

(large) set of interconnected

semantic datasets

Page 152: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data (cloud)

statistics

149 billions triples, 3,842 datasets (http://stats.lod2.eu)

Page 153: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data (cloud)

DBpedia

core of the LOD cloud

RDF mapping of Wikipedia

Page 154: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data

cornerstones

1.

2.

methodology to publish, share and link

structured data on the Web

use of RDF

o every resource/entity/relation identified by a (unique) URI

o URI: http://dbpedia.org/resource/Halifax

re-use of existing properties to express an

agreed semantics and connect data sources

Page 155: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data (cloud)

Page 157: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The Matrix representation

from a flat representation

toward a (richer) graph-based

representation

Page 158: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The Matrix representation

we have the advantage of formal semantics defined in RDF, with

interesting features coming from Wikipedia

without the need of building and manually populating an ontology

Page 159: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

1.

2.

graph-based data models can be exploited to define more semantic

features based on graph topology

another advantage

Page 160: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4 (bipartite graph)

users = nodes

items = nodes

preferences = edges

Very intuitive

representation!

u1

i1

u2 i2

u3 i3

u4

i4

Graph-based Data Model

Page 161: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4

u1

i1

u2

u3 i3

u4

Semantic Graph-based Data Model

Page 162: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4 DBpedia

mapping

u1

i1

u2

u3 i3

u4

Semantic Graph-based Data Model

Page 163: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4

u1

i1

u2

u3 i3

u4

dcterms:subject Films about

Rebellions

Quentin Tarantino

1999 films

http://dbpedia.org/resource/1999_films

dcterms:subject

Semantic Graph-based Data Model

(1-hop)

Page 164: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4

u1

i1 u2

u3 i3 u4

dcterms:subject Films about

Rebellions

Quentin Tarantino

1999 films

http://dbpedia.org/resource/1999_films

dcterms:subject

American film

directors

dbo:award Lynne Thigpen

http://dbpedia.org/resource/Lynne_Thigpen

Semantic Graph-based Data Model

(2-hop)

Page 165: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

i4

u1

i1 u2

u3 i3 u4

dcterms:subject Films about

Rebellions

Quentin Tarantino

1999 films

http://dbpedia.org/resource/1999_films

dcterms:subject

American film

directors

dbo:award Lynne Thigpen

http://dbpedia.org/resource/Lynne_Thigpen

Semantic Graph-based Data Model

(n-hop)

Page 166: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

PageRank

Spreading activation

Average Neighbors

Degree Centrality

Semantic Graph-based Data Model

(Feature Generation)

new semantic features describing the

item can be inferred by mining the

structure of the tripartite graph

Page 167: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Recap #5

o Ontologies enrich the representation

by introducing formal semantics, but

they are very complicated to maintain

and build

o Linked Open Data merge the

advantages of ontologies with the

simplicity of a collaborative knowledge

source as Wikipedia

o Such approaches build a

graph-based representation

that triggers the generation of

semantic topological features

o Inherently multilingual!

encoding exogenous semantics through Knowledge Graphs

Page 168: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

How? Encoding endogenous semantics

(bottom-up approaches)

Page 169: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

Very huge availability of textual content

Page 170: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

We can use this huge amount of content to

directly learn a representation of words

Page 171: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

What is «Peroni» ?

Pass me a Peroni!

I like Peroni

Football and Peroni, what a perfect Saturday!

Page 172: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

What is «Budweiser» ?

Pass me a Budweiser!

I like Budweiser

Football and Budweiser, what a perfect Saturday!

Page 173: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

What is «Budweiser» ?

Pass me a Budweiser!

I like Budweiser

Football and Budweiser, what a perfect Saturday!

Page 174: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

What is «Peroni» ?

Pass me a Peroni!

I like Peroni

Football and Peroni, what a perfect Saturday!

Page 175: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

Distributional Hypothesis

«Terms used in similar contexts

share a similar meaning»

Page 176: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Insight

The semantics learnt according to

terms usage is called «distributional»

Page 177: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

L.Wittgenstein

(Austrian philosopher)

Page 178: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

by analyzing large corpora of

textual data it is possible to

infer information about the

usage (about the meaning) of

the terms

Definition co-occurrence co-occurrence

co-occurrence co-occurrence (*) Firth, J.R. A synopsis of linguistic theory

1930-1955. In Studies in Linguistic Analysis,

pp. 1-32, 1957.

Page 179: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

by analyzing large corpora of

textual data it is possible to

infer information about the

usage (about the meaning) of

the terms

Definition co-occurrence co-occurrence

co-occurrence co-occurrence

Beer and wine share a similar meaning since

they are often used in similar contexts

(*) Firth, J.R. A synopsis of linguistic theory

1930-1955. In Studies in Linguistic Analysis,

pp. 1-32, 1957.

Page 180: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

A vector-space representation is learnt

by encoding in which context each term is used

(This representation is called WordSpace)

Page 181: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

A vector-space representation is learnt

by encoding in which context each term is used

Each row of the matrix is a vector!

Page 182: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

beer vs wine: good overlap

Similar!

Page 183: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

beer vs wine: no overlap

Not Similar!

Page 184: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

WordSpace

beer wine

mojito

dog

A vector space representation (called WordSpace)

is learnt according to terms usage in contexts

Page 185: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

WordSpace

beer wine

mojito

dog

A vector space representation (called WordSpace)

is learnt according to terms usage in contexts

Terms sharing a

similar usage

are very close

in the space

Page 186: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

Key question: what is the context?

Page 187: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

Key question: what is the context?

These approaches are very flexible since the «context» can

be set according to the granularity required by the

representation

Page 188: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

Key question: what is the context?

Coarse-grained granularity:

context=whole document

Page 189: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix = Term-Document Matrix

Key question: what is the context?

(This is Vector Space Model!)

Vector Space Model is a Distributional Model

Page 190: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

Key question: what is the context?

Fine-grained granularities:

context=paragraph, sentence, window of words

Page 191: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

Fine-grained granularities:

PROs: the more fine-grained the representation, more precise the vectors

CONs: the more fine-grained the representation, the bigger the matrix

Page 192: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

The flexibility of distributional semantics models

also regards the rows of the matrix

Page 193: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

The flexibility of distributional semantics models

also regards the rows of the matrix

Keywords can be replaced with concepts

(as synsets or entities!)

Page 194: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Term-Contexts Matrix

The flexibility of distributional semantics models

also regards the rows of the matrix

Keywords can be replaced with concepts

(as synsets or entities!)

✔ ✔

Page 195: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

✔ ✔

Term-Contexts Matrix

Keanu Reeves and Al Pacino

are «connected» because they

both acted in Drama Films

Page 196: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Representing Documents

Given a WordSpace, a vector space representation of

documents (called DocSpace) is typically built as the

centroid vector of word representations

✔ ✔

Page 197: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Representing Documents

✔ ✔

Page 198: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

DocSpace

Given a WordSpace, a vector space representation of

documents (called DocSpace) is typically built as the

centroid vector of word representations

Matrix Revolutions

Donnie Darko

Up!

similarity

calculations

between items

semantic

representation

The Matrix

Page 199: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

• We can exploit the (big) corpora of

data to directly learn a semantic

vector-space representation of

the terms of a language

• Lightweight semantics, not

formally defined

• High flexibility: everything is a

vector: term/term similarity, doc/term,

term/doc, etc..

• Context can have different

granularities

• Huge amount of content is needed

• Matrices are particularly huge and

difficult to build

• Too many features: need for

dimensionality reduction

Page 200: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Distributional Semantics

models share the same

insight but have important

distinguishing aspects

Page 201: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Distributional Semantics

models share the same

insight but have important

distinguishing aspects

Page 202: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

ESA builds a vector-space

semantic

representation of natural language texts in a

high-dimensional space of

comprehensible

concepts derived from

Wikipedia [Gabri06]

[Gabri06] E. Gabrilovich and S. Markovitch. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text

Categorization with Encyclopedic Knowledge. In Proceedings of the 21th National Conf. on Artificial Intelligence and the

18th Innovative Applications of Artificial Intelligence Conference, pages 1301–1306. AAAI Press, 2006.

Panthera

World War II

Jane Fonda

Island

Page 203: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

ESA matrix

ESA Concept 1 … Concept n

term 1 TF-IDF TF-IDF TF-IDF

… TF-IDF TF-IDF TF-IDF

term k TF-IDF TF-IDF TF-IDF Term

s

218

ESA is a Distributional

Semantic model which

uses Wikipedia

articles as context

Wikipedia articles

Page 204: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

ESA matrix

ESA Concept 1 … Concept n

term 1 TF-IDF TF-IDF TF-IDF

… TF-IDF TF-IDF TF-IDF

term k TF-IDF TF-IDF TF-IDF Term

s

219

Wikipedia articles

semantic relatedness

between a word and a concept

TF-IDF score

Page 205: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Every Wikipedia article represents a concept

Article words are associated with the concept (TF-IDF)

Explicit Semantic Analysis (ESA)

Each Wikipedia page can be described in terms of

the words with the highest TF/IDF score (this is a

column of ESA

matrix)

Page 206: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

ESA matrix

ESA Concept 1 … Concept n

term 1 TF-IDF TF-IDF TF-IDF

… TF-IDF TF-IDF TF-IDF

term k TF-IDF TF-IDF TF-IDF

221

The vector-space representation of each term is called

semantic interpretation vector

Page 207: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Every Wikipedia article represents a concept

Article words are associated with the concept (TF-IDF)

The semantics of a word is the

vector of its associations with

Wikipedia concepts

Page 208: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Important: the semantics of the words is not static.

It changes as Wikipedia articles are modified or

new articles are introduced.

ESA provides a representation which evolves over time!

Page 209: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

«web» in 1980 «web» in 2000

Important: the semantics of the words is not static.

It changes as Wikipedia articles are modified or

new articles are introduced.

ESA provides a representation which evolves over time!

Page 210: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

«web» in 1980 «web» in 2000

Important: the semantics of the words is not static.

It changes as Wikipedia articles are modified or

new articles are introduced.

ESA provides a representation which evolves over time!

Page 211: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

The semantics of a text fragment is the

centroid of the semantics of its words

Game Controller

[0.32]

Mickey Mouse [0.81]

Game Controller

[0.64]

Explicit Semantic Analysis (ESA)

Page 212: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

A semantic representation of an item can be built as the

centroid vector of the semantic interpretation vectors of

the terms in the plot.

Page 213: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

A semantic representation of an item can be built as the

centroid vector of the semantic interpretation vectors of

the terms in the plot.

Page 214: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Representation can be further improved and enriched by

processing content through exogenous techniques (e.g.

entity linking) in order to catch complex concepts

Page 215: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

semantic

relatedness

of a pair of text fragments

(e.g. description of two

items) computed by

comparing their

semantic

interpretation

vectors using the

cosine metric

Matrix Revolutions

Donnie Darko

Up!

The Matrix

Page 216: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Another advantage: ESA can be also used to generate a set

of relevant extra concepts describing the items.

How?

Page 217: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Another advantage: ESA can be also used to generate a set

of relevant extra concepts describing the items.

The Wikipedia pages with the highest TF/IDF score in the

semantic interpretation vector of the item!

Page 218: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Artificial Intelligence

[0.61]

Page 219: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

235

Explicit Semantic Analysis (ESA)

Page 220: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Explicit Semantic Analysis (ESA)

Distributional Model which uses

Wikipedia Article as context

Very Transparent representation

(columns have an explicit meaning)

Representation can evolve over time!

Also language-independent, thanks to

cross-language Wikipedia links

The whole matrix is very huge

«Empirical» tuning of the parameters:

how many articles? How many terms?

What is the thresholding?

Page 221: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

When transparency is not so important,

it is possible to learn a more compact

vector-space representation of terms and items

Page 222: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

When transparency is not so important,

it is possible to learn a more compact

vector-space representation of terms and items

Dimensionality Reduction techniques

Page 223: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

When transparency is not so important,

it is possible to learn a more compact

vector-space representation of terms and items

a.k.a. Word embedding techniques Embedding = a smaller representation of words

(more recent – equivalent - buzzword )

Page 224: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

When transparency is not so important,

it is possible to learn a more compact

vector-space representation of terms and items

a.k.a. Word embedding techniques Embedding = a smaller representation of words

Is this new?

Page 225: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction techniques

Latent Semantic Analysis (LSA) is a widespread

distributional semantics model which builds

a term/term matrix and calculates SVD over that matrix.

Dumais, Susan T. "Latent semantic

analysis." Annual review of information science

and technology 38.1 (2004): 188-230.

Page 226: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction techniques

Dumais, Susan T. "Latent semantic

analysis." Annual review of information science

and technology 38.1 (2004): 188-230.

Truncated Singular Value Decomposition

induces higher-order (paradigmatic) relations through the truncated SVD

Latent Semantic Analysis (LSA) is a widespread

distributional semantics model which builds

a term/context matrix and calculates SVD over that matrix.

Page 227: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Singular Value Decomposition

PROBLEM

the huge co-occurrence matrix

SOLUTION

don’t build the huge co-occurrence matrix!

Use incremental and scalable techniques

Dimensionality reduction techniques

Page 228: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 229: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction

Random Indexing

It is an incremental and scalable technique

for dimensionality reduction.

M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations

between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.

Page 230: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction

Random Indexing

It is an incremental and scalable technique

for dimensionality reduction.

Insight

Assign a vector to each context (word, documents, etc.). The

vector can be as big as you want.

Fill the vector with (almost) randomly assigned values.

Given a word, collect the contexts where that word appears.

Update the representation according to term co-occurrences.

The final representation is the «sum» of the contexts.

Obtain a (smaller but equivalent) vector space representation of

the terms

M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations

between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.

Page 231: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction

Random Indexing

It is an incremental and scalable technique

for dimensionality reduction.

Insight

Assign a vector to each context (word, documents, etc.). The

vector can be as big as you want.

Fill the vector with (almost) randomly assigned values.

Given a word, collect the contexts where that word appears.

Update the representation according to term co-occurrences.

The final representation is the «sum» of the contexts.

Obtain a (smaller but equivalent) vector space representation of

the terms

M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations

between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.

Page 232: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Algorithm

Step 1 - definition of the context granularity:

Document? Paragraph? Sentence? Word?

Step 2 – building the random matrix R

each ‘context’ (e.g. sentence) is assigned a

context vector

dimension = k

allowed values = {-1, 0, +1}

small # of non-zero elements, i.e. sparse vectors

values distributed in a random way

Page 233: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Context vectors of dimension k = 8

Each row is a «context»

Page 234: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Algorithm

Step 3 – building the reduced space B

the vector space representation of a term t

obtained by combining the random vectors

of the context in which it occurs in

t1 ∈ {c1, c2, c5}

Page 235: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Algorithm

Step 3 – building the reduced space B

r1

0, 0, -1, 1, 0, 0, 0, 0

r2 1, 0, 0, 0, 0, 0, 0, -1

r3 0, 0, 0, 0, 0, -1, 1, 0

r4 -1, 1, 0, 0, 0, 0, 0, 0

r5

1, 0, 0, -1, 1, 0, 0, 0

rn …

t1 ∈ {c1, c2, c5}

r1

0, 0, -1, 1, 0, 0, 0, 0

r2 1, 0, 0, 0, 0, 0, 0, -1

r5 1, 0, 0, -1, 1, 0, 0, 0

t1 2, 0, -1, 0, 1, 0, 0, -1

Output: WordSpace

Page 236: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Algorithm

Step 4 – building the document space

the vector space representation of a

document d obtained by

combining the vector space representation

of the terms that occur in the document

Output: DocSpace

Page 237: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

WordSpace and DocSpace

c1 c

2 c

3 c

4 … c

k

t1

t2

t3

t4

tm

c1 c

2 c

3 c

4 … c

k

d1

d2

d3

d4

dn

DocSpace WordSpace

Uniform representation

k is a simple

parameter

of the model

Page 238: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Dimensionality reduction

..even if it sounds weird

theory: Johnson-Lindenstrauss’ lemma [*]

Bm,k ≈ Am,n Rn,k k << n

distances between the points in the reduced space approximately preserved if

context vectors are nearly orthogonal

(and they are)

[*] Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings

into a Hilbert space. Contemporary mathematics, 26(189-206), 1.

Page 239: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

…. is also multilingual!

the same concept, expressed in different languages,

assumes the same position in language-based geometric

spaces

the position of beer in a geometric space based on English

and the position of birra in a geometric space based on

Italian are (almost) the same

Page 240: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

…. is also multilingual!

Italian WordSpace English WordSpace

glass

spoon

dog

beer

cucchiaio

cane

birra

bicchiere

The position in the space can be slightly different, but the

relations similarity between terms still hold

Page 241: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Random Indexing

Incremental and Scalable technique for

learning word embeddings

Page 242: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 243: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

• Distributional Model to learn Word Embeddings.

• Uses a two-layers neural network

• Training based on the Skip-Gram methodology

• Update of the network through Mini-batch or Stochastic Gradient

Descent

In a nutshell

Page 244: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

(Partial) Structure of the network

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• Each term is

represented through

a «one hot

representation»

Page 245: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

(Partial) Strucure of the network

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Page 246: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

(Partial) Structure of the network

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Weight of the network:

• Randomly set (initially)

• Updated through the training

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

Page 247: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Hidden Layer:

• N nodes

• N = size of the embeddings

• Parameter of the model

Weight of the network:

• Randomly set (initially)

• Updated through the training

Final Representation for term tk

• Weights Extracted from the network

• tk=[wtkv1, wtkv2 … wtkvn]

Input Layer:

• Vocabulary V

• |V| number of terms

• |V| nodes

• One-hot representation

(Partial) Structure of the network

Page 248: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Training Procedure: how to create training examples?

Skip-Gram Methodology Continuous Bag-of-Words

Methodology

Given a word w(t), predict its

context w(t-2), t(t-1).. w(t+1), w(t+2) Given a context w(t-2), t(t-1)..

w(t+1), w(t+2) predict word w(t)

Page 249: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Training Procedure: how to create training examples?

Skip-Gram Methodology

Given a word w(t), predict its

context w(t-2), t(t-1).. w(t+1), w(t+2)

Example Input: ”the quick brown fox

jumped over the lazy dog”

Window Size: 1

Contexts:

• ([the, brown], quick)

• ([quick, fox], brown)

• ([brown, jumped], fox) ...

Training Examples:

• (quick, the)

• (quick, brown)

• (brown, quick)

• (brown, fox) ...

Page 250: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Training Procedure: how to optimize the model?

And probability is calculated through soft-max

The model tries to maximize The

probability of predicting a context C

given a word w

Given a corpus, we create of training examples through Skip-Gram.

Page 251: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Training Procedure: how to optimize the model?

And probability is calculated through soft-max

The model tries to maximize The

probability of predicting a context C

given a word w

Intuitively, probability is high when scalar product

is close to 1 when vectors are similar!

Given a corpus, we create a training examples through Skip-Gram.

Page 252: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Training Procedure: how to optimize the model?

Given a corpus, we create a training examples through Skip-Gram.

And probability is calculated through soft-max

The model tries to maximize The

probability of predicting a context C

given a word w

Intuitively, probability is high when scalar product

is close to 1 when vectors are similar!

Word2Vec is a distributional model since it learns a

representation such that couples (word,context)

appearing together have similar vectors

The error is collected and weights in the network are updated

accordingly. Typically is used Stochastic Gradient Descent or Mini-

Batch (every 128 or 512 training examples)

Page 253: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Representation can be really really

small (size<100, typically)

Trending - Recent and Very Hot

technique

Word2Vec

Learning Word Embeddings

through Neural Networks: it is not

based on «counting» co-

occurrences. It relies on «predict»

the distribution

Not transparent anymore

Needs more computational resources

Page 254: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

…Let’s put everything together

Page 255: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Page 256: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Work on

Vector Space Model Work on

Vector Space Model

Page 257: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Build a

Graph-based Data

Model

Work on

Vector Space Model Work on

Vector Space Model

Can Exogenous and Endogenous approaches be combined?

Page 258: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Approaches as Entity Linking and WSD

work on the row of the matrix

…Let’s put everything together

Page 259: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Approaches as Entity Linking and WSD

work on the row of the matrix

…Let’s put everything together

Endogenous Approaches as ESA or Word2Vec

work on the columns of the matrix

Page 260: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Exogenous Approaches as Entity Linking and WSD

work on the row of the matrix

…Let’s put everything together

Endogenous Approaches as ESA or Word2Vec

work on the columns of the matrix

Both approaches can be combined to obtain richer

and more precise semantic representations

(e.g. Word2Vec over textual description processed with WSD)

Page 261: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

What?

semantics-aware recommender systems

Page 262: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Word Sense

Disambiguation

Entity

Linking …….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking the

Item to a

knowledge graph

Page 263: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

synsets

{09596828} American -- (a native or

inhabitant of the United States)

{06281561} fiction -- (a literary

work based on the imagination

and not necessarily on fact)

{06525881} movie, film, picture,

moving picture, moving-picture

show, motion picture,

motion-picture show, picture show,

pic, flick -- (a form of entertainment

that enacts a story…

{02605965} star -- (feature as the

star; "The movie stars Dustin

Hoffman as an autistic man")

The Matrix representation

through WSD we process the

textual description of the item

and we obtain a semantics-

aware representation of the

item as output.

In this case, keyword-based

features are replaced with the

concepts (in this case, a

WordNet synset) they refer to.

Matrix

1999

American

Australian

fiction

world

keywords

science

Hugo

Page 264: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Synset-based representation

AI

Artificial

Intelligence

apple

M. Degemmis, P. Lops, and G. Semeraro. A Content-collaborative Recommender that Exploits WordNet-based User Profiles for

Neighborhood Formation. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 17(3):217–255,

Springer Science + Business Media B.V., 2007.

G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining Learning and Word Sense Disambiguation for Intelligent User Profiling. In M.

M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-

12, 2007 , pages 2856–2861. Morgan Kaufmann, 2007.

M.Degemmis, P. Lops, G. Semeraro, Pierpaolo Basile: Integrating tags in a semantic content-based recommender

ACM Conference on Recommender Systems, RecSys 2008: 163-170

Page 265: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Keywords- vs synsets-based profiles

EachMovie dataset

o 1,628 movies grouped into 10 categories

o Users who rated between 30 and 100 movies

o Movie content crawled from IMDb

Page 266: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

In the context of cultural heritage personalization, does the integration of UGC and textual description of artwork collections cause an increase of the prediction accuracy in the process of recommending artifacts to users?

Results in a cultural heritage scenario

In RecSys ’08, Proceed. of the 2nd ACM Conference on Recommender Systems, pages 163–170, October 23-25, 2008,

Lausanne, Switzerland, ACM, 2008.

Page 267: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Results in a cultural heritage scenario

5-point rating scale

Textual description of items (static content)

Personal Tags

Social Tags (from other users): caravaggio, deposition, christ, cross, suffering, religion

Social Tags

passion

Page 268: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Results in a cultural heritage scenario

Personal Tags

Static Content

Social Tags

Page 269: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Results in a cultural heritage scenario

o Artwork representation

o Artist

o Title

o Description

o Tags

o change of text representation from vectors of words (BOW) into vectors of WordNet synsets (BOS)

o From tags to semantic tags

o supervised Learning

o Bayesian Classifier learned from artworks labeled with user ratings and tags

Page 270: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Results in a cultural heritage scenario

* Results averaged over the 30 study subjects

Au

gm

en

ted

Pro

file

s

Co

nte

nt-

based

Pro

file

s

Tag

-based

Pro

file

s

Overall accuracy F1 ≈ 85%

Page 271: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Results in a cultural heritage scenario

personalized museum tours by arranging the most interesting

items for the active user

step forward to take into account spatial layout & time constraint

L. Iaquinta, M. de Gemmis, P. Lops, G. Semeraro: Recommendations toward Serendipitous Diversions. ISDA 2009: 1049-1054

Page 272: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 273: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic Analysis using Ontologies

Quickstep & Foxtrot

o on-line academic research

papers recommenders

o items and user profiles represented through a research

topic ontology

o is-a relationships exploited to

infer general interests when specific

topics are observed

o match based on the correlation

between the topics in the user

profile and topics in papers

ACM Transactions

on Information Systems, 22(1):54–88, 2004

Page 274: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic Analysis using Ontologies

. In W. Nejdl, J. Kay,

P. Pu, and E. Herder, editors, Adaptive Hypermedia and AdaptiveWeb-Based Systems, volume 5149 of Lecture Notes in Computer

Science, pages 279–283. Springer, 2008.

News@hand

Page 275: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic Analysis using Ontologies

o user interests propagation from concepts which received the user feedback to

others related ones though spreading activation

o contextualized propagation strategies of user interests

horizontal propagation among siblings

anisotropic vertical propagation, i.e. user interests propagated differently

upward and downward

. Inf. Sci.,

250:40–60, 2013.

Page 276: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Ontologies Linked

Open Data

Introduce semantics

by linking the

Item to a

knowledge graph

…….

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Page 277: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Linked Open Data & RecSys

structured information source

for item descriptions

Page 278: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

LOD & Recommender Systems

Vector Space Model for LOD

Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based

Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)

Page 279: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

LOD & Recommender Systems

Vector Space Model for LOD

Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based

Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)

(starring, directors,

subject, etc.)

Page 280: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

LOD & Recommender Systems

Property subset evaluation

subject+broader solution

better than only subject or

subject+more broaders

too many broaders

introduce noise

best solution achieved

with

subject+broader+genres

Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based

Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)

Page 281: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

Recommendations

obtained by mining

the graph

Page 282: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

Recommendations

obtained by mining

the graph

Identification of the

most relevant (target)

nodes, according to

the recommendation

scenario

Page 283: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

Recommendations

obtained by mining

the graph

Identification of the

most relevant (target)

nodes, according to

the recommendation

scenario

PageRank

Spreading Activation

Personalized PageRank

Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco De

Gemmis, Giovanni Semeraro.

.

UMAP 2016

Page 284: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

Personalized PageRank [*] to identify

the most relevant nodes in the graph

[*] T. H. Haveliwala. Topic-

Sensitive PageRank: A

Context-Sensitive Ranking

Algorithm for Web Search.

IEEE Trans. Knowl. Data

Eng., 15(4):784–796, 2003.

Page 285: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

MovieLens 100K dataset

G = Personalized PageRank on Bipartite User-Item Graph

G+LOD = Tripartite Graph modeling also Linked Open Data

Page 286: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

is it necessary to inject all the properties available in LOD cloud?

Page 287: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

is it necessary to inject all the properties available in LOD cloud?

Page 288: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

is it necessary to inject all the properties available in LOD cloud?

Page 289: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

what are the most

promising properties

to include?

manual selection o domain-specific

properties

o most frequent properties

o …

automatic selection

o more difficult to

implement

Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco De

Gemmis, Giovanni Semeraro.

.

UMAP 2016

Page 290: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Feature selection

selecting the most promising subset of LOD-based

properties

possible techniques

PageRank

Principal Component Analysis

Information Gain

Information Gain Ratio

Mininum Redundancy Maximum Relevance

Page 291: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

tradeoff between accuracy and diversity

MovieLens 100K dataset

Page 292: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys

Comparison to state of the art

MovieLens 100K dataset

Page 293: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

we can further build some

extra features by

mining the paths

occurring in the graph

path acyclic sequence of relations ( s , .. rl , .. rL )

frequency of pathj in the sub-graph related to u ad x

u3

s i2 p

2 e

1 p

1 i1

(s, p2 ,

p

1)

𝑤𝑢𝑥(𝑗) = #𝑝𝑎𝑡ℎ𝑢𝑥(𝑗)

#𝑝𝑎𝑡ℎ𝑢𝑥(𝑗)𝑗

Semantic Graph-based Data Model

(Path Based Features)

Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi: Top-N recommendations from implicit feedback leveraging

linked open data. RecSys 2013: 85-92

Page 294: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

LOD & Recommender Systems

Sprank Systems

o analysis of complex relations between user

preferences and the target item

o extraction of path-based features

Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi: Top-N recommendations from implicit feedback leveraging

linked open data. RecSys 2013: 85-92

Page 295: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

wu3X1 ?

Semantic Graph-based Data Model

(Path Based Features)

Page 296: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 1

Semantic Graph-based Data Model

(Path Based Features)

Page 297: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 2

Semantic Graph-based Data Model

(Path Based Features)

Page 298: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 2

path2 (s, p2, p1) : 1

Semantic Graph-based Data Model

(Path Based Features)

Page 299: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

Semantic Graph-based Data Model

(Path Based Features)

Page 300: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

path3 (s, p2, p3, p1) : 1

Semantic Graph-based Data Model

(Path Based Features)

Page 301: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

path1 (s, s, s) : 2

path2 (s, p2, p1) : 2

path3 (s, p2, p3, p1) : 1

𝑤𝑢3𝑥1 1 = 2

5

𝑤𝑢3𝑥1 2 = 2

5

𝑤𝑢3𝑥1 3 = 1

5

Semantic Graph-based Data Model

(Path Based Features)

Page 302: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Evaluation:

LOD-based overcomes state-of-the-art

Page 303: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 304: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

• Empirical Comparison of Word Embedding Techniques

for Content-based Recommender Systems [*]

• Methodology

• Build a WordSpace using different Word Embedding

techniques (and different sizes)

• Build a DocSpace as the centroid vectors of term vectors

• Build User Profiles as centroid of the items they liked

• Provide Users with Recommendations

• Compare the approaches

European Conference on Information Retrieval

Page 305: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Word2Vec

Results on Dbbook and MovieLens data

European Conference on Information Retrieval

Page 306: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 307: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

• Enhanced Vector Space Model [*]

• Content-based Recommendation Framework

• Cornerstones

• Semantics modeled through Distributional Models

• Random Indexing for Dimensionality Reduction

• Negative Preferences modeled through Quantum Negation [^]

• User Profiles as centroid vectors of items representation

• Recommendations through Cosine Similarity

[*] Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth

ACM conference on Recommender systems. ACM, 2010.

Mathematics of language

Page 308: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM, 2010.

Distributional Models

to build DocSpace of the items

(whole document used as context)

Random Indexing for

Dimensionality Reduction

Page 309: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Cornerstones

• Given two vectors a e b

• Through Quantum Negation we can define a Vector (a ∧¬b) • Formally:

• Projection of vector a on the subspace orthogonal to that generated

by vector b

• Intuitively:

• Vector «a» models «positive» preferences

• Vector «b» models «negative» preferences

• Through quantum negation we get a unique vector modeling both

aspects

• Close to vectors containing as many as possible features from «a»

and as less as possible features from «b»

Mathematics of language

Page 310: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

• User Profiles

• Calculated as centroid vectors of the items the user liked/disliked

Random Indexing-

based Profiles (RI)

Musto, Cataldo. "Enhanced vector space

models for content-based recommender

systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM,

2010.

Page 311: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Cornerstones

• User Profiles

• Calculated as centroid vectors of the items the user liked/disliked

Random Indexing-

based Profiles (W-RI)

Quantum Negation- based

Profiles (W-QN)

Musto, Cataldo. "Enhanced vector space

models for content-based recommender

systems." Proceedings of the fourth ACM

conference on Recommender systems. ACM,

2010.

Page 312: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Cornerstones

• Recommendations

• Similarity Calculations on DocSpace

Page 313: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Experiments

The size of the embeddings does not significantly affect the overall

accuracy of eVsm (MovieLens data)

Page 314: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Experiments

Quantum Negation improves the accuracy of the model

(MovieLens data, embedding size=100)

Page 315: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

eVSM

Experiments

eVSM significantly overcame all the baselines.

(MovieLens data, embedding size=400)

Page 316: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Word Sense

Disambiguation

Entity

Linking

Page 317: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Contextual Enhanced Vector Space Model [*]

• Extension of eVSM: context-aware Framework

• Cornerstones

• Entity Linking of the content through Tag.me

• Semantics modeled through Distributional Models

• Random Indexing for Dimensionality Reduction

• Distributional Models also used to build a representation of

the context

• Context-aware User Profiles as centroid vectors

• Recommendations through Cosine Similarity

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Page 318: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

α α

Let u be the target user

Let ck be a contextual variable (e.g. task, mood, etc.)

Let vj be its value (e.g. task=running, mood=sad, etc.)

Page 319: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

α α

eVSM

profile

Page 320: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u,ck,v

j) = α * WRI(u) + (1-α) * context(u,c

k,v

j)

eVSM

profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Page 321: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u,ck,v

j) = α * WRI(u) + (1-α) * context(u,c

k,v

j)

eVSM

profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Linear Combination

(a=1 eVSM no context

taken into account)

Page 322: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u,ck,v

j) = α * WRI(u) + (1-α) * context(u,c

k,v

j)

eVSM

profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Linear Combination

(a=1 eVSM no context

taken into account)

Page 323: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Context-aware User Profiles

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

C-WRI(u,ck,v

j) = α * WRI(u) + (1-α) * context(u,c

k,v

j)

eVSM

profile

A Vector representing value v for

context c is introduced (e.g.

company=friends)

Why this

formula?

Page 324: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Insight: it exists a set of terms that is more descriptive

of items relevant in that specific context

for a romantic dinner, e.g. candlelight, seaview, violin

Page 325: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Page 326: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

• Why this formula?

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Thanks to Distributional Semantics Models it is

possible to build a vector-space representation of the

context which emphasize the importance of those

terms, since they are more used ( more important) in

that specific contextual setting.

Page 327: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Page 328: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Page 329: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Page 330: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based

recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014

Entities are better than simple keywords!

Selection of Results

Page 331: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

C-eVSM

Incorporating contextual information in recommender systems using a multi-

dimensional approach

Compared to Context-aware Collaborative Filtering (CACF)

[*] algorithm: better in 7 contextual segments

Page 332: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic representations

Explicit (Exogenous)

Semantics

Implicit (Endogenous)

Semantics

Explicit

Semantic

Analysis

Random

Indexing

…… Word2Vec

Distributional

semantic models

Introduce semantics by

mapping the features

describing the item with

semantic concepts

Introduce semantics

by linking

the Item to

a knowledge graph

Page 333: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Text Categorization [Gabri09] experiments on diverse datasets

Semantic relatedness of

words and texts [Gabri09] cosine similarity between vectors of ESA concepts

Information Retrieval [Egozi08, Egozi11] ESA-based IR algorithm enriching documents and queries

ESA effectively used for

[Gabri09] E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. Journal of Artificial

Intelligence Research 34:443-498, 2009.

[Egozi08] Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch: Concept-Based Feature Generation and Selection for Information Retrieval.

AAAI 2008, 1132-1137, 2008.

[Egozi11] Ofer Egozi, Shaul Markovitch, Evgeniy Gabrilovich. Concept-Based Information Retrieval using Explicit Semantic Analysis.

ACM Transactions on Information Systems 29(2), April 2011.

370

what about ESA for Information Filtering?

Page 334: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

problem

description of TV shows too short or

poorly meaningful to feed a

content-based recommendation algorithm

solution

Explicit Semantic Analysis exploited to obtain an

enhanced representation

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199.

Springer, 2012

372

Page 335: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

373

Page 336: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

374

Page 337: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

375

Wikipedia Articles related to the

TV show are added to the

description

Page 338: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

376

user profile tv show

motogp

sports

motorbike

...

competition

2012 Superbike

Italian Grand Prix

Page 339: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

377

user profile tv show

motogp

sports

motorbike

...

competition

2012 Superbike

Italian Grand Prix

X No matching!

Page 340: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

378

user profile tv show

2012 Superbike

Italian Grand Prix

motogp

superbike

sports

motorbike

formula 1

competition

Through ESA we can

add new features to

the profile and we

can improve the

overlap between

textual description

Page 341: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.

Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199. Springer, 2012

379

user profile tv show

2012 Superbike

Italian Grand Prix

motogp

superbike

sports

motorbike

formula 1

competition

Matching!

Page 342: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Electronic Program Guides

380

results on Aprico.tv data

The more Wikipedia Concepts are added to the textual description

of the items (eBOW+60), the best the precision of the algorithm

eBOW = Bag of Words + Wikipedia Concepts

Page 343: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantics User Profiles

• Research Question

• Is it possible to exploit semantic representation

techniques to improve the quality of user profiles?

• Methodology

• Build User Profiles by extracting data available from social

networks

• Process User Profiles through Semantics-aware

Techniques

• Evaluate the effectiveness of the profiles

• Accuracy

• Transparency

• Serendipity

Narducci, F., Musto, C., Semeraro, G., Lops, P., & de Gemmis, M.

(2013, August). Exploiting big data for enhanced representations

in content-based recommender systems. In International

Conference on Electronic Commerce and Web Technologies (pp. 182-

193). Springer Berlin Heidelberg.

Page 344: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantics User Profiles

• Techniques

• Keyword-based Profiles

• Entity Linking (Tag.me)

• Explicit Semantic Analysis

• Scenario I’m in trepidation for my first riding lesson!, I’m really anxious

for the soccer match :( , This summer I will flight by Ryanair to London!, Ryanair

really cheapest company!, Ryanair lost my luggage :(, This summer holidays are

really amazing!, Total relax during these holidays!

Page 345: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantics User Profiles

Results – Experiment 1

• User Study involving 63 users • Twitter and Facebook as Social Networks

• Users were provided with a set of personalized news

• Answers gathered through User Feedback on recommended news

Page 346: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantics User Profiles

Results – Experiment 1

• User Study involving 63 users • Twitter and Facebook as Social Networks

• Users were provided with a set of personalized news

• Answers gathered through User Feedback on recommended news

No relevance feedback

Avg.

rating

Min

rating

Max

rating

Std.

Dev.

1.49 0 5 1.12

1.89 0 5 1.47

2.86 1 5 1.30

2.59 0 5 1.37

Page 347: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantics User Profiles

Transparency Serendipity

KEYWORDS 1.33 0 3 0.65 0.42 0 2 0.57

TAG.ME 3.88 2 5 0.82 0.54 0 2 0.61

ESA 1.16 0 4 1.00 3.24 0 5 1.24

Results – Experiment 2

• User Study involving 51 users • Twitter and Facebook as Social Networks

• Users were provided with a tag cloud describing their profile

• Answers gathered through Questionnaires

Page 348: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

What?

Cross-lingual recommender systems

Page 349: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual access: motivations

relevant information exist in different languages

Web is becoming more and more multilingual

users are becoming increasing polyglot

more than half of the world population bilingual

cross-language recommender systems

can likely increase the number of tail products suggested (25-30% of sales for online stores)

387

Page 350: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual access: problems

Vocabulary mismatch

use of different languages

extreme case of vocabulary mismatch

388

Page 351: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Semantic analytics for cross-lingual access

Sense-based representations

inherently multilingual

terms in each specific language change, while

concepts (word meanings) remain the same

across different languages

match between

items and user profiles at a conceptual level

Page 352: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

(Cross-lingual) Concept-based representations

ESA

Page 353: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

MultiWordNet

392

Sense-based representations

Word Sense Disambiguation (JIGSAW) based

on Multiwordnet as sense repository

multilingual lexical database that supports

English, Italian, Spanish, Portuguese, Hebrew,

Romanian, Latin

alignment between synsets in the different languages

semantic relations imported and preserved

Page 354: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

MultiWordNet

393

Page 355: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

394

Bag of MultiWordNet synsets

bag of

MultiWordNet

synsets

bag of

MultiWordNet

synsets

Match

between

senses

Page 356: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Some results

cross-language movie recommendation scenario

profiles learned from ENG/ITA descriptions

recommendation provided on ITA/ENG descriptions

MovieLens dataset, F1 measure, Wikipedia source for descriptions

395

Page 357: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual representation: Tagme

Page 358: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-language links

397

Page 359: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual representation: Tagme

Film d’azione Lana e Lilly Wachowskis

Laurence Fishburne Keanu Reeves

distopia …

action film The Wachowskis

Keanu Reeves Laurence Fishburne

Dystopia Perception

Carrie-Anne Moss Cyberspace

Page 360: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Text Language L1

Text Language Ln

Text Language L2

Translated Text PIVOT LANGUAGE

C1 C2 C3 … … Cn

t1

t2

tk

ESA MATRIX PIVOT LANGUAGE

Wikipedia articles

Term

s o

ccu

rrin

g in

W

ikip

edia

art

icle

s

TF-IDF

TRANSLATION PROCESS

Cross-lingual representation: ESA

Translation-based ESA

Page 361: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

ESA MATRIX L1 C1-L1 C2-L1 … Cn-L1

ESA MATRIX L2 C1-L2 C2-L2 … Cn-L2

ESA MATRIX LN C1-LN C2-LN … Cn-LN

ESA MATRIX LN C1-LN C2-LN … Cn-LN

Text

La

ngu

age

L1

Text

La

ngu

age

Ln

Text

La

ngu

age

L2

Cross-lingual representation: ESA

Cross-language ESA

Page 362: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual representation: Babelfy

Page 363: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Preliminary results

effectiveness of knowledge-based strategies to provide

cross-lingual recommendations

MovieLens and DBbook datasets

F1 measure, Wikipedia source for descriptions

English and Italian languages

406

Page 364: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Cross-lingual representation: Distributional models

distribution of the terms

(almost) the same in different languages

o cross-lingual representation comes with

no costs thanks to the distributional hypothesis

Page 365: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

…. is also multilingual! (Recap)

Italian WordSpace English WordSpace

glass

spoon

dog

beer

cucchiaio

cane

birra

bicchiere

The position in the space can be slightly different, but the

relations similarity between terms still hold

Page 366: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

Multilingual DocSpace

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

By following the same procedure we can

obtain a multilingual DocSpace

Different documents in different languages are represented in a uniform space

Page 367: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

By following the same procedure we can

obtain a multilingual DocSpace

How to build a cross-lingual recommender?

Page 368: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

P1

We build a user profile in L1 (Italian DocSpace)

How to build a cross-lingual recommender?

Page 369: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

P1 P1

We build a user profile in L1 (Italian DocSpace)

We can «move» the profile in L2 (EnglishDocSpace)

How to build a cross-lingual recommender?

Page 370: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Distributional Semantics

…. is also multilingual!

Italian DocSpace English DocSpace

D2_L1

D3_L1

D4_L1

D1_L1

D7_L2 D8_L2

D5_L2

D6_L2

We build a user profile in L1 (Italian DocSpace)

We can «move» the profile in L2 (EnglishDocSpace) We can use similarity measures to suggest items in different language

How to build a cross-lingual recommender?

P1 P1

Page 371: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Some results

effectiveness of knowledge-based strategies to provide

cross-lingual recommendations

MovieLens dataset

F1 measure

comparable results (gap not statistically significant)

414

C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro:"Cross-language information filtering: Word sense disambiguation

vs. distributional models." AI*IA 2011: 250-261

Page 372: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

What?

Explaining recommendations

Page 373: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys: explanations

are properties useful for providing

explanations?

advantages readability of properties

disadvantages difficult to generate

natural language

explanations

selection of the properties

to include in the

explanation

Page 374: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys: explanations

main intuition of the EXPLOD system

o graph connecting user preferences and

recommendations through a set of

LOD-based properties

o scoring and ranking properties

Cataldo Musto, Fedelucio Narducci , Pasquale Lops, Marco de Gemmis, Giovanni Semeraro:

. RecSys 2016, to appear

Page 375: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scoring properties in EXPLOD

items in the

user profile

items in the

recommendation list property

number of edges

connecting the propertyc

with the items in

the user profile

number of edges

connecting the propertyc

with the items in

the recommendation set

Graph-based RecSys: explanations

Page 376: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scoring properties in EXPLOD

weighting factors

adaptation of the

Inverse

Document

Frequency

Graph-based RecSys: explanations

Page 377: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Scoring properties in EXPLOD

higher score to properties highly connected to the items in

both the user profile and the recommendation list,

and which are not common.

Graph-based RecSys: explanations

Page 378: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

From properties to Natural Language explanation

I recommend you The Dark Knight because you often like Films

shot in the United Kingdom as Inception and Forrest Gump. In

addition, you sometimes like Films produced by Christopher

Nolan as Inception and Screenplays by Christopher Nolan as

Inception

Graph-based RecSys: explanations

Page 379: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

From properties to Natural Language explanation

o top-ranked properties to fill in a template-based structure of

the explanation

o LOD cloud-properties mapped to natural language

expressions

o adverbs obtained by mapping the normalized occurrence

of that property to a different range of the score

Graph-based RecSys: explanations

Page 380: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Graph-based RecSys: explanations

preliminary results

aim question

transparency I understood why this movie

was recommended to me

o topic

o director o distributor

o music composer

persuasion

The explanation made the

recommendation more

convincing

o awards

o director

o location

o producer

engagement

The explanation helped me

discover new information about

this movie

o writer

o director

o producer

o distributor

trust

The explanation increased my

trust in the recommender

system

o awards o composer

o producer

o topic

effectiveness I like this recommendation o director

o writer

o location o composer

user study involving 308 subjects

Page 381: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

What?

Semantic analysis of social streams

Page 382: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Social Networks

novel data silos

Page 383: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Goal

To design a unique framework implementing

a pipeline of semantic analysis techniques

Page 384: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Our Contribution: CrowdPulse

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 385: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Our Contribution: CrowdPulse

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 386: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse Workflow

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 387: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse Workflow

Each «project» is an instance of such a workflow

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 388: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse Workflow

Each «project» is an instance of such a workflow

Every module is set according to the needs of the scenario

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 389: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 390: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse

Solution:

Entity Linking Algorithms

Input: textual content

Output: identification and

disambiguation of the

entities mentioned in the

text.

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 391: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse

Solution:

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 392: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse

Solution:

Overall sentiment: :-(

We implemented:

• A lexicon-based approach, which assigns a sentiment

according to the words in the social content

• A supervised classification algorithm, which exploits

labeled examples to learn a classification model

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 393: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Supervised learning

Unsupervised learning

Linguistic Analysis (Distributional Models)

classification, regression tasks

clustering

building word spaces, similarity between

concepts, analysis of terms usage, etc.

CrowdPulse

Step 4: Domain-Specific Processing

CrowdPulse natively supports all these methodologies

The choice is typically scenario-dependent

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 394: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 395: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

CrowdPulse Workflow

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

(recap)

Page 396: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 397: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Research Question:

Is it possible to extract and process social

media to monitor in real time people feelings,

opinions and sentiments about the current

state of the social capital of L’Aquila?

Use Case

L’Aquila Social Urban Network

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 398: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

4

4

1

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Page 399: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Use Case

L’Aquila Social Urban Network

Heuristics:

- Twitter users (local newspapers, mention to politicians)

- Twitter content+geo (50km around l’Aquila and/or specific hashtags as #laquila

#earthquake, etc)

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 400: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Use Case

L’Aquila Social Urban Network

Extracted Content

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 401: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Use Case

L’Aquila Social Urban Network

Semantic and Sentiment Analysis of Extracted Content

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Page 402: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Domain-specific Task

Given a piece of social content,

we have to classify it against the social indicators

Page 403: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Classification Task

Build a supervised classification model

for each social indicator

Page 404: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Input: Tweet + Social Indicators Classification Model

Page 405: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Output: Social Indicators and Sentiment conveyed

Page 406: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

The «score» of a social Indicator is the sum of the

Sentiment conveyed by the Tweets

Page 407: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

Overall score of the social indicators between March and August 2014

Page 408: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

L’Aquila Social Urban Network

COMMUNITY PROMOTER

DEFINES SOME INITIATIVES TO EMPOWER THE SOCIAL CAPITAL

MONITORS THE STATE OF THE SOCIAL INDICATORS

Connecting Real Communities

With Virtual Communities

Page 409: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

http://users.humboldt.edu/mstephens/hate/hate_map.html Inspired by the

Hate Map built by the

Humboldt University

Is it possible to exploit

techniques for semantic

analysis of social media to

detect intolerant content

posted on social networks and

identify the most at-risk areas

of the Italian country?

Research Question

Page 410: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Heuristics: Twitter content

- 76 intolerant seed terms, defined by the psychologists teams

- 5 intolerance dimensions: violence (against women), racism,

homophobia, disability, anti-semitism

Page 411: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Many non-intolerant Tweets are extracted!

X X

Page 412: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Solution: replace a simple keyword-based approach

with a supervised classification model

X X

Page 413: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Entities and Wikipedia categories used as features

Page 414: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Sample of Tweet is manually labeled to build classification models

Page 415: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Page 416: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Tweet about an Italian ministry

Tweet about an Italian football player

Violence against women Disability

Racism Homophobia

Page 417: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

Use Case

The Italian Hate Map

Given the maps and given the output of the linguistic analysis of

intolerant Tweets (co-occurrences between terms, time lapse, etc.), the

psychologists team defined some guidelines to tackle and prevent

intolerant behaviors.

These guidelines have been freely distributed to public

administrations on early 2015.

Page 418: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams

Information Systems, 54 (2015): 127-146.

CrowdPulse

Lessons Learned

Pipeline of state of the art techniques Entity Linking, Sentiment Analysis, Machine Learning, Data Visualization

Use Cases.

L’Aquila Social Urban Network

The Italian Hate Map

REAL-TIME SEMANTIC CONTENT ANALYSIS

1. 2.

The outcomes of both use cases showed that very

complex phenomena can be analyzed in a totally new

way, thanks to the huge availability of textual data

Page 419: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Readings

Semantics-aware Recommender Systems

o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Learning Word Embeddings from Wikipedia for Content-

Based Recommender Systems. ECIR 2016: 729-734

o M. de Gemmis, P. Lops, C. Musto, F.Narducci, G. Semeraro: Semantics-Aware Content-Based

Recommender Systems. Recommender Systems Handbook 2015: 119-159

o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Word Embedding Techniques for Content-based

Recommender Systems: An Empirical Evaluation. RecSys Posters 2015

o C. Musto, P. Basile, M. de Gemmis, P. Lops, G. Semeraro, S. Rutigliano: Automatic Selection of Linked

Open Data Features in Graph-based Recommender Systems. CBRecSys@RecSys 2015: 10-13

o P. Basile, C. Musto, M. de Gemmis, P. Lops, F. Narducci, G. Semeraro: Content-Based Recommender

Systems + DBpedia Knowledge = Semantics-Aware Recommender Systems. SemWebEval@ESWC

2014: 163-169

o C. Musto, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Linked Open Data-enabled Strategies for Top-N

Recommendations. CBRecSys@RecSys 2014: 49-56

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Combining Distributional Semantics and Entity Linking

for Context-Aware Content-Based Recommendation. UMAP 2014: 381-392

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Contextual eVSM: A Content-Based Context-Aware

Recommendation Framework Based on Distributional Semantics. EC-Web 2013: 125-136

o C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, R. Clout:

Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP

2012: 188-199

o M. Degemmis, P. Lops, G. Semeraro: A content-collaborative recommender that exploits WordNet-based

user profiles for neighborhood formation. User Model. User-Adapt. Interact. 17(3): 217-255 (2007)

o G. Semeraro, M. Degemmis, P. Lops, P. Basile: Combining Learning and Word Sense Disambiguation for

Intelligent User Profiling. IJCAI 2007: 2856-2861

Page 420: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016

Readings

Cross-language Recommender Systems

o C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Cross-Language Information

Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011: 250-261

o P. Lops, C. Musto, F. Narducci, M. de Gemmis, P. Basile, G. Semeraro: Cross-Language Personalization

through a Semantic Content-Based Recommender System. AIMSA 2010: 52-60

Explanations

o C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro: ExpLOD: a framework for Explaining

Recommendations based on the Linked Open Data cloud. RecSys 2016, to appear

Semantic Analysis of Social Streams

o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: CrowdPulse: A framework for real-time semantic

analysis of social streams. Inf. Syst. 54: 127-146 (2015)