Profiling User Interests on the Social Semantic Web

DESCRIPTION

Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014. Supervisors: Alexandre Passant and John G. Breslin. Examiners: Fabien Gandon and Stefan Decker

Profiling User Interests on the Social Semantic Web

Ph.D. Viva

Fabrizio Orlandi

Context: Personalisation

Problem

Goal

Challenges

1 – Heterogeneous data sources

[Figure: example interests (Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta) annotated with the questions: Microblog? Social Networking Service?]

2 – Lack of provenance

[Figure: the same example interests annotated with the questions: Where? Who? How? What?]

3 – Semantics of entities of interest

[Figure: the same example interests annotated with the questions: Semantics? Pragmatics? Relevance?]

Research Questions

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?

Research Goal

How can we collect, represent, aggregate, mine, enrich and deploy user profiles of interests on the Social Web for multi-source personalisation?

Methodology

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

Aggregation of Social Web Data

• Modelling solution for Social Web data and user profiles
• Based on SIOC, FOAF and extensions
• Experiments on wikis

[Orlandi, Passant. WikiSym. ACM. 2010.]

Example

Extracted entities: Music, Heavy Metal, Mastodon, Atlanta, CEV Champions League, Volleyball, Semantic Web, RDF

• “Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
• “Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
• User likes RDF and SemanticWeb on Facebook

• Natural language processing tools for entity extraction (Zemanta & Spotlight)
• Frequency + time-decay weighting schemes

Aggregation and Mining of Interests

7 types of user profiling strategies:

• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods:
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function:
  - Short mean lifetime (120 days)
  - Long mean lifetime (360 days)
• 1 “bag-of-words” (tag-based) state-of-the-art approach
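The combination of frequency and exponential time decay can be illustrated with a minimal sketch. The slides do not give the exact formula, so the sketch assumes that each mention of an entity contributes exp(-age / mean lifetime) to its weight, with the mean lifetime set to 120 or 360 days. The function name `decayed_interest_weights` and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def decayed_interest_weights(mentions, now, mean_lifetime_days):
    """Sum one exponentially decayed contribution per mention of an entity.

    Assumed formula (not taken from the slides):
        weight(e) = sum over mentions of exp(-age_in_days / mean_lifetime_days)
    """
    weights = defaultdict(float)
    for entity, timestamp in mentions:
        # Age of the mention in days; mentions dated in the future count as age 0.
        age_days = max(0.0, (now - timestamp).total_seconds() / 86400.0)
        weights[entity] += math.exp(-age_days / mean_lifetime_days)
    return dict(weights)


# Hypothetical mentions: (entity, time of the post or like that mentioned it).
now = datetime(2014, 3, 31)
mentions = [
    ("Mastodon (band)", now),
    ("Mastodon (band)", now - timedelta(days=120)),
    ("Volleyball", now - timedelta(days=360)),
]

short_profile = decayed_interest_weights(mentions, now, mean_lifetime_days=120)
long_profile = decayed_interest_weights(mentions, now, mean_lifetime_days=360)
```

Under this assumption a longer mean lifetime keeps old mentions relevant for longer, so the 360-day profile gives "Volleyball" a higher weight than the 120-day profile does.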

Evaluation

• User study: 21 users rating their user profiles from Twitter & Facebook
• 210 ratings for each of the 7 different profiling methods

[Figure: bar chart of P@10 and average score (scale 0 to 1) for Resources, Categories and Tags profiles]

Key findings

• DBpedia resource-based profiles outperform DBpedia category-based and tag-based profiles.
• Best strategy: Resources + Frequency & Slow Time Decay weighting scheme

[Orlandi, Breslin, Passant. I-Semantics. ACM. 2012.]

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

Provenance of Data

Motivation: use of provenance information as the core of the profiling heuristics to improve mining of user interests and semantic enrichment

Data provenance as the history, the origins and the evolution of data:

• Who created/modified it? When?
• What is the content? Where is it located?
• How and why was it created? Which tools and processes were used?

Provenance as the “bridge” between Social Web and Web of Data, e.g. Wikipedia/DBpedia

Use Case: Provenance on Wikis

Provenance on the Social Web for the Web of Data

• A semantic model to represent provenance information in wikis
• A software architecture to extract provenance from Wikipedia
• An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles

[Orlandi, Champin, Passant. SWPM at ISWC. 2010.]

Provenance on the Social Web

Use Case: Provenance on DBpedia

Provenance on the Web of Data for the Social Web

• Using detailed provenance information extracted from Wikipedia, we are able to compute provenance for DBpedia resources as well.
• By analysing the “diffs” between the revisions of Wikipedia articles and the users' contributions, we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
• We built a model and an application that shows provenance information for each triple on DBpedia that is the result of users' edits on Wikipedia.

[Orlandi, Passant. Journal of Web Semantics. 2011]
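The idea of tracing a DBpedia value back to a Wikipedia edit through revision diffs can be sketched as follows. This is only an illustration, not the architecture described in the cited paper: it assumes revisions are already available as plain wikitext and that a changed infobox-style line (`| key = value`) stands in for a changed DBpedia triple. The helper names and the revision records are hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines between two revisions of an article."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


def attribute_infobox_edits(revisions):
    """Attribute each added infobox-style line to the revision that introduced it.

    `revisions` is a list of dicts with 'user', 'timestamp' and 'text' keys,
    ordered oldest first (a hypothetical record layout).
    """
    provenance = []
    for previous, current in zip(revisions, revisions[1:]):
        _, added = changed_lines(previous["text"], current["text"])
        for line in added:
            # Assumption: only "| key = value" lines feed the extracted data.
            if line.lstrip().startswith("|"):
                provenance.append((line.strip(), current["user"], current["timestamp"]))
    return provenance


revisions = [
    {"user": "carol", "timestamp": "2014-01-01",
     "text": "{{Infobox band\n| name = Mastodon\n| origin = Atlanta\n}}\nMastodon is a band."},
    {"user": "alice", "timestamp": "2014-01-02",
     "text": "{{Infobox band\n| name = Mastodon\n| origin = Atlanta, Georgia\n}}\nMastodon is a band."},
    {"user": "bob", "timestamp": "2014-01-03",
     "text": "{{Infobox band\n| name = Mastodon\n| origin = Atlanta, Georgia\n}}\nMastodon is a heavy metal band."},
]

edits = attribute_infobox_edits(revisions)
```

In this toy history only the second revision touches an infobox line, so the prose-only edit in the third revision is not attributed to any structured value.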

Provenance for Profiling Interests

Different provenance features to support interest mining:

• Not only: authorship and temporal features
• But also: social media source, object, type of action,

Provenance for Profiling Interests

User study: 27 users on Twitter and Facebook. They evaluated their aggregated and provenance-aware user profiles.

Type  Source  Social feature     Score
E     FB      education          4.62
E     FB      workplace          4.60
I     TW      followees’ posts   4.03
I     FB      checkins           3.95
E     FB      interests          3.95
E     FB      likes              3.92
I     TW      favourite posts    3.76
I     TW      retweets           3.76
I     TW      posts              3.61
I     TW      replies            3.52
I     FB      status updates     3.50
I     FB      media actions      3.24
I     FB      comments           2.56
I     FB      direct posts       2.37

Average scores on a scale from 1 to 5 (E = explicit, I = implicit; FB = Facebook, TW = Twitter).

• Locations, explicit profile info and also followees’ posts provide better accuracy for mining user interests
• Interests stated explicitly by users produce user profiles 20% more accurate than interests mined implicitly

[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

3. Semantic enrichment of user profiles and personalisation:

How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?

Semantic Enrichment

[Figure: DBpedia graph connecting db:Montreal, db:Quebec, db:Gilles_Villeneuve, db:Ferrari and db:Formula_1 through the properties dbo:wikiPageWikiLink, dbo:birthPlace and dbp:largestcity]

Example

Are all the extracted entities useful for personalisation?

How are concepts/entities being used on the Social Web? (Pragmatics)

Entities: Music, Heavy Metal, Mastodon (band), CEV Champions League, Volleyball, Semantic Web, RDF

Annotations in the figure:

• Very abstract, very popular
• Specific and time-dependent on events, etc. (shown twice)
• Abstract and not popular
• Abstract and popular
• Specific and not popular
• Very popular

Characterising Concepts of Interest

Novel measures for the characterisation and semantic expansion of concepts of interest. Enrichment of entity-based user profiles for personalisation.

• Popularity of concepts on the Social Web (using Twitter): How popular is an entity on the Social Web? How frequently is it mentioned/used at that point in time?
• Trend and temporal dynamics (using Wikipedia page views): The trend and evolution of the frequency of mentions of an entity on the Social Web (i.e. popularity over time)
• Specificity and categorisation of entities of interest (using LOD): The level of abstraction that an entity has in a common conceptual schema shared by humans

Requirements

Use case: real-time personalisation of Social Web streams

1. Real-time computation of the dimensions

2. Results constantly up to date with the real world

3. Knowledge base and domain independent approach

Characterising Concepts of Interest

[Figure: the three dimensions: Popularity? Trendy and Stable? Specificity?]

[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM, WI 2013]

Real-time Semantic Personalisation of Social Web Streams

“SPOTS”: A methodology for real-time personalisation of any large social stream

• Automatic dynamic generation of multi-source user profiles of interests.
• Semantic enrichment of concepts of interest with provenance and Linked Data info.
• Ranking and selection of the interests according to their relevance for the user and for the personalisation use case.
• Informativeness measures for posts to filter a large social stream.

Evaluation of the approach on the public Twitter stream

Against Twitter #Discover: from 192% increase in accuracy

[Kapanipathi, Orlandi, Sheth, Passant. SPIM at ISWC 2011.]

Evaluation on SPOTS

User study to evaluate the impact of the enrichment on a personalisation use case

27 users, 800 user ratings collected

Main outcome: Popularity and Temporal Dynamics are useful measures for real-time personalisation

SPOTS                     Improvement*
No Enrichment             ---
Trendy                    +29%
Not Stable                +26%
At Least 2 Features       +9%
Specific + Not Popular    +5%

* In recommendations accuracy over non-enriched profiles

Evaluation on User Profiles

User study to evaluate the impact of the enrichment on user profiles according to users’ judgement

27 users, 800 user ratings collected

Main outcome: Specificity is more useful than popularity measures according to user perception

User Profiles                 Improvement*
No Enrichment                 ---
Not Specific + Not Popular    +13%
Not Specific                  +8%
Not Popular                   +2%
Stable + Not Trendy           +1%

* In profile accuracy over non-enriched profiles

Summary

[Orlandi, UMAP 2012]

We provide and evaluate a complete methodology for profiling user interests across multiple sources on the Social Web: Collect, Represent, Aggregate, Mine, Enrich, Deploy

Aggregation of user data:
• Semantic representation of Social Web content and user activities

Provenance of data:
• Improves profiling accuracy and connects Social Web and WoD

Mining of user interests:
• Provenance + Linked Data/Entity-based strategies + time decay outperform traditional “bag-of-words” strategies and facilitate enrichment

Semantic enrichment:
• Improves profiling accuracy and is necessary for the deployment of the profiles in a personalisation use case
• Different types of personalisation need different entities of interest

Future Work

Federated Personal Data Manager
• Privacy-aware, interoperable, autonomous user profiling infrastructure

Provenance at Web Scale
• Necessary to focus on techniques for an easier and less expensive tracking and management of provenance on the Social Semantic Web

Adaptive Profiling of User Interests
• Adaptation of the profiling algorithm and strategy according to the application and the context

Contributions & Dissemination

• Semantic Web modelling solutions for Social Web data, user profiles, provenance on the Social Web and Web of Data
• A provenance computation framework
• Novel measures for characterising entities of interest
• A real-time personalisation system for large Social Web streams
• User studies for different profiling strategies, provenance features and personalisation use-cases
• A privacy-aware user profile management system

Publications

2 journal, 4 conference, 2 workshop papers

Thanks!

Context

User Modelling
• The process of representing a user or some of his/her characteristics (e.g. interests, workplace, location, etc.)

User Profile
• A characterisation of a user at a particular point in time

Experiment

6 types of user profiles evaluated:

• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods:
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function:
  - Short mean lifetime (120 days)
  - Long mean lifetime (360 days)

Resulting profile types: Res-120, Res-360, Cat1-120, Cat1-360, Cat2-120, Cat2-360

User-based Evaluation

We asked users to rate the top 10 interests generated for each of the 6 profiling strategies.

Question: “Please rate how relevant is each concept for representing your personal interests and context…”

Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)

Rating converted to a (0…10) scale

Performance evaluated with:
• MRR (Mean Reciprocal Rank)
• P@10 (Precision at K = 10)

Comparison with a baseline: a traditional approach based on “keyword frequency”
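The two metrics can be sketched in a few lines. The slides do not say how ratings were rescaled or which rating counts as relevant, so both are assumptions here: the 0 to 5 rating is doubled to reach the 0 to 10 scale, and an interest is treated as relevant when its rescaled rating is at least 6. All function names and sample ratings are hypothetical.

```python
def to_ten_point_scale(rating):
    """Map a 0..5 rating to 0..10 (assumed linear rescaling)."""
    return rating * 2


def precision_at_k(ratings, k=10, relevant_threshold=6):
    """Fraction of the top-k ranked interests rated at or above the threshold."""
    top_k = ratings[:k]
    return sum(1 for r in top_k if r >= relevant_threshold) / k


def reciprocal_rank(ratings, relevant_threshold=6):
    """1 / rank of the first relevant interest, or 0.0 if none is relevant."""
    for rank, r in enumerate(ratings, start=1):
        if r >= relevant_threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(profiles, relevant_threshold=6):
    """Average reciprocal rank over several users' ranked profiles."""
    return sum(reciprocal_rank(p, relevant_threshold) for p in profiles) / len(profiles)


# Hypothetical ratings for the top 10 interests of two users, in rank order.
user_a = [to_ten_point_scale(r) for r in [2, 5, 0, 4, 3, 1, 5, 0, 3, 4]]
user_b = [to_ten_point_scale(r) for r in [5, 1, 0, 2, 4, 0, 3, 1, 0, 2]]

p_at_10_a = precision_at_k(user_a)
mrr = mean_reciprocal_rank([user_a, user_b])
```

Each list holds one user's ratings in the order the profiling strategy ranked the interests, so MRR rewards strategies that place a relevant interest near the top while P@10 measures how many of the ten are relevant.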

Evaluation

On average, for 200 Tweets & 200 Facebook posts, and items:

• ~106 interests – DBpedia Resources
• ~720 interests – DBpedia Categories (~7 times)

Statistical significance for:
• Resources vs. Categories (p<0.05)
• Any method vs. Baseline (p<0.05)
• Not for time decay (p~0.2) and Cat1 vs. Cat2
