Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014. Supervisors: Alexandre Passant and John G. Breslin. Examiners: Fabien Gandon and Stefan Decker
Profiling User Interests on the Social Semantic Web
Ph.D. Viva
Fabrizio Orlandi
Context: Personalisation
Problem
Goal
Challenges
1 – Heterogeneous data sources
Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
Microblog? Social Networking Service?
Challenges
2 – Lack of provenance
Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
Where? Who? How? What?
Challenges
3 – Semantics of entities of interest
Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta, …
Semantics? Pragmatics? Relevance?
Research Questions
1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?
Research Goal
How can we collect, represent, aggregate, mine, enrich and deploy user profiles of interests on the Social Web for multi-source personalisation?
Methodology
1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
Aggregation of Social Web Data
• Modelling solution for Social Web data and user profiles
• Based on SIOC, FOAF and extensions
• Experiments on wikis
[Orlandi, Passant. WikiSym. ACM. 2010.]
Example
“Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
“Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
User likes RDF and Semantic Web on Facebook
Music, Heavy Metal, Mastodon, Atlanta, CEV Champions League, Volleyball, Semantic Web, RDF
• Natural language processing tools for entity extraction (Zemanta & Spotlight)
• Frequency + time-decay weighting schemes
Aggregation and Mining of Interests
7 types of user profiling strategies:
• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function
  - Short mean lifetime (120 days)
  - Long mean lifetime (360 days)
• 1 “bag-of-words” (tag-based) state-of-the-art approach
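The frequency + time-decay weighting can be sketched as follows. This is a minimal illustration and not the implementation used in the thesis: it assumes each mention of an entity contributes exp(-age / τ), where τ is the mean lifetime (120 or 360 days, as on the slide). The function names, the input format and the sample mentions are hypothetical.

```python
import math
from collections import defaultdict

def decayed_weight(age_days: float, mean_lifetime_days: float) -> float:
    """Exponential decay: an older mention contributes less weight."""
    return math.exp(-age_days / mean_lifetime_days)

def profile_weights(mentions, mean_lifetime_days):
    """Sum decayed contributions per entity, then normalise so weights sum to 1.

    `mentions` is an iterable of (entity, age_in_days) pairs (assumed format).
    """
    totals = defaultdict(float)
    for entity, age_days in mentions:
        totals[entity] += decayed_weight(age_days, mean_lifetime_days)
    norm = sum(totals.values())
    return {e: w / norm for e, w in totals.items()} if norm else {}

# Hypothetical mentions: (entity, age of the post in days)
mentions = [("Heavy_metal_music", 10), ("Heavy_metal_music", 400), ("Volleyball", 30)]
short = profile_weights(mentions, 120)  # short mean lifetime
long_ = profile_weights(mentions, 360)  # long mean lifetime
```

With the long mean lifetime the 400-day-old mention still counts noticeably, so the entity with older mentions keeps a larger share of the profile than it does with the short one.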
Aggregation and Mining of Interests
Evaluation
• User study: 21 users rating their user profiles from Twitter & Facebook
• 210 ratings for each of the 7 different profiling methods
[Figure: bar chart of P@10 and AVG Score (0 to 1) for Resources, Categories and Tags]
Key findings
• DBpedia resource-based profiles outperform DBpedia category-based and tag-based profiles.
• Best strategy: Resources + Frequency & Slow Time Decay weighting scheme
[Orlandi, Breslin, Passant. I-Semantics. ACM. 2012.]
1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?
2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
Provenance of Data
Motivation: use of provenance information as core of the profiling heuristics to improve mining of user interests and semantic enrichment
• Data Provenance as the history, the origins and the evolution of data
  - Who created/modified it? When? What is the content? Where is it located?
  - How and Why was it created? Which tools and processes were used?
• Provenance as the “bridge” between Social Web and Web of Data, e.g. Wikipedia/DBpedia
Provenance on the Social Web for the Web of Data
Use Case: Provenance on Wikis
• A semantic model to represent provenance information in wikis
• A software architecture to extract provenance from Wikipedia
• An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles
[Orlandi, Champin, Passant. SWPM at ISWC. 2010.]
Provenance on the Social Web
Provenance on the Web of Data for the Social Web
Use Case: Provenance on DBpedia
• Using detailed provenance information extracted from Wikipedia, we are able to compute provenance for DBpedia resources as well.
• By analysing the “diffs” between the revisions of Wikipedia articles and the users' contributions, we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
• We built a model and an application that shows provenance information for each triple on DBpedia that is the result of users' edits on Wikipedia.
[Orlandi, Passant. Journal of Web Semantics. 2011]
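The idea of tracing a DBpedia triple back to the Wikipedia edit that produced it can be sketched as below. This is only an illustration under strong assumptions: each revision is taken to be already parsed into a property-to-value dictionary, and the keys "rev_id", "user" and "infobox", the function name and the sample revisions are all hypothetical. It is not the architecture described in the cited paper.

```python
def triple_provenance(revisions):
    """Map each current property to the revision that last set its value.

    `revisions` is a chronologically ordered list of dicts with the assumed
    keys "rev_id", "user" and "infobox" (property -> value).
    """
    provenance = {}
    previous = {}
    for rev in revisions:
        current = rev["infobox"]
        for prop, value in current.items():
            if previous.get(prop) != value:  # added or modified in this revision
                provenance[prop] = {
                    "value": value,
                    "rev_id": rev["rev_id"],
                    "user": rev["user"],
                }
        for prop in previous.keys() - current.keys():  # removed in this revision
            provenance.pop(prop, None)
        previous = current
    return provenance

# Made-up revision history for illustration only
revisions = [
    {"rev_id": 1, "user": "Alice", "infobox": {"birthPlace": "Quebec", "team": "TeamA"}},
    {"rev_id": 2, "user": "Bob", "infobox": {"birthPlace": "Quebec", "team": "TeamB"}},
]
prov = triple_provenance(revisions)
```

Here `prov["team"]` points at Bob's revision 2, while `prov["birthPlace"]` still points at Alice's revision 1, because only the diff between consecutive revisions changes the attribution.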
Semantic provenance in DBpedia
Provenance for Profiling Interests
Different provenance features to support interest mining
• Not only: authorship and temporal features
• But also: social media source, object, type of action, …
Provenance for Profiling Interests
User study: 27 users on Twitter and Facebook
They evaluated their aggregated and provenance-aware user profiles

Social Feature          Score
E FB education          4.62
E FB workplace          4.60
I TW followees’ posts   4.03
I FB checkins           3.95
E FB interests          3.95
E FB likes              3.92
I TW favourite posts    3.76
I TW retweets           3.76
I TW posts              3.61
I TW replies            3.52
I FB status updates     3.50
I FB media actions      3.24
I FB comments           2.56
I FB direct posts       2.37
AVG Scores from 1 to 5

• Locations, explicit profile info and also followees’ posts provide better accuracy for mining user interests
• Interests stated explicitly by users produce user profiles 20% more accurate than interests derived implicitly
[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]
2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?
3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?
Semantic Enrichment
[Figure: DBpedia graph connecting db:Montreal, db:Quebec, db:Gilles_Villeneuve, db:Ferrari and db:Formula_1 through dbo:wikiPageWikiLink, dbo:birthPlace and dbp:largestcity]
Music
Heavy Metal
Mastodon (band)
CEV Champions League
Volleyball
Semantic Web
RDF
Example
Are all the extracted entities useful for personalisation?
How are concepts/entities being used on the Social Web? (Pragmatics)
Very abstract, very popular
Specific and time-dependent on events, etc.
Specific and time-dependent on events, etc.
Abstract and not popular
Abstract and popular
Specific and not popular
Very popular
Characterising Concepts of Interest
Novel measures for the characterisation and semantic expansion of concepts of interest
Enrichment of entity-based user profiles for personalisation
• Popularity of concepts on the Social Web (using Twitter)
  - How popular is an entity on the Social Web? How frequently is it mentioned/used at that point in time?
• Trend and temporal dynamics (using Wikipedia page views)
  - The trend and evolution of the frequency of mentions of an entity on the Social Web (i.e. popularity over time)
• Specificity and categorisation of entities of interest (using LOD)
  - The level of abstraction that an entity has in a common conceptual schema shared by humans
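One way to picture the trend and temporal-dynamics dimension is with two simple statistics over a series of daily page views. The measures below are assumptions chosen for illustration (a recent-versus-earlier mean ratio and a coefficient of variation); the slides do not specify the actual formulas, and the function names and sample series are hypothetical.

```python
from statistics import mean, pstdev

def trend_ratio(daily_views, recent_days=7):
    """Recent mean divided by earlier mean; values above 1 suggest rising interest."""
    if len(daily_views) <= recent_days:
        raise ValueError("need more history than the recent window")
    recent = mean(daily_views[-recent_days:])
    earlier = mean(daily_views[:-recent_days])
    return recent / earlier if earlier else float("inf")

def variability(daily_views):
    """Coefficient of variation; lower values suggest a more stable concept."""
    avg = mean(daily_views)
    return pstdev(daily_views) / avg if avg else 0.0

# Made-up page-view series for illustration only
steady = [100] * 28
spiking = [100] * 21 + [400] * 7
```

For the made-up series, `trend_ratio(steady)` is 1.0 and `trend_ratio(spiking)` is 4.0, so the second concept would be flagged as trendy and less stable under these assumed measures.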
Requirements
Use case: real-time personalisation of Social Web streams
1. Real-time computation of the dimensions
2. Results constantly up to date with the real world
3. Knowledge base and domain independent approach
Characterising Concepts of Interest
Popularity? Trendy and Stable? Specificity?
[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]
Real-time Semantic Personalisation of Social Web Streams
“SPOTS”: A methodology for real-time personalisation of any large social stream
• Automatic dynamic generation of multi-source user profiles of interests.
• Semantic enrichment of concepts of interest with provenance and Linked Data info.
• Ranking and selection of the interests according to their relevance for the user and for the personalisation use case.
• Informativeness measures for posts to filter a large social stream.
• Evaluation of the approach on the public Twitter stream
  - Against Twitter #Discover: from 192% increase in accuracy
[Kapanipathi, Orlandi, Sheth, Passant. SPIM at ISWC 2011.]
Real-time Semantic Personalisation of Social Web Streams
Evaluation on SPOTS
User study to evaluate the impact of the enrichment on a personalisation use case
• 27 users, 800 user ratings collected
Main outcome: Popularity and Temporal Dynamics are useful measures for real-time personalisation

SPOTS                    Improvement*
No Enrichment            ---
Trendy                   +29%
Not Stable               +26%
At Least 2 Features      +9%
Specific + Not Popular   +5%
* In recommendations accuracy over non-enriched profiles
Evaluation on User Profiles
User study to evaluate the impact of the enrichment on user profiles according to users’ judgement
• 27 users, 800 user ratings collected
Main outcome: Specificity is more useful than popularity measures according to user perception

User Profiles                Improvement*
No Enrichment                ---
Not Specific + Not Popular   +13%
Not Specific                 +8%
Not Popular                  +2%
Stable + Not Trendy          +1%
* In profile accuracy over non-enriched profiles
Summary
[Orlandi, UMAP 2012]
Summary
We provide and evaluate a complete methodology for profiling user interests across multiple sources on the Social Web
Collect, Represent, Aggregate, Mine, Enrich, Deploy
Aggregation of user data:
• Semantic representation of Social Web content and user activities
Provenance of data:
• Improves profiling accuracy and connects Social Web and WoD
Mining of user interests:
• Provenance + Linked Data/Entity-based strategies + time decay outperform traditional “bag-of-words” strategies and facilitate enrichment
Semantic enrichment:
• Improves profiling accuracy and is necessary for the deployment of the profiles in a personalisation use case
• Different types of personalisation need different entities of interest
Future Work
• Federated Personal Data Manager: privacy-aware, interoperable, autonomous user profiling infrastructure
• Provenance at Web Scale: necessary to focus on techniques for an easier and less expensive tracking and management of provenance on the Social Semantic Web
• Adaptive Profiling of User Interests: adaptation of the profiling algorithm and strategy according to the application and the context
Contributions & Dissemination
• Semantic Web modelling solutions for Social Web data, user profiles, provenance on the Social Web and Web of Data.
• A provenance computation framework
• Novel measures for characterising entities of interest
• A real-time personalisation system for large Social Web streams
• User studies for different profiling strategies, provenance features and personalisation use-cases
• A privacy-aware user profile management system
Publications
• 2 journal, 4 conference, 2 workshop papers
Thanks!
Context
User Modelling
• The process of representing a user or some of his/her characteristics (e.g. interests, workplace, location, etc.)
User Profile
• A characterisation of a user at a particular point in time
Experiment
6 types of user profiles evaluated:
• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function
  - Short mean lifetime (120 days)
  - Long mean lifetime (360 days)
Experiment
6 types of user profiles evaluated: Cat1-120, Cat1-360, Cat2-120, Cat2-360, Res-120, Res-360
[Figure: diagram of the profile types, split into Res and Cat, with Cat divided into Cat1 and Cat2]
User-based Evaluation
• We asked users to rate the top 10 interests generated for each of the 6 profiling strategies
• Question: “Please rate how relevant is each concept for representing your personal interests and context…”
• Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)
• Rating converted to a (0…10) scale
• Performance evaluated with:
  - MRR (Mean Reciprocal Rank)
  - P@10 (Precision at K = 10)
• Comparison with a Baseline: a traditional approach based on “keyword frequency”
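The two metrics can be sketched as below. The slides do not say how graded ratings were mapped to relevant/not relevant, so the threshold (a rating of 3 or more on the 0 to 5 scale counts as relevant) is an assumption, as are the function names and the sample ratings.

```python
def precision_at_k(ratings, k=10, threshold=3):
    """Fraction of the top-k ranked interests rated at or above the threshold."""
    return sum(r >= threshold for r in ratings[:k]) / k

def reciprocal_rank(ratings, threshold=3):
    """1 / rank of the first relevant interest, or 0.0 if none is relevant."""
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(all_ratings, threshold=3):
    """Average reciprocal rank over all users' ranked lists."""
    return sum(reciprocal_rank(r, threshold) for r in all_ratings) / len(all_ratings)

# Made-up ratings: one list per user, ordered by the profile's ranking
user_a = [5, 4, 0, 3, 2, 5, 1, 0, 4, 3]
user_b = [0, 2, 4, 5, 1, 0, 3, 2, 1, 0]
p_a = precision_at_k(user_a)                  # 6 of 10 relevant
p_b = precision_at_k(user_b)                  # 3 of 10 relevant
mrr = mean_reciprocal_rank([user_a, user_b])  # (1/1 + 1/3) / 2
```

P@10 rewards profiles whose top ten interests are mostly relevant, while MRR rewards profiles that place a relevant interest near the top, which is why the two are reported together.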
Evaluation
On average for: 200 Tweets & 200 Facebook posts, and items.
• ~106 interests – DBpedia Resources
• ~720 interests – DBpedia Categories (~7 times)
Statistical significance for:
• Resources vs. Categories (p<0.05)
• Any method vs. Baseline (p<0.05)
• Not for time decay (p~0.2) and Cat1 vs. Cat2