TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721 Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis Martin Szomszor, University of Southampton

Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis @ HT2009


Invited talk at the ACM Hypertext Conference 2009, Turin, Italy


Page 1: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721

Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis

Martin Szomszor, University of Southampton

Page 2: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Outline

• Introduction and Motivation
  – Why is your folksonomy interaction useful?
  – How could it be exploited?
• Making Sense of Folksonomies
  – Distributed Contact Networks
  – Tag Filtering / Tag Senses
• Profiles of Interests
• Future Work
  – Disambiguation
  – Building Better Profiles of Interests

Page 3: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Introduction

delicious.com

http://slashdot.org/

http://news.bbc.co.uk/

Dream Theater

Metallica

Rush

Page 4: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Increasing number of online identities

• A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least two
  – [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.

• In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities

Page 5: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

delicious.com

Tag Clouds

Page 6: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Profile of Interests

The Big Picture

delicious.com

Page 7: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

delicious.com

Profiles could be exported to other sites to improve recommendation quality

Profile of Interests

Personalisation

Profiles could be used to support personalised searching

Better user experience

Page 8: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Consolidation and Integration

currency

travel

hotels

cuba

http://dbpedia.org/resource/Cuba

cuba

holiday

2008

http://dbpedia.org/resource/Travel

http://dbpedia.org/resource/Holiday

http://dbpedia.org/resource/Category:Tourism

Page 9: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Tagging Variation

[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.

Raw Tags

Filtered Tags

Page 10: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Disconnected Identities

fan of

contact friend

friend

#me

Page 11: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Delicious Last.fm Flickr Facebook

Identity Integration Tag Integration

Tagging Semantics

FOAF DBpedia + Wordnet

Making Sense of Folksonomies

Page 12: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Delicious Last.fm

Identity Integration Tag Integration

Tagging Semantics

FOAF DBpedia + Wordnet

1. Contact Integration

Flickr Facebook

Page 13: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

SNS Contact Integration

Page 14: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

#me

Consolidated Contact View

• Recommend new connections

Page 15: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

http://tagora.ecs.soton.ac.uk/delicious/martinszomszor

http://tagora.ecs.soton.ac.uk/flickr/7214044@N08

http://tagora.ecs.soton.ac.uk/lastfm/mszomszor

http://tagora.ecs.soton.ac.uk/facebook/613077109

http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/ht2009/foaf/4

<owl#sameAs> <http://tagora.ecs.soton.ac.uk/facebook/613077109>
<http://tagora.ecs.soton.ac.uk/schemas/facebook#hasFriend>
    <http://tagora.ecs.soton.ac.uk/facebook/1006466985>,
    <http://tagora.ecs.soton.ac.uk/facebook/684541156>,
    …
    <http://tagora.ecs.soton.ac.uk/facebook/1043367866>;

FOAF Representation of SNS Accounts

Page 16: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Delicious Last.fm Flickr Facebook

Identity Integration Tag Integration

Tagging Semantics

FOAF DBpedia + Wordnet

2. Tag Integration

Page 17: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Folksonomy Integration: Tag Heterogeneity

Web2.0 != Web_2.0

Page 18: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Folksonomy Integration: Tag Heterogeneity

Web2.0 isFilteredTo Web_2.0

Page 19: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Tag Filtering

• Find canonical form for each tag:
  – Use DBpedia entry labels as reference
• Compound terms separated by _
  – second-life, second+life, second.life -> second_life
• Concatenated / camel case terms are expanded
  – secondlife, SecondLife -> second_life
• International characters normalised:
  – Caf%C3%A9 -> Cafe
• Recommend spelling corrections
  – resaerch -> didYouMean research
• Follow unambiguous redirections:
  – Humor, Funny -> Humour
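The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TAGora implementation: `KNOWN_LABELS` and `REDIRECTS` are hypothetical stand-ins for the DBpedia label and redirect data, the function names are invented, canonical forms are lower-cased, and the spelling-correction (didYouMean) step is left out.

```python
import re
import unicodedata
from urllib.parse import unquote

# Hypothetical stand-ins for the DBpedia reference data (assumptions).
KNOWN_LABELS = {"second_life", "web_2.0", "cafe", "humour"}
REDIRECTS = {"humor": "humour", "funny": "humour"}

# Index of labels with the underscores removed, used to expand
# concatenated tags such as "secondlife".
CONCATENATED = {label.replace("_", ""): label for label in KNOWN_LABELS}


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def filter_tag(raw: str) -> str:
    """Map a raw tag to a canonical, lower-case, underscore-separated form."""
    tag = strip_accents(unquote(raw))               # Caf%C3%A9 -> Cafe
    tag = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", tag)  # SecondLife -> Second_Life
    tag = tag.lower()
    tag = re.sub(r"[-+\s]+", "_", tag)              # second-life -> second_life
    # Treat a dot as a separator only between letters, so "2.0" survives.
    tag = re.sub(r"(?<=[a-z])\.(?=[a-z])", "_", tag)
    tag = CONCATENATED.get(tag, tag)                # secondlife -> second_life
    return REDIRECTS.get(tag, tag)                  # humor -> humour


for raw in ["second+life", "SecondLife", "Caf%C3%A9", "Funny", "Web2.0"]:
    print(raw, "->", filter_tag(raw))
```

The lookup tables carry most of the work here: without a reference set of known labels there is no principled way to decide that "secondlife" should be split but "delicious" should not.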

Page 20: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

[Figure: tagging ontology diagram]

Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#

Legend: (f) = functional property; property; subclass

Classes: Tagger, Post, Resource, Tag, UserTag, DomainTag, GlobalTag, TagSegment, FinalTagSegment, CooccurrencInfo

Properties: usesTag, hasPost, taggedResource, taggedOn, tagUsed (f), hasTagSequence (f), hasNextSegment (f), isFilteredTo, hasGlobalTag, hasDomainTag, hasUserFrequency, hasDomainFrequency, hasGlobalFrequency, hasCooccurrenceInfo, hasCooccurrenceFrequency, cooccurringTag, rdfs:label

Datatypes: xsd:integer, xsd:string, xsd:datetime

Page 21: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Linked Data View

Page 22: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Linked Data View

Page 23: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Linked Data View

Page 24: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Linked Data View

Page 25: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Finding Syntactic Variations

sparql$ select ?x where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/web_2.0>}

?x
<http://tagora.ecs.soton.ac.uk/tag/web2.0>
<http://tagora.ecs.soton.ac.uk/tag/web2>
<http://tagora.ecs.soton.ac.uk/tag/web_2.0>
<http://tagora.ecs.soton.ac.uk/tag/web_20>
<http://tagora.ecs.soton.ac.uk/tag/web20>

sparql$ select * where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/second_life>}

?x
<http://tagora.ecs.soton.ac.uk/tag/second_Life>
<http://tagora.ecs.soton.ac.uk/tag/second.life>
<http://tagora.ecs.soton.ac.uk/tag/SecondLife>
<http://tagora.ecs.soton.ac.uk/tag/Second_Life>
<http://tagora.ecs.soton.ac.uk/tag/second%20life>
<http://tagora.ecs.soton.ac.uk/tag/SECOND_LIFE>
<http://tagora.ecs.soton.ac.uk/tag/second_life>
<http://tagora.ecs.soton.ac.uk/tag/secondlife>
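The same lookup can be pictured as a pattern match over (subject, predicate, object) triples. The sketch below uses a plain Python list instead of a triple store, so it only illustrates what the query asks for; the `TRIPLES` list holds a handful of the results shown on this slide, and the helper name `subjects` is hypothetical.

```python
TAG_NS = "http://tagora.ecs.soton.ac.uk/tag/"
IS_FILTERED_TO = "http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo"

# A small subset of the isFilteredTo triples implied by the query results.
TRIPLES = [
    (TAG_NS + "web2.0", IS_FILTERED_TO, TAG_NS + "web_2.0"),
    (TAG_NS + "web20", IS_FILTERED_TO, TAG_NS + "web_2.0"),
    (TAG_NS + "web_2.0", IS_FILTERED_TO, TAG_NS + "web_2.0"),
    (TAG_NS + "secondlife", IS_FILTERED_TO, TAG_NS + "second_life"),
    (TAG_NS + "SecondLife", IS_FILTERED_TO, TAG_NS + "second_life"),
]


def subjects(triples, predicate, obj):
    """Return every subject ?x matching the pattern (?x, predicate, obj)."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


for variant in subjects(TRIPLES, IS_FILTERED_TO, TAG_NS + "web_2.0"):
    print(variant)
```

Note that the canonical tag filters to itself, which is why web_2.0 and second_life appear in their own result sets.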

Page 26: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Tag Senses

• What are the possible meanings for a tag?
• We use two reference sets:
  – DBpedia
    • Concepts
  – Wordnet
    • Synsets

Page 27: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Disambiguation Ontology

[Figure: disambiguation ontology diagram]

Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#, http://tagora.ecs.soton.ac.uk/schemas/dbpedia#, http://tagora.ecs.soton.ac.uk/schemas/disambiguation#, http://www.w3.org/2006/03/wn/wn20/schema/

Legend: (f) = functional property; property; subclass

Classes: Tag, Resource, DbpediaSenseInfo, WordSense

Properties: hasDbpediaSenseInfo, dbpediaSense, senseWeight, didYouMean, hasWordnetSense

Datatypes: xsd:float

Page 28: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009
Page 29: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

DBpedia Extraction

• Extract triples from XML dump
  – Calculate normalised title string
    • Caf%C3%A9 -> cafe
  – Calculate concatenated title string
    • Second_life -> secondlife
  – Extract disambiguation term from title
    • Orange_(fruit)
  – Identify compound labels
    • Second_Life -> Second, Life
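The four title-derived values listed above could be computed roughly as follows. This is a sketch under assumptions: the function name `analyse_title` and the dictionary keys are invented for illustration, and it assumes the disambiguation term is the trailing parenthesised part of the title and is excluded from the normalised string.

```python
import re
import unicodedata
from urllib.parse import unquote


def analyse_title(title: str) -> dict:
    """Derive normalised, concatenated, disambiguation and compound values."""
    decoded = unquote(title)  # Caf%C3%A9 -> Café

    # Split off a trailing "_(term)" disambiguation suffix, if there is one.
    match = re.fullmatch(r"(.+?)_\((.+)\)", decoded)
    if match:
        base, disambiguation = match.group(1), match.group(2)
    else:
        base, disambiguation = decoded, None

    # Strip accents, then lower-case, to get the normalised title string.
    decomposed = unicodedata.normalize("NFKD", base)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    normalised = plain.lower()

    return {
        "normalised": normalised,
        "concatenated": normalised.replace("_", ""),
        "disambiguation_term": disambiguation,
        "compound_labels": [part for part in base.split("_") if part],
    }


for title in ["Caf%C3%A9", "Second_Life", "Orange_(fruit)"]:
    print(title, analyse_title(title))
```

Storing the concatenated form alongside the normalised one is what lets a tag such as "secondlife" be matched back to the Second_Life entry.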

Page 30: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

DBpedia Extraction

• Number of incoming links
• Extract page redirects
• Extract Disambiguation Links
  – Find Primary disambiguation (e.g. Apple)

Page 31: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

DBpedia Extraction

• Parse wiki text and extract terms:
  – Terms filtered using stop words (with some wiki-specific additions)
  – Store term frequencies
  – Store number of distinct terms in page
  – Store total term frequency
• Can associate a vector of terms and weights to each possible sense
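A minimal sketch of the term statistics described on this slide, assuming a simple lower-case alphabetic tokeniser. The stop-word list is illustrative only (the actual list and its wiki-specific additions are not given in the slides), and `term_statistics` is a hypothetical name.

```python
import re
from collections import Counter

# Illustrative stop words; "ref" and "cite" stand in for the
# wiki-specific additions, which the slides do not enumerate.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "ref", "cite"}


def term_statistics(wiki_text: str) -> dict:
    """Count the non-stop-word terms of a page and summarise them."""
    tokens = re.findall(r"[a-z]+", wiki_text.lower())
    frequencies = Counter(t for t in tokens if t not in STOP_WORDS)
    return {
        "term_frequencies": frequencies,                      # term -> count
        "distinct_terms": len(frequencies),                   # vocabulary size
        "total_term_frequency": sum(frequencies.values()),    # all kept tokens
    }


stats = term_statistics("The apple is a fruit. Apple trees bear fruit in autumn.")
print(stats["term_frequencies"].most_common(2))
print(stats["distinct_terms"], stats["total_term_frequency"])
```

The `term_frequencies` counter is the "vector of terms and weights" for one sense; dividing each count by `total_term_frequency` would give a length-normalised variant.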

Page 32: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

[Figure: DBpedia extraction ontology diagram]

Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair

Properties: isa, hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasDisambiguation, hasPrimaryDisambiguation, hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), hasTermFrequencyPair, hasTerm, hasTermFrequency, hasTotalTermFrequency, hasTotalTerms

Datatypes: xsd:integer, xsd:string

Page 33: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Profiles of Interests

[2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008). Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.

Page 34: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Global Category View

• What are the differences in the interests that are learnt from each domain?

Delicious                         Flickr
Wikipedia Category   Total Freq   Wikipedia Category   Total Freq
Design               69,215       Travel               51,674
Blogs                68,319       Australia            51,617
Music                45,063       London               46,623
Photography          41,356       Festivals            42,504
Tools                35,795       Music                40,943
Video                34,318       Cats                 38,230
Arts                 29,966       Holidays             37,610
Software             28,746       Family               37,100
Maps                 26,912       Japan                36,513
Teaching             22,120       Concerts             35,374
Games                21,549       Surnames             34,947
How-to               19,533       Washington           33,924
Technology           18,032       Given Names          32,843
News                 17,737       Dogs                 32,206
Humor                15,816       Birthdays            22,290

Page 35: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Future Work

• Given a set of possible senses, how can we choose the best match?
• Folksonomy data can provide contextual information:
  – User tag-cloud
  – Cooccurrence Network
  – User Cooccurrence Network
• Can abstract this information as a vector of terms and weights (context)
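One way to compare a context vector with the term vector of each candidate sense is cosine similarity. The slides do not say which measure is intended, so the sketch below is an assumption, as are the function names. The term weights are illustrative: "mac" (35) and "fruit" (41) echo the apple example on the final slide, while the remaining terms and weights are made up.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sense(context: dict, senses: dict) -> str:
    """Pick the sense whose term vector is most similar to the context."""
    return max(senses, key=lambda name: cosine(context, senses[name]))


# Candidate senses for the tag "apple" (weights partly invented).
senses = {
    "Apple_Inc.": {"mac": 35, "computer": 20},
    "Apple": {"fruit": 41, "tree": 12},
}

# Context built from, for example, the tags that co-occur with "apple".
print(best_sense({"mac": 3, "iphone": 2}, senses))
print(best_sense({"fruit": 1, "garden": 4}, senses))
```

The context could come from any of the three sources listed above; only the way the vector is built changes, not the comparison.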

Page 36: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Disambiguating Flickr Images

Page 37: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

Building Better Profiles

• What tags correspond to interests?
  – Locations and topics are useful, but other terms are not
• TF / IDF Approach
  – It’s not that useful to find out we are all interested in HTML
• Making use of the Category hierarchy
  – If I’m interested in Facebook, Flickr, Last.fm, Delicious, etc, I can extrapolate the interest Online_Social_Networks
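The TF / IDF idea can be sketched by treating each user's tag list as a document: a tag that every user has, such as html, gets an inverse document frequency of zero and drops out of every profile. The weighting below is the textbook tf × log(N / df) form, chosen as an assumption since the slides do not specify a formula; the users and tags are invented.

```python
import math
from collections import Counter


def tf_idf_profiles(user_tags: dict) -> dict:
    """Weight each user's tags by term frequency times inverse user frequency."""
    n_users = len(user_tags)

    # Number of users who have used each tag at least once.
    user_frequency = Counter()
    for tags in user_tags.values():
        user_frequency.update(set(tags))

    profiles = {}
    for user, tags in user_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        profiles[user] = {
            tag: (count / total) * math.log(n_users / user_frequency[tag])
            for tag, count in counts.items()
        }
    return profiles


# Hypothetical users and tags, for illustration only.
profiles = tf_idf_profiles({
    "alice": ["html", "html", "cuba"],
    "bob": ["html", "metallica"],
    "carol": ["html", "flickr", "flickr"],
})
print(profiles["alice"])
```

Here html scores zero for everyone, while cuba, metallica and flickr keep positive weights and so remain as distinguishing interests.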

Page 38: Modelling Users’ Profiles and  Interests based on  Cross-Folksonomy Analysis @ HT2009

[Figure: RDF graph of two candidate senses for the tag http://tagora.ecs.soton.ac.uk/tag/apple. Edge labels: dbpedia:hasDbpediaSenseInfo, dbpedia:sense, dbpedia:senseWeight, dbpedia:hasTermFrequencyPair, dbpedia:hasTerm, dbpedia:hasTermFrequency, owl:sameAs]

http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/0
  sense: http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple_Inc.
  senseWeight: 0.30628910807
  term frequency pair _:b9510f00000000a5: term “mac”, frequency 35

http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/1
  sense: http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple
  senseWeight: 0.248912928
  term frequency pair _:b9510f00000000a5: term “fruit”, frequency 41