Characterising the Emergent Semantics in Twitter Lists


Andrés García-Silva †, Jeon-Hyung Kang*, Kristina Lerman*, Oscar Corcho †

† {hgarcia, ocorcho}@fi.upm.es

Facultad de Informática

Universidad Politécnica de Madrid, Spain

*{jeonhyuk,lerman}@isi.edu

Information Sciences Institute,

University of Southern California, USA


Introduction

Twitter Lists


Introduction

Curators and List Names


Introduction

Members and List Names


Introduction

Subscribers and List Names


Introduction

• Previous examples showed individual uses of lists.
• Some list names were related to one another.
• What if we group the lists?


Introduction

Lists in which the Yahoo!Finance user is a member, grouped by frequency of membership

Lists in which the NASDAQ user is a member, grouped by number of subscriptions


Introduction: Research questions

[Figure: example list names (Stocks, Personal Banking, Investment, Banks) shown with the user roles Curator 1, Curator 2, Subscriber 1 and list members]

• Is it possible to identify related keywords from list names according to the use given by the different user roles?
• Are two list names related if they have been used by a similar set of curators?
• Are two list names related if a similar set of users have subscribed to the corresponding lists?
• Are two list names related if their corresponding lists have a similar set of members?
• Which user roles will generate more related keywords?
• What types of relations between keywords can we obtain?
  • Synonyms, is-a, siblings...?


Approach

Twitter Lists -> Elicit related keywords from Twitter lists -> Pairs of related keywords per schema representation and model -> Characterise the semantics of the relations

Elicit related keywords from Twitter lists
• Schema representation of keywords: based on members, based on subscribers, based on curators
• Model to identify similar keywords: Vector Space Model (VSM), Latent Dirichlet Allocation (LDA)
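A minimal sketch of the vector space model, under assumptions the slides do not state: each keyword is represented by a vector counting how often each user appears in one role (member, subscriber or curator) of the lists whose name contains that keyword, and two keywords are compared with cosine similarity. The toy data and the names `keyword_vectors` and `cosine` are hypothetical; the LDA-based model is not sketched here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (keyword from a list name, users holding one role in that list)
lists = [
    ("stocks", ["nasdaq", "yahoofinance", "wsj"]),
    ("investment", ["nasdaq", "yahoofinance"]),
    ("banks", ["wsj", "hsbc"]),
]

def keyword_vectors(lists):
    """Map each keyword to a sparse vector user -> frequency (the schema representation)."""
    vectors = defaultdict(Counter)
    for keyword, users in lists:
        vectors[keyword].update(users)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    # Counter returns 0 for missing users without inserting them
    dot = sum(weight * v[user] for user, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = keyword_vectors(lists)
print(round(cosine(vectors["stocks"], vectors["investment"]), 3))  # 0.816
```

Filling the user lists with the curators or the subscribers of each list instead of its members gives the other two schema representations without changing the similarity code.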


Approach

Pairs of related keywords per schema representation and model -> Characterise the semantics of the relations

• Similarity based on WordNet
  • Jiang & Conrath (distributional information)
  • Wu & Palmer (hierarchical information)
  • Path length: synonyms, is-a, siblings, indirect is-a
• SPARQL queries over general knowledge bases published as Linked Data: DBpedia, OpenCyc and UMBEL
• Specificity of relations: synonyms (sameAs), binary relations (TypeOf, BT), object properties (Occupation)
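For reference, the usual textbook forms of the three WordNet-based measures are given below; the slides do not say which implementation or normalisation the authors used, so treat these as the standard definitions rather than their exact settings. Here c_1 and c_2 are concepts, lcs is their least common subsumer, depth is measured from the root of the hierarchy and IC is information content estimated from corpus frequencies p(c). A lower Jiang & Conrath distance and a higher Wu & Palmer similarity both mean more closely related concepts; path length counted in nodes gives 1 for synonyms, 2 for a direct is-a pair and 3 for siblings.

```latex
\mathrm{sim}_{WP}(c_1, c_2) = \frac{2\,\mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}

\mathrm{dist}_{JC}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2)),
\qquad \mathrm{IC}(c) = -\log p(c)

\mathrm{len}(c_1, c_2) = \text{number of nodes on the shortest is-a path between } c_1 \text{ and } c_2
```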


Experiment: Setup

• Data set
  • Total: 297,521 lists, 2,171,140 members, 215,599 curators and 616,662 subscribers
• We extracted 5,932 unique keywords from list names; 55% of them were found in WordNet.
• We used approximate matching of the list names with dictionary entries.
• The dictionary was created from Wikipedia article titles.
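The slides only say that list names were matched approximately against a dictionary built from Wikipedia article titles. A minimal sketch of one way to do this, using Python's standard `difflib.get_close_matches` as a stand-in for whatever matcher the authors used; the normalisation steps, the 0.8 cutoff, the toy dictionary and the helper names are all assumptions.

```python
import re
from difflib import get_close_matches

# Hypothetical dictionary of lower-cased Wikipedia article titles
dictionary = ["stock", "investment", "personal banking", "bank", "biotechnology"]

def normalise(list_name):
    """Split camel case and '-'/'_' separators, then lower-case the list name."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", list_name)
    return re.sub(r"[-_]+", " ", spaced).strip().lower()

def match_keyword(list_name, cutoff=0.8):
    """Return the closest dictionary entry for a list name, or None if none is close enough."""
    matches = get_close_matches(normalise(list_name), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_keyword("PersonalBanking"))  # personal banking
print(match_keyword("stocks"))           # stock
```

Raising the cutoff trades recall for precision: fewer list names are anchored to a dictionary entry, but the matches that remain are closer.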


Experiment: Execution

Dataset -> Elicit related keywords from Twitter lists -> Pairs of related keywords per schema representation and model (each keyword with its 5 most related keywords) -> Characterise the semantics of the relations

Elicit related keywords from Twitter lists
• Schema representation of keywords: based on members, based on subscribers, based on curators
• Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation

Characterise the semantics of the relations
• WordNet similarity: Jiang & Conrath (distributional information), Wu & Palmer (hierarchical information), path length
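The execution step keeps, for each keyword, its 5 most related keywords. A minimal sketch with `heapq.nlargest`, assuming the pairwise similarities have already been computed by one of the models; the `similarity` dictionary and the `top_related` helper are hypothetical.

```python
import heapq

# Hypothetical pairwise similarities from one schema representation and model
similarity = {
    ("stocks", "investment"): 0.82,
    ("stocks", "banks"): 0.41,
    ("stocks", "finance"): 0.77,
    ("investment", "banks"): 0.35,
    ("investment", "finance"): 0.64,
    ("banks", "finance"): 0.58,
}

def top_related(keyword, similarity, k=5):
    """Return up to k keywords most similar to `keyword`, highest similarity first."""
    scores = []
    for (a, b), score in similarity.items():
        if keyword == a:
            scores.append((score, b))
        elif keyword == b:
            scores.append((score, a))
    return [other for _, other in heapq.nlargest(k, scores)]

print(top_related("stocks", similarity))  # ['investment', 'finance', 'banks']
```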


Experiment: Data Analysis

[Chart: Pearson's coefficient of correlation (values from -1 to 1); average J&C distance and W&P similarity]
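Pearson's coefficient measures the linear correlation between two lists of scores and ranges from -1 to 1. A sketch of how the similarities produced by one model could be correlated with a WordNet-based measure over the same keyword pairs; the paired scores below are invented for illustration and are not the paper's data.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for the same four keyword pairs
model_similarity = [0.82, 0.77, 0.41, 0.35]    # e.g. VSM based on members
wordnet_similarity = [0.90, 0.70, 0.50, 0.20]  # e.g. Wu & Palmer
print(round(pearson(model_similarity, wordnet_similarity), 3))
```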


Experiment: Data Analysis

Path length in WordNet: % of relations found by each schema representation and model

Path length                 Members           Subscribers       Curators
                            VSM      LDA      VSM      LDA      VSM      LDA
1 (synonyms)                8.58%    10.87%   3.97%    3.24%    1.24%    0.50%
2 (is-a)                    3.42%    3.08%    1.93%    0.47%    0.70%    0.00%
3 (siblings, ind. is-a)     2.37%    3.77%    2.96%    2.06%    2.38%    4.03%
>3                          67.61%   65.5%    67.2%    67.5%    77.8%    75.8%

On average, 97.65% of the relations with a path length greater than 3 involve a common subsumer.
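A sketch of how percentages like those in the table could be tallied from one WordNet path length per related keyword pair (counted in nodes, so 1 means a shared synset). The bucket labels follow the table; counting pairs with no WordNet path (`None`) only in the total is an assumption, as are the input data and helper names.

```python
from collections import Counter

def bucket(path_length):
    """Map a WordNet path length (in nodes) to the row labels of the table."""
    if path_length == 1:
        return "1 (synonyms)"
    if path_length == 2:
        return "2 (is-a)"
    if path_length == 3:
        return "3 (siblings, ind. is-a)"
    return ">3"

def path_length_distribution(path_lengths):
    """Percentage of relations per bucket; None (no path found) counts only in the total."""
    total = len(path_lengths)
    counts = Counter(bucket(n) for n in path_lengths if n is not None)
    return {label: 100.0 * count / total for label, count in counts.items()}

# Hypothetical path lengths for eight related keyword pairs
print(path_length_distribution([1, 2, 3, 4, 6, 7, None, None]))
```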


Experiment: Data Analysis

Depth (LCS) and path length as indicators of specificity

[Chart: relations in WordNet by depth of the least common subsumer]

[Chart: relations with depth(LCS) >= 5 by length of the path setting up the relation]
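Depth of the least common subsumer (LCS) and path length can be illustrated on a toy is-a hierarchy. This sketch assumes a single-inheritance tree stored as a hypothetical child -> parent dictionary, depth counted from the root as 1, and path length counted in nodes; real WordNet allows multiple parents and several senses per word, which this ignores. The depth >= 5 threshold mirrors the chart label; the taxonomy is made up.

```python
# Hypothetical single-inheritance is-a hierarchy: child -> parent (the root has no entry)
PARENT = {
    "organism": "entity",
    "person": "organism",
    "creator": "person",
    "communicator": "person",
    "writer": "communicator",
    "poet": "writer",
    "novelist": "writer",
    "artist": "creator",
}

def ancestors(concept):
    """The concept followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    """Depth with the root at depth 1."""
    return len(ancestors(concept))

def lcs(c1, c2):
    """Least common subsumer: the deepest ancestor shared by both concepts."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)

def path_length(c1, c2):
    """Nodes on the shortest path through the LCS (1 when c1 == c2)."""
    common = depth(lcs(c1, c2))
    return (depth(c1) - common) + (depth(c2) - common) + 1

def is_specific(c1, c2, min_depth=5):
    """Treat a relation as specific when its LCS lies deep in the hierarchy."""
    return depth(lcs(c1, c2)) >= min_depth

for a, b in [("poet", "novelist"), ("poet", "artist")]:
    print(a, b, lcs(a, b), depth(lcs(a, b)), path_length(a, b), is_specific(a, b))
# poet novelist writer 5 3 True
# poet artist person 3 6 False
```

Both siblings under "writer" share a deep LCS, whereas "poet" and "artist" only meet at the generic "person", which is the intuition behind using depth(LCS) as a specificity indicator.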


Experiment: Findings

Summary
• Similarity models based on members
  • produce the results most correlated with those of the WordNet-based similarity measures;
  • find more synonyms and direct is-a relations than the other models (path length).
• The majority of relations found by any model have a path length >= 3 and involve a common subsumer.
• Depth of LCS
  • VSM based on subscribers produces the highest number of specific relations (depth of LCS >= 5 or 6).
• Similarity models based on curators produce fewer relations than the other models.


Experiment: Execution

Dataset -> Elicit related keywords from Twitter lists -> Pairs of related keywords per schema representation and model (each keyword with its 5 most related keywords) -> Characterise the semantics of the relations -> Ontological relations between keywords

Elicit related keywords from Twitter lists
• Schema representation of keywords: based on members, based on subscribers, based on curators
• Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation

Characterise the semantics of the relations
• SPARQL queries over general knowledge bases published as Linked Data: DBpedia, OpenCyc and UMBEL
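To look for ontological relations between two anchored keywords, one can ask a Linked Data endpoint for direct links and for the two indirect patterns reported in the results (x -> object <- y and x <- object -> y). The slides do not give the authors' queries, so this is a hypothetical sketch that only builds SPARQL 1.1 text; the helper name and the example URIs are illustrative, and sending the query to an endpoint is left out. On a large knowledge base the unrestricted one-hop patterns also match very generic links (for example shared types), so a real query would likely need filters.

```python
def relation_query(x_uri, y_uri):
    """Build a SPARQL query for direct links and one-hop patterns between two resources."""
    x, y = f"<{x_uri}>", f"<{y_uri}>"
    return f"""
SELECT DISTINCT ?pattern ?p1 ?object ?p2 WHERE {{
  {{ {x} ?p1 {y} . BIND("x -> y" AS ?pattern) }}
  UNION
  {{ {y} ?p1 {x} . BIND("y -> x" AS ?pattern) }}
  UNION
  {{ {x} ?p1 ?object . {y} ?p2 ?object . BIND("x -> object <- y" AS ?pattern) }}
  UNION
  {{ ?object ?p1 {x} . ?object ?p2 {y} . BIND("x <- object -> y" AS ?pattern) }}
}}
LIMIT 100
"""

print(relation_query("http://dbpedia.org/resource/Nokia",
                     "http://dbpedia.org/resource/Intel"))
```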


Experiment

• We anchored 63.77% of the keywords extracted from Twitter lists to DBpedia resources.


Experiment

Linked data pattern (54.73%): x -> object <- y

Relations                     Share    Object            Keywords
type, type                    67.35%   company           nokia, intel
subClassOf, subClassOf        30.61%   activities        philanthropy, fundraising

Linked data pattern (43.49%): x <- object -> y

Relations                     Share    Object            Keywords
genre, genre                  12.43%   Aesthetica        theater, film
occupation, genre             10.27%   Adam Maxwell      fiction, writer
occupation, occupation        8.11%    Alina Tugend      poet, writer
product, product              7.57%    ChenOne           clothes, fashion
industry, product             9.73%    UserLand Softw.   blogs, internet
known for, occupation         5.41%    Adeline Yen Mah   author, writing
known for, known for          3.78%    Rebecca Watson    skeptics, atheist
main interest, main interest  3.24%    Aristotle         politics, government

Relation type    Share   Example of keywords
Broader Term     26%     life-science, biotech
subClassOf       26%     writers, authors
developer        11%     google, google_apps
genre            11%     funland, comedy
largest city     6%      houston, texas
Others           20%     -

Vector-space model based on members (direct relations)

Vector-space model based on subscribers (relations of length 3)
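Tallying the two linked data patterns could look like the following sketch over an in-memory list of (subject, predicate, object) triples. The triples are a made-up miniature loosely modelled on the table examples, and recording each keyword pair's pattern together with its pair of predicates is an assumption about how the shares were counted; `patterns` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical triples, loosely modelled on the examples in the tables
triples = [
    ("Nokia", "type", "Company"),
    ("Intel", "type", "Company"),
    ("Adam_Maxwell", "occupation", "Writer"),
    ("Adam_Maxwell", "genre", "Fiction"),
]

def patterns(x, y, triples):
    """Yield (pattern, (p1, p2), object) for x -> object <- y and x <- object -> y."""
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            # Both keywords point to the same object
            if s1 == x and s2 == y and o1 == o2:
                yield ("x -> object <- y", (p1, p2), o1)
            # The same subject points to both keywords
            if o1 == x and o2 == y and s1 == s2:
                yield ("x <- object -> y", (p1, p2), s1)

counts = Counter()
for x, y in [("Nokia", "Intel"), ("Fiction", "Writer")]:
    for pattern, predicates, obj in patterns(x, y, triples):
        counts[(pattern, predicates)] += 1
        print(x, y, pattern, predicates, obj)
```

Dividing each count by the total number of pairs matching a pattern would give shares comparable in form to those in the tables.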


Conclusions

• Different models to elicit related keywords from Twitter lists
  • Based on curators, subscribers and members, using VSM and LDA
• Characterisation of the semantics of relations: WordNet-based similarity measures and SPARQL queries over linked data sets


Conclusions

• Vector-space and LDA models based on members produce the results most correlated with those of the WordNet-based metrics.
  • Shortest J&C distance and highest W&P similarities
• According to the path length in WordNet:
  • Models based on members produce more synonyms and direct is-a relations.
  • Most of the relations have path length ≥ 3 and have a common subsumer.
• Depth of LCS:
  • The vector-space model based on subscribers finds the highest number of specific relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0).
• We confirm these results using linked data sets.
