Characterising the Emergent Semantics in Twitter Lists

Andrés García-Silva†, Jeon-Hyung Kang*, Kristina Lerman*, Oscar Corcho†

† {hgarcia, ocorcho}@fi.upm.es

Facultad de Informática

Universidad Politécnica de Madrid, Spain

*{jeonhyuk,lerman}@isi.edu

Information Sciences Institute,

University of Southern California, USA

Introduction

Twitter Lists

Introduction

Curators and List Names

Introduction

Members and List Names

Introduction

Subscribers

List Names

• Previous examples showed individual uses of lists

• Some list names were related to each other

• What if we group the lists?

Introduction

Introduction

Lists where the Yahoo! Finance user is a member, grouped by frequency of membership

Lists where the NASDAQ user is a member grouped by number of subscriptions

[Figure: list names (Stocks, Personal Banking, Investment, Banks) shown with their users (Curator 1, Curator 2, Subscriber 1, list members)]

• Is it possible to identify related keywords from list names according to the use given by the different user roles?

• Are two list names related if they have been used by a similar set of curators?

• Are two list names related if a similar set of users have subscribed to the corresponding lists?

• Are two list names related if their corresponding lists have a similar set of members?

• What kind of user roles will generate more related keywords?

• What types of relations between keywords can we obtain?

  • Synonyms, is-a, siblings...?

Introduction: Research questions

Approach

Elicit related keywords from Twitter lists

Characterise the semantics of the relations

Schema representation of keywords

• Based on members

• Based on subscribers

• Based on curators

Model to identify similar keywords

• Vector Space Model

• Latent Dirichlet Allocation

Input: Twitter Lists

Output: pairs of related keywords per schema representation and model
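As a rough illustration of the member-based schema representation combined with a vector space model, the sketch below represents each list-name keyword as a sparse vector over the users that appear as members of lists carrying that keyword, and compares keywords by cosine similarity. The toy data, the function names `build_vectors` and `cosine`, and the use of raw counts as weights are assumptions made for illustration; the slides do not state which weighting scheme was used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (list-name keyword, member) pairs, not taken from the study.
memberships = [
    ("stocks", "NASDAQ"), ("stocks", "YahooFinance"),
    ("investment", "NASDAQ"), ("investment", "YahooFinance"),
    ("investment", "YahooFinance"),
    ("banks", "SomeBank"),
]

def build_vectors(pairs):
    """Map each keyword to a sparse vector of per-user counts (assumed weighting)."""
    vectors = {}
    for keyword, user in pairs:
        vectors.setdefault(keyword, Counter())[user] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = build_vectors(memberships)
similarity = cosine(vectors["stocks"], vectors["investment"])
```

The same functions would apply to the subscriber-based and curator-based representations by swapping the second element of each pair for a subscriber or a curator.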

Approach

Input: pairs of related keywords per schema representation and model

Similarity based on WordNet

• Jiang & Conrath (distributional information)

• Wu & Palmer (hierarchical information)

• Path length

SPARQL queries over general KBs published as Linked Data

• DBpedia, OpenCyc, and UMBEL

Relation types: synonyms, is-a, siblings, indirect is-a

Specificity of relations

Synonyms (sameAs)

Binary relations (TypeOf, BT)

Object properties (Occupation)
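To make the hierarchy-based measures concrete, the sketch below computes the least common subsumer (LCS), its depth, the path length, and the Wu & Palmer similarity on a tiny hand-made is-a taxonomy. The taxonomy and all function names are hypothetical stand-ins for WordNet. Path length is counted in nodes here, an assumption chosen so that identical concepts give 1, a direct is-a gives 2, and siblings give 3. The Jiang & Conrath measure is left out because it also needs corpus-derived information content.

```python
# Hypothetical is-a taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "writer": "person",
    "author": "writer",
    "poet": "writer",
    "person": "entity",
    "company": "entity",
}

def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walks upward from b, so the first hit is the deepest
        if node in seen:
            return node
    return None

def path_length(a, b):
    """Number of nodes on the path from a to b through their LCS."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common) + 1

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

In this toy taxonomy, `author` and `poet` are siblings under `writer`, while `author` and `company` only meet at the root, which gives a longer path and a shallower LCS, that is, a less specific relation.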

• Data set

  • Total: 297,521 lists, 2,171,140 members, 215,599 curators, and 616,662 subscribers

• We extracted 5,932 unique keywords from list names; 55% of them were found in WordNet.

  • We use approximate matching of the list names with dictionary entries

  • The dictionary was created from Wikipedia article titles

Experiment: Setup
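The approximate matching of list names against dictionary entries could look roughly like the sketch below. It uses `difflib.get_close_matches` from the Python standard library as a stand-in; the slides do not say which matching technique was used, and the dictionary entries, the `normalise` and `match_keyword` helpers, and the 0.8 cutoff are all assumptions.

```python
from difflib import get_close_matches

# Hypothetical dictionary entries; the study built its dictionary from Wikipedia article titles.
dictionary = ["personal finance", "investment", "stock", "bank", "biotechnology"]

def normalise(list_name):
    """Lower-case a list name and turn common separators into spaces."""
    return list_name.lower().replace("-", " ").replace("_", " ").strip()

def match_keyword(list_name, entries, cutoff=0.8):
    """Return the closest dictionary entry, or None if nothing is similar enough."""
    hits = get_close_matches(normalise(list_name), entries, n=1, cutoff=cutoff)
    return hits[0] if hits else None

keyword = match_keyword("Personal-Finance", dictionary)
```

A stricter cutoff reduces false matches at the cost of leaving more list names without a keyword.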

Experiment: Execution

[Diagram labels: Dataset; Vector Space Model; Based on members; Based on curators; Pairs of related keywords per schema representation and model; Each keyword with the 5 most related; WordNet similarity]

Similarity based on WordNet

• Jiang & Conrath (distributional information)

• Wu & Palmer (hierarchical information)

• Path length
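Selecting the 5 most related keywords for each keyword can be sketched as a top-k query over pairwise similarity scores. The helper `top_related` is hypothetical and accepts any similarity function; the Jaccard overlap over user sets used in the usage lines is only a simple placeholder, not the vector space or LDA model of the study.

```python
from heapq import nlargest

def top_related(keyword, vectors, similarity, k=5):
    """Return up to k (other_keyword, score) pairs with the highest similarity."""
    scores = (
        (other, similarity(vectors[keyword], vec))
        for other, vec in vectors.items()
        if other != keyword
    )
    return nlargest(k, scores, key=lambda pair: pair[1])

def jaccard(a, b):
    """Placeholder similarity: overlap between two sets of users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy data: keyword -> set of users associated with it.
toy = {
    "stocks": {"u1", "u2"},
    "investment": {"u1", "u2", "u3"},
    "banks": {"u3"},
    "poets": {"u9"},
}
ranked = top_related("stocks", toy, jaccard, k=2)
```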

Experiment: Data Analysis

Pearson's correlation coefficient

Average J&C distance and W&P similarity
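For reference, Pearson's correlation coefficient between two score series, for example a model's similarity scores and a WordNet-based measure over the same keyword pairs, can be computed as in the sketch below. The function name `pearson` and the sample numbers are illustrative only and are not data from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Made-up scores for three keyword pairs.
model_scores = [0.9, 0.4, 0.1]
wordnet_scores = [0.8, 0.5, 0.2]
r = pearson(model_scores, wordnet_scores)
```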

Path length | Members VSM | Members LDA | Subscribers VSM | Subscribers LDA | Curators VSM | Curators LDA
1 (synonyms) | 8.58% | 10.87% | 3.97% | 3.24% | 1.24% | 0.50%
2 (is-a) | 3.42% | 3.08% | 1.93% | 0.47% | 0.70% | 0.00%
3 (siblings, indirect is-a) | 2.37% | 3.77% | 2.96% | 2.06% | 2.38% | 4.03%
>3 | 67.61% | 65.5% | 67.2% | 67.5% | 77.8% | 75.8%

On average, 97.65% of the relations with a path length greater than 3 involve a common subsumer

Path Length in WordNet

% of relations found by each schema representation and model

Depth of the least common subsumer

Length of the path setting up the relation

Depth (LCS) and path length as indicators of specificity

Summary

• Similarity models based on members

  • produce the results that are most correlated with those of the WordNet-based similarity measures

  • find more synonyms and direct is-a relations than the other models (path length)

• The majority of relations found by any model have a path length >= 3 and involve a common subsumer.

• Depth of LCS

  • VSM based on subscribers produces the highest number of specific relations (depth of LCS >= 5 or 6).

• Similarity models based on curators produce a lower number of relations.

Experiment: Findings

Experiment: Execution

[Diagram labels: Dataset; Vector Space Model; Based on curators; Pairs of related keywords per schema representation and model; Each keyword with the 5 most related; Ontological relations between keywords]

SPARQL queries over general KBs published as Linked Data

• DBpedia, OpenCyc, and UMBEL

• We anchor 63.77% of the keywords extracted from Twitter Lists to DBpedia resources

Experiment

Linked data pattern (54.73%): x -> object <- y

Relations | % | Object | Keywords
type, type | 67.35% | company | nokia, intel
subClassOf, subClassOf | 30.61% | activities | philanthropy, fundraising

Linked data pattern (43.49%): x <- object -> y

Relations | % | Object | Keywords
genre, genre | 12.43% | Aesthetica | theater, film
occupation, genre | 10.27% | Adam Maxwell | fiction, writer
occupation, occupation | 8.11% | Alina Tugend | poet, writer
product, product | 7.57% | ChenOne | clothes, fashion
industry, product | 9.73% | UserLand Softw. | blogs, internet
known for, occupation | 5.41% | Adeline Yen Mah | author, writing
known for, known for | 3.78% | Rebecca Watson | skeptics, atheist
main interest, main interest | 3.24% | Aristotle | politics, government
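The two linked data patterns can be illustrated with a small in-memory triple store. The study ran SPARQL queries over DBpedia, OpenCyc, and UMBEL; the sketch below does not reproduce those queries and only shows the shape of the two patterns. The triples are modelled on examples from the tables above, and the function names `shared_object` and `shared_subject` are hypothetical.

```python
# Illustrative (subject, predicate, object) triples; not an actual DBpedia extract.
triples = [
    ("nokia", "type", "company"),
    ("intel", "type", "company"),
    ("Aristotle", "main interest", "politics"),
    ("Aristotle", "main interest", "government"),
]

def shared_object(x, y, triples):
    """Pattern x -> object <- y: both keywords point at the same resource."""
    return sorted(
        (rel_x, obj, rel_y)
        for subj_x, rel_x, obj in triples if subj_x == x
        for subj_y, rel_y, obj_y in triples if subj_y == y and obj_y == obj
    )

def shared_subject(x, y, triples):
    """Pattern x <- object -> y: one resource points at both keywords."""
    return sorted(
        (rel_x, subj, rel_y)
        for subj, rel_x, obj_x in triples if obj_x == x
        for subj_y, rel_y, obj_y in triples if obj_y == y and subj_y == subj
    )

company_links = shared_object("nokia", "intel", triples)
interest_links = shared_subject("politics", "government", triples)
```

Each result is a (relation of x, connecting resource, relation of y) tuple, which mirrors the Relations and Object columns of the tables.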

Relation type | % | Example of keywords
Broader Term | 26% | life-science, biotech
subClassOf | 26% | writers, authors
developer | 11% | google, google_apps
genre | 11% | funland, comedy
largest city | 6% | houston, texas
Others | 20% | -

Vector-space model based on members (direct relations)

Vector-space model based on subscribers (relations of length 3)

• Different models to elicit related keywords from Twitter lists.

  • Curators, subscribers, and members; VSM and LDA

• Characterise the semantics of relations: WordNet-based similarity measures and SPARQL queries over linked data sets

Conclusions

• Vector-space and LDA models based on members produce the results most correlated with those of WordNet-based metrics.

  • Shortest JC distance and highest WP similarities

• According to the path length in WordNet

  • Models based on members produce more synonyms and direct is-a relations

  • Most of the relations have path length ≥ 3 and have a common subsumer

• Depth of LCS

  • Vector-space model based on subscribers finds the highest number of relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0)

• We confirm these results according to linked data sets

Conclusions
