50
Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work Exploring the similarity between Social Knowledge Sources and Twitter for Cross-DomainTopic Classification of Tweets Andrea Varga, Amparo E. Cano and Fabio Ciravegna 1 Organisations Information and Knowledge (OAK) Research Group University of Sheffield 2 Knowledge Management Institute (KMI) Open University KECSM 2012/ISWC 2012 Nov 12, 2012 1/22

Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Embed Size (px)

DESCRIPTION

The rapid rate of information propagation on social streams has proven to be an up-to-date channel of communication, which can reveal events happening in the world. However, identifying the topicality of short messages (e.g. tweets) distributed on these streams poses new challenges in the development of accurate classification algorithms. In order to alleviate this problem we study for the first time a transfer learning setting aiming to make use of two frequently updated social knowledge sources KSs (DBpedia and Freebase) for detecting topics in tweets. In this paper we investigate the similarity (and dissimilarity) between these KSs and Twitter at the lexical and conceptual (entity) level. We also evaluate the contribution of these types of features and propose various statistical measures for determining the topics which are highly similar or different in KSs and tweets. Our findings can be of potential use to machine learning or domain adaptation algorithms aiming to use named entities for topic classification of tweets. These results can also be valuable in the identification of representative sets of annotated articles from the KSs, which can help in building accurate topic classifiers of tweets.

Citation preview

Page 1: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Exploring the similarity between Social Knowledge Sources andTwitter for Cross-Domain Topic Classification of Tweets

Andrea Varga, Amparo E. Cano and Fabio Ciravegna

1 Organisations Information and Knowledge (OAK) Research GroupUniversity of Sheffield

2 Knowledge Management Institute (KMI)Open University

KECSM 2012/ISWC 2012

Nov 12, 2012

1/22

Page 2: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Outline

1 Motivation

2 State-of-the-art

3 Methodology

4 Results

5 Conclusions and Future Work

2/22

Page 3: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why classifying Tweets into topics?

Topic classification (TC) of tweets can be important for multipleapplication:

Information Retrieval

Recommendation

Emergency responses, etc.

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google adwords commercial greeeat en-

joyed watching greeeeeat dayPolitics(Pol) quoting military source sk media reports

deployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

3/22

Page 4: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

What are the challenges in Topic Classification (TC) of Tweets?

Special characteristics of tweets

the restricted size of a post (limited to 140 characters)

the frequent use of misspellings and jargons

the frequent use of abbreviations

the use of non-standard English: reflected in vocabulary and writing style

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google commercial greeeat enjoyed

watching dayPolitics(Pol) quoting military source media reports de-

ployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

4/22

Page 5: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

What are the challenges in Topic Classification (TC) of Tweets?

Special characteristics of tweets

the restricted size of a post (limited to 140 characters)

the frequent use of misspellings and jargons

the frequent use of abbreviations

the use of non-standard English: reflected in vocabulary and writing style

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google adwords commercial greeeat en-

joyed watching greeeeeat dayPolitics(Pol) quoting military source media reports de-

ployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

4/22

Page 6: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

What are the challenges in Topic Classification (TC) of Tweets?

Special characteristics of tweets

the restricted size of a post (limited to 140 characters)

the frequent use of misspellings and jargons

the frequent use of abbreviations

the use of non-standard English: reflected in vocabulary and writing style

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google adwords commercial greeeat en-

joyed watching greeeeeat dayPolitics(Pol) quoting military source sk media reports

deployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

4/22

Page 7: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

What are the challenges in Topic Classification (TC) of Tweets?

Special characteristics of tweets

the restricted size of a post (limited to 140 characters)

the frequent use of misspellings and jargons

the frequent use of abbreviations

the use of non-standard English: reflected in vocabulary and writing style

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google adwords commercial greeeat en-

joyed watching greeeeeat dayPolitics(Pol) quoting military source sk media reports

deployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

4/22

Page 8: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

What are the challenges in Topic Classification (TC) of Tweets?

Special characteristics of tweets

the restricted size of a post (limited to 140 characters)

the frequent use of misspellings and jargons

the frequent use of abbreviations

the use of non-standard English: reflected in vocabulary and writing style

Topic name Example tweetsDisaster&Accident(DisAcc) happening accident people dying could

phone ambulance wakakkaka xdEntertainment&Culture(EntCult) google adwords commercial greeeat en-

joyed watching greeeeeat dayPolitics(Pol) quoting military source sk media reports

deployed rocket launchers decoys realSports(Sports) ravens good position games left browns

bengals playoffs

=> These characteristics poses additional challenges for traditionalsupervised machine learning approaches for buildingaccurate TC of tweets

4/22

Page 9: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why are Social Knowledge Sources (KS) relevant to Twitter?

Data bottleneck problem: investigate an alternative approach inspiredby domain adaptation/transfer learning for exploiting the information fromSocial Knowledge Sources (DBpedia and Freebase) for TC of Tweets

Commonalities between KSs and Twitter

they are constantly edited by web users

they are social and built on a collaborative manner

they cover a large number of topics

5/22

Page 10: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why are Social Knowledge Sources (KS) relevant to Twitter?

Data bottleneck problem: investigate an alternative approach inspiredby domain adaptation/transfer learning for exploiting the information fromSocial Knowledge Sources (DBpedia and Freebase) for TC of Tweets

Commonalities between KSs and Twitter

they are constantly edited by web users

they are social and built on a collaborative manner

they cover a large number of topics

5/22

Page 11: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why are Social Knowledge Sources (KS) relevant to Twitter?

Data bottleneck problem: investigate an alternative approach inspiredby domain adaptation/transfer learning for exploiting the information fromSocial Knowledge Sources (DBpedia and Freebase) for TC of Tweets

Commonalities between KSs and Twitter

they are constantly edited by web users

they are social and built on a collaborative manner

they cover a large number of topics

5/22

Page 12: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why are Social Knowledge Sources (KS) relevant to Twitter?

Data bottleneck problem: investigate an alternative approach inspiredby domain adaptation/transfer learning for exploiting the information fromSocial Knowledge Sources (DBpedia and Freebase) for TC of Tweets

Commonalities between KSs and Twitter

they are constantly edited by web users

they are social and built on a collaborative manner

they cover a large number of topics

5/22

Page 13: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Why are Social Knowledge Sources (KS) relevant to Twitter?

Data bottleneck problem: investigate an alternative approach inspiredby domain adaptation/transfer learning for exploiting the information fromSocial Knowledge Sources (DBpedia and Freebase) for TC of Tweets

Commonalities between KSs and Twitter

they are constantly edited by web users

they are social and built on a collaborative manner

they cover a large number of topics

More importantly: KSs contain a large number of annotated data on alarge number of topics

5/22

Page 14: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Research questions

1 Are KSs relevant for topic classification of Tweets?

2 Which features make the KSs look more similar to Twitter?

3 How similar or dissimilar are KSs to Twitter? Which similarity measuredoes better quantify the lexical changes between KSs and Twitter?

6/22

Page 15: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

State-of-the-art approaches for TC of Tweets

Using DBpedia for Topic Classification of Tweets:Wikify (Mihalcea, R. and Csomai, A., 2007)Enriching unstructured text with Wikipedia links (D. Milne and I. H. Witten,2008)Tagme (P. Ferragina and U. Scaiella., 2010)Topical Social Sensor (P. K. P. N. Mendes et al., 2010)Vector space model (Oscar Munoz-Garcia et al. 2011)

Using Freebase for Topic Classification of Tweets:Clustering based approach (S.P.Kasiviswanathan et al., 2011)

Our main contribution:Understanding the similarity between KSs and Twitter

Exploring multiple KSs (DBpedia + Freebase)

Investigating various statistical metrics for quantifying thesimilarity between KSs and Twitter

7/22

Page 16: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Methodology followed

8/22

Page 17: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Methodology followed

1 Collecting Data from KSs

Retrieve articles

Concept enrichment

Build Cross-domain Classifier Annotate Tweets

Sc. FB Sc. DB-FBSc. DB

Retrieve tweets

Concept enrichment

8/22

Page 18: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Methodology followed

1 Collecting Data from KSs

2 Building Cross-Domain (CD) Topic Classifier of Tweets

Retrieve articles

Concept enrichment

Build Cross-domain Classifier Annotate Tweets

Sc. FB Sc. DB-FBSc. DB

Retrieve tweets

Concept enrichment

8/22

Page 19: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Methodology followed

1 Collecting Data from KSs

2 Building Cross-Domain (CD) Topic Classifier of Tweets

3 Measuring Distributional Changes Between KSs and Twitter

Retrieve articles

Concept enrichment

Build Cross-domain Classifier Annotate Tweets

Sc. FB Sc. DB-FBSc. DB

Retrieve tweets

Concept enrichment

8/22

Page 20: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 1: Collecting Data from KSs

Twitter corpus collected in Abel et al. (2011), tweets posted between October 2010 andJanuary 2011, annotated with 17 topics

Random selection of 1,000 articles/tweets from DBpedia/Freebase/Twitter for each topic =>9,465 articles from DBpedia; 16,915 articles from Freebase; and 12,412 tweets

Preprocessing: removal of hastags, mentions and URLs from tweets; taking top-1000

features for each topic

88.6%

8.6%

1.8%0.9%

Dbpedia multilabel frequency

1 8 2 3+4+5+6+7+9

99.9% 0.1%

Freebase multilabel frequency

1 2

71%

22.3%

5.6%

1%0.1%

Twitter multilabel frequency

1 2 3 4 6+5

88.6%

8.6%

1.8%0.9%

Dbpedia multilabel frequency

1 8 2 3+4+5+6+7+9

99.9% 0.1%

Freebase multilabel frequency

1 2

71%

22.3%

5.6%

1%0.1%

Twitter multilabel frequency

1 2 3 4 6+5

9/22

Page 21: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 1: Collecting Data from KSs

Disaster_AccidentBusiness_Finance Education Entertainment

Health

Environment

Human Interest Labor Law_Crime

Politics

Religion Social Issues Sports

Technology_IT

Weather War_Conflict

Retrieval of articles for a given topic (e.g. Politics):from DBpedia: executing SPARQL queries for retrieving category namescontaining the topic name:

Category:Politics_of_the_United_StatesCategory:National_Democratic_Party_Egypt_politiciansetc.

from Freebase: accessing Text Service API for articles belonging to thetopic:

for underspecified topics/domains: consider articles containing the topic in theirtitles

10/22

Page 22: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 2: Building Cross-Domain (CD) Topic Classifier of Tweets

Considering two different feature sets:

11/22

Page 23: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 2: Building Cross-Domain (CD) Topic Classifier of Tweets

Considering two different feature sets:BOW: tf.idf value of the words present the examples (articles or tweets)

11/22

Page 24: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 2: Building Cross-Domain (CD) Topic Classifier of Tweets

Considering two different feature sets:BOW: tf.idf value of the words present the examples (articles or tweets)

BOE: tf.idf value of the words and entity+concept pairs present the examples(articles or tweets)

Retrieve articles

Concept enrichment

Build Cross-domain Classifier Annotate Tweets

Sc. FB Sc. DB-FBSc. DB

Retrieve tweets

Concept enrichment

11/22

Page 25: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Step 3: Measuring Distributional Changes Between KSs and Twitter

Building a vector−→ds for each the source dataset (Sc.DB, Sc.Fb,

Sc.Db-FB) and a vector−→dt for the target dataset (Twitter) consisting of

the TF-IDF weight for either the BoW or BoE feature sets

statistical measures applied:

χ2 test: χ2 =∑ (O−E)2

E , where O is the observed value for a feature, whileE is the expected value calculated on the basis of the joint corpus

Kullback-Leibler symmetric distance:

KL(−→ds ||−→dt ) =

∑f∈FS∪FT

(−→ds(f )−

−→dt (f )) log

−→ds (f )−→dt (f )

cosine similarity: cosine(−→ds ,−→dt ) =

∑FS∪FTk=1 (

−→ds (fSk

)×−→ds (fTk

))∑FS∪FTk=1 (

−→ds (fSk

))2×∑FS∪FT

k=1 (−→dt (fTk

))2

12/22

Page 26: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Experimental setting

1-vs-all approach, building individual CD classifier for each topic, SVMclassifiers, performed 5 cross-fold validation

Sc-Db, Sc-Fb, Sc-Db-Fb classifiers trained on full KS data, evaluated on20% Twitter data 2,482 tweets)

TGT classifier: trained on 80% Twitter data, evaluated on 20% Twitterdata (2,482 tweets)

13/22

Page 27: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

14/22

Page 28: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

14/22

Page 29: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

14/22

Page 30: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

14/22

Page 31: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

14/22

Page 32: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.Db-FB showed best performance, followed by Sc.Fb and Sc.Db

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

15/22

Page 33: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.Db-FB showed best performance, followed by Sc.Fb and Sc.Db

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

15/22

Page 34: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q1 : Which KS reflects better the lexical variation in Twitter?

Sc.Db-FB showed best performance, followed by Sc.Fb and Sc.Db

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

15/22

Page 35: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Classification performance in F1 measure

Q2 : What feature makes the KSs look more similar to Twitter?

BoW features were found better than BoE for CD classifiersBoE features were found better than BoW for TGT

Sc.D

B(Bo

E)

Sc.D

B(Bo

W)

Sc.F

B(Bo

E)

SC.F

B(Bo

W)

Sc.D

B−FB

(BoE

)

SC.D

B−FB

(BoW

)

TGT(

BoW

)

TGT(

BoE)

Edu

Sports

War

Labor

Weather

HumInt

Env

TechIT

DisAcc

SocIssue

HospRecr

Law

Pol

Health

Religion

EntCult

BusFi

37.40 37.80 42.50 47.20 42.50 45.70 71.90 71.30

10.30 11.70 26.50 26.20 26.20 26.00 60.10 59.20

9.30 10.90 23.90 23.60 24.60 25.70 67.60 72.70

1.40 1.30 31.90 31.90 30.10 29.90 79.90 79.40

3.10 7.00 39.80 39.90 36.70 36.00 81.20 81.50

1.10 1.40 2.00 1.60 1.50 2.20 33.60 34.20

15.20 14.20 2.20 2.20 8.90 8.40 46.60 48.30

1.60 2.40 18.40 18.60 12.40 9.90 57.40 58.00

8.30 9.70 14.50 14.50 21.00 21.10 57.50 59.20

1.30 2.00 8.80 9.00 9.70 9.10 44.20 44.00

1.40 2.30 17.20 19.50 11.30 13.90 41.60 42.60

0.90 2.70 16.80 17.80 14.80 13.30 46.80 46.40

22.20 21.00 27.80 26.80 24.20 26.80 45.10 44.90

28.60 30.30 25.60 24.70 26.90 25.40 51.70 51.80

23.40 25.10 14.40 14.70 21.00 20.40 58.40 58.50

15.30 15.10 15.50 16.20 19.50 20.20 42.20 43.10

21.40 21.20 11.50 15.30 18.60 19.10 47.50 46.50

16/22

Page 36: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Examining the number of annotation needed for Twitterclassifier to outperform Sc. Db-FB

Investigated the impact of employing Sc. Db-FB classifier over theTwitter classifier in terms of number of annotations

The performance of the Twitter classifier against the three CD classifiersover the full learning curve

17/22

Page 37: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Examining the number of annotation needed for Twitterclassifier to outperform Sc. Db-FB

Investigated the impact of employing Sc. Db-FB classifier over theTwitter classifier in terms of number of annotations

The performance of the Twitter classifier against the three CD classifiersover the full learning curve

=> In the absence of any annotated tweets, applying these CDclassifiers are beneficial

17/22

Page 38: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Examining the number of annotation needed for Twitterclassifier to outperform the CD classifiers

Q3 : How similar or dissimilar are KSs to Twitter posts; and whichsimilarity measure does better reflect the lexical changes between KSsand Twitter posts?

Compared χ2, KL-divergence, cosine for each topic

χ2 obtained the best correlation with the performance of CD classifiers,achived scores >70% for 32 cases

cosine obtained correlation scores >70% for 25 cases

KL obtained correlation scores >70% for 24 cases

18/22

Page 39: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Findings -Examining the number of annotation needed for Twitterclassifier to outperform the CD classifiers

Q3 : How similar or dissimilar are KSs to Twitter posts; and whichsimilarity measure does better reflect the lexical changes between KSsand Twitter posts?

Compared χ2, KL-divergence, cosine for each topic

χ2 obtained the best correlation with the performance of CD classifiers,achived scores >70% for 32 cases

cosine obtained correlation scores >70% for 25 cases

KL obtained correlation scores >70% for 24 cases

=> χ2 test is the best measure for quantifying the distributionaldifferences between KSs and Twitter.

18/22

Page 40: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:

19/22

Page 41: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

19/22

Page 42: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

19/22

Page 43: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

The two KSs contain complementary information

19/22

Page 44: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

The two KSs contain complementary information

For the CD classifiers, on average BOW features were more useful than BoE features

19/22

Page 45: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

The two KSs contain complementary information

For the CD classifiers, on average BOW features were more useful than BoE features

For the Twitter classifiers, on average BOE features were more useful than BoWfeatures

19/22

Page 46: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

The two KSs contain complementary information

For the CD classifiers, on average BOW features were more useful than BoE features

For the Twitter classifiers, on average BOE features were more useful than BoWfeatures

We found χ2 test as being the best measure for quantifying the distributionaldifferences between KSs and Twitter.

19/22

Page 47: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Conclusions and Future Work

We presented a first study towards understanding the usefulness of KSs in TC oftweets at various granularities: lexical features (BoW) and entity features (BoE)

Our main findings are:In the absence of any annotated tweets, applying these CD classifiers are beneficial

Out of the two KSs, Freebase topics seem to be much closer to the Twitter topics thanthe DBpedia topics.

The two KSs contain complementary information

For the CD classifiers, on average BOW features were more useful than BoE features

For the Twitter classifiers, on average BOE features were more useful than BoWfeatures

We found χ2 test as being the best measure for quantifying the distributionaldifferences between KSs and Twitter.

Our future work will focus on building more accurate TC classifiers andinvestigating better measures

19/22

Page 48: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Corpus Analysis - Size of vocabulary

!"

#!!!"

$!!!"

%!!!"

&!!!"

'!!!!"

'#!!!"

'$!!!"

'%!!!"

()*+,"

-,*.//"

01)"

0234)53"

026"

78953:"

7;*<=8/"

7)>?23"

@9A;B"

@9C"

D;5"

=85,E,;2"

F;/?**)8"

F<;B3*"

G8/:?G"

H893:8B"

H9B"

IJ%'" I%&!"IIJ$" IIJI"

I&%!"I$K!" I$L$" I#%J" I'!!"

IJ#I" I$!I" I#&#" I#JK"IJL'" II%I" IIK!" I'$#"

&K%&"LI$#"

K$'#"

'!#'#"

LJJ$"LI!K" LI$#" L$##" LI$#" LI$#"

LK'&"

''L$I"

LI$#"

&'&I"

L&KJ"

&&J&"LI$#"

I&!I"

#'&L"

IK!K"

##J$"'&#J"

$%&$"

JK%#"

%JK'"

'J%&"

$JJ'"

%$'#"J&&&"

'&LK"

%&$#" %K#&"

#$#K"#KJ!"

'!K#'"

'!!L#"

L##'"

'''!!"

'!!I'"

''L%L"'#$#J"

'I!$L"

L&#I"

''J#I"

'IIIJ"

'J!I'"

'!!#L"

'#J!K"

'I&'L"

LLII"'!$IJ"

GMG"

F/N-("

F/N+("

F/N-(O+("

20/22

Page 49: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Understanding the results - Number of unique entities

Examining the number of entities in the source (Sc. DB, Sc. FB, Sc. DB-FB) andtarget (TGT) datasets after pre-processing.

the TGT dataset consists of 1.73± 0.35 entities/tweet

the Sc.DB dataset consists of 22.24± 1.44 entities entities/article

the Sc.FB dataset consists of 8.14± 5.78 entities entities/article

!"

#!!!"

$!!!!"

$#!!!"

%!!!!"

%#!!!"

&!!!!"

&#!!!"

'!!!!"

'#!!!"

()*+,"

-,*.//"

01)"

0234)53"

026"

78953:"

7;*<=8/"

7)>?23"

@9A;B"

@9C"

D;5"

=85,E,;2"

F;/?**)8"

F<;B3*"

G8/:?G"

H893:8B"

H9B"

$II%" $JKK" $J!#" $J!#" $JK&" $'K$" $%J&" $#I$" $'#&" $ILI" %&$%"$$%!" $#$%" %$$L" $KJ&" %%JL"

$%!L"

%$$#L"%%&K&" %%!IL"

%&'#!"%%%J'"

%!LI%"

%%&K&" %%#K%" %%&K&" %%&K&"%!JII"

$KIK'"

%%&K&"

%'KK$"%&L'$"

%%%&I" %%&K&"

L'!$"

%&$L"

KL%%"

$I$&" %&J#"

#K#J"

IJ#$"

$'L&L"

%'%'"

KIK%"

$I%&!"

$%!#&"

%''$"

$J'K&"

$'$#&"

%J%I"

KIKK"

%K!K#"

%'#%'"

%I$J&"

%#%&L"%'&'#"

%#I!J"

&$LJ%"

&LL'$"

%'#KK"

%IIJK"

&I%JK"

%I&$!"

%'#IL"

'%&KI"

&K&!!"

%'J&&"

%III%"

GMG"

F/N-("

F/N+("

F/N-(O+("

21/22

Page 50: Exploring the similarity between Social Knowledge Sources and Twitter for Cross-Domain Topic Classification of Tweets #KECSM 2012 #ISWC2012

Motivation Research question State-of-the-art Methodology Results Conclusions and Future Work

Corpus Analysis - Distribution of the top 15 entities

Labo

r (Sc

.FB)

War

(TG

T)W

ar (S

c.FB

)So

cIss

ue (S

c.FB

)W

eath

er (S

c.FB

)Te

chIT

(Sc.

DB)

Hum

Int (

Sc.D

B)W

eath

er (S

c.DB

)So

cIss

ue (S

c.DB

)La

w (S

c.DB

)La

bor (

Sc.D

B)Di

sAcc

(Sc.

DB)

Hosp

Recr

(Sc.

DB)

EntC

ult (

Sc.F

B)Te

chIT

(Sc.

FB)

Hum

Int (

TGT)

EntC

ult (

TGT)

EntC

ult (

Sc.D

B)Te

chIT

(TG

T)Bu

sFi (

Sc.F

B)He

alth

(Sc.

FB)

Heal

th (S

c.DB

)Bu

sFi (

Sc.D

B)En

v (S

c.DB

)La

w (S

c.FB

)En

v (S

c.FB

)Di

sAcc

(Sc.

FB)

Hosp

Recr

(Sc.

FB)

Wea

ther

(TG

T)Di

sAcc

(TG

T)En

v (T

GT)

Edu

(Sc.

DB)

Edu

(Sc.

FB)

Spor

ts (S

c.DB

)W

ar (S

c.DB

)Hu

mIn

t (Sc

.FB)

Pol (

Sc.F

B)Sp

orts

(Sc.

FB)

Relig

ion

(Sc.

FB)

Spor

ts (T

GT)

Relig

ion

(TG

T)Po

l (TG

T)Po

l (Sc

.DB)

Relig

ion

(Sc.

DB)

Heal

th (T

GT)

Hosp

Recr

(TG

T)Bu

sFi (

TGT)

Labo

r (TG

T)So

cIss

ue (T

GT)

Edu

(TG

T)La

w (T

GT)

NaturalFeature

Country

Person

Position

Organization

Technology

MedicalCondition

SportsEvent

Region

Continent

Company

IndustryTerm

City

Facility

ProvinceOrState

22/22