Intelligent Database Systems Presenter: WU, MIN-CONG Authors: Abdelghani Bellaachia and Mohammed Al-Dhelaan 2012, WIIAT NE-Rank: A Novel Graph-based Keyphrase Extraction in Twitter


Intelligent Database Systems Lab

Presenter: WU, MIN-CONG

Authors: Abdelghani Bellaachia

and Mohammed Al-Dhelaan

2012, WIIAT

NE-Rank: A Novel Graph-based Keyphrase Extraction in Twitter


Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments


Motivation
• When text is represented as a lexical graph, each word can carry its own weight, so the ranking is measured more accurately than when it relies only on co-occurrence in Twitter.


Objectives
• For the task of extracting topical keyphrases, we start by proposing a novel unsupervised graph-based keyword ranking method, called NE-Rank, that considers word weights in addition to edge weights when calculating the ranking.


Methodology - System Overview
[Diagram labels: Twitter set; θ; document (k1, k2, k3, k4); topical subdatasets; NE-Rank; Hashtags Titles; candidate keyphrase]


Methodology - Topic Extraction
[Diagram labels, general case: document (k1, k2, k3, k4); topic (w1, w2, w3, w4, w5, w6)]
[Diagram labels, for this paper: topic (w1, w2, w3, w4, w5, w6); document (k1, k2, k3, k4)]


Methodology - Topic Extraction (continued)
Problem
[Diagram labels: Top 5; Search; TF-IDF; Top 10 terms; Insert]


Methodology - Graph-based Keyword Ranking: existing approaches

• PageRank
• TextRank

Summary

            Target   Edge weight      Node weight
PageRank    web      not considered   not considered
TextRank    word     considered       not considered
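As a rough illustration of the TextRank row in the summary, the sketch below builds an undirected word co-occurrence graph and runs a weighted PageRank-style iteration over it. The function names, the window size, the iteration count and the damping factor d = 0.85 are assumptions chosen for illustration; none of them are taken from the slides.

```python
from collections import defaultdict

def cooccurrence_graph(docs, window=2):
    """Undirected graph; edge weight = how often two words co-occur within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if other != word:
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph

def textrank(graph, d=0.85, iterations=50):
    """Edge-weighted ranking: every node has the same prior, only edge weights differ."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        updated = {}
        for v in graph:
            total = 0.0
            for u, w_uv in graph[v].items():
                # share of u's score that flows to v, proportional to the edge weight
                total += w_uv / sum(graph[u].values()) * scores[u]
            updated[v] = (1 - d) + d * total
        scores = updated
    return scores

# Hypothetical toy input: "a" co-occurs with both "b" and "c".
toy_scores = textrank(cooccurrence_graph([["a", "b"], ["a", "c"]]))
```

In the toy input the word "a" ends up with the highest score because it is the only word connected to two neighbours.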


Methodology - Graph-based Keyword Ranking: proposed approach

• NE-Rank

Summary

            Target   Edge weight      Node weight
PageRank    web      not considered   not considered
TextRank    word     considered       not considered
NE-Rank     word     considered       considered
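To make the difference in the last row concrete, here is a minimal sketch of a ranking that uses node weights as well as edge weights. The update rule is an assumption: it scales both the prior and the propagated score of a word by that word's weight, which is one plausible reading of "considers word weights in addition to edge weights". The name `ne_rank`, the damping factor and the requirement that node weights lie in [0, 1] (for example normalized TF-IDF) are likewise illustrative choices.

```python
def ne_rank(graph, node_weight, d=0.85, iterations=50):
    """graph[u][v] is the weight of edge u -> v; node_weight[v] is assumed to lie in [0, 1]."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        updated = {}
        for v in graph:
            incoming = 0.0
            for u in graph:
                w_uv = graph[u].get(v, 0.0)
                if w_uv:
                    # edge-weighted share of u's score, as in TextRank
                    incoming += w_uv / sum(graph[u].values()) * scores[u]
            # assumed update: the node weight scales both the prior and the incoming score
            updated[v] = (1 - d) * node_weight[v] + d * node_weight[v] * incoming
        scores = updated
    return scores

# Hypothetical toy graph: identical edges everywhere, only the node weights differ.
triangle = {
    "management": {"business": 1.0, "staff": 1.0},
    "business": {"management": 1.0, "staff": 1.0},
    "staff": {"management": 1.0, "business": 1.0},
}
weights = {"management": 1.0, "business": 0.5, "staff": 0.5}
toy_ranks = ne_rank(triangle, weights)
```

Because the edges are symmetric, an edge-only ranking would tie all three words; with the node weights above, "management" ranks first. Setting every node weight to 1 reduces this sketch to the edge-weighted case.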


Methodology - Hashtags Titles
[Diagram labels: topical dataset; hashtags titles; word; steps: extract, split, record]
• Hashtags are split into words using an English dictionary with frequencies.
• Strengthening strategy: in-degree, boosted 5%
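The split step can be sketched as a small dynamic program that segments a hashtag into dictionary words and prefers frequent words. The dictionary contents, the function names and the log-probability scoring are assumptions for illustration. The slide only states a 5% boost tied to in-degree; how the boost is applied is not recoverable here, so `boost_hashtag_words` simply multiplies an existing word score by 1.05 as a placeholder.

```python
import math

def split_hashtag(tag, word_freq):
    """Split a hashtag into dictionary words; returns None if no full split exists."""
    text = tag.lstrip("#").lower()
    total = sum(word_freq.values())
    # best[i] holds (log-probability, words) for the best split of text[:i]
    best = [(0.0, [])] + [None] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is None or word not in word_freq:
                continue
            score = best[start][0] + math.log(word_freq[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[-1][1] if best[-1] else None

def boost_hashtag_words(scores, hashtag_words, boost=0.05):
    """Placeholder strengthening step: raise the score of words seen in hashtags by 5%."""
    return {w: s * (1 + boost) if w in hashtag_words else s for w, s in scores.items()}

# Hypothetical frequency dictionary.
freq = {"social": 50, "media": 40, "so": 30, "cial": 1, "me": 20, "dia": 1}
words = split_hashtag("#SocialMedia", freq)
```

With this toy dictionary the two-word split wins over longer splits such as "so" + "cial" + "me" + "dia", because each extra word adds another negative log-probability term.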


Methodology - Candidate Keyphrase Generation
Ranked words (descending order):
1. management
2. business
3. customer
4. staff
5. finance
[Diagram labels: find the positions of the ranked words in the Twitter set; keyphrase, e.g. "Information management"]
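One way to read this step is: look up where the top-ranked words occur in the tweets and join words that sit next to each other into a candidate phrase. The sketch below follows that reading, which is an assumption, as are the function name, the `max_len` cap and the toy data.

```python
from collections import Counter

def candidate_keyphrases(tweets, top_words, max_len=3):
    """Count maximal runs of adjacent top-ranked words in tokenized tweets."""
    counts = Counter()
    for tokens in tweets:
        run = []
        for tok in tokens + [None]:  # the None sentinel flushes the final run
            if tok in top_words:
                run.append(tok)
                continue
            if run and len(run) <= max_len:
                counts[" ".join(run)] += 1
            run = []
    return counts

# Hypothetical tokenized tweets and top-ranked words.
tweets = [
    ["information", "management", "is", "key"],
    ["business", "management"],
    ["staff"],
]
top = {"information", "management", "business", "staff"}
phrases = candidate_keyphrases(tweets, top)
```

Single top-ranked words that have no ranked neighbour are kept as one-word candidates in this sketch.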


Methodology - Keyphrase Ranking
[Diagram labels: keyphrases; phrases list; score; filtering; summarize; less than 5 times]
Hashtags usage
• Another study measures sentiment in hashtags. The use of hashtags as keyword annotations makes them of great interest to our work.
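The score and filtering steps might look like the sketch below. Two assumptions are made loudly here: a phrase is scored as the sum of its word scores (a common convention in graph-based keyphrase extraction, not confirmed by the slide), and the "less than 5 times" threshold is applied to phrase counts, although the slide does not make clear whether it refers to phrases or to hashtags.

```python
def rank_keyphrases(phrase_counts, word_scores, min_count=5):
    """Drop rare phrases, then sort the rest by the sum of their word scores."""
    ranked = []
    for phrase, count in phrase_counts.items():
        if count < min_count:  # assumed filter: seen fewer than 5 times
            continue
        score = sum(word_scores.get(w, 0.0) for w in phrase.split())
        ranked.append((phrase, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical counts and word scores.
counts = {"information management": 7, "staff": 2, "business": 5}
scores = {"information": 0.2, "management": 0.9, "business": 0.6, "staff": 0.4}
ranking = rank_keyphrases(counts, scores)
```

Here "staff" is filtered out because it occurs only twice, and "information management" outranks "business".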


Experiment - Dataset and Preprocessing

Dataset

              Tweets   Tokens    Hashtags   Hashtag frequency
Twitter set   31,227   244,139   4,079      40,674

Preprocessing
• Remove non-English tweets
• Remove noisy tokens, e.g. URLs, emoticons, smileys
• Transform slang and abbreviations
• English dictionary: vocabulary vs. OOV (out-of-vocabulary) words
• POS tagger
• Remove stopwords

LDA
• 500 iterations
• 30 topics
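A few of the cleaning steps listed above can be sketched with the standard library alone. The regular expressions, the slang table and the stopword list are tiny hypothetical stand-ins; language filtering, POS tagging and LDA are left out because they need external tools that the slides do not name.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches simple standalone emoticons such as :) ;-( =D (illustrative, not exhaustive)
EMOTICON_RE = re.compile(r"(?<!\S)[:;=8][\-o^']?[)(\]\[dDpP/\\|](?!\S)")
SLANG = {"u": "you", "gr8": "great", "thx": "thanks"}  # hypothetical slang table
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "for", "you"}  # hypothetical

def preprocess(tweet):
    """Strip URLs and emoticons, expand slang, lowercase, and drop stopwords."""
    text = URL_RE.sub(" ", tweet)
    text = EMOTICON_RE.sub(" ", text)
    tokens = re.findall(r"[#@]?\w+", text.lower())
    tokens = [SLANG.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]

cleaned = preprocess("Thx for the gr8 service :) http://t.co/abc #CustomerCare")
```

Hashtags keep their leading "#" in this sketch so that a later step can still find them.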


Experiment - Evaluation Metrics

Precision

Bpref
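Both metrics can be computed from a ranked list of extracted keyphrases and a set of correct ones. The Bpref variant below is an assumption: it is the form often used in keyphrase-extraction evaluation, where each correct phrase is penalized by the number of incorrect phrases ranked above it, divided by the total number of extracted phrases M, and the sum is averaged over the R correct phrases found. The function names and toy data are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k extracted phrases that are correct."""
    if k <= 0:
        return 0.0
    return sum(1 for p in ranked[:k] if p in relevant) / k

def bpref(ranked, relevant):
    """Assumed variant: (1/R) * sum over correct r of (1 - incorrect_above(r) / M)."""
    m = len(ranked)
    correct_found = 0
    incorrect_seen = 0
    total = 0.0
    for phrase in ranked:
        if phrase in relevant:
            correct_found += 1
            total += 1 - incorrect_seen / m
        else:
            incorrect_seen += 1
    return total / correct_found if correct_found else 0.0

# Hypothetical ranked output and gold set.
ranked = ["a", "x", "b", "y"]
gold = {"a", "b"}
p2 = precision_at_k(ranked, gold, 2)   # 0.5
bp = bpref(ranked, gold)               # 0.875
```

Precision only counts how many of the top-k phrases are correct, while Bpref also rewards placing correct phrases ahead of incorrect ones.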


Experiment - Results


Conclusions
• An experimental evaluation demonstrated the potential and validity of both approaches.


Comments
• Advantages
– The keyphrase score does not rely only on co-occurrence.
• Applications
– Automatic keyphrase extraction.
