

Intelligent Database Systems Lab

Presenter: WU, MIN-CONG

Authors: Abdelghani Bellaachia and Mohammed Al-Dhelaan

2012, WI-IAT

NE-Rank: A Novel Graph-based Keyphrase Extraction in Twitter

Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation
• When a text is represented as a lexical graph, each word can be given a weight so that the ranking is measured more accurately, instead of relying only on co-occurrence in Twitter.

Objectives
• For the task of extracting topical keyphrases, we start by proposing a novel unsupervised graph-based keyword ranking method, called NE-Rank, that considers word weights in addition to edge weights when calculating the ranking.

Methodology - System Overview
[Figure: system overview. Labels: Twitter set; θ; document (k1, k2, k3, k4); topical subdatasets; NE-Rank; hashtag titles; candidate keyphrases.]

Methodology - Topic Extraction
[Figure: two panels. First panel: a document with k1, k2, k3, k4 and a topic with w1, w2, w3, w4, w5, w6. Second panel, labeled "For this paper": a topic with w1 to w6 and a document with k1 to k4.]

Methodology - Topic Extraction
Problem
[Figure labels: Top 5; Search; TF-IDF; Top 10 terms; Insert.]

Methodology - Graph-based Keyword Ranking: Existing Approaches
• PageRank
• TextRank

Summary

Method     Target      Edge weight      Node weight
PageRank   web pages   not considered   not considered
TextRank   words       considered       not considered
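The ranking formulas on this slide did not survive extraction. As a rough illustration of edge-weighted ranking, the sketch below follows the published TextRank recurrence, WS(Vi) = (1 - d) + d * Σj (wji / Σk wjk) * WS(Vj). The function names, the window size, the damping factor d = 0.85, and the fixed iteration count are assumptions for this example, not values taken from the slides.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Symmetric co-occurrence graph: edge weight = co-occurrence count within the window."""
    weights = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                weights[(word, other)] += 1.0
                weights[(other, word)] += 1.0
    return weights


def textrank(weights, d=0.85, iterations=50):
    """Edge-weighted ranking; every node starts at 1.0 and has no weight of its own."""
    out_sum = defaultdict(float)   # total outgoing edge weight per node
    incoming = defaultdict(list)   # (source, weight) pairs per target node
    for (u, v), w in weights.items():
        out_sum[u] += w
        incoming[v].append((u, w))
    nodes = set(out_sum) | set(incoming)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - d) + d * sum(w / out_sum[u] * score[u] for u, w in incoming[n])
            for n in nodes
        }
    return score


graph = build_cooccurrence_graph("a b a c".split())
scores = textrank(graph)
```

With all edge weights equal, the same recurrence reduces to the unweighted PageRank case, which matches the "not considered" entries in the summary table.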

Methodology - Graph-based Keyword Ranking: Proposed Approach
• NE-Rank

Summary

Method     Target      Edge weight      Node weight
PageRank   web pages   not considered   not considered
TextRank   words       considered       not considered
NE-Rank    words       considered       considered
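The NE-Rank formula itself is not recoverable from this slide. The sketch below shows one way a node weight could enter an edge-weighted recurrence, using the assumed update R(v) = (1 - d) * W(v) + d * W(v) * Σj (wjv / Σk wjk) * R(vj). That update, the names `ne_rank` and `node_weight`, and the example weights (standing in for something like a normalized TF-IDF value per word) are all assumptions for illustration.

```python
from collections import defaultdict


def ne_rank(edge_weights, node_weight, d=0.85, iterations=50):
    """Rank nodes using both edge weights and a per-node weight (assumed update rule)."""
    out_sum = defaultdict(float)   # total outgoing edge weight per node
    incoming = defaultdict(list)   # (source, weight) pairs per target node
    for (u, v), w in edge_weights.items():
        out_sum[u] += w
        incoming[v].append((u, w))
    nodes = set(out_sum) | set(incoming)
    score = {n: node_weight.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        score = {
            n: node_weight.get(n, 0.0)
            * ((1 - d) + d * sum(w / out_sum[u] * score[u] for u, w in incoming[n]))
            for n in nodes
        }
    return score


# Fully connected toy graph with identical edges: only the node weights differ.
edges = {(a, b): 1.0 for a in "xyz" for b in "xyz" if a != b}
scores = ne_rank(edges, {"x": 1.0, "y": 0.5, "z": 0.1})
```

Because every edge in the toy graph is identical, any difference in the final scores comes from the node weights alone, which is the property the summary table attributes to NE-Rank.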


Methodology - Hashtag Titles
[Figure: hashtag titles are extracted from the topical dataset, split into words using an English dictionary with frequencies, and recorded.]
Strengthening strategy: in-degree boosted 5%.
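Splitting a hashtag with a frequency dictionary can be sketched as a small dynamic program that picks the segmentation with the highest total log-frequency. The slide does not say which algorithm was used, so this approach, the function name `split_hashtag`, and the toy frequencies are assumptions.

```python
import math


def split_hashtag(tag, word_freq):
    """Split a hashtag into dictionary words, maximizing the sum of log relative frequencies."""
    text = tag.lstrip("#").lower()
    total = sum(word_freq.values())
    # best[i] holds (score, words) for the best segmentation of text[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in word_freq and best[start][1] is not None:
                score = best[start][0] + math.log(word_freq[piece] / total)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [piece])
    return best[-1][1]  # None when the hashtag cannot be fully segmented


# Hypothetical frequency dictionary, for illustration only.
freq = {"social": 50, "media": 40, "so": 100, "me": 80}
words = split_hashtag("#SocialMedia", freq)
```

Returning `None` for hashtags that cannot be fully segmented lets a caller keep the original tag as a single token instead of guessing.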


Methodology - Candidate Keyphrase Generation
[Figure: keywords ranked in descending order (1. management, 2. business, 3. customer, 4. staff, 5. finance); their positions are found in the Twitter set to form keyphrases, for example "Information management".]

Methodology - Keyphrase Ranking
[Figure labels: keyphrases; phrases list; score; filtering; less than 5 times; summarize.]

Hashtag usage
Another study measures sentiment in hashtags. The use of hashtags as keyword annotations makes them of great interest to our work.

Experiment - Dataset and Preprocessing

Dataset

              Tweets   Tokens    Hashtags   Hashtag frequency
Twitter set   31,227   244,139   4,079      40,674

Preprocessing
• Remove non-English tweets
• Remove noise such as URLs, emoticons, and smileys
• Transform slang and abbreviations
• Check the vocabulary against an English dictionary for out-of-vocabulary (OOV) words
• Apply a POS tagger
• Remove stopwords
• LDA: 500 iterations, 30 topics
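A few of the preprocessing steps can be sketched with the standard library alone. The regular expressions, the tiny slang map, and the stopword set below are placeholder assumptions; language filtering, dictionary lookup, POS tagging, and LDA are left out because the slides do not name the tools used.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Simple western-style emoticons such as :) ;-( =D (assumed pattern, far from exhaustive).
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][\-']?[\)\(\]\[dDpP](?!\w)")
SLANG = {"u": "you", "gr8": "great"}              # hypothetical slang map
STOPWORDS = {"the", "a", "is", "to", "and"}       # hypothetical stopword list


def preprocess(tweet):
    """Strip URLs and emoticons, lowercase, expand slang, drop stopwords."""
    text = URL_RE.sub(" ", tweet)
    text = EMOTICON_RE.sub(" ", text)
    tokens = re.findall(r"#?\w+", text.lower())
    tokens = [SLANG.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]


tokens = preprocess("The launch is gr8 :) http://t.co/abc #NASA")
```

Hashtags keep their leading `#` here so that a later step can still tell them apart from ordinary words.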


Experiment - Evaluation Metrics
• Precision
• Bpref
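The metric formulas on this slide are not recoverable. The sketch below assumes precision over the top k extracted keyphrases and the Bpref variant commonly used in keyphrase extraction work, Bpref = (1/|R|) * Σr (1 - |n ranked higher than r| / |M|), where R is the set of correct keyphrases among the M extracted ones and n is an incorrect keyphrase. Both definitions and the function names are assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked keyphrases that are judged correct."""
    if k <= 0:
        return 0.0
    return sum(1 for phrase in ranked[:k] if phrase in relevant) / k


def bpref(ranked, relevant):
    """Penalize each correct keyphrase by the share of incorrect ones ranked above it."""
    m = len(ranked)
    correct_count = 0
    incorrect_above = 0
    total = 0.0
    for phrase in ranked:
        if phrase in relevant:
            correct_count += 1
            total += 1 - incorrect_above / m
        else:
            incorrect_above += 1
    return total / correct_count if correct_count else 0.0


ranked = ["a", "x", "b", "y"]
relevant = {"a", "b"}
p2 = precision_at_k(ranked, relevant, 2)   # 0.5
b = bpref(ranked, relevant)                # 0.875
```

In the example, "a" has no incorrect phrase above it and "b" has one out of four extracted phrases above it, giving (1 + 0.75) / 2 = 0.875.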

Experiment - Results

Conclusions
• An experimental evaluation demonstrates the potential and validity of both approaches.

Comments
• Advantages
  – The keyphrase score does not rely only on co-occurrence.
• Applications
  – Automatic keyphrase extraction.
