Intelligent Database Systems Lab
Presenter: WU, MIN-CONG
Authors: Abdelghani Bellaachia
and Mohammed Al-Dhelaan
2012, WIIAT
NE-Rank: A Novel Graph-based Keyphrase Extraction in Twitter
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• When a lexical graph is used to represent text, it is possible to
include a weight for each word, so that the ranking is measured more
accurately than by relying only on co-occurrence in Twitter.
Objectives
• For the task of extracting topical keyphrases, we start by
proposing a novel unsupervised graph-based keyword
ranking method, called NE-Rank, that considers word
weights in addition to edge weights when
calculating the ranking.
Methodology- System Overview
[Diagram: Twitter set -> θ (document over topics k1, k2, k3, k4) -> topical sub-datasets -> NE-Rank and hashtag titles -> candidate keyphrases]
Methodology- Topic Extraction
[Diagram: document (k1, k2, k3, k4) and topic (w1, w2, w3, w4, w5, w6); panel "For this paper": topic (w1, w2, w3, w4, w5, w6) and document (k1, k2, k3, k4)]
Methodology- Topic Extraction
Problem
Top 5 Search TF-IDF
Top 10 terms
Insert
Methodology- Graph-based Keyword Ranking: existing approaches
PageRank
TextRank
Summary
Method     Target   Edge weight      Node weight
PageRank   web      not considered   not considered
TextRank   word     considered       not considered
Methodology- Graph-based Keyword Ranking: proposed approach
NE-Rank
Summary
Method     Target   Edge weight      Node weight
PageRank   web      not considered   not considered
TextRank   word     considered       not considered
NE-Rank    word     considered       considered
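The NE-Rank formula on this slide was an image and did not survive extraction. The sketch below is one plausible reading of "edge weights plus node weights": it assumes the score of a word is R(v) = (1 - d) * W(v) + d * W(v) * sum over incoming neighbours u of (w(u,v) / total outgoing weight of u) * R(u), where W(v) is a node weight such as TF-IDF. The function name `ne_rank`, the argument layout, the damping factor 0.85 and the fixed iteration count are all assumptions made for illustration, not details taken from the slides.

```python
from collections import defaultdict


def ne_rank(edges, node_weight, d=0.85, iterations=50):
    """Iteratively score words using edge weights and node weights.

    edges: dict mapping (u, v) -> weight of the directed edge u -> v
           (hypothetical input format; co-occurrence counts, for example).
    node_weight: dict mapping word -> node weight (for example TF-IDF).
                 Words missing from it default to weight 1.0.
    """
    out_sum = defaultdict(float)   # total outgoing edge weight per node
    incoming = defaultdict(list)   # v -> list of (u, weight of u -> v)
    nodes = set(node_weight)
    for (u, v), w in edges.items():
        out_sum[u] += w
        incoming[v].append((u, w))
        nodes.update((u, v))

    rank = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Edge-weighted vote from every neighbour pointing at v.
            votes = sum(w / out_sum[u] * rank[u] for u, w in incoming[v])
            w_v = node_weight.get(v, 1.0)
            # Assumed NE-Rank form: the node weight scales both terms.
            new_rank[v] = (1 - d) * w_v + d * w_v * votes
        rank = new_rank
    return rank


# Tiny co-occurrence graph: a - b - c, stored as directed edges both ways.
example_edges = {("a", "b"): 1.0, ("b", "a"): 1.0,
                 ("b", "c"): 1.0, ("c", "b"): 1.0}
example_scores = ne_rank(example_edges, {"a": 0.2, "b": 0.5, "c": 0.8})
```

With every node weight set to 1.0 this reduces to an edge-weighted TextRank-style update, which matches the summary table: NE-Rank differs from TextRank only in taking node weights into account.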
Methodology- Hashtag Titles
[Diagram: topical dataset -> extract hashtag titles -> split into words using an English dictionary with frequencies -> record]
Strengthening strategy: in-degree boosted by 5%
Methodology- Candidate Keyphrase Generation
Ranked keywords in descending order:
1. management
2. business
3. customer
4. staff
5. finance
[Diagram: find the positions of the ranked keywords in the Twitter set -> keyphrase, e.g. "Information management"]
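The slide only shows that ranked keywords are located by position in the tweets and turned into keyphrases. A minimal sketch of one way to do this, assuming (as in TextRank-style post-processing) that runs of adjacent keywords are merged into a single phrase; the function name `candidate_keyphrases` and the sample tweet are hypothetical.

```python
def candidate_keyphrases(tokens, keywords):
    """Merge runs of adjacent keywords in a token sequence into phrases.

    tokens: list of words from one tweet, in order.
    keywords: set of top-ranked keywords.
    Adjacency-based merging is an assumption, not confirmed by the slides.
    """
    phrases, run = [], []
    for tok in tokens:
        if tok in keywords:
            run.append(tok)          # extend the current run of keywords
        else:
            if run:                  # a non-keyword closes the current run
                phrases.append(" ".join(run))
            run = []
    if run:                          # flush a run that reaches the end
        phrases.append(" ".join(run))
    return phrases


example_tokens = "good information management helps business staff".split()
example_phrases = candidate_keyphrases(
    example_tokens, {"information", "management", "business", "staff"}
)
# example_phrases == ["information management", "business staff"]
```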
Methodology- Keyphrase Ranking
[Diagram: keyphrases -> phrases list -> score -> filtering -> summarize]
Hashtag usage
Another study measures sentiment in hashtags. The use of hashtags as keyword annotations makes them of great interest to our work.
fewer than 5 times
Experiment- Dataset and Preprocessing
Dataset
              Tweets   Tokens    Hashtags   Hashtag frequency
Twitter set   31,227   244,139   4,079      40,674
Preprocessing
• Remove non-English tweets
• Remove noise, e.g. URLs, emoticons, smileys
• Transform slang and abbreviations
• English dictionary: vocabulary / OOV (out-of-vocabulary)
• POS tagger
• Remove stopwords
LDA
• 500 iterations
• 30 topics
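Some of the preprocessing steps above can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline: the regular expressions, the `SLANG` table and the `STOPWORDS` set are tiny hypothetical stand-ins, and language filtering, the dictionary/OOV check, POS tagging and LDA are left out because they need external resources.

```python
import re

# Hypothetical, deliberately small resources for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")
SLANG = {"u": "you", "gr8": "great", "thx": "thanks"}
STOPWORDS = {"the", "a", "is", "to", "for", "and", "of"}


def preprocess(tweet):
    """Strip URLs and emoticons, normalise slang, drop stopwords."""
    text = URL_RE.sub(" ", tweet)          # remove URLs first
    text = EMOTICON_RE.sub(" ", text)      # then simple emoticons
    # Keep hashtags and mentions as single tokens.
    tokens = re.findall(r"[#@]?\w+", text.lower())
    tokens = [SLANG.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]


example_clean = preprocess("Thx for the gr8 service :) http://t.co/abc #business")
# example_clean == ["thanks", "great", "service", "#business"]
```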
Experiment- Evaluation Metrics
Precision
Bpref
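The metric formulas on this slide were images and are not preserved. The sketch below assumes precision is measured over the top k ranked keyphrases, and uses a Bpref variant common in keyphrase-extraction work: the average, over correct keyphrases, of 1 minus the fraction of incorrect keyphrases ranked above it, normalised by the number of extracted keyphrases M. The original information-retrieval definition of bpref normalises differently, so treat the exact form, and the function names, as assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked keyphrases that are correct."""
    if k <= 0:
        return 0.0
    hits = sum(1 for phrase in ranked[:k] if phrase in relevant)
    return hits / k


def bpref(ranked, relevant):
    """Binary preference: penalise correct phrases ranked below wrong ones.

    Assumed variant: (1/|R|) * sum over correct r of
    (1 - number of incorrect phrases ranked above r / M),
    where M = len(ranked) and R is the set of correct phrases in ranked.
    """
    m = len(ranked)
    total, wrong_seen, correct_count = 0.0, 0, 0
    for phrase in ranked:
        if phrase in relevant:
            correct_count += 1
            total += 1 - wrong_seen / m
        else:
            wrong_seen += 1
    return total / correct_count if correct_count else 0.0


example_ranked = ["a", "x", "b", "y"]
example_relevant = {"a", "b"}
example_p2 = precision_at_k(example_ranked, example_relevant, 2)   # 0.5
example_bpref = bpref(example_ranked, example_relevant)            # 0.875
```

Unlike precision, this Bpref variant rewards placing correct keyphrases ahead of incorrect ones, so two rankings with the same precision can still receive different Bpref scores.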
Experiment- Results
Conclusions
• The potential and validity of both approaches have
been demonstrated through an experimental
evaluation.
Comments
• Advantages
– The keyphrase score does not rely only on co-occurrence.
• Applications
– Automatic keyphrase extraction.