Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Aspect Extraction with Automated Prior Knowledge Learning
Zhiyuan (Brett) Chen Arjun Mukherjee
Bing Liu
Aspect Extraction
Extracting aspect terms
Aspect Terms
This camera takes beautiful pictures but its price is higher than $200.
Aspect Terms
This camera takes beautiful pictures but its price is higher than $200.
Aspect Extraction
Grouping terms into categories
Extracting aspect terms
Grouping
PicturePhotoImageAspect 1 Aspect 2
PriceCostMoney
Aspect Extraction
Input: A review collection
Output: A set of aspects(with top aspect terms). Price
CheapCostMoneyPricy
BatteryLifeChargeAAAHour
Aspect 1 Aspect 2
Topic Models to Extract Aspects (e.g., Chen et al., 2013; Kim et al., 2013; Lazaridou et al., 2013; Mukherjee and Liu, 2012; Moghaddam and Ester, 2011; Sauper et al., 2011; Lin and He, 2009; Titov and McDonald, 2008; Lu and Zhai, 2008;)
Perform both extracting and grouping
A topic is basically an aspect
Traditional Modeling Flow
M DocsDomain 1
Traditional Modeling Flow
T Topics
LDA
M DocsDomain 1
Traditional Modeling Flow
T Topics
LDA
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
Traditional Modeling Flow
T Topics
LDA
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
…
Can we improve these topics by using them only?
Can we improve these topics by using them only? Fully automatic No other resources No human intervention
…
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Our Proposed Algorithm
…
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
Our Proposed Algorithm
…
Knowledge BaseLearn Knowledge Automatically
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
Our Proposed Algorithm
…
Knowledge BaseLearn Knowledge Automatically
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
Our Proposed Algorithm
a) Existing Domains
AKL (Automated !Knowledge LDA)!
…
Knowledge BaseLearn Knowledge Automatically
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
M DocsDomain 1
T Topics
AKL
M DocsDomain 2
T Topics
AKL
M DocsDomain N
T Topics
AKL
Our Proposed Algorithm
a) Existing Domains
…
Knowledge BaseLearn Knowledge Automatically
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
Our Proposed Algorithm
b) New Domain
…
Knowledge BaseLearn Knowledge Automatically
M DocsDomain 1
T Topics
LDA
M DocsDomain 2
T Topics
LDA
M DocsDomain N
T Topics
LDA
Topic Base
M DocsDomain N+1
T Topics
AKL
Our Proposed Algorithm
b) New Domain
Why don’t we merge documents from different domains and run LDA?
Run LDA on Merged Data
Number of Topics
Topic belongs to which domain
Scalability
M Docs
M Docs M Docs
M Docs
M Docs
Run LDA on Merged Data
Run LDA on Merged Data
Run LDA
Our Proposed Algorithm Run LDA Run LDA Run LDA
Run LDA
Run LDA
T Topics
T Topics T Topics
T Topics
T Topics
Our Proposed Algorithm
Our Proposed Algorithm
Learn Knowledge
Knowledge
Our Proposed Algorithm
Knowledge Knowledge
Knowledge
Knowledge
Our Proposed Algorithm Run AKL Run AKL Run AKL
Run AKL
Run AKL
Multiple Senses
KnowledgeReliability
Learn Knowledge Automatically
Multiple Senses
KnowledgeReliability
Learn Knowledge Automatically
{Light, Bright}{Light, Luminance}
{Light, Weight}{Light, Heavy}
Light
Multiple Senses
Existing Models with Multiple Senses
Assume single senseDF-‐‑‒LDA (Andrzejewski et al., 2009)
User specified multiple sensesMC-‐‑‒LDA (Chen et al., 2013)
Automatically distinguish senses when extracting knowledge
Multiple Senses
KnowledgeReliability
Topic Clustering
Learn knowledge Automatically
Topic Clustering
A topic represents words with similar meaning (but noisy)
Group topics with similar sense into one cluster
Different senses of a word should be split into different clusters
Multiple Senses
KnowledgeReliability
Topic Clustering
Learn knowledge Automatically
Topic Overlapping
Every product domain has price.
Most electronic domains have battery.
Some electronic domains share screen.
Example BatteryLifePictureCharge
BatteryPriceLifeSize
BatteryChargeAAAScreen
D1 D2 D3
Example
BatteryLifePictureCharge
BatteryPriceLifeSize
BatteryChargeAAAScreen
D1 D2 D3
Two words together at least 2 times
Example
BatteryLifePictureCharge
BatteryPriceLifeSize
BatteryChargeAAAScreen
D1 D2 D3
Two words together at least 2 times
{Battery, Life} and {Battery, Charge}
Multiple Senses
KnowledgeReliability
Topic Clustering
Frequent Itemset Mining
Learn knowledge Automatically
Frequent Itemset Mining (FIM)
Each topic is a transaction
Find frequent patterns satisfy minimum support thresholds
Each pattern contains 2 terms
Knowledge Representation
In the form of knowledge clusters (KC)
Each KC has a list of frequent 2-‐‑‒patterns
KC1: {battery, life}, {battery, charge}, {battery, hour}, {charge, hour}
AKL (Automated Knowledge LDA)
Incorporate Knowledge
Wrong Know. Towards Domain
AKL Model
Add variable cIncorporate Knowledge
Wrong Know. Towards Domain
AKL Plate Notation
c: knowledge cluster
AKL Plate Notation
c: knowledge cluster
AKL Plate Notation
c: knowledge cluster
AKL Plate Notation
c: knowledge cluster
AKL Model
Add variable c
GPU Model
Incorporate Knowledge
Wrong Know. Towards Domain
Topic 0
price
LDA with SPU (Simple Pólya Urn Model)
Topic 0
price price
LDA with SPU (Simple Pólya Urn Model)
Topic 0
price
AKL with GPU (Generalized Pólya Urn Model)
Topic 0
price price
cheap
{price, cheap}AKL with GPU (Generalized Pólya Urn Model)
AKL Model
Add variable c
GPU Model
Incorporate Knowledge
Wrong Know. Towards Domain
Wrong Know. Towards Domain Wrong because of TM mistakes{Price, Picture}
Wrong towards a particular domain {Light, Bright}{Light, Weight}
AKL Model
Add variable c
GPU Model
Co-‐‑‒Document Frequency Ratio
Incorporate Knowledge
Wrong Know. Towards Domain
Co-Document Frequency Ratio
Co-Document Frequency Ratio
Estimated in the current domain
Co-Document Frequency Ratio
Estimated in the current domain
{Price, Cheap}{Price, Image}
Evaluation
Evaluation 36 product domains. Each domain:1000 Reviews15 Topics
EvaluationHuman
Objective
Model Comparison
LDA (Blei et al., 2003)
GK-‐‑‒LDA (Chen et al., 2013)
MC-‐‑‒LDA (Chen et al., 2013)
Model Comparison
LDA (Blei et al., 2003)
GK-‐‑‒LDA (Chen et al., 2013)
Feed them with the knowledge from our algorithm
MC-‐‑‒LDA (Chen et al., 2013)
Objective Evaluation
-1510
-1490
-1470
-1450
-1430
0 1 2 3 4 5 6
Topi
c C
oher
ence
AKL GK-LDAMC-LDA LDA
Example Aspects
Human Evaluation
0.6
0.7
0.8
0.9
1.0
Camera Computer Headphone GPS
Precision
@ 5
AKL GK-LDA MC-LDA LDA
Human Evaluation
0.6
0.7
0.8
0.9
1.0
Camera Computer Headphone GPS
Precision
@ 1
0
AKL GK-LDA MC-LDA LDA
Number of Topic Clusters
-1510-1490-1470-1450-1430
20 30 40 50 60 70Top
ic C
oher
ence
#Clusters
Conclusions
To extract better aspects
Learn knowledge automatically
AKL: Leverage automated knowledge
Multiple Senses
KnowledgeReliability
Learn knowledge Automatically
Topic Clustering
Frequent Itemset Mining
AKL Model
Add variable c
GPU Model
Co-‐‑‒Document Frequency Ratio
Incorporate Knowledge
Wrong Know. Towards Domain
Q&A