Upload
iain
View
41
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. Dipanjan Das Carnegie Mellon University. Slav Petrov Google Research. June 21 ACL 2011. Part-of-Speech Tagging. Portland has a thriving music scene . . ADJ. NOUN. NOUN. NOUN. DET. VERB. - PowerPoint PPT Presentation
Citation preview
Unsupervised Part-of-Speech Tagging
with Bilingual Graph-Based Projections
June 21ACL 2011
Slav PetrovGoogle Research
Dipanjan DasCarnegie Mellon University
Part-of-Speech Tagging
Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .
2
Supervised POS Tagging
ArabicBasque
BulgarianCatalanChinese
CzechDanishEnglishFrench
GermanGreek
HungarianItalian
JapaneseKorean
PortugueseRussianSloveneSpanishSwedishTurkish
84 86 88 90 92 94 96 98 100
96.9
93.797.8
98.2
93.499.1
96.4
96.8
96.7
98.1
97.5
95.6
95.8
98.7
97.5
96.8
96.8
94.6
96.3
94.7
89.1
POS Accuracy
Supervised setting: average accuracy is 96.2% 3
TnT (Brants, 2000)
Resource-Poor Languages
Several major languages with no or little annotated data
OriyaIndonesian-Malay
Azerbaijani
e.g.
See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size
Haitian
However, lots of parallel and
unannotated data!Basic NLP tools like POS
tagging essential for development of
language technologies
4
PunjabiVietnamese
Polish32 million37 million20 million
Native speakers
7.7 million
109 million69 million40 million
(Nearly) Universal Part-of-Speech Tags
VERB DETNOUN CONJPRON NUMADJ PRTADV .ADP X
5
(Nearly) Universal Part-of-Speech Tags
Example Penn Treebank tag maps:
NN NOUNNNP NOUNNNPS NOUNNNS NOUN
PRP PRONPRP$ PRONWP PRONWP$ PRON
np NOUNnc NOUN
Example Spanish Treebank tag maps:p0 PRONpd PRONpe PRONpi PRONpn PRON
pp PRONpr PRONpt PRONpx PRON
See Petrov, Das and McDonald (2011)
(Nearly) Universal Part-of-Speech Tags
Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .
Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET ADJ ADJ NOUN .
প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত | NOUN NOUN ADP NOUN NOUN ADJ ADJ .
7
Supervised Universal POS Tagging
ArabicBasque
BulgarianCatalanChinese
CzechDanishEnglishFrench
GermanGreek
HungarianItalian
JapaneseKorean
PortugueseRussianSloveneSpanishSwedishTurkish
84 86 88 90 92 94 96 98 100
96.9
93.797.8
98.2
93.499.1
96.4
96.8
96.7
98.1
97.5
95.6
95.8
98.7
97.5
96.8
96.8
94.6
96.3
94.7
89.1
POS Accuracy
Less variance in accuracy than fine tags 8
TnT (Brants, 2000)
State of the Art in Unsupervised POS Tagging
9
Unsupervised Part-of-Speech Tagging
Portland hat eine prächti
g gedeihende Musikszene .
? ? ? ? ? ? ?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
: observation sequence: state sequence
10Merialdo (1994)
Unsupervised Part-of-Speech Tagging
Portland hat eine prächti
g gedeihende Musikszene .
? ? ? ? ? ? ?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
one of the 12 coarse
tags
: observation sequence: state sequence
11Merialdo (1994)
Unsupervised Part-of-Speech Tagging
Portland hat
? ?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
transition multinomials
: observation sequence: state sequence
12Merialdo (1994)
Unsupervised Part-of-Speech Tagging
Portland hat
? ?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
emission multinomials
: observation sequence: state sequence
13Merialdo (1994)
Unsupervised Part-of-Speech Tagging
Portland hat eine prächti
g gedeihende Musikszene .
? ? ? ? ? ? ?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7EM-HMM
Poor average result
14Johnson (2007)
Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)
with locally normalized log-linear models
: observation sequence
Portland hat
? ? emission multinomials
: state sequence
15
Berg-Kirkpatrick et al. (2010)
Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)
with locally normalized log-linear models
: observation sequence
Portland hat
? ? emission multinomials
suffixhyphen
capital lettersnumbers
...: state sequence
16
Berg-Kirkpatrick et al. (2010)
Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)
with locally normalized log-linear models
: observation sequence
Portland hat
? ? emission multinomials
suffixhyphen
capital lettersnumbers
...
Estimated using gradient-based methods
: state sequence
17
Berg-Kirkpatrick et al. (2010)
Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)
with locally normalized log-linear models
Portland hat
? ? emission multinomials
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1
65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0EM-HMM
Feature-HMM
Estimated using gradient-based methods
Improvements across all languages
18
Berg-Kirkpatrick et al. (2010)
Portland hat eine prächti
g gedeihende Musikszene .
NOUN VERB
PRONDETADJNUM
ADJADV ADJ NOUN .
Unsupervised POS Tagging with Dictionaries
Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold
tags
19
Portland hat eine prächti
g gedeihende Musikszene .
NOUN VERB
PRONDETADJNUM
ADJADV ADJ NOUN .
Unsupervised POS Tagging with Dictionaries
Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold
tags
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
93.1
94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7
EM-HMMFeature-HMM
w/ gold dictionary
Average result close to supervised accuracy! 20
For most languages, access to high-quality
tag dictionaries is not realistic.
Ideas: 1) Use supervision in resource-rich
languages2) Use parallel data
3) Construct projected tag lexicons
21
Morphologically rich languages only have base
forms in dictionaries
Bilingual Projection
Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .
automatic labels from supervised tagger, 97% accuracy
22
Bilingual Projection
Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .
Portland hat eine prächtig gedeihende Musikszene .
Automatic unsupervised alignments from translation data(available for more than 50 languages)
23
Bilingual Projection
Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .
Portland hat eine prächtig gedeihende Musikszene .
Idea 1: direct projectionunaligned word
NOUN(most frequent tag)
24Yarowsky and Ngai
(2001)
Bilingual Projection
Idea 1: direct projection
Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET NOUN ADJ NOUN .
+more projected tagged sentences
supervised training
tagger
25
(Brants, 2000)
Yarowsky and Ngai (2001)
Bilingual Projection
Idea 1: direct projection
26
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
73.6
77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8
EM-HMM
Directprojection
Feature-HMM
Yarowsky and Ngai (2001)
Bilingual Projection
Idea 1: direct projection
27
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
73.6
77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8
EM-HMM
Directprojection
Feature-HMM
Yarowsky and Ngai (2001)
consistent improvements over unsupervised models
Bilingual ProjectionIdea 2: lexicon projection
28
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland
VERBhas
DETa
ADJthriving
NOUNmusic
NOUNscene
.
.
Portland
hat
eine
prächtig
gedeihende
Musikszene
.
29
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland Portlan
d
ADJthrivinggedeihe
ndeprächtigVERB
has
hat
DETa
eine
NOUNsceneMusiksze
ne
NOUNmusic
.
..
ignore unaligned word
30
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland Portlan
d
ADJthrivinggedeihe
nde
VERBhas
hat
DETa
eine
NOUNsceneMusiksze
ne
NOUNmusic
.
..
Bag of alignments
31
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland Portlan
d
ADJthrivinggedeihe
nde
VERBhas
hat
eine
NOUNsceneMusiksze
ne
NOUNmusic
.
..
ADJNOUN
XNUM
PRONVERB
DET
DETa
32ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland Portlan
d
ADJthrivinggedeihe
nde
VERBhas
hat
eine
NOUNscene
NOUNmusic
.
..
DETa NUMon
e
PRONone
Musikszene
33ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
Bilingual ProjectionIdea 2: lexicon projection
NOUNPortland Portlan
d
ADJthrivinggedeihe
nde
VERBhas
hat
eine
NOUNscene
NOUNmusic
.
..
DETa NUMon
e
PRONone
Musikszene
VERBthriving
34ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DETADJ
NOUNX
NUMPRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
Bilingual ProjectionIdea 2: lexicon projection
Portland
gedeihende
hat
eine
Musikszene
.
After scanning all the parallel data:
= probability of a tag given a word
35
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
Bilingual ProjectionIdea 2: lexicon projection
Feature HMM constrained with projected dictionary
Improvements over simple projection for majority of the languages
36
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0
78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.3
EM-HMM
DirectprojectionProjectedDictionar
y
Feature-HMM
Can coverage be improved?Idea:Projected lexicon
expansion and refinement using
a lot of unlabeled data
No information about unaligned words
37
Portland hat eine prächtig gedeihende Musikszene .
Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .
Brief Overview:Graph-Based Learning
with Labeled and Unlabeled Data
38
labeled datapoints unlabeled datapoints
supervised label distributions
distributions to be found
Zhu, Ghahramani and Lafferty (2003) 39
0.9
0.01
0.8
0.9
0.1
= symmetric weight matrix
0.05
0.9
0.01
0.8
0.9
0.1
Label Propagation
Zhu, Ghahramani and Lafferty (2003) 40
0.05
0.9
0.01
0.8
0.9
0.1
set of distributions over unlabeled vertices
Zhu, Ghahramani and Lafferty (2003) 41
Label Propagation
0.05
0.9
0.01
0.8
0.9
0.1
unlabeled vertices
Zhu, Ghahramani and Lafferty (2003) 42
Label Propagation
0.05
0.9
0.01
0.8
0.9
0.1
brings the distributions of similarvertices closer
Zhu, Ghahramani and Lafferty (2003) 43
Label Propagation
0.05
0.9
0.01
0.8
0.9
0.1
brings the distributions of uncertain neighborhoods
close to the uniform distribution
Size of the label set
Zhu, Ghahramani and Lafferty (2003) 44
Label Propagation
0.05
0.9
0.01
0.8
0.9
0.1
Iterative updates for optimization
Zhu, Ghahramani and Lafferty (2003) 45
Label Propagation
0.05
How can label propagation help? For a language:
1) Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence
statistics
2) Label distribution at each vertex tag distribution over the trigram’s middle word
Subramanya, Petrov and Pereira (2010)
Idea 3: Graph-Based Projections
46
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,
Example Graph in German
47
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,
Example Graph in German
48
NOUN
VERB
How can label propagation help? For a target language:
1) Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence
statistics
2) Label distribution at each vertex tag distribution over the trigram’s middle word3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments
Idea 3: Graph-Based Projections
49
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJBilingual Graph
50
How can label propagation help?
For a target language:
3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments
5) Run first stage of label propagation• Source language target language
Idea 3: Graph-Based Projections
51
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
First Stage of Label Propagation
52
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
First Stage of Label Propagation
VERBNOUN
53
ADJADV
ADJADV
How can label propagation help?
For a target language:
3) Plug in auto-tagged words from a source language
4) Links between source and target language units are word alignments
5) Run first stage of label propagation• Source language target language
6) Run second stage of label propagation • within target language vertices• graph objective function with squared
penalties
Idea 3: Graph-Based Projections
54
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
Second Stage of Label Propagation
VERBNOUN
55
ADJADV
ADJADV
ADJist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
Second Stage of Label Propagation
VERBNOUN
ADJ
VERBNOUN
56
ADJADV
ADJADV
ADJist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
Second Stage of Label Propagation
VERBNOUN
ADJ
VERBNOUN
VERB
VERBNOUN
57
ADJADV
ADJADV
ADJist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen proschlechtes Essen
und
zum Essen niederlassen
zu realisieren ,zu essen ,
zu stecken , zu erreichen ,eat
food
eateating
NOUNVERB
VERBVERB
goodADJ
nicelyADV
fineADJ
importantADJ
Second Stage of Label Propagation
VERBNOUN
ADJ
VERBNOUN
VERB
VERBNOUN
58
ADJADV
ADJADV
Continues till convergence...
fein lebhafter realisieren
Idea 3: Graph-Based Projections
Portland
gedeihende
hat
eine
Musikszene
.
End result?
59
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
ADVNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
fein lebhafter realisieren
Idea 3: Graph-Based Projections
Portland
gedeihende
hat
eine
Musikszene
.
60
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
ADJNOUN
ADVNUM
PRONVERB
DET
ADJNOUN
XNUM
PRONVERB
DET
A larger set of tag distributions better and larger dictionary
Idea 3: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.383.2
79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4
EM-HMM
DirectprojectionProjectedDictionar
yGraph-BasedProjections
Feature-HMM
Idea 3: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
Danish
Dutch German
Greek Italian Portuguese
Spanish
Swedish
Average
68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0
73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.383.2
79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4
EM-HMM
Feature-HMM
DirectprojectionProjectedDictionar
yGraph-BasedProjections
93.1 94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7w/ gold dictionary
96.9 94.9 98.2 97.8 95.8 97.2 96.8 94.8 96.6supervised62
63
Idea 3: Graph-Based Projections
Lexicon Expansion
Danish
Dutch
German
Greek
Italian
Portu
guese
Span
ish
Swed
ish0
20000400006000080000
100000120000140000
Projected Dictionary Graph-Based Projections
thousandsof words
Concluding Notes• Soft expansion of lexicon using parallel
data and supervision in a resource rich language–Graph-based learning helps in almost all cases
• Reasonably accurate POS taggers without direct supervision• Traditional evaluation of unsupervised POS
taggers done using greedy metrics that use labeled data–Our presented models avoid these evaluation methods
• Practically no hyperparameter tuning–except a threshold parameter for dictionary construction
64
Future Directions• Scaling up the number of nodes in the
graph from 2M to billions may help create larger lexicons
• Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
• Inclusion of multiple languages in the graph may further improve results–Label propagation in one huge multilingual graph
65
66
Projected Data
http://code.google.com/p/pos-projection/
Available at:
Portland has a thriving music scene .
NOUNADJADJ
Portland hat eine prächtig gedeihende Musikszene .
প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত |
NOUN VERB DET ADJ NOUN NOUN .
Portland tiene una escena musical vibrante .
波特兰 有 一个 生机勃勃的 音乐 场景
Portland a une scène musicale florissante .
ADJNOUNNOUN NOUN
ADP
.NOUN VERB DET ADJ
ADJ
.NOUN VERB DET ADJ NOUN NOUNNOUN VERB DET NOUN
ADJ
ADJ .
Questions?
67