Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance. Shay B. Cohen, Dipanjan Das, Noah A. Smith, Carnegie Mellon University. EMNLP 2011, July 27. Goal: learn linguistic structure for a language without any labeled data in that language.
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Shay B. Cohen, Dipanjan Das, Noah A. Smith
Carnegie Mellon University
EMNLP 2011, July 27
Goal
Learn linguistic structure for a language without any labeled data in that language.
Part-of-speech tagging: The/DET Skibo/NOUN Castle/NOUN is/VERB close/ADJ by/ADP ./.
Dependency parsing: [figure: dependency tree over the same sentence]
EMNLP 2011, Cohen, Das and Smith (2011)

Multilingual Unsupervised Learning
Two dimensions: using parallel data vs. no parallel data (hard), and supervision (annotated data) in source language(s) vs. joint learning for multiple languages.
Prior work placed along these dimensions: Yarowsky and Ngai (2001); Xi and Hwa (2005); Smith and Eisner (2009); Das and Petrov (2011); McDonald et al. (2011); Snyder et al. (2009); Naseem et al. (2010); Cohen and Smith (2009); Berg-Kirkpatrick and Klein (2010).
This work: no parallel data (the hard setting), with supervision in source language(s).
In a Nutshell
Example: Portuguese as the target language, Spanish and Italian as helper languages.
1. Coarse, universal parameters are obtained for each helper language (Spanish, Italian).
2. Interpolation (unsupervised training on unlabeled Portuguese data) yields coarse parameters of Portuguese.
3. Coarse-to-fine expansion and initialization.
4. Monolingual unsupervised training in Portuguese yields the final Portuguese parameters.
Assumptions for a given problem
1. The underlying model is generative and composed of multinomial distributions.
   e.g., an HMM (Merialdo, 1994) over "The Skibo Castle is close by" for POS tagging, or the DMV (Klein and Manning, 2004) over DET NOUN NOUN VERB ADJ ADP, attached to ROOT, for dependency parsing.
   In general, unlexicalized parameters look like $\theta_{k,i}$: the probability of the i-th event in the k-th multinomial of the model, e.g., the transition from ADJ (k) to NOUN (i).
   The lexicalized parameters take a similar form (there are no lexicalized parameters for the DMV).
   The probability of a sentence x with structure y is then
   $p(x, y) = \underbrace{\prod_{k}\prod_{i} \theta_{k,i}^{\,f_{k,i}(x,y)}}_{\text{unlexicalized}} \times \underbrace{\prod_{k'}\prod_{i'} \eta_{k',i'}^{\,g_{k',i'}(x,y)}}_{\text{lexicalized}}$
   where $f_{k,i}(x,y)$ is the number of times event i of multinomial k fires in the derivation, and $g_{k',i'}(x,y)$ is the analogous count for the lexicalized parameters $\eta$.
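A minimal sketch of assumption 1 for a first-order HMM, assuming transitions are the unlexicalized events (k = previous tag, i = next tag) and emissions are the lexicalized ones (k = tag, i = word). It counts how often each event fires in one derivation and scores the derivation as the sum of count times log-probability. The start symbol `"<s>"` and the function names are hypothetical; the tagged sentence is the Skibo Castle example.

```python
import math
from collections import Counter

def event_counts(words, tags, start="<s>"):
    """Count f_{k,i}: how often each event fires in one HMM derivation.

    Unlexicalized events are tag transitions (k = previous tag, i = next tag);
    lexicalized events are emissions (k = tag, i = word).
    """
    transitions = Counter()
    emissions = Counter()
    prev = start
    for word, tag in zip(words, tags):
        transitions[(prev, tag)] += 1
        emissions[(tag, word)] += 1
        prev = tag
    return transitions, emissions

def log_prob(transitions, emissions, theta, eta):
    """log p(x, y) = sum f_{k,i} log theta_{k,i} + sum g_{k,i} log eta_{k,i}."""
    lp = sum(c * math.log(theta[k][i]) for (k, i), c in transitions.items())
    lp += sum(c * math.log(eta[k][i]) for (k, i), c in emissions.items())
    return lp

words = ["The", "Skibo", "Castle", "is", "close", "by", "."]
tags = ["DET", "NOUN", "NOUN", "VERB", "ADJ", "ADP", "."]
transitions, emissions = event_counts(words, tags)
print(transitions[("NOUN", "NOUN")])  # 1 (Skibo -> Castle)
```

Working with counts rather than walking the sentence again makes the multinomial structure explicit: any generative model of this form is scored by the same dot product between event counts and log-parameters.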
2. Coarse, universal part-of-speech tags:
   NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, . (punctuation), X (other)
   For each language, there is a mapping from its treebank tagset to these coarse tags; applying it (coarse conversion) turns the treebank into a coarse treebank.
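A minimal sketch of coarse conversion, assuming sentences are lists of `(word, treebank_tag)` pairs. The mapping shown is a partial, illustrative fragment for English Penn Treebank tags, not a complete table, and sending unmapped tags to X is an assumption of this sketch.

```python
COARSE_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
               "ADP", "NUM", "CONJ", "PRT", ".", "X"}

# Partial, illustrative mapping for English Penn Treebank tags (not a complete table).
PTB_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "PRP": "PRON", "DT": "DET", "IN": "ADP",
    "CD": "NUM", "CC": "CONJ", "RP": "PRT", ".": ".",
}

def coarsen(sentence, mapping):
    """Coarse conversion of one sentence [(word, treebank_tag), ...].

    Tags missing from the (partial) mapping fall back to X; this fallback is an assumption.
    """
    return [(word, mapping.get(tag, "X")) for word, tag in sentence]

def coarsen_treebank(treebank, mapping):
    """Turn a treebank into a coarse treebank, sentence by sentence."""
    return [coarsen(sentence, mapping) for sentence in treebank]

# Sanity check: every target of the mapping is one of the 12 coarse tags.
assert set(PTB_TO_COARSE.values()) <= COARSE_TAGS
```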
3. Helper languages: for each helper language ℓ, its treebank is converted into a coarse treebank, and the unlexicalized parameters $\theta^{(\ell)}$ are estimated from it by maximum likelihood estimation (MLE).
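With fully observed coarse tags, MLE for a multinomial reduces to relative frequency. A minimal sketch for the HMM transition multinomials of one helper language, assuming a coarse treebank of `(word, coarse_tag)` sentences; the start and stop symbols and the absence of smoothing are choices of this sketch, not details taken from the slides.

```python
from collections import Counter, defaultdict

def mle_transitions(coarse_treebank, start="<s>", stop="</s>"):
    """Relative-frequency (MLE) estimate of HMM transition multinomials.

    coarse_treebank: list of sentences, each a list of (word, coarse_tag) pairs.
    Returns theta[k][i] = count(k -> i) / count(k -> anything); no smoothing.
    """
    counts = defaultdict(Counter)
    for sentence in coarse_treebank:
        prev = start
        for _, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
        counts[prev][stop] += 1
    theta = {}
    for k, row in counts.items():
        total = sum(row.values())
        theta[k] = {i: n / total for i, n in row.items()}
    return theta
```

The same relative-frequency recipe applies to the DMV's multinomials, with counts read off the treebank's dependency trees instead of tag bigrams.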
Multilingual Modeling
For a target language, each unlexicalized multinomial is a mixture of the helper languages' estimates:
$\theta_{k,i} = \sum_{\ell} \beta_{k,\ell}\, \theta^{(\ell)}_{k,i}$
where $\theta_k$ is the k-th multinomial in the model (say, the transitions from the ADJ tag in an HMM), $\theta^{(\ell)}_k$ is its estimate from the ℓ-th helper language, and $\beta_{k,\ell}$ is the mixture weight for the k-th multinomial for the ℓ-th helper language (the weights for each k sum to 1).

e.g., two helper languages, Spanish and Italian. Transition distributions ADJ → · of the two helpers and their mixture with weights 0.7 and 0.3:

ADJ →   Helper 1   Helper 2   Mixture (0.7, 0.3)
NOUN    0.27       0.25       0.26
VERB    0.12       0.11       0.12
ADJ     0.03       0.04       0.03
ADV     0.04       0.04       0.04
PRON    0.04       0.06       0.05
DET     0.03       0.04       0.03
ADP     0.25       0.26       0.25
NUM     0.01       0.00       0.01
CONJ    0.10       0.20       0.13
PRT     0.05       0.00       0.04
.       0.01       0.00       0.01
X       0.05       0.00       0.04

For the target language, the mixture weights are unknown and must be learned.
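The mixture formula can be checked against the ADJ → · example. A minimal sketch that mixes the two helper rows (as rounded on the slide) with weights 0.7 and 0.3; since the inputs are rounded to two decimals, the result matches the slide's mixture row only up to rounding. The names `mix` and `uniform_beta` are hypothetical.

```python
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
        "ADP", "NUM", "CONJ", "PRT", ".", "X"]

# ADJ -> . transition distributions of the two helper languages, as rounded on the slide.
HELPER_1 = [0.27, 0.12, 0.03, 0.04, 0.04, 0.03, 0.25, 0.01, 0.10, 0.05, 0.01, 0.05]
HELPER_2 = [0.25, 0.11, 0.04, 0.04, 0.06, 0.04, 0.26, 0.00, 0.20, 0.00, 0.00, 0.00]

def mix(helpers, beta):
    """theta_{k,i} = sum_l beta_{k,l} * theta^{(l)}_{k,i} for a single multinomial k."""
    if abs(sum(beta) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    n_events = len(helpers[0])
    return [sum(b * h[i] for b, h in zip(beta, helpers)) for i in range(n_events)]

def uniform_beta(n_helpers):
    """Uniform mixture weights, as in the 'Uniform' configurations (no weight learning)."""
    return [1.0 / n_helpers] * n_helpers

target = mix([HELPER_1, HELPER_2], [0.7, 0.3])
for tag, p in zip(TAGS, target):
    print(f"ADJ -> {tag}: {p:.3f}")
```

Because every helper row is a distribution and the weights are convex, the mixture is again a distribution; this is why the weights for each multinomial are constrained to sum to 1.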
Learning and Inference
Normal learning: estimate the target-language parameters $\theta_{k,i}$ directly.
Multilingual learning: the helper parameters $\theta^{(\ell)}_{k,i}$ are fixed! Only the mixture weights $\beta_{k,\ell}$ are learned.
Multilingual learning with EM: the E-step computes the expected number of times each event (k, i) is used in a derivation; the M-step re-estimates the mixture weights $\beta_{k,\ell}$ from these expected counts.
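The M-step formula itself did not survive extraction. As a hedged stand-in, the sketch below implements the standard EM update for the weights of a mixture whose components are fixed: treat the choice of helper language as a latent variable for every event, compute its posterior, and renormalize the count-weighted posteriors. It may differ in detail from the paper's exact M-step. The expected counts are assumed given (for an HMM they would come from forward-backward, for the DMV from inside-outside); `m_step_beta` is a hypothetical name.

```python
def m_step_beta(expected_counts, helper_params, beta):
    """One EM update of the mixture weights beta_{k, .} for a single multinomial k.

    expected_counts[i]: E[f_{k,i}], expected uses of event i (from the E-step).
    helper_params[l][i]: theta^{(l)}_{k,i}, fixed helper-language parameters.
    beta[l]: current mixture weights.
    """
    n_helpers = len(helper_params)
    new_beta = [0.0] * n_helpers
    for i, count in enumerate(expected_counts):
        joint = [beta[l] * helper_params[l][i] for l in range(n_helpers)]
        z = sum(joint)
        if count == 0 or z == 0:
            continue  # event unused, or impossible under every helper
        # Posterior over which helper generated event i, weighted by its expected count.
        for l in range(n_helpers):
            new_beta[l] += count * joint[l] / z
    total = sum(new_beta)
    return [b / total for b in new_beta] if total > 0 else list(beta)
```

In full EM the two steps alternate: after each update the mixed $\theta$ changes, so the E-step must recompute the expected counts before the next call.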
Learning and Inference: Multilingual Learning
What about feature-rich generative models? Berg-Kirkpatrick et al. (2010) use a locally normalized log-linear model, in which each multinomial is a softmax over weighted features.
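A minimal sketch of a locally normalized log-linear multinomial, $\theta_{k,i} = \exp(w \cdot \phi(k,i)) / \sum_{i'} \exp(w \cdot \phi(k,i'))$. The feature set (word identity, 3-letter suffix, capitalization) and the weights are hypothetical; the sketch only shows the parameterization, not how the paper combines it with the mixture weights or trains it.

```python
import math

def local_softmax(weights, features, k, events):
    """theta_{k,i} = exp(w . phi(k, i)) / sum_{i'} exp(w . phi(k, i')).

    weights: dict feature_name -> float (the shared weight vector w).
    features(k, i): dict feature_name -> value (the feature vector phi(k, i)).
    """
    scores = [sum(weights.get(f, 0.0) * v for f, v in features(k, i).items())
              for i in events]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {i: e / z for i, e in zip(events, exps)}

def emission_features(tag, word):
    """Hypothetical emission features: word identity, 3-letter suffix, capitalization."""
    return {
        f"word={word.lower()}|{tag}": 1.0,
        f"suffix={word[-3:].lower()}|{tag}": 1.0,
        f"cap={word[:1].isupper()}|{tag}": 1.0,
    }

vocab = ["Skibo", "Castle", "is", "close", "by"]
w = {"cap=True|NOUN": 2.0, "suffix=tle|NOUN": 1.0}
probs = local_softmax(w, emission_features, "NOUN", vocab)
```

Because features are shared across events, words that were never seen with a tag can still receive probability through their suffix or capitalization, which a plain multinomial cannot do.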
Multilingual Modeling, e.g., two helper languages: Spanish and Italian
Before learning, the mixture weights are unknown; the learned weights are 0.6237 and 0.3763.
Learned target distribution ADJ → ·:
NOUN 0.26, VERB 0.12, ADJ 0.03, ADV 0.04, PRON 0.05, DET 0.03, ADP 0.25, NUM 0.01, CONJ 0.13, PRT 0.04, . 0.01, X 0.04
Coarse-to-fine expansion (Learning and Inference, example for English)
Step 1: the coarse distribution ADJ → · (NOUN 0.26, VERB 0.12, ADJ 0.03, ADV 0.04, PRON 0.05, DET 0.03, ADP 0.25, NUM 0.01, CONJ 0.13, PRT 0.04, . 0.01, X 0.04) is copied as identical copies into the fine-grained multinomials JJ → ·, JJR → ·, and JJS → ·.
Step 2: within each copy, the probability of a coarse event is divided equally among the fine tags that map to it. E.g., in JJ → ·, NOUN (0.26) becomes NN, NNS, NNP, NNPS with 0.065 each, and VERB (0.12) becomes VB, VBD, VBG, VBN, VBP, VBZ with 0.02 each.
The resulting new, fine distributions serve as the initializer for monolingual unsupervised training.
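The two expansion steps can be sketched as follows, assuming coarse parameters stored as `coarse[K][I]` and a fine-to-coarse tag mapping. The function name is hypothetical; the test values are the slide's (NOUN 0.26 split over four fine tags gives 0.065 each, VERB 0.12 split over six gives 0.02 each).

```python
from collections import defaultdict

def expand_coarse_to_fine(coarse, fine_to_coarse):
    """Expand coarse multinomials theta[K][I] into fine ones theta[k][i].

    Step 1 (identical copies): each fine conditioning tag k gets the row of its coarse tag.
    Step 2 (equal division): a coarse event's probability is split equally among
    the fine tags that map to it.
    """
    children = defaultdict(list)  # coarse tag -> list of its fine tags
    for fine_tag, coarse_tag in fine_to_coarse.items():
        children[coarse_tag].append(fine_tag)
    fine = {}
    for k, coarse_k in fine_to_coarse.items():
        row = coarse.get(coarse_k, {})
        fine[k] = {
            i: row.get(coarse_i, 0.0) / len(children[coarse_i])
            for i, coarse_i in fine_to_coarse.items()
        }
    return fine

# Illustrative subset of the English mapping and of the coarse ADJ -> . row.
mapping = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
}
coarse = {"ADJ": {"NOUN": 0.26, "VERB": 0.12, "ADJ": 0.03}}
fine = expand_coarse_to_fine(coarse, mapping)
```

With a complete mapping, every coarse event is covered by at least one fine tag, so each fine row sums to the same total as its coarse row; equal division adds no preference among fine tags and leaves that to monolingual training.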
Experiments

Two Problems
Unsupervised part-of-speech tagging
   Model: feature-based HMM (Berg-Kirkpatrick et al., 2010)
   Learning: L-BFGS
Unsupervised dependency parsing
   Model: DMV (Klein and Manning, 2004)
   Learning: EM

Languages (CoNLL treebanks from 2006 and 2007)
Target languages: Bulgarian, Danish, Dutch, Greek, Japanese, Portuguese, Slovene, Spanish, Swedish, and Turkish
Helper languages: English, German, Italian, and Czech
Results: POS Tagging (without tag dictionary)
Direct Gradient (DG): monolingual baseline (Berg-Kirkpatrick et al., 2010)
Uniform + DG: uniform mixture parameters (mixture weights are not learned)
Mixture + DG: full model

Method          Languages with best results    Average accuracy
DG              2 (Portuguese, Danish)         40.6
Uniform + DG    2 (Turkish, Bulgarian)         41.0
Mixture + DG    6                              43.3
Results: Dependency Parsing
Baselines:
EM: monolingual EM (Klein and Manning, 2004)
PR: Posterior Regularization (Gillenwater et al., 2010)
PGI: Phylogenetic Grammar Induction (Berg-Kirkpatrick and Klein, 2010)
Configurations of this work:
Uniform: 1. uniform mixture parameters, 2. no coarse-to-fine expansion (no learning)
Mixture: 1. learned mixture parameters, 2. no coarse-to-fine expansion
Uniform + EM: 1. uniform mixture parameters, 2. coarse-to-fine expansion → monolingual learning
Mixture + EM: 1. learned mixture parameters, 2. coarse-to-fine expansion → monolingual learning

Method         Languages with best results          Average accuracy
EM             0                                    41.4
PR             2 (Turkish, Slovene)                 50.2*
PGI            0                                    53.6*
Uniform        3 (Bulgarian, Swedish, Dutch)        61.6
Mixture        1 (Danish)                           62.2
Uniform + EM   1 (Greek)                            61.5
Mixture + EM   3 (Portuguese, Japanese, Spanish)    62.1
* Averaged only over the languages for which results are reported.
Analyzing with Principal Component Analysis
[Figure: plot of the first two principal components]
From Words to Dependencies
Use induced tags to induce dependencies:
1. In a pipeline
2. Using the posteriors over tags in a sausage lattice (Cohen and Smith, 2007)

Joint decoding: the DMV parses a lattice. Example lattice over "The Skibo Castle" (states 1 to 4), with tag posteriors on the arcs:
The:    DET 0.95, ADJ 0.03, NOUN 0.02
Skibo:  DET 0.0, ADJ 0.3, NOUN 0.7
Castle: DET 0.01, ADJ 0.1, NOUN 0.89
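A minimal sketch contrasting the two options on the slide's posteriors: the pipeline commits to the single best tag per word, while the sausage lattice keeps every sufficiently probable tag as a weighted arc between consecutive states. The pruning `threshold` is a hypothetical knob, and the DMV lattice parser itself is not shown.

```python
def pipeline_tags(posteriors):
    """Pipeline: commit to the most probable tag at each position."""
    return [max(slot, key=slot.get) for slot in posteriors]

def sausage_lattice(posteriors, threshold=0.0):
    """Sausage lattice: arcs from state j to j+1, one per surviving tag.

    Each arc is (tag, posterior); tags with posterior <= threshold are pruned
    (the threshold is a hypothetical knob). Arcs are sorted by decreasing weight.
    """
    lattice = []
    for slot in posteriors:
        arcs = [(tag, p) for tag, p in slot.items() if p > threshold]
        lattice.append(sorted(arcs, key=lambda arc: -arc[1]))
    return lattice

posteriors = [
    {"DET": 0.95, "ADJ": 0.03, "NOUN": 0.02},  # The
    {"DET": 0.0, "ADJ": 0.3, "NOUN": 0.7},     # Skibo
    {"DET": 0.01, "ADJ": 0.1, "NOUN": 0.89},   # Castle
]
print(pipeline_tags(posteriors))  # ['DET', 'NOUN', 'NOUN']
print(sausage_lattice(posteriors, threshold=0.05))
```

The pipeline discards the 0.3 posterior on ADJ for "Skibo"; the lattice keeps it, so a joint decoder can let the parser's attachment scores overturn a locally weaker tag choice.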
Results: Words to Dependencies
Setting                   Languages with best results                          Average
Pipeline, DG              1 (Greek)                                            56.9
Pipeline, Mixture + DG    0                                                    54.0
Joint, DG                 5 (Portuguese, Turkish, Swedish, Slovene, Danish)    57.9
Joint, Mixture + DG       4 (Bulgarian, Japanese, Spanish, Dutch)              55.6
Best average result with gold tags: 62.2.
Interesting result: automatically induced tags perform better than gold tags for Turkish and Slovene.
Conclusions
• Improvements for two major tasks using non-parallel multilingual guidance
• In general, grammar induction results are better than POS tagging results
• Joint POS tagging and dependency parsing performs surprisingly well
• For a few languages, results are better than using gold tags
• Joint decoding performs better than a pipeline

Questions?
Results: POS Tagging (without tag dictionary)
Language     Direct Gradient (DG)   Uniform + DG   Mixture + DG
Bulgarian    34.7                   38.0           35.8
Danish       48.8                   36.2           39.9
Dutch        45.4                   43.7           50.2
Greek        35.3                   36.7           38.9
Japanese     52.3                   60.4           61.7
Portuguese   53.5                   45.7           51.5
Slovene      33.4                   35.9           36.0
Spanish      40.0                   31.8           40.5
Swedish      34.4                   37.7           39.9
Turkish      27.9                   43.6           38.6
Average      40.6                   41.0           43.3
Results: POS Tagging (with tag dictionary)
Language     Direct Gradient (DG)   Uniform + DG   Mixture + DG
Bulgarian    80.7                   81.3           82.6
Danish       82.3                   82.0           82.0
Dutch        79.2                   79.3           80.0
Greek        88.0                   80.3           80.3
Japanese     83.4                   77.9           79.9
Portuguese   75.4                   83.8           84.7
Slovene      75.6                   82.8           82.8
Spanish      82.3                   82.3           83.3
Swedish      61.5                   69.0           67.0
Turkish      50.4                   50.4           50.4
Average      75.9                   76.9           77.3
Results: Dependency Parsing
Language     EM     PR     PGI    Uniform   Mixture   Uniform + EM   Mixture + EM
Bulgarian    54.3   54.0   -      75.6      75.5      74.7           72.8
Danish       41.4   44.0   41.6   59.2      59.9      51.3           55.2
Dutch        38.6   37.9   45.1   50.7      51.1      45.9           46.0
Greek        41.0   -      -      57.0      59.5      73.0           72.3
Japanese     43.0   60.2   -      56.3      58.3      59.8           63.9
Portuguese   42.5   47.8   63.1   78.6      76.8      78.7           79.8
Slovene      37.0   50.3   49.6   46.1      46.0      41.3           41.0
Spanish      38.1   62.4   63.8   73.2      75.9      75.5           76.7
Swedish      42.3   42.2   58.3   74.0      73.2      70.5           68.7
Turkish      36.3   53.4   -      45.0      45.3      43.9           44.1
Average      41.4   -      -      61.6      62.2      61.5           62.1
(- : no result reported)
Results: Words to Dependencies
Language     Joint DG   Joint Mixture + DG   Pipeline DG   Pipeline Mixture + DG   Gold tags
Bulgarian    62.4       67.0                 57.7          62.9                    75.6
Danish       50.4       50.1                 48.9          48.3                    59.9
Dutch        48.3       52.2                 49.9          51.2                    50.7
Greek        63.5       52.2                 68.2          50.0                    73.0
Japanese     61.4       69.5                 64.2          68.6                    63.9
Portuguese   68.4       62.2                 60.0          59.8                    79.8
Slovene      47.2       36.8                 45.8          36.4                    46.1
Spanish      67.7       69.3                 65.8          68.1                    76.7
Swedish      58.2       49.1                 57.9          47.6                    74.0
Turkish      52.4       47.4                 50.8          47.1                    45.3
Average      57.9       55.6                 56.9          54.0                    64.5