Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. Dipanjan Das, Carnegie Mellon University; Slav Petrov, Google Research. ACL 2011, June 21.


Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Dipanjan Das, Carnegie Mellon University
Slav Petrov, Google Research

ACL 2011, June 21

Part-of-Speech Tagging

Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .

Supervised POS Tagging

[Figure: bar chart of supervised POS tagging accuracy with TnT (Brants, 2000) for 21 languages: Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, English, French, German, Greek, Hungarian, Italian, Japanese, Korean, Portuguese, Russian, Slovene, Spanish, Swedish and Turkish. x-axis: POS accuracy (%), from 84 to 100. Per-language accuracies range from 89.1 to 99.1.]

Supervised setting: average accuracy is 96.2%.

Resource-Poor Languages

Several major languages have little or no annotated data, for example (number of native speakers):
Oriya: 32 million
Indonesian-Malay: 37 million
Azerbaijani: 20 million
Haitian: 7.7 million
Punjabi: 109 million
Vietnamese: 69 million
Polish: 40 million

See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

However, these languages have plenty of parallel and unannotated data. Basic NLP tools such as POS taggers are essential for developing language technologies.

(Nearly) Universal Part-of-Speech Tags

VERB, DET, NOUN, CONJ, PRON, NUM, ADJ, PRT, ADV, . (punctuation), ADP, X (other)

(Nearly) Universal Part-of-Speech Tags

Example Penn Treebank tag maps:
NN → NOUN, NNP → NOUN, NNPS → NOUN, NNS → NOUN
PRP → PRON, PRP$ → PRON, WP → PRON, WP$ → PRON

Example Spanish Treebank tag maps:
nc → NOUN, np → NOUN
p0 → PRON, pd → PRON, pe → PRON, pi → PRON, pn → PRON, pp → PRON, pr → PRON, pt → PRON, px → PRON

See Petrov, Das and McDonald (2011)
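The tag maps above amount to a lookup table from treebank tags to universal tags. A minimal Python sketch, assuming only the example entries listed on the slide; the helper `to_universal` and the fallback of unmapped tags to X are illustrative choices, not part of the published mapping files:

```python
# Hypothetical partial tag maps, copied from the examples above.
UNIVERSAL_TAGS = {"VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
                  "CONJ", "DET", "NUM", "PRT", ".", "X"}

PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
}

SPANISH_TO_UNIVERSAL = {
    "nc": "NOUN", "np": "NOUN",
    "p0": "PRON", "pd": "PRON", "pe": "PRON", "pi": "PRON", "pn": "PRON",
    "pp": "PRON", "pr": "PRON", "pt": "PRON", "px": "PRON",
}


def to_universal(fine_tags, tag_map):
    """Map treebank-specific tags to universal tags.

    Assumption: tags missing from the (partial) map fall back to X.
    """
    return [tag_map.get(tag, "X") for tag in fine_tags]


print(to_universal(["PRP", "NNS", "VBZ"], PENN_TO_UNIVERSAL))
# ['PRON', 'NOUN', 'X']  (VBZ is not covered by this partial map)
```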

(Nearly) Universal Part-of-Speech Tags

English:
Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .

German:
Portland hat eine prächtig gedeihende Musikszene .
NOUN VERB DET ADJ ADJ NOUN .

Bengali:
প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত |
NOUN NOUN ADP NOUN NOUN ADJ ADJ .

Supervised Universal POS Tagging

[Figure: bar chart of supervised POS tagging accuracy with TnT (Brants, 2000) on the universal tagset, for the same 21 languages; x-axis: POS accuracy (%), from 84 to 100.]

Accuracy varies less across languages than with the fine-grained tags.

State of the Art in Unsupervised POS Tagging


Unsupervised Part-of-Speech Tagging

Portland hat eine prächtig gedeihende Musikszene .
? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization (EM) algorithm (Merialdo, 1994):

$p(\mathbf{x}, \mathbf{y}) = \prod_{i=1}^{n} p(y_i \mid y_{i-1})\, p(x_i \mid y_i)$, where $y_0$ is a designated start state

$\mathbf{x}$: observation sequence (the words)
$\mathbf{y}$: state sequence; each $y_i$ is one of the 12 coarse tags
$p(y_i \mid y_{i-1})$: transition multinomials
$p(x_i \mid y_i)$: emission multinomials
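The factorization can be made concrete with a toy example. This Python sketch scores one tag sequence under hand-set multinomials; the probabilities, the start state `<s>`, the floor for unseen words and the name `log_joint` are all hypothetical, and the EM estimation of the multinomials from unlabeled text is not shown:

```python
import math

# Toy HMM parameters (hypothetical numbers, for illustration only).
# "<s>" is an assumed start state so that p(y_1 | y_0) is defined.
TRANSITION = {  # p(y_i | y_{i-1}); each row sums to 1
    "<s>": {"NOUN": 0.6, "VERB": 0.2, "DET": 0.2},
    "NOUN": {"NOUN": 0.3, "VERB": 0.5, "DET": 0.2},
    "VERB": {"NOUN": 0.2, "VERB": 0.1, "DET": 0.7},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
}
EMISSION = {  # p(x_i | y_i); each row sums to 1
    "NOUN": {"Portland": 0.5, "Musikszene": 0.5},
    "VERB": {"hat": 1.0},
    "DET": {"eine": 1.0},
}
UNSEEN = 1e-12  # assumed floor so that log() is defined for unseen words


def log_joint(words, tags):
    """log p(x, y) = sum_i [log p(y_i | y_{i-1}) + log p(x_i | y_i)]."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += math.log(TRANSITION[prev][tag])
        total += math.log(EMISSION[tag].get(word, UNSEEN))
        prev = tag
    return total


print(math.exp(log_joint(["Portland", "hat"], ["NOUN", "VERB"])))  # about 0.15
```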

Unsupervised Part-of-Speech Tagging

HMM estimated with EM (Johnson, 2007), tagging accuracy (%):

Model   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM    68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7

The average result is poor.

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models (Berg-Kirkpatrick et al., 2010)

Portland hat
? ?

$\mathbf{x}$: observation sequence; $\mathbf{y}$: state sequence

The emission multinomials are replaced by locally normalized log-linear models over features of the word, such as:
• suffix
• hyphen
• capital letters
• numbers
• ...

The model is estimated using gradient-based methods. Tagging accuracy (%):

Model        Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM         68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM    69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0

Improvements across all languages.
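A minimal sketch of one locally normalized log-linear emission distribution: p(x | y) is proportional to exp(w · f(x, y)), normalized over a vocabulary. The feature templates (loosely modeled on the features listed above), the toy weights and the four-word vocabulary are hypothetical, and learning the weights with gradient-based methods is omitted:

```python
import math


def features(word, tag):
    """Hypothetical indicator features of a word, each conjoined with the tag."""
    feats = [f"word={word.lower()}|{tag}", f"suffix3={word[-3:].lower()}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[:1].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"number|{tag}")
    return feats


def emission_probs(vocab, tag, weights):
    """p(x | y) = exp(w . f(x, y)) / sum over x' in vocab of exp(w . f(x', y))."""
    scores = {x: sum(weights.get(f, 0.0) for f in features(x, tag)) for x in vocab}
    z = sum(math.exp(s) for s in scores.values())
    return {x: math.exp(s) / z for x, s in scores.items()}


vocab = ["Portland", "hat", "Musikszene", "1000"]
weights = {"capital|NOUN": 1.5, "number|NUM": 2.0}  # toy weights, not learned
p_noun = emission_probs(vocab, "NOUN", weights)  # capitalized words score higher
p_num = emission_probs(vocab, "NUM", weights)
print(max(p_num, key=p_num.get))  # 1000
```

Because features are shared across words (all capitalized words share `capital|NOUN`), a single weight moves probability mass to many words at once, which is what plain multinomials cannot do.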

Unsupervised POS Tagging with Dictionaries

Hidden Markov Model (HMM) with locally normalized log-linear models; the state space is constrained by the possible gold tags of each word:

Portland: NOUN
hat: VERB
eine: PRON, DET, ADJ, NUM
prächtig: ADJ, ADV
gedeihende: ADJ
Musikszene: NOUN
.: .

Tagging accuracy (%):

Model               Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM           69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
w/ gold dictionary    93.1   94.7    93.5   96.6     96.4        94.0     95.8     85.5     93.7

The average result is close to supervised accuracy!
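With a tag dictionary, decoding only considers each word's allowed tags. A Viterbi sketch in Python under that assumption; the transition and emission scores are passed in as log-score functions, and the toy scores, tag dictionary and function names are hypothetical:

```python
import math


def viterbi(words, tags, log_trans, log_emit, tag_dict):
    """Most likely tag sequence when each word may only take the tags listed
    for it in tag_dict; words absent from tag_dict may take any tag."""
    allowed = [tag_dict.get(w, tags) for w in words]
    # best[i][t] = (score of the best path ending in tag t at position i, previous tag)
    best = [{t: (log_trans("<s>", t) + log_emit(words[0], t), None) for t in allowed[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in allowed[i]:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_trans(p, t)) for p in allowed[i - 1]),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(words[i], t), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy scores (hypothetical): VERB -> DET is favored and emissions are
# uninformative, so only the dictionary constraints and transitions decide.
TAGS = ["NOUN", "VERB", "DET", "PRON", "NUM", "ADJ", "ADV", "."]
TAG_DICT = {"Portland": ["NOUN"], "hat": ["VERB"], "eine": ["PRON", "DET", "NUM"]}


def log_trans(prev, tag):
    return math.log(0.5) if (prev, tag) == ("VERB", "DET") else math.log(0.1)


def log_emit(word, tag):
    return 0.0


print(viterbi(["Portland", "hat", "eine"], TAGS, log_trans, log_emit, TAG_DICT))
# ['NOUN', 'VERB', 'DET']
```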

For most languages, access to high-quality tag dictionaries is not realistic. Morphologically rich languages only have base forms in dictionaries.

Ideas:
1) Use supervision in resource-rich languages
2) Use parallel data
3) Construct projected tag lexicons

Bilingual Projection

Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .

The English tags are automatic labels from a supervised tagger with 97% accuracy.

Portland hat eine prächtig gedeihende Musikszene .

The two sentences are linked by automatic, unsupervised word alignments from translation data (available for more than 50 languages).

Idea 1: direct projection (Yarowsky and Ngai, 2001)

Each German word receives the tag of the English word it is aligned to; an unaligned word (here, prächtig) receives NOUN, the most frequent tag:

Portland hat eine prächtig gedeihende Musikszene .
NOUN VERB DET NOUN ADJ NOUN .

This sentence, together with many more projected tagged sentences, is used for supervised training of a tagger (TnT; Brants, 2000).
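Direct projection can be sketched in a few lines of Python. The alignment pairs are written by hand for the example sentence (real alignments come from an unsupervised aligner), a target word aligned to several source words takes their majority tag (an assumption), and unaligned words get NOUN; the function name is hypothetical:

```python
from collections import Counter


def direct_projection(src_tags, tgt_len, alignments, unaligned_tag="NOUN"):
    """Copy source tags to target tokens through (src_index, tgt_index) alignment pairs.

    A target token aligned to several source tokens takes their majority tag
    (an assumption); unaligned target tokens get unaligned_tag.
    """
    votes = [Counter() for _ in range(tgt_len)]
    for s, t in alignments:
        votes[t][src_tags[s]] += 1
    return [v.most_common(1)[0][0] if v else unaligned_tag for v in votes]


# Portland has a thriving music scene .
src_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
# Portland hat eine prächtig gedeihende Musikszene .
# Hand-written alignments for illustration; prächtig (index 3) is unaligned.
alignments = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]
print(direct_projection(src_tags, 7, alignments))
# ['NOUN', 'VERB', 'DET', 'NOUN', 'ADJ', 'NOUN', '.']
```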

Bilingual Projection

Idea 1: direct projection (Yarowsky and Ngai, 2001), tagging accuracy (%):

Model              Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM               68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM          69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection    73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8

Direct projection improves over the unsupervised models in every language except Spanish, where it roughly matches the Feature-HMM (80.1 vs. 80.2); the average rises from 73.0 to 78.8.

Bilingual Projection. Idea 2: lexicon projection

Tagged English words are linked to German words through the word alignments; unaligned words (here, prächtig) are ignored:

Portland (NOUN) → Portland
has (VERB) → hat
a (DET) → eine
thriving (ADJ) → gedeihende
music (NOUN), scene (NOUN) → Musikszene
. (.) → .

Each German word collects a bag of alignments: the tags of all English words aligned to it in the parallel data. For example, the bag for eine also receives one (NUM) and one (PRON), and the bag for gedeihende receives thriving tagged as VERB.

[Figure: one histogram per German word over the tags ADJ, NOUN, X, NUM, PRON, VERB and DET, accumulated from its bag of alignments.]

After scanning all the parallel data, the normalized counts give $p(y \mid x)$, the probability of a tag $y$ given a word $x$.
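A minimal sketch of lexicon projection: count the tags aligned to each target word, normalize to p(y | x), and keep the tags that clear a threshold. The input pairs, the threshold value 0.3 and the function name are hypothetical; the slides only state that a threshold parameter is used for dictionary construction:

```python
from collections import Counter, defaultdict


def project_lexicon(aligned_pairs, threshold):
    """aligned_pairs: (target_word, tag_of_aligned_source_word), one per alignment.

    Returns p(y | x) for every target word and a tag dictionary keeping the
    tags whose probability is at least `threshold`.
    """
    counts = defaultdict(Counter)
    for word, tag in aligned_pairs:
        counts[word][tag] += 1
    p_tag_given_word, dictionary = {}, {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        p_tag_given_word[word] = {tag: c / total for tag, c in tag_counts.items()}
        dictionary[word] = sorted(
            tag for tag, prob in p_tag_given_word[word].items() if prob >= threshold
        )
    return p_tag_given_word, dictionary


# Hypothetical bag of alignments collected from parallel data.
pairs = [("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
         ("gedeihende", "ADJ"), ("gedeihende", "ADJ"), ("gedeihende", "ADJ"),
         ("gedeihende", "VERB"), ("Musikszene", "NOUN"), ("Musikszene", "NOUN")]
p, lexicon = project_lexicon(pairs, threshold=0.3)
print(lexicon)
# {'eine': ['DET'], 'gedeihende': ['ADJ'], 'Musikszene': ['NOUN']}
```

Lowering the threshold keeps more ambiguity in the dictionary (with 0.2, eine keeps DET, NUM and PRON), which trades precision of the constraints for coverage of rarer tags.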

Bilingual Projection. Idea 2: lexicon projection

Feature-HMM constrained with the projected dictionary, tagging accuracy (%):

Model                 Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                  68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM             69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection       73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8
Projected dictionary    79.0   78.8    82.4   76.3     84.8        87.0     82.8     79.4     81.3

The projected dictionary improves over direct projection for six of the eight languages (all except German and Greek).

However, there is no information about unaligned words such as prächtig:

Portland has a thriving music scene .
NOUN VERB DET ADJ NOUN NOUN .
Portland hat eine prächtig gedeihende Musikszene .

Can coverage be improved? Idea: expand and refine the projected lexicon using a large amount of unlabeled data.

Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

[Figure: a graph over labeled and unlabeled data points; edges carry weights such as 0.9, 0.8, 0.1, 0.05 and 0.01. Labeled vertices carry supervised label distributions; the distributions of the unlabeled vertices are to be found.]

Label Propagation (Zhu, Ghahramani and Lafferty, 2003)

• The edge weights form a symmetric weight matrix.
• The objective is optimized over the set of distributions on the unlabeled vertices.
• One term brings the distributions of similar vertices closer.
• Another term brings the distributions of uncertain neighborhoods close to the uniform distribution, which is determined by the size of the label set.
• The objective is optimized with iterative updates.
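The iterative updates can be sketched in Python as below. This is an assumed form in which each unlabeled vertex is repeatedly set to the weighted average of its neighbors' distributions, mixed with the uniform distribution through a hypothetical weight `nu`; with `nu = 0` it reduces to the harmonic update of Zhu, Ghahramani and Lafferty (2003). The graph, weights and function name are illustrative:

```python
def propagate(graph, seeds, num_labels, nu=0.1, iterations=50):
    """Iterative label propagation with a pull toward the uniform distribution.

    graph: vertex -> {neighbor: weight}, symmetric and listing every vertex.
    seeds: vertex -> fixed label distribution for the labeled vertices.
    Each unlabeled vertex u is repeatedly set to
        q_u(y) = (sum_v w_uv * q_v(y) + nu * U(y)) / (sum_v w_uv + nu),
    where U is the uniform distribution over the num_labels labels.
    """
    uniform = [1.0 / num_labels] * num_labels
    q = {u: list(seeds.get(u, uniform)) for u in graph}
    for _ in range(iterations):
        new_q = {}
        for u, neighbors in graph.items():
            if u in seeds:
                new_q[u] = q[u]  # supervised distributions stay fixed
                continue
            kappa = nu + sum(neighbors.values())
            new_q[u] = [
                (sum(w * q[v][y] for v, w in neighbors.items()) + nu * uniform[y]) / kappa
                for y in range(num_labels)
            ]
        q = new_q  # all vertices are updated from the previous iteration
    return q


# A four-vertex chain a - b - c - d; a and d are labeled with labels 0 and 1.
graph = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"b": 0.1, "d": 0.9},
    "d": {"c": 0.9},
}
q = propagate(graph, seeds={"a": [1.0, 0.0], "d": [0.0, 1.0]}, num_labels=2)
print(q["b"], q["c"])  # b leans toward label 0, c toward label 1
```

The weak 0.1 edge between b and c keeps each unlabeled vertex close to its strongly connected labeled neighbor, while `nu` keeps a little mass on every label.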

Idea 3: Graph-Based Projections

How can label propagation help? For a language:
1) Build a graph with 2M trigram types as vertices
• compute the similarity matrix using co-occurrence statistics
2) The label distribution at each vertex is the tag distribution over the trigram's middle word

(Subramanya, Petrov and Pereira, 2010)
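The graph construction step can be sketched as follows, under simplifying assumptions: each trigram type is described by counts of the words immediately to its left and right, similarity is cosine similarity, and each vertex keeps its k most similar neighbors. The feature set and function names are hypothetical, and a brute-force quadratic search like this one would not scale to 2M vertices:

```python
import math
from collections import Counter, defaultdict


def trigram_contexts(sentences):
    """Map each trigram type (centered on every token) to counts of its
    left and right neighboring words (a simplified co-occurrence feature set)."""
    feats = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>", "<s>"] + sent + ["</s>", "</s>"]
        for j in range(2, len(padded) - 2):
            trigram = tuple(padded[j - 1:j + 2])
            feats[trigram]["left=" + padded[j - 2]] += 1
            feats[trigram]["right=" + padded[j + 2]] += 1
    return feats


def cosine(a, b):
    dot = sum(count * b[f] for f, count in a.items() if f in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_graph(feats, k=5):
    """Connect each vertex to its k most similar vertices; edges are symmetrized."""
    graph = defaultdict(dict)
    vertices = list(feats)
    for u in vertices:
        scored = sorted(((cosine(feats[u], feats[v]), v) for v in vertices if v != u), reverse=True)
        for sim, v in scored[:k]:
            if sim > 0:
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


sentences = [["ist", "gut", "bei", "uns"], ["ist", "fein", "bei", "uns"],
             ["zu", "essen", ","], ["zu", "stecken", ","]]
graph = knn_graph(trigram_contexts(sentences), k=1)
print(graph[("ist", "gut", "bei")])  # linked to ('ist', 'fein', 'bei'), which shares its contexts
```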

Example Graph in German

[Figure: a graph over German trigram types. One cluster contains "ist gut bei", "ist lebhafter bei", "ist wichtig bei" and "ist fein bei"; another contains "gutem Essen zugetan", "fuers Essen drauf", "1000 Essen pro", "schlechtes Essen und" and "zum Essen niederlassen"; a third contains "zu realisieren ,", "zu essen ,", "zu stecken ," and "zu erreichen ,". The Essen cluster is labeled NOUN and the "zu ... ," cluster VERB.]

How can label propagation help? For a target language, steps 1 and 2 are as above, followed by:
3) Plug in auto-tagged words from a source language
4) Links between source- and target-language units are word alignments

Bilingual Graph

[Figure: the German trigram graph extended with automatically tagged English words, linked to German vertices by word alignments: food (NOUN); eat and eating (VERB); good, fine and important (ADJ); nicely (ADV).]

How can label propagation help? For a target language (continued):
5) Run the first stage of label propagation
• from source-language vertices to target-language vertices

First Stage of Label Propagation

[Figure: the bilingual graph. In the first stage, tag distributions flow from the tagged English words across the alignment edges to the German vertices they are linked to, which receive distributions such as VERB/NOUN and ADJ/ADV.]
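A simplified sketch of the first stage: each German trigram vertex linked to English words receives the alignment-count-weighted average of their tag distributions, i.e. a single propagation step from the English side. The data structures, counts and distributions below are hypothetical:

```python
TAGS = ["NOUN", "VERB", "ADJ", "ADV"]


def first_stage(links, english_dist):
    """links: german_trigram -> {english_word: alignment_count}.
    english_dist: english_word -> tag distribution (ordered as TAGS) from the
    supervised English tagger. Returns seed distributions for German vertices."""
    seeds = {}
    for trigram, aligned in links.items():
        total = sum(aligned.values())
        if total == 0:
            continue  # no alignments: the vertex stays unlabeled for the second stage
        seeds[trigram] = [
            sum(count * english_dist[word][y] for word, count in aligned.items()) / total
            for y in range(len(TAGS))
        ]
    return seeds


# Hypothetical alignment counts and English tag distributions.
english_dist = {"food": [1.0, 0.0, 0.0, 0.0], "eat": [0.0, 1.0, 0.0, 0.0],
                "eating": [0.1, 0.9, 0.0, 0.0]}
links = {("zum", "Essen", "niederlassen"): {"food": 3},
         ("zu", "essen", ","): {"eat": 2, "eating": 2},
         ("ist", "fein", "bei"): {}}
seeds = first_stage(links, english_dist)
print(seeds[("zu", "essen", ",")])  # mostly VERB, a little NOUN
```

The resulting seed distributions are then held fixed while the second stage spreads them within the German graph.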

How can label propagation help? For a target language (continued):
6) Run the second stage of label propagation
• within the target-language vertices
• graph objective function with squared penalties
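One way to write a graph objective with squared penalties, following the two terms named on the label propagation slides. This is a sketch under assumed notation: V is the set of target-language vertices, V^l the vertices labeled in the first stage, N(u) the neighbors of u, w_uv the edge weights, U the uniform distribution over the tag set Y, and ν a hypothetical regularization weight:

```latex
C(q) = \sum_{u \in V \setminus V^{l}} \Big( \sum_{v \in \mathcal{N}(u)} w_{uv} \, \lVert q_u - q_v \rVert^2 + \nu \, \lVert q_u - U \rVert^2 \Big)
\quad \text{s.t.} \quad \sum_{y \in Y} q_u(y) = 1, \quad q_u(y) \ge 0 \quad \forall u,

q_u^{(m)}(y) = \frac{\sum_{v \in \mathcal{N}(u)} w_{uv} \, q_v^{(m-1)}(y) + \nu \, U(y)}{\nu + \sum_{v \in \mathcal{N}(u)} w_{uv}},
\qquad U(y) = \frac{1}{|Y|}.
```

The update follows from setting to zero the gradient of vertex u's bracketed terms with its neighbors held fixed. Since every q_v and U sums to one, each updated q_u also sums to one and stays non-negative, so the constraints hold without an explicit projection; vertices in V^l keep the distributions they received in the first stage.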

Second Stage of Label Propagation

[Figure: the bilingual graph over successive iterations of the second stage. Tag distributions such as VERB/NOUN and ADJ/ADV spread from the German vertices labeled in the first stage to their unlabeled German neighbors.]

Propagation continues until convergence.

Idea 3: Graph-Based Projections

End result?

[Figure: tag histograms over ADJ, NOUN, X, NUM, PRON, VERB, DET (and ADV) for the German words Portland, hat, eine, gedeihende, Musikszene and ".", and for further words such as fein, lebhafter and realisieren.]

A larger set of tag distributions yields a better and larger dictionary.

Idea 3: Graph-Based Projections

Feature-HMM constrained with the graph-based dictionary, tagging accuracy (%):

Model                    Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                     68.7   57.0    75.9   65.8     63.7        62.9     71.5     68.4     66.7
Feature-HMM                69.1   65.1    81.3   71.8     68.1        78.4     80.2     70.1     73.0
Direct projection          73.6   77.0    83.2   79.3     79.7        82.6     80.1     74.7     78.8
Projected dictionary       79.0   78.8    82.4   76.3     84.8        87.0     82.8     79.4     81.3
Graph-based projections    83.2   79.5    82.8   82.5     86.8        87.9     84.2     80.5     83.4
w/ gold dictionary         93.1   94.7    93.5   96.6     96.4        94.0     95.8     85.5     93.7
Supervised                 96.9   94.9    98.2   97.8     95.8        97.2     96.8     94.8     96.6

Idea 3: Graph-Based Projections. Lexicon Expansion

[Figure: bar chart of dictionary size per language (Danish, Dutch, German, Greek, Italian, Portuguese, Spanish, Swedish) for the projected dictionary and the graph-based projections; y-axis: number of words, 0 to 140,000. Annotation: "thousands of words".]

Concluding Notes
• Soft expansion of the lexicon using parallel data and supervision in a resource-rich language
– Graph-based learning helps in almost all cases
• Reasonably accurate POS taggers without direct supervision
• Traditional evaluation of unsupervised POS taggers uses greedy metrics that require labeled data
– Our presented models avoid these evaluation methods
• Practically no hyperparameter tuning
– except a threshold parameter for dictionary construction

Future Directions
• Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons
• Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
• Including multiple languages in the graph may further improve results
– Label propagation in one huge multilingual graph

Projected Data

Available at: http://code.google.com/p/pos-projection/

[Figure: the example sentence and its translations, tagged with universal POS tags:
Portland has a thriving music scene .  (NOUN VERB DET ADJ NOUN NOUN .)
Portland hat eine prächtig gedeihende Musikszene .
প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত |
Portland tiene una escena musical vibrante .
波特兰 有 一个 生机勃勃的 音乐 场景
Portland a une scène musicale florissante .]

Questions?