67
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections June 21 ACL 2011 Slav Petrov Google Research Dipanjan Das Carnegie Mellon University

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

  • Upload
    iain

  • View
    41

  • Download
    0

Embed Size (px)

DESCRIPTION

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. Dipanjan Das Carnegie Mellon University. Slav Petrov Google Research. June 21 ACL 2011. Part-of-Speech Tagging. Portland has a thriving music scene . . ADJ. NOUN. NOUN. NOUN. DET. VERB. - PowerPoint PPT Presentation

Citation preview

Page 1: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

with Bilingual Graph-Based Projections

June 21ACL 2011

Slav PetrovGoogle Research

Dipanjan DasCarnegie Mellon University

Page 2: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Part-of-Speech Tagging

Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .

2

Page 3: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Supervised POS Tagging

ArabicBasque

BulgarianCatalanChinese

CzechDanishEnglishFrench

GermanGreek

HungarianItalian

JapaneseKorean

PortugueseRussianSloveneSpanishSwedishTurkish

84 86 88 90 92 94 96 98 100

96.9

93.797.8

98.2

93.499.1

96.4

96.8

96.7

98.1

97.5

95.6

95.8

98.7

97.5

96.8

96.8

94.6

96.3

94.7

89.1

POS Accuracy

Supervised setting: average accuracy is 96.2% 3

TnT (Brants, 2000)

Page 4: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Resource-Poor Languages

Several major languages with no or little annotated data

OriyaIndonesian-Malay

Azerbaijani

e.g.

See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

Haitian

However, lots of parallel and

unannotated data!Basic NLP tools like POS

tagging essential for development of

language technologies

4

PunjabiVietnamese

Polish32 million37 million20 million

Native speakers

7.7 million

109 million69 million40 million

Page 5: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

(Nearly) Universal Part-of-Speech Tags

VERB DETNOUN CONJPRON NUMADJ PRTADV .ADP X

5

Page 6: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

(Nearly) Universal Part-of-Speech Tags

Example Penn Treebank tag maps:

NN NOUNNNP NOUNNNPS NOUNNNS NOUN

PRP PRONPRP$ PRONWP PRONWP$ PRON

np NOUNnc NOUN

Example Spanish Treebank tag maps:p0 PRONpd PRONpe PRONpi PRONpn PRON

pp PRONpr PRONpt PRONpx PRON

See Petrov, Das and McDonald (2011)

Page 7: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

(Nearly) Universal Part-of-Speech Tags

Portland has a thriving music scene .NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET ADJ ADJ NOUN .

প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত | NOUN NOUN ADP NOUN NOUN ADJ ADJ .

7

Page 8: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Supervised Universal POS Tagging

ArabicBasque

BulgarianCatalanChinese

CzechDanishEnglishFrench

GermanGreek

HungarianItalian

JapaneseKorean

PortugueseRussianSloveneSpanishSwedishTurkish

84 86 88 90 92 94 96 98 100

96.9

93.797.8

98.2

93.499.1

96.4

96.8

96.7

98.1

97.5

95.6

95.8

98.7

97.5

96.8

96.8

94.6

96.3

94.7

89.1

POS Accuracy

Less variance in accuracy than fine tags 8

TnT (Brants, 2000)

Page 9: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

State of the Art in Unsupervised POS Tagging

9

Page 10: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

Portland hat eine prächti

g gedeihende Musikszene .

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

: observation sequence: state sequence

10Merialdo (1994)

Page 11: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

Portland hat eine prächti

g gedeihende Musikszene .

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

one of the 12 coarse

tags

: observation sequence: state sequence

11Merialdo (1994)

Page 12: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

Portland hat

? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

transition multinomials

: observation sequence: state sequence

12Merialdo (1994)

Page 13: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

Portland hat

? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

emission multinomials

: observation sequence: state sequence

13Merialdo (1994)

Page 14: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech Tagging

Portland hat eine prächti

g gedeihende Musikszene .

? ? ? ? ? ? ?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.7EM-HMM

Poor average result

14Johnson (2007)

Page 15: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)

with locally normalized log-linear models

: observation sequence

Portland hat

? ? emission multinomials

: state sequence

15

Berg-Kirkpatrick et al. (2010)

Page 16: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)

with locally normalized log-linear models

: observation sequence

Portland hat

? ? emission multinomials

suffixhyphen

capital lettersnumbers

...: state sequence

16

Berg-Kirkpatrick et al. (2010)

Page 17: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)

with locally normalized log-linear models

: observation sequence

Portland hat

? ? emission multinomials

suffixhyphen

capital lettersnumbers

...

Estimated using gradient-based methods

: state sequence

17

Berg-Kirkpatrick et al. (2010)

Page 18: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Unsupervised Part-of-Speech TaggingHidden Markov Model (HMM)

with locally normalized log-linear models

Portland hat

? ? emission multinomials

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1

65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0EM-HMM

Feature-HMM

Estimated using gradient-based methods

Improvements across all languages

18

Berg-Kirkpatrick et al. (2010)

Page 19: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Portland hat eine prächti

g gedeihende Musikszene .

NOUN VERB

PRONDETADJNUM

ADJADV ADJ NOUN .

Unsupervised POS Tagging with Dictionaries

Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold

tags

19

Page 20: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Portland hat eine prächti

g gedeihende Musikszene .

NOUN VERB

PRONDETADJNUM

ADJADV ADJ NOUN .

Unsupervised POS Tagging with Dictionaries

Hidden Markov Model (HMM) with locally normalized log-linear modelsState space constrained by possible gold

tags

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

93.1

94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7

EM-HMMFeature-HMM

w/ gold dictionary

Average result close to supervised accuracy! 20

Page 21: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

For most languages, access to high-quality

tag dictionaries is not realistic.

Ideas: 1) Use supervision in resource-rich

languages2) Use parallel data

3) Construct projected tag lexicons

21

Morphologically rich languages only have base

forms in dictionaries

Page 22: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

automatic labels from supervised tagger, 97% accuracy

22

Page 23: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .

Automatic unsupervised alignments from translation data(available for more than 50 languages)

23

Page 24: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Portland hat eine prächtig gedeihende Musikszene .

Idea 1: direct projectionunaligned word

NOUN(most frequent tag)

24Yarowsky and Ngai

(2001)

Page 25: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Idea 1: direct projection

Portland hat eine prächtig gedeihende Musikszene .NOUN VERB DET NOUN ADJ NOUN .

+more projected tagged sentences

supervised training

tagger

25

(Brants, 2000)

Yarowsky and Ngai (2001)

Page 26: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Idea 1: direct projection

26

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6

77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

EM-HMM

Directprojection

Feature-HMM

Yarowsky and Ngai (2001)

Page 27: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual Projection

Idea 1: direct projection

27

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6

77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.8

EM-HMM

Directprojection

Feature-HMM

Yarowsky and Ngai (2001)

consistent improvements over unsupervised models

Page 28: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

28

Page 29: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland

VERBhas

DETa

ADJthriving

NOUNmusic

NOUNscene

.

.

Portland

hat

eine

prächtig

gedeihende

Musikszene

.

29

Page 30: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland Portlan

d

ADJthrivinggedeihe

ndeprächtigVERB

has

hat

DETa

eine

NOUNsceneMusiksze

ne

NOUNmusic

.

..

ignore unaligned word

30

Page 31: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland Portlan

d

ADJthrivinggedeihe

nde

VERBhas

hat

DETa

eine

NOUNsceneMusiksze

ne

NOUNmusic

.

..

Bag of alignments

31

Page 32: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland Portlan

d

ADJthrivinggedeihe

nde

VERBhas

hat

eine

NOUNsceneMusiksze

ne

NOUNmusic

.

..

ADJNOUN

XNUM

PRONVERB

DET

DETa

32ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 33: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland Portlan

d

ADJthrivinggedeihe

nde

VERBhas

hat

eine

NOUNscene

NOUNmusic

.

..

DETa NUMon

e

PRONone

Musikszene

33ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 34: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

NOUNPortland Portlan

d

ADJthrivinggedeihe

nde

VERBhas

hat

eine

NOUNscene

NOUNmusic

.

..

DETa NUMon

e

PRONone

Musikszene

VERBthriving

34ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DETADJ

NOUNX

NUMPRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 35: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

Portland

gedeihende

hat

eine

Musikszene

.

After scanning all the parallel data:

= probability of a tag given a word

35

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 36: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Bilingual ProjectionIdea 2: lexicon projection

Feature HMM constrained with projected dictionary

Improvements over simple projection for majority of the languages

36

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0

78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.3

EM-HMM

DirectprojectionProjectedDictionar

y

Feature-HMM

Page 37: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Can coverage be improved?Idea:Projected lexicon

expansion and refinement using

a lot of unlabeled data

No information about unaligned words

37

Portland hat eine prächtig gedeihende Musikszene .

Portland has a thriving music scene .

NOUN VERB DET ADJ NOUN NOUN .

Page 38: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Brief Overview:Graph-Based Learning

with Labeled and Unlabeled Data

38

Page 39: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

labeled datapoints unlabeled datapoints

supervised label distributions

distributions to be found

Zhu, Ghahramani and Lafferty (2003) 39

0.9

0.01

0.8

0.9

0.1

= symmetric weight matrix

0.05

Page 40: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

Label Propagation

Zhu, Ghahramani and Lafferty (2003) 40

0.05

Page 41: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

set of distributions over unlabeled vertices

Zhu, Ghahramani and Lafferty (2003) 41

Label Propagation

0.05

Page 42: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

unlabeled vertices

Zhu, Ghahramani and Lafferty (2003) 42

Label Propagation

0.05

Page 43: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

brings the distributions of similarvertices closer

Zhu, Ghahramani and Lafferty (2003) 43

Label Propagation

0.05

Page 44: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

brings the distributions of uncertain neighborhoods

close to the uniform distribution

Size of the label set

Zhu, Ghahramani and Lafferty (2003) 44

Label Propagation

0.05

Page 45: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

0.9

0.01

0.8

0.9

0.1

Iterative updates for optimization

Zhu, Ghahramani and Lafferty (2003) 45

Label Propagation

0.05

Page 46: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

How can label propagation help? For a language:

1) Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence

statistics

2) Label distribution at each vertex tag distribution over the trigram’s middle word

Subramanya, Petrov and Pereira (2010)

Idea 3: Graph-Based Projections

46

Page 47: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,

Example Graph in German

47

Page 48: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,

Example Graph in German

48

NOUN

VERB

Page 49: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

How can label propagation help? For a target language:

1) Build graph over a 2M trigram types as vertices• compute similarity matrix using co-occurrence

statistics

2) Label distribution at each vertex tag distribution over the trigram’s middle word3) Plug in auto-tagged words from a source language

4) Links between source and target language units are word alignments

Idea 3: Graph-Based Projections

49

Page 50: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJBilingual Graph

50

Page 51: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

How can label propagation help?

For a target language:

3) Plug in auto-tagged words from a source language

4) Links between source and target language units are word alignments

5) Run first stage of label propagation• Source language target language

Idea 3: Graph-Based Projections

51

Page 52: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

First Stage of Label Propagation

52

Page 53: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

First Stage of Label Propagation

VERBNOUN

53

ADJADV

ADJADV

Page 54: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

How can label propagation help?

For a target language:

3) Plug in auto-tagged words from a source language

4) Links between source and target language units are word alignments

5) Run first stage of label propagation• Source language target language

6) Run second stage of label propagation • within target language vertices• graph objective function with squared

penalties

Idea 3: Graph-Based Projections

54

Page 55: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

55

ADJADV

ADJADV

Page 56: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ADJist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

56

ADJADV

ADJADV

Page 57: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ADJist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

VERB

VERBNOUN

57

ADJADV

ADJADV

Page 58: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

ADJist gut bei

ist lebhafter bei

ist wichtig bei

ist fein bei

gutem Essen zugetan

fuers Essen drauf

1000 Essen proschlechtes Essen

und

zum Essen niederlassen

zu realisieren ,zu essen ,

zu stecken , zu erreichen ,eat

food

eateating

NOUNVERB

VERBVERB

goodADJ

nicelyADV

fineADJ

importantADJ

Second Stage of Label Propagation

VERBNOUN

ADJ

VERBNOUN

VERB

VERBNOUN

58

ADJADV

ADJADV

Continues till convergence...

Page 59: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

fein lebhafter realisieren

Idea 3: Graph-Based Projections

Portland

gedeihende

hat

eine

Musikszene

.

End result?

59

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

ADVNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

Page 60: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

fein lebhafter realisieren

Idea 3: Graph-Based Projections

Portland

gedeihende

hat

eine

Musikszene

.

60

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

ADJNOUN

ADVNUM

PRONVERB

DET

ADJNOUN

XNUM

PRONVERB

DET

A larger set of tag distributions better and larger dictionary

Page 61: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Idea 3: Graph-Based Projections

Feature HMM constrained with graph-based dictionary

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.383.2

79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4

EM-HMM

DirectprojectionProjectedDictionar

yGraph-BasedProjections

Feature-HMM

Page 62: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Idea 3: Graph-Based Projections

Feature HMM constrained with graph-based dictionary

Danish

Dutch German

Greek Italian Portuguese

Spanish

Swedish

Average

68.7 57.0 75.9 65.8 63.7 62.9 71.5 68.4 66.769.1 65.1 81.3 71.8 68.1 78.4 80.2 70.1 73.0

73.6 77.0 83.2 79.3 79.7 82.6 80.1 74.7 78.879.0 78.8 82.4 76.3 84.8 87.0 82.8 79.4 81.383.2

79.5 82.8 82.5 86.8 87.9 84.2 80.5 83.4

EM-HMM

Feature-HMM

DirectprojectionProjectedDictionar

yGraph-BasedProjections

93.1 94.7 93.5 96.6 96.4 94.0 95.8 85.5 93.7w/ gold dictionary

96.9 94.9 98.2 97.8 95.8 97.2 96.8 94.8 96.6supervised62

Page 63: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

63

Idea 3: Graph-Based Projections

Lexicon Expansion

Danish

Dutch

German

Greek

Italian

Portu

guese

Span

ish

Swed

ish0

20000400006000080000

100000120000140000

Projected Dictionary Graph-Based Projections

thousandsof words

Page 64: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Concluding Notes• Soft expansion of lexicon using parallel

data and supervision in a resource rich language–Graph-based learning helps in almost all cases

• Reasonably accurate POS taggers without direct supervision• Traditional evaluation of unsupervised POS

taggers done using greedy metrics that use labeled data–Our presented models avoid these evaluation methods

• Practically no hyperparameter tuning–except a threshold parameter for dictionary construction

64

Page 65: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Future Directions• Scaling up the number of nodes in the

graph from 2M to billions may help create larger lexicons

• Including penalties in the graph objective that induce sparse tag distributions at each graph vertex

• Inclusion of multiple languages in the graph may further improve results–Label propagation in one huge multilingual graph

65

Page 66: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

66

Projected Data

http://code.google.com/p/pos-projection/

Available at:

Page 67: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Portland has a thriving music scene .

NOUNADJADJ

Portland hat eine prächtig gedeihende Musikszene .

প�োর্ট�ল্যোন্ড শহর এর সঙ্গীত �রিরবে�শ প�শ উন্নত | 

NOUN VERB DET ADJ NOUN NOUN .

Portland tiene una escena musical vibrante .

波特兰 有 一个 生机勃勃的 音乐 场景

Portland a une scène musicale florissante .

ADJNOUNNOUN NOUN

ADP

.NOUN VERB DET ADJ

ADJ

.NOUN VERB DET ADJ NOUN NOUNNOUN VERB DET NOUN

ADJ

ADJ .

Questions?

67