
Unsupervised Constraint Driven Learning for Transliteration Discovery

M. Chang, D. Goldwasser, D. Roth, and Y. Tu

What I am going to do today…

Goal 1: Present the transliteration work and get feedback!

Goal 2: Think about this work in the context of the CCM tutorial. I will present this work in a slightly different way; some of this is my personal commentary and differs from yesterday's discussion. Please give us comments on this, so we can make the work more general (not only transliteration).

Wait a sec! What is CCM?

Without constraints, prediction is:

$y^* = \arg\max_y \; w^T f(x, y)$

A CCM adds a penalty for violating constraints:

$y^* = \arg\max_y \; w^T f(x, y) - \sum_{c \in C} \rho_c \, d(x, y, c)$

where $d(x, y, c)$ measures how much $y$ violates constraint $c$ and $\rho_c$ is the penalty weight for $c$.
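To make the objective concrete, here is a minimal Python sketch of CCM inference over an explicit candidate set; all names (features, constraints, rho) are illustrative assumptions, not from the paper.

```python
def ccm_argmax(x, candidates, w, features, constraints, rho):
    """Pick the y maximizing w . f(x, y) minus weighted constraint violations."""
    def objective(y):
        base = sum(w[f] for f in features(x, y))              # w^T f(x, y)
        penalty = sum(rho[c] * c(x, y) for c in constraints)  # sum_c rho_c d(x, y, c)
        return base - penalty
    return max(candidates, key=objective)
```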

Constraint Driven Learning

Why constraints? The goal is to build a good system easily. We have prior knowledge at hand, so why not inject that knowledge directly?

How useful are constraints?

Useful for supervised learning [Yih and Roth 04] [and many others].

Useful for semi-supervised learning [Chang et al. ACL 2007].

Sometimes more efficient than labeling data directly.

Unsupervised Constraint Driven Learning

In this work, we do not use any labeled instances, yet achieve performance competitive with several supervised models.

Compared to [Chang et al. ACL 2007]: in ACL 07, they use a small amount of labeled data (5-20 examples). The reason: bad models cannot benefit from constraints!

For some applications we have very good resources, so we do not need labeled instances at all!


In a nutshell:

Traditional semi-supervised learning: learn a model from labeled data, use it to label unlabeled data, and feed the predictions back for further learning. The model can drift from the correct one.

[Diagram: Model -> Prediction on Unlabeled Data -> Label unlabeled data -> Feedback -> Learn from labeled data -> Model. In unsupervised learning, the initial Model comes from a Resource instead of labeled data.]


In a nutshell:

CoDL: use constraints to generate better training samples in unsupervised learning.

[Diagram: Model -> Prediction + Constraints on Unlabeled Data -> more accurate labeling -> Feedback -> better Model.]

CoDL improves a "simple" model using expressive constraints.

Outline

Constraint Driven Learning (CoDL)

Transliteration Discovery

Algorithm

Experimental Results

Transliteration Generation (Not our focus)

Given a source word, what is the target transliteration?

Bush -> 布希
Sushi -> 壽司

Issues: ambiguity. For the same source word, there are many different transliterations (think about Chinese).

What we want: find the most widely used transliteration.

Transliteration Discovery (Our focus)

Problem setting: given two lists of words, map them!

Advantages: a relatively easy problem, and it can find the most widely used transliteration.

Assumptions: the source language is English; each source entity has a transliteration among the target candidates; the target candidates might not be named entities.

Outline

Constraint Driven Learning (CoDL)

Transliteration Discovery

Algorithm

Experimental Results

Algorithm Outline

Prediction Model

How to use existing resource to construct the Model?

Constraints?

Learning Algorithm

The Prediction Model

How do we make predictions? Given a source word, how do we predict the best target?

Model 1: (Vs, Vt) -> Yes or No. Issue: not many obvious constraints can be added, and this is not a structured prediction problem.

Model 2: (Vs, Vt) -> hidden variables -> Yes or No. Predicting the hidden character mapping F is a structured prediction problem, so we can add constraints more easily (see the sketch below).
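A minimal sketch of the Model 2 representation, assuming the hidden variable F is simply a set of (source index, target index) character pairs; the words and the alignment here are illustrative.

```python
v_s = "bush"   # source word
v_t = "布希"    # target word

# Hidden variable F: each (i, j) says source character v_s[i] takes part
# in producing target character v_t[j].
F = {(0, 0), (1, 0), (2, 1), (3, 1)}   # a hypothetical alignment

# Model 2 scores the pair (v_s, v_t) by searching over such alignments,
# which is where structural constraints (coverage, no crossing) can act.
```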

The Prediction Model

Score for a pair:

$\mathrm{score}(F \mid W, v_s, v_t) = \sum_{f \in F} 1[f] \, W(f \mid v_s, v_t)$

A CCM formulation, where F is the hidden variable and the second term penalizes constraint violations:

$g(F, v_s, v_t) = \mathrm{score}(F \mid W, v_s, v_t) - \sum_{c \in C} \rho_c \, d(F \mid v_s, v_t, c)$

A slightly different scoring function (more on this point in the next few slides): maximize over the hidden mapping and normalize by the target length:

$\mathrm{score}(v_s, v_t) = \frac{1}{|v_t|} \max_F g(F, v_s, v_t), \qquad v_t^* = \arg\max_{v_t} \mathrm{score}(v_s, v_t)$
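A minimal sketch of this scoring, assuming a brute-force enumeration of candidate alignments; the weight default, the (rho, d) pairing, and all names are illustrative assumptions, not the paper's API.

```python
def g(F, v_s, v_t, W, constraints):
    """CCM score of a hidden alignment F: mapping weights minus penalties.
    W maps character pairs to weights; constraints is a list of (rho, d)
    where d(F, v_s, v_t) counts violations of one constraint."""
    base = sum(W.get((v_s[i], v_t[j]), -1.0) for i, j in F)
    penalty = sum(rho * d(F, v_s, v_t) for rho, d in constraints)
    return base - penalty

def score_pair(v_s, v_t, W, constraints, alignments):
    """(1/|v_t|) max_F g(F, v_s, v_t); `alignments` enumerates candidate F."""
    best = max(g(F, v_s, v_t, W, constraints) for F in alignments(v_s, v_t))
    return best / len(v_t)
```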

Prediction Model: Another View

The scoring function looks like weights times features!

If there is a bad feature, the score goes to -∞ (a hard constraint can be seen as a feature with weight -∞).

Our hidden variable (the feature vector) is the character mapping.

$\mathrm{score}(F \mid W, v_s, v_t) = \sum_{f \in F} 1[f] \, W(f \mid v_s, v_t) = W^T F(v_s, v_t)$

$g(F, v_s, v_t) = W^T F(v_s, v_t) - \sum_{c \in C} \rho_c \, d(F \mid v_s, v_t, c)$

The feature vector $F(v_s, v_t)$ contains everything: (a,a), (o,O), (w,_), ...

Algorithm Outline

Prediction Model

How to use existing resource to construct the Model?

Constraints?

Learning Algorithm

Resource: Romanization Table

Hebrew, Russian: how can you type Hebrew or Russian on an English keyboard? "C" maps to a similar character, "C" or "S", in Hebrew or Russian.

Such tables are very easy to get, but ambiguous.

Special case: Chinese (Pin-Yin). 壽司 -> shòu sī (low ambiguity). Map Pin-Yin to English (sushi). The Romanization Table then contains entries such as (a, a).

Initialize the Table

Every character pair in the Romanization Table gets weight 0; everything else gets weight -1. (There could be better ways to do initialization.)

Note: without constraints, all pairs (v_s, v_t) would get score zero, since the best alignment simply avoids every -1 mapping.

$g(F, v_s, v_t) = W^T F(v_s, v_t) - \sum_{c \in C} \rho_c \, d(F \mid v_s, v_t, c)$
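A minimal sketch of this initialization, assuming the table is a set of character pairs; defaulting unseen pairs to -1 at lookup time is an implementation choice of this sketch, not the paper's.

```python
def init_weights(romanization_table):
    """Weight 0 for every character pair in the table; -1 for everything else."""
    return {pair: 0.0 for pair in romanization_table}

table = {("a", "a"), ("b", "b"), ("s", "s")}   # toy table
W = init_weights(table)
weight = lambda s_ch, t_ch: W.get((s_ch, t_ch), -1.0)  # unlisted pairs get -1
```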

Algorithm Outline

Prediction Model

How to use existing resource to construct the Model?

Constraints?

Learning Algorithm

Constraints

General constraints (see the sketch after this list):
Coverage: all characters need to be mapped at least once.
No crossing: character mappings cannot cross each other.

Language-specific constraints:
General restricted mapping, initial restricted mapping, and length restriction.
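A minimal sketch of the two general constraints as violation counters, assuming an alignment F is a set of (source index, target index) pairs; these implementations are illustrative, not the paper's code.

```python
def coverage_violations(F, v_s, v_t):
    """Count source and target characters left unmapped by alignment F."""
    mapped_s = {i for i, _ in F}
    mapped_t = {j for _, j in F}
    return (len(v_s) - len(mapped_s)) + (len(v_t) - len(mapped_t))

def crossing_violations(F, v_s, v_t):
    """Count pairs of mappings that cross: i1 < i2 but j1 > j2."""
    pairs = sorted(F)
    return sum(1 for a, (i1, j1) in enumerate(pairs)
                 for i2, j2 in pairs[a + 1:]
                 if i1 < i2 and j1 > j2)
```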

Constraints

Pin-Yin to English: many other works use similar information as well!

Algorithm Outline

Prediction Model

How to use existing resource to construct the Model?

Constraints?

Learning Algorithm

High-Level Overview

Model <- Resource. While not converged:
Use Model + Constraints to get labels (for both F and y).
Update the Model with the newly labeled F and y (without constraints; details in the next slide).

Similar to ACL 07: update the model without constraints.

Difference from ACL 07: we get feedback from the labels of both the hidden variables and the output.

[Diagram: the training loop of the algorithm. Predict the hidden variables and the labels, then update the model.]

Outline

Constraint Driven Learning (CoDL)

Transliteration Discovery

Algorithm

Experimental Results

Experimental Setting

Evaluation

ACC: the top candidate is (one of) the right answers. (A small sketch of this metric follows the dataset list below.)

Learning algorithm: linear SVM with C = 0.5.

Datasets (source : target counts):
English-Hebrew 300 : 300
English-Chinese 581 : 681
English-Russian 727 : 50648 (the target side includes all words)
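A minimal sketch of the ACC metric, assuming each source word comes with a set of acceptable gold answers; the data layout is an illustrative assumption.

```python
def acc(top_candidate, gold_sets):
    """Fraction of source words whose top-ranked candidate is a right answer.
    top_candidate: dict word -> predicted target; gold_sets: dict word -> set."""
    hits = sum(1 for w, pred in top_candidate.items() if pred in gold_sets[w])
    return hits / len(top_candidate)
```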

Results - Hebrew

Results - Russian

Analysis

A small Russian subset was used here.

1) Without constraints (on features), the Romanization Table is useless!

2) General constraints are more important!

3) Learning has a great impact here! But constraints are very important, too!

4) Better constraints lead to better final results.

Related Works (Need more work here)

Learning the score for Edit Distance

Previous transliteration works

Machine translation?

Conclusion

ML: an unsupervised constraint driven algorithm. Use hidden variables to find more constraints (e.g., co-reference). Use constraints to find a "cleaner" feature representation.

Transliteration: use the Romanization Table as the starting point. We can get good results without training data; the right constraints (i.e., the modeling) are the key.

Future work. Transliteration model: a better model, quicker inference. CoDL: other applications for unsupervised CoDL.


Constraint-Driven Learning (CoDL)

λ = learn(Tr)
For N iterations do:
    T = ∅
    For each x in the unlabeled dataset:
        y = Inference(x, C, λ)
        T = T ∪ {(x, y)}
    λ = γλ + (1 - γ) learn(T)

learn(·): any supervised learning algorithm, parameterized by λ.
λ = γλ + (1 - γ) learn(T): learn from the new training data, weighting the supervised and unsupervised models (Nigam 2000).
T = T ∪ {(x, y)}: augmenting the training set (feedback).
Inference(x, C, λ): any inference algorithm (with constraints).


Unsupervised Constraint-Driven Learning

λ = Construct(Resource)
For N iterations do:
    T = ∅
    For each x in the unlabeled dataset:
        y = Inference(x, C, λ)
        T = T ∪ {(x, y)}
    λ = γλ + (1 - γ) learn(T)

Construct(Resource): construct the model from resources.
learn(T): learn from the new training data; γ = 0 in this work.
T = T ∪ {(x, y)}: augmenting the training set (feedback).
Inference(x, C, λ): any inference algorithm (with constraints).
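A minimal runnable sketch of this loop, assuming black-box construct, inference, and learn callables (illustrative names, not the paper's code); with γ = 0 as in this work, the retrained model simply replaces the old one each iteration.

```python
def unsupervised_codl(resource, unlabeled, constraints, n_iters,
                      construct, inference, learn):
    """Unsupervised CoDL: build the initial model from a resource, then
    alternate constrained inference on unlabeled data with retraining."""
    model = construct(resource)                      # lambda = Construct(Resource)
    for _ in range(n_iters):
        T = [(x, inference(x, constraints, model))   # y = Inference(x, C, lambda)
             for x in unlabeled]                     # T = T u {(x, y)}
        model = learn(T)  # gamma = 0: the retrained model replaces the old one
    return model
```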