
1

Smoothing

LING 570

Fei Xia

Week 5: 10/24/07

2

Smoothing

• What?

• Why?
– To deal with events observed zero times.
– “event”: a particular n-gram

• How?
– To shave a little bit of probability mass from the higher counts, and pile it instead on the zero counts.
– For the time being, we assume that there are no unknown words; that is, V is a closed vocabulary.

3

Smoothing methods

• Laplace smoothing (a.k.a. Add-one smoothing)
• Good-Turing Smoothing
• Linear interpolation (a.k.a. Jelinek-Mercer smoothing)
• Katz Backoff
• Class-based smoothing
• Absolute discounting
• Kneser-Ney smoothing

4

Laplace smoothing

• Add 1 to all frequency counts.

• Let V be the vocabulary size.

P_{MLE}(wi | wi-1) = c(wi-1, wi) / c(wi-1)

P_{Lap}(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + |V|)

5

Laplace smoothing: n-gram

Bigram: P_{Lap}(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + |V|)

n-gram: P_{Lap}(wi | wi-n+1, …, wi-1) = (c(wi-n+1, …, wi) + 1) / (c(wi-n+1, …, wi-1) + |V|)
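To make the formulas on these two slides concrete, here is a minimal Python sketch of add-one smoothing for a bigram model; the toy corpus, variable names, and function name are made up for illustration, and only the formula itself comes from the slides.

from collections import Counter

def laplace_bigram_prob(w_prev, w, bigram_counts, unigram_counts, vocab_size):
    # P_Lap(w | w_prev) = (c(w_prev, w) + 1) / (c(w_prev) + |V|)
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + vocab_size)

# Hypothetical toy corpus; in practice the counts come from the training data.
tokens = "the cat sat on the mat".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(unigram_counts)  # closed vocabulary, as assumed earlier

print(laplace_bigram_prob("the", "cat", bigram_counts, unigram_counts, V))  # seen bigram
print(laplace_bigram_prob("cat", "on", bigram_counts, unigram_counts, V))   # unseen bigram: 1 / (c("cat") + |V|)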

6

Problem with Laplace smoothing

• Example: |V| = 100K, the bigram “w1 w2” occurs 10 times, and the trigram “w1 w2 w3” occurs 9 times.
– P_{MLE}(w3 | w1, w2) = 0.9
– P_{Lap}(w3 | w1, w2) = (9+1)/(10+100K) ≈ 0.0001

• Problem: Laplace smoothing gives too much probability mass to unseen n-grams.

Add-one smoothing works horribly in practice.

7

Add-δ smoothing: add a fractional count δ < 1 instead of 1.

It works better than add-one, but still works horribly.

Need to choose the value of δ.

8

Good-Turing smoothing

9

Basic ideas

• Re-estimate the frequency of zero-count n-grams using the number of n-grams that occur once.

• Let Nc be the number of n-grams that occurred c times.

• The Good-Turing estimate: for an n-gram that occurs c times, we pretend that it occurs c* times:

c* = (c + 1) * N_{c+1} / Nc

10

N0 is the number of unseen n-grams; N is the total number of n-gram tokens.

For unseen n-grams, we assume that each of them occurs c0* times, where c0* = N1 / N0.

The total prob mass for unseen n-grams is: N0 * c0* / N = N1 / N

Therefore, the prob of EACH unseen n-gram is: N1 / (N0 * N)

11

An example

12

N-gram counts to conditional probability

P(w3 | w1, w2) = c*(w1, w2, w3) / Σ_w c*(w1, w2, w)

c* comes from the GT estimate.

13

Issues in Good-Turing estimation

• If Nc+1 is zero, how to estimate c*?– Smooth Nc by some functions: Ex: log(Nc) = a + b log(c)

– Large counts are assumed to be reliabl gt_max[ ] Ex: c* = c for c > gt_max

• May also want to treat n-grams with low counts (especially 1) as zeroes gt_min[ ].

• Need to renormalize all the estimate to ensure that the probs add to one.

• Good-Turing is often not used by itself; it is used in combination with the backoff and interpolation algorithms.
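As an illustration of the first fix above, here is a minimal sketch that fits log(Nc) = a + b log(c) with numpy and uses the fitted values in the c* formula; the count-of-counts table and all names are hypothetical.

import numpy as np

def fit_count_of_counts(Nc):
    # Least-squares fit of log(Nc) = a + b * log(c); returns a function c -> smoothed Nc.
    cs = np.array(sorted(Nc.keys()), dtype=float)
    ys = np.array([Nc[int(c)] for c in cs], dtype=float)
    b, a = np.polyfit(np.log(cs), np.log(ys), 1)  # slope b, intercept a
    return lambda c: float(np.exp(a + b * np.log(c)))

# Hypothetical count-of-counts: Nc[c] = number of n-gram types observed c times (note N5 = 0).
Nc = {1: 120, 2: 40, 3: 18, 4: 9, 6: 3}
S = fit_count_of_counts(Nc)

c = 4
c_star = (c + 1) * S(c + 1) / S(c)  # GT re-estimate is still defined even though the raw N5 is zero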

14

One way to implement Good-Turing

• Let N be the number of trigram tokens in the training corpus, and min3 and max3 be the min and max cutoffs for trigrams.

• From the trigram counts:
– calculate N0, N1, …, N_{max3+1}, and N
– calculate a function f(c), for c = 0, 1, …, max3

• Define c* = c if c > max3, and c* = f(c) otherwise.

• Do the same for bigram counts and unigram counts.

15

Good-Turing implementation (cont)

• Estimate the trigram conditional prob:

P(w3 | w1, w2) = c*(w1, w2, w3) / Σ_w c*(w1, w2, w)

• For an unseen trigram, the joint prob is:

P(w1, w2, w3) = N1 / (N0 * N)

• Do the same for unigram and bigram models.
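A minimal Python sketch of the recipe on the last two slides, for trigrams only. The toy counts, the cutoff, and the helper names are hypothetical; f(c) is taken to be the basic GT formula (c + 1) * N_{c+1} / Nc, and renormalization is omitted.

from collections import Counter

def make_cstar(ngram_counts, max_cut):
    # c* = c for c > max_cut; otherwise c* = (c + 1) * N_{c+1} / N_c (when defined).
    Nc = Counter(ngram_counts.values())  # count-of-counts
    def cstar(c):
        if c > max_cut or Nc[c] == 0 or Nc[c + 1] == 0:
            return float(c)
        return (c + 1) * Nc[c + 1] / Nc[c]
    return cstar

# Hypothetical trigram counts from a training corpus.
tri = Counter({("a", "b", "c"): 3, ("a", "b", "d"): 1, ("b", "c", "d"): 1, ("c", "d", "e"): 2})
N = sum(tri.values())                          # number of trigram tokens
N1 = sum(1 for c in tri.values() if c == 1)    # number of trigram types seen once
cstar = make_cstar(tri, max_cut=5)

def p_cond(w1, w2, w3):
    # GT conditional prob for a SEEN trigram: c*(w1,w2,w3) / sum of c* over seen continuations of (w1,w2).
    hist = sum(cstar(c) for (a, b, _), c in tri.items() if (a, b) == (w1, w2))
    return cstar(tri[(w1, w2, w3)]) / hist

# For unseen trigrams: the total reserved mass is N1 / N, and each unseen trigram's joint prob
# is N1 / (N0 * N), where N0 is the number of possible-but-unseen trigrams.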

16

Backoff and interpolation

17

N-gram hierarchy

• P3(w3|w1,w2), P2(w3|w2), P1(w3)

• Back off to a lower-order n-gram → backoff estimation

• Mix the probability estimates from all the n-grams → interpolation

18

Katz Backoff

Pkatz(wi | wi-2, wi-1) =
  P3(wi | wi-2, wi-1)                 if c(wi-2, wi-1, wi) > 0
  α(wi-2, wi-1) * Pkatz(wi | wi-1)    otherwise

Pkatz(wi | wi-1) =
  P2(wi | wi-1)       if c(wi-1, wi) > 0
  α(wi-1) * P1(wi)    otherwise
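A minimal sketch of this two-level backoff lookup in Python. The probability tables P3, P2, P1 and the α weights are assumed to have been estimated already (e.g., from discounted counts); all names are hypothetical.

def p_katz_bigram(w1, w2, P2, P1, alpha1):
    # P_katz(w2 | w1): use the bigram estimate if the bigram was seen, else back off to the unigram.
    if (w1, w2) in P2:
        return P2[(w1, w2)]
    return alpha1.get(w1, 1.0) * P1[w2]

def p_katz_trigram(w1, w2, w3, P3, P2, P1, alpha2, alpha1):
    # P_katz(w3 | w1, w2): use the trigram estimate if seen, else back off to the bigram level,
    # scaling by alpha so the leftover probability mass is redistributed.
    if (w1, w2, w3) in P3:
        return P3[(w1, w2, w3)]
    return alpha2.get((w1, w2), 1.0) * p_katz_bigram(w2, w3, P2, P1, alpha1)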

19

Katz backoff (cont)

• The α weights are used to normalize the probability mass so that it still sums to 1, and to “smooth” the lower-order probabilities that are used.

• See J&M Sec 4.7.1 for details of how to calculate α (and M&S 6.3.2 for additional discussion).

20

Jelinek-Mercer smoothing (interpolation)

Bigram: P_{int}(wi | wi-1) = λ * P_{ML}(wi | wi-1) + (1 - λ) * P_{ML}(wi)

Trigram: P_{int}(wi | wi-2, wi-1) = λ1 * P_{ML}(wi | wi-2, wi-1) + λ2 * P_{ML}(wi | wi-1) + λ3 * P_{ML}(wi), where λ1 + λ2 + λ3 = 1
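In code, the trigram case is just a weighted sum of the three ML estimates. A minimal sketch, with hypothetical λ values and the ML probability functions passed in as arguments:

def p_interp(w1, w2, w3, p3_ml, p2_ml, p1_ml, lambdas=(0.6, 0.3, 0.1)):
    # P_int(w3 | w1, w2) = l1 * P_ML(w3 | w1, w2) + l2 * P_ML(w3 | w2) + l3 * P_ML(w3)
    l1, l2, l3 = lambdas  # must sum to 1
    return l1 * p3_ml(w3, w1, w2) + l2 * p2_ml(w3, w2) + l3 * p1_ml(w3)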

21

Interpolation (cont)

How to set the value for λi?

22

How to set λi?

• Generally, here’s what’s done (see the sketch below):
– Split the data into training, held-out, and test sets.
– Train the model on the training set.
– Use the held-out set to test different values of λi and pick the ones that work best (i.e., maximize the likelihood of the held-out data).
– Test the model on the test data.
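A minimal sketch of the held-out step, reusing the p_interp function from the sketch after the Jelinek-Mercer formulas. A simple grid search over the weights stands in for the EM procedure often used in practice; all names and data are hypothetical.

import math
from itertools import product

def heldout_loglik(lambdas, heldout_trigrams, p3_ml, p2_ml, p1_ml):
    # Log-likelihood of the held-out trigrams under the interpolated model.
    total = 0.0
    for (w1, w2, w3) in heldout_trigrams:
        p = p_interp(w1, w2, w3, p3_ml, p2_ml, p1_ml, lambdas)
        total += math.log(max(p, 1e-12))  # floor guards against zero probs at extreme weights
    return total

def pick_lambdas(heldout_trigrams, p3_ml, p2_ml, p1_ml, step=0.1):
    # Grid search over (l1, l2, l3) with l1 + l2 + l3 = 1; keep the best weights on held-out data.
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    candidates = [(l1, l2, max(0.0, 1.0 - l1 - l2))
                  for l1, l2 in product(grid, grid) if l1 + l2 <= 1.0 + 1e-9]
    return max(candidates,
               key=lambda ls: heldout_loglik(ls, heldout_trigrams, p3_ml, p2_ml, p1_ml))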

23

So far

• Laplace smoothing

• Good-Turing Smoothing: gt_min, gt_max

• Linear interpolation: λi(wi-2, wi-1)

• Katz Backoff: α(wi-2, wi-1)

24

Additional slides

25

Another example for Good-Turing

• 10 tuna, 3 unagi, 2 salmon, 1 shrimp, 1 octopus, 1 yellowtail

• How likely is octopus? Since c(octopus) = 1, the GT estimate is 1*.

• To compute 1*, we need n1=3 and n2=1.

• What happens when Nc = 0?
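Plugging the counts from this slide into the c* formula (a small check in Python; the variable names are just for the example):

counts = {"tuna": 10, "unagi": 3, "salmon": 2, "shrimp": 1, "octopus": 1, "yellowtail": 1}
N = sum(counts.values())                         # 18 tokens in total
n1 = sum(1 for c in counts.values() if c == 1)   # 3 types seen once
n2 = sum(1 for c in counts.values() if c == 2)   # 1 type seen twice
one_star = (1 + 1) * n2 / n1                     # 1* = 2 * 1/3 = 2/3
p_octopus = one_star / N                         # (2/3) / 18 ≈ 0.037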

26

Absolute discounting

P_{abs}(wi | wi-1) = (c(wi-1, wi) - D) / c(wi-1)   if c(wi-1, wi) > 0
                   = α(wi-1) * P(wi)               otherwise

What is the value for D?

How to set α(x)?
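A minimal sketch of absolute discounting for bigrams in a backoff form, with a hypothetical discount D = 0.75 and α(w1) computed so that the conditional distribution sums to one. It illustrates the general idea; the slide's exact formulation may differ.

def p_abs_discount(w1, w2, bigram_counts, unigram_counts, p_unigram, D=0.75):
    # Seen bigram: subtract D from its count.  Unseen: back off to the unigram, scaled by alpha(w1).
    c12 = bigram_counts.get((w1, w2), 0)
    if c12 > 0:
        return (c12 - D) / unigram_counts[w1]
    seen = [w for (a, w), c in bigram_counts.items() if a == w1 and c > 0]
    freed = D * len(seen) / unigram_counts[w1]              # probability mass freed by discounting
    unseen_unigram_mass = 1.0 - sum(p_unigram(w) for w in seen)
    alpha = freed / unseen_unigram_mass                     # alpha(w1): spread the freed mass over unseen words
    return alpha * p_unigram(w2)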

27

Intuition for Kneser-Ney smoothing

• I cannot find my reading __

• P(Francisco | reading) > P(glasses | reading)
– Francisco is common, so interpolation gives P(Francisco | reading) a high value.
– But Francisco occurs in few contexts (only after San), whereas glasses occurs in many contexts.
– Hence, weight the interpolation based on the number of contexts for the word, using discounting.

• Words that have appeared in more contexts are more likely to appear in some new context as well.

28

Kneser-Ney smoothing (cont)

Interpolation:

P_{KN}(wi | wi-1) = max(c(wi-1, wi) - D, 0) / c(wi-1) + λ(wi-1) * P_{cont}(wi)

where P_{cont}(wi) = |{w : c(w, wi) > 0}| / |{(w', w) : c(w', w) > 0}|

Backoff:

P_{KN}(wi | wi-1) = max(c(wi-1, wi) - D, 0) / c(wi-1)   if c(wi-1, wi) > 0
                  = α(wi-1) * P_{cont}(wi)              otherwise
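The new ingredient relative to absolute discounting is the continuation probability P_cont. A minimal sketch of computing it from bigram counts (all names are hypothetical):

from collections import defaultdict

def continuation_prob(bigram_counts):
    # P_cont(w) = |{w' : c(w', w) > 0}| / (total number of distinct bigram types)
    left_contexts = defaultdict(set)
    for (w_prev, w), c in bigram_counts.items():
        if c > 0:
            left_contexts[w].add(w_prev)
    total_bigram_types = sum(len(s) for s in left_contexts.values())
    return {w: len(s) / total_bigram_types for w, s in left_contexts.items()}

# A word like "Francisco" that follows only "San" has a single left context, hence a low P_cont,
# even if its raw unigram count is high.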

29

Class-based LM

• Examples:
– The stock rises to $81.24
– He will visit Hyderabad next January

• P(wi | wi-1) ≈ P(ci-1 | wi-1) * P(ci | ci-1) * P(wi | ci)

• Hard clustering vs. soft clustering
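A minimal sketch of the class-based estimate on this slide, assuming a hard clustering (each word belongs to exactly one class, so P(ci-1 | wi-1) = 1 and drops out) and hypothetical probability tables:

def p_class_bigram(w_prev, w, word2class, p_class_trans, p_word_given_class):
    # P(w | w_prev) is approximated by P(c | c_prev) * P(w | c) under a hard clustering.
    c_prev, c = word2class[w_prev], word2class[w]
    return p_class_trans[(c_prev, c)] * p_word_given_class[(w, c)]

# Example with made-up tables:
word2class = {"January": "MONTH", "next": "ADJ"}
p_class_trans = {("ADJ", "MONTH"): 0.2}
p_word_given_class = {("January", "MONTH"): 0.1}
print(p_class_bigram("next", "January", word2class, p_class_trans, p_word_given_class))  # 0.02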

30

Summary

• Laplace smoothing (a.k.a. Add-one smoothing)

• Good-Turing Smoothing

• Linear interpolation: λ(wi-2, wi-1), λ(wi-1)

• Katz Backoff: α(wi-2, wi-1), α(wi-1)

• Absolute discounting: D, α(wi-2, wi-1), α(wi-1)

• Kneser-Ney smoothing: D, α(wi-2, wi-1), α(wi-1)

• Class-based smoothing: clusters
