
Chebyshev polynomials, moment matching and optimal … · Chebyshev polynomials, moment matching and optimal estimation of the unseen Yihong Wu Department of ECE University of Illinois


Chebyshev polynomials, moment matching and optimal estimation of the unseen

Yihong Wu

Department of ECE, University of Illinois at Urbana-Champaign

[email protected]

Joint work with Pengkun Yang (Illinois)

Mar 17, 2014

Yihong Wu (Illinois) Estimating the unseen 1


Problem setup

Task

Given samples from a discrete distribution, how can we make statistical inference on a certain property of the distribution?

[Diagram: discrete distribution → samples → statistical procedure → decision/estimate]


Estimating the unseen

• Support size:

$$S(P) = \sum_i \mathbf{1}\{p_i > 0\}$$

• Example: $S = 5$

• ⇔ estimating the number of unseen symbols (SEEN + UNSEEN = $S(P)$)
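The decomposition SEEN + UNSEEN = $S(P)$ can be illustrated with a small simulation. This is only a sketch: the distribution `p`, the sample size, and the helper names `support_size` and `seen_and_unseen` are hypothetical choices made for illustration, not part of the talk.

```python
import random

def support_size(p):
    """S(P): number of symbols with strictly positive probability."""
    return sum(1 for mass in p if mass > 0)

def seen_and_unseen(p, n, rng):
    """Draw n i.i.d. samples from p and split S(P) into (seen, unseen)."""
    samples = rng.choices(range(len(p)), weights=p, k=n)
    seen = len(set(samples))
    return seen, support_size(p) - seen

# Hypothetical distribution on 7 symbols, two of which have zero mass, so S(P) = 5.
p = [0.4, 0.3, 0.15, 0.1, 0.05, 0.0, 0.0]
rng = random.Random(0)
seen, unseen = seen_and_unseen(p, n=6, rng=rng)
print(support_size(p), seen, unseen)
```

In practice only `seen` is observable; `unseen` is computed here from the known `p` purely to show what the estimation target is.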


Classical results

• maybe the Egyptians have studied it...

• Ecology:

• Linguistics, numismatics, etc:

Biometrika (1976), 63, 3, pp. 435-47
With 2 text-figures
Printed in Great Britain

Estimating the number of unseen species: How many words did Shakespeare know?

BY BRADLEY EFRON AND RONALD THISTED

Department of Statistics, Stanford University, California

SUMMARY

Shakespeare wrote 31534 different words, of which 14376 appear only once, 4343 twice, etc. The question considered is how many words he knew but did not use. A parametric empirical Bayes model due to Fisher and a nonparametric model due to Good & Toulmin are examined. The latter theory is augmented using linear programming methods. We conclude that the models are equivalent to supposing that Shakespeare knew at least 35000 more words.

Some key words: Empirical Bayes; Euler transformation; Linear programming; Negative binomial; Vocabulary.

1. INTRODUCTION

Estimating the number of unseen species is a familiar problem in ecological studies. In this paper the unseen species are words Shakespeare knew but did not use. Shakespeare's known works comprise 884647 total words, of which 14376 are types appearing just one time, 4343 are types appearing twice, etc. These counts are based on Spevack's (1968) concordance and on the summary appearing in an unpublished report by J. Gani & I. Saunders. Table 1 summarizes Shakespeare's word type counts, where $n_x$ is the number of word types appearing exactly $x$ times ($x = 1, \dots, 100$). Including the 846 word types which appear more than 100 times, a total of

$$\sum_{x=1}^{\infty} n_x = 31534$$

different word types appear. Note that 'type' or 'word type' will be used to indicate a distinct item in Shakespeare's vocabulary. 'Total words' will indicate a total word count including repetitions. The definition of type is any distinguishable arrangement of letters. Thus, 'girl' is a different type from 'girls' and 'throneroom' is a different type from both 'throne' and 'room'.

How many word types did Shakespeare actually know? To put the question more operationally, suppose another large quantity of work by Shakespeare were discovered, say $884647t$ total words. How many new word types in addition to the original 31534 would we expect to find? For the case $t = 1$, corresponding to a volume of new Shakespeare equal to the old, there is a surprisingly explicit answer. We will show that a parametric model due to Fisher, Corbet & Williams (1943) and a nonparametric model due to Good & Toulmin (1956) both estimate about 11460 expected new word types, with an expected error of less than 150.

The case $t = \infty$ corresponds to the question as originally posed: how many word types did Shakespeare know? The mathematical model at the beginning of §2 makes explicit the sense of the question. No upper bound is possible, but we will demonstrate a lower bound

• Will not discuss probability estimation [Good-Turing, Orlitsky et al., ...]


Mathematical formulation

• Data: $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P$

• Estimate: $\hat{S} = \hat{S}(X_1, \dots, X_n)$ close to $S(P)$ in probability or in expectation

• Goal: find the minimal sample size and fast algorithms

• Need to assume a minimum non-zero mass


Sample complexity

Space of distributions

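$$\mathcal{D}_k \triangleq \{\text{probability distributions whose non-zero masses are all at least } 1/k\}$$

Sample complexity

$$n^*(k, \epsilon) \triangleq \min\{n : \exists\, \hat{S} \text{ s.t. } \mathbb{P}[|\hat{S} - S(P)| \le \epsilon k] \ge 0.5,\ \forall P \in \mathcal{D}_k\}$$

Remarks

• Upgrade the confidence: $n \to n \log\frac{1}{\delta} \Rightarrow 0.5 \to 1 - \delta$ (subsample + median + Hoeffding)

• Zero error ($\epsilon = 0$): $n^*(k, 0) \asymp k \log k$ (coupon collector)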
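The confidence upgrade in the first remark can be made explicit. The following is a sketch under an added assumption that is not on the slide: each base estimate succeeds with probability at least some $p > 1/2$. The symbols $m$, $p$ and $Z_i$ are introduced here only for illustration.

```latex
% Assumption: split the data into m independent batches; the estimate \hat S_i from
% batch i is within \epsilon k of S(P) with probability at least p > 1/2.
% Let Z_i = 1 if \hat S_i fails. The median can fail only if at least half of the
% batch estimates fail.
\begin{align*}
\mathbb{P}\Big[\,\big|\mathrm{median}(\hat S_1,\dots,\hat S_m) - S(P)\big| > \epsilon k\Big]
  &\le \mathbb{P}\Big[\sum_{i=1}^m Z_i \ge \frac{m}{2}\Big] \\
  &\le \exp\!\Big(-2m\big(p - \tfrac12\big)^2\Big) \qquad \text{(Hoeffding)}.
\end{align*}
% Hence m \ge \log(1/\delta) / \big(2 (p - 1/2)^2\big) batches give failure
% probability at most \delta, i.e. an O(\log(1/\delta)) factor in the sample size.
```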


Naive approach: plug-in

• WYSIWYE:

$$\hat{S}_{\text{seen}} = \text{number of seen symbols}$$

• Underestimate:

$$\hat{S}_{\text{seen}} \le S(P), \quad P\text{-a.s.}$$

• Severely underbiased in the sublinear-sampling regime: $n \ll k$
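To see how large the bias can be, consider the uniform distribution on $k$ symbols, which belongs to the class with minimum non-zero mass $1/k$. The sketch below assumes this uniform case only; the helper names and the value of `k` are hypothetical. Each symbol is missed with probability $(1 - 1/k)^n$, so the expected number of seen symbols is $k(1 - (1 - 1/k)^n)$, which is at most $n$.

```python
import random

def expected_seen_uniform(k, n):
    """E[S_seen] when P is uniform on k symbols: each symbol is missed w.p. (1 - 1/k)^n."""
    return k * (1.0 - (1.0 - 1.0 / k) ** n)

def simulate_seen_uniform(k, n, rng):
    """Number of distinct symbols among n i.i.d. uniform draws from {0, ..., k-1}."""
    return len({rng.randrange(k) for _ in range(n)})

k = 100_000  # hypothetical support size, chosen for illustration
rng = random.Random(1)
for n in (1_000, 10_000, 100_000):
    # Columns: sample size, expected number seen, one simulated value.
    print(n, round(expected_seen_uniform(k, n)), simulate_seen_uniform(k, n, rng))
```

Since the plug-in estimate can never exceed $n$, it falls far short of $k$ whenever $n \ll k$.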


Do we have to estimate the distribution itself?

From a statistical perspective

• high-dimensional problem
  - estimating $P$ provably requires $n = \Theta(k)$ samples
  - empirical distribution is optimal up to constants

• functional estimation
  - scalar functional (support size) $\overset{?}{\Longrightarrow}$ $n = o(k)$ suffices
  - plug-in is frequently suboptimal


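Sufficient statistics

• Histogram:

$$N_j = \sum_i \mathbf{1}\{X_i = j\}: \text{ number of occurrences of the } j\text{th symbol}$$

• Histogram$^2$ / fingerprints / profiles:

$$h_i = \sum_j \mathbf{1}\{N_j = i\}: \text{ number of symbols that occurred exactly } i \text{ times}$$

• $h_0$: number of unseen symbols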
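Both statistics are easy to compute from raw samples. The following minimal sketch uses Python's `collections.Counter`; the function names and the toy sample are illustrative assumptions. Note that $h_0$ cannot be computed from the sample, which is exactly why it has to be estimated.

```python
from collections import Counter

def histogram(samples):
    """N_j: number of occurrences of each observed symbol j."""
    return Counter(samples)

def fingerprint(samples):
    """h_i for i >= 1: number of symbols that occurred exactly i times."""
    return Counter(histogram(samples).values())

samples = list("abracadabra")  # toy sample: 11 draws over the symbols a, b, c, d, r
N = histogram(samples)
h = fingerprint(samples)
print(dict(N))
print(dict(h))
```

Two sanity checks hold for any sample: the sum of $h_i$ over $i \ge 1$ equals the number of distinct seen symbols, and the sum of $i \cdot h_i$ equals the sample size $n$.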


Linear estimators

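Estimators that are linear in the fingerprints:

$$\hat{S} = \sum_i f(N_i) = \sum_{j \ge 1} f(j)\, h_j$$

Classical procedures:

• Plug-in: $\hat{S}_{\text{seen}} = h_1 + h_2 + h_3 + \dots$

• Good-Toulmin '56: empirical Bayes

$$\hat{S}_{\text{GT}} = t h_1 - t^2 h_2 + t^3 h_3 - t^4 h_4 + \dots$$

• Efron-Thisted '76: Bayesian

$$\hat{S}_{\text{ET}} = \sum_{j=1}^{J} (-1)^{j+1} t^j b_j h_j, \quad \text{where } b_j = \mathbb{P}[\mathrm{Binomial}(J, 1/(t+1)) \ge j]$$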
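The three classical procedures differ only in the coefficient function $f$, so one generic routine covers them all. The sketch below transcribes the formulas as written on the slide; the function names and the toy fingerprint are hypothetical, and the fingerprint is assumed to be a dict mapping $j \ge 1$ to $h_j$.

```python
from math import comb

def linear_estimate(h, f):
    """Sum of f(j) * h_j over the fingerprint h (dict: j -> h_j, with j >= 1)."""
    return sum(f(j) * hj for j, hj in h.items())

def plug_in(h):
    """S_seen = h_1 + h_2 + ..., i.e. f(j) = 1."""
    return linear_estimate(h, lambda j: 1)

def good_toulmin(h, t):
    """S_GT = t h_1 - t^2 h_2 + ..., i.e. f(j) = (-1)^(j+1) t^j = -(-t)^j."""
    return linear_estimate(h, lambda j: -((-t) ** j))

def efron_thisted(h, t, J):
    """S_ET: Good-Toulmin coefficients damped by b_j and truncated at j = J."""
    q = 1.0 / (t + 1.0)

    def b(j):
        # b_j = P[Binomial(J, q) >= j]
        return sum(comb(J, i) * q**i * (1.0 - q) ** (J - i) for i in range(j, J + 1))

    return linear_estimate(h, lambda j: -((-t) ** j) * b(j) if j <= J else 0.0)

h = {1: 2, 2: 2, 5: 1}  # toy fingerprint: two singletons, two doubletons, one symbol seen 5 times
print(plug_in(h), good_toulmin(h, t=1), efron_thisted(h, t=1, J=2))
```

The alternating powers of $t$ in the Good-Toulmin coefficients grow quickly for $t > 1$, which is the instability that the weights $b_j$ in the Efron-Thisted form are designed to damp.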


State of the art

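• $\hat{S}_{\text{seen}}$: $n^*(k, \epsilon) \le k \log\frac{1}{\epsilon}$

• Valiant '08, Raskhodnikova et al. '09, Valiant-Valiant '11-'13: sublinear is possible.
  - Upper bound: $n^*(k, \epsilon) \lesssim \frac{k}{\log k} \frac{1}{\epsilon^2}$ by LP [Efron-Thisted '76]
  - Lower bound: $n^*(k, \epsilon) \gtrsim \frac{k}{\log k}$

Theorem (W.-Yang '14)

$$n^*(k, \epsilon) \asymp \frac{k}{\log k} \log^2 \frac{1}{\epsilon}$$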


Minimax risk

Theorem (W.-Yang '14)

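$$\inf_{\hat{S}} \sup_{P \in \mathcal{D}_k} \mathbb{E}\big[(\hat{S} - S(P))^2\big] \asymp k^2 \exp\left(-\left(\sqrt{\frac{n \log k}{k}} \vee \frac{n}{k}\right)\right)$$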


Remainder of this talk

Objectives

• a principled way to obtain a rate-optimal linear estimator

• a natural lower bound to establish optimality via duality


Best polynomial approximation

• $\mathcal{P}_L$ = {polynomials of degree at most $L$}.

• $I = [a, b]$: a finite interval.

• Optimal approximation error

$$E_L(f, I) \triangleq \inf_{p \in \mathcal{P}_L} \sup_{x \in I} |f(x) - p(x)|$$

• Stone-Weierstrass theorem: $f$ continuous $\Rightarrow E_L(f, I) \xrightarrow{L \to \infty} 0$

• Speed of convergence related to modulus of continuity.

• Finite-dim convex optimization / infinite-dim LP

• Many fast algorithms (e.g., Remez)
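A quick way to get a feel for how fast the approximation error decays is to interpolate $f$ at Chebyshev nodes and measure the sup-norm error. This is a swapped-in technique, not the Remez algorithm mentioned above: the interpolant is generally not the best approximant, so its error only bounds the optimal error from above, and the sup is taken over a finite grid. All function names are hypothetical.

```python
import math

def chebyshev_coefficients(f, a, b, L):
    """Coefficients c_0..c_L of the degree-L Chebyshev interpolant of f on [a, b]."""
    m = L + 1
    # Chebyshev nodes cos(theta_i) on [-1, 1], mapped affinely to [a, b].
    thetas = [math.pi * (i + 0.5) / m for i in range(m)]
    values = [f(0.5 * (a + b) + 0.5 * (b - a) * math.cos(th)) for th in thetas]
    coeffs = []
    for j in range(m):
        s = sum(v * math.cos(j * th) for v, th in zip(values, thetas))
        coeffs.append((1.0 if j == 0 else 2.0) * s / m)
    return coeffs

def evaluate(coeffs, a, b, x):
    """Evaluate sum_j c_j T_j(y), y being the affine image of x in [-1, 1] (Clenshaw recurrence)."""
    y = (2.0 * x - a - b) / (b - a)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * y * b1 - b2 + c, b1
    return y * b1 - b2 + coeffs[0]

def sup_error_on_grid(f, a, b, L, grid_size=2001):
    """Largest |f(x) - p(x)| over an equispaced grid, for the degree-L interpolant p."""
    coeffs = chebyshev_coefficients(f, a, b, L)
    xs = [a + (b - a) * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - evaluate(coeffs, a, b, x)) for x in xs)

for L in (2, 4, 6):
    print(L, sup_error_on_grid(math.exp, 0.0, 1.0, L))
```

For a smooth function such as `math.exp` the error drops rapidly with the degree; for functions with limited smoothness the decay is much slower, in line with the modulus-of-continuity remark.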


Example

[Figure, two panels over $x \in [0, 1]$. Left panel: "deg-6 approximation" (vertical axis from 0.05 to 0.35). Right panel: "Chebyshev alternation theorem" (vertical axis from -0.006 to 0.006).]


Moment matching

$$\mathcal{E}_L(f, I) \triangleq \sup \; \mathbb{E}[f(U)] - \mathbb{E}[f(U')]$$

$$\text{s.t. } \mathbb{E}[U^j] = \mathbb{E}[U'^j], \quad j = 1, \dots, L$$

$$U, U' \in I$$


Moment matching

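$$\mathcal{E}_L(f, I) \triangleq \sup \int f \, d\mu - \int f \, d\mu'$$

$$\text{s.t. } \int x^j \, d\mu = \int x^j \, d\mu', \quad j = 1, \dots, L,$$

$$\mu, \mu' \text{ supported on } I$$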


Moment matching

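$$\mathcal{E}_L(f, I) \triangleq \sup \; \mathbb{E}[f(U)] - \mathbb{E}[f(U')]$$

$$\text{s.t. } \mathbb{E}[U^j] = \mathbb{E}[U'^j], \quad j = 1, \dots, L \quad (\text{dual variable } \lambda_j \in \mathbb{R})$$

$$U, U' \in I$$

Infinite-dimensional linear programming. Dual:

$$\inf_{\lambda_1^L} \sup_{U, U' \in I} \; \mathbb{E}[f(U)] - \mathbb{E}[f(U')] - \sum_j \lambda_j \big( \mathbb{E}[U^j] - \mathbb{E}[U'^j] \big)$$

$$= \inf_{\lambda_1^L} \Big\{ \sup_{U \in I} \mathbb{E}\Big[ f(U) - \sum_j \lambda_j U^j \Big] - \inf_{U' \in I} \mathbb{E}\Big[ f(U') - \sum_j \lambda_j U'^j \Big] \Big\}$$

$$= \inf_{\lambda_0^L} \Big\{ \sup_{u \in I} \Big( f(u) - \sum_j \lambda_j u^j \Big) - \inf_{u \in I} \Big( f(u) - \sum_j \lambda_j u^j \Big) \Big\}$$

$$= 2 \inf_{p \in P_L} \sup_{u \in I} |f(u) - p(u)|$$

The sign of $\lambda_j$ is immaterial because each $\lambda_j$ ranges over all of $\mathbb{R}$. In the third line the constant term $\lambda_0$ cancels in the difference; choosing it to center the residual turns (sup minus inf) into twice the sup norm.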


Moment matching ⇔ best polynomial approximation

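$$\mathcal{E}_L(f, I) = 2 E_L(f, I)$$

(left: value of the moment matching problem; right: best uniform approximation error by polynomials of degree at most $L$)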


Optimal estimator


Poissonization

• Poisson sampling model:
  - draw the sample size $n' \sim \mathrm{Poi}(n)$
  - draw $n'$ i.i.d. samples from $P$.

• Histogram counts are independent: $N_i \overset{\text{ind}}{\sim} \mathrm{Poi}(n p_i)$

• Sample complexity and minimax risks remain unchanged within constant factors.
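The sampling model above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is adequate for moderate means only.

```python
import math
import random
from collections import Counter

def sample_poisson(mean, rng):
    """Knuth's multiplication method for one Poisson(mean) draw."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def poissonized_histogram(p, n, rng):
    """Draw n' ~ Poi(n), then n' i.i.d. samples from p; return (n', counts)."""
    n_prime = sample_poisson(n, rng)
    draws = rng.choices(range(len(p)), weights=p, k=n_prime)
    hist = Counter(draws)
    return n_prime, [hist.get(i, 0) for i in range(len(p))]

rng = random.Random(0)
n_prime, counts = poissonized_histogram([0.5, 0.3, 0.2, 0.0], 50, rng)
print(n_prime, counts)
```

Under this model each count $N_i$ is marginally $\mathrm{Poi}(n p_i)$, so the same histogram could equally be generated by sampling each coordinate independently.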


MSE

Recall: MSE = bias$^2$ + variance

Main problem of $S_{\text{seen}}$: huge bias.


Unbiased estimators?

An unbiased estimator for $f(P)$ from $n$ samples requires:

• Independent sampling: $f(P)$ is a polynomial of degree $\le n$

• Poissonized sampling: $f(P)$ is real analytic.

Example

• Flip a coin with bias $p$ $n$ times and estimate $f(p)$

• Sufficient statistic: $Y \sim \mathrm{Binomial}(n, p)$.

• An unbiased estimator exists $\Leftrightarrow$ $f(p)$ is a polynomial of degree $\le n$, because for any estimator $\hat{f}$

$$\mathbb{E}[\hat{f}(Y)] = \sum_{k=0}^{n} \hat{f}(k) \binom{n}{k} p^k (1 - p)^{n-k}.$$
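As a worked check of the "polynomial of degree $\le n$" direction, the sketch below verifies exactly (with rational arithmetic) that the ratio of falling factorials $Y(Y-1)\cdots(Y-m+1) / (n(n-1)\cdots(n-m+1))$ is an unbiased estimator of $p^m$ for $m \le n$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def falling(x, m):
    """Falling factorial x (x-1) ... (x-m+1)."""
    out = 1
    for i in range(m):
        out *= (x - i)
    return out

def power_estimator(y, n, m):
    """Unbiased estimator of p**m from Y ~ Binomial(n, p), valid for m <= n."""
    return Fraction(falling(y, m), falling(n, m))

def binomial_expectation(g, n, p):
    """Exact E[g(Y)] for Y ~ Binomial(n, p), with p a Fraction."""
    return sum(g(k) * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

n, m, p = 5, 3, Fraction(1, 3)
print(binomial_expectation(lambda k: power_estimator(k, n, m), n, p), p ** m)
```

By linearity, any polynomial of degree at most $n$ in $p$ inherits an unbiased estimator from these monomial estimators.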


No unbiased estimator

$$S(P) = \sum_i 1\{p_i > 0\}$$

• Approximate $1\{x > 0\}$ by $q(x) = \sum_{m=0}^{L} a_m x^m$

• Find an unbiased estimator for the proxy

$$\tilde{S}(P) = \sum_i q(p_i)$$

• |bias| $\le$ uniform approximation error

• But the function is discontinuous...


Linear estimators

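Consider estimators that are linear in the fingerprints:

$$\hat{S} = \sum_i f(N_i) = \sum_{j \ge 1} f(j) h_j$$

where $h_j$ is the number of symbols that appear exactly $j$ times.

Guidelines:

• $f(0) = 0$

• $f(j) = 1$ for sufficiently large $j$ ($j > L$)

• How to choose $f(1), \dots, f(L)$?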
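A small sketch of the fingerprint representation. The helper names are illustrative; with the trivial choice $f(j) = 1$ for all $j \ge 1$ the linear estimator reduces to the number of distinct symbols seen.

```python
from collections import Counter

def fingerprint(samples):
    """Map j -> h_j, the number of symbols appearing exactly j times."""
    counts = Counter(samples)            # N_i for each observed symbol
    return Counter(counts.values())

def linear_estimate(h, f):
    """Evaluate sum_{j >= 1} f(j) * h_j for a fingerprint h."""
    return sum(f(j) * hj for j, hj in h.items())

h = fingerprint("abracadabra")
print(dict(h))                            # multiplicities: a:5, b:2, r:2, c:1, d:1
print(linear_estimate(h, lambda j: 1))    # number of seen symbols
```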


Bias

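Choose

• $L = c_0 \log k$.

• $\hat{S} = \sum_i f(N_i)$, $N_i \sim \mathrm{Poi}(n p_i)$

Bias:

$$\mathbb{E}[\hat{S} - S] = \sum_i \mathbb{E}\big[ (f(N_i) - 1) 1\{N_i \le L\} \big] \, 1\{p_i > 1/k\}$$

$$\approx \sum_i \underbrace{\mathbb{E}\big[ (f(N_i) - 1) 1\{N_i \le L\} \big]}_{e^{-n p_i} \times \text{ poly of deg } L} \, 1\{2L/n > p_i > 1/k\}$$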


Bias

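• Observe, for $N \sim \mathrm{Poi}(\lambda)$,

$$\mathbb{E}\big[ (f(N) - 1) 1\{N \le L\} \big] = e^{-\lambda} \underbrace{\sum_{j \ge 0} \frac{f(j) - 1}{j!} \lambda^j}_{q(\lambda)}$$

(terms with $j > L$ vanish because $f(j) = 1$ there, so $q$ has degree $L$)

• Then

$$|\text{bias}| \le k \sup_{n/k \le \lambda \le c \log k} |q(\lambda)|$$

• Choose the best degree-$L$ polynomial $q$ such that $q(0) = -1$

• Solution: Chebyshev polynomial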


Chebyshev polynomial

[Figure: plot on [0, 1] × [0, 1].]

The best approximation to one by a polynomial passing through the origin is a Chebyshev polynomial:

$$p_L(x) = 1 - \frac{\cos(L \arccos x)}{\cos(L \arccos a)}$$


Final estimator

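• Chebyshev polynomial: with $r \triangleq c_1 \log k$ and $l \triangleq n/k$,

$$-\frac{\cos\!\big( L \arccos\big( \frac{2}{r-l} x - \frac{r+l}{r-l} \big) \big)}{\cos\!\big( L \arccos\big( -\frac{r+l}{r-l} \big) \big)} \triangleq \sum_{m=0}^{L} a_m x^m.$$

• Choose

$$f(j) = \begin{cases} 0 & j = 0 \\ 1 + a_j \, j! & j = 1, \dots, L \\ 1 & j > L. \end{cases}$$

• Linear estimator (precomputable coefficients): no sample splitting!

$$\hat{S} = \sum_{j=1}^{L} f(j) h_j + \sum_{j > L} h_j$$

• Significantly faster than LP [Efron-Thisted '76, Valiant-Valiant '11]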
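The coefficients $a_m$ and weights $f(j)$ above can be precomputed with the three-term recurrence $T_{m+1}(y) = 2 y T_m(y) - T_{m-1}(y)$. The sketch below is an illustration, not the authors' code: the function names are hypothetical, and the numbers used in the example ($L = 7$, $l = 0.2$, $r = 7$) are arbitrary stand-ins because the constants $c_0$, $c_1$ are not given on the slide.

```python
import math

def shifted_chebyshev_coeffs(L, l, r):
    """Monomial coefficients a_0..a_L of -T_L(alpha*x + beta) / T_L(beta), L >= 1."""
    alpha = 2.0 / (r - l)
    beta = -(r + l) / (r - l)
    # Polynomials in x are coefficient lists, lowest degree first.
    prev = [1.0]                 # T_0(y) = 1
    curr = [beta, alpha]         # T_1(y) = y = alpha*x + beta
    for _ in range(1, L):
        nxt = [0.0] * (len(curr) + 1)
        for i, c in enumerate(curr):
            nxt[i] += 2.0 * beta * c         # 2*beta*T_m
            nxt[i + 1] += 2.0 * alpha * c    # 2*alpha*x*T_m
        for i, c in enumerate(prev):
            nxt[i] -= c                      # -T_{m-1}
        prev, curr = curr, nxt
    t_at_zero = curr[0]          # value at x = 0, i.e. T_L(beta)
    return [-c / t_at_zero for c in curr]

def estimator_weights(L, l, r):
    """Weights f(0), ..., f(L); f(j) = 1 for j > L is implied."""
    a = shifted_chebyshev_coeffs(L, l, r)
    return [0.0] + [1.0 + a[j] * math.factorial(j) for j in range(1, L + 1)]

def horner(coeffs, x):
    """Evaluate a polynomial given lowest-degree-first coefficients."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

print(estimator_weights(7, 0.2, 7.0))
```

By construction $a_0 = -1$, so $f(0) = 0$, and on $[l, r]$ the polynomial is bounded in absolute value by $1 / |T_L(-\frac{r+l}{r-l})|$, which is the quantity that controls the bias.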


Analysis

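1. bias $\le$ approximation error of the Chebyshev polynomial:

$$\frac{1}{\big| \cos\!\big( L \arccos\big( -\frac{r+l}{r-l} \big) \big) \big|} \asymp \exp\!\Big( -c \sqrt{\frac{n \log k}{k}} \Big),$$

2. variance $\approx \mathrm{poly}(k)$.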


Optimal estimator

Plot of coefficients ($k = 10^6$ and $n = 2 \times 10^5$):

$$\hat{S} = \sum_{j \ge 1} f(j) h_j$$

[Figure: coefficients $g_L(j)$ plotted against $j = 1, \dots, 13$; vertical axis from -300 to 300.]


Why oscillatory and alternating?

$$\hat{S} = \sum_{j \ge 1} f(j) h_j$$

The same oscillation also happens in:

• Good-Toulmin '56: empirical Bayes

$$\hat{S}_{GT} = t h_1 - t^2 h_2 + t^3 h_3 - t^4 h_4 + \dots$$

• Efron-Thisted '76: Bayesian

$$\hat{S}_{ET} = \sum_{j=1}^{J} (-1)^{j+1} t^j b_j h_j$$

I HAVE NO EXPLANATION!
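For concreteness, the Good-Toulmin series above is a one-liner on a fingerprint. The function name and the toy fingerprint are illustrative.

```python
def good_toulmin(h, t):
    """Alternating series t*h_1 - t^2*h_2 + t^3*h_3 - ... for a fingerprint h."""
    return sum((-1) ** (j + 1) * t ** j * hj for j, hj in h.items())

h = {1: 3, 2: 1}                 # three singletons, one doubleton
print(good_toulmin(h, 1.0))      # 3 - 1 = 2.0
print(good_toulmin(h, 0.5))      # 1.5 - 0.25 = 1.25
```

The weights $(-1)^{j+1} t^j$ alternate in sign and grow geometrically once $t > 1$, which is the oscillation referred to above.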


Impossibility results


Minimax lower bound

$$n^*(k, \varepsilon) \gtrsim \frac{k}{\log k} \log^2 \frac{1}{\varepsilon}$$


Total variation

• $\mathrm{TV}(P_0, P_1) = \frac{1}{2} \int |dP_0 - dP_1|$

• Optimal error probability for testing $P_0$ vs $P_1$:

$$1 - \mathrm{TV}(P_0, P_1) = \min_{\psi} \big\{ P_0[\psi = 1] + P_1[\psi = 0] \big\}$$
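A quick numerical check of the identity for discrete distributions, given as probability vectors on a common finite alphabet. The function names and example vectors are illustrative.

```python
def total_variation(p, q):
    """TV distance between two probability vectors on the same alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def optimal_test_error(p, q):
    """Sum of the two error probabilities of the likelihood-ratio test.

    The test declares P1 where q > p, so each symbol contributes min(p, q).
    """
    return sum(min(a, b) for a, b in zip(p, q))

p0 = [0.5, 0.5, 0.0]
p1 = [0.25, 0.25, 0.5]
print(total_variation(p0, p1), optimal_test_error(p0, p1))   # 0.5 0.5
```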


Poisson mixtures

Given $U \sim \mu$,

$$\mathbb{E}[\mathrm{Poi}(U)] = \int_{\mathbb{R}_+} \mathrm{Poi}(\lambda) \, \mu(d\lambda)$$


Randomization

Two-prior argument (composite hypothesis testing):

• draw a random distribution $\mathsf{P}$, then Poisson sampling gives $N_i \overset{\text{ind}}{\sim} \mathrm{Poi}(n p_i)$

• draw a random distribution $\mathsf{P}'$, then Poisson sampling gives $N'_i \overset{\text{ind}}{\sim} \mathrm{Poi}(n p'_i)$

Le Cam's lemma applies if

• $S(\mathsf{P})$ and $S(\mathsf{P}')$ differ with high probability

• Distributions of $N$ and $N'$ are indistinguishable ($k$-dimensional Poisson mixtures)

Main hurdle: difficult to work with distributions on the high-dimensional probability simplex.


Key construction: reduction to one dimension

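• Given $U, U'$ with unit mean:

$$\mathsf{P} = \frac{1}{k} (U_1, \dots, U_k), \quad U_i \overset{\text{i.i.d.}}{\sim} U, \qquad \mathsf{P}' = \frac{1}{k} (U'_1, \dots, U'_k), \quad U'_i \overset{\text{i.i.d.}}{\sim} U'$$

• By LLN,
  - $\mathsf{P}$ and $\mathsf{P}'$ are not, but are close to, probability distributions.
  - support size concentrates on the mean:

$$\mathbb{E}[S(\mathsf{P})] - \mathbb{E}[S(\mathsf{P}')] = k \big( \mathbb{P}\{U > 0\} - \mathbb{P}\{U' > 0\} \big)$$

• Sufficient statistics are i.i.d.:

$$N_i \overset{\text{i.i.d.}}{\sim} \mathbb{E}[\mathrm{Poi}(nU/k)], \qquad N'_i \overset{\text{i.i.d.}}{\sim} \mathbb{E}[\mathrm{Poi}(nU'/k)].$$

• It suffices to show, for these one-dimensional Poisson mixtures,

$$\mathrm{TV}\big( \mathbb{E}[\mathrm{Poi}(nU/k)], \mathbb{E}[\mathrm{Poi}(nU'/k)] \big) = o(1/k).$$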


Moment matching ⇒ statistically close Poisson mixtures

Lemma

• $U, U' \in \big[ 0, \frac{k \log k}{n} \big]$

• $\mathbb{E}[U^j] = \mathbb{E}[U'^j]$, $j = 1, \dots, L = C \log k$

• Then

$$\mathrm{TV}\big( \mathbb{E}[\mathrm{Poi}(nU/k)], \mathbb{E}[\mathrm{Poi}(nU'/k)] \big) = o(1/k)$$
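A toy numerical illustration of the lemma's message, that matching more moments makes the Poisson mixtures harder to tell apart. The pair of priors used here (alternating binomial weights on $\{0, \dots, m\}$) is a hypothetical textbook construction chosen for simplicity, not the construction in the talk; all function names and the scale 0.5 are assumptions.

```python
import math
from fractions import Fraction

def matched_pair(m):
    """Two distributions on {0,...,m} whose moments agree up to order m-1."""
    signed = [(-1) ** i * math.comb(m, i) for i in range(m + 1)]
    total = 2 ** (m - 1)          # mass of the positive (and negative) part
    mu = {i: Fraction(w, total) for i, w in enumerate(signed) if w > 0}
    mu_prime = {i: Fraction(-w, total) for i, w in enumerate(signed) if w < 0}
    return mu, mu_prime

def moment(dist, j):
    """Exact j-th moment of a finitely supported distribution."""
    return sum(w * Fraction(u) ** j for u, w in dist.items())

def poisson_mixture_pmf(dist, scale, cmax):
    """pmf of E[Poi(scale * U)] on {0,...,cmax} for U ~ dist."""
    pmf = []
    for c in range(cmax + 1):
        total = 0.0
        for u, w in dist.items():
            lam = scale * u
            total += float(w) * math.exp(-lam) * lam ** c / math.factorial(c)
        pmf.append(total)
    return pmf

def mixture_tv(m, scale, cmax=60):
    """TV distance between the two Poisson mixtures (truncated at cmax)."""
    mu, mu_prime = matched_pair(m)
    p = poisson_mixture_pmf(mu, scale, cmax)
    q = poisson_mixture_pmf(mu_prime, scale, cmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

for m in (2, 4, 6):
    print(m, mixture_tv(m, 0.5))
```

The printed distances drop quickly as $m$ grows, even though the two priors in each pair have disjoint supports.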


Optimize the lower bound

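Let $\lambda = k \log k / n$.

Choose the best $U, U'$:

$$\sup \; \mathbb{P}\{U = 0\} - \mathbb{P}\{U' = 0\}$$

$$\text{s.t. } \mathbb{E}[U] = \mathbb{E}[U'] = 1, \quad \mathbb{E}[U^j] = \mathbb{E}[U'^j], \; j \in [L], \quad U, U' \in \{0\} \cup [1, \lambda]$$

$$= \sup \; \mathbb{E}[1/X] - \mathbb{E}[1/X']$$

$$\text{s.t. } \mathbb{E}[X^j] = \mathbb{E}[X'^j], \; j \in [L], \quad X, X' \in [1, \lambda]$$

$$= 2 E_L(1/x, [1, \lambda]) \gtrsim e^{-c \sqrt{\frac{n \log k}{k}}}$$

where the two programs are linked by the change of measure

$$P_U(du) = \Big( 1 - \mathbb{E}\Big[ \frac{1}{X} \Big] \Big) \delta_0(du) + \frac{1}{u} P_X(du)$$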


Related work in statistics

Our inspiration: earlier work on Gaussian models

• Ibragimov-Nemirovskii-Khas'minskii '87: smooth functions

• Lepski-Nemirovski-Spokoiny '99: $L_q$ norm of Gaussian regression function

• Cai-Low '11: L1 norm of normal mean


Comparison

Lower bound in [Valiant-Valiant '11]

• Deal with fingerprints: high-dimensional distribution with dependent components

• Approximate distribution by quantized Gaussian

• Bound distance between mean and covariance matrices

Lower bound here: reduce to one dimension


Experiments


Uniform over 1 million elements

[Figure: estimate/$10^5$ (vertical axis, 0 to 40) versus $n/10^5$ (horizontal axis, 0 to 10). Curves: Plug-in, Polynomial, Turing-Good, Chao-Lee 1, Chao-Lee 2.]


Uniform mixed with point mass

[Figure: estimate/$10^5$ (vertical axis, 0 to 10) versus $n/10^5$ (horizontal axis, 0 to 10). Curves: Plug-in, Polynomial, Turing-Good.]


How many words did Shakespeare know?

• Hamlet: 32000 total words, about 7700 distinct words

• deg-10 Chebyshev polynomial

• sampling with replacement

• compare with LP [Efron-Thisted '76, Valiant-Valiant '13]

[Figure: vertical axis labeled estimate/$10^3$, horizontal axis $n/10^3$ (0 to 30). Curves: Truth, Plug-in, Polynomial, LP.]


How many words did Shakespeare know?

Feed the entire Shakespearean canon into the estimator:

• $\hat{S}$ = 68944 ∼ 73257

• Efron-Thisted '76: 66534


Species problem

Formulation

Given an urn containing $k$ balls, estimate the number of distinct colors $S$ by sampling (e.g. with replacement).

• Special case of support size estimation: $p_i \in \{0, \frac{1}{k}, \frac{2}{k}, \dots\}$.

• Same sample complexity as the DISTINCT-ELEMENT problem in TCS.

• Use Chebyshev: $\frac{k}{\log k}$ samples can achieve error $0.1k$

• Converse: $\frac{k}{\log k}$ samples are necessary to achieve error $0.1k$ [Valiant '12]


Can we do better?

[Figure: plot on [0, 1] × [0, 1].]

Use the Lagrange interpolation polynomial to achieve zero bias

• Uniform approximation: $\varepsilon \lesssim \exp(-c \sqrt{\log k})$

• Interpolation: $\varepsilon \lesssim \exp(-c \log k)$.

$$q_L(x) = 1 - \frac{\prod_{j=1}^{L} (j - x)}{L!}$$
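The interpolation property of $q_L$ can be checked exactly: it vanishes at 0 and equals 1 at every integer $1, \dots, L$. The function name below is illustrative.

```python
from fractions import Fraction
from math import factorial

def q(L, x):
    """Evaluate q_L(x) = 1 - prod_{j=1}^{L} (j - x) / L! exactly."""
    prod = Fraction(1)
    for j in range(1, L + 1):
        prod *= (j - Fraction(x))
    return 1 - prod / factorial(L)

print([q(5, x) for x in range(0, 6)])   # 0 at x = 0, then 1 at x = 1..5
```

Because the masses in the species problem live on a grid, a polynomial only has to be exact at grid points, which is why interpolation can beat uniform approximation here.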


More generally...

$$\text{minimax risk} \gtrsim k^2 \exp\!\Big( -c \frac{n \log k}{k} \Big)$$

• Tight when $n = 0.1k$

• Compare to general support size:

$$\text{minimax risk} \asymp k^2 \exp\!\Big( -c \sqrt{\frac{n \log k}{k}} \Big)$$


Estimating entropy

$$H(P) = \sum_i p_i \log \frac{1}{p_i}$$

Theorem (W.-Yang '14)

Sample complexity to estimate within $\varepsilon$ bits: $n \asymp \max\Big\{ \frac{k}{\varepsilon \log k}, \frac{\log^2 k}{\varepsilon^2} \Big\}$ (upper bound also in Jiao et al. '14)

Strategy

• degree: $L \sim \log k$

• small masses: polynomial approximation

• large masses: plug-in with bias correction

• coefficients bounded by Chebyshev

[Figure: plot on [0, 1]; vertical axis from 0.05 to 0.35.]
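For the "plug-in with bias correction" step, one standard choice is the Miller-Madow correction, sketched below. This is an assumption for illustration: the slide does not say which correction is used. Entropy is computed in nats here; divide by $\log 2$ for bits.

```python
import math

def plugin_entropy(counts):
    """Plug-in (empirical) entropy in nats from a list of symbol counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def miller_madow_entropy(counts):
    """Plug-in entropy plus the first-order bias correction (seen - 1) / (2n)."""
    n = sum(counts)
    seen = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (seen - 1) / (2 * n)

counts = [5, 5, 5, 5]
print(plugin_entropy(counts), miller_madow_entropy(counts))
```

The correction only helps when most symbols are observed many times; for small masses it does not remove the bias, which is where the polynomial approximation takes over.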


Estimating Rényi entropy

• Estimating $H_\alpha(P) = \frac{1}{1 - \alpha} \log \sum_i p_i^\alpha$ [Jiao et al. '14, Acharya et al. '14]


Concluding remarks

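To estimate $F(P) = \sum_i f(p_i)$

Sample complexity is roughly governed by the following convex optimization problem (over logarithmic variables):

$$
\begin{aligned}
F(\lambda) \triangleq \sup\ & \mathbb{E}[f(U)] - \mathbb{E}[f(U')] \\
\text{s.t.}\ & \mathbb{E}[U^j] = \mathbb{E}[U'^j], \quad j = 1, \ldots, \log k, \\
& \mathbb{E}[U] \le 1/k, \\
& U, U' \in [0, \log k / n]
\end{aligned}
$$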

• Lower bound: primal program (inapproximability result)

• Upper bound: dual program (approximability result)


Concluding remarks

• Many open problems and directions

  ▸ Confidence intervals

  ▸ Adaptive estimation

  ▸ How to go beyond iid sampling

  ▸ How to incorporate structures

References

• W. & P. Yang (2014). Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381

• W. & P. Yang (2015). Chebyshev polynomials, moment matching, and optimal estimation of the unseen. arXiv:1503.xxxx


Bias

Choose

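• $M = c \log k$.

• $\hat{S} = \sum_{i \ge 1} f(N_i)$, $N_i \sim \mathrm{Poi}(n p_i)$

Bias:

$$
\begin{aligned}
\mathbb{E}[\hat{S} - S] &= \sum_i \left( \mathbb{E}[f(N_i)] - \mathbf{1}_{\{p_i > 0\}} \right) \\
&\overset{f(0)=0}{=} \sum_i \mathbb{E}[f(N_i) - 1]\, \mathbf{1}_{\{p_i > 0\}} \\
&= \sum_i \mathbb{E}[f(N_i) - 1]\, \mathbf{1}_{\{p_i > 1/k\}} \\
&= \sum_i \mathbb{E}[(f(N_i) - 1)\mathbf{1}_{\{N_i \le L\}}]\, \mathbf{1}_{\{p_i > 1/k\}} \\
&\overset{\text{whp}}{=} \sum_i \underbrace{\mathbb{E}[(f(N_i) - 1)\mathbf{1}_{\{N_i \le L\}}]}_{g(n p_i)}\, \mathbf{1}_{\{L/2n > p_i > 1/k\}}
\end{aligned}
$$

Observe: $g(\lambda) \triangleq \mathbb{E}[(f(N) - 1)\mathbf{1}_{\{N \le L\}}] = e^{-\lambda} \times$ poly of deg $L$, where $N \sim \mathrm{Poi}(\lambda)$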
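The observation follows by writing out the Poisson expectation term by term. A short derivation, assuming $N \sim \mathrm{Poi}(\lambda)$ (the natural reading, since $N_i \sim \mathrm{Poi}(n p_i)$):

```latex
\begin{aligned}
g(\lambda) &= \sum_{j=0}^{L} (f(j) - 1)\, \Pr[N = j]
            = \sum_{j=0}^{L} (f(j) - 1)\, e^{-\lambda} \frac{\lambda^{j}}{j!} \\
           &= e^{-\lambda} \sum_{j=0}^{L} \frac{f(j) - 1}{j!}\, \lambda^{j}.
\end{aligned}
```

The sum is a polynomial in $\lambda$ of degree at most $L$ whose $j$-th coefficient is $(f(j) - 1)/j!$. With $f(0) = 0$ its constant term is $-1$, so choosing $f(1), \ldots, f(L)$ amounts to choosing the remaining coefficients.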
