COM-Poisson

Duality

Hyper-Poisson

Weighted Hyper-Poisson

Count Regression

Double Double: Using Duality in Count Regression

Michael Anderson

Department of Management Science and Statistics, College of Business

University of Texas at San Antonio

Joint Statistical Meetings – August 2011

Abstract

The COM-Poisson distribution of Conway and Maxwell[3] is used in weighted Poisson regression to model phenomena which are significantly under- or over-dispersed. This flexibility in modeling dispersion derives from the property of duality inherent in the Poisson distribution, in distinction to most other discrete models. Other distributions also possess duality and also provide flexibility in modeling. Two will be discussed and examples of inference will be shown.

http://faculty.business.utsa.edu/manderso/

1. COM-Poisson

One technique for modeling over- or under-dispersion is to modify the Poisson distribution with a weighting function. Recent work by Shmueli et al.[9][8], Jowaheer and Khan[6], and Guikema and Coffelt[4] has popularized the COM-Poisson distribution. In this distribution, the weighting function appears as an exponent on the factorial in the denominator of the PMF:

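Poisson:

$$P(k) = \frac{1}{C}\,\frac{\theta^{k}}{k!}, \qquad C = \sum_{i \ge 0} \frac{\theta^{i}}{i!} = e^{\theta}$$

COM-Poisson:

$$P_{\gamma}(k) = \frac{1}{C_{\gamma}}\,\frac{\theta^{k}}{(k!)^{\gamma}}, \qquad C_{\gamma} = \sum_{i \ge 0} \frac{\theta^{i}}{(i!)^{\gamma}}$$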


The weighting exponent γ > 0 allows the distribution to be either under-dispersed (γ > 1) or over-dispersed (0 < γ < 1). The index of dispersion is a function of both the location parameter θ and the shape parameter γ, and varies symmetrically around γ = 1.
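The effect of γ on dispersion can be checked numerically. The following is a minimal sketch, not the author's code: the function names, the truncation point `kmax`, and the choice θ = 3 are assumptions made for illustration, and the normalizing constant is approximated by a finite sum.

```python
import math

def com_poisson_pmf(theta, gamma, kmax=200):
    """Return [P(0), ..., P(kmax)], truncating the normalizing series at kmax."""
    # log of the unnormalized term: k*log(theta) - gamma*log(k!)
    log_terms = [k * math.log(theta) - gamma * math.lgamma(k + 1)
                 for k in range(kmax + 1)]
    shift = max(log_terms)                      # guard against overflow
    terms = [math.exp(t - shift) for t in log_terms]
    total = sum(terms)                          # proportional to C_gamma
    return [t / total for t in terms]

def index_of_dispersion(pmf):
    """Variance divided by mean for a PMF given as a list over k = 0, 1, ..."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return var / mean

if __name__ == "__main__":
    # gamma < 1, gamma = 1 (ordinary Poisson), gamma > 1
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, index_of_dispersion(com_poisson_pmf(3.0, gamma)))
```

With γ = 1 the index should come out at 1 (ordinary Poisson), above 1 for γ = 0.5, and below 1 for γ = 2.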

Sellers and Shmueli[8] have had great success using this distribution in Poisson regression to model both over- and under-dispersed processes.


2. Duality and GHPDs

The key to the flexibility of the COM-Poisson distribution is the property of duality, as described by Kokonendji, Mizere, and Balakrishnan[7].

2.1. Over- and Under-Dispersion

Kokonendji et al. observed that several different weighting schemes can be applied to the Poisson distribution to model either over- or under-dispersion:

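$$P_{\gamma}(i) = \frac{1}{E[w_i(\gamma)]}\, w_i(\gamma)\, \frac{e^{-\theta}\theta^{i}}{i!}$$

size-biased         $w_i \propto e^{-\gamma i}$

                    $w_i \propto e^{-\gamma |i - \theta|}$,   $\theta \ge 0$

                    $w_i \propto (i + \delta)^{-\gamma}$,   $\delta > 0$

factorial-biased    $w_i \propto (i!)^{-(\gamma - 1)}$,   $\gamma < 2$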

These weighting schemes all share the property of having dual weights such that

$$w_i(+\gamma) \times w_i(-\gamma) = 1$$
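As a rough numerical illustration of dual weights, the sketch below applies the weight (i + δ)^(−γ) to a Poisson distribution and compares γ = +1 with γ = −1. The function names, the truncation point `kmax`, and the values θ = 3 and δ = 1 are assumptions chosen for the example, not taken from the source.

```python
import math

def weighted_poisson_pmf(theta, log_weight, kmax=200):
    """Weighted Poisson PMF on 0..kmax; log_weight(i) returns log w_i."""
    log_terms = [log_weight(i) + i * math.log(theta) - math.lgamma(i + 1)
                 for i in range(kmax + 1)]
    shift = max(log_terms)
    terms = [math.exp(t - shift) for t in log_terms]
    total = sum(terms)          # plays the role of E[w_i] up to a constant
    return [t / total for t in terms]

def index_of_dispersion(pmf):
    mean = sum(i * p for i, p in enumerate(pmf))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(pmf))
    return var / mean

def shifted_power_log_weight(gamma, delta=1.0):
    """log of w_i = (i + delta)^(-gamma)."""
    return lambda i: -gamma * math.log(i + delta)

if __name__ == "__main__":
    for gamma in (+1.0, -1.0):
        pmf = weighted_poisson_pmf(3.0, shifted_power_log_weight(gamma))
        print(gamma, index_of_dispersion(pmf))
```

In this parametrization the two dual weights multiply to 1 for every i, and the pair lands on opposite sides of the Poisson: γ = +1 gives an index of dispersion above 1 and γ = −1 gives one below 1.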


2.2. GHPDs

The Poisson distribution is one of the simplest examples of a family of power series distributions known as the generalized hypergeometric probability distributions (GHPDs). These distributions have probabilities which are normalized terms from a generalized hypergeometric function:

$${}_pF_q[a_1, a_2, \ldots; b_1, b_2, \ldots; \theta] = \sum_{i \ge 0} \frac{(a_1)_i (a_2)_i \cdots (a_p)_i\, \theta^{i}}{(b_1)_i (b_2)_i \cdots (b_q)_i\, i!}, \qquad (a)_i = a(a+1)\cdots(a+i-1)$$

and probability generating functions which are the ratio of hypergeometric functions

$$G_x(z) = \frac{{}_pF_q[a_1, a_2, \ldots, a_p; b_1, b_2, \ldots, b_q; \theta z]}{{}_pF_q[a_1, a_2, \ldots, a_p; b_1, b_2, \ldots, b_q; \theta]}$$

The GHPD family includes many well-known discrete distributions, including the Poisson, negative binomial, binomial, and hypergeometric distributions[5, p. 88]. These are a few examples:

name                 PGF                                  parameters
binomial             C 1F0[−n; −; −λz]                    λ = p/(1 − p)
Poisson              C 0F0[−; −; λz]
displaced Poisson    C 1F1[1; r + 1; λz]
hyper-Poisson        C 1F1[1; λ; θz]
geometric            C 1F0[1; −; qz]
negative binomial    C 1F0[k; −; qz]
Polya-Eggenberger    C 1F0[a; −; pz]                      a = h/θ, θ = p/(1 − p)
logarithmic          C 2F1[1, 1; 2; θz]
hypergeometric       C 2F1[−n, −M; N − M − n + 1; z]
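The construction "probabilities are normalized terms of a pFq series" can be sketched directly. The code below is an illustration under stated assumptions: it handles only positive parameters and positive θ (so it does not cover the binomial or hypergeometric rows, which use negative upper parameters), it truncates the series at a hypothetical `kmax`, and the function names are invented for this example.

```python
import math

def log_pochhammer(a, i):
    """log of the rising factorial (a)_i = a(a+1)...(a+i-1), for a > 0."""
    return math.lgamma(a + i) - math.lgamma(a)

def ghpd_pmf(a, b, theta, kmax=200):
    """PMF proportional to the terms of pFq[a; b; theta], truncated at kmax.

    a, b: lists of positive upper and lower parameters (may be empty).
    """
    log_terms = []
    for i in range(kmax + 1):
        lt = i * math.log(theta) - math.lgamma(i + 1)        # theta^i / i!
        lt += sum(log_pochhammer(aj, i) for aj in a)          # numerator
        lt -= sum(log_pochhammer(bj, i) for bj in b)          # denominator
        log_terms.append(lt)
    shift = max(log_terms)
    terms = [math.exp(t - shift) for t in log_terms]
    total = sum(terms)
    return [t / total for t in terms]

if __name__ == "__main__":
    poisson = ghpd_pmf([], [], 2.0)            # 0F0[-; -; theta]
    geometric = ghpd_pmf([1.0], [], 0.4)       # 1F0[1; -; q]
    hyper_poisson = ghpd_pmf([1.0], [2.5], 2.0)  # 1F1[1; lambda; theta]
    print(poisson[:3], geometric[:3], hyper_poisson[:3])
```

The empty parameter lists reproduce the Poisson PMF, and a single upper parameter of 1 with q < 1 reproduces the geometric PMF (1 − q)q^k, which gives a quick sanity check on the series terms.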


2.3. Duality Condition

The COM-weighted Poisson distribution can be thought of as a member of a variation of the GHPDs, the polyfactorial GHPDs, where a weighting exponent is applied to the factorial in the denominator of the series terms:

$${}_pF^{\gamma}_q[a_1, a_2, \ldots; b_1, b_2, \ldots; \theta] = \sum_{i \ge 0} \frac{(a_1)_i (a_2)_i \cdots\, \theta^{i}}{(b_1)_i (b_2)_i \cdots\, (i!)^{\gamma}}$$

The COM-Poisson is the simplest of these, with

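$$G_x(z) = \frac{{}_0F^{\gamma}_0[-;-;\theta z]}{{}_0F^{\gamma}_0[-;-;\theta]}, \qquad P_{\gamma}(k) = \frac{1}{{}_0F^{\gamma}_0[-;-;\theta]}\,\frac{\theta^{k}}{(k!)^{\gamma}}$$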

Most polyfactorial GHPDs do not exhibit duality. A weighting exponent γ > 1 will reduce the dispersion of the distribution, but an exponent γ < 1 will produce a series which does not converge.

However, in the case where the numerator order p does not exceed the denominator order q, the weighted hypergeometric series does converge, and these distributions show duality.
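The convergence condition can be motivated with a ratio test. This is a sketch rather than a proof from the source; it writes t_i for the i-th term of the polyfactorial series and assumes no a_j or b_j is a non-positive integer, so that no term vanishes or is undefined.

```latex
\[
\frac{t_{i+1}}{t_i}
  = \frac{\prod_{j=1}^{p}(a_j + i)}{\prod_{j=1}^{q}(b_j + i)}
    \cdot \frac{\theta}{(i+1)^{\gamma}}
  \;\sim\; \theta\, i^{\,p - q - \gamma}
  \qquad (i \to \infty).
\]
```

The ratio tends to zero, and the series converges for every θ, whenever γ > p − q. When p ≤ q this holds for every γ > 0, so both γ > 1 and γ < 1 are available. When p = q + 1 (for example the 1F0 forms) and γ < 1, the ratio grows without bound and the series diverges for any θ ≠ 0.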


3. Hyper-Poisson

Based on the general duality condition, the next likely distribution after the Poisson is Bardwell and Crow's hyper-Poisson distribution[1], with PGF and PMF

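$$G_x(z) = \frac{{}_1F_1[1;\lambda;\theta z]}{{}_1F_1[1;\lambda;\theta]}, \qquad P(k) = \frac{1}{{}_1F_1[1;\lambda;\theta]}\,\frac{\theta^{k}}{(\lambda)_k}$$

where

$${}_1F_1[1;\lambda;\theta] = \sum_{j \ge 0} \frac{(1)_j\, \theta^{j}}{(\lambda)_j\, j!} = \sum_{j \ge 0} \frac{\theta^{j}}{(\lambda)_j}$$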

The λ is a shift parameter which displaces the distribution. (When λ is an integer, this is a shifted Poisson distribution.)


This distribution is interesting in that it has a property very similar to duality. When the shift parameter λ < 1, the distribution is under-dispersed and is called sub-Poisson, while λ > 1 makes the distribution over-dispersed or super-Poisson. However, this is not true duality, since

$$(1 - \varepsilon)_i \ne (1 + \varepsilon)_i \qquad \text{and} \qquad (\lambda)_i \ne \left(\frac{1}{\lambda}\right)_i$$
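The sub-Poisson and super-Poisson behaviour can be checked numerically from the hyper-Poisson PMF θ^k / (λ)_k. The sketch below is illustrative only: the function names, the truncation point `kmax`, and the test values θ = 3 and λ in {0.5, 1, 2} are assumptions.

```python
import math

def hyper_poisson_pmf(theta, lam, kmax=300):
    """PMF proportional to theta^k / (lam)_k on 0..kmax, for lam > 0."""
    # log (lam)_k = lgamma(lam + k) - lgamma(lam)
    log_terms = [k * math.log(theta) - (math.lgamma(lam + k) - math.lgamma(lam))
                 for k in range(kmax + 1)]
    shift = max(log_terms)
    terms = [math.exp(t - shift) for t in log_terms]
    total = sum(terms)          # proportional to 1F1[1; lam; theta]
    return [t / total for t in terms]

def index_of_dispersion(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return var / mean

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        print(lam, index_of_dispersion(hyper_poisson_pmf(3.0, lam)))
```

Setting λ = 1 recovers the ordinary Poisson, since (1)_k = k!, so the index of dispersion should be 1 there, below 1 for λ = 0.5, and above 1 for λ = 2.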


4. Weighted Hyper-Poisson

True duality can be introduced to the hyper-Poisson distribution with COM-type weighting, which gives a distribution that can be shifted and compressed or stretched.

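$$G_x(z) = \frac{{}_1F^{\gamma}_1[1;\lambda;\theta z]}{{}_1F^{\gamma}_1[1;\lambda;\theta]}, \qquad P_{\gamma}(k) = \frac{1}{{}_1F^{\gamma}_1[1;\lambda;\theta]}\,\frac{\theta^{k}}{(\lambda)_k\,(k!)^{\gamma - 1}}$$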


The index of dispersion can be adjusted by varying either the shift parameter (λ) or the weight parameter (γ).
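A small grid over λ and γ gives a feel for how the two parameters move the index of dispersion. This is a sketch under assumptions: the function names, the truncation point `kmax`, the value θ = 3, and the grid values are all chosen for illustration and do not come from the source.

```python
import math

def pf_hyper_poisson_pmf(theta, lam, gamma, kmax=300):
    """PMF proportional to theta^k / ((lam)_k (k!)^(gamma-1)) on 0..kmax."""
    log_terms = [k * math.log(theta)
                 - (math.lgamma(lam + k) - math.lgamma(lam))   # log (lam)_k
                 - (gamma - 1.0) * math.lgamma(k + 1)          # log (k!)^(gamma-1)
                 for k in range(kmax + 1)]
    shift = max(log_terms)
    terms = [math.exp(t - shift) for t in log_terms]
    total = sum(terms)
    return [t / total for t in terms]

def index_of_dispersion(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return var / mean

if __name__ == "__main__":
    theta = 3.0
    for lam in (0.5, 1.0, 2.0):
        row = [index_of_dispersion(pf_hyper_poisson_pmf(theta, lam, g))
               for g in (0.5, 1.0, 2.0)]
        print(lam, [round(v, 3) for v in row])
```

The cell λ = 1, γ = 1 is the ordinary Poisson; fixing λ = 1 gives the COM-Poisson column behaviour, and fixing γ = 1 gives the unweighted hyper-Poisson.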

Is this additional flexibility useful?


5. Count Regression

How do the hyper-Poisson and polyfactorial hyper-Poisson compare to the COM-Poisson in fitting regression models where there is over- or under-dispersion?

Sellers and Shmueli's paper[8] provides an extensive comparison of COM-Poisson regression to (ordinary) Poisson regression and negative binomial regression. I've extended their comparisons to the hyper-Poisson distributions, unweighted and weighted.

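COM-Poisson:

$$P(y \mid x) = \frac{1}{{}_0F^{\gamma}_0[-;-;\theta(x)]}\,\frac{\theta(x)^{y}}{(1)_y^{\gamma}}$$

hyper-Poisson:

$$P(y \mid x) = \frac{1}{{}_1F_1[1;\lambda;\theta(x)]}\,\frac{\theta(x)^{y}}{(\lambda)_y}$$

pf hyper-Poisson:

$$P(y \mid x) = \frac{1}{{}_1F^{\gamma}_1[1;\lambda;\theta(x)]}\,\frac{\theta(x)^{y}}{(\lambda)_y\,(1)_y^{\gamma - 1}}$$

with

$$\theta(x) = e^{\beta_0 + \beta_1 x}$$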
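Because the three regression models share one functional form, a single log-likelihood covers all of them: λ = 1 gives the COM-Poisson, γ = 1 gives the hyper-Poisson, and λ = γ = 1 gives ordinary Poisson regression. The sketch below only evaluates the log-likelihood; it is not the fitting procedure used for the reported results. The function names, the truncation point `kmax`, and the toy data in the demo are made up for illustration and are not the data sets discussed here.

```python
import math

def log_norm_const(theta, lam, gamma, kmax=500):
    """log of sum_j theta^j / ((lam)_j (j!)^(gamma-1)), truncated at kmax."""
    logs = [j * math.log(theta)
            - (math.lgamma(lam + j) - math.lgamma(lam))
            - (gamma - 1.0) * math.lgamma(j + 1)
            for j in range(kmax + 1)]
    shift = max(logs)
    return shift + math.log(sum(math.exp(v - shift) for v in logs))

def log_likelihood(beta0, beta1, lam, gamma, xs, ys):
    """Log-likelihood of the pf hyper-Poisson regression with log link."""
    ll = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x                # log theta(x)
        ll += (y * eta
               - (math.lgamma(lam + y) - math.lgamma(lam))
               - (gamma - 1.0) * math.lgamma(y + 1)
               - log_norm_const(math.exp(eta), lam, gamma))
    return ll

def poisson_log_likelihood(beta0, beta1, xs, ys):
    """Closed-form Poisson regression log-likelihood, used as a cross-check."""
    return sum(y * (beta0 + beta1 * x) - math.exp(beta0 + beta1 * x)
               - math.lgamma(y + 1) for x, y in zip(xs, ys))

if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1, 2, 4, 7]   # toy values, not real data
    print(log_likelihood(0.2, 0.6, 1.0, 1.0, xs, ys))
    print(poisson_log_likelihood(0.2, 0.6, xs, ys))
```

A function like this could be handed to any derivative-free optimizer to obtain maximum likelihood estimates; the cross-check against the closed-form Poisson case is a cheap way to catch sign or normalization mistakes before fitting.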


5.1. Data Sets

All three models were used to fit two data sets, one under-dispersed, the other over-dispersed. These data sets come from Sellers and Shmueli, to allow direct comparison with existing results.

Airfreight Breakage: The number of broken ampules (out of 1000) in 10 air shipments. The predictor variable is the number of times the ampule carton is transferred between aircraft. This data set showed under-dispersion in Poisson regression.

Textile Faults: The number of yarn breaks during each of 32 textile process runs. The predictor variable is the log of each textile roll. This data set showed over-dispersion in Poisson regression.

5.2. Results

All three models were fitted with maximum likelihood estimates using the NMaximize[Method→"NelderMead"] function in Mathematica. Results for the COM-Poisson fits were nearly identical to those reported by Sellers and Shmueli.

                     Air Freight                  Fabric Rolls
model                MSE    CAIC     time         MSE     CAIC     time
COM-Poisson          1.9    47.198    4.727       21.80   191.03    8.783
hyper-Poisson        2.6    55.716    3.198       21.99   195.35   40.903
pf hyper-Poisson     1.9    50.341   24.960       20.83   192.52   65.579

Sellers and Shmueli are generous in their definition of MSE (no penalty for the number of parameters), but do use a highly penalized AIC:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} e_i^{2}, \qquad \mathrm{CAIC} = -2 \ln(L) + k\,(1 + \ln n)$$
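Both criteria are short enough to write out directly. The sketch below follows the two formulas as stated, with e_i taken to be the residuals, L the maximized likelihood, k the number of fitted parameters, and n the number of observations; the function names are hypothetical.

```python
import math

def mse(residuals):
    """Mean squared error with divisor n (no parameter penalty)."""
    return sum(e * e for e in residuals) / len(residuals)

def caic(log_lik, k, n):
    """Consistent AIC: -2 ln(L) + k (1 + ln n)."""
    return -2.0 * log_lik + k * (1.0 + math.log(n))

if __name__ == "__main__":
    # Illustrative inputs only, not values from the fitted models.
    print(mse([1.0, -1.0, 2.0]))
    print(caic(-10.0, 3, 10))
```

Because the penalty term grows with ln n, each extra parameter costs more under CAIC than the flat 2 per parameter charged by the ordinary AIC whenever n is at least 3.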


The set of four models (Poisson, COM-Poisson, hyper-Poisson, and polyfactorial hyper-Poisson) forms a network of nested models which can be compared with likelihood ratio tests. It's clear for the air freight data that the COM-Poisson model is the most parsimonious model.

The small p-value for the test comparing the hyper-Poisson and pf-hyper-Poisson confirms that the γ-weighting is what improves the model fit.
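Each link in the nested network adds one parameter (γ or λ), so each comparison can be treated as a likelihood ratio test with one degree of freedom. The sketch below assumes the usual chi-square approximation applies and uses the identity P(χ²₁ > x) = erfc(√(x/2)); the function name and the log-likelihood values in the demo are hypothetical.

```python
import math

def lrt_one_parameter(loglik_reduced, loglik_full):
    """Return (statistic, p-value) for a nested comparison with 1 extra parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0.0:
        # A nested larger model cannot truly fit worse at its maximum, so a
        # negative statistic suggests the larger fit did not reach its optimum.
        return stat, 1.0
    # Upper tail of chi-square with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))

if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, not values from the fits.
    print(lrt_one_parameter(-25.0, -21.5))
```

The explicit branch for a negative statistic is a design choice for this sketch: it reports the statistic unchanged so the anomaly stays visible, rather than silently clamping it to zero.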


The comparison for the textile faults data is similar, with the somewhat arresting indication that the addition of the λ parameter to the COM-Poisson model, while reducing the MSE, actually has a negative LRT statistic.


5.3. Conclusions

• The hyper-Poisson and polyfactorial hyper-Poisson distributions can model over- and under-dispersion in count regression.

• In simple cases, they do not show any marked advantage over COM-Poisson regression.

“...it doesn’t take much to see that the problems of three little people [models] don’t amount to a hill of beans in this crazy world.” – Rick, Casablanca


References

[1] George E. Bardwell and Edwin L. Crow, A two-parameter family of hyper-Poisson distributions, Journal of the American Statistical Association 59 (1964), no. 305, 133–141.

[2] A. Colin Cameron and Pravin K. Trivedi, Regression analysis of count data, Econometric Society Monographs, Cambridge University Press, Cambridge, United Kingdom, 1998.

[3] R. W. Conway and W. L. Maxwell, A queueing model with state dependent service rates, Journal of Industrial Engineering 12 (1962), 132–136.

[4] Seth D. Guikema and Jeremy P. Coffelt, A flexible count data regression model for risk analysis, Risk Analysis 28 (2008), no. 1, 213–223.

[5] Norman L. Johnson, Samuel Kotz, and Adrienne W. Kemp, Univariate discrete distributions, 2nd ed., John Wiley and Sons, 1993.

[6] Vanda Jowaheer and Muushad Khan, Estimating regression effects in COM Poisson generalized linear model, World Academy of Science, Engineering and Technology 53 (2009), 1046–1050.

[7] Celestin C. Kokonendji, Dominique Mizere, and N. Balakrishnan, Connections of the Poisson weight function to overdispersion and underdispersion, Journal of Statistical Planning and Inference 138 (2008), no. 5, 1287–1296.

[8] Kimberly F. Sellers and Galit Shmueli, A flexible regression model for count data, Tech. Report Research Paper No. RHS 06-060, Robert H. Smith School, December 4, 2008.

[9] Galit Shmueli, Thomas P. Minka, Joseph B. Kadane, Sharad Borle, and Peter Boatwright, A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution, Journal of the Royal Statistical Society, Series C: Applied Statistics 54 (2005), no. 1, 127–142.

