

Maximum Likelihood Estimation of Triangularand Polygonal Distributions

Hien D. Nguyen and Geoffrey J. McLachlan

February 14, 2016

School of Mathematics and Physics, University of Queensland

Abstract

Triangular distributions are a well-known class of distributions that are often used as an elementary example of a probability model. In the past, enumeration and order statistic-based methods have been suggested for the maximum likelihood (ML) estimation of such distributions. A novel parametrization of triangular distributions is presented. The parametrization allows for the construction of an MM (minorization–maximization) algorithm for the ML estimation of triangular distributions. The algorithm is shown to both monotonically increase the likelihood evaluations and be globally convergent. The parametrization is then applied to construct an MM algorithm for the ML estimation of polygonal distributions. This algorithm is shown to have the same numerical properties as that of the triangular distribution. Numerical simulations are provided to demonstrate the performance of the new algorithms against established enumeration and order statistics-based methods.

1 Introduction

Triangular distributions are a class of continuous distributions that are often used as an elementary example in first-level probability and statistics courses due to their simple geometric forms; see Johnson et al. (1995, Sec. 26.9) and Forbes et al. (2011, Ch. 44) for characterizations of such distributions. Outside of the classroom, triangular distributions have been used to model task completion times for PERT (program evaluation and review technique) models, prices associated with orders placed by investors on single securities that are traded on the New York Stock Exchange, and hauling times in civil engineering data (see Kotz and van Dorp (2004, Ch. 1) and references within for details).

Let X ∈ [0, 1] be a random variable with probability density

f (x; θ) = { 2x/θ if 0 ≤ x ≤ θ,
             2 (1 − x) / (1 − θ) if θ < x ≤ 1,    (1)

arXiv:1602.03692v2 [stat.CO] 13 Feb 2016


where θ ∈ (0, 1). Densities of form (1) are known as triangular distributions. Suppose that we observe realizations x1, ..., xn of a random sample X1, ..., Xn from a distribution with density (1) with unknown population parameter θ0. We can define the likelihood and log-likelihood of such a sample as Ln (θ) = ∏_{j=1}^n f (xj ; θ) and ℓn (θ) = ∑_{j=1}^n log f (xj ; θ), and estimate the parameter θ0 via the maximum likelihood (ML) estimator

θ̂n = arg max_{θ∈(0,1)} Ln (θ) = arg max_{θ∈(0,1)} ℓn (θ).    (2)

Unfortunately, in the case of (1), the computation of (2) cannot be performed by solving the usual score equation, dℓn/dθ = 0, for θ, since (1) is not differentiable in θ. In attempting to find an elementary ML estimator for (1), Oliver (1972) established that θ̂n = xi for some i = 1, ..., n; see also Johnson and Kotz (1999). Furthermore, Oliver (1972) deduced that if the sample is ordered, such that x(1) < x(2) < ... < x(n), then θ̂n ∈ Θn, where

Θn = { x(i) : (i − 1)/n < x(i) < i/n, i = 1, ..., n }.

As such, it is enough to find all of the sample observations that are in the set Θn and compute

θ̂n = arg max_{θ∈Θn} Ln (θ).    (3)

An analysis of the distribution of the number of observations in Θn for various n and θ0 is conducted in Huang and Shen (2007). There, it appears that, asymptotically, the average number of observations in Θn is approximately two. Thus, very few evaluations of the likelihood function are needed when utilizing the method of Oliver (1972).
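As a concrete illustration, the enumeration method above can be sketched in a few lines of Python. This is our own illustrative code, not the authors' implementation; the function name `ml_oliver` is hypothetical.

```python
import numpy as np

def ml_oliver(x):
    # Sketch of the order-statistics method (3): candidate modes are the
    # ordered sample points x_(i) with (i - 1)/n < x_(i) < i/n (Oliver, 1972).
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    candidates = x[((i - 1) / n < x) & (x < i / n)]

    def loglik(theta):
        # log-likelihood under density (1) with mode theta
        return np.sum(np.log(np.where(x <= theta, 2 * x / theta,
                                      2 * (1 - x) / (1 - theta))))

    # Evaluate the likelihood only at the few points of Theta_n.
    return max(candidates, key=loglik)
```

For a small sample such as {0.1, 0.2, 0.3, 0.6, 0.9}, only x(1) = 0.1 and x(5) = 0.9 satisfy the order-statistic inequalities, so just two likelihood evaluations are needed.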

Although the argument for the use of (3) is compelling, we wish to establish an estimation algorithm for computing (2) that utilizes a root finding argument, as opposed to one that is based on ordering and sieving. In order to do so, we consider an alternative parametrization of (1) and construct an MM (minorization–maximization) algorithm for the maximization of its likelihood function; see Lange (2013, Ch. 8) regarding MM algorithms.

The MM paradigm has been used to construct iterative algorithms for the computation of difficult ML estimates in the past: for example, by Wu and Lange (2010) for the multivariate t distribution and power series distributions, by Zhou and Zhang (2012) for the Dirichlet-multinomial distribution, and by Zhou and Lange (2010) for various other multivariate discrete distributions.

Furthermore, MM algorithms have been constructed in Hunter (2004) for the estimation of generalized Bradley-Terry models, and in Hunter and Li (2005) to perform variable selection in linear regression models. More recently, MM algorithms have been constructed in Nguyen and McLachlan (2015) to obtain ML estimates for mixtures of Gaussian distributions without using matrix operations, and in Nguyen and McLachlan (2016) to obtain ML estimates for Laplace mixtures of linear experts.


Apart from our construction of an MM algorithm for the ML estimation of the triangular distribution, we also extend our algorithm to the ML estimation of the polygonal distribution of Karlis and Xekalaki (2008). Additionally, we obtain global convergence results for our MM algorithms in both the triangular and polygonal distribution cases. Reports on some simulations performed to demonstrate the relative performances of our algorithms against established methods of estimation are also included. The article proceeds as follows.

In Section 2, a novel parametrization of the triangular distribution is introduced. In Section 3, the MM algorithm paradigm is introduced and an MM algorithm is constructed for the ML estimation of the triangular distribution. In Section 4, the polygonal distribution is introduced, along with an MM algorithm for its ML estimation. Reports from some simulation studies are presented in Section 5, and conclusions are drawn in Section 6. Supplementary simulation results are presented in the Appendix.

2 Alternative Parametrization of the Triangular Distribution

We seek to write (1) in the form

f (x) = min {g1 (x) , g2 (x)} ,    (4)

for x ∈ [0, 1], where g1 (x) = a1 + b1x and g2 (x) = a2 + b2x, for some a1, a2, b1, b2 ∈ R. Without loss of generality, suppose that g1 is the left ray and g2 is the right ray of the triangle; see Figure 1. This implies that a1 = 0, since g1 has a root at x = 0, and b1 > 0; similarly, since g2 must have a root at x = 1, we have a2 = −b2 and b2 < 0. Thus, g1 (x) = b1x and g2 (x) = −b2 + b2x.

Next, we find that x* = −b2/ (b1 − b2) solves the intercept equation g1 (x*) = g2 (x*), which yields a modal density value of f (x*) = b1b2/ (b2 − b1). Since f is a probability density function, the area under f for x ∈ [0, 1] must equal one. Since f forms a triangle with the x-axis over the domain, we have b1b2/ [2 (b2 − b1)] = 1, or b2 = −2b1/ (b1 − 2), by the formula for the area of a triangle. Finally, we make the substitution β = b1 and note that b2 < 0 to obtain the alternative parametrization of the triangular distribution

f (x; β) = min {βx, 2β (1 − x) / (β − 2)} ,    (5)

for x ∈ [0, 1] and β > 2.

Note that the mode x* is equal to θ from (1), which we can use to map between the two parametrizations via the equation θ = 2/β. Therefore, we have the symmetric triangular distribution when β = 4. Furthermore, using Karlis and Xekalaki (2008, Eq. 2.3), we have the moment equations

E (X^r) = 2 (1 − 2^{r+1}/β^{r+1}) / [(r + 1) (r + 2) (1 − 2/β)]

in terms of β, for each r ∈ N, which yields the mean and variance of X: E (X) = 1/3 + 2/(3β) and var (X) = 1/18 − 1/(9β) + 2/(9β²), respectively.

Figure 1: Reference diagram for the derivation of the alternative parametrization for the triangular distribution.
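The mapping between the two parametrizations can be checked numerically. The following sketch is ours (the function names are hypothetical, not from the paper): it evaluates the densities (1) and (5) and verifies that they coincide under θ = 2/β.

```python
import numpy as np

def tri_pdf_theta(x, theta):
    # Standard parametrization (1): mode theta in (0, 1).
    return np.where(x <= theta, 2 * x / theta, 2 * (1 - x) / (1 - theta))

def tri_pdf_beta(x, beta):
    # Alternative parametrization (5): beta > 2, mode at x* = 2/beta.
    return np.minimum(beta * x, 2 * beta * (1 - x) / (beta - 2))

x = np.linspace(0, 1, 1001)
beta = 5.0
# The two parametrizations agree when theta = 2/beta.
assert np.allclose(tri_pdf_theta(x, 2 / beta), tri_pdf_beta(x, beta))
```

Note that the modal height is βx* = 2 in either parametrization, as required for a triangle of unit area over [0, 1].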

3 Maximum Likelihood Estimation

Let x1, ..., xn be a realization of a random sample X1, ..., Xn from a distribution with density (5) with unknown population parameter β0. We can write the likelihood and log-likelihood for the sample as Ln (β) = ∏_{j=1}^n f (xj ; β) and

ℓn (β) = ∑_{j=1}^n log min {βxj , 2β (1 − xj) / (β − 2)}
       = ∑_{j=1}^n log ( βxj + 2β (1 − xj) / (β − 2) − |β (xjβ − 2) / (β − 2)| ) − n log 2,    (6)

by noting that min {x, y} = (x + y − |x − y|) /2 for any x, y ∈ R. Due to the absolute value in (6), we cannot solve the score equation, dℓn/dβ = 0, in order to compute the ML estimate

β̂n = arg max_{β>2} Ln (β) = arg max_{β>2} ℓn (β).    (7)

As such, we seek to construct an MM algorithm for the iterative computation of (7).
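The identity underlying (6) is easy to verify numerically. The following check is our own (hypothetical function names): it evaluates the log-likelihood both in the direct min form and in the absolute-value form, and confirms that they agree.

```python
import numpy as np

def loglik_min(beta, x):
    # log-likelihood (6), direct min form
    return np.sum(np.log(np.minimum(beta * x, 2 * beta * (1 - x) / (beta - 2))))

def loglik_abs(beta, x):
    # log-likelihood (6) rewritten via min{a, b} = (a + b - |a - b|)/2,
    # where a - b simplifies to beta(x*beta - 2)/(beta - 2)
    a = beta * x
    b = 2 * beta * (1 - x) / (beta - 2)
    d = beta * (x * beta - 2) / (beta - 2)
    return np.sum(np.log(a + b - np.abs(d))) - len(x) * np.log(2)

x = np.array([0.12, 0.35, 0.48, 0.61, 0.83])
assert np.isclose(loglik_min(4.5, x), loglik_abs(4.5, x))
```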

3.1 MM Algorithms

Suppose that we wish to maximize some objective function L (θ), where θ ∈ Θ ⊂ R^d. If we cannot obtain the maximizer of L directly (e.g., due to its lack of differentiability or difficult first-order conditions), then we can seek to minorize L over Θ instead. We say that U (θ; ψ) is a minorizer of L if L (ψ) = U (ψ; ψ) and L (θ) ≥ U (θ; ψ) whenever θ ≠ ψ and ψ ∈ Θ. Here, we say that L is minorized by U.

Let θ(0) be some initial value of the MM algorithm, and let θ(r) denote the rth iterate of the algorithm. The MM algorithm can thus be defined by the update scheme

θ(r+1) = arg max_{θ∈Θ} U (θ; θ(r)).    (8)

By the property of the minorizer, and by definition (8), we have the inequalities

L (θ(r+1)) ≥ U (θ(r+1); θ(r)) ≥ U (θ(r); θ(r)) = L (θ(r)).

This implies that the sequence of objective function evaluations, L (θ(r)), is monotonically increasing at each iteration. We now present a set of results that are useful in our MM algorithm construction.


Fact 1. If Θ = R\{0}, then L (θ) = −|θ| can be minorized by

U (θ; ψ) = − (θ² + ψ²) / (2 |ψ|).

Fact 2. If g is convex and differentiable over Θ, then g (θ) can be minorized by

U (θ; ψ) = g (ψ) + g′ (ψ) (θ − ψ).

Fact 3. Let F be a strictly increasing function over Θ. If L (θ) is minorized by U (θ; ψ), then F (L (θ)) is minorized by F (U (θ; ψ)).
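Fact 1 can be checked numerically. The snippet below is our own illustration (hypothetical names): it verifies the domination condition L ≥ U on a grid and the tangency condition L (ψ) = U (ψ; ψ).

```python
import numpy as np

def L_fact1(theta):
    # Objective of Fact 1
    return -np.abs(theta)

def U_fact1(theta, psi):
    # Minorizer of -|theta| from Fact 1, anchored at psi != 0
    return -(theta**2 + psi**2) / (2 * np.abs(psi))

grid = np.linspace(-3, 3, 601)
psi = 1.3
assert np.all(L_fact1(grid) >= U_fact1(grid, psi) - 1e-12)  # L >= U everywhere
assert np.isclose(L_fact1(psi), U_fact1(psi, psi))          # equality at theta = psi
```

The domination follows from the arithmetic-geometric mean inequality, θ² + ψ² ≥ 2|θ||ψ|.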

3.2 Algorithm Construction

In order to construct our MM algorithm, we first seek a minorizer for 2f (x; β), conditioned on the rth iterate β(r). An application of Fact 1 yields

u2f(r) (x; β) = βx + 2β (1 − x) / (β − 2) − [1/(2w(r) (x))] { [β (xβ − 2) / (β − 2)]² + [w(r) (x)]² }
             = βx + 2β (1 − x) / (β − 2) − β² (xβ − 2)² / [2w(r) (x) (β − 2)²] − w(r) (x) /2,

which we denote u1(r) (x; β), where w(r) (x) = |β(r) (xβ(r) − 2) / (β(r) − 2)|, by making the substitution θ = β (xβ − 2) / (β − 2).

Using Fact 2, we minorize u1(r) by

u2(r) (x; β) = βx + a(r) (x) − b(r) (x) (β − β(r)) − β² (xβ − 2)² / [2w(r) (x) (β − 2)²] − w(r) (x) /2,    (9)

where a(r) (x) = 2β(r) (1 − x) / (β(r) − 2) and b(r) (x) = 4 (1 − x) / (β(r) − 2)², by making the substitution g (β) = 2β (1 − x) / (β − 2), and by noting that it is convex for β > 2 when x ∈ [0, 1].

We further note that (9) is concave in β, as it is composed of an affine combination of concave terms. The only nonlinear term in the expression is β² (xβ − 2)² / (β − 2)², which can be shown to be convex as it is the product of three positive convex functions; thus its negative is concave. See Boyd and Vandenberghe (2004, Ch. 3) regarding the algebra of convex functions. An application of Fact 3, by making the substitutions of log for F and L (β) = u2(r) (x; β), yields the following result.

Proposition 1. Conditioned on the rth iterate of the MM algorithm, the log-likelihood function (6) can be minorized by

U (β; β(r)) = ∑_{j=1}^n log u2(r) (xj ; β) − n log 2,    (10)

and U (β; β(r)) is a concave function in β.

Since (10) is concave, if the first-order condition dU/dβ = 0 has a solution β*, then β* must be the global maximum of (10). Unfortunately, we cannot obtain β* in closed form. Fortunately, we can obtain the derivative of (10), by noting that if u (θ) > 0 and u is differentiable, then d log u/dθ = (du/dθ) /u (θ), and by computing the derivative

du2(r)/dβ = x − b(r) (x) − β (βx − 2)² / [w(r) (x) (β − 2)²] − β²x (βx − 2) / [w(r) (x) (β − 2)²] + β² (βx − 2)² / [w(r) (x) (β − 2)³].

A root finding algorithm, such as the bisection method or the method of Brent (1971), can then be used to compute β*. Furthermore, a Newton algorithm can be constructed by noting that d² log u/dθ² = (d²u/dθ²) /u (θ) − (du/dθ)² /u² (θ), and by computing the second derivative

d²u2(r)/dβ² = − (βx − 2)² / [w(r) (x) (β − 2)²] − 4βx (βx − 2) / [w(r) (x) (β − 2)²] + 4β (βx − 2)² / [w(r) (x) (β − 2)³]
            − β²x² / [w(r) (x) (β − 2)²] + 4β²x (βx − 2) / [w(r) (x) (β − 2)³] − 3β² (βx − 2)² / [w(r) (x) (β − 2)⁴].

We can summarize the MM algorithm as follows. First, initialize the algorithm with some β(0). At the (r + 1) th iteration of the algorithm, set β(r+1) = β*, where β* globally maximizes U (β; β(r)).

3.3 Convergence Analysis

In general, the MM algorithm is iterated until convergence is achieved. In this article, we choose to use the convergence criterion ℓ (β(r+1)) − ℓ (β(r)) < ε, where ε > 0 is a small constant; see Lange (2013, Sec. 11.5) regarding the relative merits of convergence criteria. Upon convergence, we declare the final iterate of the algorithm to be the ML estimator, and we denote it as β̂n.

Let β(∞) = lim_{r→∞} β(r) be a finite limit point of the MM algorithm, starting from some initialization β(0) (or alternatively, β̂n → β(∞) as ε → 0). We note that the MM algorithm fulfills the assumptions required to be defined as a successive lower-bound maximization (SLM) algorithm in the sense of Razaviyayn et al. (2013). As such, we can apply Razaviyayn et al. (2013, Thm. 1) directly to obtain the following result.

Theorem 1. If β(∞) is a limit point of the MM algorithm, for some initialization β(0), then β(∞) is a local maximizer or inflection point of (6).

Thus, the MM algorithm monotonically increases the likelihood evaluations and is globally convergent. Unfortunately, it is notable that (6) is highly multimodal, and thus there is a great likelihood of the algorithm converging to a maximizer that does not globally maximize (6). Fortunately, initialization of the algorithm is simple, since the method of moments estimator β̃n = 2/ (3x̄n − 1) is a consistent estimator of β0, where x̄n is the sample mean.


3.4 Time Complexity

We now consider a brief analysis of the time complexity of the MM algorithm, in terms of the sample size n, in comparison to alternative methods for computing (7). First, note that each Newton iteration involved in the MM algorithm requires a constant number of operations for each observation, and thus each Newton iterate is of order O (n). If we let A[n]N and A[n]MM be the average number of Newton iterations and the average number of MM iterations taken to achieve a convergence level of ε, respectively, then the MM algorithm has a total complexity of order O (nA[n]N A[n]MM).

Next, in order to compute (3), a sort operation is needed, which has a complexity order of O (n log n); notable sorting algorithms with this complexity are heap sort and quick sort (cf. Cormen et al. (2002, Pt. 2)). Furthermore, O (n) evaluations are needed to obtain the set Θn, and then A[n]Θ evaluations of the likelihood function are required, each at a complexity of O (n), where A[n]Θ is the average number of elements in Θn. Thus, the complexity of computing (3) is of order O (n (log n + A[n]Θ)). Lastly, it is not difficult to see that the naive computation of the likelihood function at θ̂n = xi, for each i, has a time complexity of order O (n²).

If the observations of Huang and Shen (2007) are correct, then computing (3) reduces to order O (n log n). Thus, the MM algorithm would be competitive with (3) if A[n]N A[n]MM grows more slowly than log n.

4 Polygonal Distributions

Let X ∈ [0, 1] be a random variable with probability density function

f (x; ψ) = ∑_{i=1}^g πi f (x; βi),    (11)

where πi > 0 for each i = 1, ..., g, ∑_{i=1}^g πi = 1, and f (x; βi) is as in (5). Here, ψ = (π1, ..., πg−1, β1, ..., βg)^T is the parameter vector of (11). If X has density function (11), then we say that X arises from a g component mixture of triangular distributions, or a g component polygonal distribution in the language of Karlis and Xekalaki (2008).

Let x1, ..., xn be a realization of the random sample X1, ..., Xn from a population characterized by the polygonal density function with parameter ψ0. It is clear that the log-likelihood function of the sample,

ℓn (ψ) = ∑_{j=1}^n log ∑_{i=1}^g πi f (xj ; βi)
       = ∑_{j=1}^n log ∑_{i=1}^g πi min {βixj , 2βi (1 − xj) / (βi − 2)},    (12)


is difficult to maximize due to its log-of-sums form. Using the results from Section 3, we now propose an MM algorithm for the maximization of (12), in order to obtain an ML estimate ψ̂n.

4.1 MM Algorithm

In order to construct an MM algorithm for the maximization of (12), we require the following result from Zhou and Lange (2010).

Fact 4. If Θ = [0, ∞)^d, then L (θ) = log (∑_{i=1}^d θi) can be minorized by

U (θ; ψ) = ∑_{i=1}^d τi (ψ) log θi − ∑_{i=1}^d τi (ψ) log τi (ψ),

where τi (ψ) = ψi/∑_{j=1}^d ψj.

Conditioned on the rth iterate ψ(r) and using Fact 4, we can minorize (12) by

U1 (ψ; ψ(r)) = ∑_{i=1}^g ∑_{j=1}^n τij(r) log πi − ∑_{i=1}^g ∑_{j=1}^n τij(r) log τij(r)
             + ∑_{i=1}^g ∑_{j=1}^n τij(r) log min {βixj , 2βi (1 − xj) / (βi − 2)},    (13)

where τij(r) = πi(r) f (xj ; βi(r)) /f (xj ; ψ(r)), by making the substitution θi = πif (xj ; βi) for each i and j. Since log is a concave function, a direct application of Proposition 1 yields the following result.

Proposition 2. Conditioned on the rth iterate of the MM algorithm, both the minorizer (13) and the log-likelihood function (12) can be minorized by

U (ψ; ψ(r)) = ∑_{i=1}^g ∑_{j=1}^n τij(r) log πi − ∑_{i=1}^g ∑_{j=1}^n τij(r) log τij(r)
            + ∑_{i=1}^g ∑_{j=1}^n τij(r) log u2(r) (xj ; βi) − n log 2,    (14)

and U (ψ; ψ(r)) is a concave function in ψ.

It is well known that the unique global maximizer of ∑_{i=1}^g ∑_{j=1}^n τij(r) log πi over the unit simplex is πi* = n⁻¹ ∑_{j=1}^n τij(r) (cf. McLachlan and Peel (2000, Sec. 2.8.2)). Furthermore, U (ψ; ψ(r)) is concave and separable in its parameter elements.


The MM algorithm can be summarized as follows. First, initialize the algorithm with some ψ(0). At the (r + 1) th iteration of the algorithm, set πi(r+1) = n⁻¹ ∑_{j=1}^n τij(r), and βi(r+1) = β*, where β* maximizes ∑_{j=1}^n τij(r) log u2(r) (xj ; β), for each i = 1, ..., g.

The algorithm is iterated until ℓ (ψ(r+1)) − ℓ (ψ(r)) < ε, whereupon we terminate the algorithm and declare the final iterate the ML estimate ψ̂n. As in Section 3, if we define ψ(∞) = lim_{r→∞} ψ(r) to be a finite limit point of the MM algorithm, and note that the algorithm is an SLM algorithm, then we obtain the following result by application of Razaviyayn et al. (2013, Thm. 1).

Theorem 2. If ψ(∞) is a limit point of the MM algorithm, for some initialization ψ(0), then ψ(∞) is a local maximizer or inflection point of (12).

Thus, like the algorithm from Section 3, the MM algorithm for the maximization of (12), again, monotonically increases the likelihood evaluations and is globally convergent. Unfortunately, it is not simple to derive a method of moments estimator for the polygonal density function parameters, and thus a good initialization ψ(0) is not easily attainable. Furthermore, it is known that the log-likelihood function of a mixture distribution is generally multimodal.
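For illustration, the weights τij(r) and the mixing-proportion update can be sketched as follows. This is our own code (the name `responsibilities` is hypothetical); the component parameters used in the example are those of Case 1 of Karlis and Xekalaki (2008).

```python
import numpy as np

def tri_pdf(x, beta):
    # Triangular density (5)
    return np.minimum(beta * x, 2 * beta * (1 - x) / (beta - 2))

def responsibilities(x, pi, beta):
    # tau_ij = pi_i f(x_j; beta_i) / f(x_j; psi), the weights in (13)-(14)
    comp = pi[:, None] * np.vstack([tri_pdf(x, b) for b in beta])  # g x n
    return comp / comp.sum(axis=0, keepdims=True)

x = np.array([0.1, 0.4, 0.7, 0.9])
pi = np.array([0.5, 0.5])
beta = np.array([8 / 3, 8.0])
tau = responsibilities(x, pi, beta)
pi_new = tau.mean(axis=1)  # update pi_i^(r+1) = n^{-1} sum_j tau_ij^(r)
assert np.allclose(tau.sum(axis=0), 1.0)
```

The βi updates then maximize the τ-weighted sums of log u2(r) terms, one bounded one-dimensional problem per component.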

One technique for obtaining good initializations is that of McLachlan and Peel (2000, Sec. 2.12.2). Using the clustering interpretation of a mixture model, we can assign arbitrary labels to the data according to some predetermined prior probabilities π1, ..., πg−1 (e.g., if we let π1, ..., πg−1 = 1/g, then for each j we sample i* ∈ {1, ..., g} uniformly, and set τij(0) = 1 if i = i* and 0 otherwise). Using the labeled data, we proceed to compute the estimate ψ(1) and the respective log-likelihood ℓn (ψ(1)). We repeat this process multiple times and note the estimate and log-likelihood values from each repetition. The estimate that yields the highest log-likelihood value is considered the best initialization among the randomizations.
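The random-labelling step of this scheme can be sketched as follows (our own code, with a hypothetical function name); in practice it would be repeated over many random draws, keeping the start with the highest log-likelihood.

```python
import numpy as np

def random_start(x, g, rng):
    # Random-labelling initialization (after McLachlan and Peel, 2000,
    # Sec. 2.12.2): each observation gets a uniform random label i*,
    # and tau^(0)_ij = 1 if i = i*, and 0 otherwise.
    n = len(x)
    labels = rng.integers(0, g, size=n)
    tau0 = np.zeros((g, n))
    tau0[labels, np.arange(n)] = 1.0
    return tau0

rng = np.random.default_rng(1)
tau0 = random_start(np.linspace(0.05, 0.95, 20), 3, rng)
assert tau0.shape == (3, 20)
```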

4.2 Time Complexity

The MM algorithm for the polygonal distribution with g components is, in effect, maximizing g triangular distribution weighted log-likelihoods in each iteration. As such, using the same notation as in Section 3.4, the time complexity of the MM algorithm for maximizing the log-likelihood of a g component polygonal distribution is of order O (gnA[n]N A[n]MM).

In the EM algorithm of Karlis and Xekalaki (2008), it is suggested that the subproblems of maximizing ∑_{j=1}^n τij(r) log f (xj ; βi) be solved using exhaustive enumeration over all observations to obtain the updates βi(r+1), for each i, at each iteration. The algorithm suggested in Karlis and Xekalaki (2008) therefore has a complexity of order O (gn²A[n]EM), where A[n]EM is the average number of EM iterations required to achieve a convergence level of ε. Thus, the MM algorithm would be competitive with the current method if A[n]N A[n]MM grows more slowly than nA[n]EM.


5 Numerical Simulations

We present two sets of numerical simulations, S1 and S2, in order to assess the performance of our MM algorithms. In S1, we compare the MM algorithm from Section 3 to the triangular distribution ML estimation method of Oliver (1972). In S2, we compare the MM algorithm from Section 4 to the polygonal distribution EM algorithm of Karlis and Xekalaki (2008).

All of our simulations are performed in the R statistical programming environment (R Core Team, 2013), on an Intel Core i7-2600 CPU running at 3.40 GHz with 16 GB of internal RAM. In particular, we utilize the sort function in R, which is an implementation of the quick sort algorithm (see Cormen et al. (2002, Ch. 7) for details), for the sorting of samples necessary in the computation of (3). Furthermore, all computations of log-likelihood values, minorizers, and derivatives are performed using functions that are programmed in Rcpp and RcppArmadillo (Eddelbuettel, 2013). The convergence thresholds for the EM algorithm, MM algorithm, and Newton algorithm are all set to ε = 10⁻³, and function timing is conducted using the proc.time function in R.

5.1 Simulation S1

In S1, we simulate n ∈ {5, 10, 20, 50, 100, 500, 1000, 10000, 100000, 1000000} observations from triangular distributions with β0 ∈ {4, 5, 20/3, 10, 20}, as per the simulations from Huang and Shen (2007). For each combination of n and β0, we compute (3) and note the time taken in seconds. Additionally, we apply the MM algorithm from Section 3 and note the time taken, as well as the number of MM iterations and Newton iterations that are required. Each combination is repeated 1000 times, and the average times and numbers of iterations are reported in Table 1.

From Table 1, we see that the computation times of the MM algorithm are not competitive with the times taken to compute (3), in the range of scenarios assessed. At best, the MM algorithm is 1.84 times slower than (3) (n = 10 and β0 = 5), and at worst, the MM algorithm is almost 300 times slower (n = 1000 and β0 = 20).

We plot the product of A[n]MM and A[n]N against log10 n, for each β0, in Figure 2. Upon inspection, it is observable that the products are decreasing in log10 n. Further, as n increases, the product appears to reach a small asymptotic constant in each case. Thus, based on the conclusions from Section 3.4, the MM algorithm may become more competitive with (3) as n increases.

In Figure 3, we plot the logarithm of the average time taken to compute the ML estimate using the MM algorithm divided by the average time taken using (3) against log10 n, for each β0. It is observable that in each case, the ratio increases up until n = 500 or n = 1000, and then decreases as n increases. This result provides further evidence that the MM algorithm may become more competitive with (3) for larger n.

Further results regarding the accuracy of the ML estimates that are computed using either the MM algorithm or (3) are provided in Appendix A.


Table 1: Simulation results for S1. The rows Oliver and MM report the average time taken to compute the ML estimate using the respective methods, in seconds. The rows A[n]MM and A[n]N report the average number of MM and Newton iterations taken. All averages are taken over 1000 repetitions.

β0          n        5       10       20       50      100      500     1000    10000   100000  1000000
4    Oliver    0.00007  0.00007  0.00005  0.00019  0.00011  0.00013  0.00014  0.00203  0.02125  0.26572
     MM        0.00023  0.00036  0.00038  0.00101  0.00158  0.00578  0.00858  0.03511  0.38384  3.97548
     A[n]MM      3.553    3.194    2.596    2.235    2.129    2.287    2.646    7.733   12.627   13.053
     A[n]N      40.724   23.826   23.869   25.907   24.593   18.872   11.365    1.554    1.000    1.000
5    Oliver    0.00010  0.00013  0.00005  0.00005  0.00013  0.00011  0.00015  0.00186  0.02167  0.26583
     MM        0.00029  0.00024  0.00033  0.00092  0.00180  0.00805  0.01163  0.04340  0.39342  4.36454
     A[n]MM      3.434    2.989    2.515    2.337    2.180    1.945    2.143    5.480   13.006   14.596
     A[n]N      50.328   32.188   25.007   28.320   32.209   30.392   20.687    2.892    1.010    1.000
20/3 Oliver    0.00019  0.00009  0.00011  0.00014  0.00008  0.00012  0.00035  0.00230  0.02169  0.26682
     MM        0.00039  0.00043  0.00066  0.00145  0.00246  0.01011  0.01937  0.07310  0.41568  4.93723
     A[n]MM      3.384    3.013    2.652    2.441    2.193    1.983    1.957    3.981   12.830   16.517
     A[n]N      75.872   49.292   37.250   37.605   39.868   40.978   37.492    6.954    1.095    1.000
10   Oliver    0.00007  0.00014  0.00009  0.00013  0.00013  0.00025  0.00037  0.00226  0.02211  0.26784
     MM        0.00038  0.00072  0.00109  0.00234  0.00352  0.01658  0.03213  0.15409  0.57734  6.10410
     A[n]MM      3.173    2.965    2.774    2.557    2.372    2.111    2.090    2.538   10.822   20.426
     A[n]N      87.219   76.464   67.396   66.100   51.434   59.792   58.826   23.483    1.916    1.001
20   Oliver    0.00009  0.00018  0.00014  0.00008  0.00014  0.00014  0.00024  0.00187  0.02148  0.25625
     MM        0.00064  0.00110  0.00182  0.00408  0.00717  0.03536  0.06947  0.44876  1.69513  8.73229
     A[n]MM      3.348    3.194    2.926    2.732    2.618    2.341    2.258    2.027    4.807   24.595
     A[n]N     133.337  114.601  102.336  108.354  106.633  116.912  117.251   85.926   13.589    1.219


Figure 2: Product of A[n]MM and A[n]N versus log10 n, in S1. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.

Figure 3: Log-ratio of the average computation time for the MM algorithm and the average computation time for (3) versus log10 n, in S1. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.


Figure 4: Plots of the Case 1 (solid) and Case 2 (dashed) polygonal probability density functions.

Results regarding the sensitivity of our conclusions to changes in ε can be found in Appendix B.

5.2 Simulation S2

In S2, we simulate n ∈ {5, 10, 20, 50, 100, 500, 1000} observations from two polygonal distributions, Cases 1 and 2, that were suggested by Karlis and Xekalaki (2008). In Case 1, g = 2 and

ψ0 = (π01, β01, β02)T = (1/2, 8/3, 8)T,

and in Case 2, g = 3 and

ψ0 = (π01, π02, β01, β02, β03)T = (1/3, 1/3, 20/9, 4, 20)T.

Probability density functions for the two cases are displayed in Figure 4. For each case and each n, we compute the ML estimate via the EM algorithm of Karlis and Xekalaki (2008) and note the time taken in seconds, as well as the number of EM iterations taken. Additionally, we apply the MM algorithm from Section 4 and note the time taken, as well as the number of MM iterations that are required. Each combination is repeated 1000 times and the average times and numbers of iterations are reported in Table 2.
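The polygonal distribution of Karlis and Xekalaki (2008) is a finite mixture of g triangular densities of the form (1). As a rough illustration of the simulation step described above, the sketch below evaluates such a mixture density and draws observations from it by inverting the triangular CDF; the mixing weights and mode values in the usage line are illustrative placeholders, not the modes implied by the β0 parametrization used in the text.

```python
import numpy as np

def tri_pdf(x, theta):
    """Triangular density (1) on [0, 1] with mode theta."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= theta, 2.0 * x / theta,
                    2.0 * (1.0 - x) / (1.0 - theta))

def polygonal_pdf(x, weights, thetas):
    """Mixture density: sum_k pi_k * f(x; theta_k)."""
    return sum(w * tri_pdf(x, t) for w, t in zip(weights, thetas))

def polygonal_sample(weights, thetas, size, rng):
    """Draw a component label for each observation, then invert the triangular
    CDF F(x) = x^2/theta (x <= theta), 1 - (1-x)^2/(1-theta) (x > theta)."""
    comps = rng.choice(len(weights), size=size, p=weights)
    t = np.asarray(thetas, dtype=float)[comps]
    u = rng.uniform(size=size)
    return np.where(u <= t, np.sqrt(t * u),
                    1.0 - np.sqrt((1.0 - t) * (1.0 - u)))

# Illustrative draw (weights and modes are placeholders).
rng = np.random.default_rng(0)
x = polygonal_sample([0.5, 0.5], [0.3, 0.8], size=1000, rng=rng)
```

Inverting the piecewise-quadratic CDF avoids rejection sampling and keeps every draw exactly on [0, 1].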

From Table 2, we see that the MM algorithm is faster than the EM algorithm in all cases except in Case 1 with n = 5, where the MM algorithm is 1.03


Table 2: Simulation results for S2. The rows EM and MM report the average time taken to compute the ML estimate using the respective methods, in seconds. The rows A[n]EM and A[n]MM report the average numbers of EM and MM iterations taken. All averages are taken over 1000 repetitions.

Case       n  5       10      20      50      100     500     1000
1    EM       0.00097 0.00285 0.00562 0.01809 0.03889 0.29362 0.85880
     MM       0.00100 0.00172 0.00207 0.00537 0.00912 0.04291 0.08671
     A[n]EM   6.219   9.494   11.673  15.164  16.244  16.944  17.513
     A[n]MM   9.248   13.646  13.182  15.921  17.329  23.318  30.001
2    EM       0.00206 0.00596 0.01262 0.04252 0.11752 1.43063 5.12175
     MM       0.00154 0.00196 0.00472 0.01053 0.02176 0.18104 0.49679
     A[n]EM   9.505   13.285  17.311  24.658  33.503  56.824  72.536
     A[n]MM   11.436  17.772  26.304  39.620  49.130  77.511  98.241

times slower than the EM algorithm. In Figure 5, we plot the logarithm of the average computation-time ratio for the MM algorithm versus the EM algorithm, for the two simulation cases. Upon inspection of Figure 5, we see that the ratio is decreasing as n increases, for both cases, indicating that the MM algorithm scales better than the EM algorithm with respect to n.
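For context on the EM comparison above, the following is a simplified sketch of an EM iteration for a mixture of triangular densities in which only the mixing proportions are updated, with the component modes held fixed; the full algorithm of Karlis and Xekalaki (2008) also updates the component parameters, so this is an illustrative stand-in rather than their method. Halting once the log-likelihood gain falls below a tolerance eps is one common convergence criterion, assumed here for illustration.

```python
import numpy as np

def tri_pdf(x, theta):
    # Triangular density (1) on [0, 1] with mode theta.
    return np.where(x <= theta, 2.0 * x / theta,
                    2.0 * (1.0 - x) / (1.0 - theta))

def em_weights(x, thetas, eps=1e-6, max_iter=1000):
    """EM for the mixing proportions pi_k only, modes held fixed.
    Returns (pi, final log-likelihood, iterations used)."""
    comp = np.column_stack([tri_pdf(x, t) for t in thetas])  # n-by-g densities
    pi = np.full(len(thetas), 1.0 / len(thetas))             # uniform start
    old_ll = -np.inf
    it = 0
    for it in range(1, max_iter + 1):
        mix = comp @ pi                     # mixture density at each x_i
        ll = np.log(mix).sum()
        if ll - old_ll < eps:               # EM ascent: ll never decreases
            break
        old_ll = ll
        resp = comp * pi / mix[:, None]     # E-step: responsibilities
        pi = resp.mean(axis=0)              # M-step: average responsibility
    return pi, ll, it
```

Because the log-likelihood is nondecreasing along EM iterations, the eps test is a valid termination rule; a tighter eps generally costs more iterations.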

The results from Section 5.1 indicate that the average number of Newton steps A[n]N appears to decrease as a function of n. Thus, from the analysis of Section 4.2, the MM algorithm would be an improvement over the EM algorithm if A[n]MM and A[n]EM both grow at similar rates with respect to n. In Figure 6, we plot A[n]MM and A[n]EM as functions of log10 n, for the two simulation cases. Figure 6 indicates that the growth of A[n]MM and A[n]EM appears to be linear in log10 n, for both simulation cases. Furthermore, the rates of growth of A[n]MM and A[n]EM appear to be similar in both cases, which indicates that the MM algorithm improves upon the EM algorithm in the scenarios studied. Further results regarding the accuracy of the ML estimates that are computed using either the MM or EM algorithms are provided in Appendix C.

6 Conclusions

In this article, we considered the ML estimation of the triangular distribution via an MM algorithm. The MM algorithm was constructed via a novel parametrization of the triangular distribution. The constructed algorithm was shown to have desirable numerical properties, such as monotonicity of the likelihood evaluations and convergence to a saddle point or local maximum of the log-likelihood function.

Using the novel triangular distribution parametrization, we also proposed an MM algorithm for the ML estimation of the polygonal distribution. This algorithm retains the desirable numerical properties that were shown for the


Figure 5: Log-ratio of the average computation time for the MM and EM algorithms versus log10 n, in S2. Circles and triangles indicate Cases 1 and 2, respectively.

Figure 6: Average number of EM and MM iterations versus log10 n, from S2. Solid and dashed lines refer to the MM and EM algorithm results, respectively. Circles and triangles refer to the Cases 1 and 2 results, respectively.


triangular distribution MM algorithm.

To assess our algorithms, we performed a set of numerical simulations. The

MM algorithm for the triangular distribution appears not to be competitive with the established method, in the scenarios that were assessed. However, analysis of the required numbers of iterations indicates that the MM algorithm scales well as the number of observations increases. The MM algorithm for the polygonal distribution is shown to be faster than the currently used EM algorithm, in the scenarios that were assessed.

We believe that this article introduces an interesting approach to the ML estimation of piecewise linear density functions. In the future, it would not be difficult to extend our current methodology to the ML estimation of other piecewise densities, such as the trapezoidal densities of van Dorp and Kotz (2003), or other distributions that are defined by piecewise density functions on an interval support, such as the generalized trapezoidal and two-sided distributions that are proposed in Kotz and van Dorp (2004). Such distributions have applications in modeling axial burnup distributions in nuclear engineering and USA certificate of deposit rate distributions in financial engineering (cf. Sections 5 and 6 of Kotz and van Dorp (2004), respectively).

Furthermore, the new estimation techniques that are discussed in this article may be applied to the estimation of piecewise density functions that are defined on compact two-dimensional or multidimensional supports. Although we know of no such densities currently, it is possible to construct pyramidal-type density functions using the concept of nodal basis functions in finite element analysis; see Sangalli et al. (2013) and Nguyen et al. (2016), for example.

Appendices

A. Accuracy of the ML Estimates in S1

For each β0 and n, we simulate data from a triangular distribution as per Section 5.1. Using these data, we compute the squared error, relative to the true parameter β0, of the ML estimate computed using either the MM algorithm or (3). This process is repeated 1000 times and the MSE (mean-squared error) is calculated as the average squared error over the replicates, for each combination of β0 and n. The results of these simulations are presented in Table 3.
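The enumeration method referred to here exploits the fact that the likelihood of the triangular distribution attains its maximum at one of the order statistics (cf. Oliver, 1972). The sketch below implements that search directly in terms of the mode θ, rather than the β parametrization used in the text, and wraps it in a Monte Carlo MSE loop; the sample sizes and replicate counts are shrunk from the 1000 repetitions of the text purely for illustration.

```python
import numpy as np

def tri_loglik(x, theta):
    """Log-likelihood of density (1) at mode theta, for a sample x in (0, 1)."""
    left = x <= theta
    ll = np.log(2.0 * x[left] / theta).sum()
    if (~left).any():
        ll += np.log(2.0 * (1.0 - x[~left]) / (1.0 - theta)).sum()
    return ll

def tri_ml(x):
    """Enumeration estimator: the likelihood is maximized at an order
    statistic, so evaluate it at each sample point and keep the best."""
    candidates = np.sort(x)
    lls = [tri_loglik(x, t) for t in candidates]
    return float(candidates[int(np.argmax(lls))])

def mc_mse(theta0, n, reps, rng):
    """Monte Carlo MSE of the enumeration estimator at true mode theta0,
    sampling each replicate by inverse-CDF."""
    errs = []
    for _ in range(reps):
        u = rng.uniform(size=n)
        x = np.where(u <= theta0, np.sqrt(theta0 * u),
                     1.0 - np.sqrt((1.0 - theta0) * (1.0 - u)))
        errs.append((tri_ml(x) - theta0) ** 2)
    return float(np.mean(errs))
```

As in Table 3, the Monte Carlo MSE should shrink as n grows.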

Upon inspection of Table 3, we see that the MSEs of the ML estimates are decreasing in n, as would generally be expected. This appears to be the case both for the estimates obtained via the MM algorithm and for those obtained via (3), across all values of β0. The Ratio row of Table 3 appears to indicate that neither the MM algorithm nor (3) yields consistently more accurate estimates than the other.

B. Sensitivity of Results to ε in S1

We now analyze whether the conclusions from Section 5.1 and Appendix A are sensitive to different values of ε. To conduct the analysis, we repeat the


Table 3: MSE results for S1. The rows Oliver and MM report the MSEs of the ML estimates using the respective techniques. The MSEs are computed as average squared errors over 1000 repetitions. The Ratio row reports the ratio of the MM and Oliver rows. The notation aEb equates to a × 10^b.

β0    n      5        10       20       50       100      500      1000     10000    100000   1000000
4     Oliver 3.08E+03 8.53E+02 1.59E+02 8.15E-01 2.12E-01 3.77E-02 1.76E-02 1.69E-03 1.67E-04 1.55E-05
      MM     4.83E+02 3.09E+01 6.39E+00 9.94E-01 2.74E-01 4.54E-02 1.97E-02 1.67E-03 1.58E-04 1.49E-05
      Ratio  0.16     0.04     0.04     1.22     1.29     1.20     1.12     0.98     0.94     0.96
5     Oliver 5.83E+02 5.04E+02 2.23E+02 3.38E+00 7.36E-01 8.74E-02 4.23E-02 3.85E-03 3.67E-04 3.80E-05
      MM     4.63E+02 6.14E+01 8.53E+00 2.16E+00 7.54E-01 1.13E-01 4.88E-02 3.96E-03 3.47E-04 3.54E-05
      Ratio  0.80     0.12     0.04     0.64     1.03     1.30     1.15     1.03     0.94     0.93
20/3  Oliver 4.83E+03 1.18E+03 4.93E+02 1.12E+01 2.06E+00 2.41E-01 1.18E-01 1.09E-02 1.06E-03 1.12E-04
      MM     2.13E+03 3.54E+03 3.52E+01 7.49E+00 3.06E+00 3.00E-01 1.52E-01 1.21E-02 1.01E-03 1.02E-04
      Ratio  4.41     3.00     0.07     0.67     1.48     1.25     1.29     1.11     0.96     0.92
10    Oliver 1.96E+05 2.50E+03 1.34E+03 1.15E+03 9.48E+00 1.01E+00 4.50E-01 4.41E-02 3.85E-03 4.17E-04
      MM     1.82E+04 3.88E+03 2.18E+02 5.53E+01 8.21E+00 1.27E+00 5.40E-01 5.01E-02 3.43E-03 3.72E-04
      Ratio  0.09     1.55     0.16     0.05     0.87     1.25     1.20     1.14     0.89     0.89
20    Oliver 8.57E+03 3.91E+03 1.52E+03 8.43E+02 7.74E+02 1.05E+01 5.10E+00 3.73E-01 3.72E-02 3.62E-03
      MM     5.01E+03 2.02E+03 9.62E+02 4.43E+02 1.23E+02 1.31E+01 5.60E+00 4.58E-01 4.14E-02 2.83E-03
      Ratio  0.59     0.52     0.63     0.53     0.16     1.24     1.10     1.23     1.11     0.78


Figure 7: Product of A[n]MM and A[n]N versus log10 n, in S1 with ε = 10−2. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.

simulation study from Section 5.1 and Appendix A with ε set to 10−2 and 10−4 instead. For brevity, we produce plots of the product of A[n]MM and A[n]N against log10 n, and plots of the logarithm of the average time taken to compute the ML estimate using the MM algorithm divided by the average time taken using (3) against log10 n, for each β0 (as in Figures 2 and 3), instead of the full result tables (as in Table 1). The plots for the ε = 10−2 and ε = 10−4 cases are presented in Figures 7 and 8, and Figures 9 and 10, respectively. Table 4 presents the MSE results for the MM algorithm using ε = 10−2 and ε = 10−4.

Upon inspection of Figures 7 and 9, we observe that the product of A[n]MM

and A[n]N is decreasing for each value of β0 and for both ε values. Figures 8

and 10 indicate that the average ratio of computation times is also a decreasingfunction of n, after n = 1000. Both of these results conform with the conclusionsfrom Section 5.1. The results of Table 4 closely corresponds with those obtainedin Appendix A. We note that the MSE is decreasing in n, and that there areno discernible structural relationships between the MSEs of the MM algorithmand those of (3).

C. Accuracy of ML Estimates in S2

Using the two cases and values of n from Section 5.2, we simulate data from polygonal distributions. Using these data, we compute the squared error of the ML estimate computed using either the EM algorithm or the MM algorithm, relative


Figure 8: Log-ratio of the average computation time for the MM algorithm and the average computation time for (3) versus log10 n, in S1 with ε = 10−2. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.

Figure 9: Product of A[n]MM and A[n]N versus log10 n, in S1 with ε = 10−4. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.


Table 4: Additional MSE results for S1. The rows ε = 10−2 and ε = 10−4 report the MSEs of the ML estimates using the MM algorithm with the respective convergence criteria. The MSEs are computed as average squared errors over 1000 repetitions. The notation aEb equates to a × 10^b.

β0    n          5        10       20       50       100      500      1000     10000    100000   1000000
4     ε = 10−2   3.04E+03 2.10E+01 3.49E+00 1.05E+00 2.58E-01 3.36E-02 1.65E-02 1.78E-03 1.60E-04 1.70E-05
      ε = 10−4   2.43E+03 2.94E+01 6.32E+00 7.44E-01 3.83E-01 5.20E-02 2.59E-02 2.09E-03 1.84E-04 1.50E-05
5     ε = 10−2   4.27E+02 5.29E+01 1.52E+01 1.56E+00 7.04E-01 7.81E-02 3.96E-02 3.98E-03 3.80E-04 3.68E-05
      ε = 10−4   1.18E+03 6.52E+01 9.18E+00 2.01E+00 6.79E-01 1.24E-01 5.27E-02 5.00E-03 4.51E-04 3.62E-05
20/3  ε = 10−2   2.01E+03 1.94E+02 2.81E+01 4.56E+00 1.94E+00 2.52E-01 1.13E-01 9.45E-03 9.73E-04 1.00E-04
      ε = 10−4   1.36E+04 5.28E+03 5.17E+01 7.69E+00 2.85E+00 3.20E-01 1.47E-01 1.40E-02 1.18E-03 1.03E-04
10    ε = 10−2   4.27E+02 5.29E+01 1.52E+01 1.56E+00 7.04E-01 7.81E-02 3.96E-02 3.98E-03 3.80E-04 3.68E-05
      ε = 10−4   1.70E+04 9.81E+03 1.73E+02 6.97E+01 8.02E+00 1.15E+00 5.71E-01 5.32E-02 4.68E-03 4.29E-04
20    ε = 10−2   1.01E+04 2.59E+03 5.58E+02 3.55E+02 1.35E+02 1.10E+01 4.06E+00 3.05E-01 2.65E-02 2.28E-03
      ε = 10−4   9.03E+03 8.91E+03 9.00E+02 3.03E+02 1.84E+02 1.44E+01 5.45E+00 4.90E-01 4.52E-02 4.43E-03


Figure 10: Log-ratio of the average computation time for the MM algorithm and the average computation time for (3) versus log10 n, in S1 with ε = 10−4. Circles, triangles, pluses, crosses, and diamonds indicate the scenarios β0 = 4, 5, 20/3, 10, 20, respectively.

to the true parameter vector ψ0. This process is repeated 1000 times and the MSE (mean-squared error) is calculated as the average squared error over the replicates, for each element of the parameter vector, in each case and for each n. The results of these simulations are presented in Table 5.

Similarly to the results of Appendix A, we observe that the MSEs are decreasing in n, for both algorithms. However, there appears to be no evidence of either algorithm being superior in accuracy to the other.

References

Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press, Cambridge.

Brent, R. P., 1971. An algorithm with guaranteed convergence for finding a zero of a function. Computer Journal 14, 422–425.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C., 2002. Introduction to Algorithms. MIT Press, Cambridge.

Eddelbuettel, D., 2013. Seamless R and C++ Integration with Rcpp. Springer, New York.

Forbes, C., Evans, M., Hastings, N., Peacock, B., 2011. Statistical Distributions. Wiley, New York.


Table 5: MSE results for S2. The rows EM and MM report the MSEs of the ML estimates using the respective techniques. For each parameter vector component, the MSEs are computed as average squared errors over 1000 repetitions. The Ratio row reports the ratio of the MM and EM rows. The notation aEb equates to a × 10^b.

Case Element    n  5        10       20       50       100      500      1000
1    π1   EM       1.10E-01 8.81E-02 3.34E-02 8.07E-03 3.33E-03 2.95E-04 4.12E-06
          MM       4.06E-02 2.01E-02 9.03E-03 3.24E-03 1.23E-03 1.14E-03 1.10E-03
          Ratio    3.69E-01 2.28E-01 2.70E-01 4.01E-01 3.69E-01 3.86E+00 2.67E+02
     β1   EM       7.43E+01 2.25E+01 1.76E+01 1.32E+01 3.51E+00 1.37E+00 2.00E-02
          MM       7.51E+01 7.15E+01 3.35E+00 1.17E+00 2.37E-01 1.72E-03 3.10E-04
          Ratio    1.01E+00 3.18E+00 1.90E-01 8.86E-02 6.75E-02 1.26E-03 1.55E-02
     β2   EM       1.37E+00 6.87E-02 2.38E-02 8.98E-03 1.61E-03 6.06E-04 4.58E-06
          MM       1.78E-01 9.06E-02 8.26E-02 2.73E-02 1.18E-02 1.29E-04 1.79E-06
          Ratio    1.30E-01 1.32E+00 3.47E+00 3.04E+00 7.33E+00 2.13E-01 3.91E-01
2    π1   EM       3.27E-03 2.05E-03 2.38E-04 1.73E-04 1.55E-04 9.24E-05 2.95E-05
          MM       5.69E-02 4.92E-02 4.19E-02 2.14E-02 9.48E-03 2.75E-03 2.49E-03
          Ratio    1.74E+01 2.40E+01 1.76E+02 1.24E+02 6.12E+01 2.98E+01 8.44E+01
     π2   EM       2.27E-02 1.49E-02 5.42E-03 4.33E-03 3.97E-03 7.52E-04 4.37E-08
          MM       9.89E-02 8.08E-02 5.21E-02 3.54E-02 2.48E-02 2.26E-02 1.60E-02
          Ratio    4.36E+00 5.42E+00 9.61E+00 8.18E+00 6.25E+00 3.01E+01 3.66E+05
     β1   EM       2.17E+02 1.88E+02 1.65E+02 1.32E+02 1.26E+02 1.17E+02 6.76E+01
          MM       1.16E+03 2.48E+02 2.19E+02 1.94E+02 1.20E+02 5.17E+01 5.14E+00
          Ratio    5.35E+00 1.32E+00 1.33E+00 1.47E+00 9.52E-01 4.42E-01 7.60E-02
     β2   EM       2.02E+01 6.11E+00 1.64E+00 1.36E+00 4.98E-01 1.45E-02 9.06E-03
          MM       2.53E+00 2.25E-01 1.86E-01 1.38E-01 4.98E-02 3.89E-02 3.14E-02
          Ratio    1.25E-01 3.68E-02 1.13E-01 1.01E-01 1.00E-01 2.68E+00 3.47E+00
     β3   EM       3.93E+01 1.81E+01 9.35E+00 8.67E+00 6.17E+00 2.83E+00 2.75E+00
          MM       1.14E+01 4.88E+00 4.62E+00 3.82E+00 2.50E+00 2.42E+00 1.70E+00
          Ratio    2.90E-01 2.70E-01 4.94E-01 4.41E-01 4.05E-01 8.55E-01 6.18E-01


Huang, J. S., Shen, P. S., 2007. More maximum likelihood oddities. Journal of Statistical Planning and Inference 137, 2151–2155.

Hunter, D. R., 2004. MM algorithms for generalized Bradley-Terry models. Annals of Statistics 32, 384–406.

Hunter, D. R., Li, R., 2005. Variable selection using MM algorithms. Annals of Statistics 33.

Johnson, N. L., Kotz, S., 1999. Non-smooth sailing or triangular distributions revisited after some 50 years. Statistician 48, 179–187.

Johnson, N. L., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions. Vol. 2. Wiley, New York.

Karlis, D., Xekalaki, E., 2008. The polygonal distribution. In: Advances in Mathematical and Statistical Modeling. Birkhauser, Boston.

Kotz, S., van Dorp, J. R., 2004. Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific, Singapore.

Lange, K., 2013. Optimization. Springer, New York.

McLachlan, G. J., Peel, D., 2000. Finite Mixture Models. Wiley, New York.

Nguyen, H. D., McLachlan, G. J., 2015. Maximum likelihood estimation of Gaussian mixture models without matrix operations. Advances in Data Analysis and Classification, In Press.

Nguyen, H. D., McLachlan, G. J., 2016. Laplace mixture of linear experts. Computational Statistics and Data Analysis 93, 177–191.

Nguyen, H. D., McLachlan, G. J., Wood, I. A., 2016. Mixture of spatial spline regressions for clustering and classification. Computational Statistics and Data Analysis 93, 76–85.

Oliver, E. H., 1972. A maximum likelihood oddity. American Statistician 26, 43–44.

R Core Team, 2013. R: a language and environment for statistical computing. R Foundation for Statistical Computing.

Razaviyayn, M., Hong, M., Luo, Z.-Q., 2013. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization 23, 1126–1153.

Sangalli, L. M., Ramsay, J. O., Ramsay, T. O., 2013. Spatial spline regression models. Journal of the Royal Statistical Society Series B 75, 681–703.

van Dorp, J. R., Kotz, S., 2003. Generalized trapezoidal distributions. Metrika 58, 85–97.


Wu, T. T., Lange, K., 2010. The MM alternative to EM. Statistical Science 25, 492–505.

Zhou, H., Lange, K., 2010. MM algorithms for some discrete multivariate distributions. Journal of Computational and Graphical Statistics 19, 645–665.

Zhou, H., Zhang, Y., 2012. EM vs MM: a case study. Computational Statistics and Data Analysis 56, 3909–3920.
