45

RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

RJ 10025 (90521) May 29, 1996 (Revised 3/20/98)Computer Science

Research ReportESTIMATING THE NUMBER OF CLASSES IN

A FINITE POPULATION

Peter J. Haas

IBM Research DivisionAlmaden Research Center650 Harry RoadSan Jose, CA 95120-6099

Lynne Stokes

Department of Management Science andInformation Systems

University of TexasAustin, TX 78712

LIMITED DISTRIBUTION NOTICE

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication.It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the

outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and speci�crequests. After outside publication, requests should be �lled only by reprints or legally obtained copies of the article (e.g.,payment of royalties).

IBMResearch DivisionYorktown Heights, New York � San Jose, California � Zurich, Switzerland

Page 2: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained
Page 3: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

ESTIMATING THE NUMBER OF CLASSES IN

A FINITE POPULATION

Peter J. Haas

IBM Research DivisionAlmaden Research Center650 Harry RoadSan Jose, CA 95120-6099e-mail: [email protected]

Lynne Stokes

Department of Management Science andInformation Systems

University of TexasAustin, TX 78712e-mail: [email protected]

ABSTRACT: We use an extension of the generalized jackknife approach of Gray andSchucany to obtain new nonparametric estimators for the number of classes in a �nitepopulation of known size. We also show that generalized jackknife estimators are closelyrelated to certain Horvitz-Thompson estimators, to an estimator of Shlosser, and to esti-mators based on sample coverage. In particular, the generalized jackknife approach leads toa modi�cation of Shlosser's estimator that does not su�er from the erratic behavior of theoriginal estimator. The performance of both new and previous estimators is investigatedby means of an asymptotic variance analysis and a Monte Carlo simulation study.

Keywords: jackknife, sample coverage, number of species, number of classes, database,census

Page 4: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained
Page 5: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

1. Introduction

The problem of estimating the number of classes in a population has been studied formany years. A recent review article (Bunge and Fitzpatrick 1993) lists more than 125references. In this article, we consider an important special case of the general problem |estimating the number of classes in a �nite population of known size. Only a handful ofpapers have addressed this problem and none has reached an entirely satisfactory solution,despite the fact that the �rst attempt at solution appeared in the statistical literature nearly50 years ago (Mosteller 1949). The problem we consider has arisen in the literature in avariety of applications, including the following.

(i) In a company-sponsored contest, many entries (say several hundred thousand) havebeen received. It is known that some people have entered more than once. The goal isto estimate the number of di�erent people who have entered from a sample of entries(Mosteller 1949; Sudman 1976).

(ii) A sampling frame is constructed by combining a number of lists that may containoverlapping entries. It is desired to estimate, using a sample from all lists, the numberof units on the combined list (Deming and Glasser 1959; Goodman 1952; Kish 1965,Sec. 11.2; Sudman 1976, Sec. 3.6). An important example of such a problem is an\administrative records census," currently under study by the U.S. Bureau of theCensus. In such a census, several administrative �les (such as AFDC or IRS records)are combined, and the total number of distinct individuals included in the combined�le is determined. Exact computation of the number of distinct individuals in thecombined �le is extremely expensive because of the high cost of determining thenumber of duplicated entries. A similar problem and proposed solution was discussedin the London Financial Times (March 2, 1949) by C. F. Carter, who was interestedin estimating the number of di�erent investors in British industrial stocks based onsamples from share registers of companies (Mosteller 1949).

(iii) In a relational database system, data are organized in tables called relations (see,e.g., Korth and Silberschatz 1991, Chap. 3). In a typical relation, each row mightrepresent a record for an individual employee in a company, and each column mightcorrespond to a di�erent attribute of the employee, such as salary, years of experience,department number, and so forth. A relational query speci�es an output relation thatis to be computed from the set of base relations stored by the system. Knowledgeof the number of distinct values for each attribute in the base relations is centralto determining the most e�cient method for computing a speci�ed output relation(Hellerstein and Stonebraker 1994; Selinger, Astrahan, Chamberlain, Lorie, and Price1979). The size of the base relations in modern database systems often is so largethat exact computation of the distinct-value parameters is prohibitively expensive,and thus estimation of these parameters is desired (Astrahan, Schkolnick, and Whang1987; Flajolet and Martin 1985; Gelenbe and Gardy 1982; Hou, Ozsoyoglu, and Taneja

1

Page 6: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

1988, 1989; Naughton and Seshadri 1990; Ozsoyoglu, Du, Tjahjana, Hou, and Rowland1991; Whang, Vander-Zanden, and Taylor 1990).

In each of these applications, the size of the population (number of contest entries, totalnumber of units over all lists, and number of rows in the base relation) is known, and thissize is too large for easy computation of the number of classes.

The problem studied in this article can be described formally as follows. A populationof size N consists of D mutually disjoint classes of items, labelled C1; C2; : : : ; CD. De�neNj to be the size of class Cj, so that N =

PDj=1Nj . A simple random sample of n

items is selected (without replacement) from the population. This sample includes nj itemsfrom class Cj. The problem we consider is that of estimating D using information fromthe sample along with knowledge of the value of N . We denote by Fi the number ofclasses of size i in the population, so that D =

PNi=1 Fi. Similarly, we denote by fi the

number of classes represented exactly i times in the sample and by d the total number ofclasses represented in the sample. Thus d =

Pni=1 fi and

Pni=1 ifi = n. De�ne vectors

N = (N1; N2; : : : ; ND), n = (n1; n2; : : : ; nD), and f = (f1; f2; : : : ; fn). Note that n is notobservable, but f is. Because we sample without replacement, the random vector n has amultivariate hypergeometric distribution with probability mass function

P (n j D;N) =

�N1

n1

��N2

n2

� � � � �NDnD��Nn

� : (1)

The probability mass function of the observable random vector f is simply P (n j D;N)summed over all points n that correspond to f :

P (f j D;N) =XS

P (n j D;N);

where S = fn : #(nj = i) = fi for 1 � i � D g. The probability mass function P (f j D;N)does not have a closed-form expression in general.

In Section 2 we review the estimators that have been proposed for estimating D fromdata generated under model (1). In Section 3 we provide several new estimators of D basedon an extension of the generalized jackknife approach of Gray and Schucany (1972). Wethen show that generalized jackknife estimators of the number of classes in a population areclosely related to certain \Horvitz-Thompson" estimators, to an estimator due to Shlosser(1981), and to estimators based on the notion of \sample coverage" (Chao and Lee 1992).In Section 4 we provide and compare approximate expressions for the asymptotic varianceof several of the estimators, and in Section 5 apply our formulas to a well-known examplefrom the literature. We provide a simulation-based empirical comparison of the variousestimators in Section 6, and summarize our results and give recommendations in Section 7.

2

Page 7: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

2. Previous Estimators

Bunge and Fitzpatrick (1993) mention only two non-Bayesian estimators that have beendeveloped as estimators of D under model (1). These are the estimators of Goodman (1949)and Shlosser (1981). Goodman proved that

bDGood1 = d+nXi=1

(�1)i+1 (N � n+ i� 1)! (n � i)!

(N � n� 1)!n!fi

is the unique unbiased estimator of D when n > Mdef= max(N1; N2; : : : ; ND). He further

proved that no unbiased estimator of D exists when n � M . Unfortunately, unless thesampling fraction is quite large, the variance of bDGood1 is so great and the numerical dif-�culties encountered when computing bDGood1 are so severe that the estimator is unusable.Goodman, who made note of the high variance of bDGood1 himself, suggested the alternativeestimator

bDGood2 = N � N(N � 1)

n(n� 1)f2

for overcoming the variance problem. Although bDGood2 has lower variance than bDGood1, itcan take on negative values and can have a large bias for any n if D is small. For example,consider the case in which D = 1 and n > 2, and observe that f2 = 0 and bDGood2 = N .

Under the assumption that the population size N is large and the sampling fractionq = n=N is nonnegligible, Shlosser (1981) derived the estimator

bDSh = d+ f1

Pni=1(1� q)ifiPn

i=1 iq(1� q)i�1fi:

For the two examples considered in his paper, Shlosser found that use of bDSh with a 10%sampling fraction resulted in an error rate below 20%. In our experiments, however, weobserved root mean squared errors (rmse's) exceeding 200%, even for well-behaved popu-lations with relatively little variation among the class sizes (see Sec. 6). Considering therelationship between bDSh and generalized jackknife estimators (see Sec. 3.4) provides insightinto the source of this erratic behavior and suggests some possible modi�cations of bDSh toimprove performance.

In related work, Burnham and Overton (1978, 1979) proposed a family of (traditional)generalized jackknife estimators for estimating the size of a closed population when captureprobabilities vary among animals. The D individuals in the population play the role of ourD classes; a given individual can appear up to n times in the overall sample if capturedon one or more of n possible trapping occasions. The capture probability for an individualis assumed to be constant over time, and the capture probabilities for the D individualsare modeled as D iid random samples from a �xed probability distribution. Burnham and

3

Page 8: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Overton's sample design is clearly di�erent from model (1). Under the Burnham and Over-ton model, for example, the quantities f1; f2; : : : ; fn have a joint multinomial distribution.Closely related to the work of Burnham and Overton are the ordinary jackknife estimatorsof the number of species in a closed region developed by Heltshe and Forrester (1983) andSmith and van Belle (1984). The sample data consist of a list of the species that appear ineach of n quadrats. (The number of times that a species is represented in a quadrat is notrecorded.) This setup is essentially identical to that of Burnham and Overton, with the Dspecies playing the role of the D individuals and the n quadrats playing the role of the ntrapping occasions.

3. Generalized Jackknife Estimators

In this section we outline an extension of the generalized jackknife approach to biasreduction and then use this approach to derive new estimators for the number of classesin a �nite population. We also point out connections between our generalized jackknifeapproach and several other estimation approaches in the literature.

3.1. The Generalized Jackknife Approach

Let � be an unknown real-valued parameter. A generalized jackknife estimator of � isan estimator of the form

G(b�1; b�2) = b�1 �Rb�21�R

; (2)

where b�1 and b�2 are biased estimators of � and R (6= 1) is a real number (Gray and Schucany1972). The idea underlying the generalized jackknife approach is to try and choose R suchthat G(b�1; b�2) has lower bias than either b�1 or b�2. To motivate the choice of R, observe thatfor

R =E[b�1]� �

E[b�2]� �; (3)

the estimator G(b�1; b�2) is unbiased for �. This optimal value of R is typically unknown,however, and can only be approximated, resulting in bias reduction but not complete biaselimination. In the following, we extend the original de�nition of the generalized jackknifegiven by Gray and Schucany (1972) by allowing R to depend on the data; that is, we allowR to be random.

Recall that d is the number of classes represented in the sample. Write dn for d toemphasize the dependence of d on the sample size, and denote by dn�1(k) the number ofclasses represented in the sample after the kth observation has been removed. Set

d(n�1) =1

n

nXk=1

dn�1(k):

4

Page 9: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

We focus on generalized jackknife estimators that are obtained by taking b�1 = dn andb�2 = d(n�1) in (2); these are the usual choices for b�1 and b�2 in the classical �rst-orderjackknife estimator (Miller 1974). Observe that dn�1(k) = dn � 1 if the class for thekth observation is represented only once in the sample; otherwise, dn�1(k) = dn. Thusd(n�1) = dn � (f1=n) and, by (2), G(b�1; b�2) = bD, where

bD = dn +Kf1n

(4)

and K = R=(1�R). It follows from (3) that the optimal choice of K is

K =E [dn]�D

E[d(n�1)]�E [dn]=D �E [dn]

E [f1] =n: (5)

To derive a more explicit formula for K, denote by I[A] the indicator of event A and observethat

E [dn] = E

24 DXj=1

I[nj > 0]

35 =

DXj=1

P fnj > 0 g = D �DXj=1

P fnj = 0 g :

Similar reasoning shows that

E [f1] =DXj=1

P fnj = 1 g ; (6)

so that

K = n

PDj=1 P fnj = 0 gPDj=1 P fnj = 1 g : (7)

Following Shlosser (1981), we focus on the case in which the population size N is large andthe sampling fraction q = n=N is nonnegligible, and we make the approximation

P fnj = k g ��Nj

k

�qk(1� q)Nj�k (8)

for 0 � k � n and 1 � j � D. That is, the probability distribution of each nj is approxi-mated by the probability distribution of nj under a Bernoulli sample design in which eachitem is included in the sample with probability q, independently of all other items in thepopulation. Use of this approximation leads to estimators that behave almost identically toestimators derived using the exact distribution of n but are simpler to compute and derive(see App. A for further discussion). Substituting (8) into (7), we obtain

K � n

PDj=1(1� q)NjPD

j=1Njq(1� q)Nj�1: (9)

5

Page 10: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

The quantity K de�ned in (9) depends on unknown parameters N1; N2; : : : ; ND that aredi�cult to estimate. Our approach is to approximate K by a function of D and of otherparameters that are easier to estimate, thereby obtaining an approximate version of (4). Theestimates for these parameters, including bD forD, are then substituted into the approximateversion of (4) and the resulting equation is solved for bD.

We also consider \smoothed" jackknife estimators. The idea is to replace the quantityf1=n in (4) by its expected value E [f1] =n in the hope that the resulting estimator of D willbe more stable than the original \unsmoothed" estimator. As with the parameter K, thequantity E [f1] =n depends on the unknown parameters N1; N2; : : : ; ND; see (6) and (8).Thus our approach to estimating E [f1] =n is the same as our approach to estimating K.

Estimators also can be based on high-order jackkni�ng schemes that consider the num-ber of distinct values in the sample when two elements are removed, when three elementsare removed, and so forth. Typically, using a high-order jackkni�ng scheme requires es-timating high-order moments (skewness, kurtosis, and so forth) of the set of numbersfN1; N2; : : : ; ND g. Initial experiments indicated that the reduction in estimation errordue to using the high-order jackknife is outweighed by the increase in error due to un-certainty in the moment estimates. Thus we do not pursue high-order jackknife schemesfurther.

3.2. The Estimators

Di�erent approximations for K and E [f1] =n lead to di�erent estimators for D. Herewe develop a number of the possible estimators.

3.2.1. First-Order Estimators The simplest estimators of D can be derived using a�rst-order approximation to K. Speci�cally, approximate each Nj in (9) by the averagevalue

N =1

D

DXj=1

Nj =N

D

and substitute the resulting expression for K into (4) to obtain

bD = dn +(1� q)f1D

n: (10)

Now substitute bD for D on the right side of (10) and solve for bD. The resulting solution,denoted by bDuj1, is given by

bDuj1 =

�1� (1� q)f1

n

��1dn: (11)

We refer to this estimator as the \unsmoothed �rst-order jackknife estimator."

6

Page 11: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

To derive a \smoothed �rst-order jackknife estimator," observe that by (6) and (8),

E [f1]

n� 1

n

DXj=1

Njq(1� q)Nj�1: (12)

Approximating each Nj in (12) by N , we have

E [f1]

n� (1� q)N�1: (13)

On the right side of (10), replace f1=n with the approximate expression for E [f1] =n givenin (13), yielding

bD = dn +D(1� q)N :

Replacing D with bD and N with N= bD in the foregoing expression leads to the relation

bD�1� (1� q)N=bD�= dn:

We de�ne the smoothed �rst-order jackknife estimator bDsj1 as the value of bD that solves

this equation. Given dn, n, and N , bDsj1 can be computed numerically using standardroot-�nding procedures. Observe that if in fact N1 = N2 = � � � = ND = N=D, then

E [dn] � D�1� (1 � q)N=D

�:

In this case bDsj1 can be viewed as a simple method-of-moments estimator obtained byreplacing E [dn] with the estimate dn and solving for D. If, moreover, the sampling fractionq is small enough so that the distribution of (n1; n2; : : : ; nD) is approximately multinomial(see Sec. 3.3), then bDsj1 is approximately equal to the maximum likelihood estimator for

D (see Good 1950). Observe that both bDuj1 and bDsj1 are consistent for D: bDuj1 ! D andbDsj1 ! D as q ! 1.

3.2.2. Second-Order Estimators A second-order approximation to K can be derivedas follows. Denote by 2 the squared coe�cient of variation of the class sizesN1; N2; : : : ; ND:

2 =(1=D)

PDj=1(Nj �N)2

N2 : (14)

Suppose that 2 is relatively small, so that each Nj is close to the average value N . Sub-stitute the Taylor approximations

(1� q)Nj � (1� q)N + (1� q)N ln(1� q)(Nj �N)

7

Page 12: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

and

Njq(1� q)Nj�1 � Njq�(1� q)N�1 + (1� q)N�1 ln(1� q)(Nj �N)

�for 1 � j � D into (9) to obtain

K � D(1� q)

�1

1 + ln(1� q)N 2

�� D(1� q)

�1� ln(1� q)N 2

�: (15)

The unknown parameter 2 can be estimated using the following approach (cf. Chao andLee 1992). With the usual convention that

�nm

�= 0 for n < m, we �nd that

NXi=1

i(i� 1)E [fi] �NXi=1

i(i � 1)DXj=1

�Nj

i

�qi(1� q)Nj�i

= q2DXj=1

Nj(Nj � 1)

NjXi=2

�Nj � 2

i� 2

�qi�2(1� q)Nj�i

= q2DXj=1

Nj(Nj � 1);

so that

2 � D

n2

NXi=1

i(i� 1)E [fi] +D

N� 1:

Thus if D were known, then a natural method-of-moments estimator 2(D) of 2 would be

2(D) = max�0;

D

n2

nXi=1

i(i� 1)fi +D

N� 1�: (16)

To develop a second-order estimate of D, substitute (15) into (4) to obtain

bD = dn +Df1(1� q)

n

�1� ln(1� q)N 2

�; (17)

from which it follows that

bD = dn +Df1(1� q)

n� f1(1� q) ln(1� q) 2

q:

Replacing D with bD on the right side of this equation and solving for bD yields the relation�1� f1(1� q)

n

� bD = dn � f1(1� q) ln(1 � q) 2

q: (18)

8

Page 13: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

An estimator of D can be obtained by substituting 2( bD) for 2 in (18) and solving forbD numerically. Alternatively, we can start with a simple initial estimator of D and thencorrect this estimator using (18). Following this latter approach, we use bDuj1 as our initialestimator and de�ne

bDuj2 =

�1� f1(1� q)

n

��1 dn � f1(1� q) ln(1� q) 2( bDuj1)

q

!:

A smoothed second-order jackknife estimator can be obtained by replacing the expres-sion f1=n in (17) with the approximation to E [f1] =n given in (13), leading to

bD = dn +D(1� q)N�1� ln(1� q)N 2

�:

Replacing D with bD and proceeding as before, we obtain the estimator

bDsj2 =�1� (1� q)

~N��1 �

dn � (1� q)~N ln(1� q)N 2( bDuj1)

�;

where ~N = N= bDuj1. As with the �rst-order estimators bDuj1 and bDsj1, the second-order

estimators bDuj2 and bDsj2 are consistent for D.

3.2.3. Horvitz-Thompson Jackknife Estimators In this section we discuss an al-ternative approach to estimation of K based on a technique of Horvitz and Thompson.(See Sarndal, Swensson, and Wretman 1992 for a general discussion of Horvitz-Thompsonestimators.) First, consider the general problem of estimating a parameter of the form�(g) =

PDj=1 g(Nj), where g is a speci�ed function. Observe that because P fnj > 0 g > 0

for 1 � j � D, we have �(g) = E [X(g)], where

X(g) =

DXj=1

g(Nj)I(nj > 0)

P fnj > 0 g =X

fj:nj>0g

g(Nj)

P fnj > 0 g :

It follows from (8) that P fnj > 0 g � 1� (1 � q)Nj , and the foregoing discussion suggeststhat we estimate �(g) by

b�(g) = Xfj:nj>0g

g( bNj)

1� (1� q)bNj; (19)

where bNj is an estimator for Nj . The key point is that we need to estimate Nj only whennj > 0. To do this, observe that

E [nj j nj > 0] =E [nj]

P fnj > 0 g �qNj

1� (1� q)Nj:

9

Page 14: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Replacing E [nj j nj > 0] with nj leads to the estimating equation

nj =qNj

1� (1� q)Nj; (20)

and a method-of-moments estimator bNj can be de�ned as the value of Nj that solves (20).Now consider the problem of estimating K, and hence D. By (9), K � �(f)=�(g), where

f(x) = (1 � q)x and g(x) = xq(1 � q)x�1=n. Thus a natural estimator of K is given byb�(f)=b�(g), leading to the �nal estimator,bDHTj = dn +

b�(f)b�(g) f1n :A smoothed variant of bDHTj can be obtained by replacing f1=n with the Horvitz-Thompson

estimator of E [f1] =n, namely b�(g). The resulting estimator, denoted by bDHTsj, is given by

bDHTsj = dn + b�(f):Finally, a hybrid estimator can be obtained using a �rst-order approximation for the nu-merator of K and a Horvitz-Thompson estimator for the denominator. This leads to theestimator bDhj, de�ned as the solution bD of the equation

bD 1� f1(1� q)N=bD

nb�(g)!= dn:

If we replace f1=n with the Horvitz-Thompson estimator for E [f1] =n in the foregoingequation in order to obtain a smoothed variant of bDhj, then the resulting estimator coincides

with bDsj1.Because D = �(u), where u(x) � 1, it may appear that a \non-jackknife" Horvitz-

Thompson estimator bDHT can be de�ned by setting bDHT = b�(u). It is straightforwardto show, however, that bDHT = bDHTsj, so that bDHT can in fact be viewed as a smoothedjackknife estimator.

Simulation experiments indicate that the behavior of the Horvitz-Thompson jackknifeestimators bDHTj and bDHTsj is erratic (see App. D for detailed results). Overall, the poor

performance of bDHTj and bDHTsj is caused by inaccurate estimation of b�(f). The problem

seems to be that when Nj is small, the estimator bNj is unstable and yet typically has a

large e�ect on the value of b�(f) through the term (1� q)bNj=�1� (1� q)

bNj�. The estimatorbDhj uses a Taylor approximation in place of b�(f) and hence has lower bias and rmse than

the other two Horvitz-Thompson jackknife estimators. However, other estimators performbetter than bDhj, and we do not consider the estimators bDHTj, bDHTsj, and bDhj further.

10

Page 15: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

3.3. Relation to Estimators Based on Sample Coverage

The generalized jackknife approach for deriving an estimator of D works for sampledesigns other than hypergeometric sampling. For example, the most thoroughly studiedversion of the number-of-classes problem is that in which the population is assumed tobe in�nite and n is assumed to have a multinomial distribution with parameter vector� = (�1; �2; : : : ; �D); that is,

P (n j D;�) =�

n

n1n2 � � �nD

��n11 �n22 � � � �nDD : (21)

When we proceed as in Section 3.1 to derive a generalized jackknife estimator under themodel in (21), the estimator turns out to be nearly identical to the \coverage-based" esti-mator proposed by Chao and Lee (1992). To see this, start again with (4) and select K asin (5). Because

E [dn]�D = �DXj=1

(1 � �j)n

under the model in (21), it follows that

K =

PDj=1 vn(�j)PD

j=1 �jvn�1(�j);

where vn(x) = (1� x)n. Set � = 1=D and use the Taylor approximations

vn(�j) � vn(�) + (�j � �)v0n(�)

and

�jvn�1(�j) � �j�vn�1(�) + (�j � �)v0n�1(�)

�in a manner analogous to the derivation in Section 3.2.2 to obtain

K � (D � 1) + (n� 1) 2; (22)

where 2 = �1+DPDj=1 �

2j is the squared coe�cient of variation of the numbers �1; �2; : : : ;

�D. Denote by bDmult the estimator of D under the multinomial model. Then, by (4),

bDmult = dn +�(D � 1) + (n� 1) 2

�f1n: (23)

Replace D with bDmult and 2 with an estimator ~ 2 in (23) and solve for bDmult to obtain

bDmult =dnbC +

n(1� bC)bC�n� 1

n~ 2 � 1

n

�;

11

Page 16: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

where bC = 1 � (f1=n). When the sample size n is large, the estimator bDmult is essentiallythe same as the estimator

bDCL =dnbC +

n(1� bC)bC ~ 2

proposed by Chao and Lee (1992). The estimator bDCL was developed from a di�erentpoint of view, using the concept of sample coverage. The sample coverage for an in�nitepopulation is de�ned as

PDj=1 �jI[nj > 0], and the quantity bC = 1 � (f1=n) is a standard

estimator of the sample coverage.Conversely, when Chao and Lee's derivation is modi�ed to account for hypergeomet-

ric sampling, the resulting estimator is equal to bDuj2 (see App. B). Thus at least someestimators based on sample coverage can be viewed as generalized jackknife estimators.

3.4. Relation to Shlosser's Estimator

Observe that the estimator bDSh, though not developed from a jackknife perspective, canbe viewed as an estimator of the form (4) with K estimated by

bKSh = n

Pni=1(1� q)ifiPn

i=1 iq(1� q)i�1fi:

To analyze the behavior of bDSh, we �rst rewrite the jackknife quantity K de�ned in (9) asfollows:

K = n

PNi=1(1� q)iFiPN

i=1 iq(1� q)i�1Fi: (24)

Shlosser's justi�cation of bDSh assumes that

E [fi]

E [f1]� FiF1

(25)

for 1 � i � N . When the assumption in (25) holds and the sample size is large enough sothat

fi � E [fi] (26)

for 1 � i � N ,

bKSh � n

PNi=1(1� q)iE [fi]PN

i=1 iq(1� q)i�1E [fi]

= n

PNi=1(1� q)i

�E [fi] =E [f1]

�PNi=1 iq(1� q)i�1

�E [fi] =E [f1]

�� n

F�11

PNi=1(1� q)iFi

F�11

PNi=1 iq(1 � q)i�1Fi

= K;

12

Page 17: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

so that bDSh behaves as a generalized jackknife estimator. Although the relations in (25)and (26) hold exactly for n = N (implying that bDSh is consistent for D), these relationscan fail drastically for smaller sample sizes. For example, when F1 = 0 and Fi > 0 for somei > 1, the right side of (25) is in�nite, whereas the left side is �nite for n su�ciently small.This observation leads one to expect that bDSh will not perform well when the sample size isrelatively small and N1; N2; : : : ; ND have similar values (with Nj > 1 for each j). Both thevariance analysis in Section 4 and the simulation experiments described in Section 6 bearout this conjecture.

The foregoing discussion suggests that replacing bKSh with

bK�Sh =

K

E[ bKSh]bKSh (27)

in the formula for bDSh might result in an improved estimator, because bK�Sh is unbiased

for K. Of course we cannot perform this replacement exactly, since K and E[ bKSh] areunknown, but we can approximate bK�

Sh as follows. Using the fact that

E [fr] =

DXj=1

P fnj = r g �DXj=1

�Nj

r

�qr(1� q)Nj�r =

NXi=r

�i

r

�qr(1� q)i�rFi (28)

for 1 � r � n, we have, to �rst order,

E[ bKSh] � n

PNi=1(1� q)iE [fi]PN

i=1 iq(1� q)i�1E [fi]

= n

PNi=1(1� q)i

�(1 + q)i � 1

�FiPN

i=1 iq2(1� q2)i�1Fi

:

(29)

Using the �rst-order approximation N1 = N2 = � � � = ND = N together with (24), (27),and (29), we �nd that

bK�Sh �

q(1 + q)N�1

(1 + q)N � 1

! bKSh:

We thus obtain a modi�ed Shlosser estimator given by

bDSh2 = dn + f1

q(1 + q)

~N�1

(1 + q) ~N � 1

!� Pni=1(1� q)ifiPn

i=1 iq(1� q)i�1fi

�;

where ~N is an initial estimate of N based on an initial estimate of D. We set ~N equal toN= bDuj1 throughout. As with bDSh, the estimator bDSh2 is consistent for D.

13

Page 18: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

An alternative consistent estimator of D can be obtained by directly using the expres-sions in (24), (27), and (29) with Fi estimated by

bFi = f1fiPni=1 iq(1� q)i�1fi

(30)

for 1 � i � N ; these estimators of F1; F2; : : : ; FN were proposed by Shlosser (1981) inconjunction with the estimator bDSh. Substituting the resulting estimator of K and E[ bKSh]into (27) leads to the �nal estimator

bDSh3 = dn + f1

Pni=1 iq

2(1� q2)i�1fiPni=1(1� q)i

�(1 + q)i � 1

�fi

!� Pni=1(1� q)ifiPn

i=1 iq(1� q)i�1fi

�2

:

As with the estimator bDSh, Shlosser's justi�cation of the estimators in (30) rests on theassumption in (25). Thus one might expect that, like bDSh, the estimator bDSh3 will beunstable when the sample size is relatively small and N1; N2; : : : ; ND have similar values.On the other hand, the reduction in bias of bK�

Sh relative to bKSh leads one to expect thatbDSh3 will perform better than bDSh when 2 is su�ciently large. (One might be temptedto avoid the assumption in (25) when estimating F1; F2; : : : ; FN by taking a method-of-moments approach: replace E [fr] with fr in (28) for 1 � r � n and solve the resultingset of linear equations either exactly or approximately. As pointed out by Shlosser (1981),however, this system of equations is nearly singular, and hence extremely unstable.)

4. Variance and Variance Estimates

Consider an estimator bD that is a function of the sample only through f = (f1; f2; : : : ;fM ), where M = max(N1; N2; : : : ; ND). All of the estimators introduced in Section 3 areof this type. In general, we also allow bD to depend explicitly on the population size N andwrite bD = bD(f ; N). Suppose that, for any N > 0 and nonnegative M -dimensional vectorf 6= 0, the function bD is continuously di�erentiable at the point (f ; N) and

bD(cf ; cN) = c bD(f ; N) (31)

for c > 0. Approximating the hypergeometric sample design by a Bernoulli sample designas in (8), we can obtain the following approximate expression for the asymptotic varianceof bD(f ; N) as D becomes large:

AVar[ bD(f ; N)] �MXi=1

A2iVar [fi] +

X1�i;i0�M

i6=i0

AiAi0Cov [fi; fi0 ] ; (32)

where Ai is the partial derivative of bD with respect to fi, evaluated at the point (f ; N).(When computing each Ai, we replace each occurrence of n and dn in the formula for bD by

14

Page 19: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

PMi=1 ifi and

PMi=1 fi before taking derivatives.) The approximation in (32) is valid when

there is not too much variability in the class sizes (see App. C for a precise formulation andproof of this result). It follows from the proof that, to a good approximation, the varianceof an estimator bD satisfying (31) increases linearly as D increases.

Straightforward calculations show that each of the speci�c estimators bDuj1, bDuj2, bDSh,bDSh2, and bDSh3 is continuously di�erentiable as stated previously and also satis�es (31).Thus we can use (32) to study the asymptotic variance of these estimators. We focus onbDuj1, bDuj2, bDSh2, and bDSh3 because each of these estimators performs best for at least onepopulation studied in the simulation experiments described in Section 6; we also considerbDSh, because bDSh is the most useful of the estimators previously proposed in the literature.Computation of the Ai coe�cients for each estimator is tedious, but straightforward. WhenbD = bDuj2, for example, we obtain

A(uj2)1 = A

(uj1)1 � N(1� q) ln(1� q)

n� (1� q)f1

�" 2 + f1

A(uj1)1bDuj1

� 2 + 1

�� 2

n

� 2 + 1�

bDuj1

N

�� 2

n� (1� q)f1+ 2

n

!#and

A(uj2)i = A

(uj1)i � N(1� q) ln(1� q)

n� (1� q)f1

� f1

A(uj1)ibDuj1

� 2 + 1

��2i

n

� 2 + 1�

bDuj1

N

�+i(i� 1) bDuj1

n2� i 2

n� (1� q)f1+i 2

n

!

for 1 < i � n, where 2 = 2( bDuj1),

A(uj1)1 = bDuj1

�1

dn+

(1� q)

n� (1� q)f1

�1� f1

n

��;

and

A(uj1)i = bDuj1

�1

dn+i(1� q)(f1=n)

n� (1� q)f1

�for 1 < i � n.

Figures 1 and 2 compare the variances of the estimators bDuj1, bDuj2, bDSh, bDSh2, andbDSh3 for a number of populations with equal class sizes. For these special populations, bDuj1

and bDuj2 are approximately unbiased, so that the relative variances of these estimators areappropriate measures of relative performance. It is particularly instructive to compare thevariance of bDuj1 and bDuj2, since bDuj2 is obtained from bDuj1 by adjusting the latter estimatorto compensate for bias induced by the assumption of equal class sizes. This adjustment isunnecessary for our special populations, and a comparison allows evaluation of the penalty(i.e., the increase in variance) that is being paid for the adjustment.

15

Page 20: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

bDSh3

bDSh2

bDSh

bDuj2

bDuj1

sampling fraction (q)

standarddeviation

0.180.140.10.060.02

8007006005004003002001000

Figure 1: Standard deviation of bDuj1,bDuj2, bDSh, bDSh2, and bDSh3 (D = 15; 000and N = 10).

bDSh2

bDuj2

bDuj1

class size (N)

standarddeviation

100806040200

160140120100806040200

Figure 2: Standard deviation of bDuj1,bDuj2, and bDSh2 (D = 1500 and q = 0:10).

Figure 1 displays the standard deviations of bDuj1, bDuj2, bDSh, bDSh2, and bDSh3 for anequal-class-size population with N = 15; 000 and D = 1500 (so that N = 10) as thesampling fraction q varies. Observe that bDuj2 is only slightly less e�cient than bDuj1, sothat the penalty for bias adjustment is small in this case. Performance of the estimatorsbDuj1 and bDSh2 is nearly indistinguishable. The most striking observation is that for this

population, bDSh and bDSh3 are not competitive with the other three estimators. The relativeperformance of bDSh and bDSh3 is especially poor for small sampling fractions. On the otherhand, the variance analysis indicates that modi�cation of bDSh as in (27) and (29) indeedreduces the instability of the original Schlosser estimator in this case. Thus we focus on theestimators bDuj1, bDuj2, and bDSh2 in the remainder of this section and in the next section.

(We return to the estimator bDSh3 in Section 6, where our simulation experiments indicatethat bDSh3 can exhibit smaller rmse than the other estimators, but only at large samplesizes and for certain \ill-conditioned" populations in which 2 is extremely large.)

Figure 2 compares the three estimators bDuj1, bDuj2, and bDSh2 for equal-class-size pop-ulations with a range of class sizes; for these calculations the number of classes and thesampling fraction are held constant at D = 1500 and q = 10. This �gure illustrates thedi�culty of precisely estimating D when the class size is small (but greater than 1). Again,we see that these three estimators perform similarly, with nearly equal variability when Nexceeds about 40.

We checked the accuracy of the variance approximation in some example populations bycomparing the values computed from (32) with results of a simulation experiment. (Thisexperiment is discussed more completely in Section 6 below.) Simulated sampling withq = 0:05, 0:10, and 0:20 from the population examined in Figure 1 (N = 15; 000, D = 1500)yields variance estimates within 10% (on average) of those calculated from (32). Similarresults were found in sampling from an equal-class-size population with N = 15; 000 andD = 150. The only di�culties we encountered occurred for equal-class-size populations with

16

Page 21: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

class sizes ofN = 1 andN = 2. For these small class sizes the variance approximation, whichis based on the approximation of the hypergeometric sample design by a Bernoulli sampledesign, is not su�ciently accurate. In particular, the approximate variance strongly re ectsrandom uctuations in the sample size due to the Bernoulli sample design; such uctuationsare not present in the actual hypergeometric sample design. Simulation experiments indicatethat for N � 3 the di�erences caused by Bernoulli versus hypergeometric sampling becomenegligible. (Of course, if the sample design is in fact Bernoulli, then this problem does notoccur.)

In practice, we estimate the asymptotic variance of an estimator bD by substitutingestimates for fVar [fi] : 1 � i �M g, and fCov [fi; fi0 ] : 1 � i 6= i0 �M g into (32). Toobtain such estimates, we approximate the true population by a population with D classes,each of size N=D. Under this approximation and the assumption in (8) of a Bernoullisample design, the random vector f has a multinomial distribution with parameters D andp = (p1; p2; : : : ; pn), where

pi =

�N=D

i

�qi(1� q)(N=D)�i

for 1 � i � n. It follows that Var [fi] = Dpi(1� pi) and Cov [fi; fi0 ] = �Dpipi0 . Each pi canbe estimated either by

bpi = �N= bDi

�qi(1� q)(N=

bD)�i

or simply by fi= bD. It turns out that the latter formula yields better variance estimates,and so we take

dVar[fi] = fi

�1� fibD

�and

dCov[fi; fi0 ] = �fifi0bDfor 1 � i; i0 � n. These formulas coincide with the estimators obtained using the \un-conditional approach" of Chao and Lee (1992). A computer program that calculates bDuj1,bDuj2, bDSh2 and their estimated standard errors from sample data can be obtained from thesecond author.

5. An Example

The following example illustrates how knowledge of the population size N can a�ectestimates of the number of classes. When the population size N is unknown, Chao andLee (1992, Sec. 3) have proposed that the estimator bDCL de�ned in Section 3.3 be used to

17

Page 22: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

N bDuj1bDuj2

bDSh2

1000 455 502 455(47) (60) (51)

10,000 709 788 707(125) (161) (128)

100,000 752 835 749(141) (183) (144)

Table 1: Values of bDuj1, bDuj2, and bDSh2 for three hypothetical combined lists. (Standarderrors are in parentheses.)

estimate the number of classes, because the formula for bDCL does not involve the unknownparameter N . When N is known, a slight modi�cation of the derivation of bDCL leads tothe unsmoothed second-order jackknife estimator bDuj2 (see App. B).

Our example is based on one discussed by Chao and Lee (1992), who borrowed data�rst described and analyzed by Holst (1981). These data arose from an application innumismatics in which 204 ancient coins were classi�ed according to die type in order toestimate the number of di�erent dies used in the minting process. Among the die types onthe reverse sides of the 204 coins were 156 singletons, 19 pairs, 2 triplets, and 1 quadruplet(f1 = 156, f2 = 19, f3 = 2, f4 = 1, d = 178). Because the total number of coins mintedis unknown in this case, model (1) is inappropriate for analyzing these data. But supposethat the same data had arisen from an application in which N was known. For example,suppose that the data were obtained by selecting a simple random sample of 204 namesfrom a sampling frame that had been constructed by combining 5 lists of 200 names each(N = 1000), 50 lists of 200 names each (N = 10; 000), or 500 lists of 200 names each(N = 100; 000). In each case our object is to estimate the number of unique individualson the combined list, based on the sample results. We focus on the three estimators bDuj1,bDuj2, and bDSh2. The estimates for the three cases are given in Table 1; the standard errorsdisplayed in Table 1 are estimated using the procedure outlined in Section 4.

We would expect similar inferences to be made from the same data under the multi-nomial model and the �nite population model when N is very large. Indeed, the valuebDuj2 = 835 agrees closely with Chao and Lee's estimate bDCL = 844 (se 187) when

N = 100;000. Moreover, when N = 100;000 we �nd that 2( bDuj1) � 0:13, which is thesame estimate of 2 given by Chao and Lee. As the population size decreases, however,both our assessment of the magnitude of D and our uncertainty about that magnitudedecrease, because we are observing a larger and larger fraction of both the population andthe classes.

The most extreme divergence between the estimate obtained using bDCL and estimatesobtained using bDuj1, bDuj2, or bDSh2 occurs when the sample consists of all singletons (f1 = n).

In that case, bDCL =1, whereas bDuj1 = bDuj2 = bDSh2 = N . This result indicates that whenthe population size N is known, it is better to use an estimator that exploits knowledge

18

Page 23: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

of N than to sample with replacement and use the estimator bDCL. In some applications,sampling with replacement is not even an option. For example, the only available samplingmechanism in at least one current database system is a one-pass reservoir algorithm (as inVitter 1985).

The empirical results in Section 6 indicate that, of the three estimators displayed inTable 1, bDuj2 is the superior estimator when

2 is small (< 1). Thus for our example, bDuj2

would be the preferred estimator, since 2( bDuj1) � 0:13 in all three cases. Note that bDuj2

consistently has the highest variance of the three estimators in Table 1. The bias of bDuj2

is typically lower than that of bDuj1 or bDSh2 when 2 is small, however, so that the overallrmse is lower.

6. Simulation Results

This section describes the results of a simulation study done to compare the performanceof the various estimators described in Section 3. Our comparison is based on the perfor-mance of the estimators for sampling fractions of 5%, 10%, and 20% in 52 populations.(Initial experiments indicated that the performance of the various estimators is best viewedas a function of sampling fraction, rather than absolute sample size. This is in contrast toestimators of, for example, population averages.)

We consider several sets of populations. The �rst set comprises synthetic populations ofthe type considered in the literature. Populations EQ10 and EQ100 have equal class sizes of10 and 100. In populations NGB/1, NGB/2, and NGB/4, the class sizes follow a negativebinomial distribution. Speci�cally, the fraction f(m) of classes in population NGB/k withclass size equal to m is given by

f(m) ��m� 1

k � 1

�rk(1� r)m�k

for m � k, where r = 0:04. Chao and Lee (1992) considered populations of this type.The populations in the second set are meant to be representative of data that could beencountered when a sampling frame for a population census is constructed by combining anumber of lists which may contain overlapping entries. Population GOOD and SUDM werestudied by Goodman (1949) and Sudman (1976). Population FRAME2 mimics a samplingframe that might arise in an administrative records census of the type described in Section 1.One approach to such a census is to augment the usual census address list with a smallnumber of relatively large administrative records �les, such as AFDC or Food Stamps, andthen estimate the number of distinct individuals on the combined list from a sample. Wehave constructed FRAME2 so that a given individual can appear at most �ve times, butmost individuals appear exactly once, mimicking the case in which four administrative listsare used to supplement the census address list. Population FRAME3 is similar to FRAME2,but for the FRAME3 population it is assumed that the combined list is made up of a numberof small lists (perhaps obtained from neighborhood-level organizations) rather than a few

19

Page 24: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Name N D 2 SkewEQ10 15000 1500 0.00 0.00EQ100 15000 150 0.00 0.00NGB/4 82135 874 0.18 0.50NGB/2 41197 906 0.37 0.81NGB/1 20213 930 0.75 1.25

Table 2: Characteristics of synthetic popu-lations.

Name N D 2 SkewGOOD 10000 9595 0.04 5.64

FRAME2 33750 19000 0.31 1.18FRAME3 111500 36000 0.52 1.92

SUDM 330000 100000 1.87 2.71

Table 3: Characteristics of \merged list"populations.

Name N D 2 SkewZ20A 50000 247 114.38 14.60Z15 50000 772 166.18 23.44

Z20B 50000 10384 234.81 73.54

Table 4: Characteristics of \ill-conditioned" populations.

large lists. The populations in the third set, denoted by Z20A, Z20B, and Z15, are usedto study the behavior of the estimators when the data are extremely ill-conditioned. Theclass sizes in each of these populations follow a generalized Zipf distribution (see Knuth1973, p. 398). Speci�cally, Nj=N / j��, where � equals 1.5 or 2.0. These populationshave extremely high values of 2. Descriptive statistics for these three sets of populationsare given in Tables 2, 3, and 4. The column entitled \skew" displays the dimensionlesscoe�cient of skewness �, which is de�ned by

� =

PDj=1(Nj �N)3=D�PD

j=1(Nj �N)2=D�3=2 :

The �nal set comprises 40 real populations that demonstrate the type of distributionsencountered when estimating the number of distinct values of an attribute in a relationaldatabase. Speci�cally, the populations studied correspond to various relational attributesfrom a database of enrollment records for students at the University of Wisconsin and adatabase of billing records from a large insurance company. The population size N rangesfrom 15,469 to 1,654,700, with D ranging from 3 to 1,547,606 and 2 ranging from 0 to81.63 (see App. D for further details). It is notable that values of 2 encountered in theliterature (Chao and Lee 1992; Goodman 1949; Shlosser 1981; Sudman 1976) tend not toexceed the value 2, and are typically less than 1, whereas the value of 2 exceeds 2 for morethan 50% of the real populations.

For each estimator, population, and sampling fraction, we estimated the bias and rmseby repeatedly drawing a simple random sample from the population, evaluating the esti-mator, and then computing the error of each estimate. (When evaluating the estimator,we truncated each estimate below at d and above at N .) The �nal estimate of bias wasobtained by averaging the error over all of the experimental replications, and rmse was

20

Page 25: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

sample

size 2 bDuj1bDsj1

bDuj2bDsj2

bDShbDSh2

bDSh3 2

5% � 0 and < 1 Average 13.48 14.20 11.84 12.27 79.17 13.23 202.16 56.65Maximum 43.81 45.14 39.56 39.67 428.25 46.59 3299.10 96.72

� 1 and < 50 Average 38.14 39.17 65.34 45.25 54.30 36.67 93.92 46.51

Maximum 70.47 70.48 186.15 186.15 218.02 66.82 1042.73 91.70� 50 Average 74.11 75.92 388.77 77.78 28.13 71.23 21.45 74.72

Maximum 85.09 88.49 564.57 112.13 47.63 83.71 38.58 85.55

all Average 30.95 31.91 68.61 34.44 62.33 29.86 132.06 52.78Maximum 85.09 88.49 564.57 186.15 428.25 83.71 3299.10 96.72

10% � 0 and < 1 Average 11.30 12.14 9.05 9.71 33.09 11.19 22.68 49.68Maximum 39.80 42.32 31.73 31.90 200.79 44.83 131.15 90.68

� 1 and < 50 Average 31.41 32.59 90.96 38.74 34.96 29.16 50.17 38.34

Maximum 61.27 61.28 267.08 186.15 107.16 54.03 357.43 83.12� 50 Average 63.92 65.88 682.55 115.77 15.50 58.82 11.51 64.43

Maximum 76.47 81.21 1133.61 281.98 28.97 73.14 21.81 76.89

all Average 25.79 26.89 103.38 32.94 32.71 24.18 36.10 44.93Maximum 76.47 81.21 1133.61 281.98 200.79 73.14 357.43 90.68

20% � 0 and < 1 Average 8.89 9.86 5.77 6.53 12.91 8.30 9.05 40.65Maximum 33.01 37.28 29.82 27.49 79.16 30.14 79.16 81.03

� 1 and < 50 Average 23.44 24.81 123.00 32.79 18.14 20.88 17.91 28.65

Maximum 46.77 49.73 369.77 186.15 49.20 43.38 74.99 67.42� 50 Average 50.10 52.19 1093.07 130.30 7.73 42.58 6.32 50.51

Maximum 62.96 69.06 2010.61 381.51 15.12 56.72 10.62 63.37

all Average 19.62 20.88 150.28 29.69 15.23 17.47 13.44 35.18Maximum 62.96 69.06 2010.61 381.51 79.16 56.72 79.16 81.03

Table 5: Average and maximum rmse (%) for various estimators.

estimated as the square root of the averaged square error. We used 100 replications, whichwas su�cient to estimate the rmse with a standard error below 5% in nearly all cases;typically the standard error was much less.

Summary results from the simulations are displayed in Tables 5 and 6. Table 5 gives theaverage and maximum rmse's for each estimator of D over all populations with 0 � 2 < 1,with 1 � 2 < 50, and with 2 � 50, as well as the average and maximum rmse's foreach estimator over all populations combined. Similarly, Table 6 gives the average andmaximum bias for each estimator. In these tables, the rmse and bias are each expressed asa percentage of the true number of classes. Tables 5 and 6 also display the rmse and biasof the estimator 2( bDuj1) used in the second-order jackknife estimators; the rmse and biasare expressed as a percentage of the true value 2 and are displayed in the column labelled 2.

Comparing Tables 5 and 6 indicates that for each estimator the major component ofthe rmse is almost always bias, not variance. Thus, even though the standard error can beestimated as in Section 4, this estimated standard error usually does not give an accuratepicture of the error in estimation of D. Another consequence of the predominance of biasis that when 2 is large, the rmse for the second-order estimator bDuj2 does not decrease

21

Page 26: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

sample

size 2 bDuj1bDsj1

bDuj2bDsj2

bDShbDSh2

bDSh3 2

5% � 0 and < 1 Average -12.71 -13.43 -10.76 -11.38 71.11 -10.98 90.57 -55.75Maximum -43.77 -45.10 -39.51 -39.62 427.53 -46.59 958.74 -94.97

� 1 and < 50 Average -37.95 -38.99 42.77 -16.83 39.13 -36.35 61.98 -46.19

Maximum -70.32 -70.32 186.15 186.15 218.01 -66.49 663.26 -91.70� 50 Average -74.10 -75.91 382.88 -22.16 22.92 -71.22 3.17 -74.71

Maximum -85.09 -88.49 556.68 110.28 44.65 -83.71 33.44 -85.54

all Average -30.54 -31.51 47.31 -15.04 50.80 -28.79 69.00 -52.24Maximum -85.09 -88.49 556.68 186.15 427.53 -83.71 958.74 -94.97

10% � 0 and < 1 Average -10.93 -11.78 -8.38 -9.31 28.12 -9.47 17.66 -48.87Maximum -39.79 -42.31 -31.47 -31.88 200.49 -44.83 130.80 -90.59

� 1 and < 50 Average -31.16 -32.34 74.62 -10.44 25.41 -28.61 35.20 -37.98

Maximum -61.00 -61.00 261.47 186.15 107.16 -53.88 264.38 -83.12� 50 Average -63.91 -65.87 677.18 24.90 11.57 -58.78 3.10 -64.41

Maximum -76.47 -81.21 1125.89 280.63 27.09 -73.13 18.47 -76.88

all Average -25.51 -26.62 87.45 -7.26 25.44 -23.20 25.65 -44.41Maximum -76.47 -81.21 1125.89 280.63 200.49 -73.13 264.38 -90.59

20% � 0 and < 1 Average -8.57 -9.55 -4.99 -6.20 9.99 -6.75 5.73 -39.71Maximum -33.01 -37.27 -17.83 -22.67 45.86 -28.38 28.17 -81.00

� 1 and < 50 Average -23.12 -24.49 112.39 -3.41 12.09 -20.13 10.02 -28.23

Maximum -46.54 -49.73 362.12 186.15 49.20 -43.38 49.34 -67.36� 50 Average -50.09 -52.17 1087.89 60.47 5.03 -42.53 1.72 -50.49

Maximum -62.96 -69.06 2003.12 381.51 13.90 -56.71 8.23 -63.36

all Average -19.32 -20.59 140.02 0.38 10.70 -16.45 7.65 -34.58Maximum -62.96 -69.06 2003.12 381.51 49.20 -56.71 49.34 -81.00

Table 6: Average and maximum bias (%) for various estimators.

monotonically as the sampling fraction increases. (In all other cases the rmse decreasesmonotonically.)

Comparing bDuj1 with bDsj1 and then comparing bDuj2 with bDsj2, we see that smoothing a�rst-order jackknife estimator never results in a better �rst-order estimator. On the otherhand, smoothing a second-order jackknife estimator can result in signi�cant performanceimprovement when 2 is large.

Similarly, using higher-order Taylor expansions leads to mixed results. Second-orderestimators perform better than �rst-order estimators when 2 is relatively small, but notwhen 2 is large. The di�culty is partially that the estimator 2( bDuj1) tends to underes-timate 2 when 2 is large, leading to underestimates of the number of classes. Moreover,the Taylor approximations underlying bDuj1, bDsj1, bDuj2, and bDsj2 are derived under the as-sumption of not too much variability between class sizes; this assumption is violated when 2 is large. There apparently is no systematic relation between the coe�cient of skewnessfor the class sizes and the performance of second-order jackknife estimators.

As predicted in Sections 3.4 and 4, the estimators bDSh and bDSh3 behave poorly when 2 is relatively small, and bDSh3 performs better than bDSh when 2 is large. For small tomedium values of 2, the modi�ed estimator bDSh2 has a smaller rmse than bDSh or bDSh3, and

22

Page 27: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

sample

size 2 bDuj2bDuj2a

bDSh2bDSh3

bDhybrid

5% � 0 and < 1 Average 11.84 19.46 13.23 202.16 11.84Maximum 39.56 192.64 46.59 3299.10 39.56

� 1 and < 50 Average 65.34 27.47 36.67 93.92 27.47

Maximum 186.15 54.51 66.82 1042.73 54.51� 50 Average 388.77 23.00 71.23 21.45 26.17

Maximum 564.57 36.60 83.71 38.58 39.20

all Average 68.61 23.89 29.86 132.06 21.06Maximum 564.57 192.64 83.71 3299.10 54.51

10% � 0 and < 1 Average 9.05 13.26 11.19 22.68 9.05Maximum 31.73 120.14 44.83 131.15 31.73

� 1 and < 50 Average 90.96 19.22 29.16 50.17 19.55

Maximum 267.08 48.12 54.03 357.43 48.12� 50 Average 682.55 17.82 58.82 11.51 11.51

Maximum 1133.61 27.30 73.14 21.81 21.81

all Average 103.38 16.71 24.18 36.10 14.69Maximum 1133.61 120.14 73.14 357.43 48.12

20% � 0 and < 1 Average 5.77 8.12 8.30 9.05 5.77Maximum 29.82 79.16 30.14 79.16 29.82

� 1 and < 50 Average 123.00 17.44 20.88 17.91 17.69

Maximum 369.77 76.57 43.38 74.99 76.57� 50 Average 1093.07 37.30 42.58 6.32 6.32

Maximum 2010.61 83.69 56.72 10.62 10.62

all Average 150.28 15.20 17.47 13.44 12.00Maximum 2010.61 83.69 56.72 79.16 76.57

Table 7: Average and maximum rmse (%) of bDuj2, bDuj2a, bDSh2, bDSh3, and bDhybrid.

its performance is comparable to the generalized jackknife estimators. For extremely largevalues of 2 and also for large sample sizes, the estimator bDSh3 has the best performanceof the three Shlosser-type estimators. (For a 20% sampling fraction, bDSh3 in fact has thelowest average rmse of all the estimators considered.)

As indicated earlier, smoothing can improve the performance of the second-order jack-knife estimator bDuj2. An alternative ad hoc technique for improving performance is to

\stabilize" bDuj2 using a method suggested by Chao, Ma, and Yang (1993). Fix c � 1 andremove any class whose frequency in the sample exceeds c; that is, remove from the sampleall members of classes fCj : j 2 B g, where B = f 1 � j � D : nj > c g. Then compute

the estimator bDuj2 from the reduced sample and subsequently increment it by jBj to pro-

duce the �nal estimate, denoted by bDuj2a. (Here jBj denotes the number of elements in

the set B.) When computing bDuj2 from the reduced sample, take the population size as

N �Pj2BbNj, where each bNj is a method-of-moments estimator of Nj as in Section 3.2.3.

If n�Pj2B nj = 0, then simply compute bDuj2 from the full sample. The idea behind this

procedure is as follows. When 2 is large, the population consists of a few large classesand many smaller classes. By in e�ect removing the largest classes from the population,

23

Page 28: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

we obtain a reduced population for which 2 is smaller, so that D is easier to estimate;the contribution to D from the jBj removed classes is then added back at the �nal stepof the estimation process. (We also experimented with another stabilization technique inwhich the k most frequent classes are removed for some �xed k, but this technique is notas e�ective.) Preliminary experiments indicated that c approximately 50 yields the bestperformance. For larger values of c, not enough of the frequent classes are removed; forsmaller c, the size of the reduced sample is too small, and the resulting inaccuracy of bDuj2

when computed from this sample o�sets the bene�ts of the reduction in 2. We thereforetake c = 50 in our experiments. As can be seen from Table 7, the rmse for bDuj2a is indeed

much lower than that for bDuj2 when 2 exceeds 1. Moreover, by comparing the rmse of bDsj2

and bDuj2a in Tables 5 and 7, respectively, it can be seen that stabilization is more e�ective

than smoothing. Observe, however, that the performance of bDuj2a is worse than that ofbDuj2 when 2 is small. Interestingly, experiments indicate that none of the other estimators

that we consider appears to bene�t from stabilization, and we apply this technique onlyto bDuj2. Overall, the most e�ective estimators appear to be bDuj2a, which has the smallest

average rmse over the various populations, and bDSh2, which has the smallest worst-casermse.

Our next observation is based on a comparison of the bias and rmse of bDuj1 and bDSh2

for all of the populations studied. The behavior of the two estimators is quite similar: thecorrelation between the bias of the estimators is 0.990 and the correlation between the rmseis 0.993. The rmse and bias of bDuj1 are usually slightly greater than the rmse and bias,

respectively, of bDSh2. On the other hand, using bDSh2 requires computation of f1; f2; : : : ; fn,whereas using bDuj1 requires computation only of f1. Thus, if computational resources are

limited, then it may be desirable to use bDuj1 as a surrogate for bDSh2; the quantity f1 can becomputed e�ciently using \Bloom �lter" techniques as described by Ramakrishna (1989).

The experimental results show that the relative performance of the estimators is stronglyin uenced by the value of 2. As can be seen from Table 7, the estimator bDuj2 has the

smallest average rmse when 0 � 2 < 1, the estimator bDuj2a has the smallest average

rmse when 1 � 2 < 50, and the estimator bDSh3 has the smallest average rmse when 2 � 50. These results indicate that it may be desirable to allow an estimator to dependexplicitly on the (estimated) value of 2. To illustrate this idea, we consider a simple adhoc branching estimator, denoted by bDhybrid. The idea is to estimate 2 by 2( bDuj1), �xparameters 0 < �1 < �2, and set

bDhybrid =

8><>:bDuj2 if 0 � 2( bDuj1) < �1;bDuj2a if �1 � 2( bDuj1) < �2;bDSh3 if 2( bDuj1) � �2:

(33)

Table 7 displays the estimated rmse for bDhybrid when �1 = 0:9 and �2 = 30. As can

be seen, the rmse for the combined estimator bDhybrid almost never exceeds that for bDuj2,bDuj2a, or bDSh3 separately.

24

Page 29: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

7. Conclusions

Both new and previous nonparametric estimators of the number of classes in a �nitepopulation can be viewed as generalized jackknife estimators. This viewpoint has suggestedways to improve Shlosser's original estimator and has shed new light on certain Horvitz-Thompson estimators as well as estimators based on notions of \sample coverage." Wehave used delta-method arguments to develop estimators of the standard error of generalizedjackknife estimators. As indicated by the example in Section 5, knowledge of the populationsize can lead to more precise estimation of the number of classes.

Of the estimators considered, the best appears to be the branching estimator bDhybrid

de�ned by (33), in which a modi�ed Shlosser estimator is used when the coe�cient ofvariation of the class sizes is estimated to be extremely large and unsmoothed second-orderjackknife estimators are used otherwise. The systematic development of such branchingestimators is a topic for future research. If a nonbranching estimator is desired, then werecommend the stabilized unsmoothed second-order jackknife estimator bDuj2a, followed by

the modi�ed Shlosser estimator bDSh2. If computing resources are scarce, then bDuj1 is areasonable estimator.

The various estimators of D discussed in this article embody di�erent approaches fordealing with the di�culties caused by variation in the class sizes N1; N2; : : : ; ND. Suchvariation is re ected by large values of 2. First-order estimates simply approximate eachNj by N . It is well-known in the literature that such an approach tends to yield down-wardly biased estimates (see Bunge and Fitzpatrick 1993). More sophisticated approachesconsidered here include

� Taylor corrections to the �rst-order approximation, as in the estimators bDuj2 and bDsj2;

� the stabilization technique of Section 6, in which the population is in e�ect modi�edso that the variation in class sizes is reduced;

� the Horvitz-Thompson approach, in which the �rst-order assumption is avoided byestimating explicitly each Nj such that nj > 0; and

� Shlosser's approach, which replaces the �rst-order assumption with the assumption in(25) and in its purest form results in the estimators bDSh and bDSh3.

The poor performance of the Horvitz-Thompson estimators indicates that approaches basedon direct estimation of the Nj's are unlikely to be successful. The second-order Taylorcorrection is e�ective mainly for small values of 2, and both the stabilization techniqueand Shlosser's approach are e�ective mainly for large values of 2. Thus, until a bettersolution is found, the best estimators will result from a judicious combination of the variousapproaches considered here.

25

Page 30: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Acknowledgements

Hongmin Lu made substantial contributions to the early phases of the work reportedhere, and an anonymous reviewer suggested the \stabilization" technique for bDuj2 used inSection 6. The Wisconsin student database was graciously provided by Bob Nolan of theUniversity of Wisconsin-Madison Department of Information Technology (DoIT) and Je�Naughton of the University of Wisconsin-Madison Computer Sciences Department.

26

Page 31: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

References

Astrahan, M., Schkolnick, M., and Whang, K. (1987), \Approximating the Number ofUnique Values of an Attribute Without Sorting," Information Systems, 12, 11{15.

Billingsley, P. (1986), Convergence of Probability Measures (2nd ed.), New York: Wiley.

Bishop, Y., Fienberg, S., and Holland, P. (1975), Discrete Multivariate Analysis, Cam-bridge, MA: MIT Press.

Bunge, J., and Fitzpatrick, M. (1993), \Estimating the Number of Species: A Review,"Journal of the American Statistical Association, 88, 364{373.

Burnham, K. P., and Overton, W. S. (1978), \Estimation of the Size of a ClosedPopulation When Capture Probabilities Vary Among Animals," Biometrika, 65,625{633.

| (1979), \Robust Estimation of Population Size When Capture Probabilities Varyamong Animals," Ecology , 60, 927{936.

Chao, A., and Lee, S. (1992), \Estimating the Number of Classes via Sample Coverage,"Journal of the American Statistical Association, 87, 210{217.

Chao, A., Ma, M-C., and Yang, M. C. K. (1993), \Stopping Rules and Estimation forRecapture Debugging With Unequal Failure Rates," Biometrika, 80, 193{201.

Chung, K. L. (1974), A Course in Probability Theory (2nd ed.), New York: AcademicPress.

Deming, W. E., and Glasser, G. J. (1959), \On the Problem of Matching Lists bySamples," Journal of the American Statistical Association, 54, 403{415.

Flajolet, P., and Martin, G. N. (1985), \Probabilistic Counting Algorithms for DataBase Applications," Journal of Computer and System Sciences, 31, 182{209.

Gelenbe, E., and Gardy, D. (1982), \On the Sizes of Projections: I," InformationProcessing Letters, 14, 18{21.

Good, I. L. (1950), Probability and the Weighing of Evidence, London: Charles Gri�n.

Goodman, L. A. (1949), \On the Estimation of the Number of Classes in a Population,"Annals of Mathematical Statistics, 20, 572{579.

| (1952), \On the Analysis of Samples From k Lists," Annals of Mathematical Statis-

tics, 23, 632{634.

Gray, H. L., and Schucany, W. R. (1972), The Generalized Jackknife Statistic, NewYork: Marcel Dekker.

Hellerstein, J. M., and Stonebraker, M. (1994), \Predicate Migration: OptimizingQueries With Expensive Predicates," in Proceedings of the 1994 ACM SIGMOD

International Conference on Managment of Data, pp. 267{276.

27

Page 32: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Heltshe, J. F., and Forrester, N. E. (1983), \Estimating Species Richness Using theJackknife Procedure," Biometrics, 39, 1{11.

Holst, L. (1981), \Some Asymptotic Results for Incomplete Multinomial or PoissonSamples," Scandinavian Journal of Statistics, 8, 243{246.

Hou, W., Ozsoyoglu, G., and Taneja, B. (1988), \Statistical Estimators for RelationalAlgebra Expressions," in Proceedings of the Seventh ACM SIGACT-SIGMOD-

SIGART Symposium on Principles of Database Systems, pp. 276{287.

| (1989), \Processing Aggregate Relational Queries With Hard Time Constraints," inProceedings of the 1989 ACM SIGMOD International Conference on Managmentof Data, pp. 68{77.

Kish, L. (1965), Survey Sampling , New York: Wiley.

Knuth, D. E. (1973), The Art of Computer Programming, Vol. 3: Sorting and Search-

ing , Reading, MA: Addison-Wesley.

Korth, H. F., and Silberschatz, A. (1991), Database System Concepts (2nd ed.), NewYork: McGraw-Hill.

Miller, R. G. (1974), \The Jackknife{ A Review," Biometrika, 61, 1{17.

Mosteller, F. (1949), \Questions and Answers," American Statistician, 3, 12{13.

Naughton, J. F., and Seshadri, S. (1990), \On Estimating the Size of Projections," inProceedings of the Third International Conference on Database Theory , pp. 499{513.

Ozsoyoglu, G., Du, K., Tjahjana, A., Hou, W., and Rowland, D. Y. (1991), \On Esti-mating COUNT, SUM, and AVERAGE Relational Algebra Queries," in Database

and Expert Systems Applications, Proceedings of the International Conference in

Berlin, Germany, 1991 (DEXA 91), pp. 406{412.

Ramakrishna, M. V. (1989), \Practical Performance of Bloom Filters and ParallelFree-Text Searching," Communications of the ACM , 32, 1237{1239.

Sarndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sam-

pling , New York: Springer-Verlag.

Selinger, P. G., Astrahan, D. D., Chamberlain, R. A., Lorie, R. A., and Price, T. G.(1979), \Access Path Selection in a Relational Database Management System," inProceedings of the 1979 ACM SIGMOD International Conference on Managment

of Data, pp. 23{34.

Ser ing, R. J. (1980), Approximation Theorems of Mathematical Statistics, New York:Wiley.

Shlosser, A. (1981), \On Estimation of the Size of the Dictionary of a Long Text onthe Basis of a Sample," Engineering Cybernetics, 19, 97{102.

28

Page 33: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Smith, E. P., and van Belle, G. (1984), \Nonparametric Estimation of Species Rich-ness," Biometrics, 40, 119{129.

Sudman, S. (1976), Applied Sampling , New York: Academic Press.

Vitter, J. S. (1985), \Random Sampling With a Reservoir," ACM Transactions on

Mathematical Software, 27, 703{718.

Whang, K., Vander-Zanden, B. T., and Taylor, H. M. (1990), \A Linear-Time Prob-abilistic Counting Algorithm for Database Applications," ACM Transactions on

Database Systems, 15, 208{229.

A. Estimators Based On Hypergeometric Probabilities

As in Section 3, denote by nj the number of elements in the sample that belong toclass j for 1 � j � D. Under the hypergeometric model (1), we have

P fnj = k g

=

�Nj

k

��N �Nj

n� k

� � �N

n

�=

�Nj

k

��n(n� 1) � � � (n� k + 1)

N(N � 1) � � � (N � k + 1)

��(N � n)(N � n� 1) � � � (N � n�Nj + k + 1)

(N � k)(N � k � 1) � � � (N �Nj + 1)

�=

�Nj

k

� q(q � 1

N ) � � � (q � k�1N )

1(1 � 1N ) � � � (1� k�1

N )

! (1� q)(1� q � 1

N ) � � � (1� q � Nj�k�1N )

(1� kN )(1 � k+1

N ) � � � (1� Nj�1N )

!for 1 � j � D and 0 � k � min(n;Nj), where q = n=N . When N is large relative to Nj , wehave the approximate equality given in (8). That is, P fnj = k g is approximately equal tothe probability that nj = k under the Bernoulli sampling model.

Estimators analogous to those in Section 3 can be derived using the exact hypergeometricprobabilities. The starting point in such a derivation is the pair of identities

P fnj = 0 g = hn(Nj)

and

P fnj = 1 g =�

nNj

N � n+ 1

�hn�1(Nj);

where

hn(x) =

(�(N�x+1)�(N�n+1)�(N�n�x+1) �(N+1) if x � N � n;

0 if x > N � n

for x � 0. (By an elementary property of the gamma function,

hn(Nj) =

�N �Nj

n

� � �N

n

�29

Page 34: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

for 1 � j � D.) It follows that the optimal value of the parameter K in (4) is given by

K =

PDj=1 hn(Nj)PD

j=1

�Nj

N�n+1

�hn�1(Nj)

:

First-order and second-order jackknife estimators can now be derived using arguments par-allel to those in Section 3.2. The second-order Taylor approximations use the identity

h0n(x) = �hn(x)gn(x)

for x > 0 and n � 1, where

gn(x) =

nXk=1

1

N � x� n+ k:

The estimators analogous to those in Section 3.2 are

bDuj1 =

�dn � f1

n

��1� (N � n+ 1)f1

nN

��1;

bDuj2 =

�1� (N � ~N � n+ 1)f1

nN

��1 dn +

(N � ~N � n+ 1)gn�1( ~N) 2( bDuj1)f1n

!;

and bDsj2 = (1� hn( ~N ))�1�dn +N 2( bDuj1)gn�1( ~N)hn( ~N)

�;

where ~N = N= bDuj1 and

2(D) = max�0;

(N � 1)D

Nn(n� 1)

nXi=1

i(i� 1)fi +D

N� 1�:

Moreover, the smoothed �rst-order jackknife estimator bDsj1 is de�ned as the value of bDthat solves the equation

bD�1� hn(N= bD)�= dn:

Finally, Horvitz-Thompson estimators can be derived in a manner similar to that in Sec-tion 3.2.3. For each j such that nj > 0, de�ne the method-of-moments estimator bNj of Nj

as the value of bN that solves the equation

nj =q bN

1� hn( bN):

30

Page 35: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Then de�ne the estimator of b�(g) of �(g) =PDj=1 g(Nj) as

b�(g) = Xfj:nj>0g

g( bNj)

1� hn( bNj):

We compared the estimators based on the Bernoulli approximation (8) against the es-timators based on the exact hypergeometric probabilities, using the populations describedin Section 6. The error induced by the approximation (8) turned out to be less than 1% inall cases.

The derivation of bDuj2 and bDsj2 using the exact hypergeometric probabilities assumesthat Nj � N � n for 1 � j � D. Without this assumption, Taylor approximations ofhn(Nj) fail because hn is not continuous, and the subsequent derivation for each estimator

is inappropriate. We conclude by providing a technique for modifying bDuj2 and bDsj2 to dealwith this problem. For concreteness, we focus on the unsmoothed second-order jackknifeestimator bDuj2. Denote by J the set of indices of the \big" classes: J = f j : Nj > N � n g.(Observe that if n < N=2, then J can contain at most one element.) If j 2 J , then withprobability 1 class j is represented in the sample. We can decompose D according to

D = jJ j+ j f 1; 2; : : : ;D g � J j: (34)

The �rst term on the right side of (34) is the number of big classes, and the second termrepresents the number of classes in the reduced population that is formed by removing thebig classes. We can estimate jJ j by the number of elements in the set bJ = f j : bNj > N�n g,where bNj is a method-of-moments estimator of Nj de�ned as the numerical solution of theequation E [nj j nj > 0] = nj (cf. Sec. 3.2.3). Since we assume a hypergeometric sampling

model, bNj is de�ned more precisely as the solution of the equationbNj(n=N)

1� hn( bNj)= nj :

To estimate the remaining term in (34), apply the unsmoothed second-order jackknifeestimator to the reduced population obtained by removing the classes in bJ . Set N� =N �P

j2 bJbNj, n

� = n�Pj2 bJ

nj, d�n = dn � j bJ j, f�1 = jf j : nj = 1 and j 62 bJ gj, and

bD�uj1 =

�d�n �

f�1n�

��1� (N� � n� + 1)f�1

n�N�

��1:

(Observe that bNj = 1, and hence j 62 bJ , whenever nj = 1, so that f�1 = f1.) The modi�ed

version of bDuj2 is then given by

bD�uj2 = j bJ j+�1� (N� � ~N� � n� + 1)f�1

n�N�

��1� d�n +

(N� � ~N� � n� + 1)gn��1( ~N�) 2( bD�

uj1)f�1

n�

!;

31

Page 36: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

where ~N� = N�= bD�uj1. If nj = n for some j, then bNj = N and bD�

uj2 = 1. If bNj > N� � n�

for some j 62 bJ , then the foregoing process can be repeated. Similar modi�cations can bemade to the estimators bDsj1 and bDsj2.

B. Derivation of bDuj1 and bDuj2 Based on Sample Coverage

For a �nite population of size N , the sample coverage is de�ned as

C =

DXj=1

(Nj=N)I[nj > 0]:

Using the approximation in (8), we have to �rst order

E [C] =DXj=1

Nj

NP fnj > 0 g �

DXj=1

Nj

N

�1� (1� q)Nj

� � 1� (1� q)N : (35)

Similarly, E [dn] � D�1� (1� q)N

�, so that D � E [dn] =E [C]. Observe that by (35) and

(13),

E [C] � 1� (1� q)E [f1]

n:

The foregoing relations suggest the method-of-moments estimator bD = dn= bC, wherebC = 1� (1� q)f1

n:

This estimator is identical to bDuj1.To derive a second-order estimator, use a Taylor approximation as in Section 3.2.2 to

obtain

E [C] �DXj=1

Nj

N

�1� (1� q)Nj

�� 1� 1

N

DXj=1

Nj

�(1� q)N + (1� q)N ln(1� q)(Nj �N)

�= 1� (1� q)N � (1� q)N ln(1� q)N 2;

where 2 is the squared coe�cient of variation of N1; N2; : : : ; ND. It follows that

E [dn]

E [C]�

D�1� (1� q)N

�1� (1� q)N � (1� q)N ln(1� q)N 2

� D

1 +

(1� q)N ln(1� q)N 2

1� (1� q)N

!;

32

Page 37: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

and hence

D � E [dn]

E [C]� (1� q)N ln(1� q)N 2

1� (1� q)N� 1

E [C]

�E [dn]� E [f1] (1� q) ln(1� q)N 2

n

�(36)

where we have used the relations (35) and (13). De�ne 2 as in (16). Estimating E [dn] bydn, E [f1] by f1,

2 by 2( bDuj1), and E [C] by bC in (36), we obtain the formula for bDuj2.

C. Asymptotic Variance

In this appendix we study the asymptotic variance of an estimator bD as D becomeslarge. Consider an in�nite sequence C1; C2; : : : of classes with corresponding class sizesN1; N2; : : : and construct a sequence of increasing populations in which the Dth populationcomprises classes C1; C2; : : : ; CD. As in (8), approximate the hypergeometric sample designby a Bernoulli sample design. Although the population size N depends on D, as does eachsample statistic fi, we suppress this dependence in our notation. Suppose that there existsa �nite, positive integer M and a positive real number � such that

Nj �M (37)

for j � 1 and

limD!1

pD

�N

D� �

�= lim

D!1

pD

1

D

DXi=1

(Nj � �)

!= 0: (38)

Also suppose that there exists a nonnegative vector � = (�1; �2; : : : ; �M ) 6= 0 and a non-negative symmetric matrix � = k�i;i0k 6= 0 such that

limD!1

pD

�E [fi]

D� �i

�= lim

D!1

pD

0@ 1

D

DXj=1

(�j;i � �i)

1A = 0; (39)

limD!1

Var [fi]

D= lim

D!1

1

D

DXj=1

�j;i(1� �j;i) = �i;i; (40)

and

limD!1

Cov [fi; fi0 ]

D= lim

D!1� 1

D

DXj=1

�j;i�j;i0 = �i;i0 (41)

for 1 � i; i0 �M , where

�j;i =

�Nj

i

�qi(1� q)Nj�i:

33

Page 38: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

The conditions in (37){(41) are satis�ed, for example, when the class-size sequenceN1; N2; : : :is of the form I1; I2; : : : ; Ir; I1; I2; : : : ; Ir; : : : , where I1; I2; : : : ; Ir are �xed nonnegative in-tegers; in e�ect, this sequence of populations is obtained from an initial population byuniformly scaling up the initial Fi's.

As in Section 4, suppose that the estimator bD is a function of the sample only throughf = (f1; f2; : : : ; fM ) and satis�es the condition in (31). Also suppose that the di�erentia-bility assumption in Section 4 holds, so that bD is continuously di�erentiable at the point(�; �).

Write fi =PD

j=1 I[nj = i] for 1 � i � M and observe that, under the foregoingassumptions, each fi is the sum of D independent (but not identically distributed) Bernoullirandom variables. An application of Theorem 5.1.2 in Chung (1974) followed by (39) showsthat

limD!1

(f ; N ) = (�; �) (42)

with probability 1, where f = f=D and N = N=D. Similarly, since (39){(41) hold byassumption, an application of Theorem B in Ser ing (1980, Sec. 1.9.2) and then Slutsky'sTheorem (see Ser ing 1980, Sec. 1.5.4) shows that

pD(f��)) N(0;�) as D !1, where

\)" denotes convergence in distribution and N denotes a multivariate normal randomvariable. It then follows from (38) and Theorem 4.4 in Billingsley (1986) that

pD�(f ; N )� (�; �)

�) �N(0;�); 0

�:

Since bD is assumed di�erentiable at (�; �), an application of the Delta Method (see Bishop,Fienberg, and Holland 1975, Sec. 14.6) shows that

pD� bD(f ; N)� bD(�; �)

�) N(0;Bt�B) (43)

as D ! 1, where B = r1bD(�; �), r1

bD denotes the gradient of bD(u; k) with respect tou, and N(0;Bt�B) is a univariate normal random variable. Using (31) we can rewrite theforegoing limit as

1pD

� bD(f ; N) � bD(D�;D�)�) N(0;Bt�B);

so that the asymptotic variance of bD(f ; N) is equal to (Bt�B)D.To approximate this asymptotic variance, set A = r1

bD(f ; N) and let C = C(f) bethe covariance matrix of the random vector f . It follows from (31) that r1

bD(cu; ck) =r1bD(u; k) for any c; k > 0 and nonnegative M -dimensional vector u. Thus,

limD!1

A = limD!1

r1bD(f ; N) = lim

D!1r1bD(f ; N ) = r1

bD(�; �) = B (44)

34

Page 39: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Name N D 2 Skew Name N D 2 SkewDB01 15469 15469 0.00 0.00 DB21 15469 131 3.76 3.79DB02 1288928 1288928 0.00 0.00 DB22 624473 168 3.90 3.06DB03 624473 624473 0.00 0.00 DB23 1547606 21 6.30 3.26DB04 597382 591564 0.01 17.61 DB24 1547606 49 6.55 2.70DB05 113600 110074 0.04 6.87 DB25 1463974 535328 7.60 639.24DB06 621498 591564 0.05 4.70 DB26 1547606 909 7.99 7.70DB07 1341544 1288927 0.05 9.95 DB27 1463974 10 8.12 2.66DB08 1547606 51168 0.23 0.24 DB28 931174 73 12.96 6.43DB09 1547606 3 0.38 -0.67 DB29 597382 17 14.27 3.73DB10 147811 110076 0.47 7.41 DB30 633756 221480 15.68 454.61DB11 113600 3 0.70 0.08 DB31 633756 213 16.16 7.36DB12 173805 109688 0.93 4.84 DB32 173805 72 16.98 7.14DB13 1463974 624472 0.94 4.77 DB33 931174 398 19.70 7.89DB14 1654700 624473 1.13 4.38 DB34 113600 6155 24.17 54.66DB15 633756 202462 1.19 3.53 DB35 1654700 235 30.85 10.35DB16 597382 437654 1.53 114.62 DB36 173805 61 31.71 7.04DB17 931174 110076 1.63 4.51 DB37 1341544 37 33.03 5.82DB18 931174 29 3.22 4.29 DB38 147811 62 34.68 7.22DB19 1547606 33 3.33 1.66 DB39 1463974 233 37.75 11.06DB20 1547606 194 3.35 2.97 DB40 624473 14047 81.63 69.00

Table 8: Characteristics of \database" populations

with probability 1, where the third equality follows by (42) and the assumed continuity ofr1bD. Using (40), (41), and (44), we �nd that

limD!1

AtCA

Bt(D�)B= 1

with probability 1, and the asymptotic variance of bD can be approximated by AtCA.

D. Detailed Experimental Results

This section contains further details about the experiments described in Section 6. Ta-ble 8 displays characteristics of the \database" populations used in the experiments. Theprintouts on the following pages contain simulation results for all of the estimators andfor each experimental population. In the printouts, \Psize" denotes the population size,\Nclass" denotes the number of classes in the population, and \gm2hat" denotes the esti-mator 2( bDuj1) used to estimate 2 in the second-order jackknife estimators.

35

Page 40: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rform

an

ce M

ea

sure

: Bia

s (%)

Sa

mp

le size

: 5.0

%

Na

me

Psize

Ncla

ss ga

mm

a2

skew

Du

j1 D

sj1 D

uj2

Du

j2a

Dsj2

DS

h D

Sh

2 D

Sh

3 D

HT

j D

HT

sj Dh

j Hyb

rid g

m2

ha

tD

B0

1 1

54

69

15

46

9 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 -0

.00

-0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

2 1

28

89

28

12

88

92

8 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 -0

.00

0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

3 6

24

47

3 6

24

47

3 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

-0.0

0 -0

.00

0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0E

Q1

00

15

00

0 1

50

0.0

0 0

.00

0.1

0 0

.10

0.1

3 0

.13

0.1

2 1

2.0

0 0

.10

2.4

3 1

9.4

5

62

.52

-0.3

4 0

.13

0.7

5E

Q1

0 1

50

00

15

00

0.0

0 0

.00

0.8

3 0

.82

1.9

7 1

.97

1.9

6 4

27

.53

0.9

3 3

40

.21

48

7.5

4 5

42

.17

-26

.66

1.9

7 1

.41

DB

04

59

73

82

59

15

64

0.0

1 1

7.6

1 -1

.27

-1.3

0 -1

.17

-1.1

7 -1

.18

0.8

1 0

.81

0.7

6 0

.85

0

.87

-13

.51

-1.1

7 -9

2.7

3G

OO

D 1

00

00

95

95

0.0

4 5

.64

-4.3

1 -4

.39

-4.0

6 -4

.06

-4.0

6 3

.50

3.5

0 3

.28

3.6

3

3.7

3 -2

5.0

0 -4

.06

-93

.62

DB

05

11

36

00

11

00

74

0.0

4 6

.87

-3.3

5 -3

.41

-3.1

6 -3

.16

-3.1

6 2

.69

2.6

9 2

.54

2.7

9

2.8

5 -2

6.8

3 -3

.16

-94

.40

DB

06

62

14

98

59

15

64

0.0

5 4

.70

-4.3

2 -4

.40

-4.1

0 -4

.10

-4.1

0 4

.31

4.3

1 4

.08

4.4

5

4.5

5 -3

2.8

9 -4

.10

-94

.96

DB

07

13

41

54

4 1

28

89

27

0.0

5 9

.95

-4.2

3 -4

.32

-3.9

3 -3

.93

-3.9

3 3

.42

3.4

2 3

.23

3.5

5

3.6

3 -3

1.2

6 -3

.93

-93

.14

NG

B/4

82

13

5 8

74

0.1

8 0

.50

-1.5

7 -2

.59

-0.2

0 -0

.20

-1.9

7 3

3.1

4 -1

.53

5.9

6 7

7.7

6 1

63

.84

-2.9

5 -0

.20

-10

.81

DB

08

15

47

60

6 5

11

68

0.2

3 0

.24

-10

.89

-12

.77

-7.4

4 -7

.44

-8.8

9 2

52

.82

-11

.02

11

7.9

1 3

86

.55

54

2.2

1 -1

8.7

6 -7

.44

-57

.99

FR

AM

E2

33

75

0 1

90

00

0.3

1 1

.18

-22

.87

-23

.23

-21

.74

-21

.74

-21

.76

61

.36

-16

.55

56

.81

64

.26

6

6.3

8 -5

9.3

7 -2

1.7

4 -9

4.9

7N

GB

/2 4

11

97

90

6 0

.37

0.8

1 -1

0.5

4 -1

3.5

6 -4

.77

-4.7

7 -8

.79

15

0.3

7 -1

0.5

7 4

9.4

0 2

89

.71

43

6.0

0 -1

5.8

1 -4

.77

-39

.92

DB

09

15

47

60

6 3

0.3

8 -0

.67

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0

0.0

0 0

.00

0.0

0 -0

.03

DB

10

14

78

11

11

00

76

0.4

7 7

.41

-28

.55

-29

.46

-25

.48

-25

.48

-25

.53

25

.88

-32

.32

23

.45

27

.48

2

8.5

3 -6

2.4

2 -2

5.4

8 -9

0.3

7F

RA

ME

3 1

11

50

0 3

60

00

0.5

2 1

.92

-30

.75

-31

.75

-27

.83

-27

.83

-27

.95

14

0.5

8 -2

9.4

8 1

23

.23

15

2.2

6 1

60

.71

-58

.45

-27

.83

-90

.82

DB

11

11

36

00

3 0

.70

0.0

8 -3

0.3

3 -3

0.3

3 -2

8.2

7 2

6.6

7 -3

0.3

3 2

6.6

7 -2

7.6

2 9

58

.74

26

.67

2

6.6

7 -3

0.3

3 -2

8.2

7 -7

3.5

1N

GB

/1 2

02

13

93

0 0

.75

1.2

5 -2

8.5

4 -3

2.1

0 -2

0.0

2 -2

0.0

2 -2

3.2

6 2

22

.19

-28

.79

10

1.7

3 3

58

.24

46

6.7

0 -3

6.6

7 -2

0.0

2 -6

6.9

2D

B1

2 1

73

80

5 1

09

68

8 0

.93

4.8

4 -4

3.7

7 -4

5.1

0 -3

9.5

1 -3

9.5

1 -3

9.6

2 3

8.6

7 -4

6.5

9 3

3.2

1 4

2.4

0

44

.77

-69

.15

-39

.51

-91

.20

DB

13

14

63

97

4 6

24

47

2 0

.94

4.7

7 -4

2.4

6 -4

4.3

4 -3

6.3

4 -3

6.3

4 -3

6.5

6 8

7.2

7 -4

1.7

9 7

5.0

6 9

5.9

1 1

01

.35

-65

.94

-36

.34

-87

.53

DB

14

16

54

70

0 6

24

47

3 1

.13

4.3

8 -4

6.2

4 -4

8.3

6 -3

9.3

9 -3

9.3

9 -3

9.7

0 9

8.9

6 -4

6.9

1 8

2.5

9 1

10

.89

11

8.2

5 -6

6.7

0 -3

9.3

9 -8

7.2

3D

B1

5 6

33

75

6 2

02

46

2 1

.19

3.5

3 -4

7.7

9 -4

9.8

5 -4

1.3

6 -4

1.3

6 -4

1.7

3 1

17

.75

-47

.98

95

.45

13

4.2

6 1

44

.71

-66

.14

-41

.36

-87

.95

DB

16

59

73

82

43

76

54

1.5

3 1

14

.62

-35

.18

-39

.18

3.4

8 3

.48

3.1

1 2

5.8

1 -3

2.3

3 2

2.8

3 2

8.0

9

29

.21

-64

.76

3.4

8 -5

8.8

5D

B1

7 9

31

17

4 1

10

07

6 1

.63

4.5

1 -4

3.8

4 -4

8.8

6 -2

4.7

2 -2

4.7

2 -2

7.8

2 2

18

.01

-44

.26

14

0.0

8 2

95

.35

33

5.8

9 -5

5.9

2 -2

4.7

2 -7

0.6

6S

UD

M 3

30

00

0 1

00

00

0 1

.87

2.7

1 -5

9.6

1 -6

1.3

8 -5

4.5

0 -5

4.5

0 -5

4.9

3 9

6.7

5 -5

9.3

2 6

8.9

3 1

18

.32

13

2.1

5 -7

1.8

1 -5

4.5

0 -9

1.7

0D

B1

8 9

31

17

4 2

9 3

.22

4.2

9 -4

.86

-4.8

6 0

.47

27

.93

-4.8

6 2

6.0

2 -3

.39

66

3.2

6 2

8.3

7

28

.61

-4.8

6 2

7.9

3 -6

.19

DB

19

15

47

60

6 3

3 3

.33

1.6

6 -9

.00

-9.0

0 2

.45

-6.2

6 -9

.00

-1.7

9 -8

.66

-5.3

7 5

4.9

9

66

.41

-9.0

0 -6

.26

-11

.74

DB

20

15

47

60

6 1

94

3.3

5 2

.97

-4.8

1 -4

.82

8.9

2 -0

.64

-4.8

2 5

.30

-4.3

4 3

.96

54

.42

8

1.2

2 -4

.82

-0.6

4 -6

.23

DB

21

15

46

9 1

31

3.7

6 3

.79

-19

.87

-22

.41

35

.93

21

.21

-21

.50

74

.24

-17

.84

15

.94

24

7.7

0 3

60

.32

-22

.42

21

.21

-26

.47

DB

22

62

44

73

16

8 3

.90

3.0

6 -1

2.1

3 -1

2.1

5 8

.17

-4.9

6 -1

2.1

5 3

.52

-11

.41

0.0

4 7

2.1

6 1

08

.39

-12

.15

-4.9

6 -1

5.2

1D

B2

3 1

54

76

06

21

6.3

0 3

.26

-8.9

0 -8

.90

14

.88

-4.1

5 -8

.90

0.9

9 -8

.43

-1.6

4 6

1.7

6

71

.26

-8.9

0 -4

.15

-10

.33

DB

24

15

47

60

6 4

9 6

.55

2.7

0 -2

3.5

2 -2

3.5

3 3

5.9

5 -7

.84

-23

.53

28

.21

-21

.07

19

.34

16

0.9

2 2

17

.77

-23

.53

-7.8

4 -2

7.1

3D

B2

5 1

46

39

74

53

53

28

7.6

0 6

39

.24

-36

.72

-38

.84

17

2.7

5 -3

2.0

1 1

72

.67

11

5.1

9 -3

3.0

4 1

00

.92

12

5.1

3 1

32

.05

-62

.27

-32

.01

-41

.33

DB

26

15

47

60

6 9

09

7.9

9 7

.70

-15

.75

-15

.84

43

.50

-6.8

7 -1

5.8

4 6

.10

-14

.80

-4.2

0 1

18

.14

16

1.1

6 -1

5.8

4 -6

.87

-17

.76

DB

27

14

63

97

4 1

0 8

.12

2.6

6 -3

7.3

0 -3

7.3

0 1

8.9

3 -1

6.7

5 -3

7.3

0 1

3.1

1 -3

4.9

0 3

3.2

0 1

57

.30

16

8.5

0 -3

7.3

0 -1

6.7

5 -4

1.9

0D

B2

8 9

31

17

4 7

3 1

2.9

6 6

.43

-24

.68

-24

.70

10

8.9

3 5

.57

-24

.70

41

.47

-21

.55

11

5.7

0 2

00

.83

24

8.4

3 -2

4.7

0 5

.57

-26

.64

DB

29

59

73

82

17

14

.27

3.7

3 -5

1.1

1 -5

1.1

2 3

1.8

4 -2

5.4

9 -5

1.1

2 8

.30

-48

.29

19

.54

15

3.4

9 1

74

.30

-51

.12

-25

.49

-54

.70

DB

30

63

37

56

22

14

80

15

.68

45

4.6

1 -2

7.5

7 -3

0.8

9 1

86

.15

-22

.81

18

6.1

5 1

31

.99

-28

.02

11

9.3

5 1

40

.80

14

7.3

2 -5

7.8

5 -2

2.8

1 -2

8.8

1D

B3

1 6

33

75

6 2

13

16

.16

7.3

6 -5

3.1

6 -5

3.2

0 2

8.4

6 -2

5.6

8 -5

3.2

0 -1

.47

-50

.73

18

.52

14

1.5

6 1

72

.69

-53

.20

-25

.68

-56

.60

DB

32

17

38

05

72

16

.98

7.1

4 -4

4.3

8 -4

4.4

2 3

4.3

2 -2

9.4

1 -4

4.4

2 -1

4.8

3 -4

3.0

1 -1

6.3

0 9

8.2

3 1

24

.28

-44

.42

-29

.41

-47

.18

DB

33

93

11

74

39

8 1

9.7

0 7

.89

-54

.02

-54

.05

14

.34

-40

.71

-54

.05

-26

.37

-52

.73

-26

.08

78

.73

10

2.3

7 -5

4.0

5 -4

0.7

1 -5

6.7

5D

B3

4 1

13

60

0 6

15

5 2

4.1

7 5

4.6

6 -6

1.0

3 -6

5.5

9 1

15

.69

-32

.23

16

.66

71

.92

-60

.40

2.5

0 2

06

.19

25

4.0

4 -6

6.0

0 -3

2.2

3 -6

3.2

3D

B3

5 1

65

47

00

23

5 3

0.8

5 1

0.3

5 -4

9.1

4 -4

9.1

6 1

32

.99

-23

.09

-49

.16

3.2

1 -4

6.6

7 1

8.1

0 1

46

.53

18

3.9

6 -4

9.1

6 -2

3.0

9 -5

0.6

5D

B3

6 1

73

80

5 6

1 3

1.7

1 7

.04

-68

.09

-68

.11

39

.91

-29

.89

-68

.11

3.2

5 -6

4.7

2 7

4.3

0 1

21

.53

14

7.0

1 -6

8.1

1 -2

9.8

9 -7

0.3

1D

B3

7 1

34

15

44

37

33

.03

5.8

2 -7

0.3

2 -7

0.3

2 3

9.7

4 -2

7.2

2 -7

0.3

2 1

0.2

7 -6

6.4

9 6

5.8

2 1

33

.29

15

1.6

2 -7

0.3

2 -2

7.2

2 -7

2.4

5D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

-66

.47

-66

.50

64

.00

-33

.42

-66

.50

-1.6

1 -6

3.4

1 2

8.4

3 1

31

.95

15

8.3

9 -6

6.5

0 -3

3.4

2 -6

8.4

5D

B3

9 1

46

39

74

23

3 3

7.7

5 1

1.0

6 -4

9.2

4 -4

9.2

6 1

72

.90

-23

.60

-49

.26

2.2

6 -4

6.8

1 1

8.1

8 1

45

.56

18

2.4

1 -4

9.2

6 -2

3.6

0 -5

0.5

5D

B4

0 6

24

47

3 1

40

47

81

.63

69

.00

-62

.37

-64

.81

42

6.3

7 -1

7.8

4 -4

8.6

6 3

1.3

3 -6

0.2

7 -1

8.4

0 1

83

.86

23

1.5

0 -6

4.8

2 -1

8.1

9 -6

3.0

3Z

20

A 5

00

00

24

7 1

14

.38

14

.60

-75

.27

-75

.54

24

2.2

2 -3

3.5

2 -7

5.5

4 3

.48

-71

.78

-0.0

5 1

16

.23

14

5.2

2 -7

5.5

4 -2

6.9

9 -7

5.9

7Z

15

50

00

0 7

72

16

6.1

8 2

3.4

4 -7

3.6

8 -7

4.8

0 5

56

.68

-5.1

7 -7

4.7

4 4

4.6

5 -6

9.1

1 3

3.4

4 1

69

.56

20

0.3

6 -7

4.8

0 3

3.4

4 -7

4.2

9Z

20

B 5

00

00

10

38

4 2

34

.81

73

.54

-85

.09

-88

.49

30

6.2

3 -2

1.7

5 1

10

.28

12

.22

-83

.71

-2.3

1 5

5.3

6

63

.57

-88

.60

-3.9

4 -8

5.5

4

(0.0

<=

ga

mm

a2

< 1

.0) A

vera

ge

: 3.2

0 -1

2.7

1 -1

3.4

3 -1

0.7

6 -8

.14

-11

.38

71

.11

-10

.98

90

.57

97

.31

12

6.5

5 -2

7.4

4 -1

0.7

6 -5

5.7

5

Ma

ximu

m: 1

7.6

1 -4

3.7

7 -4

5.1

0 -3

9.5

1 -3

9.5

1 -3

9.6

2 4

27

.53

-46

.59

95

8.7

4 4

87

.54

54

2.2

1 -6

9.1

5 -3

9.5

1

-94

.97

(1.0

<=

ga

mm

a2

< 5

0) A

vera

ge

: 51

.27

-37

.95

-38

.99

42

.77

-18

.33

-16

.83

39

.13

-36

.35

61

.98

12

8.3

9 1

55

.66

-43

.74

-18

.33

-46

.19

M

axim

um

: 63

9.2

4 -7

0.3

2 -7

0.3

2 1

86

.15

-54

.50

18

6.1

5 2

18

.01

-66

.49

66

3.2

6 2

95

.35

36

0.3

2 -7

1.8

1 -5

4.5

0

-91

.70

(50

<=

ga

mm

a2

< in

f) Ave

rag

e: 4

5.1

4 -7

4.1

0 -7

5.9

1 3

82

.88

-19

.57

-22

.16

22

.92

-71

.22

3.1

7 1

31

.25

16

0.1

6 -7

5.9

4 -3

.92

-74

.71

M

axim

um

: 73

.54

-85

.09

-88

.49

55

6.6

8 -3

3.5

2 1

10

.28

44

.65

-83

.71

33

.44

18

3.8

6 2

31

.50

-88

.60

33

.44

-8

5.5

4

Ave

rag

e: 3

1.3

9 -3

0.5

4 -3

1.5

1 4

7.3

1 -1

4.3

1 -1

5.0

4 5

0.8

0 -2

8.7

9 6

9.0

0 1

16

.06

14

4.2

5 -3

9.6

4 -1

4.1

6

-52

.24

M

axim

um

: 63

9.2

4 -8

5.0

9 -8

8.4

9 5

56

.68

-54

.50

18

6.1

5 4

27

.53

-83

.71

95

8.7

4 4

87

.54

54

2.2

1 -8

8.6

0 -5

4.5

0

-94

.97

36

Page 41: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rfo

rma

nce

Me

asu

re: R

MS

Err

or

(%)

Sa

mp

le s

ize

: 5

.0%

Na

me

P

size

N

cla

ss g

am

ma

2 s

kew

D

uj1

D

sj1

D

uj2

D

uj2

a D

sj2

D

Sh

D

Sh

2 D

Sh

3 D

HT

j

DH

Tsj

D

hj

Hyb

rid

g

m2

ha

tD

B0

1 1

54

69

1

54

69

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

02

1

28

89

28

1

28

89

28

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

03

6

24

47

3 6

24

47

3 0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

EQ

10

0 1

50

00

1

50

0

.00

0

.00

0

.65

0

.63

0

.67

0

.67

0

.64

1

2.9

9 0

.65

2

.76

2

3.1

0

66

.77

0

.69

0

.67

1

.37

EQ

10

1

50

00

1

50

0 0

.00

0

.00

7

.02

6

.84

7

.97

7

.97

7

.86

4

28

.25

7

.35

3

41

.21

4

88

.14

5

42

.60

2

6.9

4 7

.97

2

.65

DB

04

5

97

38

2 5

91

56

4 0

.01

1

7.6

1 1

.40

1

.43

1

.31

1

.31

1

.31

0

.82

0

.82

0

.77

0

.85

0.8

7 1

3.8

4 1

.31

9

3.0

3G

OO

D 1

00

00

9

59

5 0

.04

5

.64

8

.21

8

.30

8

.02

8

.02

8

.02

3

.55

3

.55

3

.38

3

.67

3.7

5 3

2.2

9 8

.02

9

6.7

2D

B0

5 1

13

60

0 1

10

07

4 0

.04

6

.87

3

.96

4

.02

3

.79

3

.79

3

.79

2

.69

2

.69

2

.55

2

.79

2.8

6 2

7.6

5 3

.79

9

4.6

0D

B0

6 6

21

49

8 5

91

56

4 0

.05

4

.70

4

.46

4

.53

4

.23

4

.23

4

.24

4

.31

4

.31

4

.09

4

.45

4.5

5 3

2.9

8 4

.23

9

4.9

8D

B0

7 1

34

15

44

1

28

89

27

0

.05

9

.95

4

.27

4

.36

3

.97

3

.97

3

.97

3

.42

3

.42

3

.23

3

.55

3.6

3 3

1.2

9 3

.97

9

3.2

0N

GB

/4 8

21

35

8

74

0

.18

0

.50

1

.72

2

.67

0

.88

0

.88

2

.09

3

3.4

4 1

.69

6

.13

7

9.0

9 1

64

.87

3

.02

0

.88

1

4.4

5D

B0

8 1

54

76

06

5

11

68

0

.23

0

.24

1

0.8

9 1

2.7

7 7

.45

7

.45

8

.90

2

52

.83

1

1.0

2 1

17

.92

3

86

.56

5

42

.22

1

8.7

6 7

.45

5

8.0

2F

RA

ME

2 3

37

50

1

90

00

0

.31

1

.18

2

3.7

6 2

4.1

2 2

2.6

7 2

2.6

7 2

2.6

9 6

1.4

1 1

8.9

6 5

6.8

8 6

4.2

9

66

.40

5

9.4

5 2

2.6

7 9

5.1

2N

GB

/2 4

11

97

9

06

0

.37

0

.81

1

0.6

4 1

3.6

3 5

.24

5

.24

8

.96

1

50

.80

1

0.6

7 4

9.7

5 2

90

.68

4

36

.68

1

5.8

5 5

.24

4

0.5

8D

B0

9 1

54

76

06

3

0

.38

-0

.67

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.51

DB

10

1

47

81

1 1

10

07

6 0

.47

7

.41

2

8.6

6 2

9.5

7 2

5.6

6 2

5.6

6 2

5.7

0 2

5.8

9 3

2.3

3 2

3.4

7 2

7.4

8

28

.53

6

2.4

3 2

5.6

6 9

0.4

7F

RA

ME

3 1

11

50

0 3

60

00

0

.52

1

.92

3

0.8

4 3

1.8

3 2

7.9

4 2

7.9

4 2

8.0

6 1

40

.61

3

0.0

7 1

23

.27

1

52

.27

1

60

.72

5

8.4

6 2

7.9

4 9

0.8

6D

B1

1 1

13

60

0 3

0

.70

0

.08

3

1.8

0 3

1.8

0 3

2.5

3 1

92

.64

3

1.8

0 1

92

.64

3

3.0

6 3

29

9.1

0 1

92

.64

1

92

.64

3

1.8

0 3

2.5

3 7

7.1

5N

GB

/1 2

02

13

9

30

0

.75

1

.25

2

8.6

3 3

2.1

6 2

0.3

2 2

0.3

2 2

3.4

6 2

22

.91

2

8.8

8 1

02

.49

3

59

.19

4

67

.39

3

6.7

1 2

0.3

2 6

7.1

8D

B1

2 1

73

80

5 1

09

68

8 0

.93

4

.84

4

3.8

1 4

5.1

4 3

9.5

6 3

9.5

6 3

9.6

7 3

8.6

8 4

6.5

9 3

3.2

3 4

2.4

1

44

.78

6

9.1

6 3

9.5

6 9

1.2

3D

B1

3 1

46

39

74

6

24

47

2 0

.94

4

.77

4

2.4

6 4

4.3

4 3

6.3

5 3

6.3

5 3

6.5

7 8

7.2

7 4

1.7

9 7

5.0

7 9

5.9

1 1

01

.35

6

5.9

4 3

6.3

5 8

7.5

3D

B1

4 1

65

47

00

6

24

47

3 1

.13

4

.38

4

6.2

4 4

8.3

6 3

9.4

0 3

9.4

0 3

9.7

1 9

8.9

6 4

6.9

1 8

2.5

9 1

10

.89

1

18

.25

6

6.7

0 3

9.4

0 8

7.2

4D

B1

5 6

33

75

6 2

02

46

2 1

.19

3

.53

4

7.7

9 4

9.8

6 4

1.3

7 4

1.3

7 4

1.7

4 1

17

.76

4

7.9

8 9

5.4

6 1

34

.27

1

44

.72

6

6.1

4 4

1.3

7 8

7.9

5D

B1

6 5

97

38

2 4

37

65

4 1

.53

1

14

.62

3

5.2

0 3

9.1

9 6

.86

6

.86

6

.64

2

5.8

1 3

2.3

3 2

2.8

4 2

8.0

9

29

.21

6

4.7

7 6

.86

5

9.2

0D

B1

7 9

31

17

4 1

10

07

6 1

.63

4

.51

4

3.8

4 4

8.8

6 2

4.7

4 2

4.7

4 2

7.8

4 2

18

.02

4

4.2

6 1

40

.10

2

95

.36

3

35

.89

5

5.9

2 2

4.7

4 7

0.6

7S

UD

M 3

30

00

0 1

00

00

0 1

.87

2

.71

5

9.6

1 6

1.3

8 5

4.5

1 5

4.5

1 5

4.9

4 9

6.7

7 5

9.3

2 6

8.9

6 1

18

.33

1

32

.16

7

1.8

1 5

4.5

1 9

1.7

0D

B1

8 9

31

17

4 2

9 3

.22

4

.29

5

.31

5

.31

8

.19

4

9.2

3 5

.31

4

6.2

4 5

.04

1

04

2.7

3 4

9.4

5

49

.67

5

.31

4

9.2

3 6

.83

DB

19

1

54

76

06

3

3 3

.33

1

.66

9

.38

9

.38

1

0.2

1 7

.75

9

.38

7

.98

9

.11

7

.73

7

3.9

3

84

.79

9

.38

7

.75

1

2.2

5D

B2

0 1

54

76

06

1

94

3

.35

2

.97

4

.95

4

.96

1

0.1

4 2

.41

4

.96

6

.69

4

.52

5

.68

5

9.4

4

85

.03

4

.96

2

.41

6

.43

DB

21

1

54

69

1

31

3

.76

3

.79

2

0.0

3 2

2.5

3 3

8.4

2 2

3.8

7 2

1.6

5 7

6.7

5 1

8.0

5 1

8.8

3 2

55

.23

3

65

.49

2

2.5

4 2

3.8

7 2

7.1

4D

B2

2 6

24

47

3 1

68

3

.90

3

.06

1

2.2

1 1

2.2

4 1

0.3

7 5

.98

1

2.2

4 6

.65

1

1.5

1 5

.27

7

8.1

5 1

12

.72

1

2.2

4 5

.98

1

5.3

5D

B2

3 1

54

76

06

2

1 6

.30

3

.26

9

.63

9

.63

2

9.2

4 9

.14

9

.63

1

3.2

9 9

.35

1

1.8

8 9

3.9

5 1

03

.58

9

.63

9

.14

1

1.2

0D

B2

4 1

54

76

06

4

9 6

.55

2

.70

2

3.8

7 2

3.8

7 4

3.5

6 1

2.9

9 2

3.8

7 3

5.9

0 2

1.5

9 2

8.8

5 1

78

.57

2

31

.67

2

3.8

7 1

2.9

9 2

7.5

2D

B2

5 1

46

39

74

5

35

32

8 7

.60

6

39

.24

3

6.7

2 3

8.8

5 1

72

.77

3

2.0

1 1

72

.70

1

15

.19

3

3.0

4 1

00

.92

1

25

.13

1

32

.05

6

2.2

7 3

2.0

1 4

1.9

8D

B2

6 1

54

76

06

9

09

7

.99

7

.70

1

5.7

7 1

5.8

6 4

4.0

0 7

.09

1

5.8

6 6

.81

1

4.8

2 4

.73

1

19

.48

1

62

.06

1

5.8

6 7

.09

1

7.7

9D

B2

7 1

46

39

74

1

0 8

.12

2

.66

3

8.3

3 3

8.3

3 6

8.9

9 3

6.5

4 3

8.3

3 6

5.0

6 3

6.6

6 1

14

.18

2

40

.99

2

50

.47

3

8.3

3 3

6.5

4 4

3.0

5D

B2

8 9

31

17

4 7

3 1

2.9

6 6

.43

2

4.9

8 2

5.0

0 1

15

.76

1

4.8

6 2

5.0

0 4

7.6

4 2

2.0

4 1

29

.57

2

11

.36

2

56

.76

2

5.0

0 1

4.8

6 2

6.9

6D

B2

9 5

97

38

2 1

7 1

4.2

7 3

.73

5

1.7

5 5

1.7

5 8

0.4

5 4

0.6

1 5

1.7

5 5

7.1

3 4

9.3

6 7

8.8

9 2

18

.24

2

34

.39

5

1.7

5 4

0.6

1 5

5.3

8D

B3

0 6

33

75

6 2

21

48

0 1

5.6

8 4

54

.61

2

7.5

9 3

0.9

0 1

86

.15

2

2.8

5 1

86

.15

1

31

.99

2

8.0

2 1

19

.36

1

40

.80

1

47

.32

5

7.8

5 2

2.8

5 3

0.1

9D

B3

1 6

33

75

6 2

13

1

6.1

6 7

.36

5

3.2

1 5

3.2

4 3

5.6

1 2

7.6

1 5

3.2

4 1

5.5

6 5

0.8

1 3

1.7

8 1

48

.75

1

78

.34

5

3.2

4 2

7.6

1 5

6.6

5D

B3

2 1

73

80

5 7

2 1

6.9

8 7

.14

4

4.4

9 4

4.5

2 4

8.9

8 3

1.4

9 4

4.5

2 2

2.4

4 4

3.1

6 2

6.5

2 1

15

.88

1

38

.43

4

4.5

2 3

1.4

9 4

7.3

1D

B3

3 9

31

17

4 3

98

1

9.7

0 7

.89

5

4.0

3 5

4.0

7 1

9.7

7 4

0.9

3 5

4.0

7 2

7.2

3 5

2.7

5 2

7.3

5 8

2.9

4 1

05

.50

5

4.0

7 4

0.9

3 5

6.7

7D

B3

4 1

13

60

0 6

15

5 2

4.1

7 5

4.6

6 6

1.0

3 6

5.6

0 1

16

.75

3

2.3

1 1

8.4

2 7

2.1

1 6

0.4

1 4

.29

2

06

.36

2

54

.18

6

6.0

1 3

2.3

1 6

3.3

0D

B3

5 1

65

47

00

2

35

3

0.8

5 1

0.3

5 4

9.1

9 4

9.2

0 1

38

.32

2

4.5

3 4

9.2

0 1

3.8

3 4

6.7

3 2

7.6

3 1

51

.93

1

88

.20

4

9.2

0 2

4.5

3 5

0.7

0D

B3

6 1

73

80

5 6

1 3

1.7

1 7

.04

6

8.2

1 6

8.2

3 7

1.6

5 4

1.8

1 6

8.2

3 3

9.1

2 6

4.9

5 1

17

.57

1

47

.06

1

67

.64

6

8.2

3 4

1.8

1 7

0.4

3D

B3

7 1

34

15

44

3

7 3

3.0

3 5

.82

7

0.4

7 7

0.4

8 7

8.7

7 4

4.9

6 7

0.4

8 5

2.6

1 6

6.8

2 1

24

.91

1

66

.86

1

80

.87

7

0.4

8 4

4.9

6 7

2.6

1D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

6

6.5

9 6

6.6

2 8

9.3

5 4

0.4

9 6

6.6

2 3

3.6

9 6

3.6

3 6

7.5

8 1

53

.62

1

76

.40

6

6.6

2 4

0.4

9 6

8.5

7D

B3

9 1

46

39

74

2

33

3

7.7

5 1

1.0

6 4

9.2

9 4

9.3

1 1

79

.98

2

5.3

0 4

9.3

1 1

4.7

7 4

6.8

8 2

9.6

4 1

52

.03

1

87

.37

4

9.3

1 2

5.3

0 5

0.6

0D

B4

0 6

24

47

3 1

40

47

8

1.6

3 6

9.0

0 6

2.3

7 6

4.8

1 4

26

.73

1

7.9

2 4

8.6

7 3

1.4

3 6

0.2

7 1

8.4

7 1

83

.95

2

31

.56

6

4.8

2 1

8.2

6 6

3.0

4Z

20

A 5

00

00

2

47

1

14

.38

1

4.6

0 7

5.3

0 7

5.5

7 2

55

.60

3

6.6

0 7

5.5

7 2

0.5

3 7

1.8

3 2

3.6

0 1

23

.22

1

50

.60

7

5.5

7 3

9.2

0 7

5.9

9Z

15

5

00

00

7

72

1

66

.18

2

3.4

4 7

3.6

9 7

4.8

1 5

64

.57

1

4.6

6 7

4.7

5 4

7.6

3 6

9.1

3 3

8.5

8 1

71

.82

2

02

.10

7

4.8

1 3

8.5

8 7

4.3

1Z

20

B 5

00

00

1

03

84

2

34

.81

7

3.5

4 8

5.0

9 8

8.4

9 3

08

.17

2

2.8

0 1

12

.13

1

2.9

2 8

3.7

1 5

.15

5

5.5

4

63

.72

8

8.6

0 8

.65

8

5.5

5

(0.0

<=

ga

mm

a2

< 1

.0)

A

vera

ge

: 3

.20

1

3.4

8 1

4.2

0 1

1.8

4 1

9.4

6 1

2.2

7 7

9.1

7 1

3.2

3 2

02

.16

1

05

.57

1

34

.79

2

7.9

6 1

1.8

4 5

6.6

5

Ma

xim

um

: 1

7.6

1 4

3.8

1 4

5.1

4 3

9.5

6 1

92

.64

3

9.6

7 4

28

.25

4

6.5

9 3

29

9.1

0 4

88

.14

5

42

.60

6

9.1

6 3

9.5

6

9

6.7

2

(1.0

<=

ga

mm

a2

< 5

0)

A

vera

ge

: 5

1.2

7 3

8.1

4 3

9.1

7 6

5.3

4 2

7.4

7 4

5.2

5 5

4.3

0 3

6.6

7 9

3.9

2 1

42

.86

1

68

.64

4

3.9

3 2

7.4

7 4

6.5

1

Ma

xim

um

: 6

39

.24

7

0.4

7 7

0.4

8 1

86

.15

5

4.5

1 1

86

.15

2

18

.02

6

6.8

2 1

04

2.7

3 2

95

.36

3

65

.49

7

1.8

1 5

4.5

1

9

1.7

0

(50

<

= g

am

ma

2 <

inf)

A

vera

ge

: 4

5.1

4 7

4.1

1 7

5.9

2 3

88

.77

2

3.0

0 7

7.7

8 2

8.1

3 7

1.2

3 2

1.4

5 1

33

.63

1

62

.00

7

5.9

5 2

6.1

7 7

4.7

2

Ma

xim

um

: 7

3.5

4 8

5.0

9 8

8.4

9 5

64

.57

3

6.6

0 1

12

.13

4

7.6

3 8

3.7

1 3

8.5

8 1

83

.95

2

31

.56

8

8.6

0 3

9.2

0

8

5.5

5

Ave

rag

e: 3

1.3

9 3

0.9

5 3

1.9

1 6

8.6

1 2

3.8

9 3

4.4

4 6

2.3

3 2

9.8

6 1

32

.06

1

27

.09

1

54

.46

3

9.9

4 2

1.0

6

5

2.7

8

Ma

xim

um

: 6

39

.24

8

5.0

9 8

8.4

9 5

64

.57

1

92

.64

1

86

.15

4

28

.25

8

3.7

1 3

29

9.1

0 4

88

.14

5

42

.60

8

8.6

0 5

4.5

1

9

6.7

2

37

Page 42: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rform

an

ce M

ea

sure

: Bia

s (%)

Sa

mp

le size

: 10

.0%

Na

me

Psize

Ncla

ss ga

mm

a2

skew

Du

j1 D

sj1 D

uj2

Du

j2a

Dsj2

DS

h D

Sh

2 D

Sh

3 D

HT

j D

HT

sj Dh

j Hyb

rid g

m2

ha

tD

B0

1 1

54

69

15

46

9 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 -0

.00

0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

2 1

28

89

28

12

88

92

8 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

-0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

3 6

24

47

3 6

24

47

3 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

-0.0

0 0

.00

-0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0E

Q1

00

15

00

0 1

50

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.02

0.0

0 0

.00

0.1

0

0.3

2 0

.00

0.0

0 0

.28

EQ

10

15

00

0 1

50

0 0

.00

0.0

0 -0

.03

0.0

4 0

.48

0.4

8 0

.50

20

0.4

9 -0

.11

13

0.8

0 2

53

.36

31

7.7

7 -1

4.4

6 0

.48

0.8

8D

B0

4 5

97

38

2 5

91

56

4 0

.01

17

.61

-1.1

6 -1

.22

-0.9

7 -0

.97

-0.9

8 0

.67

0.6

7 0

.59

0.7

3

0.7

7 -7

.44

-0.9

7 -8

6.1

1G

OO

D 1

00

00

95

95

0.0

4 5

.64

-3.8

7 -4

.01

-3.4

6 -3

.46

-3.4

7 2

.96

2.9

6 2

.62

3.1

8

3.3

4 -1

9.4

3 -3

.46

-89

.40

DB

05

11

36

00

11

00

74

0.0

4 6

.87

-2.9

3 -3

.05

-2.5

7 -2

.57

-2.5

7 2

.28

2.2

8 2

.03

2.4

4

2.5

6 -1

6.8

4 -2

.57

-88

.93

DB

06

62

14

98

59

15

64

0.0

5 4

.70

-4.1

1 -4

.26

-3.6

8 -3

.68

-3.6

8 3

.64

3.6

4 3

.26

3.8

9

4.0

7 -2

1.9

4 -3

.68

-90

.00

DB

07

13

41

54

4 1

28

89

27

0.0

5 9

.95

-3.8

9 -4

.06

-3.3

3 -3

.33

-3.3

4 2

.86

2.8

6 2

.53

3.0

8

3.2

3 -2

0.2

4 -3

.33

-86

.81

NG

B/4

82

13

5 8

74

0.1

8 0

.50

-0.1

9 -0

.32

0.0

5 0

.05

-0.3

1 1

.42

-0.1

7 0

.04

4.5

0

13

.48

-0.3

2 0

.05

-1.8

9D

B0

8 1

54

76

06

51

16

8 0

.23

0.2

4 -6

.38

-8.2

4 -3

.81

-3.8

1 -6

.36

39

.52

-6.2

6 8

.58

83

.31

14

6.1

4 -9

.48

-3.8

1 -3

3.9

9F

RA

ME

2 3

37

50

19

00

0 0

.31

1.1

8 -2

1.9

6 -2

2.6

8 -1

9.9

2 -1

9.9

2 -2

0.0

1 4

8.2

4 -1

4.6

4 4

1.3

3 5

2.9

4

56

.59

-48

.51

-19

.92

-90

.59

NG

B/2

41

19

7 9

06

0.3

7 0

.81

-4.9

3 -6

.65

-1.3

7 -1

.37

-5.6

4 2

0.3

2 -4

.67

1.7

8 5

8.0

9 1

04

.97

-6.8

8 -1

.37

-18

.53

DB

09

15

47

60

6 3

0.3

8 -0

.67

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0

0.0

0 0

.00

0.0

0 0

.01

DB

10

14

78

11

11

00

76

0.4

7 7

.41

-26

.35

-27

.91

-20

.90

-20

.90

-21

.06

19

.43

-31

.32

15

.77

22

.06

2

3.8

1 -5

1.3

7 -2

0.9

0 -8

2.5

7F

RA

ME

3 1

11

50

0 3

60

00

0.5

2 1

.92

-28

.01

-29

.87

-22

.68

-22

.68

-23

.12

95

.48

-25

.33

73

.83

11

1.8

1 1

24

.32

-47

.44

-22

.68

-81

.98

DB

11

11

36

00

3 0

.70

0.0

8 -2

8.0

0 -2

8.0

0 -2

4.6

8 1

7.0

0 -2

8.0

0 1

7.0

0 -2

3.9

1 1

7.0

0 1

7.0

0

17

.07

-28

.00

-24

.68

-68

.07

NG

B/1

20

21

3 9

30

0.7

5 1

.25

-20

.51

-24

.22

-11

.34

-11

.34

-18

.34

51

.05

-20

.17

8.0

4 1

20

.48

17

8.6

8 -2

5.4

4 -1

1.3

4 -4

8.4

7D

B1

2 1

73

80

5 1

09

68

8 0

.93

4.8

4 -3

9.7

9 -4

2.3

1 -3

1.4

7 -3

1.4

7 -3

1.8

8 2

5.7

5 -4

4.8

3 1

8.3

3 3

1.5

1

35

.00

-59

.08

-31

.47

-82

.92

DB

13

14

63

97

4 6

24

47

2 0

.94

4.7

7 -3

7.5

2 -4

0.7

2 -2

6.4

2 -2

6.4

2 -2

7.2

0 5

9.4

9 -3

9.8

5 4

4.4

3 7

1.7

0

79

.30

-55

.33

-26

.42

-77

.26

DB

14

16

54

70

0 6

24

47

3 1

.13

4.3

8 -4

0.5

8 -4

4.2

1 -2

7.9

8 -2

7.9

8 -2

9.0

9 6

3.9

0 -3

7.5

3 4

5.0

4 7

9.9

5

89

.58

-56

.25

-27

.98

-76

.55

DB

15

63

37

56

20

24

62

1.1

9 3

.53

-42

.05

-45

.68

-30

.19

-30

.19

-31

.53

70

.90

-40

.22

46

.85

92

.06

10

5.0

1 -5

5.9

1 -3

0.1

9 -7

7.4

3D

B1

6 5

97

38

2 4

37

65

4 1

.53

11

4.6

2 -2

9.8

4 -3

3.7

9 1

9.1

2 8

.02

18

.35

19

.54

-31

.23

15

.77

22

.62

2

4.3

8 -5

3.1

9 1

7.7

7 -4

9.6

2D

B1

7 9

31

17

4 1

10

07

6 1

.63

4.5

1 -3

2.6

7 -3

9.3

5 -2

.27

-2.2

9 -1

1.4

2 1

07

.16

-32

.71

54

.90

17

2.3

9 2

08

.23

-43

.01

-2.2

7 -5

2.7

2S

UD

M 3

30

00

0 1

00

00

0 1

.87

2.7

1 -5

4.0

7 -5

7.5

4 -4

3.8

5 -4

3.8

5 -4

5.6

3 4

3.6

0 -5

3.8

8 1

7.2

4 6

9.4

1

84

.07

-63

.40

-43

.85

-83

.12

DB

18

93

11

74

29

3.2

2 4

.29

-3.2

8 -3

.28

4.8

8 1

8.4

2 -3

.28

20

.15

-1.1

5 2

64

.38

20

.50

2

1.4

0 -3

.28

18

.42

-4.2

3D

B1

9 1

54

76

06

33

3.3

3 1

.66

-6.4

8 -6

.48

3.2

9 -2

.82

-6.4

8 1

.73

-5.7

4 8

.73

19

.42

2

4.0

8 -6

.48

-2.8

2 -8

.44

DB

20

15

47

60

6 1

94

3.3

5 2

.97

-2.4

5 -2

.45

5.4

1 0

.05

-2.4

5 2

.12

-2.0

4 2

.46

12

.98

2

1.1

5 -2

.45

0.0

5 -3

.21

DB

21

15

46

9 1

31

3.7

6 3

.79

-11

.17

-11

.89

21

.14

3.2

6 -1

1.8

8 1

2.1

1 -9

.71

1.1

3 4

6.2

8

85

.67

-11

.89

3.2

6 -1

4.4

8D

B2

2 6

24

47

3 1

68

3.9

0 3

.06

-8.4

9 -8

.50

6.0

0 -4

.04

-8.5

0 -0

.59

-7.7

8 -0

.87

19

.90

3

1.6

7 -8

.50

-4.0

4 -1

0.6

0D

B2

3 1

54

76

06

21

6.3

0 3

.26

-6.1

4 -6

.14

17

.17

-1.4

0 -6

.14

3.5

2 -5

.26

10

.63

27

.72

3

1.7

0 -6

.14

-1.4

0 -7

.14

DB

24

15

47

60

6 4

9 6

.55

2.7

0 -1

5.5

7 -1

5.5

7 4

0.0

4 -3

.34

-15

.57

7.7

0 -1

3.4

6 -3

.43

58

.98

8

3.4

2 -1

5.5

7 -3

.34

-17

.96

DB

25

14

63

97

4 5

35

32

8 7

.60

63

9.2

4 -3

2.8

0 -3

5.4

3 1

73

.44

-25

.55

17

3.4

1 7

8.5

6 -3

2.2

1 6

0.2

2 9

2.6

1 1

02

.76

-51

.48

-25

.55

-36

.98

DB

26

15

47

60

6 9

09

7.9

9 7

.70

-10

.08

-10

.12

40

.71

-1.8

3 -1

0.1

2 5

.82

-8.6

7 6

.92

39

.86

5

9.2

6 -1

0.1

2 -1

.83

-11

.37

DB

27

14

63

97

4 1

0 8

.12

2.6

6 -2

9.4

0 -2

9.4

0 4

6.6

2 -6

.94

-29

.40

20

.36

-24

.88

47

.47

80

.06

9

2.3

0 -2

9.4

0 -6

.94

-33

.02

DB

28

93

11

74

73

12

.96

6.4

3 -1

4.3

8 -1

4.3

8 1

32

.39

18

.59

-14

.38

32

.16

-10

.15

41

.65

83

.82

11

4.2

2 -1

4.3

8 1

8.5

9 -1

5.5

0D

B2

9 5

97

38

2 1

7 1

4.2

7 3

.73

-40

.76

-40

.76

11

0.4

6 -2

.87

-40

.76

26

.59

-34

.64

38

.63

11

4.3

0 1

27

.52

-40

.76

-2.8

7 -4

3.6

4D

B3

0 6

33

75

6 2

21

48

0 1

5.6

8 4

54

.61

-24

.17

-26

.71

18

6.1

5 -1

9.4

3 1

86

.15

95

.08

-26

.43

77

.14

10

7.9

2 1

18

.24

-46

.21

-19

.43

-25

.16

DB

31

63

37

56

21

3 1

6.1

6 7

.36

-43

.73

-43

.76

83

.32

-2.7

3 -4

3.7

6 1

3.8

8 -3

8.5

2 4

1.4

9 7

6.7

3

96

.32

-43

.76

-2.7

3 -4

6.5

2D

B3

2 1

73

80

5 7

2 1

6.9

8 7

.14

-37

.61

-37

.64

73

.71

-15

.99

-37

.64

-2.6

9 -3

4.4

6 2

3.9

4 5

2.3

0

65

.15

-37

.64

-15

.99

-39

.96

DB

33

93

11

74

39

8 1

9.7

0 7

.89

-47

.94

-47

.96

49

.73

-28

.34

-47

.96

-16

.20

-45

.08

-2.3

1 3

4.9

3

47

.19

-47

.96

-28

.34

-50

.35

DB

34

11

36

00

61

55

24

.17

54

.66

-49

.84

-54

.19

20

2.1

7 -6

.56

-5.7

5 2

4.0

0 -4

7.4

5 -8

.44

10

0.5

7 1

31

.22

-54

.30

-6.5

6 -5

1.6

1D

B3

5 1

65

47

00

23

5 3

0.8

5 1

0.3

5 -4

0.1

6 -4

0.1

7 2

01

.07

-1.2

9 -4

0.1

7 1

0.1

6 -3

5.5

9 3

1.1

6 6

4.9

1

87

.57

-40

.17

-1.2

9 -4

1.4

5D

B3

6 1

73

80

5 6

1 3

1.7

1 7

.04

-58

.77

-58

.79

13

8.0

9 -7

.88

-58

.79

7.5

2 -5

2.7

6 1

6.3

7 7

1.5

1

87

.39

-58

.79

-7.8

8 -6

0.6

6D

B3

7 1

34

15

44

37

33

.03

5.8

2 -6

1.0

0 -6

1.0

0 1

51

.05

1.1

3 -6

1.0

0 2

4.4

3 -5

3.2

3 6

2.3

9 8

2.5

2

96

.03

-61

.00

1.1

3 -6

2.8

4D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

-57

.91

-57

.94

15

1.6

0 -9

.94

-57

.94

3.3

7 -5

2.3

6 1

7.5

0 6

4.9

0

81

.14

-57

.94

-9.9

4 -5

9.6

2D

B3

9 1

46

39

74

23

3 3

7.7

5 1

1.0

6 -4

0.0

7 -4

0.0

9 2

61

.47

-0.5

2 -4

0.0

9 1

1.1

3 -3

5.4

3 3

3.4

8 6

7.2

7

89

.68

-40

.09

-0.5

2 -4

1.1

8D

B4

0 6

24

47

3 1

40

47

81

.63

69

.00

-51

.27

-53

.09

66

5.0

9 -3

.78

-51

.85

12

.19

-47

.15

-3.7

7 8

6.5

9 1

14

.80

-53

.09

-3.7

7 -5

1.7

3Z

20

A 5

00

00

24

7 1

14

.38

14

.60

-65

.65

-65

.89

53

6.2

1 -8

.82

-65

.89

1.3

1 -5

9.7

8 -1

.45

63

.76

8

1.0

9 -6

5.8

9 -1

.45

-66

.28

Z1

5 5

00

00

77

2 1

66

.18

23

.44

-62

.26

-63

.29

11

25

.89

17

.84

-63

.29

27

.09

-55

.07

18

.47

92

.74

11

4.9

5 -6

3.2

9 1

8.4

7 -6

2.7

7Z

20

B 5

00

00

10

38

4 2

34

.81

73

.54

-76

.47

-81

.21

38

1.5

1 2

5.8

6 2

80

.63

5.6

7 -7

3.1

3 -0

.86

33

.84

4

0.7

2 -8

1.3

0 -0

.86

-76

.88

(0.0

<=

ga

mm

a2

< 1

.0) A

vera

ge

: 3.2

0 -1

0.9

3 -1

1.7

8 -8

.38

-6.4

0 -9

.31

28

.12

-9.4

7 1

7.6

6 4

0.0

1

52

.92

-20

.58

-8.3

8 -4

8.8

7

Ma

ximu

m: 1

7.6

1 -3

9.7

9 -4

2.3

1 -3

1.4

7 -3

1.4

7 -3

1.8

8 2

00

.49

-44

.83

13

0.8

0 2

53

.36

31

7.7

7 -5

9.0

8 -3

1.4

7

-90

.59

(1.0

<=

ga

mm

a2

< 5

0) A

vera

ge

: 51

.27

-31

.16

-32

.34

74

.62

-7.2

7 -1

0.4

4 2

5.4

1 -2

8.6

1 3

5.2

0 6

5.7

9

81

.72

-35

.56

-6.9

1 -3

7.9

8

Ma

ximu

m: 6

39

.24

-61

.00

-61

.00

26

1.4

7 -4

3.8

5 1

86

.15

10

7.1

6 -5

3.8

8 2

64

.38

17

2.3

9 2

08

.23

-63

.40

-43

.85

-8

3.1

2

(50

<=

ga

mm

a2

< in

f) Ave

rag

e: 4

5.1

4 -6

3.9

1 -6

5.8

7 6

77

.18

7.7

8 2

4.9

0 1

1.5

7 -5

8.7

8 3

.10

69

.23

8

7.8

9 -6

5.8

9 3

.10

-64

.41

M

axim

um

: 73

.54

-76

.47

-81

.21

11

25

.89

25

.86

28

0.6

3 2

7.0

9 -7

3.1

3 1

8.4

7 9

2.7

4 1

14

.95

-81

.30

18

.47

-7

6.8

8

Ave

rag

e: 3

1.3

9 -2

5.5

1 -2

6.6

2 8

7.4

5 -5

.76

-7.2

6 2

5.4

4 -2

3.2

0 2

5.6

5 5

5.6

4 7

0.5

6 -3

1.8

4 -6

.74

-4

4.4

1

Ma

ximu

m: 6

39

.24

-76

.47

-81

.21

11

25

.89

-43

.85

28

0.6

3 2

00

.49

-73

.13

26

4.3

8 2

53

.36

31

7.7

7 -8

1.3

0 -4

3.8

5

-90

.59

38

Page 43: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rfo

rma

nce

Me

asu

re: R

MS

Err

or

(%)

Sa

mp

le s

ize

: 1

0.0

%

Na

me

P

size

N

cla

ss g

am

ma

2 s

kew

D

uj1

D

sj1

D

uj2

D

uj2

a D

sj2

D

Sh

D

Sh

2 D

Sh

3 D

HT

j

DH

Tsj

D

hj

Hyb

rid

g

m2

ha

tD

B0

1 1

54

69

1

54

69

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

02

1

28

89

28

1

28

89

28

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

03

6

24

47

3 6

24

47

3 0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

EQ

10

0 1

50

00

1

50

0

.00

0

.00

0

.01

0

.00

0

.01

0

.01

0

.00

0

.12

0

.01

0

.02

0

.59

1.0

8 0

.00

0

.01

0

.58

EQ

10

1

50

00

1

50

0 0

.00

0

.00

2

.88

2

.76

3

.28

3

.28

3

.14

2

00

.79

2

.55

1

31

.15

2

53

.68

3

18

.00

1

4.6

0 3

.28

1

.73

DB

04

5

97

38

2 5

91

56

4 0

.01

1

7.6

1 1

.18

1

.24

1

.01

1

.01

1

.01

0

.67

0

.67

0

.59

0

.73

0.7

7 7

.49

1

.01

8

6.2

7G

OO

D 1

00

00

9

59

5 0

.04

5

.64

5

.35

5

.50

4

.91

4

.91

4

.92

3

.02

3

.02

2

.73

3

.22

3.3

7 2

1.1

9 4

.91

9

0.3

1D

B0

5 1

13

60

0 1

10

07

4 0

.04

6

.87

3

.08

3

.20

2

.72

2

.72

2

.73

2

.28

2

.28

2

.04

2

.44

2.5

6 1

7.0

0 2

.72

8

9.0

6D

B0

6 6

21

49

8 5

91

56

4 0

.05

4

.70

4

.14

4

.28

3

.70

3

.70

3

.71

3

.64

3

.64

3

.26

3

.89

4.0

7 2

1.9

6 3

.70

9

0.0

1D

B0

7 1

34

15

44

1

28

89

27

0

.05

9

.95

3

.90

4

.08

3

.35

3

.35

3

.36

2

.86

2

.86

2

.53

3

.08

3.2

3 2

0.2

5 3

.35

8

6.8

5N

GB

/4 8

21

35

8

74

0

.18

0

.50

0

.26

0

.36

0

.22

0

.22

0

.36

1

.53

0

.24

0

.21

5

.04

14

.04

0

.37

0

.22

6

.23

DB

08

1

54

76

06

5

11

68

0

.23

0

.24

6

.38

8

.24

3

.81

3

.81

6

.36

3

9.5

2 6

.26

8

.59

8

3.3

2 1

46

.14

9

.48

3

.81

3

4.0

1F

RA

ME

2 3

37

50

1

90

00

0

.31

1

.18

2

2.2

1 2

2.9

2 2

0.2

3 2

0.2

3 2

0.3

1 4

8.2

8 1

4.8

6 4

1.4

1 5

2.9

6

56

.61

4

8.5

5 2

0.2

3 9

0.6

8N

GB

/2 4

11

97

9

06

0

.37

0

.81

5

.00

6

.69

1

.78

1

.78

5

.70

2

0.5

0 4

.74

2

.23

5

8.6

4 1

05

.34

6

.92

1

.78

1

9.3

5D

B0

9 1

54

76

06

3

0

.38

-0

.67

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.36

DB

10

1

47

81

1 1

10

07

6 0

.47

7

.41

2

6.3

7 2

7.9

3 2

0.9

5 2

0.9

5 2

1.1

1 1

9.4

4 3

1.3

2 1

5.7

8 2

2.0

7

23

.81

5

1.3

8 2

0.9

5 8

2.6

4F

RA

ME

3 1

11

50

0 3

60

00

0

.52

1

.92

2

8.0

4 2

9.8

8 2

2.7

2 2

2.7

2 2

3.1

7 9

5.5

0 2

5.3

4 7

3.8

6 1

11

.82

1

24

.33

4

7.4

4 2

2.7

2 8

2.0

2D

B1

1 1

13

60

0 3

0

.70

0

.08

3

0.5

5 3

0.5

5 3

1.7

3 1

20

.14

3

0.5

5 1

20

.14

3

2.3

2 1

20

.14

1

20

.14

1

20

.14

3

0.5

5 3

1.7

3 7

4.3

1N

GB

/1 2

02

13

9

30

0

.75

1

.25

2

0.5

6 2

4.2

6 1

1.6

1 1

1.6

1 1

8.4

2 5

1.4

5 2

0.2

3 8

.91

1

21

.15

1

79

.12

2

5.4

6 1

1.6

1 4

8.7

7D

B1

2 1

73

80

5 1

09

68

8 0

.93

4

.84

3

9.8

0 4

2.3

2 3

1.4

9 3

1.4

9 3

1.9

0 2

5.7

6 4

4.8

3 1

8.3

5 3

1.5

2

35

.01

5

9.0

8 3

1.4

9 8

2.9

3D

B1

3 1

46

39

74

6

24

47

2 0

.94

4

.77

3

7.5

3 4

0.7

2 2

6.4

3 2

6.4

3 2

7.2

1 5

9.4

9 3

9.8

5 4

4.4

3 7

1.7

0

79

.30

5

5.3

3 2

6.4

3 7

7.2

6D

B1

4 1

65

47

00

6

24

47

3 1

.13

4

.38

4

0.5

8 4

4.2

1 2

7.9

9 2

7.9

9 2

9.0

9 6

3.9

0 3

7.5

5 4

5.0

5 7

9.9

5

89

.58

5

6.2

5 2

7.9

9 7

6.5

5D

B1

5 6

33

75

6 2

02

46

2 1

.19

3

.53

4

2.0

5 4

5.6

8 3

0.1

9 3

0.1

9 3

1.5

3 7

0.9

0 4

0.2

2 4

6.8

6 9

2.0

7 1

05

.02

5

5.9

1 3

0.1

9 7

7.4

3D

B1

6 5

97

38

2 4

37

65

4 1

.53

1

14

.62

2

9.8

4 3

3.8

0 1

9.7

0 9

.85

1

8.9

5 1

9.5

4 3

1.2

3 1

5.7

8 2

2.6

2

24

.38

5

3.1

9 1

8.5

2 4

9.8

5D

B1

7 9

31

17

4 1

10

07

6 1

.63

4

.51

3

2.6

7 3

9.3

5 2

.37

2

.39

1

1.4

3 1

07

.16

3

2.7

2 5

4.9

1 1

72

.40

2

08

.23

4

3.0

1 2

.37

5

2.7

2S

UD

M 3

30

00

0 1

00

00

0 1

.87

2

.71

5

4.0

8 5

7.5

4 4

3.8

5 4

3.8

5 4

5.6

3 4

3.6

1 5

3.8

8 1

7.2

8 6

9.4

2

84

.07

6

3.4

0 4

3.8

5 8

3.1

2D

B1

8 9

31

17

4 2

9 3

.22

4

.29

4

.07

4

.07

1

0.5

2 3

0.1

8 4

.07

3

0.8

4 4

.21

3

57

.43

3

1.0

8

31

.78

4

.07

3

0.1

8 5

.34

DB

19

1

54

76

06

3

3 3

.33

1

.66

6

.88

6

.88

1

0.7

8 6

.48

6

.88

1

0.2

4 6

.40

2

2.3

6 3

2.4

3

36

.52

6

.88

6

.48

8

.95

DB

20

1

54

76

06

1

94

3

.35

2

.97

2

.56

2

.57

6

.32

1

.55

2

.57

3

.13

2

.21

3

.66

1

5.0

9

22

.79

2

.57

1

.55

3

.38

DB

21

1

54

69

1

31

3

.76

3

.79

1

1.3

3 1

2.0

2 2

2.9

9 6

.02

1

2.0

2 1

3.9

6 9

.93

4

.93

5

0.1

3

88

.30

1

2.0

2 6

.02

1

5.2

3D

B2

2 6

24

47

3 1

68

3

.90

3

.06

8

.56

8

.57

8

.05

4

.73

8

.57

3

.65

7

.89

4

.01

2

3.5

3

34

.21

8

.57

4

.73

1

0.7

1D

B2

3 1

54

76

06

2

1 6

.30

3

.26

6

.82

6

.82

2

6.6

6 6

.46

6

.82

1

1.0

0 6

.28

2

1.6

0 4

0.7

5

44

.12

6

.82

6

.46

7

.94

DB

24

1

54

76

06

4

9 6

.55

2

.70

1

6.0

2 1

6.0

3 4

6.8

1 9

.33

1

6.0

3 1

4.9

4 1

4.1

6 9

.87

6

9.1

8

91

.30

1

6.0

3 9

.33

1

8.4

9D

B2

5 1

46

39

74

5

35

32

8 7

.60

6

39

.24

3

2.8

0 3

5.4

4 1

73

.44

2

5.5

5 1

73

.42

7

8.5

6 3

2.2

1 6

0.2

3 9

2.6

1 1

02

.76

5

1.4

8 2

5.5

5 3

7.4

2D

B2

6 1

54

76

06

9

09

7

.99

7

.70

1

0.1

1 1

0.1

4 4

1.1

4 2

.41

1

0.1

4 6

.32

8

.71

7

.53

4

0.5

0

59

.71

1

0.1

4 2

.41

1

1.4

2D

B2

7 1

46

39

74

1

0 8

.12

2

.66

3

0.9

8 3

0.9

8 8

7.8

0 3

7.6

3 3

0.9

8 5

8.7

7 2

8.3

0 1

07

.65

1

22

.63

1

32

.99

3

0.9

8 3

7.6

3 3

4.7

9D

B2

8 9

31

17

4 7

3 1

2.9

6 6

.43

1

4.7

3 1

4.7

4 1

39

.37

2

3.0

1 1

4.7

4 3

6.2

4 1

0.9

7 4

7.5

8 9

0.8

0 1

19

.25

1

4.7

4 2

3.0

1 1

5.8

8D

B2

9 5

97

38

2 1

7 1

4.2

7 3

.73

4

1.7

7 4

1.7

7 1

47

.35

3

6.2

4 4

1.7

7 5

4.7

9 3

6.7

8 7

5.0

3 1

41

.58

1

52

.70

4

1.7

7 3

6.2

4 4

4.7

1D

B3

0 6

33

75

6 2

21

48

0 1

5.6

8 4

54

.61

2

4.1

7 2

6.7

1 1

86

.15

1

9.4

4 1

86

.15

9

5.0

8 2

6.4

3 7

7.1

5 1

07

.92

1

18

.24

4

6.2

2 1

9.4

4 2

6.0

8D

B3

1 6

33

75

6 2

13

1

6.1

6 7

.36

4

3.7

8 4

3.8

1 8

6.7

7 1

1.6

4 4

3.8

1 1

8.9

2 3

8.6

3 4

7.5

5 7

9.7

2

98

.61

4

3.8

1 1

1.6

4 4

6.5

7D

B3

2 1

73

80

5 7

2 1

6.9

8 7

.14

3

7.7

7 3

7.8

0 8

5.0

5 2

0.6

0 3

7.8

0 1

7.4

7 3

4.7

7 4

2.0

1 6

1.9

1

73

.07

3

7.8

0 2

0.6

0 4

0.1

3D

B3

3 9

31

17

4 3

98

1

9.7

0 7

.89

4

7.9

6 4

7.9

9 5

2.6

5 2

8.8

5 4

7.9

9 1

7.7

4 4

5.1

2 1

2.1

9 3

7.8

0

49

.30

4

7.9

9 2

8.8

5 5

0.3

7D

B3

4 1

13

60

0 6

15

5 2

4.1

7 5

4.6

6 4

9.8

4 5

4.1

9 2

02

.76

6

.95

6

.84

2

4.1

5 4

7.4

6 8

.71

1

00

.67

1

31

.30

5

4.3

0 6

.95

5

1.6

8D

B3

5 1

65

47

00

2

35

3

0.8

5 1

0.3

5 4

0.2

0 4

0.2

2 2

05

.67

1

0.1

3 4

0.2

2 1

5.2

4 3

5.6

9 3

6.6

7 6

8.1

1

89

.84

4

0.2

2 1

0.1

3 4

1.5

0D

B3

6 1

73

80

5 6

1 3

1.7

1 7

.04

5

8.9

5 5

8.9

7 1

60

.67

3

0.5

3 5

8.9

7 3

0.9

0 5

3.1

9 4

3.6

8 8

5.0

4

98

.70

5

8.9

7 3

0.5

3 6

0.8

5D

B3

7 1

34

15

44

3

7 3

3.0

3 5

.82

6

1.2

7 6

1.2

8 1

90

.26

4

8.1

2 6

1.2

8 5

2.9

2 5

4.0

3 1

00

.21

1

03

.19

1

13

.53

6

1.2

8 4

8.1

2 6

3.1

3D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

5

8.0

9 5

8.1

1 1

73

.42

2

8.5

7 5

8.1

1 2

7.9

2 5

2.7

5 4

5.9

5 7

8.4

2

91

.89

5

8.1

1 2

8.5

7 5

9.8

0D

B3

9 1

46

39

74

2

33

3

7.7

5 1

1.0

6 4

0.1

3 4

0.1

4 2

67

.08

1

0.3

8 4

0.1

4 1

6.0

9 3

5.5

5 3

8.7

5 7

0.2

7

91

.92

4

0.1

4 1

0.3

8 4

1.2

4D

B4

0 6

24

47

3 1

40

47

8

1.6

3 6

9.0

0 5

1.2

7 5

3.0

9 6

65

.37

4

.12

5

1.8

5 1

2.3

2 4

7.1

6 4

.16

8

6.6

6 1

14

.84

5

3.0

9 4

.16

5

1.7

3Z

20

A 5

00

00

2

47

1

14

.38

1

4.6

0 6

5.6

8 6

5.9

3 5

49

.70

1

7.5

5 6

5.9

3 1

4.4

5 5

9.8

7 1

6.9

6 6

7.6

2

83

.97

6

5.9

3 1

6.9

6 6

6.3

2Z

15

5

00

00

7

72

1

66

.18

2

3.4

4 6

2.2

7 6

3.3

0 1

13

3.6

1 2

2.3

3 6

3.3

0 2

8.9

7 5

5.1

1 2

1.8

1 9

3.8

7 1

15

.81

6

3.3

0 2

1.8

1 6

2.7

9Z

20

B 5

00

00

1

03

84

2

34

.81

7

3.5

4 7

6.4

7 8

1.2

1 3

81

.51

2

7.3

0 2

81

.98

6

.27

7

3.1

4 3

.11

3

3.9

6

40

.81

8

1.3

0 3

.11

7

6.8

9

(0.0

<=

ga

mm

a2

< 1

.0)

A

vera

ge

: 3

.20

1

1.3

0 1

2.1

4 9

.05

1

3.2

6 9

.71

3

3.0

9 1

1.1

9 2

2.6

8 4

5.0

5

57

.95

2

0.8

1 9

.05

4

9.6

8

Ma

xim

um

: 1

7.6

1 3

9.8

0 4

2.3

2 3

1.7

3 1

20

.14

3

1.9

0 2

00

.79

4

4.8

3 1

31

.15

2

53

.68

3

18

.00

5

9.0

8 3

1.7

3

9

0.6

8

(1.0

<=

ga

mm

a2

< 5

0)

A

vera

ge

: 5

1.2

7 3

1.4

1 3

2.5

9 9

0.9

6 1

9.2

2 3

8.7

4 3

4.9

6 2

9.1

6 5

0.1

7 7

3.3

3

88

.30

3

5.8

0 1

9.5

5 3

8.3

4

Ma

xim

um

: 6

39

.24

6

1.2

7 6

1.2

8 2

67

.08

4

8.1

2 1

86

.15

1

07

.16

5

4.0

3 3

57

.43

1

72

.40

2

08

.23

6

3.4

0 4

8.1

2

8

3.1

2

(50

<

= g

am

ma

2 <

inf)

A

vera

ge

: 4

5.1

4 6

3.9

2 6

5.8

8 6

82

.55

1

7.8

2 1

15

.77

1

5.5

0 5

8.8

2 1

1.5

1 7

0.5

3

88

.86

6

5.9

1 1

1.5

1 6

4.4

3

Ma

xim

um

: 7

3.5

4 7

6.4

7 8

1.2

1 1

13

3.6

1 2

7.3

0 2

81

.98

2

8.9

7 7

3.1

4 2

1.8

1 9

3.8

7 1

15

.81

8

1.3

0 2

1.8

1

7

6.8

9

Ave

rag

e: 3

1.3

9 2

5.7

9 2

6.8

9 1

03

.38

1

6.7

1 3

2.9

4 3

2.7

1 2

4.1

8 3

6.0

9 6

1.6

9 7

6.0

9 3

2.0

6 1

4.6

9

4

4.9

3

Ma

xim

um

: 6

39

.24

7

6.4

7 8

1.2

1 1

13

3.6

1 1

20

.14

2

81

.98

2

00

.79

7

3.1

4 3

57

.43

2

53

.68

3

18

.00

8

1.3

0 4

8.1

2

9

0.6

8

39

Page 44: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rform

an

ce M

ea

sure

: Bia

s (%)

Sa

mp

le size

: 20

.0%

Na

me

Psize

Ncla

ss ga

mm

a2

skew

Du

j1 D

sj1 D

uj2

Du

j2a

Dsj2

DS

h D

Sh

2 D

Sh

3 D

HT

j D

HT

sj Dh

j Hyb

rid g

m2

ha

tD

B0

1 1

54

69

15

46

9 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 -0

.00

-0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

2 1

28

89

28

12

88

92

8 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

-0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0D

B0

3 6

24

47

3 6

24

47

3 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 -0

.00

0.0

0 0

.00

0

.00

0.0

0 0

.00

0.0

0E

Q1

00

15

00

0 1

50

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0

0.0

0 0

.00

0.0

0 0

.16

EQ

10

15

00

0 1

50

0 0

.00

0.0

0 0

.12

0.1

2 0

.25

0.2

5 0

.25

43

.52

0.1

1 2

0.8

1 6

2.5

3 1

03

.25

-4.7

9 0

.25

0.4

8D

B0

4 5

97

38

2 5

91

56

4 0

.01

17

.61

-0.9

9 -1

.10

-0.6

5 -0

.65

-0.6

7 0

.44

0.4

4 0

.33

0.5

3

0.5

9 -3

.71

-0.6

5 -7

2.8

6G

OO

D 1

00

00

95

95

0.0

4 5

.64

-3.4

1 -3

.69

-2.6

1 -2

.61

-2.6

4 2

.01

2.0

1 1

.55

2.3

5

2.6

1 -1

1.7

3 -2

.61

-78

.61

DB

05

11

36

00

11

00

74

0.0

4 6

.87

-2.6

6 -2

.89

-1.9

8 -1

.98

-2.0

1 1

.53

1.5

3 1

.19

1.7

9

1.9

9 -9

.60

-1.9

8 -7

7.9

2D

B0

6 6

21

49

8 5

91

56

4 0

.05

4.7

0 -3

.68

-3.9

6 -2

.86

-2.8

6 -2

.89

2.5

1 2

.51

1.9

9 2

.90

3

.21

-12

.97

-2.8

6 -8

0.0

5D

B0

7 1

34

15

44

12

88

92

7 0

.05

9.9

5 -3

.37

-3.6

8 -2

.40

-2.4

0 -2

.44

1.9

3 1

.93

1.4

9 2

.26

2

.52

-11

.61

-2.4

0 -7

5.6

3N

GB

/4 8

21

35

87

4 0

.18

0.5

0 -0

.01

-0.0

1 0

.00

0.0

0 -0

.01

0.0

2 -0

.01

-0.0

0 0

.08

0

.31

-0.0

1 0

.00

-0.0

2D

B0

8 1

54

76

06

51

16

8 0

.23

0.2

4 -2

.43

-3.2

0 -1

.23

-1.2

3 -3

.06

3.1

9 -2

.21

-0.7

2 1

2.4

1

25

.39

-3.2

4 -1

.23

-13

.04

FR

AM

E2

33

75

0 1

90

00

0.3

1 1

.18

-19

.47

-20

.87

-15

.73

-15

.73

-16

.08

29

.19

-15

.36

21

.34

35

.30

4

0.6

7 -3

6.3

5 -1

5.7

3 -8

1.0

0N

GB

/2 4

11

97

90

6 0

.37

0.8

1 -1

.50

-1.8

6 -0

.17

-0.1

7 -1

.85

1.8

1 -1

.25

-0.2

7 7

.62

1

6.4

8 -1

.86

-0.1

7 -5

.60

DB

09

15

47

60

6 3

0.3

8 -0

.67

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0 0

.00

0.0

0

0.0

0 0

.00

0.0

0 0

.01

DB

10

14

78

11

11

00

76

0.4

7 7

.41

-22

.22

-24

.77

-13

.05

-13

.05

-13

.60

10

.71

-28

.38

6.6

4 1

4.1

9

16

.61

-38

.46

-13

.05

-69

.47

FR

AM

E3

11

15

00

36

00

0 0

.52

1.9

2 -2

2.7

0 -2

5.7

7 -1

4.1

5 -1

4.1

5 -1

5.6

8 4

5.8

6 -2

2.5

5 2

8.1

7 6

2.0

8

76

.25

-35

.01

-14

.15

-66

.64

DB

11

11

36

00

3 0

.70

0.0

8 -2

2.6

7 -2

2.6

7 -1

6.2

4 1

8.6

7 -2

2.6

7 1

8.6

7 -1

5.7

8 1

8.6

7 1

8.6

7

18

.73

-22

.67

-16

.24

-55

.18

NG

B/1

20

21

3 9

30

0.7

5 1

.25

-12

.06

-14

.31

-4.2

4 -4

.24

-13

.20

7.0

8 -1

0.9

5 -2

.94

30

.45

5

1.1

4 -1

4.4

0 -4

.24

-28

.33

DB

12

17

38

05

10

96

88

0.9

3 4

.84

-33

.01

-37

.27

-17

.83

-17

.83

-19

.32

11

.20

-27

.06

4.2

8 1

7.9

9

22

.02

-46

.60

-17

.83

-68

.64

DB

13

14

63

97

4 6

24

47

2 0

.94

4.7

7 -2

9.8

9 -3

4.5

2 -1

1.9

2 -1

1.9

2 -1

4.3

6 3

0.1

5 -2

6.7

4 1

7.8

3 4

2.4

5

51

.05

-42

.69

-11

.92

-61

.56

DB

14

16

54

70

0 6

24

47

3 1

.13

4.3

8 -3

1.8

3 -3

7.1

0 -1

1.1

2 -1

1.1

2 -1

4.6

4 3

0.5

1 -3

2.0

0 1

6.5

3 4

5.5

4

55

.53

-43

.56

-11

.12

-60

.13

DB

15

63

37

56

20

24

62

1.1

9 3

.53

-32

.79

-38

.24

-12

.91

-12

.91

-17

.34

30

.69

-33

.41

14

.40

49

.32

6

1.5

2 -4

3.3

4 -1

2.9

1 -6

0.4

1D

B1

6 5

97

38

2 4

37

65

4 1

.53

11

4.6

2 -2

3.6

4 -2

7.7

7 3

3.8

9 1

4.1

4 3

2.6

3 1

1.2

1 -2

8.0

5 7

.31

14

.80

1

7.2

0 -3

9.7

7 1

9.7

6 -3

9.4

0D

B1

7 9

31

17

4 1

10

07

6 1

.63

4.5

1 -1

9.2

3 -2

5.8

1 1

9.3

0 1

8.8

3 -1

.62

43

.39

-16

.77

22

.65

76

.21

10

1.6

0 -2

7.2

4 1

8.8

3 -3

1.0

9S

UD

M 3

30

00

0 1

00

00

0 1

.87

2.7

1 -4

3.8

8 -4

9.7

3 -2

3.6

4 -2

3.6

4 -3

0.6

2 1

0.3

8 -4

3.3

8 -4

.97

32

.07

4

2.6

4 -5

2.1

0 -2

3.6

4 -6

7.4

1D

B1

8 9

31

17

4 2

9 3

.22

4.2

9 -1

.62

-1.6

2 5

.06

4.0

8 -1

.62

6.1

0 -0

.33

5.2

5 6

.98

8

.11

-1.6

2 4

.08

-2.1

5D

B1

9 1

54

76

06

33

3.3

3 1

.66

-4.8

8 -4

.88

0.3

9 -2

.43

-4.8

8 -1

.70

-4.3

5 -0

.83

1.4

3

2.9

3 -4

.88

-2.4

3 -6

.35

DB

20

15

47

60

6 1

94

3.3

5 2

.97

-1.2

4 -1

.24

2.7

7 -0

.29

-1.2

4 0

.08

-1.0

2 -0

.35

2.5

9

4.5

1 -1

.24

-0.2

9 -1

.63

DB

21

15

46

9 1

31

3.7

6 3

.79

-6.8

6 -7

.01

7.6

1 -2

.93

-7.0

1 -2

.64

-6.2

8 -4

.84

5.2

5

12

.96

-7.0

1 -2

.93

-8.8

0D

B2

2 6

24

47

3 1

68

3.9

0 3

.06

-5.7

1 -5

.71

5.2

6 -3

.40

-5.7

1 -1

.84

-5.0

7 -2

.72

4.6

7

8.3

0 -5

.71

-3.4

0 -7

.15

DB

23

15

47

60

6 2

1 6

.30

3.2

6 -3

.81

-3.8

1 1

0.9

3 -0

.90

-3.8

1 1

.32

-2.9

5 3

.42

5.6

4

7.3

1 -3

.81

-0.9

0 -4

.41

DB

24

15

47

60

6 4

9 6

.55

2.7

0 -9

.00

-9.0

0 3

6.4

6 -1

.28

-9.0

0 3

.30

-6.9

5 2

.22

17

.52

2

6.6

4 -9

.00

-1.2

8 -1

0.3

9D

B2

5 1

46

39

74

53

53

28

7.6

0 6

39

.24

-26

.76

-30

.41

17

3.4

7 -1

5.8

0 1

73

.47

37

.85

-28

.07

22

.51

52

.17

6

3.8

1 -3

9.1

5 -1

5.8

0 -3

0.2

3D

B2

6 1

54

76

06

90

9 7

.99

7.7

0 -5

.74

-5.7

6 2

9.0

1 -0

.44

-5.7

6 1

.10

-4.6

1 0

.25

9.2

0

16

.02

-5.7

6 -0

.44

-6.4

6D

B2

7 1

46

39

74

10

8.1

2 2

.66

-19

.10

-19

.10

79

.01

13

.27

-19

.10

22

.98

-12

.09

41

.20

39

.11

4

7.4

1 -1

9.1

0 1

3.2

7 -2

1.4

5D

B2

8 9

31

17

4 7

3 1

2.9

6 6

.43

-5.9

7 -5

.97

95

.24

11

.00

-5.9

7 8

.76

-3.5

2 5

.09

20

.55

3

3.2

1 -5

.97

11

.00

-6.4

7D

B2

9 5

97

38

2 1

7 1

4.2

7 3

.73

-27

.94

-27

.94

16

1.3

4 1

3.5

1 -2

7.9

4 2

6.4

3 -1

8.8

8 4

9.3

4 4

2.5

1

56

.43

-27

.94

13

.51

-29

.91

DB

30

63

37

56

22

14

80

15

.68

45

4.6

1 -1

9.9

1 -2

2.5

6 1

86

.15

-13

.89

18

6.1

5 4

9.2

0 -2

2.4

8 3

2.8

9 6

2.8

9

75

.88

-33

.67

-13

.89

-20

.84

DB

31

63

37

56

21

3 1

6.1

6 7

.36

-31

.91

-31

.93

15

1.1

3 2

2.7

6 -3

1.9

3 1

0.5

2 -2

4.8

5 1

1.6

5 3

4.4

6

46

.03

-31

.93

22

.76

-33

.94

DB

32

17

38

05

72

16

.98

7.1

4 -2

9.0

5 -2

9.0

7 1

31

.47

2.5

7 -2

9.0

7 4

.27

-23

.51

10

.83

25

.36

3

2.2

0 -2

9.0

7 2

.57

-30

.83

DB

33

93

11

74

39

8 1

9.7

0 7

.89

-39

.45

-39

.46

11

0.9

5 -8

.45

-39

.46

-6.7

8 -3

4.0

2 0

.75

13

.27

1

9.4

7 -3

9.4

6 -8

.45

-41

.43

DB

34

11

36

00

61

55

24

.17

54

.66

-36

.42

-39

.54

29

0.8

7 1

2.0

8 -2

9.9

7 7

.53

-31

.71

0.0

4 3

8.2

9

55

.11

-39

.55

12

.08

-37

.78

DB

35

16

54

70

0 2

35

30

.85

10

.35

-29

.91

-29

.91

27

6.8

2 9

.09

-29

.91

1.5

1 -2

4.6

8 -1

.89

24

.78

3

5.6

8 -2

9.9

1 9

.09

-30

.87

DB

36

17

38

05

61

31

.71

7.0

4 -4

6.5

4 -4

6.5

6 2

69

.05

16

.10

-46

.56

4.9

6 -3

7.9

7 8

.34

28

.48

3

8.9

4 -4

6.5

6 1

6.1

0 -4

8.0

3D

B3

7 1

34

15

44

37

33

.03

5.8

2 -4

6.1

9 -4

6.1

9 3

40

.60

58

.20

-46

.19

19

.23

-35

.29

22

.98

41

.71

5

2.8

1 -4

6.1

9 5

8.2

0 -4

7.5

9D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

-45

.45

-45

.47

30

3.2

9 1

5.6

8 -4

5.4

7 4

.48

-37

.14

8.0

9 2

8.6

2

38

.81

-45

.47

15

.68

-46

.77

DB

39

14

63

97

4 2

33

37

.75

11

.06

-29

.50

-29

.51

36

2.1

2 1

2.0

5 -2

9.5

1 3

.50

-24

.01

0.3

6 2

7.3

3

38

.32

-29

.51

12

.05

-30

.31

DB

40

62

44

73

14

04

7 8

1.6

3 6

9.0

0 -3

8.0

7 -3

9.2

6 9

40

.41

6.7

0 -3

9.2

6 3

.22

-32

.18

-1.6

4 3

3.5

1

48

.07

-39

.26

-1.6

4 -3

8.4

2Z

20

A 5

00

00

24

7 1

14

.38

14

.60

-52

.58

-52

.79

10

26

.51

13

.17

-52

.79

0.6

2 -4

3.8

8 0

.17

27

.52

3

7.7

4 -5

2.7

9 0

.17

-53

.08

Z1

5 5

00

00

77

2 1

66

.18

23

.44

-46

.73

-47

.58

20

03

.12

37

.38

-47

.58

13

.90

-37

.33

8.2

3 4

2.5

9

56

.84

-47

.58

8.2

3 -4

7.0

9Z

20

B 5

00

00

10

38

4 2

34

.81

73

.54

-62

.96

-69

.06

38

1.5

1 8

3.1

7 3

81

.51

2.3

7 -5

6.7

1 0

.11

17

.26

2

2.3

6 -6

9.1

2 0

.11

-63

.36

(0.0

<=

ga

mm

a2

< 1

.0) A

vera

ge

: 3.2

0 -8

.57

-9.5

5 -4

.99

-3.3

3 -6

.20

9.9

9 -6

.75

5.7

3 1

4.9

3

20

.61

-14

.08

-4.9

9 -3

9.7

1

Ma

ximu

m: 1

7.6

1 -3

3.0

1 -3

7.2

7 -1

7.8

3 1

8.6

7 -2

2.6

7 4

5.8

6 -2

8.3

8 2

8.1

7 6

2.5

3 1

03

.25

-46

.60

-17

.83

-8

1.0

0

(1.0

<=

ga

mm

a2

< 5

0) A

vera

ge

: 51

.27

-23

.12

-24

.49

11

2.3

9 4

.66

-3.4

1 1

2.0

9 -2

0.1

3 1

0.0

2 2

7.8

1

37

.01

-26

.24

4.8

7 -2

8.2

3

Ma

ximu

m: 6

39

.24

-46

.54

-49

.73

36

2.1

2 5

8.2

0 1

86

.15

49

.20

-43

.38

49

.34

76

.21

10

1.6

0 -5

2.1

0 5

8.2

0

-67

.41

(50

<=

ga

mm

a2

< in

f) Ave

rag

e: 4

5.1

4 -5

0.0

9 -5

2.1

7 1

08

7.8

9 3

5.1

1 6

0.4

7 5

.03

-42

.53

1.7

2 3

0.2

2

41

.25

-52

.19

1.7

2 -5

0.4

9

Ma

ximu

m: 7

3.5

4 -6

2.9

6 -6

9.0

6 2

00

3.1

2 8

3.1

7 3

81

.51

13

.90

-56

.71

8.2

3 4

2.5

9 5

6.8

4 -6

9.1

2 8

.23

-6

3.3

6

Ave

rag

e: 3

1.3

9 -1

9.3

2 -2

0.5

9 1

40

.02

3.7

8 0

.38

10

.70

-16

.45

7.6

5 2

2.7

9 3

0.7

2 -2

3.3

3 0

.65

-3

4.5

8

Ma

ximu

m: 6

39

.24

-62

.96

-69

.06

20

03

.12

83

.17

38

1.5

1 4

9.2

0 -5

6.7

1 4

9.3

4 7

6.2

1 1

03

.25

-69

.12

58

.20

-8

1.0

0

40

Page 45: RJ - IBM...r to publication should b e limited to p eer communications and sp eci c requests. After outside publication, requests should b e lled only b y rep rints o r legally obtained

Pe

rfo

rma

nce

Me

asu

re: R

MS

Err

or

(%)

Sa

mp

le s

ize

: 2

0.0

%

Na

me

P

size

N

cla

ss g

am

ma

2 s

kew

D

uj1

D

sj1

D

uj2

D

uj2

a D

sj2

D

Sh

D

Sh

2 D

Sh

3 D

HT

j

DH

Tsj

D

hj

Hyb

rid

g

m2

ha

tD

B0

1 1

54

69

1

54

69

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

02

1

28

89

28

1

28

89

28

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

DB

03

6

24

47

3 6

24

47

3 0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.00

EQ

10

0 1

50

00

1

50

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.29

EQ

10

1

50

00

1

50

0 0

.00

0

.00

1

.02

0

.98

1

.13

1

.13

1

.04

4

3.6

0 0

.99

2

0.9

0 6

2.6

6 1

03

.33

4

.87

1

.13

0

.94

DB

04

5

97

38

2 5

91

56

4 0

.01

1

7.6

1 0

.99

1

.10

0

.66

0

.66

0

.68

0

.44

0

.44

0

.33

0

.53

0.5

9 3

.72

0

.66

7

2.9

5G

OO

D 1

00

00

9

59

5 0

.04

5

.64

3

.86

4

.14

3

.09

3

.09

3

.12

2

.08

2

.08

1

.69

2

.39

2.6

4 1

2.1

6 3

.09

7

9.1

7D

B0

5 1

13

60

0 1

10

07

4 0

.04

6

.87

2

.69

2

.92

2

.01

2

.01

2

.04

1

.54

1

.54

1

.20

1

.79

1.9

9 9

.62

2

.01

7

8.0

0D

B0

6 6

21

49

8 5

91

56

4 0

.05

4

.70

3

.68

3

.97

2

.87

2

.87

2

.90

2

.51

2

.51

1

.99

2

.90

3.2

1 1

2.9

8 2

.87

8

0.0

5D

B0

7 1

34

15

44

1

28

89

27

0

.05

9

.95

3

.37

3

.68

2

.40

2

.40

2

.44

1

.93

1

.93

1

.49

2

.26

2.5

2 1

1.6

1 2

.40

7

5.6

5N

GB

/4 8

21

35

8

74

0

.18

0

.50

0

.04

0

.04

0

.04

0

.04

0

.04

0

.06

0

.04

0

.04

0

.20

0.4

7 0

.04

0

.04

3

.86

DB

08

1

54

76

06

5

11

68

0

.23

0

.24

2

.44

3

.20

1

.24

1

.24

3

.06

3

.19

2

.21

0

.73

1

2.4

1

25

.39

3

.25

1

.24

1

3.0

6F

RA

ME

2 3

37

50

1

90

00

0

.31

1

.18

1

9.5

2 2

0.9

2 1

5.8

1 1

5.8

1 1

6.1

5 2

9.2

2 1

5.3

8 2

1.3

9 3

5.3

2

40

.69

3

6.3

6 1

5.8

1 8

1.0

3N

GB

/2 4

11

97

9

06

0

.37

0

.81

1

.56

1

.90

0

.56

0

.56

1

.89

1

.97

1

.32

0

.60

7

.93

16

.68

1

.91

0

.56

6

.57

DB

09

1

54

76

06

3

0

.38

-0

.67

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0

.00

0.0

0 0

.00

0

.00

0

.25

DB

10

1

47

81

1 1

10

07

6 0

.47

7

.41

2

2.2

3 2

4.7

7 1

3.0

8 1

3.0

8 1

3.6

3 1

0.7

2 2

8.3

8 6

.65

1

4.2

0

16

.61

3

8.4

7 1

3.0

8 6

9.5

2F

RA

ME

3 1

11

50

0 3

60

00

0

.52

1

.92

2

2.7

1 2

5.7

8 1

4.1

8 1

4.1

8 1

5.7

0 4

5.8

8 2

2.5

6 2

8.2

0 6

2.1

0

76

.26

3

5.0

1 1

4.1

8 6

6.6

6D

B1

1 1

13

60

0 3

0

.70

0

.08

2

7.4

9 2

7.4

9 2

9.8

2 7

9.1

6 2

7.4

9 7

9.1

6 3

0.1

4 7

9.1

6 7

9.1

6

79

.16

2

7.4

9 2

9.8

2 6

6.9

0N

GB

/1 2

02

13

9

30

0

.75

1

.25

1

2.1

0 1

4.3

4 4

.55

4

.55

1

3.2

4 7

.46

1

1.0

1 3

.43

3

0.7

5

51

.34

1

4.4

3 4

.55

2

8.6

3D

B1

2 1

73

80

5 1

09

68

8 0

.93

4

.84

3

3.0

1 3

7.2

8 1

7.8

4 1

7.8

4 1

9.3

4 1

1.2

1 2

7.0

6 4

.31

1

7.9

9

22

.02

4

6.6

1 1

7.8

4 6

8.6

5D

B1

3 1

46

39

74

6

24

47

2 0

.94

4

.77

2

9.8

9 3

4.5

2 1

1.9

2 1

1.9

2 1

4.3

6 3

0.1

5 2

6.7

4 1

7.8

4 4

2.4

5

51

.05

4

2.6

9 1

1.9

2 6

1.5

6D

B1

4 1

65

47

00

6

24

47

3 1

.13

4

.38

3

1.8

3 3

7.1

1 1

1.1

2 1

1.1

2 1

4.6

4 3

0.5

1 3

2.0

0 1

6.5

3 4

5.5

4

55

.53

4

3.5

6 1

1.1

2 6

0.1

3D

B1

5 6

33

75

6 2

02

46

2 1

.19

3

.53

3

2.7

9 3

8.2

4 1

2.9

1 1

2.9

1 1

7.3

5 3

0.7

0 3

3.4

1 1

4.4

1 4

9.3

2

61

.52

4

3.3

4 1

2.9

1 6

0.4

1D

B1

6 5

97

38

2 4

37

65

4 1

.53

1

14

.62

2

3.6

5 2

7.7

7 3

4.0

1 1

4.3

2 3

2.7

9 1

1.2

1 2

8.0

5 7

.31

1

4.8

0

17

.21

3

9.7

7 2

1.2

4 3

9.5

9D

B1

7 9

31

17

4 1

10

07

6 1

.63

4

.51

1

9.2

3 2

5.8

2 1

9.3

1 1

8.8

4 1

.67

4

3.4

0 1

6.7

7 2

2.6

5 7

6.2

2 1

01

.60

2

7.2

4 1

8.8

4 3

1.1

0S

UD

M 3

30

00

0 1

00

00

0 1

.87

2

.71

4

3.8

8 4

9.7

3 2

3.6

5 2

3.6

5 3

0.6

2 1

0.4

0 4

3.3

8 5

.01

3

2.0

8

42

.64

5

2.1

0 2

3.6

5 6

7.4

2D

B1

8 9

31

17

4 2

9 3

.22

4

.29

2

.65

2

.65

9

.22

1

0.2

8 2

.65

1

1.3

2 2

.92

1

0.7

6 1

2.0

7

12

.97

2

.65

1

0.2

8 3

.48

DB

19

1

54

76

06

3

3 3

.33

1

.66

5

.24

5

.24

8

.16

5

.63

5

.24

6

.12

5

.01

7

.79

9

.77

11

.04

5

.24

5

.63

6

.81

DB

20

1

54

76

06

1

94

3

.35

2

.97

1

.34

1

.34

3

.61

0

.94

1

.34

1

.11

1

.18

1

.03

3

.78

5.4

0 1

.34

0

.94

1

.77

DB

21

1

54

69

1

31

3

.76

3

.79

6

.97

7

.11

9

.71

3

.84

7

.11

3

.68

6

.43

5

.20

8

.38

14

.80

7

.11

3

.84

9

.47

DB

22

6

24

47

3 1

68

3

.90

3

.06

5

.81

5

.81

7

.05

3

.86

5

.81

3

.12

5

.22

3

.66

7

.02

9.9

2 5

.81

3

.86

7

.28

DB

23

1

54

76

06

2

1 6

.30

3

.26

4

.67

4

.67

2

2.8

7 6

.18

4

.67

8

.97

4

.60

1

3.9

1 1

5.1

5

16

.63

4

.67

6

.18

5

.42

DB

24

1

54

76

06

4

9 6

.55

2

.70

9

.61

9

.61

4

1.5

1 6

.18

9

.61

8

.49

8

.00

8

.41

2

2.3

3

30

.43

9

.61

6

.18

1

1.0

9D

B2

5 1

46

39

74

5

35

32

8 7

.60

6

39

.24

2

6.7

6 3

0.4

1 1

73

.47

1

5.8

0 1

73

.47

3

7.8

5 2

8.0

7 2

2.5

1 5

2.1

7

63

.81

3

9.1

5 1

5.8

0 3

0.4

3D

B2

6 1

54

76

06

9

09

7

.99

7

.70

5

.77

5

.78

2

9.3

6 1

.23

5

.78

1

.75

4

.66

1

.41

9

.58

16

.26

5

.78

1

.23

6

.50

DB

27

1

46

39

74

1

0 8

.12

2

.66

2

1.2

4 2

1.2

4 1

05

.63

3

5.5

7 2

1.2

4 4

3.3

1 1

7.8

0 7

4.9

9 5

8.5

7

65

.09

2

1.2

4 3

5.5

7 2

3.8

5D

B2

8 9

31

17

4 7

3 1

2.9

6 6

.43

6

.43

6

.44

1

02

.01

1

3.8

4 6

.44

1

1.3

9 4

.59

8

.33

2

4.3

2

35

.85

6

.44

1

3.8

4 6

.97

DB

29

5

97

38

2 1

7 1

4.2

7 3

.73

2

9.2

9 2

9.2

9 1

87

.15

3

3.6

1 2

9.2

9 4

1.6

8 2

2.1

3 7

1.9

0 5

6.6

8

67

.79

2

9.2

9 3

3.6

1 3

1.3

5D

B3

0 6

33

75

6 2

21

48

0 1

5.6

8 4

54

.61

1

9.9

1 2

2.5

7 1

86

.15

1

3.9

0 1

86

.15

4

9.2

0 2

2.4

8 3

2.8

9 6

2.8

9

75

.88

3

3.6

7 1

3.9

0 2

1.3

9D

B3

1 6

33

75

6 2

13

1

6.1

6 7

.36

3

2.0

1 3

2.0

2 1

54

.43

2

6.2

6 3

2.0

2 1

3.9

8 2

5.0

9 1

5.9

4 3

6.4

8

47

.54

3

2.0

2 2

6.2

6 3

4.0

5D

B3

2 1

73

80

5 7

2 1

6.9

8 7

.14

2

9.3

4 2

9.3

6 1

40

.80

1

6.1

1 2

9.3

6 1

4.6

6 2

4.1

8 2

1.0

7 3

0.9

2

36

.96

2

9.3

6 1

6.1

1 3

1.1

3D

B3

3 9

31

17

4 3

98

1

9.7

0 7

.89

3

9.4

9 3

9.5

0 1

13

.29

1

1.0

7 3

9.5

0 9

.45

3

4.1

1 8

.98

1

5.6

2

21

.15

3

9.5

0 1

1.0

7 4

1.4

7D

B3

4 1

13

60

0 6

15

5 2

4.1

7 5

4.6

6 3

6.4

2 3

9.5

4 2

91

.28

1

2.2

1 2

9.9

8 7

.68

3

1.7

1 1

.54

3

8.3

5

55

.15

3

9.5

5 1

2.2

1 3

7.8

6D

B3

5 1

65

47

00

2

35

3

0.8

5 1

0.3

5 2

9.9

8 2

9.9

9 2

81

.68

1

3.3

3 2

9.9

9 7

.60

2

4.8

4 8

.18

2

7.0

7

37

.22

2

9.9

9 1

3.3

3 3

0.9

5D

B3

6 1

73

80

5 6

1 3

1.7

1 7

.04

4

6.7

7 4

6.7

9 2

89

.60

3

4.6

6 4

6.7

9 2

1.3

3 3

8.6

2 2

6.7

7 3

8.1

5

46

.20

4

6.7

9 3

4.6

6 4

8.2

6D

B3

7 1

34

15

44

3

7 3

3.0

3 5

.82

4

6.6

3 4

6.6

3 3

69

.77

7

6.5

7 4

6.6

3 3

2.2

7 3

6.5

0 3

8.6

5 5

0.7

5

60

.33

4

6.6

3 7

6.5

7 4

8.0

5D

B3

8 1

47

81

1 6

2 3

4.6

8 7

.22

4

5.7

0 4

5.7

2 3

25

.72

3

3.2

6 4

5.7

2 2

0.4

9 3

7.8

1 2

5.8

0 3

8.1

9

45

.90

4

5.7

2 3

3.2

6 4

7.0

4D

B3

9 1

46

39

74

2

33

3

7.7

5 1

1.0

6 2

9.5

8 2

9.5

9 3

67

.45

1

5.6

2 2

9.5

9 8

.17

2

4.1

8 7

.96

2

9.2

7

39

.69

2

9.5

9 1

5.6

2 3

0.3

9D

B4

0 6

24

47

3 1

40

47

8

1.6

3 6

9.0

0 3

8.0

7 3

9.2

6 9

40

.73

6

.83

3

9.2

6 3

.41

3

2.1

9 2

.03

3

3.5

5

48

.10

3

9.2

6 2

.03

3

8.4

3Z

20

A 5

00

00

2

47

1

14

.38

1

4.6

0 5

2.6

3 5

2.8

4 1

03

9.4

3 1

9.9

8 5

2.8

4 9

.37

4

4.0

2 1

0.6

2 2

9.8

1

39

.42

5

2.8

4 1

0.6

2 5

3.1

4Z

15

5

00

00

7

72

1

66

.18

2

3.4

4 4

6.7

6 4

7.6

0 2

01

0.6

1 3

8.6

8 4

7.6

0 1

5.1

2 3

7.3

9 1

0.4

7 4

3.2

3

57

.28

4

7.6

0 1

0.4

7 4

7.1

1Z

20

B 5

00

00

1

03

84

2

34

.81

7

3.5

4 6

2.9

6 6

9.0

6 3

81

.51

8

3.6

9 3

81

.51

3

.03

5

6.7

2 2

.15

1

7.3

6

22

.44

6

9.1

2 2

.15

6

3.3

7

(0.0

<=

ga

mm

a2

< 1

.0)

A

vera

ge

: 3

.20

8

.89

9

.86

5

.77

8

.12

6

.53

1

2.9

1 8

.30

9

.05

1

7.8

6

23

.52

1

4.3

4 5

.77

4

0.6

5

Ma

xim

um

: 1

7.6

1 3

3.0

1 3

7.2

8 2

9.8

2 7

9.1

6 2

7.4

9 7

9.1

6 3

0.1

4 7

9.1

6 7

9.1

6 1

03

.33

4

6.6

1 2

9.8

2

8

1.0

3

(1.0

<=

ga

mm

a2

< 5

0)

A

vera

ge

: 5

1.2

7 2

3.4

4 2

4.8

1 1

23

.00

1

7.4

4 3

2.7

9 1

8.1

4 2

0.8

8 1

7.9

1 3

2.0

5

40

.54

2

6.5

6 1

7.6

9 2

8.6

5

Ma

xim

um

: 6

39

.24

4

6.7

7 4

9.7

3 3

69

.77

7

6.5

7 1

86

.15

4

9.2

0 4

3.3

8 7

4.9

9 7

6.2

2 1

01

.60

5

2.1

0 7

6.5

7

6

7.4

2

(50

<

= g

am

ma

2 <

inf)

A

vera

ge

: 4

5.1

4 5

0.1

0 5

2.1

9 1

09

3.0

7 3

7.3

0 1

30

.30

7

.73

4

2.5

8 6

.32

3

0.9

9

41

.81

5

2.2

0 6

.32

5

0.5

1

Ma

xim

um

: 7

3.5

4 6

2.9

6 6

9.0

6 2

01

0.6

1 8

3.6

9 3

81

.51

1

5.1

2 5

6.7

2 1

0.6

2 4

3.2

3 5

7.2

8 6

9.1

2 1

0.6

2

6

3.3

7

Ave

rag

e: 3

1.3

9 1

9.6

2 2

0.8

8 1

50

.28

1

5.2

0 2

9.6

9 1

5.2

3 1

7.4

7 1

3.4

4 2

6.2

4 3

3.7

6 2

3.6

0 1

2.0

0

3

5.1

8

Ma

xim

um

: 6

39

.24

6

2.9

6 6

9.0

6 2

01

0.6

1 8

3.6

9 3

81

.51

7

9.1

6 5

6.7

2 7

9.1

6 7

9.1

6 1

03

.33

6

9.1

2 7

6.5

7

8

1.0

3

41