AD-A086 826   NAVAL OCEAN SYSTEMS CENTER SAN DIEGO CA   F/G 12/1   TWO'S-COMPLEMENT FIXED-POINT MULTIPLICATION ERRORS - THEORY. (U)   APR 80   L P MULCAHY   UNCLASSIFIED   NOSC TR 538


    Technical Report 538

    TWO'S-COMPLEMENT FIXED-POINT MULTIPLICATION ERRORS - THEORY

    L. P. Mulcahy

    1 April 1980

    Final Report: May 1975 - September 1979

    Approved for public release; distribution unlimited.

    NAVAL OCEAN SYSTEMS CENTER
    SAN DIEGO, CALIFORNIA 92152

    NAVAL OCEAN SYSTEMS CENTER, SAN DIEGO, CA 92152

    AN ACTIVITY OF THE NAVAL MATERIAL COMMAND

    SL GUILLE, CAPT, USN, Commander
    HL BLOOD, Technical Director

    ADMINISTRATIVE INFORMATION

    This work was an unfunded outgrowth of work supported by Independent Exploratory Development funds at NOSC. Work reported herein was performed from May 1975 through September 1979.

    Reviewed by R. W. Larsen, Head, Systems Validation and Support Division.

    Under authority of E. B. Tunstall, Head, Ocean Surveillance Systems Department.

    ACKNOWLEDGMENT

    The author wishes to thank Messrs. J. W. Bond and J. M. Speiser of the Naval Ocean Systems Center for their helpful comments, suggestions, and reviews of technical accuracy.


    UNCLASSIFIED
    SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

    REPORT DOCUMENTATION PAGE

    1. REPORT NUMBER: NOSC Technical Report 538 (TR 538)

    4. TITLE (and Subtitle): TWO'S-COMPLEMENT FIXED-POINT MULTIPLICATION ERRORS - THEORY

    5. TYPE OF REPORT AND PERIOD COVERED: Final, May 1975 - September 1979

    7. AUTHOR(s): L. P. Mulcahy

    9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Ocean Systems Center, San Diego, CA 92152

    13. NUMBER OF PAGES: 57

    15. SECURITY CLASS. (of this report): Unclassified

    16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited.

    19. KEY WORDS: digital filters, two's-complement, roundoff, multiplication errors, chopping, fixed-point

    20. ABSTRACT:

    Analytic forms of a variety of two's-complement multiplication error statistics are derived. An FIR filter structure is used to define the individual error statistics, which are used in the calculation of the filter output variance and spectrum. The approach used in deriving the statistics is to show that a regularity exists in the structure of the errors as the multiplier input is stepped through a series of consecutive values. The error structure is a function of coefficient value, number representation, and word lengths. This regular structure allows the use of the Poisson Summation Formula in deriving error statistics based on multiplier inputs obtained from quantized random variables. The error statistics are categorized according to families of coefficient values denoted by the parameter ν. This parameter is the effective word length of the error. Specific forms of the statistics are given for Gaussian quantizer inputs.


    SUMMARY

    PROBLEM

    Extend the theory of computation error generation in digital filters. Specifically, consider two's-complement fixed-point multiplication for digitized Gaussian signal inputs.

    RESULTS

    Deterministic properties of roundoff and chopping were examined for two's-complement fixed-point multiplication errors. It was shown that a regularity exists in the structure of the errors as the multiplier input is stepped through a series of consecutive values. The error structure is a function of coefficient value, number representation, and word lengths. The error properties are categorized according to families of coefficient values denoted by the parameter ν. This parameter is the effective word length of the error. This regular structure allowed the use of the Poisson Summation Formula in deriving error statistics based on multiplier inputs obtained from quantized random variables.

    A finite impulse response filter structure was used to define the individual error statistics, which are used in the calculation of the filter output variance and spectrum. The error statistics which were derived are:

    (1) error probabilities (univariate and bivariate),

    (2) mean error,

    (3) second moment of the error,

    (4) the cross-correlation between a multiplier input and the error for the same or a different multiplier,

    (5) the autocorrelation of the error for one multiplier, and

    (6) the cross-correlation between the errors for two multipliers.

    The analytical form of each statistic was shown to consist of two parts, an asymptotic part and a decreasing part. The asymptotic part, which depends on ν, is well behaved and finite for finite filter input mean. The asymptotic part was shown to be independent of the shape of the quantizer input probability density function. The only asymptotic part which depends on quantizer input statistics appears in the cross-correlation between a multiplier input and the error for the same or a different multiplier; the quantizer input mean appears there. The decreasing part is in the form of a summation which varies as a function of the input standard deviation. It has the property that it decreases to zero in the limit as the input standard deviation is increased. Specific forms of the decreasing part of the statistics were given for Gaussian quantizer input probability density functions.

    RECOMMENDATIONS

    Examine the error statistics presented in this report relative to the assumptions and engineering guidelines presently used. Show where present guidelines are adequate and where modifications are needed.

    CONTENTS

    I INTRODUCTION

    II THE FILTER MODEL

    III DETERMINISTIC ERROR PROPERTIES

    IV MULTIPLIER INPUT STATISTICS

    V ERROR STATISTICS

    VI GAUSSIAN FORMS

    VII DISCUSSION

    VIII REFERENCES

    APPENDIX A

    GLOSSARY

    I. INTRODUCTION

    The usual approach to the statistical analysis of the effects of two's-complement (TC) roundoff errors in digital filters makes use of the following argument. Since multiplication errors are similar to analog signal quantization errors, the same statistical results are assumed to hold. The multiplication error is represented as a white noise which is zero-mean, which is uniformly distributed over the magnitude of the least significant bit (l.s.b.) after rounding, and which has zero cross-correlation with the multiplier input.

    Chopping errors are also assumed uniformly distributed, but they have a non-zero mean for the TC number representation [1]-[2].

    Recent results [3] presented a more accurate model of the error generation process that is valid for "large" standard deviation of the multiplier input. Error statistics were derived showing their dependence on number representation, coefficient value, chopping or rounding scheme, and certain statistics of the multiplier input. Statistics for other than large standard deviation were not obtained.

    This report derives analytic expressions for pertinent statistics of TC multiplication errors for the case of arbitrary standard deviation of the multiplier input. It starts by examining the form of the finite impulse response (FIR) filter and determining which statistics of the error are important. Next, the deterministic TC error properties are derived. The analog signal quantization process is discussed, and certain useful formulas are presented through use of the Poisson Summation Formula (hereafter abbreviated as PSF) [4]. The desired error statistics are then derived, also through use of the PSF. Specific forms of the statistics are presented for the case of a Gaussian quantizer input.

    II. THE FILTER MODEL

    This section is presented as motivation for the choice of error statistics which are subsequently derived in this paper.

    A block diagram representation of an FIR filter is shown in Fig. 1 in the form of a tapped delay line. The number of taps employed is equal to U, and the time interval between consecutive data samples is equal to T. The taps are weighted by the filter coefficient values $\{a_h;\ h = 0, 1, \ldots, U-1\}$. These are the quantized values which have been determined through some design procedure. The filter input data are the stationary random sequence values $\{x(iT);\ i = 0, \pm 1, \pm 2, \ldots\}$. This sequence is obtained by sampling and quantization (with roundoff) of the analog waveform $\tilde{x}(t)$. The filter incorporates the multiplication error as the error sequence $\{\varepsilon_h([i-h]T);\ i = 0, \pm 1, \pm 2, \ldots\}$. The index $h = 0, 1, \ldots, U-1$ identifies the tap with which the error sequence is associated. As shown in [3], the error sequence values are deterministically related to the corresponding multiplier input values. Hence, the time index for the error sequence is written as shown to avoid confusion later on when evaluating the error statistics associated with particular multiplier inputs. The filter output data are the sequence values $\{y(iT);\ i = 0, \pm 1, \pm 2, \ldots\}$. The number values associated with x, a, and y are assumed exactly representable by binary words of lengths K, L, and M bits, respectively. A leading binary point is also assumed. These lengths are exclusive of sign bits. The input/output relation for this filter is

    $$y(iT) = b(iT) + w(iT), \qquad (1)$$

    where

    $$b(iT) = \sum_{h=0}^{U-1} a_h\, x([i-h]T), \qquad (2)$$

    $$w(iT) = \sum_{h=0}^{U-1} \varepsilon_h([i-h]T). \qquad (3)$$

    The analysis presented in this paper is not restricted to filter inputs obtained by sampling and quantizing an analog waveform; the filter input can come from the output of another digital filter, for example. It turns out that the statistical results can be written in terms of the filter input alone or in terms of the quantizer input. The latter is more interesting in that specific forms for the Gaussian case can then be applied to the analysis.
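A minimal Python sketch of (1)-(3) follows. It assumes that each product is rounded with TC roundoff (add half an output l.s.b., then truncate toward minus infinity) to M fractional bits, that the delay line starts from a zero state, and that no overflow occurs; the function name `fir_with_roundoff` and the sample data are hypothetical. Values are carried as integers in units of $2^{-(K+L)}$ so that b, y, and w are exact.

```python
def fir_with_roundoff(x_ints, ell_ints, K, L, M):
    """Tapped-delay-line FIR filter per (1)-(3), in integer units of 2**-(K+L).

    x_ints   -- input samples k, meaning x(iT) = k * 2**-K
    ell_ints -- coefficients ell_h, meaning a_h = ell_h * 2**-L
    Each product is rounded (TC roundoff) to M fractional bits before summing.
    Returns lists b (exact output), y (computed output), w (total error).
    """
    N = K + L - M                      # bits discarded from each product
    if N < 1:
        raise ValueError("need K + L > M for any rounding to occur")
    half = 1 << (N - 1)                # one half of the output l.s.b.
    b, y, w = [], [], []
    for i in range(len(x_ints)):
        b_i = y_i = 0
        for h, ell in enumerate(ell_ints):
            k = x_ints[i - h] if i >= h else 0    # assumed zero initial state
            u = k * ell                           # exact product, units 2**-(K+L)
            b_i += u
            y_i += ((u + half) >> N) << N         # rounded product, same units
        b.append(b_i)
        y.append(y_i)
        w.append(y_i - b_i)            # w(iT): sum of the per-tap errors
    return b, y, w

# Hypothetical word lengths with K + L = M + N (here N = 4).
K, L, M = 7, 7, 10
b, y, w = fir_with_roundoff([12, -100, 57, 3, -64, 127], [37, -20, 64, 5], K, L, M)
print(w)   # errors in units of 2**-(K+L); divide by 2**(K+L) for actual values
```

With N = 4 each per-tap roundoff error lies between -7 and 8 units, so for four taps w(iT) stays between -28 and 32 units.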

    From [3], multiplier overflow occurs only for certain coefficient values and for a small number of values of x, if at all, at one or the other end point of the range of x (i.e., at or close to plus or minus 1.0, depending on the value of the coefficient). (Values of x and a for which overflow occurs are defined in this paper.) The sums represented by b and w can also result in filter overflow. In the analysis it will be assumed that the probability of occurrence of either kind of overflow is so small as to be negligible. Practically, this can be accomplished by reducing the variance of the input sequence or by increasing the input and output word lengths K and M.

    The sequence $\{b(iT)\}$ in (2) is the desired filter output. The sequence $\{w(iT)\}$ represents additive noise which is deterministically related to the input sequence. However, the usual statistical descriptions can be employed. These are the sequence mean, the variance, and the power spectral density.

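    The mean of y(iT) is given by $\mu_y = E[y(iT)]$, where E denotes expected value. It is computed from (1) as

    $$\mu_y = \mu_x \sum_{h=0}^{U-1} a_h + \sum_{h=0}^{U-1} \mu_{\varepsilon_h}. \qquad (4)$$

    The variance of y(iT) is given by $\sigma_y^2$, where

    $$\sigma_y^2 = E[y^2] - \mu_y^2 \qquad (5)$$

    $$\sigma_y^2 = \sigma_b^2 + \Delta\sigma. \qquad (6)$$

    The desired variance is given by $\sigma_b^2$, where

    $$\sigma_b^2 = \sum_{g=0}^{U-1}\sum_{h=0}^{U-1} a_g a_h\, C_x([g-h]T). \qquad (7)$$

    The term $\Delta\sigma$ represents the change imposed on the desired variance by the multiplication errors. It can take on negative values and is given by the following expression:

    $$\Delta\sigma = \sum_{g=0}^{U-1}\sum_{h=0}^{U-1}\left[2a_g\, C_{x_g\varepsilon_h}([g-h]T) + C_{\varepsilon_g\varepsilon_h}([g-h]T)\right]. \qquad (8)$$

    The autocovariance $C_x(\cdot)$ and the cross-covariances $C_{x_g\varepsilon_h}(\cdot)$ and $C_{\varepsilon_g\varepsilon_h}(\cdot)$ are defined in terms of their corresponding auto- and cross-correlations and mean values as follows:

    $$C_x(\cdot) = R_x(\cdot) - \mu_x^2, \qquad (9)$$

    $$C_{x_g\varepsilon_h}(\cdot) = R_{x_g\varepsilon_h}(\cdot) - \mu_x\mu_{\varepsilon_h}, \qquad (10)$$

    $$C_{\varepsilon_g\varepsilon_h}(\cdot) = R_{\varepsilon_g\varepsilon_h}(\cdot) - \mu_{\varepsilon_g}\mu_{\varepsilon_h}. \qquad (11)$$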
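Because (6) and (8) rest only on the bilinearity of covariance, they also hold, up to floating-point rounding, for sample statistics taken over a common set of time indices. The sketch below checks this under stated assumptions: TC roundoff of each product to M bits, a clipped and quantized Gaussian test input with a fixed seed, and hypothetical coefficients `ELLS`. The covariances are estimated from aligned tap sequences $x_g$ and $\varepsilon_h$ rather than through the time-delay arguments.

```python
import random

K, L, M = 8, 8, 10
N = K + L - M                          # bits discarded by each rounding
ELLS = [45, -90, 128, -13]             # hypothetical coefficients: a_h = ell_h * 2**-L

def tc_round_error(u):
    """TC roundoff error of integer product u (units 2**-(K+L)) kept to M bits."""
    return (((u + (1 << (N - 1))) >> N) << N) - u

def cov(p, r):
    """Sample covariance with 1/n normalization (sample analogue of C(.))."""
    n = len(p)
    mp, mr = sum(p) / n, sum(r) / n
    return sum((pi - mp) * (ri - mr) for pi, ri in zip(p, r)) / n

random.seed(1)
scale = 1 << K
# Quantized, clipped Gaussian test input: x = k * 2**-K with -2**K <= k < 2**K.
ks = [max(-scale, min(scale - 1, round(random.gauss(0.1, 0.2) * scale)))
      for _ in range(5000)]
U = len(ELLS)
idx = range(U - 1, len(ks))            # time indices with a full delay line
a = [ell / (1 << L) for ell in ELLS]
xg = [[ks[i - g] / scale for i in idx] for g in range(U)]            # x_g(iT)
eg = [[tc_round_error(ks[i - g] * ELLS[g]) / (1 << (K + L)) for i in idx]
      for g in range(U)]                                              # eps_g(iT)
b = [sum(a[g] * xg[g][t] for g in range(U)) for t in range(len(idx))]
y = [b[t] + sum(eg[g][t] for g in range(U)) for t in range(len(idx))]

# Delta-sigma from (8), with sample covariances in place of C(.)
delta_sigma = sum(2 * a[g] * cov(xg[g], eg[h]) + cov(eg[g], eg[h])
                  for g in range(U) for h in range(U))
print(cov(y, y) - cov(b, b), delta_sigma)    # the two numbers should agree
```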


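    The power spectral density of y(iT) is given by $S_y(\omega)$, where

    $$S_y(\omega) = \sum_{\eta=-\infty}^{\infty} R_y(\eta T)\exp(-j\omega\eta T) \qquad (12)$$

    and $R_y(\eta T)$ is the autocorrelation of y(iT). The autocorrelation in turn can be written as

    $$R_y(\eta T) = R_b(\eta T) + \Delta R(\eta T), \qquad (13)$$

    where $\eta = 0, \pm 1, \pm 2, \ldots$.

    The desired autocorrelation is given by $R_b(\eta T)$, where

    $$R_b(\eta T) = \sum_{g=0}^{U-1}\sum_{h=0}^{U-1} a_g a_h\, R_x([g+\eta-h]T). \qquad (14)$$

    The term $\Delta R(\eta T)$ represents the change in the desired autocorrelation and, hence, the spectrum. It is given by the following expression:

    $$\Delta R(\eta T) = \sum_{g=0}^{U-1}\sum_{h=0}^{U-1}\left[a_g\, R_{x_g\varepsilon_h}([g+\eta-h]T) + a_h\, R_{\varepsilon_g x_h}([g+\eta-h]T) + R_{\varepsilon_g\varepsilon_h}([g+\eta-h]T)\right]. \qquad (15)$$

    Thus, the spectrum in turn can be written as

    $$S_y(\omega) = S_b(\omega) + \Delta S(\omega), \qquad (16)$$

    where

    $$S_b(\omega) = \sum_{\eta=-\infty}^{\infty} R_b(\eta T)\exp(-j\omega\eta T) \qquad (17)$$

    and

    $$\Delta S(\omega) = \sum_{\eta=-\infty}^{\infty} \Delta R(\eta T)\exp(-j\omega\eta T). \qquad (18)$$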

    The time delay argument $[(g+\eta-h)T]$ comes about because the same data sequence x(iT) is used as input to all multipliers. Let $x_g(iT)$ and $x_h(iT)$ be the inputs to multipliers g and h. From the tapped delay line structure,

    $$x_g(iT) = x([i-g]T) \qquad (19)$$

    and

    $$x_h(iT) = x([i-h]T). \qquad (20)$$

    Then, for example,

    $$R_{x_g x_h}(\eta T) = E\left[x_g(iT)\, x_h([i+\eta]T)\right] = R_x([g+\eta-h]T). \qquad (21)$$

    Since the multiplication errors and multiplier outputs are deterministically related to the inputs, it seems most natural to write the second-order statistics $R_{x_g\varepsilon_h}$ and $R_{\varepsilon_g\varepsilon_h}$ in terms of the time delays imposed on the input sequence x(iT). The approach to the derivation of the equations to be used for these statistics depends on whether $\rho = 1.0$ or $|\rho| < 1.0$ for the quantizer input sequences used.* When g = h in the time delay argument, $\rho = 1.0$ for $\eta = 0$ and $|\rho| < 1.0$ for $\eta \neq 0$. If $g \neq h$, then $\rho = 1.0$ for $h - g = \eta$ and $|\rho| < 1.0$ for $h - g \neq \eta$. Note that when g = h only one coefficient value is involved. Then

    $$R_{x_g\varepsilon_h}(\cdot) = R_{x_g\varepsilon_g}(\eta T) \qquad (22)$$

    and

    $$R_{\varepsilon_g\varepsilon_h}(\cdot) = R_{\varepsilon_g}(\eta T). \qquad (23)$$

    When $g \neq h$,

    $$R_{x_g\varepsilon_h}(\cdot) = R_{x\varepsilon_h}([g+\eta-h]T) \qquad (24)$$

    and

    $$R_{\varepsilon_g\varepsilon_h}(\cdot) = R_{\varepsilon_g\varepsilon_h}([g+\eta-h]T). \qquad (25)$$

    Computationally, (24) is no different from (22), since only one coefficient value is involved. However, (25) is not usually the same as (23), since two coefficient values are involved. If the two coefficients are different, two different transformations from x to $\varepsilon$ exist, and (25) must be used. If the coefficients are equal, (23) may be used. Note that two separate, equal coefficient values lead to the curious situation where the cross-correlation of both error sequences is equal to the autocorrelation of either error sequence alone, except for a constant difference in the time delay argument.

    * $\rho$ is defined in Section IV.

    The statistics necessary for the evaluation of the covariances are then the following: the filter input mean $\mu_x$, the filter input autocorrelation $R_x(\cdot)$, the mean multiplication error $\mu_\varepsilon$, the error autocorrelation $R_\varepsilon(\cdot)$, the error cross-correlation $R_{\varepsilon_g\varepsilon_h}(\cdot)$, and the cross-correlation between error and multiplier input $R_{x_g\varepsilon_h}(\cdot)$. These are the quantities for which analytical expressions will be presented.

    III. DETERMINISTIC ERROR PROPERTIES

    COMMENTS ON NOTATION

    Multiplier word lengths are defined as follows: input x, K bits plus sign; coefficient a, L bits plus sign; result of roundoff or chopping y, M bits plus sign; and N bits which are eliminated through roundoff or chopping. The lengths K, L, M, and N are arbitrary constant positive integers, with the only restriction that K+L = M+N. The relationships among the various word lengths are shown in the diagram in Fig. 2. The coefficient-dependent parameters $\nu$ and $\delta$ are also shown.

    The following definitions will be used in this report. Let k be any positive or negative integer. Then $[k]_{2^N}$ denotes the non-negative integer in the range $0 \le [k]_{2^N} \le 2^N - 1$ that is related to k through the equivalence relation

    $$[k]_{2^N} \equiv k \bmod 2^N. \qquad (26)$$

    The indicator function $I_{2^N}[k]$ has the property that

    $$I_{2^N}[k] = \begin{cases} 1 & \text{for } 2^{N-1} \le [k]_{2^N}, \\ 0 & \text{otherwise.} \end{cases}$$

    Let b be written as

    $$b = u\,2^{-M-N}, \qquad (31)$$

    where the integer u can take on the values

    $$u = 0, 1, \ldots, 2^{M+N} - 1 \qquad (32)$$

    for all non-negative numbers $u \ge 0$, and can take on the values

    $$u = -2^{M+N}, \ldots, -2, -1 \qquad (33)$$

    for negative numbers $u < 0$. The remainder r is a non-negative number which is defined through the following. Let

    $$b^* = \sum_{i=0}^{M} b_i 2^{-i} + r^* 2^{-M}, \qquad (34)$$

    where

    $$r^* = r = 0.b_{M+1}b_{M+2}\cdots b_{M+N} = 2^{-N}\left[2^{M+N}b^*\right]_{2^N}. \qquad (35)$$

    Let $y^*$ be the machine representation of the result of roundoff or chopping of $b^*$. Then

    $$y^* = \sum_{i=0}^{M} b_i 2^{-i} + \begin{cases} b_{M+1}2^{-M} & \text{for roundoff,} \\ 0 & \text{for chopping.} \end{cases} \qquad (36)$$

    Note that $b_{M+1}$ can be written as

    $$b_{M+1} = I_{2^N}\left[2^{M+N}b^*\right]. \qquad (37)$$

    The value y associated with $y^*$ can be written as

    $$y = b + \varepsilon, \qquad (38)$$

    where $\varepsilon$ is the value of the roundoff or chopping error. The relation between y and $y^*$ is the same as that between b and $b^*$ in (29)-(30). Thus, for non-negative numbers $y \ge 0$,

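    $$y^* = y, \qquad (39)$$

    and for negative numbers $y < 0$,

    $$y^* = 2 + y. \qquad (40)$$

    ROUNDOFF AND CHOPPING

    The error can be calculated from (29)-(38) as

    $$\varepsilon = y - b = y^* - b^* = -r^* 2^{-M} + \begin{cases} b_{M+1}2^{-M} & \text{for roundoff,} \\ 0 & \text{for chopping,} \end{cases} \qquad (41)$$

    where

    $$r^* 2^{-M} = 2^{-M-N}[u]_{2^N} \qquad (42)$$

    and

    $$b_{M+1}2^{-M} = 2^{-M} I_{2^N}[u] \qquad (43)$$

    for

    $$u = -2^{M+N}, \ldots, -1, 0, 1, \ldots, 2^{M+N} - 1. \qquad (44)$$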
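The remainder/indicator form (41)-(43) can be checked exhaustively against a direct rounding of the integer product u. The sketch assumes that TC roundoff means adding half an output l.s.b. and truncating toward minus infinity, and that TC chopping is plain truncation toward minus infinity; errors are expressed in units of $2^{-M-N}$, and the helper names are illustrative.

```python
def mod_pow2(k: int, n: int) -> int:
    """[k]_{2^n}: residue of k modulo 2**n, always in 0 .. 2**n - 1."""
    return k % (1 << n)            # Python's % is non-negative for a positive modulus

def indicator(k: int, n: int) -> int:
    """I_{2^n}[k]: 1 when the discarded bits are at least half an l.s.b."""
    return 1 if mod_pow2(k, n) >= (1 << (n - 1)) else 0

def error_from_formula(u: int, N: int, rounding: bool) -> int:
    """(41)-(43) in units of 2**-(M+N): -[u] (+ 2**N * I[u] for roundoff)."""
    err = -mod_pow2(u, N)
    if rounding:
        err += (1 << N) * indicator(u, N)
    return err

def error_direct(u: int, N: int, rounding: bool) -> int:
    """Same error, from an explicit round-half-up or floor of the integer product."""
    kept = (u + (1 << (N - 1))) >> N if rounding else u >> N   # >> floors negatives
    return (kept << N) - u

M, N = 3, 4
for u in range(-(1 << (M + N)), 1 << (M + N)):     # the range (44)
    for rounding in (True, False):
        assert error_from_formula(u, N, rounding) == error_direct(u, N, rounding)
print("(41)-(43) agree with direct TC roundoff and chopping")
```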

    EFFECT OF MULTIPLICATION COEFFICIENT

    Consider the product

    $$b = xa, \qquad (45)$$

    where

    $$x = k\,2^{-K}, \qquad (46)$$

    $$a = \ell\,2^{-L}, \qquad (47)$$

    and

    $$K + L = M + N. \qquad (48)$$

    The integer k can take on the values

    $$k = -2^K, \ldots, 2^K - 2, 2^K - 1. \qquad (49)$$

    The integers $\ell$ are similarly described. (Replace k by $\ell$ and K by L in (49).) Then, from (31),

    $$u = k\ell. \qquad (50)$$

    Consider values of $\ell$ of the form

    $$\ell = \ell' 2^{\delta}, \qquad (51)$$

    where $\ell'$ is a constant odd number which can take on the values

    $$\ell' = \pm 1, \pm 3, \ldots, \pm(2^{L-\delta} - 1) \qquad (52)$$

    when $\delta = 0, 1, \ldots, L-1$. Note that $\ell'$ can take on the value $\ell' = -1$ when $\delta = L$. The values of $\ell$ of the form (51) thus span the set of all possible non-zero values of $\ell$ as defined by (47).

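    The values of the error $\varepsilon$ in (41) can be evaluated through the substitution

    $$u = k\ell' 2^{\delta}. \qquad (53)$$

    Simplification of the error equations can be made through introduction of the parameter $\nu$, where $\nu = N - \delta$. Thus, for example, for non-negative numbers u (or $k\ell' \ge 0$),

    $$r^* 2^{-M} = 2^{-M-N}\left[k\ell' 2^{\delta}\right]_{2^N} = 2^{-M-N}\,2^{\delta}\left[k\ell'\right]_{2^{N-\delta}} = 2^{-M-\nu}\left[k\ell'\right]_{2^{\nu}} \qquad (54)$$

    and

    $$b_{M+1}2^{-M} = 2^{-M} I_{2^N}\left[k\ell' 2^{\delta}\right] = 2^{-M} I_{2^{\nu}}\left[k\ell'\right]. \qquad (55)$$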

    The same results are obtained for negative numbers u (or $k\ell' < 0$). These equations are valid when no overflow occurs and also for the specified values of $\nu$ and $\delta$. Overflow can occur for roundoff of positive numbers ($k\ell > 0$). It cannot occur for roundoff of negative numbers ($k\ell < 0$) for TC. Overflow occurs when the unmodified product b takes the form

    $$xa = 0.11\cdots11\,r_2 r_3\cdots r_N, \qquad (56)$$

    that is, when the M retained bits and the first discarded bit $b_{M+1}$ are all ones. Thus, the values k that result in overflow are those that satisfy the inequality

    $$k\ell > (2^{M+1} - 1)\,2^{N-1} - 1. \qquad (57)$$

    The effect of overflow is not mirrored in the equations, however. The interesting values of $\nu$ are given by the inequalities $1 \le \nu \le N$ when $L \ge N$, and $N - L < \nu \le N$ when $L < N$. Note that, when $\delta = L$, the coefficient value is equal to zero and no errors occur.
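The reduction (53)-(55) says that only $[k\ell']_{2^{\nu}}$ matters once the power of two $2^{\delta}$ is factored out of the coefficient. The sketch below checks this for every coefficient and input at small, hypothetical word lengths (chosen with L > N so that some coefficients give $\nu \le 0$ and hence no error); overflow is ignored, as in the equations, and the helper names are illustrative.

```python
def split_coefficient(ell):
    """Return (ell_odd, delta) with ell = ell_odd * 2**delta and ell_odd odd (ell != 0)."""
    if ell == 0:
        raise ValueError("a = 0 produces no multiplication error")
    delta = 0
    while ell % 2 == 0:
        ell //= 2
        delta += 1
    return ell, delta

def product_error(u, N, rounding):
    """Error of integer product u in units of 2**-(M+N) (round half up, or floor)."""
    kept = (u + (1 << (N - 1))) >> N if rounding else u >> N
    return (kept << N) - u

def integer_error(k, ell_odd, nu, rounding):
    """Integer-form error e of (58), in units of 2**-(M+nu)."""
    m = (k * ell_odd) % (1 << nu)              # [k ell']_{2^nu}
    e = -m
    if rounding and m >= (1 << (nu - 1)):      # I_{2^nu}[k ell'] = 1
        e += 1 << nu
    return e

K, L, M = 3, 5, 4                              # hypothetical: N = 4 < L
N = K + L - M
for ell in range(-(1 << L), 1 << L):
    if ell == 0:
        continue
    ell_odd, delta = split_coefficient(ell)
    nu = N - delta
    for k in range(-(1 << K), 1 << K):
        for rounding in (True, False):
            direct = product_error(k * ell, N, rounding)
            if nu >= 1:
                # (54)-(55): units 2**-(M+N) and 2**-(M+nu) differ by 2**delta
                assert direct == (1 << delta) * integer_error(k, ell_odd, nu, rounding)
            else:
                assert direct == 0             # k*ell is a multiple of 2**N: exact
print("(54)-(55) verified for all coefficients and inputs")
```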

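    TC ERROR PROPERTIES

    In the following sections, the integer form e of the error will be used. It is defined through (41)-(51) as the following:

    $$e = \varepsilon\,2^{M+\nu} = -\left[k\ell'\right]_{2^{\nu}} + \begin{cases} 2^{\nu} I_{2^{\nu}}\left[k\ell'\right] & \text{for roundoff,} \\ 0 & \text{for chopping.} \end{cases} \qquad (58)$$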

    The following error properties become evident.

    PROPERTY 1: Consider values of k of the form $k = k' + p2^{\nu}$, where the integers k' and p are confined to the ranges

    $$0 \le k' < 2^{\nu} \qquad (59)$$

    and

    $$-2^{K-\nu} \le p < 2^{K-\nu}. \qquad (60)$$

    For constant coefficient a and associated factors $\ell'$ and $\nu$, all values of k with the same constant k' map onto the same value of the error e. Furthermore, the mapping from k' to e is one-to-one, with the mapping specified by (58).

    In much of the following analysis, this property finds use whenever the resulting relation

    $$e(k') = e(k' + p2^{\nu}) \qquad (61)$$

    is employed. It is easily seen that the ranges of error values e which can occur are given as

    $$-2^{\nu-1} + 1 \le e \le 2^{\nu-1} \qquad (62)$$

    for roundoff, and

    $$-2^{\nu} + 1 \le e \le 0 \qquad (63)$$

    for chopping.
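Property 1 and the error ranges (62)-(63) can be confirmed by enumeration. This sketch uses the integer-form error (58) and checks, for small $\nu$ and every odd $\ell'$ with $|\ell'| < 2^{\nu}$, the periodicity (61), the one-to-one mapping from k' to e, and that the errors fill the stated range exactly; the function name is illustrative.

```python
def integer_error(k, ell_odd, nu, rounding):
    """Integer-form error e of (58), in units of 2**-(M+nu)."""
    m = (k * ell_odd) % (1 << nu)              # [k ell']_{2^nu}
    e = -m
    if rounding and m >= (1 << (nu - 1)):      # I_{2^nu}[k ell'] = 1
        e += 1 << nu
    return e

for nu in range(1, 6):
    period = 1 << nu
    for ell_odd in range(-(period - 1), period, 2):        # odd ell' only
        for rounding in (True, False):
            table = [integer_error(kp, ell_odd, nu, rounding) for kp in range(period)]
            # (61): adding any multiple of 2**nu to k leaves e unchanged
            for kp in range(period):
                for p in (-3, -1, 1, 2):
                    assert integer_error(kp + p * period, ell_odd, nu, rounding) == table[kp]
            # one-to-one: the 2**nu values of k' give 2**nu distinct errors ...
            assert len(set(table)) == period
            # ... filling exactly the range (62) for roundoff or (63) for chopping
            lo, hi = (1 - period // 2, period // 2) if rounding else (1 - period, 0)
            assert set(table) == set(range(lo, hi + 1))
print("Property 1 and ranges (62)-(63) hold for nu = 1..5")
```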

    IV. MULTIPLIER INPUT STATISTICS

    In this section the analog signal quantization process is reviewed. Relevant quantizer output (hence filter input) statistics are presented, ending with the Gaussian case. The statistics are the filter input mean and the input auto- and cross-correlation for zero and non-zero time lag. Use of the PSF is demonstrated, and the notation applied to the rest of the paper is discussed.

    UNIVARIATE CHARACTERISTICS

    Let x be a discrete random variable (r.v.) which is obtained from a continuously distributed r.v. $\tilde{x}$ by quantization with roundoff. That is, let $x = kq$, where the integer k is chosen such that $q(k - \tfrac{1}{2}) \le \tilde{x} < q(k + \tfrac{1}{2})$ and $q = 2^{-K}$ is the quantization step size. The A/D converter used has an output word length of K bits plus sign with TC number representation. This restricts the values of the A/D converter output to the range denoted by $k2^{-K}$, where $k = -2^K, -2^K + 1, \ldots, 2^K - 1$. In the following, it is mathematically convenient to assume that x (and hence k) is not bounded. In order to practically realize this assumption, the A/D converter input can be made small enough in most cases so that the effect of quantizer overflow on the derived statistics is negligible.

    In the following it is assumed that $\tilde{x}$ is statistically stationary and has a probability density function (p.d.f.) $f_{\tilde{x}}(\xi)$ associated with it. This p.d.f. is also assumed to be continuous everywhere on the real line. The probabilities of occurrence of each k are written as $P_x(k)$ and are defined as

    $$P_x(k) = \int_{q(k-\frac{1}{2})}^{q(k+\frac{1}{2})} f_{\tilde{x}}(\xi)\,d\xi \qquad (64)$$

    for $k = 0, \pm 1, \pm 2, \ldots$. The dependence on K is assumed understood. Associated with the p.d.f. $f_{\tilde{x}}(\cdot)$ is the characteristic function $Q_{\tilde{x}}(\omega)$ defined by the Fourier transform

    $$Q_{\tilde{x}}(\omega) = \int_{-\infty}^{\infty} f_{\tilde{x}}(\xi)\exp(j\omega\xi)\,d\xi. \qquad (65)$$

    The special case that will be used is that of the Gaussian p.d.f. $f_{\tilde{x}}(\xi)$, where

    $$f_{\tilde{x}}(\xi) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(\xi - \mu)^2}{2\sigma^2}\right], \qquad (66)$$

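    with characteristic function given by

    $$Q_{\tilde{x}}(\omega) = \exp\left[-\frac{\sigma^2\omega^2}{2} + j\mu\omega\right]. \qquad (67)$$

    The expression $P_x(k)$ is well-behaved for any real k. Its Fourier transform $Q_x(\omega)$ is given by

    $$Q_x(\omega) = \int_{-\infty}^{\infty} P_x(\xi)\exp(j\omega\xi)\,d\xi = \mathrm{sinc}(\omega/2\pi)\,Q_{\tilde{x}}(\omega/q), \qquad (68)$$

    where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$.

    Note that [4]

    $$\left.\frac{dQ_x(\omega)}{d\omega}\right|_{\omega=0} = j\mu/q \qquad (69)$$

    and

    $$\left.\frac{d^2Q_x(\omega)}{d\omega^2}\right|_{\omega=0} = -\frac{R_{\tilde{x}}(0)}{q^2} - \frac{1}{12}. \qquad (70)$$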

    MEAN MULTIPLIER INPUT

    The mean multiplier input $\mu_x$ can be obtained as follows:

    $$\mu_x = E[x] = 2^{-K}E[k] = q\sum_{k=-\infty}^{\infty} k\,P_x(k). \qquad (71)$$

    Since the function $kP_x(k)$ is well behaved for any real k, the PSF can be employed on (71) to yield

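    $$\mu_x = -jq\sum_{s=-\infty}^{\infty}\left.\frac{dQ_x(\omega)}{d\omega}\right|_{\omega=2\pi s} = \overset{\to}{\mu}_x + \overset{\downarrow}{\mu}_x, \qquad (72)$$

    where, from (69),

    $$\overset{\to}{\mu}_x = \mu \qquad (73)$$

    and

    $$\overset{\downarrow}{\mu}_x = -\frac{jq}{2\pi}\sum_{\substack{s=-\infty\\ s\neq 0}}^{\infty}\frac{(-1)^s}{s}\,Q_{\tilde{x}}(2\pi s/q). \qquad (74)$$

    Note that, aside from stationarity and continuity assumptions, no special form of $f_{\tilde{x}}(\xi)$ is assumed at this point. The Gaussian form of $\overset{\downarrow}{\mu}_x$ is obtained through substitution of (67) into (74). The result is

    $$\overset{\downarrow}{\mu}_x = \frac{q}{\pi}\sum_{s=1}^{\infty}\frac{(-1)^s}{s}\exp\{-2(\pi s)^2(\sigma/q)^2\}\sin(2\pi s\mu/q). \qquad (75)$$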
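Equation (75) can be checked numerically against the defining sum (71). The sketch assumes a Gaussian quantizer input, works with $\mu$ and $\sigma$ in units of q (so q = 1), and truncates both infinite sums; the function names are illustrative.

```python
import math

def px(k, mu, sigma, q):
    """P_x(k) of (64) for a Gaussian quantizer input with mean mu, std sigma."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(q * (k + 0.5)) - cdf(q * (k - 0.5))

def mean_exact(mu, sigma, q, kmax=200):
    """mu_x = q * sum_k k P_x(k) of (71), truncated to |k| <= kmax."""
    return q * sum(k * px(k, mu, sigma, q) for k in range(-kmax, kmax + 1))

def mean_psf(mu, sigma, q, smax=60):
    """Asymptotic term (73) plus the Gaussian decreasing term (75), truncated at smax."""
    dec = sum((-1) ** s / s
              * math.exp(-2.0 * (math.pi * s) ** 2 * (sigma / q) ** 2)
              * math.sin(2.0 * math.pi * s * mu / q)
              for s in range(1, smax + 1))
    return mu + (q / math.pi) * dec

for sigma in (0.1, 0.25, 0.5, 1.0):
    print(sigma, mean_exact(0.3, sigma, 1.0), mean_psf(0.3, sigma, 1.0))
```

Since $|\overset{\downarrow}{\mu}_x|$ is bounded by roughly $(q/\pi)\exp\{-2\pi^2(\sigma/q)^2\}$, the decreasing term is already smaller than $10^{-9}q$ when $\sigma = q$.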

    NOTATION AND ASSUMPTIONS

    There are some features of $\overset{\to}{\mu}_x$ and $\overset{\downarrow}{\mu}_x$ which are present in subsequent equations where the arrow notation is used. As can be seen from (75), the function $\overset{\downarrow}{\mu}_x$ depends on two parameters, $\mu$ and $\sigma$, each in relation to the quantization step size q. The function can be made arbitrarily close to zero through a suitable increase in the parameter $\sigma$ for any constant value of $\mu$.

    Mathematical limits of $\mu_x$ can be taken in either one of two ways. The first way is to increase $\sigma$ and keep $\mu$ constant. The second way is to increase $\sigma$ and keep the ratio $\sigma/\mu$ constant. The first limit has the distinction that $\overset{\to}{\mu}_x$ is a constant. For the second limit, $\overset{\to}{\mu}_x$ increases in proportion to $\sigma$. It will be assumed that the horizontal arrow denotes a variable which is either constant or depends in a well-behaved manner on $\mu$ and/or $\sigma$ for finite $\mu$ and $\sigma$. The vertical arrow denotes a variable which tends to zero as $\sigma$ increases, regardless of the assumptions about $\mu$.

    As with the mean quantizer output, subsequent cases will occur where a progression is made from a statistic expressed as an expected value to the asymptotic ($\to$) and decreasing ($\downarrow$) terms. It will be assumed without comment that the functions involved are well-behaved and that the PSF was used to arrive at the final results.

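    MULTIPLIER INPUT SECOND MOMENT

    The second moment of the multiplier input is $R_x(0)$. It is the autocorrelation function for zero time lag. It can be obtained as follows:

    $$R_x(0) = E[x^2] = 2^{-2K}E[k^2] = q^2\sum_{k=-\infty}^{\infty}k^2P_x(k) = -q^2\sum_{s=-\infty}^{\infty}\left.\frac{d^2Q_x(\omega)}{d\omega^2}\right|_{\omega=2\pi s} = \overset{\to}{R}_x(0) + \overset{\downarrow}{R}_x(0), \qquad (76)$$

    where, from (70),

    $$\overset{\to}{R}_x(0) = \sigma^2 + \mu^2 + q^2/12, \qquad (77)$$

    and, for the Gaussian case,

    $$\overset{\downarrow}{R}_x(0) = 2q^2\sum_{s=1}^{\infty}(-1)^s\exp\{-2(\pi s)^2(\sigma/q)^2\}\left[\left(\frac{1}{2(\pi s)^2} + 2(\sigma/q)^2\right)\cos(2\pi s\mu/q) + \frac{\mu}{\pi s q}\sin(2\pi s\mu/q)\right]. \qquad (78)$$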
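A similar numerical check applies to (76)-(78). The sketch assumes a Gaussian quantizer input, works with $\mu$ and $\sigma$ in units of q (q = 1), truncates both sums, and uses illustrative function names.

```python
import math

def px(k, mu, sigma, q):
    """P_x(k) of (64) for a Gaussian quantizer input with mean mu, std sigma."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(q * (k + 0.5)) - cdf(q * (k - 0.5))

def second_moment_exact(mu, sigma, q, kmax=200):
    """R_x(0) = q**2 * sum_k k**2 P_x(k) of (76), truncated to |k| <= kmax."""
    return q * q * sum(k * k * px(k, mu, sigma, q) for k in range(-kmax, kmax + 1))

def second_moment_psf(mu, sigma, q, smax=60):
    """Asymptotic term (77) plus the Gaussian decreasing term (78), truncated at smax."""
    r = sigma / q
    dec = 0.0
    for s in range(1, smax + 1):
        theta = 2.0 * math.pi * s * mu / q
        damp = (-1) ** s * math.exp(-2.0 * (math.pi * s) ** 2 * r * r)
        dec += damp * ((1.0 / (2.0 * (math.pi * s) ** 2) + 2.0 * r * r) * math.cos(theta)
                       + mu / (math.pi * s * q) * math.sin(theta))
    return sigma ** 2 + mu ** 2 + q * q / 12.0 + 2.0 * q * q * dec

for sigma in (0.1, 0.3, 1.0):
    print(sigma, second_moment_exact(0.3, sigma, 1.0), second_moment_psf(0.3, sigma, 1.0))
```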

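    BIVARIATE CHARACTERISTICS

    By analogy with (64), the joint probability of occurrence of a pair of multiplier inputs $(k_1, k_2)$ is written as $P_{x_1x_2}(k_1, k_2)$, which can be specified in the same way as the set of probabilities $\{P_x(k)\}$. Let the multiplier input r.v.'s $x_1$ and $x_2$ be obtained from the joint r.v.'s $\tilde{x}_1$ and $\tilde{x}_2$ by quantization with roundoff. Assume $\tilde{x}_1$ and $\tilde{x}_2$ are stationary and have a joint p.d.f. $f_{\tilde{x}_1\tilde{x}_2}(\xi_1, \xi_2)$ that is continuous everywhere in the region $-\infty < \xi_1, \xi_2 < \infty$. Further assume $\tilde{x}_1$ and $\tilde{x}_2$ are correlated with normalized correlation coefficient $\rho = C_{\tilde{x}_1\tilde{x}_2}(0)/(\sigma_1\sigma_2)$, where $|\rho| < 1$, $\sigma_1 = \sigma_{\tilde{x}_1}$, and $\sigma_2 = \sigma_{\tilde{x}_2}$. Thus

    $$P_{x_1x_2}(k_1, k_2) = \int_{q(k_1-\frac{1}{2})}^{q(k_1+\frac{1}{2})}\int_{q(k_2-\frac{1}{2})}^{q(k_2+\frac{1}{2})} f_{\tilde{x}_1\tilde{x}_2}(\xi_1, \xi_2)\,d\xi_1\,d\xi_2 \qquad (79)$$

    for $k_1, k_2 = 0, \pm 1, \pm 2, \ldots$.

    Associated with the p.d.f. $f_{\tilde{x}_1\tilde{x}_2}(\xi_1, \xi_2)$ is the characteristic function $Q_{\tilde{x}_1\tilde{x}_2}(\omega_1, \omega_2)$ defined by the Fourier transform

    $$Q_{\tilde{x}_1\tilde{x}_2}(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{\tilde{x}_1\tilde{x}_2}(\xi_1, \xi_2)\exp(j\omega_1\xi_1 + j\omega_2\xi_2)\,d\xi_1\,d\xi_2. \qquad (80)$$

    The Gaussian case will be used here also, where $\sigma_1 = \sigma_2 = \sigma$ and $\mu_1 = \mu_2 = \mu$. Thus

    $$f_{\tilde{x}_1\tilde{x}_2}(\xi_1, \xi_2) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}}\exp\left\{-\frac{(\xi_1-\mu)^2 - 2\rho(\xi_1-\mu)(\xi_2-\mu) + (\xi_2-\mu)^2}{2\sigma^2(1-\rho^2)}\right\}, \qquad (81)$$

    with characteristic function given by

    $$Q_{\tilde{x}_1\tilde{x}_2}(\omega_1, \omega_2) = \exp\left\{-\frac{\sigma^2}{2}(\omega_1^2 + 2\rho\omega_1\omega_2 + \omega_2^2) + j\mu(\omega_1 + \omega_2)\right\}. \qquad (82)$$

    The function $P_{x_1x_2}(k_1, k_2)$ is well-behaved for any pair of real values $(k_1, k_2)$. Its Fourier transform $Q_{x_1x_2}(\omega_1, \omega_2)$ is given by

    $$Q_{x_1x_2}(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P_{x_1x_2}(\xi_1, \xi_2)\exp(j\omega_1\xi_1 + j\omega_2\xi_2)\,d\xi_1\,d\xi_2 = \mathrm{sinc}(\omega_1/2\pi)\,\mathrm{sinc}(\omega_2/2\pi)\,Q_{\tilde{x}_1\tilde{x}_2}(\omega_1/q, \omega_2/q). \qquad (83)$$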

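    Note that [4]

    $$\left.\frac{\partial Q_{\tilde{x}_1\tilde{x}_2}(\omega_1, \omega_2)}{\partial\omega_i}\right|_{(0,0)} = j\mu_i, \quad i = 1, 2, \qquad (84)$$

    and

    $$\left.\frac{\partial^2 Q_{\tilde{x}_1\tilde{x}_2}(\omega_1, \omega_2)}{\partial\omega_1\,\partial\omega_2}\right|_{(0,0)} = -R_{\tilde{x}_1\tilde{x}_2}(0). \qquad (85)$$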

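    MULTIPLIER INPUT AUTOCORRELATION/CROSS-CORRELATION

    The multiplier input autocorrelation function for non-zero time lag $\{\eta T;\ \eta \neq 0\}$ is $R_x(\eta T)$. It can be written equivalently as $R_{x_1x_2}(0)$, which is the cross-correlation function for two quantizers with equal quantization step size q and zero time lag. The function presented here is valid only when $|\rho| < 1$. If $\rho = 1$, the autocorrelation function is equal to the second moment (76). Thus,

    $$R_{x_1x_2}(0) = q^2E[k_1k_2] = q^2\sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty}k_1k_2\,P_{x_1x_2}(k_1, k_2) = -q^2\sum_{s_1=-\infty}^{\infty}\sum_{s_2=-\infty}^{\infty}\left.\frac{\partial^2 Q_{x_1x_2}(\omega_1, \omega_2)}{\partial\omega_1\,\partial\omega_2}\right|_{(2\pi s_1,\,2\pi s_2)} = \overset{\to}{R}_{x_1x_2}(0) + \overset{\downarrow}{R}_{x_1x_2}(0), \qquad (86)$$

    where, from (85),

    $$\overset{\to}{R}_{x_1x_2}(0) = \rho\sigma^2 + \mu^2, \qquad (87)$$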

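    and, for the Gaussian case,

    $$\begin{aligned} \overset{\downarrow}{R}_{x_1x_2}(0) = {} & -\frac{q^2}{4\pi^2}\sum_{\substack{s_1=-\infty\\ s_1\neq 0}}^{\infty}\sum_{\substack{s_2=-\infty\\ s_2\neq 0}}^{\infty}\frac{(-1)^{s_1+s_2}}{s_1s_2}\exp\{-2(\sigma/q)^2\pi^2(s_1^2 + 2\rho s_1s_2 + s_2^2)\}\cos\{2\pi(s_1+s_2)\mu/q\} \\ & + 4\rho\sigma^2\sum_{s=1}^{\infty}(-1)^s\exp\{-2(\sigma/q)^2(\pi s)^2\}\cos\{2\pi s\mu/q\} \\ & + \frac{2q\mu}{\pi}\sum_{s=1}^{\infty}\frac{(-1)^s}{s}\exp\{-2(\sigma/q)^2(\pi s)^2\}\sin\{2\pi s\mu/q\}. \end{aligned} \qquad (88)$$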

    V. ERROR STATISTICS

    The equations of the error statistics are derived in this section. The asymptotic and decreasing terms are separated and identified. The asymptotic terms are found to be independent of the p.d.f. shape. The decreasing terms are left general in that they contain the forms $Q_x$ and $Q_{x_1x_2}$. This section is intended to show the derivations and results with a minimum of explanation or discussion. Gaussian forms of the decreasing terms are presented in the next section. Properties of the asymptotic terms are then discussed in the subsequent section.

    If the reader is concerned primarily with using the equations for Gaussian quantizer inputs, he should use only the asymptotic terms from this section and the decreasing forms from the next section.

    PROBABILITIES

    Consider the case of a single multiplier with multiplication coefficient a, associated parameter $\nu$, and word lengths K, L, M, N fixed with K+L = M+N. By Property 1, the values of k which map onto a particular value of the error e are given by the equation $k = k' + p2^{\nu}$, where the relation between k' and e is given by (58). In line with the relaxation of restrictions in the previous section, it will be assumed that k (and hence p) is not bounded. The probability of occurrence of each e, $P_e(e) = P_e(e; a, \nu)$, can be written in terms of the probability of occurrence $P_x(k)$ of each k which maps onto it. The factors a and $\nu$ and the word lengths K, L, M, and N all condition the computed values of $P_e(e)$ for any given p.d.f. $f_{\tilde{x}}(\cdot)$. This dependence is assumed understood in the following. Thus,

    $$P_e(e) = \sum_{p=-\infty}^{\infty}P_x(k' + p2^{\nu}) = 2^{-\nu}\sum_{s=-\infty}^{\infty}Q_x(\omega_s)\exp(-jk'\omega_s) = \overset{\to}{P}_e + \overset{\downarrow}{P}_e(e), \qquad (89)$$

    where

    $$\overset{\to}{P}_e = 2^{-\nu}, \qquad (90)$$

    $$\overset{\downarrow}{P}_e(e) = 2^{-\nu}\sum_{\substack{s=-\infty\\ s\neq 0}}^{\infty}Q_x(\omega_s)\exp(-jk'\omega_s), \qquad (91)$$

    and

    $$\omega_s = 2\pi s/2^{\nu}. \qquad (92)$$
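The PSF step in (89) can be verified for a Gaussian input by comparing the direct lattice sum with the transform series. The sketch assumes $\mu$ and $\sigma$ in units of q (q = 1), truncates both sums, and uses illustrative function names.

```python
import cmath
import math

def px(k, mu, sigma, q):
    """P_x(k) of (64) for a Gaussian quantizer input."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(q * (k + 0.5)) - cdf(q * (k - 0.5))

def qx(w, mu, sigma, q):
    """Q_x(w) = sinc(w/2pi) * Q_xtilde(w/q), (67)-(68), for a Gaussian input."""
    sinc = 1.0 if w == 0 else math.sin(w / 2.0) / (w / 2.0)
    return sinc * cmath.exp(-0.5 * (sigma * w / q) ** 2 + 1j * mu * w / q)

def prob_exact(k_prime, nu, mu, sigma, q, pmax=100):
    """First form in (89): sum of P_x(k' + p 2**nu) over |p| <= pmax."""
    period = 1 << nu
    return sum(px(k_prime + p * period, mu, sigma, q) for p in range(-pmax, pmax + 1))

def prob_psf(k_prime, nu, mu, sigma, q, smax=200):
    """Second form in (89), with w_s from (92), truncated at |s| <= smax."""
    period = 1 << nu
    total = 0j
    for s in range(-smax, smax + 1):
        w_s = 2.0 * math.pi * s / period
        total += qx(w_s, mu, sigma, q) * cmath.exp(-1j * k_prime * w_s)
    return (total / period).real

nu, mu, sigma, q = 3, 0.4, 0.7, 1.0      # mu and sigma in units of q
for k_prime in range(1 << nu):
    print(k_prime, prob_exact(k_prime, nu, mu, sigma, q), prob_psf(k_prime, nu, mu, sigma, q))
```

For large $\sigma/q$ every term with $s \neq 0$ is negligible, and each $P_e(e)$ approaches the asymptotic value $2^{-\nu}$ of (90).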

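    The probabilities can also be derived for joint r.v.'s. Let $a_1$ and $a_2$ be the constant coefficients of two separate multipliers with K, L, M, and N the same for both. This results in two values each of $\ell'$ and $\nu$ ($\ell'_1, \ell'_2$ and $\nu_1, \nu_2$), which are not necessarily equal.

    Consider the sets of integers $k_1 = k'_1 + p_1 2^{\nu_1}$ and $k_2 = k'_2 + p_2 2^{\nu_2}$, where $p_1$ and $p_2$ are integers with properties as discussed above. These values of $k_1$ and $k_2$ map onto a pair of integers $e_1$ and $e_2$ with values determined by $k'_1$ and $k'_2$, respectively, through (58). The probability of occurrence of each pair of errors $(e_1, e_2)$ is written as $P_{e_1e_2}(e_1, e_2) = P_{e_1e_2}(e_1, e_2; a_1, a_2, \nu_1, \nu_2)$. This probability can be written in terms of the probability of occurrence $P_{x_1x_2}(k_1, k_2)$ of each pair $(k_1, k_2)$ which maps onto it. Thus

    $$P_{e_1e_2}(e_1, e_2) = \sum_{p_1=-\infty}^{\infty}\sum_{p_2=-\infty}^{\infty}P_{x_1x_2}(k'_1 + p_1 2^{\nu_1},\, k'_2 + p_2 2^{\nu_2}) = 2^{-\nu_1-\nu_2}\sum_{s_1=-\infty}^{\infty}\sum_{s_2=-\infty}^{\infty}Q_{x_1x_2}(\omega_{s_1}, \omega_{s_2})\exp(-jk'_1\omega_{s_1} - jk'_2\omega_{s_2}) = \overset{\to}{P}_{e_1e_2} + \overset{\downarrow}{P}_{e_1e_2}(e_1, e_2), \qquad (93)$$

    where

    $$\overset{\to}{P}_{e_1e_2} = 2^{-\nu_1-\nu_2}, \qquad (94)$$

    $$\overset{\downarrow}{P}_{e_1e_2}(e_1, e_2) = 2^{-\nu_1-\nu_2}\sum_{(s_1, s_2)\neq(0,0)}Q_{x_1x_2}(\omega_{s_1}, \omega_{s_2})\exp(-jk'_1\omega_{s_1} - jk'_2\omega_{s_2}), \qquad (95)$$

    and

    $$\omega_{s_i} = 2\pi s_i/2^{\nu_i} \quad \text{for } i = 1, 2. \qquad (96)$$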

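    MEAN ERROR

    Through (89) the mean error $\mu_\varepsilon$ can be expressed as

    $$\mu_\varepsilon = E[\varepsilon] = 2^{-M-\nu}E[e] = 2^{-M-\nu}\sum_e e\,P_e(e) = \overset{\to}{\mu}_\varepsilon + \overset{\downarrow}{\mu}_\varepsilon, \qquad (97)$$

    where

    $$\overset{\to}{\mu}_\varepsilon = 2^{-M-2\nu}\sum_e e = \begin{cases} 2^{-M-\nu-1} & \text{for TCR (two's-complement roundoff),} \\ 2^{-M-1}(2^{-\nu} - 1) & \text{for TCC (two's-complement chopping),} \end{cases} \qquad (98)$$

    and

    $$\overset{\downarrow}{\mu}_\varepsilon = 2^{-M-\nu}\sum_e e\,\overset{\downarrow}{P}_e(e). \qquad (99)$$

    The sum (98) is evaluated through use of (62) and (63). For convenience, the limits on e are not shown because they depend on whether roundoff or chopping is used.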

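The mapping from $k'$ to $e$ is one-to-one. Hence, the error can be written as a function of $k'$, namely $e = e(k')$. Also, a summation over all possible values of $e$ is equivalent to a summation over all possible values of $k'$. Thus, through (91) the term $\tilde{\mu}_\epsilon$ can be written as

$$\tilde{\mu}_\epsilon = 2^{-M-v}\sum_{k'} e(k')\,2^{-v}\sum_{\substack{s=-\infty\\ s\neq0}}^{\infty} Q_x(\omega_s)\exp(-jk'\omega_s) = 2^{-M-2v}\sum_{\substack{s=-\infty\\ s\neq0}}^{\infty} Q_x(\omega_s)D_s = 2^{-M-2v+1}\sum_{s=1}^{\infty}\operatorname{Re}\{Q_x(\omega_s)D_s\} \tag{100}$$

where

$$D_s = \sum_{k'=0}^{2^v-1} e(k')\exp(-jk'\omega_s). \tag{101}$$

(The last step in (100) pairs the terms $s$ and $-s$, since $Q_x(-\omega) = Q_x^*(\omega)$ and, for real $e(k')$, $D_{-s} = D_s^*$.)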

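The quantity $D_s$ is evaluated in the Appendix and is given as

$$D_s = \begin{cases} 2^{v-1} & \text{for } s \equiv 0 \bmod 2^v\\ (-1)^s\,2^{v-1}\,\dfrac{\exp((\operatorname{sgn}a)j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} & \text{otherwise}\end{cases} \tag{102}$$

for TCR, and

$$D_s = \begin{cases} 2^{v-1}(1-2^{v}) & \text{for } s \equiv 0 \bmod 2^v\\ 2^{v-1}\,\dfrac{\exp((\operatorname{sgn}a)j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} & \text{otherwise}\end{cases} \tag{103}$$

for TCC. Note that $D_s$ can be a complex quantity and that it is a function of the parameter $\lambda$, which satisfies the relation

$$1 \equiv Q'\lambda \bmod 2^{v} \tag{104}$$

where $Q'$ is in turn related to the value of the coefficient $a$ through (47)-(52).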
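The closed forms (102) and (103) can be spot-checked against the defining sum (101). The sketch below is hedged in two ways. First, it covers only $a > 0$ ($\operatorname{sgn} a = +1$). Second, (58) is not reproduced in this section, so it reads the integer error off the Appendix: $e(k') = -[k'Q']_{2^v}$ for TCC, from (A-1), and $e(k') = 2^{v-1} - [k'Q'+2^{v-1}]_{2^v}$ for TCR, from (A-11)-(A-13). The function names are hypothetical. Python's three-argument `pow` (3.8 or later) supplies the inverse $\lambda$ of (104).

```python
import cmath
import math


def error_int(kp, qp, v, mode):
    """Integer error e(k') for a > 0, as read from (A-1) (TCC) and (A-11)-(A-13) (TCR)."""
    n = 2**v
    if mode == "TCC":
        return -((kp * qp) % n)
    return n // 2 - ((kp * qp + n // 2) % n)


def d_direct(s, qp, v, mode):
    """D_s from its definition (101)."""
    n = 2**v
    w = 2 * math.pi * s / n
    return sum(error_int(kp, qp, v, mode) * cmath.exp(-1j * kp * w) for kp in range(n))


def d_closed(s, qp, v, mode):
    """D_s from (102) (TCR) or (103) (TCC), with sgn a = +1."""
    n = 2**v
    lam = pow(qp, -1, n)  # (104): Q' * lambda = 1 mod 2^v; requires an odd Q'
    if s % n == 0:
        return 2**(v - 1) if mode == "TCR" else 2**(v - 1) * (1 - n)
    w = 2 * math.pi * s / n
    g = (-1)**s if mode == "TCR" else 1
    return g * 2**(v - 1) * (cmath.exp(1j * w * lam) - 1) / (math.cos(w * lam) - 1)


if __name__ == "__main__":
    worst = 0.0
    for mode in ("TCR", "TCC"):
        for v in (1, 3, 5):
            for qp in range(1, 2**v, 2):
                for s in range(-3, 2**v + 3):
                    worst = max(worst, abs(d_direct(s, qp, v, mode) - d_closed(s, qp, v, mode)))
    print("largest |direct - closed form|:", worst)
```

The comparison covers every odd $Q'$ below $2^v$ and values of $s$ on both sides of a multiple of $2^v$. This exercises both branches of (102) and (103).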

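$R_\epsilon(0)$

The second moment of the multiplication error for one multiplier is denoted by $R_\epsilon(0)$. Thus

$$R_\epsilon(0) = E[\epsilon^2] = 2^{-2M-2v}E[e^2] = 2^{-2M-2v}\sum_e e^2P_e(e) = \bar{R}_\epsilon(0) + \tilde{R}_\epsilon(0) \tag{105}$$

where

$$\bar{R}_\epsilon(0) = 2^{-2M-3v}\sum_e e^2 = \begin{cases} 2^{-2M}\left(\dfrac{1}{12} + \dfrac{2^{-2v}}{6}\right) & \text{for TCR}\\ 2^{-2M}\left(\dfrac{1}{3} - 2^{-v-1} + \dfrac{2^{-2v}}{6}\right) & \text{for TCC}\end{cases} \tag{106}$$

and

$$\tilde{R}_\epsilon(0) = 2^{-2M-2v}\sum_e e^2\,\tilde{P}_e(e) = 2^{-2M-3v+1}\sum_{s=1}^{\infty}\operatorname{Re}\{Q_x(\omega_s)F_s\}. \tag{107}$$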

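The quantity $F_s$ defined by

$$F_s = \sum_{k'=0}^{2^v-1} e^2(k')\exp(-jk'\omega_s) \tag{108}$$

is evaluated in the Appendix and is given by

$$F_s = \begin{cases}\dfrac{1}{12}\left(2^{3v} + 2^{v+1}\right) & \text{for } s\equiv 0 \bmod 2^v\\ (-1)^s\,\dfrac{2^{v}}{1-\cos(\omega_s\lambda)} & \text{otherwise}\end{cases} \tag{109}$$

for TCR, and

$$F_s = \begin{cases}\dfrac{1}{6}\left(2^{3v+1} - 3\cdot 2^{2v} + 2^{v}\right) & \text{for } s\equiv 0 \bmod 2^v\\ \dfrac{2^{2v-1}\left[\exp((\operatorname{sgn}a)j\omega_s\lambda)-1\right] + 2^{v}}{1-\cos(\omega_s\lambda)} & \text{otherwise}\end{cases} \tag{110}$$

for TCC. In the TCC case it can be complex, and it depends on the coefficient value through the parameter $\lambda$ defined by (104), which is the same relation as used for $D_s$.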
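The closed forms (109) and (110), and the asymptotic second moment (106), can be checked in the same spirit as $D_s$. This is a minimal sketch, assuming $a > 0$ and reading the integer error off the Appendix: (A-1) for TCC and (A-11)-(A-13) for TCR. The function names are hypothetical. `fractions.Fraction` keeps (106) exact, so that check is an equality rather than a tolerance.

```python
import cmath
import math
from fractions import Fraction


def error_int(kp, qp, v, mode):
    """Integer error e(k') for a > 0, as read from (A-1) (TCC) and (A-11)-(A-13) (TCR)."""
    n = 2**v
    if mode == "TCC":
        return -((kp * qp) % n)
    return n // 2 - ((kp * qp + n // 2) % n)


def f_direct(s, qp, v, mode):
    """F_s from its definition (108)."""
    n = 2**v
    w = 2 * math.pi * s / n
    return sum(error_int(kp, qp, v, mode) ** 2 * cmath.exp(-1j * kp * w) for kp in range(n))


def f_closed(s, qp, v, mode):
    """F_s from (109) (TCR) or (110) (TCC), with sgn a = +1."""
    n = 2**v
    lam = pow(qp, -1, n)  # lambda from (104)
    if s % n == 0:
        if mode == "TCR":
            return (2**(3 * v) + 2**(v + 1)) / 12
        return (2**(3 * v + 1) - 3 * 2**(2 * v) + 2**v) / 6
    w = 2 * math.pi * s / n
    denom = 1 - math.cos(w * lam)
    if mode == "TCR":
        return (-1)**s * 2**v / denom
    return (2**(2 * v - 1) * (cmath.exp(1j * w * lam) - 1) + 2**v) / denom


def r_bar(v, mode, m=0):
    """(106) as an exact fraction: 2^(-2M-3v) times the sum of e^2 over one period."""
    total = sum(error_int(kp, 1, v, mode) ** 2 for kp in range(2**v))
    return Fraction(total, 2**(3 * v)) / 4**m


if __name__ == "__main__":
    for mode in ("TCR", "TCC"):
        worst = max(abs(f_direct(s, qp, 4, mode) - f_closed(s, qp, 4, mode))
                    for qp in range(1, 16, 2) for s in range(-2, 18))
        print(mode, "largest |direct - closed form| for v = 4:", worst)
        print(mode, "R-bar(0) for v = 1..4:", [r_bar(v, mode) for v in range(1, 5)])
```

The set of error values over one period does not depend on $Q'$ (only their order does), which is why `r_bar` can use $Q' = 1$.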

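$R_{x\epsilon}$ for $\rho = 1.0$

The joint moment of a r.v. $x$ and the corresponding error $\epsilon$ for the same multiplier is denoted by $R_{x\epsilon}$, where

$$R_{x\epsilon} = E[x\epsilon] = q2^{-M-v}E[ke] \tag{111}$$

and

$$\begin{aligned} E[ke] &= \sum_{k=-\infty}^{\infty} k\,e(k)\,P_x(k)\\ &= \sum_{p=-\infty}^{\infty}\sum_{k'=0}^{2^v-1}(k'+p2^v)\,e(k'+p2^v)\,P_x(k'+p2^v)\\ &= \sum_{k'=0}^{2^v-1} e(k')\sum_{p=-\infty}^{\infty}(k'+p2^v)\,P_x(k'+p2^v)\\ &= \sum_{k'=0}^{2^v-1} e(k')\,2^{-v}\sum_{s=-\infty}^{\infty}\left\{-j\frac{dQ_x(\omega)}{d\omega}\right\}_{\omega=\omega_s}\exp(-jk'\omega_s)\\ &= -j2^{-v}\sum_{s=-\infty}^{\infty}\left\{\frac{dQ_x(\omega)}{d\omega}\right\}_{\omega=\omega_s}D_s\\ &= 2^{-v}\frac{\mu}{q}D_0 - j2^{-v}\sum_{\substack{s=-\infty\\ s\neq0}}^{\infty}\left\{\frac{dQ_x(\omega)}{d\omega}\right\}_{\omega=\omega_s}D_s. \end{aligned} \tag{112}$$

The third step in (112) is possible since $e(k') = e(k'+p2^v)$ and since the mapping from $k'$ to $e$ is one-to-one. The last step is obtained through use of (69). Thus,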

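$$R_{x\epsilon} = \bar{R}_{x\epsilon} + \tilde{R}_{x\epsilon} \tag{113}$$

where

$$\bar{R}_{x\epsilon} = \mu 2^{-M-2v}D_0 = \begin{cases}\mu 2^{-M-v-1} & \text{for TCR}\\ \mu 2^{-M-v-1}(1-2^{v}) & \text{for TCC,}\end{cases} \tag{114}$$

and

$$\tilde{R}_{x\epsilon} = -jq2^{-M-2v}\sum_{\substack{s=-\infty\\ s\neq0}}^{\infty}\left\{\frac{dQ_x(\omega)}{d\omega}\right\}_{\omega=\omega_s}D_s. \tag{115}$$

Note that this is the same joint moment as for the r.v. $x_1$ associated with multiplier $a_1$ and the error $\epsilon_2$ associated with multiplier $a_2$ when $\rho = 1.0$. In this case it is necessary to make the assignments $v = v_2$ and $s = s_2$, so that $\omega_s = 2\pi s_2/2^{v_2}$.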
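The split in (112)-(114) can be seen numerically. As the input spread grows relative to $q$, the directly computed $E[ke]$ approaches the $s = 0$ term $2^{-v}(\mu/q)D_0$, which is the part kept in (114). The sketch makes the following assumptions, none taken from this section:

- the input model is hypothetical: $x/q$ is Gaussian with mean `b` $= \mu/q$ and standard deviation `sigma_q` $= \sigma/q$, rounded to the nearest integer $k$;
- $a > 0$;
- $e(k')$ is read from the Appendix, (A-1) and (A-11)-(A-13).

All names are hypothetical.

```python
import math


def error_int(kp, qp, v, mode):
    """Integer error e(k') for a > 0, as read from (A-1) (TCC) and (A-11)-(A-13) (TCR)."""
    n = 2**v
    if mode == "TCC":
        return -((kp * qp) % n)
    return n // 2 - ((kp * qp + n // 2) % n)


def rounded_gaussian_pmf(b, sigma_q, span=12):
    """Hypothetical input model: P_x(k) = Prob{k - 1/2 < x/q <= k + 1/2}, x/q ~ N(b, sigma_q^2)."""
    def cdf(t):
        return 0.5 * (1 + math.erf((t - b) / (sigma_q * math.sqrt(2))))
    lo = math.floor(b - span * sigma_q) - 1
    hi = math.ceil(b + span * sigma_q) + 1
    return {k: cdf(k + 0.5) - cdf(k - 0.5) for k in range(lo, hi + 1)}


def e_ke_direct(b, sigma_q, qp, v, mode):
    """E[ke] from the first line of (112), using e(k) = e(k mod 2^v)."""
    pmf = rounded_gaussian_pmf(b, sigma_q)
    return sum(k * error_int(k % 2**v, qp, v, mode) * p for k, p in pmf.items())


def e_ke_asymptotic(b, qp, v, mode):
    """The s = 0 term of (112): 2^(-v) (mu/q) D_0 with mu/q = b."""
    d0 = sum(error_int(kp, qp, v, mode) for kp in range(2**v))
    return b * d0 / 2**v


if __name__ == "__main__":
    for sigma_q in (0.3, 1.0, 3.0, 20.0):
        print(sigma_q, e_ke_direct(0.2, sigma_q, 1, 3, "TCR"), e_ke_asymptotic(0.2, 1, 3, "TCR"))
```

For small `sigma_q` the two numbers differ noticeably, which is the decreasing portion (115) at work. For large `sigma_q` they agree to rounding error.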

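$R_{x_1\epsilon_2}$ for $|\rho| < 1.0$

Let $x_1$ be the input for the multiplier with coefficient $a_1$ and $\epsilon_2$ be the error for the multiplier with coefficient $a_2$. Furthermore, let $|\rho| < 1$. The joint moment of the r.v.'s $x_1$ and $\epsilon_2$ is denoted by $R_{x_1\epsilon_2}$, where

$$R_{x_1\epsilon_2} = E[x_1\epsilon_2] = q2^{-M-v_2}E[k_1e_2] \tag{116}$$

and

$$\begin{aligned} E[k_1e_2] &= \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty} k_1\,e_2(k_2)\,P_{x_1x_2}(k_1,k_2)\\ &= \sum_{k_1=-\infty}^{\infty}\sum_{p_2=-\infty}^{\infty}\sum_{k_2'=0}^{2^{v_2}-1} k_1\,e_2(k_2'+p_22^{v_2})\,P_{x_1x_2}(k_1,k_2'+p_22^{v_2})\\ &= \sum_{k_2'=0}^{2^{v_2}-1} e_2(k_2')\sum_{k_1=-\infty}^{\infty}\sum_{p_2=-\infty}^{\infty} k_1\,P_{x_1x_2}(k_1,k_2'+p_22^{v_2})\\ &= \sum_{k_2'=0}^{2^{v_2}-1} e_2(k_2')\,2^{-v_2}\sum_{s_1=-\infty}^{\infty}\sum_{s_2=-\infty}^{\infty}\left\{-j\frac{\partial Q_{x_1x_2}(\omega_1,\omega_{s_2})}{\partial\omega_1}\right\}_{\omega_1=\omega_{s_1}}\exp(-jk_2'\omega_{s_2})\\ &= 2^{-v_2}\frac{\mu}{q}D_0 - j2^{-v_2}\mathop{\sum\sum}_{(s_1,s_2)\neq(0,0)}\left\{\frac{\partial Q_{x_1x_2}(\omega_1,\omega_{s_2})}{\partial\omega_1}\right\}_{\omega_1=\omega_{s_1}}D_{s_2} \end{aligned} \tag{117}$$

where $\omega_{s_1} = 2\pi s_1$ (the sum over $k_1$ runs over all integers), $\omega_{s_2} = 2\pi s_2/2^{v_2}$, and

$$D_{s_2} = \sum_{k_2'=0}^{2^{v_2}-1} e_2(k_2')\exp(-jk_2'\omega_{s_2}). \tag{118}$$

Thus

$$R_{x_1\epsilon_2} = \bar{R}_{x_1\epsilon_2} + \tilde{R}_{x_1\epsilon_2} \tag{119}$$

where

$$\bar{R}_{x_1\epsilon_2} = \mu 2^{-M-2v_2}D_0 \tag{120}$$

$$\bar{R}_{x_1\epsilon_2} = \begin{cases}\mu 2^{-M-v_2-1} & \text{for TCR}\\ \mu 2^{-M-v_2-1}(1-2^{v_2}) & \text{for TCC,}\end{cases} \tag{121}$$

and

$$\tilde{R}_{x_1\epsilon_2} = -jq2^{-M-2v_2}\mathop{\sum\sum}_{(s_1,s_2)\neq(0,0)}\left\{\frac{\partial Q_{x_1x_2}(\omega_1,\omega_{s_2})}{\partial\omega_1}\right\}_{\omega_1=\omega_{s_1}}D_{s_2}. \tag{122}$$

The joint moment of the r.v. $x$ and the error $\epsilon$ for the same multiplier when $|\rho| < 1.0$ is obtained by simply letting $a_1 = a_2$ in the above.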

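$R_{\epsilon_1\epsilon_2}$ for $\rho = 1.0$

Let $\epsilon_1$ and $\epsilon_2$ be the multiplication errors for the multipliers with coefficients $a_1$ and $a_2$, respectively. Also, let $\rho = 1.0$ and $v_1 \geq v_2$. The joint moment of the r.v.'s $\epsilon_1$ and $\epsilon_2$ is denoted by $R_{\epsilon_1\epsilon_2}$, where

$$R_{\epsilon_1\epsilon_2} = E[\epsilon_1\epsilon_2] = 2^{-2M-v_1-v_2}E[e_1e_2]. \tag{123}$$

The number of distinct states of the pair $(e_1,e_2)$ is determined, in this case, by $v_1$, since $v_1 \geq v_2$. The probability of occurrence of each state is the probability of occurrence of $e_1$ alone. Also, since $\rho = 1.0$, the error value $e_2$ is uniquely determined by the error value $e_1$. Thus,

$$R_{\epsilon_1\epsilon_2} = 2^{-2M-v_1-v_2}\sum_{e_1} e_1\,e_2(k_1'(e_1))\,P_{e_1}(e_1) = \bar{R}_{\epsilon_1\epsilon_2} + \tilde{R}_{\epsilon_1\epsilon_2} \tag{124}$$

where, from (89),

$$\bar{R}_{\epsilon_1\epsilon_2} = 2^{-2M-2v_1-v_2}\sum_{e_1} e_1\,e_2(k_1'(e_1)) = 2^{-2M-2v_1-v_2}\sum_{k_1'=0}^{2^{v_1}-1} e_1(k_1')\,e_2(k_1') \tag{125}$$

and

$$\tilde{R}_{\epsilon_1\epsilon_2} = 2^{-2M-2v_1-v_2}\sum_{e_1} e_1\,e_2(k_1'(e_1))\sum_{\substack{s_1=-\infty\\ s_1\neq0}}^{\infty} Q_x(\omega_{s_1})\exp(-jk_1'(e_1)\,\omega_{s_1}) = 2^{-2M-2v_1-v_2+1}\sum_{k_1'=0}^{2^{v_1}-1} e_1(k_1')\,e_2(k_1')\sum_{s_1=1}^{\infty}\operatorname{Re}\{Q_x(\omega_{s_1})\exp(-jk_1'\omega_{s_1})\}. \tag{126}$$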

Example plots of $e_1$ versus $e_2$ appear in [6]. Unfortunately, attempts at arriving at more suitable closed forms for $\bar{R}_{\epsilon_1\epsilon_2}$ and $\tilde{R}_{\epsilon_1\epsilon_2}$ have been unsuccessful except for the special case $\epsilon_1 = \epsilon_2$ already derived. (For $\epsilon_1 = \epsilon_2$, see the equation for $R_\epsilon(0)$.) This is the reason for reverting to sums, which are more easily computed using the index $k'$ instead of $e$.

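$R_{\epsilon_1\epsilon_2}$ for $|\rho| < 1.0$

Let $\epsilon_1$ and $\epsilon_2$ be the multiplication errors for the multipliers with coefficient values $a_1$ and $a_2$, respectively. Also, let $|\rho| < 1$, as for $R_{x_1\epsilon_2}$. The joint moment of the r.v.'s $\epsilon_1$ and $\epsilon_2$ is $R_{\epsilon_1\epsilon_2}$, where

$$R_{\epsilon_1\epsilon_2} = 2^{-2M-v_1-v_2}E[e_1e_2] = 2^{-2M-v_1-v_2}\sum_{e_1}\sum_{e_2} e_1e_2\,P_{e_1e_2}(e_1,e_2) = \bar{R}_{\epsilon_1\epsilon_2} + \tilde{R}_{\epsilon_1\epsilon_2} \tag{127}$$

where, from (93),

$$\bar{R}_{\epsilon_1\epsilon_2} = \begin{cases}2^{-2M-v_1-v_2-2} & \text{for TCR}\\ 2^{-2M-v_1-v_2-2}(1-2^{v_1})(1-2^{v_2}) & \text{for TCC}\end{cases} \tag{128}$$

and

$$\tilde{R}_{\epsilon_1\epsilon_2} = 2^{-2M-v_1-v_2}\sum_{e_1}\sum_{e_2} e_1e_2\,\tilde{P}_{e_1e_2}(e_1,e_2). \tag{129}$$

Through (95) the term $\tilde{R}_{\epsilon_1\epsilon_2}$ can be written as

$$\tilde{R}_{\epsilon_1\epsilon_2} = 2^{-2M-2v_1-2v_2}\sum_{k_1'=0}^{2^{v_1}-1}\sum_{k_2'=0}^{2^{v_2}-1} e_1(k_1')\,e_2(k_2')\mathop{\sum\sum}_{(s_1,s_2)\neq(0,0)} Q_{x_1x_2}(\omega_{s_1},\omega_{s_2})\exp(-jk_1'\omega_{s_1}-jk_2'\omega_{s_2}). \tag{130}$$

The autocorrelation of the error for one multiplier can be obtained from $R_\epsilon(0)$ for $\rho = 1.0$ and from (130) for $|\rho| < 1.0$ by letting $a_1 = a_2$.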

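VARIANCES AND COVARIANCES

The variance of the multiplication error, $C_\epsilon(0)$, is defined as

$$C_\epsilon(0) = R_\epsilon(0) - \mu_\epsilon^2 = \bar{C}_\epsilon(0) + \tilde{C}_\epsilon(0) \tag{131}$$

where, from (97) and (105),

$$\bar{C}_\epsilon(0) = \bar{R}_\epsilon(0) - \bar{\mu}_\epsilon^2 = \frac{2^{-2M}}{12}\left(1 - 2^{-2v}\right) \tag{132}$$

for TCR and TCC, and

$$\tilde{C}_\epsilon(0) = \tilde{R}_\epsilon(0) - 2\bar{\mu}_\epsilon\tilde{\mu}_\epsilon - \tilde{\mu}_\epsilon^2. \tag{133}$$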

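The function $C_{x_1\epsilon_2}$ is the covariance of the multiplier input $x_1$ (for the multiplier with coefficient $a_1$) with the error $\epsilon_2$ (for the multiplier with coefficient $a_2$). Similarly, the function $C_{x\epsilon}$ is the covariance of a multiplier input with the corresponding error. Both can be defined as follows (dropping the subscripts in the first case for brevity):

$$C_{x\epsilon} = R_{x\epsilon} - \mu_x\mu_\epsilon = \bar{C}_{x\epsilon} + \tilde{C}_{x\epsilon} \tag{134}$$

where, from (72), (97), (113), and (119),

$$\bar{C}_{x\epsilon} = \bar{R}_{x\epsilon} - \mu_x\bar{\mu}_\epsilon = 0 \tag{135}$$

for TCR and TCC, and

$$\tilde{C}_{x\epsilon} = \tilde{R}_{x\epsilon} - \mu_x\tilde{\mu}_\epsilon. \tag{136}$$

The covariance $C_{\epsilon_1\epsilon_2}$ of the multiplication error $\epsilon_1$ (for the multiplier with coefficient $a_1$) with the multiplication error $\epsilon_2$ (for the multiplier with coefficient $a_2$) for $|\rho| < 1.0$ is defined as

$$C_{\epsilon_1\epsilon_2} = R_{\epsilon_1\epsilon_2} - \mu_{\epsilon_1}\mu_{\epsilon_2} = \bar{C}_{\epsilon_1\epsilon_2} + \tilde{C}_{\epsilon_1\epsilon_2} \tag{137}$$

where, from (97) and (127),

$$\bar{C}_{\epsilon_1\epsilon_2} = \bar{R}_{\epsilon_1\epsilon_2} - \bar{\mu}_{\epsilon_1}\bar{\mu}_{\epsilon_2} = 0 \tag{138}$$

for TCR and TCC, and

$$\tilde{C}_{\epsilon_1\epsilon_2} = \tilde{R}_{\epsilon_1\epsilon_2} - \bar{\mu}_{\epsilon_1}\tilde{\mu}_{\epsilon_2} - \tilde{\mu}_{\epsilon_1}\bar{\mu}_{\epsilon_2} - \tilde{\mu}_{\epsilon_1}\tilde{\mu}_{\epsilon_2}. \tag{139}$$

When $\rho = 1.0$, the asymptotic portion $\bar{C}_{\epsilon_1\epsilon_2}$ is generally not equal to zero. However, the decreasing portion $\tilde{C}_{\epsilon_1\epsilon_2}$ has the same form as (139).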

The following comments regard the asymptotic correlation coefficient (ACC) $\bar{\rho}_{\epsilon_1\epsilon_2}$, defined for $\rho = 1.0$ as

$$\bar{\rho}_{\epsilon_1\epsilon_2} = \frac{\bar{C}_{\epsilon_1\epsilon_2}}{\sqrt{\bar{C}_{\epsilon_1}(0)\,\bar{C}_{\epsilon_2}(0)}}. \tag{140}$$

(A similar form of the ACC was examined by Girard [6] and by Parker and Girard [7]. They assumed a zero-mean error, which is not the case for TCR.) First consider the case when $v_1 = v_2 = v$. For a given $v$ the ACC values vary from a maximum of 1.0 to values which can become quite small but which are nonzero. Table 1 shows the computed ACC values which result when $v = 5$ for two coefficient values $Q_1'$. Note that the same ACC values result for both cases, but they are permuted with respect to the values of $Q_2'$. This permutation property appears to be a general one for all odd values of $Q_1'$. Also given in the table are the computed mean and standard deviation of these ACC values, which are therefore the same for all odd values of $Q_1'$. Table 2 shows the computed mean and standard deviation of the ACC values as a function of $v$ for $v$ up to 10. The decrease in the standard deviation as $v$ increases results from the introduction of more ACC values which are closer to zero.
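The ACC of (140) can be computed from (125), (98), and (132), and the sketch below is intended to produce values of the kind listed in Tables 1-3. It is a minimal version under stated assumptions:

- TCR only, with $\rho = 1.0$, $v_1 \ge v_2$, and positive coefficients;
- $e(k')$ is read from the Appendix, (A-11)-(A-13);
- the factor $2^{-2M}$ cancels in (140), so $M = 0$ is used.

`acc_tcr` and `error_tcr` are hypothetical names. Two hand-checked cases:

- $Q_1' = 1$, $Q_2' = 31$, $v = 5$ gives exactly $-9/11 \approx -0.8182$, the last row of Table 1;
- $v = 2$ gives ACC values 1 and 0.2, whose mean 0.6 matches Table 2.

```python
import math
from fractions import Fraction


def error_tcr(kp, qp, v):
    """TCR integer error e(k') for a > 0, as read from (A-11)-(A-13)."""
    n = 2**v
    return n // 2 - ((kp * qp + n // 2) % n)


def acc_tcr(qp1, v1, qp2, v2):
    """ACC (140) for TCR, rho = 1.0, v1 >= v2, a1, a2 > 0; 2^(-2M) cancels, so M = 0."""
    if v1 < v2:
        raise ValueError("(125) as written assumes v1 >= v2")
    # (125): bar R = 2^(-2 v1 - v2) * sum over k'1 of e1(k'1) e2(k'1 mod 2^v2)
    total = sum(error_tcr(k, qp1, v1) * error_tcr(k % 2**v2, qp2, v2) for k in range(2**v1))
    r_bar = Fraction(total, 2**(2 * v1 + v2))
    mu1, mu2 = Fraction(1, 2**(v1 + 1)), Fraction(1, 2**(v2 + 1))  # (98), TCR
    c1 = (1 - Fraction(1, 4**v1)) / 12  # (132)
    c2 = (1 - Fraction(1, 4**v2)) / 12
    cov = r_bar - mu1 * mu2  # asymptotic covariance: bar R minus bar mu1 * bar mu2
    return float(cov) / math.sqrt(float(c1 * c2))


if __name__ == "__main__":
    # Table 1 style: v = 5, Q'1 = 1, every odd Q'2.
    for qp2 in range(1, 32, 2):
        print(qp2, round(acc_tcr(1, 5, qp2, 5), 4))
    # Table 3 style: v2 = 5, Q'2 = 1, increasing v1.
    for v1 in range(5, 11):
        print(v1, [round(acc_tcr(qp1, v1, 1, 5), 6) for qp1 in (17, 3, 19, 15)])
```

Exact fractions are used for the moments so that the only rounding happens in the final square root and division.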

The next case to consider is when $v_1 > v_2$. Table 3 shows some example ACC values which illustrate the similarities and differences resulting from the different coefficient values. The common behavior is for the ACC values to start at some initial value (for $v_1 = 5$ in this case), then eventually decrease by a factor approaching 2.0 for each integer increase in $v_1$. The differences in the ACC values can be seen in the four cases which are shown. The cases were chosen according to the signs of the ACC values for $v_1 = 5$ and $v_1 = 6$, respectively. Four possibilities (++, +-, -+, and --) are represented here. For $v_1 > 6$, no polarity changes are indicated, and the decrease in absolute value of the ACC by factors close to 2.0 takes over.

VI. GAUSSIAN FORMS

In this section, Gaussian forms of the decreasing terms of the error statistics are written out. They were obtained by substituting the Gaussian forms of $Q_{x_1x_2}$ and $Q_x$ (from (68) and (83)) into the decreasing terms defined in the last section. In the interest of a simplified presentation, a shorthand notation is employed with the following definitions:

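$$\begin{aligned} B &= \mu/q\\ G &= \begin{cases}(-1)^s & \text{for TCR}\\ 1 & \text{for TCC}\end{cases}\\ H &= \begin{cases}1 & \text{for TCR}\\ 1-2^{v} & \text{for TCC}\end{cases}\\ V &= 1/(\cos(\omega_s\lambda)-1)\\ W &= \operatorname{sinc}(\omega_s/2\pi)\\ \omega_s &= 2\pi s/2^{v}. \end{aligned} \tag{141}$$

Subscripts are used with $G$, $H$, $V$, $W$, and $\omega_s$ whenever more than one coefficient is involved in the formula of the statistic. These have the form

$$\begin{aligned} G_i &= \begin{cases}(-1)^{s_i} & \text{for TCR}\\ 1 & \text{for TCC}\end{cases}\\ H_i &= \begin{cases}1 & \text{for TCR}\\ 1-2^{v_i} & \text{for TCC}\end{cases}\\ V_i &= 1/(\cos(\omega_{s_i}\lambda_i)-1)\\ W_i &= \operatorname{sinc}(\omega_{s_i}/2\pi)\\ \omega_{s_i} &= 2\pi s_i/2^{v_i} \end{aligned} \tag{142}$$

for $i = 1, 2$. There is a special term $\omega_{s_1}$ used for $\tilde{R}_{x_1\epsilon_2}$ for $|\rho| < 1.0$. In this case $\omega_{s_1} = 2\pi s_1$.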


It is assumed that the coefficients have fixed values. Thus, whenever the parameter $k'$ appears, a specific transformation from $k'$ to $e$ is implied according to (58). The parameter $\lambda$ is the coefficient-related value obtained from (104).

    PROBABILITIES

$$\tilde{P}_e(e) = 2^{1-v}\sum_{s=1}^{\infty} W\exp(A\omega_s^2)\cos(\omega_s[B-k']) \tag{143}$$
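(143) can be checked against a direct fold of a rounded Gaussian p.m.f. The sketch rests on assumptions that are not stated in this section:

- the input index $k$ is the nearest integer to a Gaussian $x/q$ with mean $B$ and standard deviation `sigma_q`;
- the constant $A$ in $\exp(A\omega_s^2)$ equals $-(\sigma/q)^2/2$, which is what a Gaussian characteristic function gives;
- $W$ is taken as $\sin(\omega_s/2)/(\omega_s/2)$, that is, $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$.

Function names are hypothetical.

```python
import math


def folded_pmf(b, sigma_q, v, span=12):
    """P_e indexed by k': fold P_x(k) = Prob{k - 1/2 < x/q <= k + 1/2} modulo 2^v.

    Assumed input model: x/q ~ N(b, sigma_q^2), rounded to the nearest integer k.
    """
    n = 2**v
    def cdf(t):
        return 0.5 * (1 + math.erf((t - b) / (sigma_q * math.sqrt(2))))
    out = [0.0] * n
    lo = math.floor(b - span * sigma_q) - 1
    hi = math.ceil(b + span * sigma_q) + 1
    for k in range(lo, hi + 1):
        out[k % n] += cdf(k + 0.5) - cdf(k - 0.5)
    return out


def p_tilde(kp, b, sigma_q, v, s_max=400):
    """Decreasing term (143), assuming A = -sigma_q**2 / 2 and W = sin(w/2) / (w/2)."""
    a_coef = -0.5 * sigma_q**2
    total = 0.0
    for s in range(1, s_max + 1):
        w = 2 * math.pi * s / 2**v
        sinc = math.sin(w / 2) / (w / 2)
        total += sinc * math.exp(a_coef * w * w) * math.cos(w * (b - kp))
    return 2**(1 - v) * total


if __name__ == "__main__":
    v, b, sigma_q = 3, 0.3, 0.8
    direct = folded_pmf(b, sigma_q, v)
    for kp in range(2**v):
        print(kp, f"{direct[kp]:.12f}", f"{2**-v + p_tilde(kp, b, sigma_q, v):.12f}")
```

With `sigma_q` below about 1, the folded probabilities are far from uniform, so agreement here exercises the decreasing term and not just $\bar{P} = 2^{-v}$.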

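$$\begin{aligned}\tilde{P}_{e_1e_2}(e_1,e_2) = 2^{1-v_1-v_2}\Bigg\{&\sum_{s_1=1}^{\infty} W_1\exp(A\omega_{s_1}^2)\cos(\omega_{s_1}[B-k_1']) + \sum_{s_2=1}^{\infty} W_2\exp(A\omega_{s_2}^2)\cos(\omega_{s_2}[B-k_2'])\\ &+ \sum_{s_1=1}^{\infty}\sum_{s_2=1}^{\infty} W_1W_2\Big[\exp\{A(\omega_{s_1}^2+2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}\cos(\omega_{s_1}[B-k_1']+\omega_{s_2}[B-k_2'])\\ &\qquad+\exp\{A(\omega_{s_1}^2-2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}\cos(\omega_{s_1}[B-k_1']-\omega_{s_2}[B-k_2'])\Big]\Bigg\}\end{aligned} \tag{144}$$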

    MEAN ERROR

    TCR and TCC

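$$\tilde{\mu}_\epsilon = 2^{-M-v}\sum_{\substack{s=1\\ s\not\equiv0 \bmod 2^v}}^{\infty} GVW\exp(A\omega_s^2)\left[\cos(\omega_s[B+\lambda\operatorname{sgn}a]) - \cos(\omega_sB)\right] \tag{145}$$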

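$\tilde{R}_\epsilon(0)$

TCR

$$\tilde{R}_\epsilon(0) = -2^{-2M-2v+1}\sum_{\substack{s=1\\ s\not\equiv0 \bmod 2^v}}^{\infty}(-1)^s VW\exp(A\omega_s^2)\cos(\omega_sB) \tag{146}$$

TCC

$$\tilde{R}_\epsilon(0) = -2^{-2M-2v+1}\sum_{\substack{s=1\\ s\not\equiv0 \bmod 2^v}}^{\infty} VW\exp(A\omega_s^2)\left[2^{v-1}\cos(\omega_s[B+\lambda\operatorname{sgn}a]) + (1-2^{v-1})\cos(\omega_sB)\right] \tag{147}$$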
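(145)-(147) can be checked by computing the mean and second moment directly from a folded p.m.f. and comparing them with the asymptotic parts (98) and (106) plus the Gaussian decreasing parts. The sketch takes $M = 0$ and $a > 0$. It assumes a hypothetical input model in which $k$ is the nearest integer to a Gaussian $x/q$ with mean $B$ and standard deviation `sigma_q`, together with $A = -(\sigma/q)^2/2$ and $W = \sin(\omega_s/2)/(\omega_s/2)$. It reads $e(k')$ from the Appendix, (A-1) and (A-11)-(A-13). Function names are hypothetical.

```python
import math


def error_int(kp, qp, v, mode):
    """Integer error e(k') for a > 0, as read from (A-1) (TCC) and (A-11)-(A-13) (TCR)."""
    n = 2**v
    if mode == "TCC":
        return -((kp * qp) % n)
    return n // 2 - ((kp * qp + n // 2) % n)


def folded_pmf(b, sigma_q, v, span=12):
    """Fold P_x(k) = Prob{k - 1/2 < x/q <= k + 1/2}, x/q ~ N(b, sigma_q^2), modulo 2^v."""
    n = 2**v
    def cdf(t):
        return 0.5 * (1 + math.erf((t - b) / (sigma_q * math.sqrt(2))))
    out = [0.0] * n
    for k in range(math.floor(b - span * sigma_q) - 1, math.ceil(b + span * sigma_q) + 2):
        out[k % n] += cdf(k + 0.5) - cdf(k - 0.5)
    return out


def direct_moments(b, sigma_q, v, qp, mode):
    """mu_eps = 2^(-v) E[e] and R_eps(0) = 2^(-2v) E[e^2], from (97) and (105) with M = 0."""
    n = 2**v
    p = folded_pmf(b, sigma_q, v)
    errs = [error_int(kp, qp, v, mode) for kp in range(n)]
    mean = sum(e * pk for e, pk in zip(errs, p)) / n
    second = sum(e * e * pk for e, pk in zip(errs, p)) / n**2
    return mean, second


def gaussian_moments(b, sigma_q, v, qp, mode, s_max=400):
    """(98) + (145) and (106) + (146)/(147), with M = 0 and sgn a = +1."""
    n = 2**v
    lam = pow(qp, -1, n)
    a_coef = -0.5 * sigma_q**2  # assumed: A = -(sigma/q)^2 / 2
    if mode == "TCR":
        mean, second = 2.0**(-v - 1), 1 / 12 + 2.0**(-2 * v) / 6
    else:
        mean, second = 0.5 * (2.0**(-v) - 1), 1 / 3 - 2.0**(-v - 1) + 2.0**(-2 * v) / 6
    for s in range(1, s_max + 1):
        if s % n == 0:
            continue  # W = 0 at these s, so the terms vanish
        w = 2 * math.pi * s / n
        g = (-1)**s if mode == "TCR" else 1
        vv = 1 / (math.cos(w * lam) - 1)
        ww = math.sin(w / 2) / (w / 2)
        damp = math.exp(a_coef * w * w)
        c_shift, c0 = math.cos(w * (b + lam)), math.cos(w * b)
        mean += 2.0**(-v) * g * vv * ww * damp * (c_shift - c0)  # (145)
        if mode == "TCR":
            second += -2.0**(-2 * v + 1) * (-1)**s * vv * ww * damp * c0  # (146)
        else:
            second += -2.0**(-2 * v + 1) * vv * ww * damp * (
                2**(v - 1) * c_shift + (1 - 2**(v - 1)) * c0)  # (147)
    return mean, second


if __name__ == "__main__":
    for mode in ("TCR", "TCC"):
        print(mode, direct_moments(0.3, 0.5, 3, 3, mode), gaussian_moments(0.3, 0.5, 3, 3, mode))
```

Terms with $s \equiv 0 \bmod 2^v$ are skipped, matching the sum limits in (145)-(147). At those $s$, $W$ vanishes while $V$ is undefined.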


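$\tilde{R}_{x\epsilon}$ for $\rho = 1.0$

TCR and TCC

$$\begin{aligned}\tilde{R}_{x\epsilon} = {}& 2^{-K-M-v}H\sum_{\substack{s=1\\ s\equiv0 \bmod 2^v}}^{\infty}\exp(A\omega_s^2)\cos(\omega_s/2)\frac{\sin(\omega_sB)}{\omega_s}\\ &+ 2^{-K-M-v}\sum_{\substack{s=1\\ s\not\equiv0 \bmod 2^v}}^{\infty}\exp(A\omega_s^2)\Big\{\Big[W\Big(2A\omega_s-\frac{1}{\omega_s}\Big)+\frac{\cos(\omega_s/2)}{\omega_s}\Big]VG\left[\sin(\omega_s[B+\lambda\operatorname{sgn}a])-\sin(\omega_sB)\right]\\ &\qquad + GBVW\left[\cos(\omega_s[B+\lambda\operatorname{sgn}a])-\cos(\omega_sB)\right]\Big\}\end{aligned} \tag{148}$$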
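Because (148) is intricate, a numerical cross-check is useful. Dividing (148) by its leading factor $2^{-K-M-v}$ (that is, $q2^{-M-v}$ with $q = 2^{-K}$) and adding the $s = 0$ term of (112) should give $E[ke]$ itself. The check is therefore independent of $q$. The sketch makes these assumptions:

- the input model is hypothetical: $k$ is the nearest integer to a Gaussian $x/q$ with mean $B$ and standard deviation `sigma_q`;
- $A = -(\sigma/q)^2/2$ and $W = \sin(\omega_s/2)/(\omega_s/2)$;
- $a > 0$;
- $e(k')$ is read from the Appendix, (A-1) and (A-11)-(A-13).

Names are hypothetical.

```python
import math


def error_int(kp, qp, v, mode):
    """Integer error e(k') for a > 0, as read from (A-1) (TCC) and (A-11)-(A-13) (TCR)."""
    n = 2**v
    if mode == "TCC":
        return -((kp * qp) % n)
    return n // 2 - ((kp * qp + n // 2) % n)


def e_ke_direct(b, sigma_q, qp, v, mode, span=12):
    """E[ke] summed directly over a rounded Gaussian p.m.f. (hypothetical input model)."""
    def cdf(t):
        return 0.5 * (1 + math.erf((t - b) / (sigma_q * math.sqrt(2))))
    lo = math.floor(b - span * sigma_q) - 1
    hi = math.ceil(b + span * sigma_q) + 1
    return sum(k * error_int(k % 2**v, qp, v, mode) * (cdf(k + 0.5) - cdf(k - 0.5))
               for k in range(lo, hi + 1))


def e_ke_gaussian(b, sigma_q, qp, v, mode, s_max=400):
    """2^(-v)(mu/q)D_0 plus the series of (148) with the factor 2^(-K-M-v) removed."""
    n = 2**v
    lam = pow(qp, -1, n)
    a_coef = -0.5 * sigma_q**2  # assumed: A = -(sigma/q)^2 / 2
    h = 1 if mode == "TCR" else 1 - n
    total = b * h / 2  # 2^(-v) (mu/q) D_0 with D_0 = 2^(v-1) H
    for s in range(1, s_max + 1):
        w = 2 * math.pi * s / n
        damp = math.exp(a_coef * w * w)
        if s % n == 0:  # first sum of (148)
            total += h * damp * math.cos(w / 2) * math.sin(w * b) / w
            continue
        g = (-1)**s if mode == "TCR" else 1
        vv = 1 / (math.cos(w * lam) - 1)
        ww = math.sin(w / 2) / (w / 2)
        dsin = math.sin(w * (b + lam)) - math.sin(w * b)
        dcos = math.cos(w * (b + lam)) - math.cos(w * b)
        total += damp * ((ww * (2 * a_coef * w - 1 / w) + math.cos(w / 2) / w) * vv * g * dsin
                         + g * b * vv * ww * dcos)
    return total


if __name__ == "__main__":
    for mode in ("TCR", "TCC"):
        print(mode, e_ke_direct(0.3, 0.5, 5, 3, mode), e_ke_gaussian(0.3, 0.5, 5, 3, mode))
```

The first sum of (148) survives even though $W = 0$ at $s \equiv 0 \bmod 2^v$. The reason is that it comes from the derivative of $W$, which does not vanish there.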

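$\tilde{R}_{x_1\epsilon_2}$ for $|\rho| < 1.0$

TCR and TCC

$$\begin{aligned}\tilde{R}_{x_1\epsilon_2} = {}& 2^{-K-M-v_2}\sum_{\substack{s_2=1\\ s_2\not\equiv0 \bmod 2^{v_2}}}^{\infty} G_2W_2\exp(A\omega_{s_2}^2)\Big[2A\rho\omega_{s_2}V_2\left\{\sin(\omega_{s_2}[B+\lambda_2\operatorname{sgn}a_2])-\sin(\omega_{s_2}B)\right\}\\ &\qquad + BV_2\left\{\cos(\omega_{s_2}[B+\lambda_2\operatorname{sgn}a_2])-\cos(\omega_{s_2}B)\right\}\Big]\\ &+ 2^{-K-M-v_2}H_2\sum_{s_1=1}^{\infty}(-1)^{s_1}\exp(A\omega_{s_1}^2)\frac{\sin(\omega_{s_1}B)}{\omega_{s_1}}\\ &+ 2^{-K-M-v_2}\sum_{s_1=1}^{\infty}\sum_{\substack{s_2=1\\ s_2\not\equiv0 \bmod 2^{v_2}}}^{\infty}(-1)^{s_1}G_2V_2W_2\frac{\exp\{A(\omega_{s_1}^2+2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}}{\omega_{s_1}}\left[\sin(B[\omega_{s_1}+\omega_{s_2}]+\omega_{s_2}\lambda_2\operatorname{sgn}a_2)-\sin(B[\omega_{s_1}+\omega_{s_2}])\right]\\ &+ 2^{-K-M-v_2}\sum_{s_1=1}^{\infty}\sum_{\substack{s_2=1\\ s_2\not\equiv0 \bmod 2^{v_2}}}^{\infty}(-1)^{s_1}G_2V_2W_2\frac{\exp\{A(\omega_{s_1}^2-2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}}{\omega_{s_1}}\left[\sin(B[\omega_{s_1}-\omega_{s_2}]-\omega_{s_2}\lambda_2\operatorname{sgn}a_2)-\sin(B[\omega_{s_1}-\omega_{s_2}])\right]\end{aligned} \tag{149}$$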


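$\tilde{R}_{\epsilon_1\epsilon_2}$ for $\rho = 1.0$

TCR and TCC and $v_1 \geq v_2$

$$\tilde{R}_{\epsilon_1\epsilon_2} = 2^{-2M-2v_1-v_2+1}\sum_{k_1'=0}^{2^{v_1}-1} e_1(k_1')\,e_2(k_1')\sum_{s_1=1}^{\infty} W_1\exp(A\omega_{s_1}^2)\cos(\omega_{s_1}[B-k_1']) \tag{150}$$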

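$\tilde{R}_{\epsilon_1\epsilon_2}$ for $|\rho| < 1.0$

TCR and TCC

$$\begin{aligned}\tilde{R}_{\epsilon_1\epsilon_2} = {}& 2^{-2M-v_1-v_2-1}H_2\sum_{\substack{s_1=1\\ s_1\not\equiv0\bmod 2^{v_1}}}^{\infty}G_1V_1W_1\exp(A\omega_{s_1}^2)\left[\cos(\omega_{s_1}[B+\lambda_1\operatorname{sgn}a_1])-\cos(\omega_{s_1}B)\right]\\ &+ 2^{-2M-v_1-v_2-1}H_1\sum_{\substack{s_2=1\\ s_2\not\equiv0\bmod 2^{v_2}}}^{\infty}G_2V_2W_2\exp(A\omega_{s_2}^2)\left[\cos(\omega_{s_2}[B+\lambda_2\operatorname{sgn}a_2])-\cos(\omega_{s_2}B)\right]\\ &+ 2^{-2M-v_1-v_2-1}\sum_{\substack{s_1=1\\ s_1\not\equiv0\bmod 2^{v_1}}}^{\infty}\sum_{\substack{s_2=1\\ s_2\not\equiv0\bmod 2^{v_2}}}^{\infty}G_1G_2V_1V_2W_1W_2\exp\{A(\omega_{s_1}^2+2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}\\ &\qquad\times\big[\cos(B[\omega_{s_1}+\omega_{s_2}]+\omega_{s_1}\lambda_1\operatorname{sgn}a_1+\omega_{s_2}\lambda_2\operatorname{sgn}a_2)-\cos(B[\omega_{s_1}+\omega_{s_2}]+\omega_{s_1}\lambda_1\operatorname{sgn}a_1)\\ &\qquad\quad-\cos(B[\omega_{s_1}+\omega_{s_2}]+\omega_{s_2}\lambda_2\operatorname{sgn}a_2)+\cos(B[\omega_{s_1}+\omega_{s_2}])\big]\\ &+ 2^{-2M-v_1-v_2-1}\sum_{\substack{s_1=1\\ s_1\not\equiv0\bmod 2^{v_1}}}^{\infty}\sum_{\substack{s_2=1\\ s_2\not\equiv0\bmod 2^{v_2}}}^{\infty}G_1G_2V_1V_2W_1W_2\exp\{A(\omega_{s_1}^2-2\rho\omega_{s_1}\omega_{s_2}+\omega_{s_2}^2)\}\\ &\qquad\times\big[\cos(B[\omega_{s_1}-\omega_{s_2}]+\omega_{s_1}\lambda_1\operatorname{sgn}a_1-\omega_{s_2}\lambda_2\operatorname{sgn}a_2)-\cos(B[\omega_{s_1}-\omega_{s_2}]+\omega_{s_1}\lambda_1\operatorname{sgn}a_1)\\ &\qquad\quad-\cos(B[\omega_{s_1}-\omega_{s_2}]-\omega_{s_2}\lambda_2\operatorname{sgn}a_2)+\cos(B[\omega_{s_1}-\omega_{s_2}])\big]\end{aligned} \tag{151}$$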


VII. DISCUSSION

It is appropriate at this point to compare the results derived in this report with the error models presently used in digital filter design. Obviously, simplification of the derived equations is not possible when $\sigma/q$ is small enough that the decreasing form of each statistic cannot be ignored. For the purpose of this discussion, $\sigma/q$ will be assumed large enough that the decreasing forms are negligible.

Assume a filter structure where the coefficient values are fixed. Associated with each coefficient value is the parameter $v$, which is the effective word length of the associated multiplication error. The errors can take on each of $2^v$ discrete values, which are uniformly distributed. The probability of occurrence of each value is (almost) equal to $2^{-v}$. The mean error, $\mu_\epsilon$, is

$$\mu_\epsilon = \begin{cases}2^{-M-v-1} & \text{for TCR}\\ 2^{-M-1}(2^{-v}-1) & \text{for TCC.}\end{cases} \tag{152}$$

The error variance, $C_\epsilon(0)$, is

$$C_\epsilon(0) = \frac{2^{-2M}}{12}\left(1-2^{-2v}\right). \tag{153}$$

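An idealized continuous uniformly distributed error $\epsilon_c$ is often used in filter design. This error has the following properties:

$$\mu_{\epsilon_c} = \begin{cases}0 & \text{for TCR}\\ -2^{-M-1} & \text{for TCC,}\end{cases} \tag{154}$$

and

$$C_{\epsilon_c}(0) = \frac{2^{-2M}}{12}. \tag{155}$$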

Note that, in comparing (154) with (152) and (155) with (153), the idealized continuous model can only be assumed if $v$ is large enough. Thus, for example, if the word lengths $K$, $L$, $M$, and $N$ are equal, the coefficient values $\pm 1/2$ (or $v = 1$) would yield the largest difference between the discrete and continuous models. The difference becomes smaller for coefficient values of $\pm 1/4$ and $\pm 3/4$ (or $v = 2$), and so on for larger values of $v$.
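The gap between the discrete model (152)-(153) and the continuous model (154)-(155) can be tabulated exactly. The sketch below is a minimal illustration, assuming $M = 0$ and reading the set of integer error values from the Appendix: (A-1) for TCC and (A-11)-(A-13) for TCR. Any odd $Q'$ only permutes these values, so $Q' = 1$ is used. Function names are hypothetical.

```python
from fractions import Fraction


def error_int(kp, v, mode):
    """Integer error values over one period with Q' = 1; any odd Q' gives the same set."""
    n = 2**v
    if mode == "TCC":
        return -(kp % n)
    return n // 2 - ((kp + n // 2) % n)


def discrete_model(v, mode, m=0):
    """Mean and variance of the 2^v equally likely errors eps = 2^(-M-v) e, cf. (152)-(153)."""
    n = 2**v
    eps = [Fraction(error_int(kp, v, mode), 2**(m + v)) for kp in range(n)]
    mean = sum(eps) / n
    var = sum((x - mean) ** 2 for x in eps) / n
    return mean, var


def continuous_model(mode, m=0):
    """Idealized continuous uniform error, (154)-(155)."""
    mean = Fraction(0) if mode == "TCR" else -Fraction(1, 2**(m + 1))
    return mean, Fraction(1, 12 * 4**m)


if __name__ == "__main__":
    for v in range(1, 6):
        for mode in ("TCR", "TCC"):
            dm, dv = discrete_model(v, mode)
            cm, cv = continuous_model(mode)
            print(v, mode, float(dm), float(cm), float(dv), float(cv))
```

The variance gap works out to exactly $2^{-2M}/(12\cdot 4^{v})$, and the mean gap to $2^{-M-v-1}$ in both modes. Both shrink by a factor of 4 and 2, respectively, per unit increase in $v$.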

Other results are the following. The covariance $C_{x\epsilon}$ is zero, in agreement with the continuous model. This means the multiplier input is uncorrelated with the error. The error covariances are:

$$C_\epsilon(\tau) = \begin{cases}C_\epsilon(0) & \text{for } \tau = 0\\ 0 & \text{for } \tau \neq 0\end{cases} \tag{156}$$

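and

$$C_{\epsilon_1\epsilon_2} = \begin{cases}\bar{C}_{\epsilon_1\epsilon_2} & \text{for } \rho = 1.0\\ 0 & \text{for } |\rho| < 1.0.\end{cases} \tag{157}$$

These latter results imply that these contributions to the FIR filter variance and output spectrum are white noise in nature.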

The consequences of small values of $\sigma/q$ on the statistics will be explored in a subsequent report. Particular attention will be paid to those ranges of values of $\sigma/q$ over which the asymptotic model can be assumed.


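$\bar{\rho}_{\epsilon_1\epsilon_2}$    Values of $Q_2'$ which result in the indicated values of $\bar{\rho}_{\epsilon_1\epsilon_2}$
                 $Q_1' = 1$        $Q_1' = 3$
 1.0000          1                 3
 0.3548          3, 11             1, 9
 0.2023          5, 13             7, 15
 0.1202          7, 23             5, 21
 0.0616          9, 25             11, 27
-0.0674          15                13
 0.2493          17                19
-0.0205          19, 27            17, 25
-0.1730          21, 29            23, 31
-0.8182          31                29

NOTES: $v = 5$. Mean of $\bar{\rho}_{\epsilon_1\epsilon_2}$ = 0.0909; standard deviation = 0.3566.

Table 1. Examples of $\bar{\rho}_{\epsilon_1\epsilon_2}$ for TCR.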

$v$     Mean of $\bar{\rho}_{\epsilon_1\epsilon_2}$     Standard deviation
1       1.0          0.0
2       0.6          0.4
3       0.33333      0.4762
4       0.17647      0.4216
5       0.09091      0.3566
6       0.04615      0.2737
7       0.02326      0.2052
8       0.01167      0.1497
9       0.00585      0.1086
10      0.00293      0.0777

Table 2. Computed mean and standard deviation of $\bar{\rho}_{\epsilon_1\epsilon_2}$ for TCR as a function of $v_1 = v_2 = v$.

$v_1$    $Q_1' = 17$ (++)    $Q_1' = 3$ (+-)    $Q_1' = 19$ (-+)    $Q_1' = 15$ (--)
5        0.249267            0.354839           -0.020528           -0.067449
6        0.124588           -0.104067            0.083547           -0.127519
7        0.062288           -0.052029            0.041770           -0.063754
8        0.031143           -0.026014            0.020884           -0.031876
9        0.015572           -0.013007            0.010442           -0.015938
10       0.007786           -0.006503            0.005221           -0.007969

NOTE: $v_2 = 5$, $Q_2' = 1$.

Table 3. Examples of the behavior of $\bar{\rho}_{\epsilon_1\epsilon_2}$ for TCR when $v_1 \geq v_2$.


[Figure 2 (diagram): the multiplier input $x$ ($K$ bits) and the coefficient $a$ ($L$ bits), each with a sign bit, and the product split into the $M$-bit portion left over after the roundoff or chopping operation and the $N$-bit portion discarded in that operation; a field of $v$ bits is also marked.]

Figure 2. Relationships among the various word lengths.


VIII. REFERENCES

[1] L. R. Rabiner and B. Gold, Theory and application of digital signal processing, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1975, ch. 5, pp. 295-355.

[2] A. V. Oppenheim and R. W. Schafer, Digital signal processing, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1975, ch. 9, pp. 404-479.

[3] L. P. Mulcahy, "Digital fixed-point multiplication error structure and some consequences," in Conf. Rec., 1976 IEEE Conf. Acoust., Speech, Signal Processing, 1976, pp. 529-532.

[4] S. Bochner, Harmonic analysis and the theory of probability, Berkeley and Los Angeles, CA: University of California Press, 1960, p. 32.

[5] A. Papoulis, Probability, random variables, and stochastic processes, New York, NY: McGraw-Hill, Inc., 1965, pp. 157 and 209.

[6] P. E. Girard, Correlated noise effects of structure in digital filters, Ph.D. dissertation, Naval Postgraduate School, Monterey, CA, June 1974.

[7] S. R. Parker and P. E. Girard, "Correlated noise due to roundoff in fixed point digital filters," IEEE Trans. Circuits Syst., vol. CAS-23, pp. 204-211, April 1976.

APPENDIX A

Evaluation of $D_s$ proceeds as follows. Consider the TCC case first, where $a > 0$. Then by using (58) in (101),

$$D_s = \sum_{k'=0}^{2^v-1} e(k')\exp(-jk'\omega_s) = -\sum_{k'=0}^{2^v-1}[k'Q']_{2^v}\exp(-jk'\omega_s) \tag{A-1}$$

for $\omega_s = 2\pi s/2^v$, where $s = 0, \pm1, \pm2, \ldots$, $[\cdot]_{2^v}$ denotes reduction modulo $2^v$, and $Q'$ is the coefficient-dependent value which is derived from the value of $a$ through (47), (51). Let the variable $\beta$ be defined by the relation

$$\beta \equiv k'Q' \bmod 2^{v} \tag{A-2}$$

which, since $Q'$ is an odd positive integer, implies the relation

$$k' \equiv \beta\lambda \bmod 2^{v} \tag{A-3}$$

where the odd constant integer $\lambda \in [0, 2^v)$ is the unique inverse of $Q'$ which satisfies the relation

$$1 \equiv Q'\lambda \bmod 2^{v}. \tag{A-4}$$

Since the relationship between $k'$ and $\beta$ is one-to-one,

$$D_s = -\sum_{\beta=0}^{2^v-1}\beta\exp(-j\omega_s[\beta\lambda]_{2^v}) = -\sum_{\beta=0}^{2^v-1}\beta\exp(-j\omega_s\beta\lambda). \tag{A-5}$$

Whenever $s \equiv 0 \bmod 2^v$, this form simplifies to

$$D_s = -\sum_{\beta=0}^{2^v-1}\beta = 2^{v-1}(1-2^{v}). \tag{A-6}$$

This form can be evaluated for other values of $s$ by writing it as

$$D_s = \frac{1}{j\lambda}\frac{d}{d\omega_s}\left\{\sum_{\beta=0}^{2^v-1}\exp(-j\omega_s\beta\lambda)\right\}. \tag{A-7}$$

The sum is a power series in $\exp(-j\omega_s\lambda)$, which is easily evaluated. By then taking the derivative, the result is

$$D_s = 2^{v-1}\,\frac{\exp(j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} \tag{A-8}$$

for $s \not\equiv 0 \bmod 2^v$. The computation of $D_s$ for $a < 0$ makes use of the relationship of $e_+(k')$ to $e_-(k')$ as called out in Property 2. Thus, for $a_-$, where $a_- = -a_+$,

$$D_s(a_-) = \sum_{k'=0}^{2^v-1} e_-(k')\exp(-jk'\omega_s) = \sum_{k'=0}^{2^v-1} e_+(k')\exp(jk'\omega_s) = D_s^*(a_+) \tag{A-9}$$

where $*$ denotes complex conjugate. As a result, the complete expression of $D_s$ for the TCC case, which takes into account the sign of the coefficient value, is written as

$$D_s = \begin{cases}2^{v-1}(1-2^{v}) & \text{for } s\equiv0\bmod 2^v\\ 2^{v-1}\,\dfrac{\exp((\operatorname{sgn}a)j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} & \text{otherwise.}\end{cases} \tag{A-10}$$

The parameter $\lambda$ is then determined from (A-4), whereby $|Q'|$ is used in those cases where $a < 0$.

This complex conjugate relationship holds for $D_s$ in the TCR case and for $F_s$ in the TCR and TCC cases. Hence, although the bulk of the remaining derivations are carried out for $a > 0$, the final results in each case will be stated so as to include the effect of the sign of the coefficient value.

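Evaluation of $D_s$ for the TCR case follows the same procedure as for the TCC case. Substitution of (58) into (101) for $a > 0$ results in the following form:

$$D_s = D_s^{(1)} + D_s^{(2)} \tag{A-11}$$

where

$$D_s^{(1)} = -\sum_{k'=0}^{2^v-1}[k'Q'+2^{v-1}]_{2^v}\exp(-jk'\omega_s) \tag{A-12}$$

and

$$D_s^{(2)} = 2^{v-1}\sum_{k'=0}^{2^v-1}\exp(-jk'\omega_s) = \begin{cases}2^{2v-1} & \text{for } s\equiv0\bmod 2^v\\ 0 & \text{otherwise.}\end{cases} \tag{A-13}$$

For the evaluation of $D_s^{(1)}$ let the variable $\beta$ be defined as

$$\beta \equiv (k'Q'+2^{v-1}) \bmod 2^{v} \tag{A-14}$$

which, since $Q'$ is a constant odd integer, implies the relation

$$k' \equiv (\beta\lambda+2^{v-1}) \bmod 2^{v} \tag{A-15}$$

where the odd constant integer $\lambda$ is the unique inverse of $Q'$ which satisfies the same relation (A-4) as for the TCC case. With this change in notation

$$D_s^{(1)} = -\sum_{\beta=0}^{2^v-1}\beta\exp(-j\omega_s[\beta\lambda+2^{v-1}]_{2^v}) = -\sum_{\beta=0}^{2^v-1}\beta\exp(-j\omega_s[\beta\lambda+2^{v-1}]) = -(-1)^s\sum_{\beta=0}^{2^v-1}\beta\exp(-j\omega_s\beta\lambda) \tag{A-16}$$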

which is exactly the same form as $D_s$ in the TCC case except for the factor $(-1)^s$. Thus, for TCR, and taking into account the sign of the coefficient value,

$$D_s = \begin{cases}2^{v-1} & \text{for } s\equiv0\bmod 2^v\\ (-1)^s\,2^{v-1}\,\dfrac{\exp((\operatorname{sgn}a)j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} & \text{otherwise}\end{cases} \tag{A-17}$$

for $s = 0, \pm1, \pm2, \ldots$.

Evaluation of $F_s$ proceeds as follows. Consider the TCC case first, where $a > 0$. Then by using (58) in (108),

$$F_s = \sum_{k'=0}^{2^v-1} e^2(k')\exp(-jk'\omega_s) = \sum_{k'=0}^{2^v-1}[k'Q']_{2^v}^2\exp(-jk'\omega_s) \tag{A-18}$$

for $\omega_s = 2\pi s/2^v$, where $s = 0, \pm1, \pm2, \ldots$, and $Q'$ is derived from the value of $a$ through (47), (51). Let the variables $\beta$ and $\lambda$ be defined by the relations (A-2)-(A-4) as for $D_s$. Then

$$F_s = \sum_{\beta=0}^{2^v-1}\beta^2\exp(-j\omega_s[\beta\lambda]_{2^v}) = \sum_{\beta=0}^{2^v-1}\beta^2\exp(-j\omega_s\beta\lambda). \tag{A-19}$$

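Whenever $s \equiv 0 \bmod 2^v$ this form simplifies to

$$F_s = \sum_{\beta=0}^{2^v-1}\beta^2 = \frac{1}{6}\left(2^{3v+1}-3\cdot2^{2v}+2^{v}\right). \tag{A-20}$$

This form can be evaluated for other values of $s$ by writing it as

$$F_s = -\frac{1}{\lambda^2}\frac{d^2}{d\omega_s^2}\left\{\sum_{\beta=0}^{2^v-1}\exp(-j\omega_s\beta\lambda)\right\} \tag{A-21}$$

and evaluating the second derivative of the power series in $\exp(-j\omega_s\lambda)$. The result is written for TCC as

$$F_s = \frac{2^{2v-1}\left[\exp(j\omega_s\lambda)-1\right]+2^{v}}{1-\cos(\omega_s\lambda)} \tag{A-22}$$

for $s \not\equiv 0 \bmod 2^v$. The complete expression which takes into account the sign of the coefficient value is written as

$$F_s = \begin{cases}\dfrac{1}{6}\left(2^{3v+1}-3\cdot2^{2v}+2^{v}\right) & \text{for } s\equiv0\bmod 2^v\\ \dfrac{2^{2v-1}\left[\exp((\operatorname{sgn}a)j\omega_s\lambda)-1\right]+2^{v}}{1-\cos(\omega_s\lambda)} & \text{otherwise}\end{cases} \tag{A-23}$$

for $s = 0, \pm1, \pm2, \ldots$. Substitution of (58) into (108) for $a > 0$ results in the following form for TCR:

$$F_s = F_s^{(1)} + F_s^{(2)} + F_s^{(3)} \tag{A-24}$$

where

$$F_s^{(1)} = \sum_{k'=0}^{2^v-1}[k'Q'+2^{v-1}]_{2^v}^2\exp(-jk'\omega_s), \tag{A-25}$$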

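$$F_s^{(2)} = -2^{v}\sum_{k'=0}^{2^v-1}[k'Q'+2^{v-1}]_{2^v}\exp(-jk'\omega_s), \tag{A-26}$$

and

$$F_s^{(3)} = (2^{v-1})^2\sum_{k'=0}^{2^v-1}\exp(-jk'\omega_s) = \begin{cases}2^{3v-2} & \text{for } s\equiv0\bmod 2^v\\ 0 & \text{otherwise.}\end{cases} \tag{A-27}$$

Let the variable $\beta$ be defined by the relation (A-14) as for $D_s$. Then

$$F_s^{(1)} = \sum_{\beta=0}^{2^v-1}\beta^2\exp(-j\omega_s[\beta\lambda+2^{v-1}]) = (-1)^s\sum_{\beta=0}^{2^v-1}\beta^2\exp(-j\omega_s\beta\lambda) \tag{A-28}$$

which is exactly the same form as $F_s$ in the TCC case except for the factor $(-1)^s$. Thus

$$F_s^{(1)} = \begin{cases}\dfrac{1}{6}\left(2^{3v+1}-3\cdot2^{2v}+2^{v}\right) & \text{for } s\equiv0\bmod 2^v\\ (-1)^s\,\dfrac{2^{2v-1}\left[\exp(j\omega_s\lambda)-1\right]+2^{v}}{1-\cos(\omega_s\lambda)} & \text{otherwise.}\end{cases} \tag{A-29}$$

The factor $F_s^{(2)}$ can be written from (A-12) as

$$F_s^{(2)} = 2^{v}D_s^{(1)} = \begin{cases}2^{2v-1}(1-2^{v}) & \text{for } s\equiv0\bmod 2^v\\ (-1)^s\,2^{2v-1}\,\dfrac{\exp(j\omega_s\lambda)-1}{\cos(\omega_s\lambda)-1} & \text{otherwise.}\end{cases} \tag{A-30}$$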

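The final result is

$$F_s = \begin{cases}\dfrac{1}{12}\left(2^{3v}+2^{v+1}\right) & \text{for } s\equiv0\bmod 2^v\\ (-1)^s\,\dfrac{2^{v}}{1-\cos(\omega_s\lambda)} & \text{otherwise}\end{cases} \tag{A-31}$$

for $s = 0, \pm1, \pm2, \ldots$. Note that this form is real and, hence, there is no need to take into account the sign of the coefficient value.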

GLOSSARY

    ACRONYMS

    ACC Asymptotic correlation coefficient

    A/D Analog-to-digital

    FIR Finite impulse response

    PSF Poisson summation formula

    TC Two's-complement

    TCC Two's-complement chopping

    TCR Two's-complement roundoff

    ABBREVIATIONS

    l.s.b. Least significant bit

    p.d.f. Probability density function

    r.v. Random variable

    SYMBOLS

    Equation

    C Auto- and cross-covariances (8)

    E Denotes expectation or expected value (5)

    I Indicator function (27)

    K Multiplier input word length in bits

    L Multiplication coefficient word length in bits

    M Multiplier output word length in bits (result of roundoff or chopping)

N Number of bits discarded in the roundoff or chopping operation (K, L, M do not include the sign bit. See Fig. 2.)

    P Probability (64)

Q Characteristic function or Fourier transform of a continuous function (65)

    R Auto- and cross-correlation (9)

    S Power spectral density (12)

    T Time interval between consecutive data samples (1)


U Number of FIR filter taps (4)

a Multiplication coefficient value (2)

b Desired or ideal filter output sequence

e Integer form of the multiplication error ε (58)

(12)

k Integer form of the filter input sequence x (46)

Integer form of the coefficient a (47)

q Quantization step size (64)

r Remainder (35)

u Integer form of the desired filter output sequence b (31)

w Additive noise contribution to filter output sequence y (1)

x Filter or multiplier input data sequence (2)

y Filter or multiplier output data sequence (1)

Analog waveform used for quantizer input

* Denotes machine representation (e.g., b*); also, complex conjugate (28), (A-9)

Δ Change in a statistic imposed by the multiplication errors (6)

δ Number of consecutive zeros in the l.s.b. positions in the TC representation of the coefficient value (51)

ε Multiplication error sequence (3)

λ Coefficient-related odd integer (104)

μ Mean value (4)

v Effective word length in bits of the multiplication error (54)

ρ Correlation coefficient (|ρ| ≤ 1.0)

σ Standard deviation; with no subscript, σ denotes a (5)

ω Radian frequency (12)