STATISTICAL APPLICATIONS OF ELEMENTARY MATHEMATICS



By DUNHAM JACKSON, University of Minnesota, Minneapolis, Minn.

SCHOOL SCIENCE AND MATHEMATICS

Attempts have been made for a long time to subject the laws governing vital phenomena and the phenomena of human society to a mathematical treatment comparable in definiteness with that which is traditional in such "exact" sciences as physics and astronomy. The branch of mathematics known as the theory of probability has been conspicuously useful in giving these attempts a measure of success. New fields have been opened up to quantitative investigation, in biology, psychology, education, medicine, and economics, by the development in recent decades of new statistical methods, notably by Professor Karl Pearson and his followers. From the very fact that the initial steps are tentative, and that mathematical deduction is not to be carried too far in any one direction on the basis of the new methods until the first conclusions have been checked by fresh data, it follows that there is much work to be done with processes that are quite elementary on the mathematical side. It is the purpose of this paper to show how some of the most familiar facts of high-school mathematics are adequate for the proof of important theorems in statistics. While it contains nothing that is not to be found in substance in the standard text-books,¹ it is arranged with the aim of bringing out as clearly as possible the elementary character of the reasoning.

¹ In particular, most of the discussion of the correlation coefficient is merely a presentation of the method of the corresponding passage in Yule's Introduction to the Theory of Statistics, pp. 170-177; but if it convinces the reader that the essential ideas can be readily understood without previous technical study of theoretical statistics or advanced mathematics, it will have served its purpose.

The rules of algebra can be applied with particular effectiveness in connection with the principle of least squares and the derivation of various important formulas from it. A full appreciation of the significance of this principle, as well as of the limitations of its applicability in particular problems, is to be gained from extended study and experience rather than from any brief theoretical argument; but its power is largely due to the fundamental simplicity of the mathematical relations which arise out of it.

One of the simplest illustrations is the connection of the least-square principle with the definition of an arithmetic mean. To specialize the problem in the direction of the utmost possible simplicity, suppose $a$, $b$, $c$ are any three given numbers. Their average, or arithmetic mean, is $\frac{1}{3}(a+b+c)$. On the other hand, let the following problem be proposed: to find a single number $x$ which shall serve as a general representative of the three numbers $a$, $b$, $c$ with as close a degree of approximation as possible, the last requirement being interpreted to mean specifically that the sum of the squares of the errors, the expression

$$(a-x)^2 + (b-x)^2 + (c-x)^2,$$

shall have the smallest possible value. (The choice of the square rather than some other power of the error is justified by the result, and by the considerations that underlie the use of the method of least squares generally. Minimizing the sum of the first powers of the errors, taken without regard to algebraic sign, would lead to another important average, the median; although the median sometimes has advantages over the arithmetic mean, its mathematical theory is essentially less simple.) Let the arithmetic mean of $a$, $b$, $c$ be denoted by $m$; in other words, let $m$ be defined by the equation

$$m = \tfrac{1}{3}(a+b+c).$$

It follows at once from this equation that

$$(a-m) + (b-m) + (c-m) = a + b + c - 3m = 0.$$

If $a - x$ is written in the form

$$a - x = (a-m) + (m-x),$$

expansion of the square of the last expression by the rule for squaring a binomial gives

$$(a-x)^2 = (a-m)^2 + 2(a-m)(m-x) + (m-x)^2.$$

Similarly,

$$(b-x)^2 = (b-m)^2 + 2(b-m)(m-x) + (m-x)^2,$$
$$(c-x)^2 = (c-m)^2 + 2(c-m)(m-x) + (m-x)^2.$$

By addition of these equations, and suitable combination of the terms on the right,

$$(a-x)^2 + (b-x)^2 + (c-x)^2 = (a-m)^2 + (b-m)^2 + (c-m)^2 + 2(m-x)\left[(a-m) + (b-m) + (c-m)\right] + 3(m-x)^2.$$

But the expression in brackets is equal to zero, as was pointed out above, so that

$$(a-x)^2 + (b-x)^2 + (c-x)^2 = (a-m)^2 + (b-m)^2 + (c-m)^2 + 3(m-x)^2.$$

In the right-hand member of the last equation, the only term that depends on $x$ (the only term that can be varied by varying $x$) is $3(m-x)^2$, which reduces to zero if $x = m$ and is positive for all other values of $x$. So this term, and consequently the whole expression, is reduced to its minimum value by taking $x = m$. The solution of the least-square problem is given by the arithmetic mean.
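The same decomposition holds for any number of values, with the factor 3 replaced by the number of values. The following minimal Python sketch checks it numerically on a small made-up list; the data and the function names `sum_of_squared_errors` and `arithmetic_mean` are illustrative assumptions, not taken from the article. Exact `Fraction` arithmetic avoids rounding, so the identity can be tested with `==`.

```python
from fractions import Fraction

def sum_of_squared_errors(values, x):
    """Sum of (v - x)^2 over the given values, for a trial representative x."""
    return sum((v - x) ** 2 for v in values)

def arithmetic_mean(values):
    """Arithmetic mean, kept exact by using Fraction."""
    return Fraction(sum(values), len(values))

values = [3, 7, 8]            # illustrative data, not from the article
m = arithmetic_mean(values)
n = len(values)

# S(x) = S(m) + n(m - x)^2 for every trial value x, so S(x) >= S(m).
for x in [Fraction(0), Fraction(5), Fraction(13, 2), Fraction(10), m]:
    s_x = sum_of_squared_errors(values, x)
    s_m = sum_of_squared_errors(values, m)
    assert s_x == s_m + n * (m - x) ** 2
    assert s_x >= s_m

print("mean:", m)                                           # mean: 6
print("S(mean):", sum_of_squared_errors(values, m))         # S(mean): 14
print("S(5):", sum_of_squared_errors(values, Fraction(5)))  # S(5): 17
```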

To deal with the average of an arbitrary number of given quantities, instead of just three, suppose there are $n$ such quantities, and let them be denoted by $X_1, X_2, \ldots, X_n$ instead of $a$, $b$, $c$. The problem then is to find a number $x$ to make $\Sigma(x - X_k)^2$ as small as possible; the sign $\Sigma$, wherever it occurs, here and later, means that the terms indicated are to be formed for each value of $k$ from 1 to $n$ and added together. The above proof, repeated with the obvious modifications ($n(m-x)^2$ taking the place of $3(m-x)^2$), shows that the minimum is reached when $x$ is the arithmetic mean of the $X_k$'s, $(1/n)\Sigma X_k$.

Another important problem of least squares is the following. Suppose two sets of $n$ quantities each are given, $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$. It is convenient for various purposes, and important for certain definitions, to deal with the deviations of the numbers of each set from the corresponding arithmetic mean, instead of the given numbers themselves. Let $\bar{X}$ be the mean of the $X$'s, and $\bar{Y}$ the mean of the $Y$'s,

$$\bar{X} = \frac{1}{n}\Sigma X_k, \qquad \bar{Y} = \frac{1}{n}\Sigma Y_k,$$

and let $x_k = X_k - \bar{X}$, $y_k = Y_k - \bar{Y}$ for each value of $k$ from 1 to $n$. It follows from these definitions that $\Sigma x_k = \Sigma y_k = 0$. The problem then is to find a number $b$ so that $\Sigma(y_k - bx_k)^2$ shall be as small as possible; in other words, to determine the coefficient $b$ so that $bx_k$ shall give the best possible approximation to $y_k$, when the sum of the squares of the individual differences $y_k - bx_k$ is taken as the measure of the error of the approximation for the set of numbers as a whole. It will be shown that the solution is obtained by taking

$$b = \frac{\Sigma x_k y_k}{\Sigma x_k^2}.$$

To bring out the significance of the question more clearly, let the $x$'s and the $y$'s, and more specifically the association of each particular $x_k$ with the corresponding $y_k$, be represented graphically by plotting the $n$ points $(x_k, y_k)$ with respect to a pair of rectangular coordinate axes. The requirement is to choose the line $y = bx$, among all lines through the origin, so that the sum of the squares of the distances of the $n$ points from it, measured parallel to the $y$-axis (not perpendicular to the line), shall be a minimum. While the geometric formulation is instructive, however, the actual solution will be algebraic.


For the moment, let $\Sigma x_k^2$, $\Sigma x_k y_k$, $\Sigma y_k^2$ be denoted by $A$, $B$, $C$ respectively. Then, for any value of $b$,

$$\Sigma(y_k - bx_k)^2 = \Sigma y_k^2 - 2b\,\Sigma x_k y_k + b^2\,\Sigma x_k^2 = C - 2Bb + Ab^2.$$

The last expression is the same as $(1/A)(AC - 2ABb + A^2b^2)$, which can be further rearranged in the form

$$\frac{1}{A}\left(AC - B^2 + B^2 - 2ABb + A^2b^2\right) = \frac{1}{A}\left[(AC - B^2) + (B - Ab)^2\right].$$

The $x$'s and the $y$'s, and consequently $A$, $B$, $C$, are to be thought of as given quantities. Of course $A$ and $C$, being sums of squares, are positive, while $B$ may be positive, negative, or zero, according to circumstances. (The trivial case in which all the $x$'s are zero, or all the $y$'s are zero, is ruled out.) In the last form of representation of $\Sigma(y_k - bx_k)^2$, the only thing that can be varied by varying $b$ is the term $(B - Ab)^2$. Being a perfect square, this is necessarily positive or zero; it is reduced to a minimum by making

$$B - Ab = 0, \qquad b = \frac{B}{A} = \frac{\Sigma x_k y_k}{\Sigma x_k^2}.$$
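The completing-the-square argument can be checked on a small example. This Python sketch (made-up data; the helper names `deviations` and `sse_through_origin` are hypothetical) forms $A$, $B$, $C$ from the deviations, takes $b = B/A$, and confirms that the direct sum agrees with $C - 2Bb + Ab^2$ and that its minimum equals $(1/A)(AC - B^2)$. A check on one data set illustrates the algebra; it does not replace the proof.

```python
from fractions import Fraction

def deviations(values):
    """Deviations from the arithmetic mean (they always sum to zero)."""
    mean = Fraction(sum(values), len(values))
    return [v - mean for v in values]

def sse_through_origin(x, y, b):
    """Sum of squared vertical distances of the points (x_k, y_k) from the line y = b x."""
    return sum((yk - b * xk) ** 2 for xk, yk in zip(x, y))

X = [1, 2, 4, 5]              # illustrative data, not from the article
Y = [2, 3, 3, 6]
x, y = deviations(X), deviations(Y)

A = sum(xk * xk for xk in x)                # A = Σ x_k^2
B = sum(xk * yk for xk, yk in zip(x, y))    # B = Σ x_k y_k
C = sum(yk * yk for yk in y)                # C = Σ y_k^2

b = B / A                                   # the least-squares choice b = B/A
minimum = (A * C - B ** 2) / A              # predicted minimum (1/A)(AC - B^2)

# The expansion C - 2Bt + At^2 agrees with the direct sum for each trial slope t,
# and every trial slope other than B/A gives a larger sum.
for t in [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)]:
    assert sse_through_origin(x, y, t) == C - 2 * B * t + A * t ** 2
    assert sse_through_origin(x, y, t) > minimum
assert sse_through_origin(x, y, b) == minimum

print("b =", b)               # b = 4/5
print("minimum =", minimum)   # minimum = 13/5
```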

The question can be modified by seeking to minimize the quantity $\Sigma(x_k - by_k)^2$ instead of $\Sigma(y_k - bx_k)^2$. Algebraically this problem differs from the other only by the interchange of the letters $x$ and $y$, and the value of $b$ giving the solution is

$$b_1 = \frac{\Sigma x_k y_k}{\Sigma y_k^2}.$$

In distinction from this $b_1$, the $b$ previously obtained will be denoted by $b_2$. From the point of view of the geometric representation, the line represented by the equation $y = b_2 x$ is called the line of regression of $y$ on $x$, and the line $x = b_1 y$ is the line of regression of $x$ on $y$. The latter is characterized by the fact that the sum of the squares of the horizontal distances of the plotted points from it is a minimum. The numbers $b_2$ and $b_1$ are the corresponding coefficients of regression.

A quantity associated with the regression problem is the coefficient of correlation of the sets of numbers $x_k$ and $y_k$, commonly denoted by the letter $r$. Its numerical value is $\sqrt{b_1 b_2}$, the geometric mean of the regression coefficients, and its sign is the common sign of $b_1$ and $b_2$ (the sign of $\Sigma x_k y_k$); its value is given by the equation

$$r = \frac{\Sigma x_k y_k}{\sqrt{(\Sigma x_k^2)(\Sigma y_k^2)}}.$$


It is at the same time, by definition, the coefficient of correlation of the original sets of numbers $X_k$ and $Y_k$. That is to say, the coefficient of correlation of any given sets of numbers is defined in terms of their deviations from their respective arithmetic means, or else by some equivalent expression. One such alternative formula will be further discussed below.
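A minimal Python sketch of the two regression coefficients and the correlation coefficient, assuming small made-up data; `regression_and_correlation` is a hypothetical name, not something from the article. It checks that $r^2 = b_1 b_2$, and that adding a constant to every $X_k$, which leaves the deviations unchanged, leaves $r$ unchanged.

```python
import math
from fractions import Fraction

def deviations(values):
    """Deviations from the arithmetic mean."""
    mean = Fraction(sum(values), len(values))
    return [v - mean for v in values]

def regression_and_correlation(X, Y):
    """Return (b1, b2, r) as defined in the text, computed from deviations."""
    x, y = deviations(X), deviations(Y)
    A = sum(xk * xk for xk in x)                # Σ x_k^2
    B = sum(xk * yk for xk, yk in zip(x, y))    # Σ x_k y_k
    C = sum(yk * yk for yk in y)                # Σ y_k^2
    b2 = B / A                                  # regression of y on x: y = b2 x
    b1 = B / C                                  # regression of x on y: x = b1 y
    r = float(B) / math.sqrt(float(A * C))      # coefficient of correlation
    return b1, b2, r

X = [1, 2, 4, 5]              # illustrative data, not from the article
Y = [2, 3, 3, 6]
b1, b2, r = regression_and_correlation(X, Y)

# r^2 equals b1 * b2, and r carries the common sign of b1 and b2.
assert math.isclose(r * r, float(b1 * b2))
# Shifting every X_k by a constant leaves the deviations, hence b1, b2, r, unchanged.
assert regression_and_correlation([v + 100 for v in X], Y) == (b1, b2, r)

print(b1, b2, round(r, 4))    # 8/9 4/5 0.8433
```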

An important property of the correlation coefficient, the fact that its numerical value can never exceed 1, comes out very readily from the formulas that have already been used. It was seen that a certain expression obtained as a representation of $\Sigma(y_k - bx_k)^2$ was reduced to a minimum by making $B - Ab = 0$. The value of the entire expression is thereby reduced to

$$\frac{1}{A}(AC - B^2).$$

But $\Sigma(y_k - bx_k)^2$ is a sum of squares, and even at its minimum is necessarily positive or zero. As $A$ is positive, this means that $AC - B^2$ must be positive or zero, which is the same thing as saying that $B^2$ is not greater than $AC$. But the equation defining $r$ can be written in the form $r = B/\sqrt{AC}$, and the fact that $B^2$ cannot exceed $AC$ means that $r^2 = B^2/AC$ cannot exceed 1. Hence the value of $r$ itself lies between $-1$ and $+1$, or is equal to one of these extreme values as a limiting case.

The expression obtained for the minimum value of $\Sigma(y_k - bx_k)^2$ is of importance in itself. The quantity $\sqrt{(\Sigma x_k^2)/n}$, or $\sqrt{A/n}$, which may be denoted by $\sigma_1$, is the standard deviation of the numbers $x_k$ or of the numbers $X_k$, and similarly $\sigma_2 = \sqrt{(\Sigma y_k^2)/n} = \sqrt{C/n}$ is the standard deviation of the $y$'s or of the $Y$'s. The quantity

$$\sigma_{2.1} = \sqrt{\frac{1}{n}\Sigma(y_k - b_2 x_k)^2},$$

in which $b$ has been given the value of the regression coefficient $b_2$, is the standard error of estimate of $y$ in terms of $x$. From the definition of $\sigma_{2.1}$ it follows that

$$n\sigma_{2.1}^2 = \Sigma(y_k - b_2 x_k)^2 = \frac{1}{A}(AC - B^2) = C - \frac{B^2}{A} = C\left(1 - \frac{B^2}{AC}\right) = C(1 - r^2) = n\sigma_2^2(1 - r^2),$$

so that finally

$$\sigma_{2.1} = \sigma_2\sqrt{1 - r^2}.$$

Somewhat informally, this equation may be said to measure the extent to which the dispersion of the $y$'s can be reduced by subtraction of a suitable multiple of the $x$'s; if $r = 0$, $\sigma_{2.1}$ is the same as $\sigma_2$, while otherwise $\sigma_{2.1}$ is less than $\sigma_2$ in the ratio of $\sqrt{1 - r^2}$ to 1.
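The relation between the two dispersions, together with the bound on $r$, can be verified numerically. In this Python sketch (illustrative data; `dispersion_summary` is a hypothetical helper) the inequality $AC - B^2 \ge 0$ is checked exactly with fractions, and $\sigma_{2.1}$ computed directly from the residuals is compared with $\sigma_2\sqrt{1 - r^2}$ in floating point.

```python
import math
from fractions import Fraction

def deviations(values):
    """Deviations from the arithmetic mean."""
    mean = Fraction(sum(values), len(values))
    return [v - mean for v in values]

def dispersion_summary(X, Y):
    """Return (r, sigma_2, sigma_2.1) following the definitions in the text."""
    n = len(X)
    x, y = deviations(X), deviations(Y)
    A = sum(xk * xk for xk in x)
    B = sum(xk * yk for xk, yk in zip(x, y))
    C = sum(yk * yk for yk in y)
    assert A * C - B ** 2 >= 0                  # exact check: AC - B^2 >= 0, i.e. r^2 <= 1
    b2 = B / A                                  # regression coefficient of y on x
    r = float(B) / math.sqrt(float(A * C))
    sigma2 = math.sqrt(float(C) / n)            # standard deviation of the y's
    residual = sum((yk - b2 * xk) ** 2 for xk, yk in zip(x, y))
    sigma21 = math.sqrt(float(residual) / n)    # standard error of estimate of y in terms of x
    return r, sigma2, sigma21

X = [1, 2, 4, 5]              # illustrative data, not from the article
Y = [2, 3, 3, 6]
r, sigma2, sigma21 = dispersion_summary(X, Y)

assert -1 <= r <= 1
# sigma_2.1 = sigma_2 * sqrt(1 - r^2), up to floating-point rounding
assert math.isclose(sigma21, sigma2 * math.sqrt(max(0.0, 1 - r * r)))
print(round(r, 4), sigma2, round(sigma21, 4))   # 0.8433 1.5 0.8062
```

When $r = 0$ the best multiple of the $x$'s is zero, so the residuals are the $y$'s themselves and the two dispersions coincide, as the text states.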


It has been mentioned that the formula given for $r$ above is not the only one. Another, which is sometimes preferable in use, though less simple in appearance, is

$$r = \frac{n\,\Sigma X_k Y_k - (\Sigma X_k)(\Sigma Y_k)}{\sqrt{n\,\Sigma X_k^2 - (\Sigma X_k)^2}\;\sqrt{n\,\Sigma Y_k^2 - (\Sigma Y_k)^2}}.$$

Its advantage lies in the fact that it is expressed directly in terms of the original quantities $X_k$, $Y_k$, and does not require the explicit calculation of the arithmetic means and the deviations from them, a process which may introduce inconvenient fractions. The proof that both formulas represent the same quantity is once more a matter of elementary algebra.

Since $x_k = X_k - \bar{X}$, $y_k = Y_k - \bar{Y}$, it follows that

$$\Sigma x_k y_k = \Sigma(X_k - \bar{X})(Y_k - \bar{Y}) = \Sigma X_k Y_k - \bar{Y}\,\Sigma X_k - \bar{X}\,\Sigma Y_k + n\bar{X}\bar{Y};$$

$\bar{Y}$ and $\bar{X}$ are taken out as common factors from the second and third summations in the right-hand member, while the last term results from the fact that the product $\bar{X}\bar{Y}$, independent of the subscript $k$, is repeated $n$ times. Substitution of the values $\bar{X} = (1/n)\Sigma X_k$, $\bar{Y} = (1/n)\Sigma Y_k$ transforms the right-hand member into

$$\Sigma X_k Y_k - \frac{1}{n}(\Sigma X_k)(\Sigma Y_k) - \frac{1}{n}(\Sigma X_k)(\Sigma Y_k) + \frac{1}{n}(\Sigma X_k)(\Sigma Y_k),$$

which reduces to

$$\Sigma X_k Y_k - \frac{1}{n}(\Sigma X_k)(\Sigma Y_k).$$

Similarly,

$$\Sigma x_k^2 = \Sigma X_k^2 - \frac{1}{n}(\Sigma X_k)^2, \qquad \Sigma y_k^2 = \Sigma Y_k^2 - \frac{1}{n}(\Sigma Y_k)^2,$$

the distinction between $\Sigma X_k^2$ and $(\Sigma X_k)^2$ being of course that one is the sum of the squares of the $X$'s and the other is the square of their sum. By insertion of these values in the earlier expression for $r$, and multiplication of numerator and denominator by $n$ ($= \sqrt{n}\cdot\sqrt{n}$) to clear of fractions,

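$$r = \frac{\Sigma x_k y_k}{\sqrt{(\Sigma x_k^2)(\Sigma y_k^2)}} = \frac{n\,\Sigma x_k y_k}{\sqrt{n\,\Sigma x_k^2}\;\sqrt{n\,\Sigma y_k^2}} = \frac{n\,\Sigma X_k Y_k - (\Sigma X_k)(\Sigma Y_k)}{\sqrt{n\,\Sigma X_k^2 - (\Sigma X_k)^2}\;\sqrt{n\,\Sigma Y_k^2 - (\Sigma Y_k)^2}}.$$

Other modifications of the formula, sometimes preferred to either of those given, are similarly obtained.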
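Both formulas for $r$ can be compared directly. In this Python sketch (made-up integer data; the function names are hypothetical) the raw-score version works only with the integer sums $n$, $\Sigma X_k$, $\Sigma Y_k$, $\Sigma X_k^2$, $\Sigma Y_k^2$, $\Sigma X_k Y_k$, while the deviation version must handle the fractional mean $\bar{Y} = 7/2$.

```python
import math
from fractions import Fraction

def r_from_deviations(X, Y):
    """r = Σx_k y_k / sqrt(Σx_k^2 · Σy_k^2), using deviations from the means."""
    n = len(X)
    mean_x, mean_y = Fraction(sum(X), n), Fraction(sum(Y), n)
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(float(sum(a * a for a in x) * sum(b * b for b in y)))
    return float(num) / den

def r_from_raw_scores(X, Y):
    """Same r from n, ΣX, ΣY, ΣX^2, ΣY^2, ΣXY only: no means, no deviations."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xx = sum(v * v for v in X)
    sum_yy = sum(v * v for v in Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    num = n * sum_xy - sum_x * sum_y        # equals n Σx_k y_k
    den = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return num / den

X = [1, 2, 4, 5]              # illustrative data, not from the article
Y = [2, 3, 3, 6]
assert math.isclose(r_from_deviations(X, Y), r_from_raw_scores(X, Y))
print(round(r_from_raw_scores(X, Y), 4))   # 0.8433
```

For these data the numerator of the raw-score form is $32 = n\,\Sigma x_k y_k$ and the two radicands are $40 = n\,\Sigma x_k^2$ and $36 = n\,\Sigma y_k^2$, all integers.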

These are only the beginnings of a treatment which could be worked out at much greater length if space permitted. For the definition of a coefficient of correlation by rank,

$$\rho = 1 - \frac{6\,\Sigma D^2}{n(n^2 - 1)},$$

where $D$ is the difference between the rank numbers of corresponding $X$ and $Y$, and a proof that it is merely the result of applying the formula for $r$ to the numbers designating the relative positions of the $X$'s when they are arranged in order of magnitude and the corresponding rank numbers for the $Y$'s; for a discussion of partial correlation coefficients; for the more extensive use of geometric representation, with applications of plane and spherical trigonometry; and for further developments of the theory, the reader is referred to papers by the present writer and by Professor Huntington in the American Mathematical Monthly and elsewhere, and to the text-books.² It is not asserted that all will be easy reading, or that any algebraic devices can wholly remove the difficulties that are inevitably encountered in dealing with a complex array of quantitative relations. There is no royal road to the remoter parts of the subject, for the mathematician or for anybody else. But the student who has mastered the algebraic content of the fundamental formulas will have taken an important step toward acquiring what Lord Bryce calls "the special skill and knowledge needed to distil from rows of figures the refined spirit of instruction."
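The claim that the rank formula is the ordinary $r$ applied to rank numbers can be spot-checked. This Python sketch assumes there are no tied values (ties need a separate convention, not treated here); the data and the function names `ranks`, `pearson_r`, and `rank_correlation` are illustrative. Agreement on examples is not the proof the text refers to.

```python
import math

def ranks(values):
    """Rank numbers 1..n in order of magnitude (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def pearson_r(X, Y):
    """Raw-score formula for r."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    num = n * sum(a * b for a, b in zip(X, Y)) - sum_x * sum_y
    den = (math.sqrt(n * sum(a * a for a in X) - sum_x ** 2)
           * math.sqrt(n * sum(b * b for b in Y) - sum_y ** 2))
    return num / den

def rank_correlation(X, Y):
    """rho = 1 - 6ΣD^2 / (n(n^2 - 1)), with D the difference of corresponding ranks."""
    n = len(X)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(X), ranks(Y)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

X = [10.5, 3.2, 7.7, 1.0, 5.5]   # illustrative data with no ties
Y = [8, 2, 9, 1, 4]
print(ranks(X), ranks(Y))        # [5, 2, 4, 1, 3] [4, 2, 5, 1, 3]
assert math.isclose(rank_correlation(X, Y), pearson_r(ranks(X), ranks(Y)))
print(round(rank_correlation(X, Y), 4))   # 0.9
```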

² See, for example, D. Jackson, The Algebra of Correlation, American Mathematical Monthly, vol. 31 (1924), pp. 110-121; D. Jackson, The Trigonometry of Correlation, American Mathematical Monthly, vol. 31 (1924), pp. 275-280; D. Jackson, The Relation of Statistics to Modern Mathematical Research, Science, vol. 69 (1929), pp. 49-54; E. V. Huntington, Mathematics and Statistics, with an elementary account of the correlation coefficient and the correlation ratio, American Mathematical Monthly, vol. 26 (1919), pp. 421-435; G. U. Yule, An Introduction to the Theory of Statistics, already cited.

"INTELLECTUAL IMMORALITIES."

Twenty-five kinds of "intellectual immoralities" have been enumerated by Milton Fairchild, director of the Character Education Institution of Washington, D. C., in an effort to constitute a verification of plans for human welfare. A method of determining these in the individual has been worked out on scientific lines by the institution. Among these "intellectual immoralities" are the following:

Carelessness in observations, "sloppy work."
Slovenliness in logic, fantastic explanations.
Confusing opinions with knowledge.
Contentment with "discussion."
Wavering interest, flitting attention, attracted by peculiar superficialities.
Opposition to proof of another's theories because of jealousy.
Impatience, unwillingness to proceed step by step through a research.
Indulgence in dense verbiage for the sake of appearing superlearned.
Popularizing tentative generalizations for the sake of personal publicity.
Resort to the authorities, or to sarcasm and ridicule, against data, arguments and verifications.
