

History and Pedagogy of Mathematics 2012, 16 July – 20 July, 2012, DCC, Daejeon, Korea.

AN EXPERIMENT ON TEACHING THE NORMAL APPROXIMATION TO THE SYMMETRIC BINOMIAL USING DE MOIVRE & NICHOLAS BERNOULLI’S APPROACHES

Michael KOURKOULOS,* Constantinos TZANAKIS**

* Department of Education, University of Crete, Greece
[email protected], [email protected]

ABSTRACT

De Moivre’s (1730, 1733, 1738, 1756) and N. Bernoulli’s (1713) approaches to the approximation of the binomial distribution allow us to identify simple but fundamental conceptual elements that capture well the essential characteristics of the binomial distribution for a large number of trials. Using these conceptual elements substantially facilitates the understanding of the binomial distribution and its normal approximation. An experimental teaching work that we designed and implemented, based on De Moivre’s and Bernoulli’s approaches, made these nontrivial and complex issues accessible to our (Department of Education) students.

1 Introduction

The normal distribution (ND) and the Central Limit Theorem (CLT) are key concepts of Statistics, which present important difficulties for students’ learning. Didactical research points out that the usual introductory statistics courses addressed to students considered as potential users of statistics (e.g. students of social sciences, medicine, biology, ...) have very poor learning outcomes concerning these subjects (Batanero et al. 2004, Chance et al. 2004, Mathews & Clark 1997, Clark et al. 2003, Garfield & Ben-Zvi 2007). For example, Mathews, Clark and their colleagues (Mathews & Clark 1997, Clark et al. 2003, Garfield & Ben-Zvi 2007, p. 377) examined students in four tertiary USA institutions, shortly after they had completed their introductory statistics course with grade A; the large majority of these students could not understand and compose the basic constitutive elements of the CLT. Thus most of the examined students had only fragmentary recall of the CLT and very few had a viable understanding of the theorem.

*First Author

Lack of understanding and misunderstandings concerning the CLT and the ND are frequent also among students and scientists that have received statistics education beyond an introductory one (e.g. see Cummins 1991, Wilensky 1997, Crack & Ledoit 2010). This often has important negative consequences even in their professional practice (Barbieri et al. 2009, Brockett 1983, Cummins 1991, Crack & Ledoit 2010). Concerning these concepts, two important defects of the usual statistics courses addressed to statistics’ users are: (a) The empirical work to be done by the students is insufficient and often inadequate; (b) few (if any) elements of explanations and proofs are given on why the CLT holds and why the ND is an adequate model of the relevant real phenomena mentioned in the course, thus seriously limiting students’ understanding and giving rise to misunderstandings and misuses (Brockett 1983, Cummins 1991, Crack & Ledoit 2010)¹.

Didactical research has addressed (a) and underlined the importance of overcoming this defect (e.g. see Batanero et al. 2004, Chance et al. 2004, Blanco & Ginovart 2010, Lunsford et al. 2006), but has paid little attention to (b) (but see Wilensky 1997, 2003 and Crack & Ledoit 2010). This paper aims to contribute to this.

2 The normal approximation to the binomial distribution

The normal approximation to the binomial is the simplest case related to the CLT and thus the most suitable to be discussed in an introductory statistics course. Searching in the rich reservoir of the history of statistics, we identified an approach on the subject based on De Moivre’s (1730, 1733, 1738, 1756) and Nicholas Bernoulli’s (1713) works, which may be used in an introductory statistics course². Our a priori analysis pointed out that this approach offers an important possibility for better understanding the binomial and the ND, and makes it possible to explain why the normal approximation to the binomial holds and thus to understand better the fundamental link between these distributions. Following this analysis we designed and implemented a relevant teaching experiment in an introductory statistics and probability course for 30 students of the Department of Education of the University of Crete (prospective primary school teachers)³. Students worked in pairs and performed guided research work (cf. Freudenthal 1991, Legrand 1993, Goos 2004, Stonewater 2005) in which emphasis is given to students’ investigation work. Moreover, students had not only to work out problems given by the teacher, but also to get involved in forming the research questions, and gradually pose their own research questions and problems (closed and open questions, conjectures etc.).

Below we present the approach used concerning the normal approximation to the symmetric binomial, as well as some main results of this teaching.⁴

Prior to the teaching of the normal approximation to the symmetric binomial the teacher had discussed with students the formula of the binomial distribution⁵

¹ Moreover, Wilensky examined social and other scientists and found that the lack in their education of explanations and legitimization concerning the use of the ND in the modeling of real phenomena created important feelings of confusion, discomfort and insecurity in them concerning the ND and its use in the related modeling (Wilensky 1997 and references therein).

² For De Moivre’s and Bernoulli’s works see Montmort 1713 pp. 388–394, De Moivre 1730, 1738, 1756, Hald 2003, ch16, 17.3, ch24, 2007, ch3, Stigler 1986, ch2.

³ 23 had followed a “science” or “technology” orientation in high school and 7 had followed the “human sciences” orientation.

⁴ The presentation is restricted here to the work done on the approximation of the symmetric binomial, on the one hand because of space limitations and, on the other hand, because almost all main ideas and methods necessary for the normal approximation to the binomial distribution are already introduced and treated in the case of the symmetric binomial (see also section 6). Concerning this, it is interesting to note that most of De Moivre’s work on the normal approximation to the binomial distribution concerned the case of the symmetric binomial with an even number of trials. In The Doctrine of Chances (1756) he devoted only a little more than one page (pp. 249–250) to stating the extension of his results to the general case of the binomial, considering that they are an easy extension of the results obtained for the symmetric case (De Moivre, 1756 pp. 242–254, Stigler 1986, ch2).

⁵ $P(N, k, p)$ is the probability of having exactly $k$ successes in $N$ trials, $p$ being the probability of success in a single trial and $q = 1 - p$. Eq. (1) was first derived for the symmetric binomial ($p = q = 1/2$) and then it was generalized.

$$P(N, k, p) = \frac{N!}{k! \cdot (N-k)!} \cdot p^{k} \cdot q^{N-k} \qquad (1)$$

Additionally, the mode, the expected value, the variance and the standard deviation of the binomial distribution were examined.⁶

Furthermore, examples of applications involving the binomial distribution were discussed (e.g. chance games, newborns’ sex, success in examinations, simple insurance models).

The interest of studying the binomial distribution was discussed with the students not only in connection with practical applications, but also with reference to the relevant argumentation developed by J. Bernoulli (1713) and De Moivre (1730, 1738, 1756) (Hald 2003 ch15, ch16, ch24, Hald 2007, ch2, ch3, Sheynin 1968, Sheynin 2005, Stigler 1986, ch2). In particular, the importance and the historical difficulty of calculating probabilities of binomial distributions for a large number of trials were discussed, together with De Moivre’s comment (corollary) on pp. 234–235 of The Doctrine of Chances, 1738 (1756 edition, p. 242), where he clearly states that this problem is connected with the evaluation of conclusions that may be drawn from empirical evidence (see also Sheynin 1968).

Then, students with the teacher’s guidance investigated empirically how characteristic values of the symmetric binomial, with a large even number of trials ($N$), vary with the number of trials; students used a spreadsheet with an incorporated binomial function (Excel) to obtain the needed numerical examples more easily.⁷

- At first, they found that for large, even numbers of trials, the probability of the modal value of the symmetric binomial is approximately proportional to the inverse of the square root of the number of trials⁸

$$P\left(mN, \frac{mN}{2}, \frac{1}{2}\right) \approx \frac{1}{\sqrt{m}}\, P\left(N, \frac{N}{2}, \frac{1}{2}\right) \qquad (2)$$

- They observed that for large even $N$, the same inverse proportionality relation holds approximately for the probability of values deviating the same number of standard deviations from the middle value. Given this and the previous relation, it was derived and then confirmed experimentally that the ratio

$$R(N, a) = \frac{P\left(N,\ N/2 + a \cdot \tfrac{1}{2}\sqrt{N},\ \tfrac{1}{2}\right)}{P\left(N,\ N/2,\ \tfrac{1}{2}\right)} \quad \left(\text{with } a \cdot \tfrac{1}{2}\sqrt{N} \text{ integer}\right) \qquad (3)$$

remains approximately constant when $a$ remains constant; i.e. it is almost independent of $N$, for large $N$.

- Moreover, they found that for large even $N$, the sum of the probabilities of the values between the middle value and the value at distance $a \cdot \tfrac{1}{2}\sqrt{N}$ from it is approximately independent of $N$.

Then, upon the teacher’s suggestion, students examined the symmetric binomial with large odd $N$ and found that the three properties above, with adequate adaptation, hold also in this case.
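The three empirical properties can be explored numerically. The following is a minimal sketch in Python rather than the spreadsheet the students used; the function names `binom_half`, `ratio_R` and `central_sum` are illustrative and not from the paper, and the values of $N$ are chosen so that $a \cdot \tfrac{1}{2}\sqrt{N}$ is an integer.

```python
from math import exp, lgamma, log, sqrt

def binom_half(n, k):
    """P(n, k, 1/2), computed through logarithms so that large n does not overflow."""
    log_p = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2)
    return exp(log_p)

def ratio_R(n, a):
    """R(n, a) of Eq. (3); assumes n even and a * sqrt(n) / 2 an integer."""
    shift = round(a * sqrt(n) / 2)
    return binom_half(n, n // 2 + shift) / binom_half(n, n // 2)

def central_sum(n, a):
    """Sum of the probabilities from the middle value up to a standard deviations."""
    shift = round(a * sqrt(n) / 2)
    return sum(binom_half(n, n // 2 + j) for j in range(shift + 1))

# Quadrupling N (m = 4) should roughly halve the modal probability, as in Eq. (2),
# while R(N, 2) and the central sum should change only a little.
for n in (100, 400, 1600, 6400):
    print(n, binom_half(n, n // 2), ratio_R(n, 2), central_sum(n, 2))
```

Printing the rows side by side shows how slowly or quickly each quantity stabilizes, which is the kind of observation the students made on their spreadsheets.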

⁶ Measures of central tendency (mode, mean and median) and variation (range, interquartile range, mean absolute deviation, variance and standard deviation) were discussed in the part of the course on descriptive statistics, which preceded the discussion on the binomial.

⁷ The binomial with $p = q = 1/2$ and $N$ even, besides being symmetrical, has one modal value which is equal to its expected value and its median. That the values of its centers are equal facilitates the investigation of its properties; hence, the teacher suggested to the students that it was a good starting point for their investigations.

⁸ In the empirical investigations that led students to this and the two following properties, they used examples where $N$ varies from 50 to 160000; though interesting, this is not presented here, for brevity.


3 The Ratio

3.1 An important element in De Moivre’s work on the binomial distribution is that he considered the ratio of the probability at distance $d$ from the center of the distribution to the probability at the center ($R_1(N, d, p) = \frac{P(N, Np+d, p)}{P(N, Np, p)}$).⁹ Using the ratio, he reconsidered the probability at distance $d$ from the center as $P(N, Np+d, p) = R_1(N, d, p) \cdot P(N, Np, p)$¹⁰ and this analysis was of key importance for his eventual derivation of the normal approximation to the binomial (Hald 2003, ch24).

De Moivre was not the only one who worked on this ratio; Nicholas Bernoulli (1713), Daniel Bernoulli (1770) and later, in different circumstances, Karl Pearson (1895) are among those who did important work on this ratio and the approximation of the binomial distribution (Hald 1984, 2003 ch16, 17.3, ch24, Sheynin 1970, Pearson 1895, Stigler 1985, ch10). Moreover, Laplace followed De Moivre and considered the probability at distance $d$ from the center of the distribution, in a similar way, as the product of the probability at the center and the ratio of these two probabilities, but for a class of distributions much larger than the binomial, and he used this analysis in the work (1810) where he derived his Central Limit Theorem, a theorem that Stigler (1986 pp. 136, 137) qualifies as a major generalization of De Moivre’s limit theorem.

The ratio $R_1(N, d, p)$ is a conceptual object of key importance for understanding the binomial distribution and its normal approximation; however it is also a complex object, whose characteristics and behavior are difficult for the students to understand, especially for large values of $N$. In this case the study of the history of probability, in particular of relevant works of De Moivre and of Nicholas Bernoulli, was of great help since it permitted us to identify conceptual elements that are simple, or at least accessible to the students, and capture the essential characteristics and the structure of $R_1(N, d, p)$; thus, their didactical use may substantially facilitate its understanding. We did a quite detailed teaching work on $R_1(N, d, p)$ using these simple conceptual elements to achieve students’ better understanding of the subject and to obtain elements of an answer to relevant questions of our didactical research. We present this teaching work at some length in subsections 3.2-3.4. However, it is useful to keep in mind that the aforementioned conceptual elements can be presented to the students in more or less detail than ours, according to the teaching approach and depth of examination of the subject sought, which depend on the level of the course, the time available and the mathematical background of the students.

3.2 After the aforementioned empirical investigation, students had found some basic characteristics and properties that the symmetric binomial acquires when $N$ is large. However, students found these properties only through empirical investigation; hence many of them asked for explanations on why these properties hold. The vivid interest of students in this issue created an adequate teaching environment to discuss explanations and proofs of these properties.

⁹ In his work, De Moivre assumed explicitly or implicitly that, in the cases of the binomial distribution that he examined, $Np$ is an integer; thus the center of the distribution is both the mode and the expected value.

¹⁰ This consideration of $P(N, Np+d, p)$ leads to considering the whole distribution as organized around its center: the ratio $R_1(N, d, p)$ determines the structure of the distribution, while $P(N, Np, p)$ plays the role of a scale parameter.

¹¹ $2m$ being the number of trials, $m$ a positive integer and $d$ a positive integer not larger than $m$; for simplicity $R_1(2m, d, 1/2)$ is denoted as $R_1(2m, d)$.

The teacher proposed that they work first on explaining and proving the second of the three aforementioned properties. He remarked that, in order to do so, they had first to carefully examine the ratio $R_1(2m, d) = \frac{P(2m, m+d, 1/2)}{P(2m, m, 1/2)}$¹¹. During the discussion on this subject, the ratio was presented in three forms:

(a) $R_1(2m, d) = \frac{m! \cdot m!}{(m+d)! \cdot (m-d)!}$

(b) $R_1(2m, d) = \frac{m \cdot (m-1) \cdots (m-(d-1))}{(m+1) \cdot (m+2) \cdots (m+d)}$

(c) $R_1(2m, d) = \frac{1 \cdot (1-1/m) \cdots (1-(d-1)/m)}{(1+1/m) \cdot (1+2/m) \cdots (1+d/m)}$ ¹²
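That forms (a), (b) and (c) describe the same number can be checked with exact rational arithmetic. This is only an illustrative sketch; the function names are hypothetical, and the sample values of $m$ and $d$ are arbitrary.

```python
from fractions import Fraction
from math import factorial, prod

def r1_form_a(m, d):
    """Form (a): m! m! / ((m+d)! (m-d)!)."""
    return Fraction(factorial(m) ** 2, factorial(m + d) * factorial(m - d))

def r1_form_b(m, d):
    """Form (b): m (m-1) ... (m-(d-1)) over (m+1) (m+2) ... (m+d)."""
    numerator = prod(m - j for j in range(d))
    denominator = prod(m + j for j in range(1, d + 1))
    return Fraction(numerator, denominator)

def r1_form_c(m, d):
    """Form (c): form (b) with every factor divided by m."""
    numerator = prod((1 - Fraction(j, m) for j in range(d)), start=Fraction(1))
    denominator = prod((1 + Fraction(j, m) for j in range(1, d + 1)), start=Fraction(1))
    return numerator / denominator

for m, d in [(10, 0), (10, 1), (10, 2), (50, 10)]:
    a, b, c = r1_form_a(m, d), r1_form_b(m, d), r1_form_c(m, d)
    print(m, d, a == b == c, float(a))
```

The cases $d = 0, 1, 2$ are included because they are the ones where the "initial factors" of (b) and (c) are partly absent.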

Then the teacher said that De Moivre found that, when $2m$ is large and $d$ is small compared to $2m$, the ratio $R_1(2m, d)$ can be approximated by a much simpler fraction in which the sequences of factors involved in $R_1(2m, d)$ are substituted by a few factors raised to adequate powers, and asked students if they could guess such fractions.

Students discussed this issue and a common idea that emerged from their discussion was that the factors to be used could be middle values of the involved sequences of factors in $R_1(2m, d)$, raised to powers equal to the number of factors in the sequences.

Based on this idea, students worked in pairs to elaborate approximate ratios. They proposed different such ratios and after empirical investigation they found several of them that approximated $R_1(2m, d)$ well for $d$ small compared to $2m$.¹³

Among them are:

$R_{c1}(2m, d) = \frac{(1 - (d/2)/m)^{d-1}}{(1 + (d/2)/m)^{d-1} \cdot (1 + d/m)}$ (proposed by one pair),

$R_{c2}(2m, d) = \frac{(1 - (d-1)/2/m)^{d}}{(1 + (d+1)/2/m)^{d}}$ (proposed by two pairs),

$R_{b1}(2m, d) = \frac{(m - (d-1)/2)^{d}}{(m + (d+1)/2)^{d}}$ (proposed by two pairs),

$R_{a1}(2m, d) = \frac{(m/2 + 1/2)^{2m}}{(m/2 + (d+1)/2)^{m+d} \cdot (m/2 - (d-1)/2)^{m-d}}$ (proposed by one pair)¹⁴.

The empirical tests also gave other interesting information. For example, students observed that in all the examined examples $R_{c1}(2m, d)$, $R_{c2}(2m, d)$, $R_{b1}(2m, d)$ were greater than or equal to $R_1(2m, d)$, while $R_{a1}(2m, d)$ was smaller than $R_1(2m, d)$.
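The students' four approximating ratios can be compared with the exact ratio as follows. This is a sketch under the assumption that the formulas are as reconstructed above; the function names are illustrative, and logarithms are used for $R_{a1}$ because its powers overflow ordinary floating point for large $m$.

```python
from math import exp, lgamma, log

def r1(m, d):
    """Exact R1(2m, d) = m! m! / ((m+d)! (m-d)!), via log-factorials."""
    return exp(2 * lgamma(m + 1) - lgamma(m + d + 1) - lgamma(m - d + 1))

def rc1(m, d):
    return (1 - d / 2 / m) ** (d - 1) / ((1 + d / 2 / m) ** (d - 1) * (1 + d / m))

def rc2(m, d):
    return ((1 - (d - 1) / 2 / m) / (1 + (d + 1) / 2 / m)) ** d

def rb1(m, d):
    return ((m - (d - 1) / 2) / (m + (d + 1) / 2)) ** d

def ra1(m, d):
    # Middle values of the factors of m!, (m+d)! and (m-d)!, raised to the
    # number of factors; evaluated in logarithms to avoid overflow.
    log_value = (2 * m * log(m / 2 + 1 / 2)
                 - (m + d) * log(m / 2 + (d + 1) / 2)
                 - (m - d) * log(m / 2 - (d - 1) / 2))
    return exp(log_value)

for m, d in [(50, 10), (5000, 100)]:
    print(m, d, r1(m, d), rc1(m, d), rc2(m, d), rb1(m, d), ra1(m, d))
```

In both sample cases the output reproduces the ordering the students observed: $R_{a1}$ falls below $R_1$, while the other three lie above it.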

Remark 1

De Moivre used the following ratio to approximate $R_1(2m, d)$:

$$R_1(2m, d) \approx \frac{m^{2m}}{(m+d-1)^{m+d-1/2} \cdot (m-d+1)^{m-d+1/2} \cdot (m+d)/m}$$

(Stigler 1984 ch2, Hald 2003, ch24)¹⁵.

¹² For (b) and (c) obviously the initial factors $m$, $(m-1)$, $(m+1)$, $(m+2)$, $(1-1/m)$, $(1+1/m)$, $(1+2/m)$ exist for $d \geq 2$; (b) and (c) were presented in this analytical way in order to be better understood by the students. Additionally, the use of concrete examples clarified further the meaning of the three formulas, as well as the cases when $d$ equals 0, 1 and 2.

¹³ Upon the teacher’s suggestion, in the empirical check students examined values of $d$ up to five standard deviations.

¹⁵ In Miscellanea Analytica (1730), De Moivre worked with the inverse ratio, which approximates $1/R_1(2m, d)$; in 1733 he inverted the ratio and derived its approximation to $e^{-d^2/m}$ (Hald 2003 ch24).

However, De Moivre was not the first with the idea to approximate $R_1(2m, d)$ with a product of a few factors raised to adequate powers. Nicholas Bernoulli had worked out a hypothesis test on Arbuthnott’s data concerning newborns’ sex ratio and communicated his results to Montmort in a letter that Montmort published in his own book with Bernoulli’s permission (Montmort 1713, pp. 388–394, Hald 1984, 2003 ch16, 17.3). In the context of this work Bernoulli wanted to find a convenient approximation to the ratio

$$\frac{P(N, Np, p)}{P(N, Np-d, p)} = \frac{(Nq+d)(Nq+d-1) \cdots (Nq+1)}{(Np-d+1)(Np-d+2) \cdots Np} \cdot \frac{p^{d}}{q^{d}}.$$ ¹⁶

He remarked that when $d$ is small compared to $Np$ and $Nq$, the quantities $f_i = \frac{Nq+d+1-i}{Np-d+i}$ ($1 \leq i \leq d$, $i \in \mathbb{N}$) involved in the ratio closely approximate the terms of a geometrical progression and their logarithms the terms of an arithmetic one. Based on this remark, for such $d$, $Np$ and $Nq$, he approximates the ratio with $\left( \frac{(Nq+d)(Nq+1)}{(Np-d+1)\,Np} \cdot \frac{p^{2}}{q^{2}} \right)^{d/2}$.¹⁷ ¹⁸
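Bernoulli's idea of replacing the $d$ factors $f_i \cdot p/q$ by the product of the two extreme ones raised to the power $d/2$ can be tried out numerically. The sketch below is illustrative only: the function names are hypothetical, $p$ is assumed to be passed as a `Fraction`, and $Np$, $Nq$ are assumed to be integers as in the text.

```python
from fractions import Fraction

def exact_ratio(n_p, n_q, d, p):
    """P(N, Np, p) / P(N, Np - d, p) as the product of the d factors f_i * p / q."""
    q = 1 - p
    value = Fraction(1)
    for i in range(1, d + 1):
        value *= Fraction(n_q + d + 1 - i, n_p - d + i) * p / q
    return value

def bernoulli_ratio(n_p, n_q, d, p):
    """Bernoulli's approximation: (f_1 * f_d * p^2 / q^2) ** (d / 2)."""
    q = 1 - p
    base = Fraction((n_q + d) * (n_q + 1), (n_p - d + 1) * n_p) * (p / q) ** 2
    return float(base) ** (d / 2)

# Symmetric example: N = 10000, p = 1/2, d = 100 (two standard deviations).
half = Fraction(1, 2)
exact = float(exact_ratio(5000, 5000, 100, half))
approx = bernoulli_ratio(5000, 5000, 100, half)
print(exact, approx, approx / exact - 1)
```

In this symmetric example the exact ratio is $1/R_1(2m, d)$ with $m = 5000$, so the relative error printed at the end also indicates how well the product of the extreme factors approximates $R_1(2m, d)$.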

It is worth noting that De Moivre’s and Bernoulli’s approximating ratios constitute, in exchange for some loss of accuracy, an important simplification of the complex ratio $R_1(2m, d)$ both conceptually and computationally.

That De Moivre and Bernoulli conceived and treated the same basic idea (though using different methods), namely to approximate the ratio $R_1(2m, d)$ (or its inverse) by the product of a few factors raised to adequate powers, is already a remarkable fact.

After being informed of this idea, and despite their limited mathematical background, students succeeded in finding simple such ratios that approximate $R_1(2m, d)$ efficiently¹⁹.

These results constitute a strong indication that the exploration of this idea of De Moivre and Bernoulli in the classroom allows for a conceptually natural and didactically efficient approach to the subject.

¹⁶ In this ratio he assumed that $Np$, $Nq$ are integers.

¹⁷ Hald (2003, pp. 266, 267) remarks that Bernoulli, having this approximation, could very easily have found that it converges to $e^{-d^2/(2Npq)}$, provided that $d = O(\sqrt{N})$ and $N \to \infty$. However, since this was not necessary to his work, he did not investigate the approximation further.

¹⁸ De Moivre’s ratio approximates $R_1(2m, d)$ better than Bernoulli’s approximating ratio. Nevertheless Bernoulli’s approximation is an efficient one that allows for the normal approximation to the binomial (Hald 2003, ch16, 24). Moreover, to obtain his ratio, De Moivre used the polynomial expansion of the logarithm of $R_1(2m, d)$ and (James) Bernoulli’s formula for the sum of powers of integers, hence a more advanced and complex conceptual apparatus than that needed to derive Bernoulli’s approximation.

¹⁹ Students’ approximating ratios are conceptually close to Bernoulli’s; both Bernoulli and the students derived their approximating ratios based on compensation reasoning. Of course Bernoulli’s rationale is more elaborate than students’ simple reasoning. Still, their simple reasoning allowed them to find approximating ratios of about the same accuracy as Bernoulli’s (see also §3.2.2).

3.2.1 Approximately equal factors

Two questions that students posed after their findings in 3.2 were how it can be explained that these ratios approximate $R_1(2m, d)$ and why the observed inequalities hold. The teacher proposed to examine these questions first for $R_{c2}(2m, d)$.

To this end, he proposed to reconsider $R_1(2m, d)$ in form (c) above, by rearranging the factors of the numerator in increasing order, to get

$$R_1(2m, d) = \frac{(1-(d-1)/m) \cdots (1-(d-i)/m) \cdots 1}{(1+1/m) \cdots (1+i/m) \cdots (1+d/m)}.$$

The ratios $F_i = \frac{1-(d-i)/m}{1+i/m}$ ($i = 1, 2, \ldots, d$), whose product is $R_1(2m, d)$, also equal $F_i = \frac{m-d+i}{m+i} = 1 - \frac{d}{m+i}$. The teacher remarked that the last form makes obvious that if $d$, and thus $i$, is small compared to $m$, $F_i$ increases as $i$ increases, but the change of $F_i$ is small. To better appreciate the variation of $F_i$, he asked students to calculate $F_{i+1} - F_i$ and $F_d - F_1$. Students found that $F_{i+1} - F_i = \frac{d}{(m+i) \cdot (m+i+1)}$ and $F_d - F_1 = \frac{d \cdot (d-1)}{(m+1) \cdot (m+d)}$. These results, combined with some adequate examples, made more obvious to the students that the change of $F_i$ as $i$ increases is small compared to the magnitude of $F_i$, when $d$ is small compared to $m$.²⁰
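The two differences the students computed for the $F_i$ can be verified exactly. A minimal sketch with rational arithmetic follows; the helper name `F` mirrors the notation of the text, and the values $m = 5000$, $d = 100$ are just one convenient example.

```python
from fractions import Fraction
from math import factorial, prod

def F(m, d, i):
    """F_i = (m - d + i) / (m + i) = 1 - d / (m + i)."""
    return Fraction(m - d + i, m + i)

m, d = 5000, 100

# The product of the F_i is R1(2m, d) in form (a).
product = prod(F(m, d, i) for i in range(1, d + 1))
form_a = Fraction(factorial(m) ** 2, factorial(m + d) * factorial(m - d))
print(product == form_a)

# Step between neighbouring factors and total spread, as found by the students.
steps_ok = all(
    F(m, d, i + 1) - F(m, d, i) == Fraction(d, (m + i) * (m + i + 1))
    for i in range(1, d)
)
spread = F(m, d, d) - F(m, d, 1)
print(steps_ok, spread == Fraction(d * (d - 1), (m + 1) * (m + d)))
print(float(spread), float(F(m, d, 1)))  # the spread is small next to F_i itself
```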

Next, he explained to the students, using also examples, that if $d$ is of the order of some standard deviations ($d = a \cdot \sqrt{m/2}$) and $m$ is large enough, then the $F_i$ become approximately equal; thus in this case, $R_1(2m, d)$ is the product of $d$ approximately equal fractions. Therefore, it is reasonable that $R_1(2m, d)$ could be approximately equal to one of the middle value fractions among the $F_i$ (or some close value) raised to a power equal to the number of factors that it substitutes. But this was precisely the case of their $R_{c2}(2m, d)$, $R_{c3}(2m, d)$ and $R_{b1}(2m, d)$.

A student remarked that if the $F_i$ are approximately equal, not only the aforementioned ratios, but also powers of the other fractions $F_i$ close to the middle could approximate $R_1(2m, d)$, an idea that the teacher confirmed. After this remark, another student asked if even the extremes $F_1^d$, $F_d^d$ approximate $R_1(2m, d)$. Other students responded that it was not reasonable to use the extreme fractions $F_i$ to approximate $R_1(2m, d)$. However, the teacher remarked that when $2m$ is very large and $d$ is of the order of a few standard deviations, the $F_i$ are so close that even the extremes $F_1^d$, $F_d^d$ approximate $R_1(2m, d)$, but $F_i^d$ with $i$ having a middle value does this much better. Then he urged students to work out some relevant numerical examples, using spreadsheets, in order to acquire some experience of the accuracy of these approximations²¹.
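The kind of numerical example the teacher asked for can be reproduced in a few lines. The sketch below uses the case $2m = 10000$, $d = 100$ reported in the paper's footnote; the function names are illustrative.

```python
from math import prod

def F(m, d, i):
    """F_i = (m - d + i) / (m + i)."""
    return (m - d + i) / (m + i)

def r1(m, d):
    """R1(2m, d) as the product of the d factors F_1, ..., F_d."""
    return prod(F(m, d, i) for i in range(1, d + 1))

def rc2(m, d):
    """The students' ratio Rc2: the 'middle' fraction raised to the power d."""
    return ((m - (d - 1) / 2) / (m + (d + 1) / 2)) ** d

m, d = 5000, 100
exact = r1(m, d)
low_extreme = F(m, d, 1) ** d    # F_1^d, the smallest factor raised to d
high_extreme = F(m, d, d) ** d   # F_d^d, the largest factor raised to d
middle = rc2(m, d)

for name, value in [("F_1^d", low_extreme), ("F_d^d", high_extreme), ("Rc2", middle)]:
    print(name, value, (value - exact) / exact)  # value and relative error
```

The relative errors make the teacher's point visible: the extremes are off by about two percent, the middle-value ratio by far less.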

3.2.2 Factors approximately of geometric progression

Then the teacher remarked that the variation of the fractions $F_i$, although small, has practical importance since, because of it, $F_i^d$ with $i$ having a middle value permits a decent approximation of $R_1(2m, d)$ even for $2m$ not very large, while, if we use the extremes $F_1^d$, $F_d^d$, the approximation is poorer and, to achieve equally accurate approximations, much larger values of $2m$ are needed. Thus it is worth examining the $F_i$ and their variation more closely. After that, he told students that in 1713, Nicholas Bernoulli remarked that the factors involved in $R_1(2m, d)$ were approximately terms of a geometric progression and that this is an important property to understand, because it permits a better understanding of the sequence $F_i$ and of $R_1(2m, d)$.

²⁰ The teacher also remarked that when $d$ is small compared to $m$, $F_{i+1} - F_i \approx d/m^2$ and $F_d - F_1 \approx d \cdot (d-1)/m^2$.

²¹ For example, for $2m = 10000$ and $d = 100$ they found $R_1(2m, d) = 0.135344\ldots$, $F_1^d = 0.132673\ldots$, $F_d^d = 0.138032\ldots$, $R_{c2}(2m, d) = 0.13535332\ldots$, $R_{c1} = 0.13535306\ldots$ Thus, students saw that in such cases $F_1^d$ and $F_d^d$ indeed approximate $R_1(2m, d)$, but their approximating ratios do this much better. The reason for this is that for $2m$ large enough and $d = a \cdot \sqrt{m/2}$, considering $a$ constant, the ratio of the approximation error to the exact value ($R_{er} = (\text{apr.} - \text{exact})/\text{exact}$) is approximately proportional to $\sqrt{1/(2m)}$ for $F_1^d$ and $F_d^d$, while it is approximately proportional to $1/(2m)$ for $R_{c2}(2m, d)$ and $R_{c3}(2m, d)$. However, they also saw that for $2m = 100$ and $d = 10$, $R_1(2m, d) = 0.136247\ldots$, $F_1^d = 0.112755\ldots$, $F_d^d = 0.161505\ldots$, $R_{c2}(2m, d) = 0.137146\ldots$, $R_{c1}(2m, d) = 0.136920\ldots$, so they remarked that in such cases the approximation of $F_1^d$ and $F_d^d$ to $R_1(2m, d)$ becomes very poor, while their approximating ratios still provide acceptable approximations, if the accuracy asked is not too high.

Then, considering $R_1(2m, d) = \frac{(1-(d-1)/m) \cdots (1-(d-i)/m) \cdots 1}{(1+1/m) \cdots (1+i/m) \cdots (1+d/m)}$, he proposed to the students to examine initially whether the factors of the denominator are approximately terms of a geometrical progression, by considering the ratio of two successive such factors: $l_{den,i} = \frac{1+(i+1)/m}{1+i/m} = \frac{m+(i+1)}{m+i}$, which also equals $l_{den,i} = 1 + \frac{1}{m+i}$. The teacher remarked that the last form makes obvious that if $d$, and thus $i$, is small compared to $m$: (a) $l_{den,i}$ is close to 1 and (b) $l_{den,i}$ changes a little; more precisely it decreases a little, as $i$ increases.

However, to better appreciate the variation of $l_{den,i}$ he asked students to calculate $l_{den,i+1} - l_{den,i}$ and $l_{den,d-1} - l_{den,1}$. Students found that $l_{den,i+1} - l_{den,i} = -\frac{1}{(m+i+1) \cdot (m+i)}$ and $l_{den,d-1} - l_{den,1} = -\frac{d-2}{(m+d-1) \cdot (m+1)}$. These results and some adequate examples made even more obvious to the students that the change of $l_{den,i}$ as $i$ increases is very small compared to the magnitude of $l_{den,i}$, when $d$ is small compared to $m$.²² Students did a similar examination for the sequence of factors in the numerator, and then with the teacher’s help they examined in the same way the ratio $l_{F,i} = \frac{F_{i+1}}{F_i} = \frac{1-(d-i-1)/m}{1+(i+1)/m} \cdot \frac{1+i/m}{1-(d-i)/m} = 1 + \frac{d}{(m+i-d)(m+i+1)}$. This examination work pointed out to the students that also the sequence of factors in the numerator and the sequence of fractions $F_i$ are approximately terms of geometrical progressions. In fact they found that the $F_i$ approximate the terms of a geometrical progression even better than the factors of the denominator or of the numerator, since $l_{F,i}$ changes less than $l_{den,i}$ and $l_{num,i}$²³ as $i$ increases²⁴.

Additionally the teacher remarked that the $F_i$ are approximately terms of a geometrical progression with ratio very close to 1 and because of this closeness the $F_i$ are also approximately equal, as discussed previously. However, considering that the $F_i$ are approximately terms of such a geometric progression takes their small differences better into account than simply considering that they are approximately equal.
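How close the three sequences are to geometric progressions can be judged from how little their successive ratios vary. The following sketch checks this with exact fractions for one example; the helper names follow the notation $l_{den,i}$, $l_{num,i}$, $l_{F,i}$ of the text, and `spread` is a hypothetical helper measuring the total variation of a sequence.

```python
from fractions import Fraction

def F(m, d, i):
    return Fraction(m - d + i, m + i)

def l_den(m, i):
    """Ratio of successive denominator factors: (m + i + 1) / (m + i)."""
    return Fraction(m + i + 1, m + i)

def l_num(m, d, i):
    """Ratio of successive numerator factors: (m - d + i + 1) / (m - d + i)."""
    return Fraction(m - d + i + 1, m - d + i)

def l_F(m, d, i):
    """Ratio of successive fractions F_{i+1} / F_i."""
    return F(m, d, i + 1) / F(m, d, i)

def spread(values):
    return max(values) - min(values)

m, d = 5000, 100
indices = range(1, d)  # i = 1, ..., d - 1
closed_form_ok = all(
    l_F(m, d, i) == 1 + Fraction(d, (m + i - d) * (m + i + 1)) for i in indices
)
spread_den = spread([l_den(m, i) for i in indices])
spread_num = spread([l_num(m, d, i) for i in indices])
spread_F = spread([l_F(m, d, i) for i in indices])
print(closed_form_ok, float(spread_den), float(spread_num), float(spread_F))
```

A perfectly geometric sequence would have spread zero; here the spread of $l_{F,i}$ comes out more than an order of magnitude smaller than the other two.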

Then the teacher discussed with the students that for $d$ terms of a geometrical progression, $a_1, \cdots, a_d$, it holds that $a_1 \cdot a_d = a_2 \cdot a_{d-1} = a_i \cdot a_{d-i+1}$, with $i \in \mathbb{N}$, $1 \leq i \leq [d/2]$. He also remarked that for $d$ odd, $a_i \cdot a_{d-i+1} = a^2_{(d+1)/2}$.

Then he told students that, since the $F_i$, whose product constitutes $R_1(2m, d)$, are approximately terms of a geometric progression for $2m$ large and $d$ small compared to $2m$, $F_i \cdot F_{d-i+1}$ should be approximately stable for $d$ fixed. Thus it was interesting to consider $R_1(2m, d)$ as constituted by such couples of factors by rearranging the $F_i$. Students did this and found that for $d$ even, $R_1(2m, d) = (F_1 \cdot F_d) \cdots (F_i \cdot F_{d-i+1}) \cdots (F_{d/2} \cdot F_{d/2+1})$, whereas for $d$ odd (and $d > 1$) $R_1(2m, d) = (F_1 \cdot F_d) \cdots (F_i \cdot F_{d-i+1}) \cdots (F_{(d-1)/2} \cdot F_{(d+3)/2}) \cdot F_{(d+1)/2}$.

The teacher remarked that, following the aforementioned property, for $d$ odd $F_i \cdot F_{d-i+1}$ should also be approximately equal to $F^2_{(d+1)/2}$ and thus $F^d_{(d+1)/2}$ should be approximately equal to $R_1(2m, d)$. But $F^d_{(d+1)/2}$ equals their approximating ratios $R_{c2}(2m, d)$ and $R_{b1}(2m, d)$, and this was an additional explanatory element on why these ratios approximate $R_1(2m, d)$ well. Moreover, the teacher explained that when $d$ is even, although $F_{(d+1)/2}$ is not among the factors of $R_1(2m, d)$, it is a value between $F_{d/2}$ and $F_{d/2+1}$. Additionally, $F_{d/2}$, $F_{d/2+1}$ are very close when $d$ is small compared to $2m$ and therefore $F_{d/2} \cdot F_{d/2+1} \approx F^2_{(d+1)/2}$. Thus $R_{c2}(2m, d)$ and $R_{b1}(2m, d)$ can be used to approximate $R_1(2m, d)$ also in case $d$ is even.

However the discussion on the products of pairs Fi · Fd−i+1 had an informal character and thecloseness of the values of Fi · Fd−i+1 was not evaluated quantitatively. Thus the teacher proposed tothe students to examine more closely Fi · Fd−i+1 = 1−(d−i)/m

1+i/m · 1−(i−1)/m1+(d−i+1)/m . Students with teacher’s

help transformed this into Fi · Fd−i+1 = 1 − d·(2m+1)(m+(d+1)/2)2−((d+1)/2−i)2

In this form it was easy for the students to see that, for the values of $i$ which are no greater than $(d+1)/2$ (see above), $F_i \cdot F_{d-i+1}$ increases as $i$ increases. This also implies that $F_i \cdot F_{d-i+1} < F_{(d+1)/2}^2$ for $i < (d+1)/2$. Therefore, for $d \ge 2$, $R_1(2m, d) < F_{(d+1)/2}^d = R_{c2}(2m, d) = R_{b1}(2m, d)$; this answered the relevant students' question mentioned earlier.

^{22} The teacher also remarked that when $d$ is small compared to $m$, $l_{den,i+1} - l_{den,i} \approx -1/m^2$ and $l_{den,d-1} - l_{den,1} \approx -(d-2)/m^2$.

^{23} $l_{num,i}$ being the ratio of two successive factors of the numerator of $R_1(2m, d)$.

^{24} Indeed, students found that when $m$ is large and $d$ is small compared to $m$, $l_{F,i+1} - l_{F,i} \approx -2d/m^3$ and $l_{F,d-1} - l_{F,1} \approx -2d \cdot (d-2)/m^3$, which are substantially smaller than the corresponding differences of the $l_{den,i}$ or the $l_{num,i}$.

Moreover, having found the monotonicity of $F_i \cdot F_{d-i+1}$, students easily answered the teacher's question: which is the smallest $F_i \cdot F_{d-i+1}$? From this answer it was easily derived that for $d \ge 2$, $(F_1 \cdot F_d)^{d/2} \le R_1(2m, d)$.^{25}

To appreciate the change of $F_i \cdot F_{d-i+1}$ as $i$ increases, students worked in the same way as previously and calculated $D_{i+1,i} = F_{i+1} \cdot F_{d-i} - F_i \cdot F_{d-i+1}$ and $D_{(d+1)/2,1} = F_{(d+1)/2}^2 - F_1 \cdot F_d$. They found that for $2m$ large and $d$ small compared to $2m$, $D_{i+1,i}$ approximately equals $\frac{4d(d/2-i)}{m^3}$ and $D_{(d+1)/2,1}$ approximately equals $\frac{d \cdot (d-1)^2}{2 \cdot m^3}$. These results, combined with some adequate numerical examples, pointed out to the students that when $d$ is small compared to $2m$, both $D_{i+1,i}$ and $D_{(d+1)/2,1}$ are very small compared to $F_i \cdot F_{d-i+1}$, and in this sense it is reasonable to characterize the $F_i \cdot F_{d-i+1}$ involved in $R_1(2m, d)$ as approximately stable.

Then he discussed with students that when $d = a \cdot \sqrt{m/2}$ and $m$ is large enough, the distance between $(F_1 \cdot F_d)^{d/2}$ and $R_{c2}(2m, d)$ is very small, and both these (lower and upper) bounds of $R_1(2m, d)$ can be used as its approximations.^{26}
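The properties discussed above can be checked numerically. The following is a minimal sketch, not the procedure used in the classroom: the function names and the values $m = 200$, $d = 9$ are hypothetical choices for illustration, and the factor is written as $F_x = (m - x + 1)/(m + x)$, which equals $\frac{1-(x-1)/m}{1+x/m}$.

```python
from fractions import Fraction


def F(x, m):
    """Factor F_x = (m - x + 1) / (m + x); x may be a half-integer Fraction."""
    return Fraction(m - x + 1) / Fraction(m + x)


def R1(m, d):
    """R1(2m, d) = P(2m, m+d, 1/2) / P(2m, m, 1/2) as the product F_1 ... F_d."""
    product = Fraction(1)
    for i in range(1, d + 1):
        product *= F(i, m)
    return product


def pair_products(m, d):
    """Products F_i * F_(d-i+1) for i = 1, ..., floor((d+1)/2)."""
    return [F(i, m) * F(d - i + 1, m) for i in range(1, (d + 1) // 2 + 1)]


def pair_closed_form(i, m, d):
    """1 - d(2m+1) / ((m + (d+1)/2)^2 - ((d+1)/2 - i)^2), in exact arithmetic."""
    s = Fraction(d + 1, 2)
    return 1 - Fraction(d * (2 * m + 1)) / ((m + s) ** 2 - (s - i) ** 2)


def bounds(m, d):
    """Lower bound (F_1 F_d)^(d/2) and upper bound F_((d+1)/2)^d, as floats."""
    lower = float(F(1, m) * F(d, m)) ** (d / 2)
    upper = float(F(Fraction(d + 1, 2), m)) ** d
    return lower, upper


if __name__ == "__main__":
    m, d = 200, 9  # illustrative values only
    for i, value in enumerate(pair_products(m, d), start=1):
        print(i, float(value), value == pair_closed_form(i, m, d))
    lower, upper = bounds(m, d)
    print(lower, float(R1(m, d)), upper)
```

Exact rational arithmetic is used for the pair products so that the closed form can be compared by equality; the bounds involve fractional powers and are therefore evaluated in floating point.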

3.3 The following step was to look for the limit of the upper bound, $F_{(d+1)/2}^d = R_{c2}(2m, d)$, and the lower bound $(F_1 \cdot F_d)^{d/2}$ (which from now on we call $R_{c5}(2m, d)$) when $m$ tends to infinity and $d = a \cdot \sqrt{m/2}$.^{27}

Initially the teacher reminded students that when $n \to +\infty$, $\lim \left(1 + \frac{1}{n}\right)^n = e$, a property that students had been taught in high school. Then he explained that more generally for any sequence $x_n$ of real numbers such that $x_n \to +\infty$, $\lim \left(1 + \frac{1}{x_n}\right)^{x_n} = e$. He also explained that, if $x_n \to +\infty$, $\lim \left(1 + \frac{c}{x_n}\right)^{x_n} = e^c$ and $\lim \left(1 + \frac{c + b/x_n}{x_n}\right)^{x_n} = e^c$.

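Since $d = a \cdot \sqrt{m/2}$, substituting into the numerator of $R_{c2}(2m, d)$, this numerator equals

$$\left(1 - \frac{a\sqrt{m/2} - 1}{2m}\right)^{a\sqrt{m/2}} = \left(\left(1 - \frac{a/4 - (1/4)/\sqrt{m/2}}{\sqrt{m/2}}\right)^{\sqrt{m/2}}\right)^{a},$$

so its limit for $m \to +\infty$ equals $\left(\lim \left(1 - \frac{a/4 - (1/4)/\sqrt{m/2}}{\sqrt{m/2}}\right)^{\sqrt{m/2}}\right)^{a}$. Putting $x_m = \sqrt{m/2}$ and applying the last of the aforementioned properties, it was obtained that this limit is $e^{-a^2/4}$. Working similarly with the denominator of $R_{c2}(2m, d)$ it was obtained that its limit is $e^{a^2/4}$, so the limit of $R_{c2}(2m, d)$ is $e^{-a^2/2}$.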
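For the denominator, the analogous computation might look as follows. This is a sketch under the assumption that $R_{c2}(2m, d) = F_{(d+1)/2}^d$ with $F_{(d+1)/2} = \frac{1-(d-1)/(2m)}{1+(d+1)/(2m)}$, and it uses $x_m = \sqrt{m/2}$, so that $d = a\,x_m$ and $2m = 4x_m^2$.

```latex
\begin{align*}
\left(1 + \frac{d+1}{2m}\right)^{d}
  &= \left(1 + \frac{a\,x_m + 1}{4x_m^2}\right)^{a\,x_m}
   = \left(\left(1 + \frac{a/4 + (1/4)/x_m}{x_m}\right)^{x_m}\right)^{a}
   \;\xrightarrow[m \to +\infty]{}\; \left(e^{a/4}\right)^{a} = e^{a^2/4}, \\[4pt]
\lim_{m \to +\infty} R_{c2}(2m, d)
  &= \frac{e^{-a^2/4}}{e^{a^2/4}} = e^{-a^2/2}.
\end{align*}
```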

Working similarly with $R_{c5}(2m, d)$^{28} it was obtained that its limit is also $e^{-a^2/2}$.

Then, the teacher explained that since $R_{c2}(2m, d)$ and $R_{c5}(2m, d)$ have this same limit, for any $\varepsilon$, even if it is very small, there is an $N_0$, depending on $\varepsilon$, such that for each number of trials $N = 2m > N_0$ the distances of both $R_{c2}(2m, d)$ and $R_{c5}(2m, d)$ from $e^{-a^2/2}$ are smaller than $\varepsilon$. Therefore, for each value of $N$ greater than $N_0$ and such that $d = a \cdot \sqrt{m/2}$ is an integer, $R_1(2m, d)$, which lies between $R_{c5}(2m, d)$ and $R_{c2}(2m, d)$, has a distance less than $\varepsilon$ from $e^{-a^2/2}$. So, for all such values of $N$, $R_1(2m, d)$ closely approximates $e^{-a^2/2}$.

The teacher also remarked that since $d = a \cdot \sqrt{m/2}$ we can also say that $R_1(2m, d)$ approximates $e^{-d^2/m}$.
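The squeeze argument can be illustrated numerically. The sketch below is an illustration only: the function names are hypothetical, and the values of $m$ are chosen so that $d = a\sqrt{m/2}$ is an integer for $a = 2$. It prints the lower bound $R_{c5}$, the ratio $R_1$, the upper bound $R_{c2}$ and the value $e^{-d^2/m}$ side by side.

```python
import math


def F(x, m):
    """F_x = (m - x + 1) / (m + x); x may be a half-integer such as (d+1)/2."""
    return (m - x + 1) / (m + x)


def R1(m, d):
    """R1(2m, d) as the product of the factors F_1, ..., F_d."""
    return math.prod(F(i, m) for i in range(1, d + 1))


def Rc2(m, d):
    """Upper bound F_((d+1)/2)^d."""
    return F((d + 1) / 2, m) ** d


def Rc5(m, d):
    """Lower bound (F_1 * F_d)^(d/2)."""
    return (F(1, m) * F(d, m)) ** (d / 2)


def convergence_rows(a, ms):
    """Rows (m, d, Rc5, R1, Rc2, exp(-d^2/m)) with d = round(a * sqrt(m/2))."""
    rows = []
    for m in ms:
        d = round(a * math.sqrt(m / 2))
        rows.append((m, d, Rc5(m, d), R1(m, d), Rc2(m, d), math.exp(-d * d / m)))
    return rows


if __name__ == "__main__":
    for row in convergence_rows(2.0, [50, 800, 12800, 204800]):
        print("m=%d d=%d  Rc5=%.6f  R1=%.6f  Rc2=%.6f  exp(-d^2/m)=%.6f" % row)
```

As $m$ grows, the three computed columns should close in on the last one, which for $a = 2$ is $e^{-2}$.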

^{25} So, the lower bound of $R_1(2m, d)$ thus derived is equal to the inverse of the approximating ratio of Nicholas Bernoulli for the case $p = q = 1/2$ (recall that he approximated $1/R_1(N, d, p)$, see “Remark 1” above).

^{26} During this discussion $D_{i+1,i}$ and $D_{(d+1)/2,1}$ were compared with $F_{i+1} - F_i$ and $F_d - F_1$, and the comparison pointed out to the students that although the $F_i$ are approximately equal for $d$ small compared to $m$, the $F_i \cdot F_{d-i+1}$ vary even less. Thus using $(F_1 \cdot F_d)^{d/2}$ and $F_{(d+1)/2}^d = R_{c2}(2m, d)$ permits bounding $R_1(2m, d)$ in a much smaller interval than that defined by $F_1^d$ and $F_d^d$.

^{27} The teacher reminded students that $d = a \cdot \sqrt{m/2}$ means that $d$ is a fixed multiple of the standard deviation. He also explained that in order to look for the limits of $R_{c3}(2m, d)$ and $R_{c5}(2m, d)$ it was not necessary to be restricted to the cases where $d$ is an integer.

^{28} Recall that $R_{c5}(2m, d) = \left(\frac{1-(d-1)/m}{1+1/m} \cdot \frac{1}{1+d/m}\right)^{d/2}$.


3.4 Students posed some interesting questions whose treatment permitted further elaboration on the subject:

– Can it be proved that the other approximating ratios they have found (see §3.2) have the same limit when $m \to +\infty$?
The proofs that were worked out in the treatment of this question were similar, or close, to the previous proof and offered students the occasion to better understand the proof process involved.

– In cases where $d = a \cdot \sqrt{m/2}$ is not an integer, can we replace it with an integer close to it (e.g. $d = [a \cdot \sqrt{m/2}]$)? And if we do so, does the limit of $R_1(2m, d)$ remain the same? Working on this question with the students it was found that when $m \to +\infty$, the limit of $R_1(2m, [a \cdot \sqrt{m/2}])$ is $e^{-a^2/2}$ as well.

– If we consider $d$ to be constant ($d = c$) instead of $d = a \cdot \sqrt{m/2}$, what is the limit of $R_1(2m, d)$? If we consider $d$ to be a constant fraction of the number of trials ($d = a \cdot 2m$), what is the limit of $R_1(2m, d)$?

Working on the first question, students found that when $m \to +\infty$ and $d = c$, the limit of $R_1(2m, d)$ is 1. The teacher explained that this also means that if $m$ is large enough and $d$ is constant, then the difference of $P(2m, m, 1/2)$ and $P(2m, m+d, 1/2)$ becomes very small compared to their magnitude.

Working on the second question, students found that when $m \to +\infty$ and $d = a \cdot 2m$, the limit of $R_1(2m, d)$ is 0. They also found that in this case the limit of $e^{-d^2/m}$ is 0. However, upon the teacher's suggestion, the limit of the ratio $R_1(2m, d)/e^{-d^2/m}$ when $m \to +\infty$ was examined with the students, and it was found that in this case ($d = a \cdot 2m$) this limit is 0, while in all the previously examined cases this limit is 1. This result and the associated discussion helped students significantly to understand an aspect of the normal approximation to the binomial which is subject to frequent and important misunderstandings.^{29}
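The three regimes for $d$ can be compared numerically. The following is a minimal sketch with hypothetical function names and illustrative parameter values ($c = 5$, $a = 2$ for the square-root regime, $a = 0.1$ for the proportional regime); it evaluates $R_1(2m, d)$ through the log-gamma function, which is a computational convenience assumed here and not part of the teaching work described.

```python
import math


def log_R1(m, d):
    """ln R1(2m, d) = ln( m! m! / ((m+d)! (m-d)!) ), computed via log-gamma."""
    return 2 * math.lgamma(m + 1) - math.lgamma(m + d + 1) - math.lgamma(m - d + 1)


def R1(m, d):
    """R1(2m, d) = P(2m, m+d, 1/2) / P(2m, m, 1/2)."""
    return math.exp(log_R1(m, d))


def ratio_to_normal(m, d):
    """R1(2m, d) / exp(-d^2/m), computed on the log scale to avoid underflow."""
    return math.exp(log_R1(m, d) + d * d / m)


if __name__ == "__main__":
    for m in (10**3, 10**4, 10**5):
        d_const = 5                               # d = c
        d_sqrt = round(2.0 * math.sqrt(m / 2))    # d close to a * sqrt(m/2), a = 2
        d_prop = round(0.1 * 2 * m)               # d = a * 2m, a = 0.1
        print(m,
              R1(m, d_const), ratio_to_normal(m, d_const),
              R1(m, d_sqrt), ratio_to_normal(m, d_sqrt),
              R1(m, d_prop), ratio_to_normal(m, d_prop))
```

In the proportional regime both $R_1(2m, d)$ and $e^{-d^2/m}$ become extremely small, which is why their ratio is formed from logarithms rather than from the two values themselves.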

Remark 2

As already mentioned, the ratio $R_1(N, d)$ is an essential conceptual object for understanding the binomial distribution and the normal approximation to the binomial; however, for the students it is also complex and difficult to understand. In this case the study of the relevant history of probability and statistics is of great help since it permits the identification of simple conceptual elements that capture the essential characteristics and the structure of $R_1(N, d)$ and, thus, their didactical use substantially facilitates its understanding. The first such conceptual element is that, for $d$ small compared to $N$, $R_1(N, d)$ is the product of $d$ approximately equal fractions ($F_i$). Since the small differences between these fractions still have practical importance (see 3.2.1, 3.2.2), a second conceptual element is needed for a deeper understanding of these differences and of the structure of the ratio, namely that the $F_i$ are approximately terms of a geometric progression, where the ratio of this progression is very close to 1.

Because these two conceptual elements capture the essential characteristics of $R_1(N, d)$, even if they are presented to the students in a less detailed way than the one presented here, important results can still be easily derived. For example, powers of simple ratios can be found that approximate $R_1(N, d)$ well, some of which are lower or upper bounds of $R_1(N, d)$. Moreover, it can be proved that they are such bounds, provided that the monotonicity of the factors of $R_1(N, d)$ is also considered. Then, obtaining the limit values of $R_1(N, d)$, for $N \to +\infty$ and $d$ of order $O(\sqrt{N})$, is a matter of a few simple exercises on limits of sequences.

^{29} On these misunderstandings see e.g. Brockett 1983, Cummins 1991.

A more detailed examination of these two conceptual elements, such as the one presented here, offers further quantitative information on the smallness of the differences of the factors of $R_1(N, d)$ compared to their size and thus leads to a better understanding of them.

Further examination involving these two conceptual elements naturally leads to the quantification of the approximation errors of $R_1(N, d)$ and to the determination of its rate of convergence. These important issues were only touched upon, at the empirical level, in the teaching work examined here, since it was an introductory teaching on the subject. However, the treatment of these issues can be based on, and is a natural prolongation of, the teaching work presented here. In fact, analysis of the ratio $R_1(N, d)$ based on the two aforementioned conceptual elements can be used in different teaching approaches whose depth of examination of the subject depends on the level of the course, on the time available and on the mathematical background of the students.

4 Approximating the middle term of the Symmetric Binomial

De Moivre, in his Miscellanea analytica (1730, pp. 173–174), gives a short and elegant proof of the approximation of the middle term of the symmetric binomial, based on Wallis' theorem on the approximation of $\pi$^{30} (Hald 2003, ch. 24).

Of course, the full understanding of this approach requires the understanding of the proof of Wallis' theorem, which we considered to be too hard for our students. Nevertheless, using the Wallis product, it is simple to prove an essential element of this approximation, namely that for large numbers of trials ($2m$), $P(2m, m, 1/2)$ is approximately inversely proportional to $\sqrt{2m}$ (which is precisely one of the properties that students had found earlier empirically; see section 2).

This proof was discussed with the students in the following way:

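Initially the teacher explained that, after adequate simplifications,

$$P(2m, m, \tfrac{1}{2}) = \frac{1 \cdot 3 \cdot 5 \cdots (2m-1)}{2 \cdot 4 \cdot 6 \cdots 2m},$$

so

$$P^2(2m, m, \tfrac{1}{2}) = \frac{1 \cdot 3^2 \cdot 5^2 \cdots (2m-1)^2}{2^2 \cdot 4^2 \cdots (2m)^2},$$

which can also be written as

$$\frac{1}{2} \cdot \frac{3^2}{2 \cdot 4} \cdot \frac{5^2}{4 \cdot 6} \cdot \frac{7^2}{6 \cdot 8} \cdots \frac{(2m-1)^2}{(2m-2) \cdot (2m)} \cdot \frac{1}{2m}.$$

Thus

$$B_m = (2m) \cdot P^2(2m, m, \tfrac{1}{2}) = \frac{1}{2} \cdot \frac{3^2}{2 \cdot 4} \cdot \frac{5^2}{4 \cdot 6} \cdot \frac{7^2}{6 \cdot 8} \cdots \frac{(2m-1)^2}{(2m-2) \cdot (2m)}.$$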

Since $\frac{B_{m+1}}{B_m} = \frac{(2m+1)^2}{(2m) \cdot (2m+2)} > 1$, $B_m$ increases when $m$ increases. We can also write

$$B_m = \frac{1 \cdot 3}{2^2} \cdot \frac{3 \cdot 5}{4^2} \cdot \frac{5 \cdot 7}{6^2} \cdots \frac{(2m-3) \cdot (2m-1)}{(2m-2)^2} \cdot \frac{2m-1}{2m};$$

in this form all the involved fractions are smaller than 1, so it is obvious that $B_m < 1$. Since $B_m$ is an increasing and bounded sequence, its limit for $m \to +\infty$ is a constant $C$. So, when $m \to +\infty$, $\lim \sqrt{B_m} = \lim \sqrt{2m} \cdot P(2m, m, 1/2) = \sqrt{C}$. This means that when $m$ is large $\sqrt{2m} \cdot P(2m, m, 1/2)$ is very close to $\sqrt{C}$ and thus it remains approximately constant. This justifies that for large values of $m$, $P(2m, m, 1/2)$ is approximately inversely proportional to $\sqrt{2m}$.

^{30} Earlier, in 1721, De Moivre had carried out a different, more involved approach to the subject and succeeded in finding the approximation sought. However, he determined the constant involved in the approximation only approximately. Around 1725 he informed Stirling of this problem. Stirling's answer resolved this issue and, later, De Moivre (1738, p. 236) remarked that his answer “has spread a singular Elegancy in the Solution” (of the approximation) (Hald 2003, ch. 24; Stigler 1986, ch. 2).


At this stage, the demanding problem of determining the value of $\sqrt{C}$ remained; in the context of the aforementioned simple proof, a few further steps on this were discussed with the students: $E_m = \frac{2m}{2m-1} B_m$ is a decreasing sequence and its limit, when $m \to +\infty$, is equal to the limit of $B_m$, thus $E_m > C > B_m$. So

$$\sqrt{\frac{(2m)^2}{2m-1}} \cdot P(2m, m, 1/2) > \sqrt{C} > \sqrt{2m} \cdot P(2m, m, 1/2).$$

Based on this result and using the binomial distribution function of Excel, students managed to find the first four decimal digits of $\sqrt{C}$ (0.7978).

Then the teacher informed students that Wallis, in the mid-seventeenth century, had found that the limit of $1/E_m$ equals $\pi/2$, so $\sqrt{C} = \sqrt{2/\pi}$. This surprised students, who associated $\pi$ with the circle and could not see how it was involved in the examined probability approximation. However this question was not answered in the context of this course.^{31}
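The squeeze between $B_m$ and $E_m$ can be reproduced with a few lines of code. The sketch below is only an illustration: the students worked with Excel, whereas here $B_m$ is built up with the recurrence $B_{m+1}/B_m = \frac{(2m+1)^2}{(2m)(2m+2)}$ starting from $B_1 = 1/2$; the function name and the choice of $m = 10000$ are hypothetical.

```python
import math


def wallis_bounds(m_max):
    """Return a list of (m, B_m, E_m) for m = 1, ..., m_max.

    B_m = 2m * P(2m, m, 1/2)^2 is built with the recurrence
    B_m = B_(m-1) * (2m-1)^2 / ((2m-2) * 2m), and E_m = 2m / (2m-1) * B_m.
    """
    rows = []
    B = 0.5  # B_1 = 2 * P(2, 1, 1/2)^2 = 2 * (1/2)^2
    for m in range(1, m_max + 1):
        if m > 1:
            B *= (2 * m - 1) ** 2 / ((2 * m - 2) * (2 * m))
        E = 2 * m / (2 * m - 1) * B
        rows.append((m, B, E))
    return rows


if __name__ == "__main__":
    for m, B, E in wallis_bounds(10000):
        if m in (10, 100, 1000, 10000):
            # sqrt(B_m) < sqrt(C) < sqrt(E_m)
            print(m, math.sqrt(B), math.sqrt(E))
    print("sqrt(2/pi) =", math.sqrt(2 / math.pi))
```

The two printed columns bracket $\sqrt{C}$ from below and above, and the bracket narrows roughly in proportion to $1/m$, so a fairly large $m$ is needed before four decimal digits agree.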

5 Sum of probabilities

Later, the approximation of $\sum P(2m, m+d, 1/2)$, with $d \in \mathbb{N}$ and $0 \le d \le b \cdot \sqrt{m/2} = D$, was discussed. This sum was transformed into the form $\sum P(2m, m, 1/2) \cdot R_1(2m, d) = P(2m, m, 1/2) \cdot \sum R_1(2m, d)$.

Then, using the result of section 3.3 and the graphical representations of the involved sums, the teacher explained that the ratio

$$\frac{\sum R_1(2m, d)}{\sum e^{-d^2/m}}$$

approximates 1 as $m \to +\infty$. Then, using graphical representations, he explained that

$$\frac{\sum e^{-d^2/m}}{\int_0^D e^{-x^2/m}\,dx}$$

approximates 1 as $m \to +\infty$. Thus

$$\frac{\sum R_1(2m, d)}{\int_0^D e^{-x^2/m}\,dx}$$

also approximates 1 as $m \to +\infty$. Although simple and understood by the students, this explanation was not rigorous, since for the proof, properties of uniform convergence are needed, otherwise it becomes quite complicated. However this concept was unknown to our students.

Then using the result of section 4 it was derived that

$$\frac{P(2m, m, 1/2) \sum R_1(2m, d)}{P(2m, m, 1/2) \int_0^D e^{-x^2/m}\,dx}$$

has the same limit as

$$\frac{P(2m, m, 1/2) \sum R_1(2m, d)}{\frac{2}{\sqrt{2\pi \cdot 2m}} \int_0^D e^{-x^2/m}\,dx},$$

which is 1. This means that for large enough $m$, $\sum P(2m, m+d, 1/2)$ is approximately equal to $\frac{2}{\sqrt{2\pi \cdot 2m}} \int_0^D e^{-x^2/m}\,dx$.
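The final approximation can be checked numerically. The following is a minimal sketch with hypothetical function names and the illustrative value $b = 2$. It relies on the closed form $\frac{2}{\sqrt{2\pi \cdot 2m}} \int_0^D e^{-x^2/m}\,dx = \frac{1}{2}\,\mathrm{erf}(D/\sqrt{m})$, obtained with the substitution $x = \sqrt{m}\,t$; the error function is not used in the text and is introduced here only to evaluate the integral.

```python
import math


def central_term(m):
    """P(2m, m, 1/2) = C(2m, m) / 2^(2m), computed via log-gamma."""
    return math.exp(math.lgamma(2 * m + 1) - 2 * math.lgamma(m + 1)
                    - 2 * m * math.log(2))


def binomial_sum(m, b):
    """Sum of P(2m, m+d, 1/2) over integers d with 0 <= d <= D = b*sqrt(m/2)."""
    D = b * math.sqrt(m / 2)
    term = central_term(m)
    total = 0.0
    d = 0
    while d <= D:
        total += term
        # P(2m, m+d+1, 1/2) = P(2m, m+d, 1/2) * F_(d+1), F_(d+1) = (m-d)/(m+d+1)
        term *= (m - d) / (m + d + 1)
        d += 1
    return total


def normal_integral(m, b):
    """(2 / sqrt(2*pi*2m)) * integral_0^D exp(-x^2/m) dx = erf(D/sqrt(m)) / 2."""
    D = b * math.sqrt(m / 2)
    return 0.5 * math.erf(D / math.sqrt(m))


if __name__ == "__main__":
    b = 2.0  # illustrative value
    for m in (100, 10**4, 10**6):
        s, n = binomial_sum(m, b), normal_integral(m, b)
        print(m, s, n, s / n)
```

The ratio in the last column approaches 1 only slowly, since the discrete sum exceeds the integral by an amount comparable to $P(2m, m, 1/2)$ itself, which decreases like $1/\sqrt{m}$.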

6 Subsequent work

Subsequently, the teacher discussed with the students the normal approximation to the binomial distribution for the case of (i) the symmetric binomial with an odd number of trials, (ii) the non-symmetric binomial (probability of success $p \ne 1/2$) with $Np$ integer and (iii) the non-symmetric binomial with $Np$ non-integer.

Case (i) was a simple extension of what was discussed previously.

For (ii), the approximation of the ratio $R_1(N, d, p)$ was based on simple variations of the ideas and methods already discussed for the case of $R_1(2m, d, 1/2)$. So, the subject was treated as a sequence of exercises, which students formulated in discussion with the teacher and then worked out, with his help whenever needed. This approach permitted them to better assimilate the conceptual elements introduced for the case of $R_1(2m, d, 1/2)$ and extended the relevant methods. For the approximation of $P(N, Np, p)$ Stirling's formula was needed and a relevant proof was discussed with the students.

^{31} Some students expressed a vivid interest in this question, and looked for the proof in mathematics textbooks and on the internet. The teacher helped them by explaining the proofs they found; however, even with this help most of them found these proofs hard to understand. The issue of a more accessible approach to this subject remains an interesting question for further didactical investigation.


This proof is significantly simpler than those usually presented in mathematics textbooks, since it uses as a lemma the approximation of $P(2m, m, 1/2)$ discussed in section 4.^{32}

Case (iii) was discussed in a non-rigorous way, in order to save teaching time, and it was derived that the normal approximation also holds in this case for $N$ large enough.

Then applications using the normal approximation to the binomial were worked out with the students. Among these applications were problems derived from the problem discussed by Nicholas Bernoulli in his letter to Montmort (see Remark 1 and references therein). However, for these problems actual data, and not Arbuthnott's data, were used.

7 Final Remarks

The normal approximation to the binomial distribution is the easiest case related to the Central Limit Theorem; still it is a complex subject posing important difficulties for the students. In this case the didactically oriented study of the relevant history of probability and statistics was of great help since it permitted us to identify simple conceptual elements that capture well essential characteristics of the binomial distribution for a large number of trials. The didactical use of these conceptual elements not only facilitates students’ understanding of the final results of the normal approximation to the binomial, but also makes accessible to them explanations, justifications and even proofs of properties and results concerning this approximation, which are otherwise difficult within the usual approaches.

Moreover, history offers vivid material on the questions and problems that led to the emergence and posing of the problem of the approximation of the binomial distribution, as well as interesting application problems. The discussion of such material stimulates students’ interest and permits the understanding of elements concerning the importance of the subject for scientific and practical life.

REFERENCES

– Barbieri, A., Dubikovsky, V., Gladkevich, A., Goldberg, L.R. & Hayes, M.Y. (2009) “Central Limits and Financial Risk” (available at http://ssrn.com/), SSRN-id1404089.pdf

– Batanero, C., Tauber, L.M., Sanchez, V. (2004) “Students’ Reasoning about the Normal Distribution”, in D. Ben-Zvi & G. Garfield (eds), The Challenge of Developing Statistical Literacy, Reasoning and Thinking (pp. 257–276), Dordrecht: Kluwer.

– Blanco, M. & Ginovart, M. (2010) “How to introduce historically the normal distribution in engineering education: a classroom experiment”, IJMEST 41(1), 19–30.

– Brockett, P. (1983) “On the Misuse of the Central Limit Theorem in Some Risk Calculations”, The Journal of Risk and Insurance, 50(4), 727–731.

– Chance, B., delMas, R., Garfield, J. (2004) “Reasoning about Sampling Distributions”, in D. Ben-Zvi & G. Garfield (eds), The Challenge of Developing Statistical Literacy, Reasoning and Thinking (pp. 295–318), Dordrecht: Kluwer.

– Clark, J., Karuat, G., Mathews, D. & Wimbish, J. (2003) The Fundamental Theorem of Statistics: Classifying Student understanding of basic statistical concepts, unpublished paper, http://www1.hollins.edu/faculty/clarkjm/stat2c.pdf

^{32} This proof and the relevant discussion with the students are not presented here for the sake of brevity.


– Cummins, J.D. (1991) “Statistical and Financial Models of Insurance Pricing and the Insurance Firm”, The Journal of Risk and Insurance 58(2), 261–302.

– Crack, F.T. & Ledoit, O. (2010) “Central Limit Theorems when data are dependent: addressing the pedagogical gaps”, Journal of Financial Education, 36(spring/summer), 38–60.

– De Moivre, A. (1730) Miscellanea analytica de seriebus et quadraturis, London: J. Tonson & J. Watts. (Reprinted: Gale Ecco, Print Editions, 2010)

– De Moivre, A. (1738, 1756) The Doctrine of Chances, 2nd ed. London: Woodfall; 3rd ed. London: Millar. Reprinted 1967, NY: Chelsea.

– Garfield, J. & Ben-Zvi, D. (2007) “How Students Learn Statistics Revisited”, International Statistical Review 75(3), 372–396.

– Goos, M. (2004) “Learning Mathematics in a Classroom Community of Inquiry”, JRME 35(4), 258–291.

– Freudenthal, H. (1991) Revisiting mathematics education – China lectures, Dordrecht: Kluwer.

– Hald, A. (1984) “Nicholas Bernoulli’s theorem”, International Statistical Review, 52, 93–99.

– Hald, A. (2003) A History of Probability and Statistics and their applications before 1750, NJ: Wiley.

– Hald, A. (2007) A History of parametric statistical inference from Bernoulli to Fisher 1713–1935, Springer.

– Legrand, M. (1993) “Débat scientifique en cours de mathématiques et spécificité de l’analyse”, Repères, 10, 123–158.

– Lunsford, M., Rowell, G., Goodson-Espy, T. (2006) “Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes”, Journal of Statistics Education 14(3), www.amstat.org/publications/jse/v14n3/lunsford.html

– Mathews, D. & Clark, J. (1997) “Successful Students’ Conceptions of Mean, Standard Deviation and the Central Limit Theorem”, Paper presented at the Midwest conference on teaching statistics, Oshkosh, WI (http://www1.hollins.edu/faculty/clarkjm/stats1.pdf)

– Montmort, P.R. de (1713) Essay d’Analyse sur les Jeux de Hazard, 2nd ed., Paris: Quillau. Reprinted 1980, NY: Chelsea.

– Pearson, K. (1895) “Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material”, Philosophical Transactions of the Royal Society of London (A), Vol. 186, pp. 343–414.

– Sheynin, O.B. (1968) “On the early history of the Law of Large Numbers”, Biometrika, 55(3), 459–467.

– Sheynin, O.B. (1970) “Daniel Bernoulli on the normal law”, Biometrika, 57(1), 199–202.

– Stigler, S.M. (1986) The history of statistics: the measurement of uncertainty before 1900, Harvard University Press.

– Stonewater, J. (2005) “Inquiry Teaching and Learning: The Best Math Class Study”, School Science and Mathematics, 105(1), 36–47.

– Wilensky, U. (1997) “What is normal anyway? Therapy for epistemological anxiety”, Educational Studies in Mathematics, 33, 171–202.

– Wilensky, U. (2003) “Statistical mechanics for secondary school: The GasLab multi-agent modeling toolkit”, IJCML 8, 1–41.