Teaching Statistics using the Computer

Emlyn Williams, CSIRO

Australia

Statistics students in Australia

• Decline in statistics student numbers in Australia
• During 2003, 3000 PhDs in Australia
• Only 186 PhDs in Mathematical and Statistical Sciences
• Why?

Possible reasons for the decline

• Popularity of Computing Science
• Reduced capacity of our Universities to train Maths / Stats professionals
• Type of Statistics being taught in Secondary Schools
  – Distribution theory
  – Probability
  – Equations / formulas

Some possible directions

• Data mining
  – Microarrays
  – Normalization
  – Multivariate
• Significance testing
  – Model selection
  – Resampling
  – Permutation tests
• Computer-based techniques

Understanding variation

• “… the central problem in management and leadership … is failure to understand the information in variation” (Dr W. Edwards Deming)

• Concept can be grasped without emphasizing mathematics or formulas

• Hands-on experiments
• Book “Statistical Thinking for Managers” by J.A. John, D. Whitaker and D.G. Johnson

Classroom experiments

• Beads experiment
  – Many white and red beads, majority white
  – Samples of 50 taken
  – Plot the number of red beads over time
• Quincunx experiment
  – Simulates a process to produce tubing with 50 mm diameter
  – The process involves several steps
  – An operator is employed to monitor the process
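The beads experiment can also be mimicked on a computer. The sketch below is a minimal simulation, assuming that 20% of the beads are red (the slides only say that the majority are white) and treating each sample of 50 as independent draws. The red fraction, the number of samples, the seed and the function name are illustrative assumptions, not part of the original experiment.

```python
import random

RED_FRACTION = 0.20   # assumed; the slides only say most beads are white
SAMPLE_SIZE = 50      # beads per sample, as in the classroom experiment


def draw_red_counts(n_samples, rng):
    """Return the number of red beads in each of n_samples samples."""
    return [
        sum(rng.random() < RED_FRACTION for _ in range(SAMPLE_SIZE))
        for _ in range(n_samples)
    ]


rng = random.Random(2024)          # fixed seed so the run is repeatable
counts = draw_red_counts(25, rng)  # 25 samples is an arbitrary choice

# Crude text "run chart": one row per sample, one # per red bead.
for i, c in enumerate(counts, start=1):
    print(f"{i:2d} {'#' * c} {c}")
```

Nothing about the simulated process changes between samples, yet the counts move up and down from one sample to the next.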

Quincunx board

One sequence of 25 balls

mean=49.6

sd=1.5

Tampering

• Method 2 – Process adjustment. The operator tries to compensate for the results of the previous sample

• Method 3 – Variability reduction. The operator adjusts to try to achieve the same result as the previous sample

[Figure: Means of 50 balls for 30 sequences, Methods 1 and 3 (Method 2 = 50.0). x-axis: sequence number; y-axis: mm (40 to 58); series: mean1, mean3]

[Figure: Standard deviations of 50 balls for 30 sequences, Methods 1, 2 and 3. x-axis: sequence number; y-axis: mm (1 to 7); series: sd1, sd2, sd3]
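The effect of tampering can be reproduced with a small simulation. The sketch below is one possible model, not the board used in class: it assumes a quincunx with 9 rows of pins that each deflect the ball by 0.5 mm left or right (which gives a standard deviation of 1.5 mm), it assumes Method 1 means leaving the process alone (the text does not define it), and it codes Method 2 as moving the aim by the negative of the last error and Method 3 as aiming at the last result. All names and constants are illustrative.

```python
import random
import statistics

TARGET = 50.0   # nominal tube diameter in mm (from the slides)
ROWS = 9        # assumed number of pin rows: sd = 0.5 * sqrt(9) = 1.5 mm
STEP = 0.5      # assumed sideways deflection per pin, in mm


def drop_ball(aim, rng):
    """One ball: start at the aim point and bounce left/right at each pin."""
    return aim + sum(rng.choice((-STEP, STEP)) for _ in range(ROWS))


def run(method, n_balls, rng):
    """Simulate n_balls under one of the three operator methods."""
    aim = TARGET
    results = []
    for _ in range(n_balls):
        x = drop_ball(aim, rng)
        results.append(x)
        if method == 2:      # compensate for the last error
            aim -= x - TARGET
        elif method == 3:    # try to repeat the last result
            aim = x
        # method 1 (assumed): no adjustment
    return results


def summarise(results, size=50):
    """Mean and standard deviation of each consecutive sequence of balls."""
    chunks = [results[i:i + size] for i in range(0, len(results), size)]
    means = [statistics.mean(c) for c in chunks]
    sds = [statistics.stdev(c) for c in chunks]
    return means, sds


rng = random.Random(1)  # fixed seed so the run is repeatable
summary = {m: summarise(run(m, 30 * 50, rng)) for m in (1, 2, 3)}

for m, (means, sds) in summary.items():
    print(f"Method {m}: means {min(means):.1f} to {max(means):.1f} mm, "
          f"average sd {statistics.mean(sds):.2f} mm")
```

Under these assumptions Method 2 keeps the sequence means very close to 50 mm but inflates the spread, while Method 3 lets the process wander, which is the qualitative pattern in the two figures.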

Analysis of Designed Experiments

 

Replicate  Seedlot  Tree
              4     X X X X X X X X
              3     X X X X X X X X
    1         5     X X X X X X X X
              2     X X X X X X X X
              1     X X X X X X X X

              4     X X X X X X X X
              1     X X X X X X X X
    2         3     X X X X X X X X
              5     X X X X X X X X
              2     X X X X X X X X

Seedlots: 1 Acacia, 2 Angophora, 3 Casuarina, 4 Melaleuca, 5 Petalostigma

Analysis of Variance Table

Source of variation   d.f.      s.s.      m.s.    v.r.   F pr.
repl                     1    20.301    20.301    7.54
seedlot                  4   505.868   126.467   49.94   <.001
Residual                74   199.350     2.694
Total                   79   725.529

***** Tables of means *****

Grand mean 6.12

seedlot    Acacia  Angophora  Casuarina  Melaleuca  Petalostigma
            10.29       7.10       5.51       4.94          2.73

Correct Analysis of Variance Table

Source of variation      d.f.      s.s.      m.s.    v.r.   F pr.
repl stratum                1    20.301    20.301    3.42

repl.plot stratum
seedlot                     4   505.868   126.467   21.30   0.006
Residual                    4    23.746     5.936    2.37

repl.plot.tree stratum     70   175.614     2.509

Total                      79   725.529

***** Tables of means *****

Grand mean 6.12

seedlot    Acacia  Angophora  Casuarina  Melaleuca  Petalostigma
            10.29       7.10       5.51       4.94          2.73
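The difference between the two tables comes down to which mean square seedlots are tested against. The sketch below redoes the arithmetic from the tabulated sums of squares and degrees of freedom: the first analysis pools the plot residual (4 d.f.) with the tree stratum (70 d.f.) into a single 74 d.f. residual, while the correct analysis tests seedlots against the plot residual alone. The dictionary keys and variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the tables above.
ss = {"repl": 20.301, "seedlot": 505.868,
      "plot_residual": 23.746, "tree": 175.614}
df = {"repl": 1, "seedlot": 4, "plot_residual": 4, "tree": 70}

ms = {term: ss[term] / df[term] for term in ss}

# First analysis: trees treated as independent units, so the plot residual
# and the tree stratum are pooled into one residual with 4 + 70 = 74 d.f.
pooled_ms = (ss["plot_residual"] + ss["tree"]) / (df["plot_residual"] + df["tree"])
naive_vr = {term: ms[term] / pooled_ms for term in ("repl", "seedlot")}

# Correct analysis: seedlots were applied to plots, so they are tested
# against the plot-to-plot residual (4 d.f.).
correct_vr = {term: ms[term] / ms["plot_residual"] for term in ("repl", "seedlot")}

# Plot residual compared with tree-to-tree variation within plots.
plot_vs_tree = ms["plot_residual"] / ms["tree"]

print(f"pooled residual m.s. = {pooled_ms:.3f}")
print(f"naive v.r.:   repl {naive_vr['repl']:.2f}, seedlot {naive_vr['seedlot']:.2f}")
print(f"correct v.r.: repl {correct_vr['repl']:.2f}, seedlot {correct_vr['seedlot']:.2f}")
print(f"plot residual / tree m.s. = {plot_vs_tree:.2f}")
```

The variance ratios of the correct table and the repl ratio of the first table reproduce to two decimals. Recomputed this way, the seedlot ratio of the first table comes out near 46.9 rather than the printed 49.94, so that entry may contain a transcription slip. Either way, using trees as the experimental unit roughly doubles the apparent variance ratio and replaces 4 residual degrees of freedom with 74.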

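[Figure: two design diagrams for treatments A and B]

Levels shown in the first diagram:
• Treatment
• Technical Replicate
• Dye
• Array

Levels shown in the second diagram:
• Treatment
• Biological Replicate
• Technical Replicate
• Dye
• Array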

Opening screen of DataPlus

Disk drive

Working directory displayed in the status bar

Top bar menu

Experimental Title which must be filled

Path of working directory.

Directory structure

File display area

File selection type

Button to go to the next screen

Status bar

To create new sub-directory

Step-by-step instruction

• Choose your experiment design from the list

• Click the Next button

Step-by-step instruction

• Type in the numbers of replicates, plots and trees

• Click the Next button

Treatment screen

Note: plots stratum

Treatment Levels: to input treatment names

Treatment Layout: to input the treatment layout

Measurement screen

New Spreadsheet: for entering your measurements using Microsoft Excel

Open Data File: for opening an existing data file

Derived variate: for declaring new variates (not measured in the field), e.g. volume, basal area

Note: trees stratum

Output Summary screen

Output: for generating summary file

View: to view summary data file using Notepad

Select Tree: for selecting trees in inner plot

Modify: to modify data file

GenStat or SAS: to go to GenStat or SAS screen

Note: plots stratum

Design of Experiments

• Designs used to be constructed mainly using combinatorics or group theory

• The class of Partially Balanced Incomplete Block designs was defined and developed

• These designs did not always focus on quantities of importance to practitioners

• We need to maximize the amount of treatment information in the lowest stratum (where we have most precision)

• The average efficiency factor does this and can be used as an objective function in a computer search algorithm

Two possible arrangements for an incomplete block design with 9 treatments

           Replicate 1    Replicate 2
Block       1  2  3        1  2  3
           ---------      ---------
            1  4  7        1  2  3
            2  5  8        4  5  6
            3  6  9        7  8  9

           Replicate 1    Replicate 2
Block       1  2  3        1  2  3
           ---------      ---------
            1  4  7        1  5  4
            2  5  8        2  8  6
            3  6  9        3  9  7
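To make the average efficiency factor concrete, the sketch below computes it for the first arrangement, using the standard definition as the harmonic mean of the canonical efficiency factors of A = I - NN'/(rk) for an equireplicate design with equal block sizes. For that arrangement the six blocks are the same whether blocks are read as columns or as rows. The function names are illustrative, and exact fractions are used only to avoid needing a linear algebra library; this is not the search code used in CycDesigN.

```python
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("design is disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def average_efficiency_factor(blocks, v, r, k):
    """Harmonic mean of the canonical efficiency factors (treatments 1..v)."""
    # Concurrence matrix NN': entry (i, j) counts blocks holding both i and j.
    conc = [[0] * v for _ in range(v)]
    for block in blocks:
        for i in block:
            for j in block:
                conc[i - 1][j - 1] += 1
    # A + J/v has the same eigenvalues as A except that the zero eigenvalue
    # becomes 1, so trace(inverse) - 1 is the sum of 1/e_i.
    m = [[Fraction(int(i == j)) - Fraction(conc[i][j], r * k) + Fraction(1, v)
          for j in range(v)] for i in range(v)]
    inv = invert(m)
    trace = sum(inv[i][i] for i in range(v))
    return (v - 1) / (trace - 1)


# First arrangement: two replicates, three blocks of three plots each.
first = [[1, 2, 3], [4, 5, 6], [7, 8, 9],
         [1, 4, 7], [2, 5, 8], [3, 6, 9]]
E = average_efficiency_factor(first, v=9, r=2, k=3)
print(E, float(E))
```

A search algorithm can evaluate this quantity for each candidate design and keep the arrangement with the largest value.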

Software - CycDesigN

• Windows 95 to XP
• Visual C++
• Resolvable / non-resolvable
• Block / row-column
• One / two stage
• Cyclic / alpha / other
• Factorial / nested treatments
• t-Latinized / partially-latinized
• Unequal block sizes

Latinized row-column design for 20 treatments

Column    1   2   3   4   5

         16  17  19  18  13
Rep 1     6   2   4   8  15
         12   5   7  10  11
         20   9  14   1   3

          5  15   1  19  20
Rep 2     4  12  17   3   8
         13  18   6   9   7
         10  14  11  16   2

         15   7   3  17  10
Rep 3    19   1   2  13  12
          8  16   5  14   6
         11   4  18  20   9
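The layout above can be checked mechanically. The sketch below assumes that each replicate is a 4 x 5 array of rows by columns, and that "latinized" here means no treatment occurs more than once in any column taken across all three replicates. Both the reading of the layout and the function names are assumptions made for illustration.

```python
from collections import Counter

N_TREATMENTS = 20

# Each replicate is a list of rows; each row lists the treatments in columns 1..5.
reps = [
    [[16, 17, 19, 18, 13], [6, 2, 4, 8, 15], [12, 5, 7, 10, 11], [20, 9, 14, 1, 3]],
    [[5, 15, 1, 19, 20], [4, 12, 17, 3, 8], [13, 18, 6, 9, 7], [10, 14, 11, 16, 2]],
    [[15, 7, 3, 17, 10], [19, 1, 2, 13, 12], [8, 16, 5, 14, 6], [11, 4, 18, 20, 9]],
]


def is_resolvable(design):
    """True if every replicate contains each treatment exactly once."""
    full_set = list(range(1, N_TREATMENTS + 1))
    return all(sorted(t for row in rep for t in row) == full_set
               for rep in design)


def max_column_count(design):
    """Largest number of times any treatment appears in one long column."""
    worst = 0
    n_cols = len(design[0][0])
    for col in range(n_cols):
        counts = Counter(row[col] for rep in design for row in rep)
        worst = max(worst, max(counts.values()))
    return worst


print("resolvable:", is_resolvable(reps))
print("max occurrences of a treatment in any column:", max_column_count(reps))
```

A maximum column count of 1 means that a column effect running through the whole trial cannot fall on the same treatment twice.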

Summary

• In Australia (and probably elsewhere) we need to change some of the teaching practice and content in secondary schools and universities in order to prevent a continuing decline in the number of statistics students

• Development of and education using computer-based statistical techniques may provide an attractive addition to existing curricula
