There are three types of liesswood34/inaug.pdfThere are three types of lies — lies, damned lies and statistics Benjamin Disraeli I British prime minister (Tory). William Gladstone


There are three types of lies: lies, damned lies and statistics


Benjamin Disraeli

• British prime minister (Tory).


William Gladstone

• Defeated Disraeli in the general election of 1868.
• President of the Royal Statistical Society 1867-1869.


Another Disraeli quote

. . . That question is this: Is man an ape or an angel? I, my lord, I am on the side of the angels. I repudiate with indignation and abhorrence those new fangled theories.

(Oxford Diocesan Conference 25/11/1864)


A rational approach to uncertainty?

[Figure: two time-series panels covering 1850 to 2000. Global temperature: temperature anomaly (C) against year. Atmospheric CO2: CO2 (PPM) against year.]


Absorption spectra


Is abstraction the problem?


Baker & Bellis, 1993, Animal Behaviour

[Figure: scatterplot matrix of count, prop.partner and time.ipc.]


The Baker and Bellis Analysis

[Figure: a sequence of panels plotting count and residuals (rsd) against prop.partner and time.ipc.]


Baker and Bellis Conclusions

• At the end of the process they asked whether the apparent straight line relationships were stronger than could plausibly have arisen by chance.

• On this basis they concluded that there is evidence for count declining with proportion of time spent together.

• Time since last copulation seemed not to play a detectable role.

• But they also collected another dataset . . .
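
One way to make "stronger than could plausibly have arisen by chance" concrete is a permutation test on the fitted slope. The sketch below is a swapped-in illustration, not the test Baker and Bellis used (the slides do not say which test that was), and the numbers are made up rather than taken from their data.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffled datasets whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(slope(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up illustrative values, NOT the Baker and Bellis measurements.
prop_partner = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
count = [520, 480, 450, 400, 410, 330, 300, 260, 240, 150]

p_value = permutation_p_value(prop_partner, count)
print(f"slope = {slope(prop_partner, count):.1f}, permutation p = {p_value:.4f}")
```

A small p-value says only that a straight-line trend this steep is rare under shuffling. It says nothing about whether a straight line is the right shape, or whether another variable explains the trend.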


[Figure: scatterplot matrix of count, f.age, f.height, f.weight, m.age, m.height, m.weight and m.vol.]


More conclusions. . .

• Going through the same process as with the first data set leads to the conclusion that only female weight is linearly related to count.

• But a careful look at the residuals shows that this conclusion is completely dependent on a single data point with very low sperm count.

• Re-do the analysis without this datum, and only volume matters.

• Actually it’s the same subjects in both datasets, and we can match up the volumes with the first dataset.

• Repeating the first analysis with volume added leads to the dull conclusion that there is only any evidence for a linear relationship between count and volume.

• This result has limited marketing potential.
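
The dependence of a fitted line on one observation can be checked by refitting with each point left out in turn. This is a minimal sketch with invented numbers; the variable names and values are hypothetical, and leave-one-out refitting is one possible check, not necessarily the residual inspection used in the talk.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Slope refitted with each observation dropped in turn."""
    return [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Invented data: count is essentially flat in weight, except for one very low count.
weight = [50, 52, 54, 56, 58, 60, 62, 64]
count = [300, 310, 295, 305, 300, 298, 307, 20]

full_slope = slope(weight, count)
loo = leave_one_out_slopes(weight, count)

# Index of the observation whose removal changes the slope the most.
most_influential = max(range(len(loo)), key=lambda i: abs(loo[i] - full_slope))

print(f"slope with all points: {full_slope:.2f}")
print(f"dropping point {most_influential} gives slope {loo[most_influential]:.2f}")
```

In this toy example the apparent negative relationship comes entirely from the last point. That is the pattern the slide describes for female weight.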


But why straight lines anyway?

[Figure: scatterplot matrix of count, prop.partner and time.ipc.]


Smoothing

1. What if the relationship between the residuals and a variable does not look like a straight line?

2. Why not let it be a smooth curve, instead?

[Figure: estimated smooth curves s(prop.partner,1.07) against prop.partner and s(time.ipc,1.77) against time.ipc.]


How to choose the best fit curve?

• Take a bendy strip of wood.
• Hook it up to the data points with springs.
• The result is a spline

[Figure: wear against size.]
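
The strip-and-springs picture corresponds, under the usual idealisation, to a penalized least squares problem: the springs contribute the squared vertical distances to the data, and the strip contributes a bending energy approximated by the integrated squared second derivative. The symbols below ($f$, $\lambda$, $x_i$, $y_i$) are notation introduced here, not taken from the slides.

```latex
\hat{f} \;=\; \arg\min_{f}\;
  \sum_{i=1}^{n} \bigl\{ y_i - f(x_i) \bigr\}^2
  \;+\; \lambda \int f''(x)^2 \, dx
```

Here $\lambda$ plays the role of the stiffness of the strip relative to the springs: a large $\lambda$ forces $f$ towards a straight line, and $\lambda = 0$ lets the curve pass through every point.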


Splines are controllable

• Changing the flexibility of the spline changes the curve.

[Figure: four panels of wear against size.]

• Splines can be described mathematically, in a way that is easy to work with.
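
The effect of flexibility can be sketched with a discrete stand-in for a spline: for equally spaced data, choose fitted values z to minimise the sum of squared residuals plus lam times the sum of squared second differences of z. This is a swapped-in simplification (a difference-penalty smoother, not the spline shown in the figures), and the data, function names and lam values are made up for illustration.

```python
def second_difference_penalty(n):
    """Return the n x n matrix D'D, where D takes second differences."""
    penalty = [[0.0] * n for _ in range(n)]
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        idx = (r, r + 1, r + 2)
        for a, ca in zip(idx, coef):
            for b, cb in zip(idx, coef):
                penalty[a][b] += ca * cb
    return penalty


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def smooth(y, lam):
    """Minimise sum (y - z)^2 + lam * sum (second differences of z)^2."""
    n = len(y)
    penalty = second_difference_penalty(n)
    system = [[(1.0 if i == j else 0.0) + lam * penalty[i][j] for j in range(n)]
              for i in range(n)]
    return solve(system, y)


def roughness(z):
    """Sum of squared second differences: zero for a straight line."""
    return sum((z[i - 1] - 2 * z[i] + z[i + 1]) ** 2 for i in range(1, len(z) - 1))


# Made-up, equally spaced observations.
y = [2.0, 2.6, 2.3, 3.1, 2.9, 3.6, 3.4, 4.1, 3.9, 4.5]
for lam in (0.0, 1.0, 100.0):
    print(f"lam = {lam:6.1f}  roughness of fit = {roughness(smooth(y, lam)):.4f}")
```

With lam set to zero the fit reproduces the data, and as lam grows the fit is pulled towards a straight line. The four panels on the slide make the same point with a spline.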


Smooth surfaces: thin plate splines

• For smooth surfaces there are several options
• We can replace the bendy strip with a bendy sheet. . .

[Figure: four panels showing the linear predictor as a surface over x and z.]
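
For a function of two variables, the standard thin plate spline measure of "bending" replaces the integrated squared second derivative of the one-dimensional case with the penalty below. The notation $J$, $f$, $x$, $z$ is introduced here for illustration.

```latex
J(f) \;=\; \iint
  \left( \frac{\partial^2 f}{\partial x^2} \right)^{2}
  + 2 \left( \frac{\partial^2 f}{\partial x \, \partial z} \right)^{2}
  + \left( \frac{\partial^2 f}{\partial z^2} \right)^{2}
  \, dx \, dz
```

This penalty does not change when the $(x, z)$ coordinates are rotated, so the sheet is equally stiff in every direction. That is natural when $x$ and $z$ are measured in the same units, and less so otherwise.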


More smooth surfaces: tensor product splines

• Or we can make a surface from a lattice of bendy strips.
• The strips should usually have different degrees of flexibility in the two directions.

[Figure: surface f(x,z) plotted over x and z.]
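
In symbols, a tensor product smooth builds the surface from products of one-dimensional basis functions, with a separate smoothing parameter for each direction. The basis functions $a_i$, $b_j$, coefficients $\beta_{ij}$, and penalties $J_x$, $J_z$ below are notation assumed for this sketch.

```latex
f(x, z) \;=\; \sum_{i=1}^{I} \sum_{j=1}^{J} \beta_{ij}\, a_i(x)\, b_j(z),
\qquad
\text{penalty} \;=\; \lambda_x J_x(f) \;+\; \lambda_z J_z(f)
```

Having both $\lambda_x$ and $\lambda_z$ is what allows different flexibility in the two directions, which matters when $x$ and $z$ are in different units.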


Yet more smooth surfaces: soap films

• For smoothing within oddly shaped areas, it can help to replace bendy sheets/strips with a soap film.
• This avoids smoothing across the area boundary.

[Figure: six map panels of latitude against longitude.]


How flexible should the spline be?

• Mathematically, all these ways of describing a surface have the degree of smoothness controlled by just one or two numbers . . .

• . . . which must be chosen. How?

[Figure: three panels of y against x, labelled "λ too high", "λ about right" and "λ too low".]
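
The slides pose the question without stating the criterion used. One widely used option, given here as an assumption rather than as the speaker's answer, is generalized cross validation: choose $\lambda$ to minimise a score that trades off residual size against the effective number of parameters. Here $A_\lambda$ denotes the matrix mapping the data $y$ to the fitted values, and $n$ is the number of observations.

```latex
\mathcal{V}_g(\lambda) \;=\;
  \frac{ n \, \lVert y - A_\lambda y \rVert^{2} }
       { \bigl\{ n - \operatorname{tr}(A_\lambda) \bigr\}^{2} }
```

When $\lambda$ is too low the numerator is small but $\operatorname{tr}(A_\lambda)$ approaches $n$, inflating the score. When $\lambda$ is too high the residuals grow. The minimum sits between the two extremes.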


Cleaning up a brain scan

[Figure: medFPQ brain image, plotted over X and Y.]

• Model log FPQ as a smooth surface, represented using a thin plate spline.

• Springs attaching the plate to the data have strength dependent on the height of the plate.


Smoothed version

[Figure: smoothed image (linear predictor), plotted over X and Y.]


Is Cairo getting hotter?

[Figure: temperature (F) against time (days).]

• A model . . .
• The temperature varies smoothly with day of year.
• There might be an additional smooth long term trend in temperature.
• The small scale day to day fluctuations are probably correlated between one day and the next.
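
Written as an equation, the model described above might look as follows. The first-order autoregressive form for the day-to-day fluctuations is an assumption made for this sketch (the slide says only that neighbouring days are correlated), and the symbols $f_1$, $f_2$, $\phi$, $e_i$, $\epsilon_i$ are introduced here.

```latex
\text{temp}_i \;=\; f_1(\text{day.of.year}_i) \;+\; f_2(\text{time}_i) \;+\; e_i,
\qquad
e_i \;=\; \phi \, e_{i-1} + \epsilon_i,
\quad
\epsilon_i \sim N(0, \sigma^2)
```

Here $f_1$ is a smooth seasonal cycle, $f_2$ is the smooth long term trend, and $\phi$ measures how strongly one day's departure from the smooth curves carries over to the next.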


Yes it is.

[Figure: estimated smooth terms s(day.of.year,8.52) against day.of.year and s(time,1.35) against time.]


Predicting octane rating

[Figure: spectrum, log(1/R) against wavelength (nm), for a sample with octane = 85.3.]

• How can we predict the octane rating from the spectrum?


Octane prediction model

[Figure: spectrum, log(1/R) against wavelength (nm), for a sample with octane = 85.3, with the coefficient curve overlaid.]

• Model: octane rating is a constant plus the average value of the red curve multiplied by the spectrum (blue).

• Need to estimate the red curve.
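
Once a coefficient curve is available, the prediction rule stated above is a one-line calculation. The sketch below uses a hypothetical five-point wavelength grid with invented values for the constant, the coefficient curve and the spectrum. It shows only the prediction step, not how the curve is estimated.

```python
def predict_octane(alpha, coef_curve, spectrum):
    """Constant plus the average, over wavelengths, of coefficient times spectrum."""
    if len(coef_curve) != len(spectrum):
        raise ValueError("coefficient curve and spectrum must share a wavelength grid")
    products = [f * s for f, s in zip(coef_curve, spectrum)]
    return alpha + sum(products) / len(products)


# Hypothetical values on a coarse grid; a real spectrum has many more wavelengths.
wavelength_nm = [1000, 1150, 1300, 1450, 1600]
coef_curve = [0.0, 2.0, -4.0, 6.0, 1.0]   # the "red curve" (invented)
spectrum = [0.1, 0.5, 0.2, 1.0, 0.4]      # log(1/R), the "blue curve" (invented)
alpha = 84.0                              # the constant (invented)

prediction = predict_octane(alpha, coef_curve, spectrum)
print(f"predicted octane: {prediction:.2f}")
```

Because there are typically far more wavelengths than fuel samples, the coefficient curve cannot be estimated freely. Requiring it to be smooth is what makes the problem tractable.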


Octane prediction fit

[Figure: two panels. Estimated function: s(nm,7.9):NIR against nm. Measured against fitted octane.]


Diabetic Retinopathy Study

[Figure: scatterplot matrix of ret, bmi, gly and dur.]

• Model is that probability of retinopathy is related to a sum of smooth curves depending on bmi, gly and dur plus smooth surfaces depending on bmi & gly, gly & dur . . .
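
In symbols, one plausible form of this model is given below. The logit link is an assumption (the slide says only that the probability "is related to" the sum), the third surface is written as a function of dur and bmi to complete the set of pairs, and the names $f_1, \dots, f_6$ are introduced here.

```latex
\operatorname{logit} \Pr(\text{ret}_i = 1) \;=\;
  f_1(\text{bmi}_i) + f_2(\text{gly}_i) + f_3(\text{dur}_i)
  + f_4(\text{bmi}_i, \text{gly}_i)
  + f_5(\text{gly}_i, \text{dur}_i)
  + f_6(\text{dur}_i, \text{bmi}_i)
```

Each $f$ is a smooth curve or surface with its own smoothing parameters, so the data can indicate which interaction terms are needed and which can be shrunk away.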


Diabetic Retinopathy Results

[Figure: estimated smooth terms s(dur,3.26), s(gly,1) and s(bmi,2.67), each plotted against its covariate, and the surfaces te(dur,gly,0), te(dur,bmi,0) and te(gly,bmi,2.5).]


Diabetic Retinopathy Results II

[Figure: several views of the linear predictor as a function of bmi and gly, some annotated "red/green are +/- TRUE s.e.".]


cran.r-project.org


Picture Credits

• Gladstone and Disraeli are from the House of Commons web site.
• The 1921 Eugenics conference logo is from en.wikipedia.org/wiki/File:Eugenics congress logo.png
• The Gates of Auschwitz are from oncampus.richmond.edu/academics/education/projects/webquests/holocaust/images/arbeit macht frei.jpg
• Hogarth’s South Sea Bubble can be found at www.library.hbs.edu/hc/ssb/images/using-top.jpg, but I’ve lost where I found the one shown.
• The absorption spectrum figure is from www.te-software.co.nz/blog/augie auer.htm
• Reproductions of Picasso’s Les Demoiselles d’Avignon are available from many sites. The one shown is possibly from www.enjoyart.com/library/featured artists/pablopicasso/large/Bmcgaw-P591.jpg
• The cover of Sperm Wars was taken from www.amazon.co.uk.


Data Credits

• The Global CO2 and temperature data are from www.cru.uea.ac.uk/cru/data/temperature/ and the Scripps Institute CO2 research group.
• The Aral Sea CO2 data are from the SeaWifs satellite.
• For full credits for the Cairo and Brain Scan data, see R package gamair.
• The octane data are available in R package pls.
• The Retinopathy data are available in R package gss.