Basic Statistics

“I always find that statistics are hard to swallow and impossible to digest.

The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.”

[Mrs Robert A Taft]

“Data! Data! Data!” he cried impatiently.

“I can’t make bricks without clay”

[Sherlock Holmes]

Qualitative

a) Nominal data

(dead/alive, blood group O,A,B,AB)

b) Ordered categorical/ranked data

(mild/moderate/severe)

Quantitative

a) Numerical discrete

(no. of deaths in a hospital per year)

b) Numerical continuous

(age, weight, blood pressure)

Presenting data

• Graphs

• Summary statistics

• Tables

Graphical methods

• Pie chart

• Bar chart

• Histogram

• Scattergram

Pie chart

[Figure: pie chart of self-reported pain, with slices for no pain, moderate pain and extreme pain]

Bar chart

[Figure: bar chart of self-reported pain (no pain, moderate pain, extreme pain); vertical axis: No. of subjects, 0 to 5000]

Histogram

[Figure: histogram of Age; horizontal axis 20 to 65, vertical axis 0 to 50]

Boxplot

[Figure: boxplots of Age (years), axis 0 to 100, by Gender (female, male); N = 4465 and 3600]

Error bar plot

[Figure: error bars showing the 95% CI for health status score, axis 0 to 400, by mobility (no problem, some problems, severe problem)]

Scattergram

[Figure: scattergram of creatinine (vertical axis, 0 to 140) vs. digoxin (horizontal axis, 0 to 120)]

Graph Example

Graph

[Figure: SF36 score, axis 0 to 100, by SF36 sub-scale (Phys fun, Soc fun, Role phys, Role emo, Mental, Vitality, Pain, General) for two groups: Not ill and Long term ill]

Solution

[Figure: SF36 score, axis 0 to 100, by SF36 sub-scale (Phys fun, Soc fun, Role phys, Role emo, Mental, Vitality, Pain, General) for two groups: Not ill and Long term ill]

Summary statistics

Qualitative data

• Percentages

• Numbers

Secondary prevention of coronary heart disease

                     Respondents    Non-respondents
                     (n=1343)       (n=578)
Male                 58% (782)      54% (314)
Urban practice       54% (720)      57% (331)
Practice size:
  < 5,000            14% (190)      18% (105)
  5,000 – 10,000     39% (523)      41% (238)
  > 10,000           47% (630)      41% (235)
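The percentages in this table follow directly from the counts and the group sizes. A minimal Python sketch of that arithmetic, using the counts shown in the table (the function and variable names are illustrative only, not part of the original slides):

```python
# Counts copied from the table; all names here are illustrative.
N_RESPONDENTS = 1343
N_NON_RESPONDENTS = 578

counts = {
    "Male": (782, 314),
    "Urban practice": (720, 331),
    "Practice size < 5,000": (190, 105),
    "Practice size 5,000 - 10,000": (523, 238),
    "Practice size > 10,000": (630, 235),
}


def percent(count, total):
    """Percentage of `total` represented by `count`, as a whole number."""
    return round(100 * count / total)


percentages = {
    label: (percent(resp, N_RESPONDENTS), percent(non, N_NON_RESPONDENTS))
    for label, (resp, non) in counts.items()
}

for label, (p_resp, p_non) in percentages.items():
    print(f"{label}: {p_resp}% vs {p_non}%")
```

Reporting the count next to each percentage, as the slide does, lets a reader recover the denominator and judge how stable each percentage is.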

Summarizing data example I

Summary statistics

Quantitative data

• Non-normal

median

range

inter-quartile range

• Normal

mean

standard deviation

variance
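As a rough illustration of the two sets of summaries, the sketch below computes each of them with Python's standard `statistics` module. The data are hypothetical (they are not taken from the slides), and quartile definitions differ slightly between software packages, so the inter-quartile range may not match other tools exactly.

```python
import statistics

# Hypothetical lengths of stay (days); not data from the slides.
stay = [4, 5, 6, 7, 8, 8, 9, 12, 15, 36]

# Summaries suited to non-normal (skewed) data.
median = statistics.median(stay)
data_range = (min(stay), max(stay))
q1, _, q3 = statistics.quantiles(stay, n=4)   # quartile cut points

# Summaries suited to approximately Normal data.
mean = statistics.mean(stay)
sd = statistics.stdev(stay)                   # sample sd (n - 1 divisor)
variance = statistics.variance(stay)          # equals sd squared

print(f"median {median}, range {data_range}, IQR ({q1}, {q3})")
print(f"mean {mean}, sd {sd:.1f}, variance {variance:.1f}")
```

In this made-up sample the single long stay pulls the mean above the median, which is why the median and inter-quartile range are the safer summary for skewed data.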

Boxplot

[Figure: boxplots of Age (years), axis 0 to 100, by Gender (female, male); N = 4465 and 3600]

Summary statistics

Normal data

Approximately 95% of observations lie within 2 standard deviations of the mean (mean ± 2 sd)
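The 95% figure can be checked against the Normal distribution itself. The sketch below assumes an exactly Normal variable and borrows the age summary reported elsewhere in these slides (mean 66.2, sd 8.2 years) purely as an example.

```python
from statistics import NormalDist

# Age summary reported elsewhere in these slides: mean 66.2, sd 8.2 years.
mean, sd = 66.2, 8.2
age = NormalDist(mu=mean, sigma=sd)

lower, upper = mean - 2 * sd, mean + 2 * sd
# Probability that a Normal observation falls inside mean +/- 2 sd.
coverage = age.cdf(upper) - age.cdf(lower)

print(f"{lower:.1f} to {upper:.1f} years covers {coverage:.1%} of values")
```

For an exactly Normal variable the interval covers slightly more than 95%; the multiplier that gives exactly 95% is 1.96, so "2 standard deviations" is a convenient rounding.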

Histogram

[Figure: histogram of Age; horizontal axis 20 to 65, vertical axis 0 to 50]

Histogram of IgM values

[Figure: histogram of IgM; horizontal axis 0.00 to 3.00, vertical axis 0 to 140]

How to test for Normality

• Mean ≈ median

• (mean − 2 sd, mean + 2 sd) gives a plausible range for the data

• −1 < skewness < 1

• −1 < kurtosis < 1 (excess kurtosis, which is 0 for a Normal distribution)

• Histogram shows a symmetric bell shape

Checking for Normality

            Age     Length of stay   Satisfaction score
Mean        66.2    12.1             5.2
Median      67      8                9
SD          8.2     9.0              4.3
Minimum     49      4                1
Maximum     80      36               10
Skewness    -0.2    1.8              -2.5
Kurtosis    0.5     1.3              4.6
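The rule-of-thumb checks can be applied mechanically to the three variables in this table. The sketch below is one possible reading of those rules, not a formal test: the tolerance for "mean close to median" (a quarter of a standard deviation) and the lowest possible value (0) are assumptions introduced here for illustration.

```python
# Summary statistics copied from the table above.
summaries = {
    "Age": dict(mean=66.2, median=67, sd=8.2, skew=-0.2, kurt=0.5),
    "Length of stay": dict(mean=12.1, median=8, sd=9.0, skew=1.8, kurt=1.3),
    "Satisfaction score": dict(mean=5.2, median=9, sd=4.3, skew=-2.5, kurt=4.6),
}


def looks_normal(s, tol=0.25, lowest_possible=0.0):
    """Apply the informal checks; `tol` and `lowest_possible` are assumed."""
    mean_near_median = abs(s["mean"] - s["median"]) <= tol * s["sd"]
    # mean - 2 sd should not fall below the smallest value the variable can take.
    plausible_range = s["mean"] - 2 * s["sd"] >= lowest_possible
    skew_ok = -1 < s["skew"] < 1
    kurt_ok = -1 < s["kurt"] < 1
    return mean_near_median and plausible_range and skew_ok and kurt_ok


verdicts = {name: looks_normal(s) for name, s in summaries.items()}
print(verdicts)
```

Under these assumptions only age passes every check. Length of stay and satisfaction score fail on several counts, for example mean − 2 sd is negative for both, which is impossible for a stay or a score.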

Secondary prevention of coronary heart disease

Mean (sd)

                          Respondents    Non-respondents
                          (n=1343)       (n=578)
Age (years)               66.2 (8.2)     66.6 (8.7)
Time since MI (mths) *    10 (6, 35)     15 (8, 47)
Cholesterol (mmol/l)      6.5 (1.2)      6.6 (1.2)

[* Median (range)]

Summary statistics example II

Natural log transformation

• Can transform positively skewed data to approximately ‘Normal’ data

• Use the transformed data in the analysis

• Transform the resulting mean back (using e^x) to give the geometric mean

• Present the geometric mean and range

Effect of log_e transformation

            Length of stay   log_e length of stay
Mean        12.1             2.2
Median      8                2.1
SD          9.0              0.5
Minimum     4                1.4
Maximum     36               3.6
Skewness    1.8              0.4
Kurtosis    1.3              0.7

[Geometric mean = e^2.2 = 9.0]
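The back-transformation can be sketched in a few lines of Python. The raw data below are hypothetical (the slides give only summary values); the last line simply back-transforms the mean of 2.2 reported in the table.

```python
import math
import statistics

# Hypothetical, positively skewed lengths of stay; not data from the slides.
stay = [4, 5, 6, 7, 8, 8, 9, 12, 15, 36]

log_stay = [math.log(x) for x in stay]     # natural log of each value
mean_log = statistics.mean(log_stay)       # mean on the log scale
geometric_mean = math.exp(mean_log)        # back-transform with e^x

# Back-transforming the mean log value reported in the table.
table_geometric_mean = math.exp(2.2)

print(f"arithmetic mean {statistics.mean(stay):.1f}, "
      f"geometric mean {geometric_mean:.1f}")
print(f"e^2.2 = {table_geometric_mean:.1f}")
```

The geometric mean is never larger than the arithmetic mean, and for skewed data it sits closer to the median, which is why it is the value presented after a log transformation.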

Secondary prevention of coronary heart disease

Mean (sd)

                          Respondents    Non-respondents
                          (n=1343)       (n=578)
Age (years)               66.2 (8.2)     66.6 (8.7)
Time since MI (mths) *    10 (6, 35)     15 (8, 47)
Cholesterol (mmol/l)      6.5 (1.2)      6.6 (1.2)
Length of stay #          9.0 (4, 36)    11.2 (6, 83)

[* Median (range), # Geometric mean (range)]

Confidence Interval

“The estimated mean difference in systolic blood pressure between 100 diabetic and 100 non-diabetic men was 6.0 mmHg with 95% confidence interval (1.1 mmHg, 10.9 mmHg)”

Confidence Interval

• Contains information about the (im)precision of the estimated effect size

• Presents a range of values, on the basis of the sample data, in which the population value for such an effect size may lie

Confidence Interval

95% CI for mean = mean ± 1.96 × SEM

90% CI for mean = mean ± 1.64 × SEM

SEM = sd / sqrt(n)
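These formulas translate directly into code. The sketch below applies them to the age of respondents reported in these slides (mean 66.2, sd 8.2, n = 1343); the function name `ci_mean` is illustrative only.

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean."""
    sem = sd / math.sqrt(n)            # standard error of the mean
    return mean - z * sem, mean + z * sem


# Age of respondents as reported in these slides.
low95, high95 = ci_mean(66.2, 8.2, 1343)
low90, high90 = ci_mean(66.2, 8.2, 1343, z=1.64)

print(f"95% CI: ({low95:.1f}, {high95:.1f})")
print(f"90% CI: ({low90:.1f}, {high90:.1f})")
```

The 90% interval is narrower than the 95% interval: accepting less confidence buys a tighter range. With small samples a multiplier from the t distribution would normally replace 1.96.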

Confidence Interval

• The 95% CI is a range of values which we are 95% confident covers the true population mean

• If the study were repeated many times, about 5% of the 95% CIs calculated would not contain the ‘true’ mean
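What "95% confident" means can be illustrated by simulation: draw many samples from a population whose mean is known, compute a 95% CI from each, and count how often the interval covers the known mean. The population values, sample size and number of repeats below are assumptions chosen for the sketch.

```python
import math
import random
import statistics

random.seed(1)                         # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 66.0, 8.0         # assumed population values
N, REPEATS = 100, 2000                 # assumed sample size and repetitions

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's 95% CI contain the known population mean?
    if mean - 1.96 * sem <= TRUE_MEAN <= mean + 1.96 * sem:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of intervals covered the true mean")
```

The proportion should come out close to 95%. The population mean is fixed throughout; it is the interval that changes from sample to sample.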

Error bar plot

[Figure: error bars showing the 95% CI for health status score, axis 0 to 400, by mobility (no problem, some problems, severe problem)]

Confidence Interval Example

Significance/hypothesis tests

Measure strength of evidence provided by the data for or against some proposition of interest

E.g. is the survival rate after X better than after Y?

Significance/hypothesis tests

Null hypothesis:

“Effects of X and Y are the same”

Alternative hypothesis:

“Effects of X and Y are different”

Significance/hypothesis tests

One-sided :

“X is better than Y”

Two-sided:

“X and Y have different effects”

P-value

P is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true

P-value

P <= 0.05

• evidence against the null hypothesis

• data suggest a difference between X and Y

• result is statistically significant at the 5% level

P-value

P > 0.05

• insufficient evidence to reject the null hypothesis

• a difference between X and Y has not been shown (which is not proof of no difference)

• result is not statistically significant

P-value

Power of study

• probability of rejecting null hypothesis when false

• increased by increasing sample size

• increased if true difference between treatments is large
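Both effects on power can be seen with a simple Normal approximation for comparing two group means. The sketch below is an approximation under stated assumptions (equal group sizes, a common known sd, a two-sided 5% test), and the blood-pressure figures are hypothetical.

```python
from statistics import NormalDist


def approx_power(difference, sd, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided 5% test comparing two group means.

    Uses a Normal approximation and ignores the negligible chance of
    rejecting the null hypothesis in the wrong direction.
    """
    se = sd * (2 / n_per_group) ** 0.5     # standard error of the difference
    return NormalDist().cdf(difference / se - z_crit)


# Hypothetical example: sd of 15 mmHg in both groups.
small_n = approx_power(difference=5, sd=15, n_per_group=50)
large_n = approx_power(difference=5, sd=15, n_per_group=200)
large_effect = approx_power(difference=10, sd=15, n_per_group=50)

print(f"n=50,  difference 5:  power {small_n:.2f}")
print(f"n=200, difference 5:  power {large_n:.2f}")
print(f"n=50,  difference 10: power {large_effect:.2f}")
```

In this example, quadrupling the sample size and doubling the true difference raise the power by the same amount, because each halves the standard error relative to the difference.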

P-value

Statistical significance does not imply clinical significance

A statistician is a person whose lifetime ambition is to be wrong

5% of the time

Types of significance tests

Chi-square test:

“28 out of 70 smokers have a cough compared with 5 out of 50 non-smokers

- is there a significant difference?”

[28/70 = 40% compared with 5/50=10%]

Chi-square test result

“P=0.001”

There is a significant relationship between smoking and cough
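The smoking and cough example can be worked through with the standard library alone. The sketch below computes the Pearson chi-square statistic for the 2 x 2 table, with and without Yates' continuity correction. The slides do not say which version was used, so both are shown, and the function names are illustrative.

```python
import math

# 2 x 2 table from the slides: rows = smokers, non-smokers;
# columns = cough, no cough.
observed = [[28, 70 - 28], [5, 50 - 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)


def chi_square(table, correction=0.0):
    """Pearson chi-square statistic; correction=0.5 gives Yates' version."""
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (abs(obs - expected) - correction) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


chi2 = chi_square(observed)
chi2_yates = chi_square(observed, correction=0.5)
p, p_yates = p_value_1df(chi2), p_value_1df(chi2_yates)

print(f"chi-square {chi2:.2f}, P = {p:.4f}")
print(f"with Yates' correction {chi2_yates:.2f}, P = {p_yates:.4f}")
```

Either way the P-value is well below 0.05, in line with the conclusion on the slide that smoking and cough are related.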

Types of significance tests

Two-sample t-test:

“Is there a difference in the 24 hour energy expenditure between groups of lean and

obese women?”
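A minimal sketch of the two-sample t statistic, assuming equal variances in the two groups (the pooled-variance form). The energy expenditure values are hypothetical and serve only to show the calculation.

```python
import math
import statistics

# Hypothetical 24-hour energy expenditure (MJ/day); not data from the slides.
lean = [6.1, 7.0, 7.5, 7.5, 8.1, 8.4]
obese = [8.8, 9.2, 9.7, 9.7, 10.0, 11.5]


def two_sample_t(a, b):
    """Pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


t_stat, df = two_sample_t(lean, obese)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The statistic is compared with the t distribution on 10 degrees of freedom, where an absolute value above 2.228 corresponds to P < 0.05 (two-sided).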

Types of significance tests

Mann-Whitney U-test:

“Is there a difference in the nausea score between chemo patients receiving an active anti-emetic treatment and those receiving

placebo?”
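The Mann-Whitney test depends only on the ordering of the scores, which suits an ordinal nausea scale. The sketch below computes the U statistic by counting pairs directly; the scores are hypothetical, and converting U to a P-value (from tables or a Normal approximation) is left out.

```python
# Hypothetical nausea scores (0-10); not data from the slides.
active = [1, 2, 2, 3, 4, 5]
placebo = [3, 5, 6, 7, 8, 9]


def mann_whitney_u(a, b):
    """U statistic for sample `a`: pairs where a exceeds b, ties counting half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


u_active = mann_whitney_u(active, placebo)
u_placebo = mann_whitney_u(placebo, active)

print(f"U (active) = {u_active}, U (placebo) = {u_placebo}")
```

The two U values always add up to the number of pairs (here 6 × 6 = 36). A U far from half that total indicates that one group tends to score higher than the other.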

Types of significance tests

Paired t-test:

“Is there a difference in the dietary intake of a group of students in the week before

and after Finals?”
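Because the same students are measured twice, the paired t-test works on the within-student differences. A minimal sketch with hypothetical intake values:

```python
import math
import statistics

# Hypothetical daily energy intake (MJ) for the same six students,
# before and after; not data from the slides.
before = [9.1, 8.4, 10.2, 7.9, 9.6, 8.8]
after = [8.5, 8.6, 9.1, 7.2, 9.0, 8.1]

differences = [b - a for b, a in zip(before, after)]
mean_diff = statistics.mean(differences)
sem_diff = statistics.stdev(differences) / math.sqrt(len(differences))

t_stat = mean_diff / sem_diff          # compare with t on n - 1 degrees of freedom
df = len(differences) - 1

print(f"mean difference {mean_diff:.2f}, t = {t_stat:.2f} on {df} df")
```

On 5 degrees of freedom, an absolute t above 2.571 corresponds to P < 0.05 (two-sided). Pairing removes between-student variation, so the test can detect a change that an unpaired comparison of the same numbers might miss.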

Types of significance tests

Wilcoxon matched pairs signed rank test or the Sign test:

“Is there a difference in the units of alcohol consumed by students in the week before

and after finals?”
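Of the two tests named on this slide, the sign test is the simpler: it uses only the direction of each student's change and ignores its size. The sketch below gives an exact two-sided P-value from the binomial distribution; the alcohol data are hypothetical.

```python
from math import comb

# Hypothetical units of alcohol per student; not data from the slides.
before = [10, 4, 12, 0, 8, 15, 6, 9, 3, 7]
after = [14, 6, 12, 2, 7, 21, 9, 13, 5, 10]

# Keep only students whose consumption changed (ties are dropped).
diffs = [a - b for b, a in zip(before, after) if a != b]
n = len(diffs)
n_positive = sum(d > 0 for d in diffs)

# Exact two-sided P-value under the null hypothesis that increases and
# decreases are equally likely.
k = min(n_positive, n - n_positive)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

print(f"{n_positive} of {n} students increased; P = {p_value:.3f}")
```

The Wilcoxon matched pairs signed rank test also uses the ranks of the differences, so it is usually more powerful than the sign test when the size of the change carries information.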

Significance test example

Correlation

Measures the strength of the relationship between two variables

Scattergram

[Figure: scattergram of creatinine (vertical axis, 0 to 140) vs. digoxin (horizontal axis, 0 to 120)]

Correlation

Pearson correlation:

• Used for Normally distributed data

• Measures linear relation between variables

Correlation

• r = 0 no relationship

• r = 1 perfect +ve relationship

• r = -1 perfect –ve relationship
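The three reference values of r can be reproduced with a short function. This is a sketch of the usual product-moment formula; the data are hypothetical and chosen so that each case comes out exactly.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]                          # hypothetical values
r_up = pearson_r(x, [2, 4, 6, 8, 10])         # points on a rising line
r_down = pearson_r(x, [10, 8, 6, 4, 2])       # points on a falling line
r_none = pearson_r(x, [2, 1, 3, 1, 2])        # no linear trend

print(f"r_up = {r_up:.2f}, r_down = {r_down:.2f}, r_none = {r_none:.2f}")
```

A value of r near 0 rules out only a linear relationship; a strong curved relationship can still give r close to 0, so the scattergram should always be inspected.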

Scattergram

[Figure: scattergram of creatinine (vertical axis, 0 to 140) vs. digoxin (horizontal axis, 0 to 120)]

Correlation

Spearman correlation:

• Used for non-Normally distributed data

• Measures monotonic relationship between variables
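Spearman's coefficient is the Pearson coefficient calculated on the ranks of the data, which is why it picks up any monotonic relationship and is not tied to Normality. A minimal sketch with hypothetical data; tied values share their average rank.

```python
import math
import statistics


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1    # average of positions i..j
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical curved but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(f"Spearman {spearman_rho(x, y):.2f}, Pearson {pearson_r(x, y):.2f}")
```

For this curved but strictly increasing relationship Spearman's coefficient is 1, while Pearson's falls short of 1 because the points do not lie on a straight line.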

Correlation Example

Correlation

[Figure: scatter plot of change in left-ventricular mass (g), axis −40 to 120, against change in IGF-1 (ng/ml), axis −200 to 200, with separate markers for placebo and rhGH]

“The government are very keen on amassing statistics. They collect them, add them, raise them to the n’th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases”

[Comment of a judge on the subject of government statistics, 1920]