
Statistics for the Health Scientist: Basic Statistics I


An introduction to medical statistics - Part 1. An introduction to statistics, data and variables


Topic 1: An Introduction to Statistics

Dr Luke Kane, April 2014


OK… Rules – Very serious!

• YOU MUST ASK QUESTIONS
• If you don't understand, let's work it out!
• Otherwise – no rules


What is Statistics?

• Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; it thereby provides the navigation essential for controlling the course of scientific and societal advances.


Outline

• Describing variables and data
• Descriptive statistics
– Tables
– Charts
– Shapes


Objectives

• Define variable
• Define data
• Classify variables as quantitative or categorical
• Sub-classify quantitative variables into discrete or continuous
• Sub-classify categorical variables into nominal or ordinal
• Use the type of variable to decide which table and chart to use to display it
• Understand the normal distribution and other shapes


What is a Variable?

• A variable is something whose value can vary
• Examples (many!):


What is Data?

• Data are the values you get when you measure variables

• Example:


Types of Variable

• There are lots of different ways of thinking about variables:
– Categorical vs. metric
– Continuous vs. categorical

I like this one:

Quantitative vs. categorical


Categorical Variables - "What type?"

• "Categories"
• Nominal:
– Unordered; the order is not important
– Male or female; dead or alive; blood group (A, B, AB, O)
• Ordinal:
– Ordered; the order is important
– Grade of breast cancer; agree / neither agree nor disagree / disagree


Categorical Variables

– Types of houses
– Days of the week
– Opinions/viewpoints
– Hair colour
– Malaria positive or negative


Quantitative Variables - "How much?"

• Also known as metric
• Quantitative variables can be:
• Continuous:
– The values come from measuring
– They have units of measurement
– Good for analysis
• Discrete:
– The values come from counting
– The values are usually integers (whole numbers)


Quantitative Variables

• Weight, height
• Number of cigarettes per day
• Blood pressure
• How many malaria parasites there are in the blood
• Number of workers with malaria


Variables: A Summary Table

Quantitative
– Continuous: blood pressure, height, weight, age
– Discrete: number of children, number of asthma attacks per child

Categorical
– Ordinal: grade of breast cancer; better, same, worse; disagree, neutral, agree
– Nominal: sex (male or female); alive or dead; blood group


Variables – more!

• It is easy to summarise categorical variables
• You can convert quantitative variables into categorical variables
– For example, in diabetes it is dangerous when blood sugar is very low
– A blood sugar of 1.6 mmol/l is the quantitative measurement
– You can place this in a low, normal or high range (which makes it a categorical variable)
– 1.6 is low: the patient needs treatment (sugar!)
• Continuous variables allow better analysis because they keep the actual measured values; grouping them into categories throws information away
• Statistical tests have more power when used on continuous variables
• So it is better to use continuous variables for statistical analysis
• It is better to use categorical variables for summarising results and for presentation
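As a minimal sketch of converting a quantitative variable into a categorical one, the Python below places blood sugar readings (in mmol/l) into low, normal or high ranges. The cut-offs `LOW_CUTOFF` and `HIGH_CUTOFF` are hypothetical values chosen only for illustration; they are not clinical thresholds taken from this lecture.

```python
# Hypothetical cut-offs in mmol/l, for illustration only (not clinical guidance).
LOW_CUTOFF = 4.0   # readings below this are labelled "low"
HIGH_CUTOFF = 7.0  # readings above this are labelled "high"


def categorise_blood_sugar(mmol_per_l: float) -> str:
    """Turn a continuous blood sugar reading into an ordinal category."""
    if mmol_per_l < LOW_CUTOFF:
        return "low"
    if mmol_per_l > HIGH_CUTOFF:
        return "high"
    return "normal"


readings = [1.6, 5.2, 9.8]
categories = [categorise_blood_sugar(r) for r in readings]
print(categories)  # ['low', 'normal', 'high']
```

Note that 1.6 and 3.9 would both come out as "low": the category is easy to report, but the exact value that a statistical test could have used is gone.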


Descriptive Statistics

• This means taking the raw data and summarising it in a table or chart


Descriptive Statistics

• Frequency tables
• Relative frequency tables
• Grouping the data
• Open-ended groups
• Cumulative frequency
• Cross-tabulation


Frequency Tables

• For nominal categorical variables
• Start with the largest category
• Tell the reader what the total number is (n = X)

Hair colour | Frequency (number of adults), n = 116
Black | 85
Brown | 17
Blonde | 8
Red / Ginger | 4
Other (e.g. blue, green) | 2


Relative Frequency

Hair colour | Frequency (number of adults), n = 116 | Relative frequency (%)
Black | 85 | 73.3
Brown | 17 | 14.7
Blonde | 8 | 6.9
Red / Ginger | 4 | 3.4
Other (e.g. blue, green) | 2 | 1.7
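A frequency table and its relative frequencies can be produced directly from raw data. The Python sketch below assumes the raw data are a list with one hair colour per adult, rebuilt here from the counts in the table; `frequency_table` is a hypothetical helper name. Each relative frequency is the count divided by n, times 100, rounded to one decimal place.

```python
from collections import Counter

# Raw data rebuilt from the table's counts: one entry per adult (n = 116).
hair_colour = (["Black"] * 85 + ["Brown"] * 17 + ["Blonde"] * 8
               + ["Red / Ginger"] * 4 + ["Other (e.g. blue, green)"] * 2)


def frequency_table(values):
    """Return (category, frequency, relative frequency %) rows, largest first."""
    counts = Counter(values)
    n = sum(counts.values())
    # most_common() orders categories from the largest frequency to the smallest.
    return [(category, freq, round(100 * freq / n, 1))
            for category, freq in counts.most_common()]


table = frequency_table(hair_colour)
print(f"n = {len(hair_colour)}")
for category, freq, pct in table:
    print(f"{category:<26}{freq:>4}{pct:>7.1f}")
```

Using `most_common()` also follows the "start with the largest" rule for nominal variables automatically.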


Ordinal Categorical Variables

• Hair colour is a nominal categorical variable, so its categories do not need to be ordered.

• Satisfaction is an ordinal categorical variable: you can make a frequency table, but you must put the categories in order.

• For example, how would you put these in order?
– Unsatisfied
– Very satisfied
– Satisfied
– Extremely unsatisfied
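The order of an ordinal variable lives in the meaning of the words, not in their spelling, so software has to be told it explicitly. In the Python sketch below, the ordered list `SATISFACTION_SCALE` is an assumption written out by hand for the four example categories; sorting against it gives the scale order, while plain alphabetical sorting does not.

```python
# The order of an ordinal scale has to be written down explicitly.
SATISFACTION_SCALE = ["Extremely unsatisfied", "Unsatisfied",
                      "Satisfied", "Very satisfied"]

responses = ["Unsatisfied", "Very satisfied", "Satisfied", "Extremely unsatisfied"]

alphabetical = sorted(responses)                           # wrong for ordinal data
by_scale = sorted(responses, key=SATISFACTION_SCALE.index)  # position on the scale

print(alphabetical)  # 'Satisfied' wrongly comes before 'Unsatisfied'
print(by_scale)
```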


Continuous Data

• It is not practical to display all of the raw data
• The table is too big, even with this small sample size
• It is easier to group the data
• Then make a frequency table

Pig number (n = 21) | Weight of pig at market / kg
1 | 120
2 | 210
3 | 110
4 | 209
5 | 205
6 | 164
7 | 145
8 | 177
9 | 185
10 | 184
11 | 180
12 | 183
13 | 182
14 | 190
15 | 198
16 | 134
17 | 140
18 | 156
19 | 154
20 | 201
21 | 200


Grouped Frequency Table

• If we group the data into groups of equal width, we get a grouped frequency distribution

Weight of pigs at market / kg | Number of pigs (frequency), n = 21
110-130 | 2
131-150 | 3
151-170 | 3
171-190 | 7
191-210 | 6
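The grouping step can be checked mechanically. This Python sketch counts the 21 pig weights from the raw data table into the groups used above, treating both limits of each group as inclusive; `grouped_frequency` is a hypothetical helper name, not a library function.

```python
# Pig weights at market (kg), n = 21, copied from the raw data table.
weights = [120, 210, 110, 209, 205, 164, 145, 177, 185, 184, 180,
           183, 182, 190, 198, 134, 140, 156, 154, 201, 200]

# Inclusive (lowest, highest) limits for each group.
groups = [(110, 130), (131, 150), (151, 170), (171, 190), (191, 210)]


def grouped_frequency(data, groups):
    """Count how many values fall inside each inclusive (low, high) group."""
    return [sum(low <= x <= high for x in data) for low, high in groups]


freqs = grouped_frequency(weights, groups)
for (low, high), freq in zip(groups, freqs):
    print(f"{low}-{high}: {freq}")

# A value outside every group would be silently dropped, so check the total.
assert sum(freqs) == len(weights)
```

Checking that the group counts add back up to n is a cheap way to catch a value that falls between or outside the groups.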


Outliers

• This is fine if all the data are close together, i.e. if all the pigs weigh about the same
– But what do you do if there are some giant pigs and some tiny pigs?
– For example, suppose you added two extra pigs to our data set:
• a small pig weighing 54 kg
• a big one weighing 327 kg

Weight of pigs at market / kg | Number of pigs (frequency), n = 23
51-70 | 1
71-90 | 0
91-110 | 1
111-130 | 1
131-150 | 3
151-170 | 3
171-190 | 7
191-210 | 6
211-230 | 0
231-250 | 0
251-270 | 0
271-290 | 0
291-310 | 0
311-330 | 1


Open Ended Groups

• The big and small pigs are called outliers.
• To make things easier, you can use open-ended groups at the top and bottom

Weight of pigs at market / kg | Number of pigs (frequency), n = 23
≤110 | 2
111-130 | 1
131-150 | 3
151-170 | 3
171-190 | 7
191-210 | 6
≥211 | 1
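Open-ended groups only need one limit. The Python sketch below counts the 21 original pigs plus the 54 kg and 327 kg outliers (n = 23); using `None` for a missing limit is an assumption of this sketch, and `in_group` and `label` are hypothetical helper names. Because ≤ includes the boundary, the 110 kg pig from the original data lands in the ≤110 group.

```python
# The 21 pig weights (kg) plus the two outliers (54 kg and 327 kg): n = 23.
weights = [120, 210, 110, 209, 205, 164, 145, 177, 185, 184, 180,
           183, 182, 190, 198, 134, 140, 156, 154, 201, 200, 54, 327]

# None marks an open end: (None, 110) means "≤110" and (211, None) means "≥211".
groups = [(None, 110), (111, 130), (131, 150), (151, 170),
          (171, 190), (191, 210), (211, None)]


def in_group(x, low, high):
    """True if x lies in the group; both limits are inclusive."""
    above_low = low is None or x >= low     # ≥ is written >= in Python
    below_high = high is None or x <= high  # ≤ is written <= in Python
    return above_low and below_high


def label(low, high):
    """Plain-ASCII group label, e.g. '<=110', '111-130', '>=211'."""
    if low is None:
        return f"<={high}"
    if high is None:
        return f">={low}"
    return f"{low}-{high}"


freqs = [sum(in_group(x, low, high) for x in weights) for low, high in groups]
for (low, high), freq in zip(groups, freqs):
    print(f"{label(low, high)}: {freq}")
```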


Symbols

• > More than
• < Less than
• ≥ Equal to or more than
• ≤ Equal to or less than


Cumulative Frequency

• Adding up (cumulating) the frequencies as you go along
• Enables you to make a nice chart (see later)
• For example, the lengths of snakes below

Length of snake / cm | Frequency (number of snakes), n = 61 | Cumulative frequency of snakes
≤30 | 10 | 10
31-60 | 17 | 27 (= 10 + 17)
61-90 | 19 | 46 (= 10 + 17 + 19)
91-120 | 12 | 58 (= 10 + 17 + 19 + 12)
≥121 | 3 | 61 (= 10 + 17 + 19 + 12 + 3)
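Cumulating is a running total, which Python's `itertools.accumulate` computes directly. In this sketch the group labels and frequencies are the snake data from the table; dividing each running total by n gives the % cumulative frequencies that a cumulative frequency curve (ogive) plots.

```python
from itertools import accumulate

# Snake lengths (cm), n = 61: group labels and frequencies from the table.
labels = ["<=30", "31-60", "61-90", "91-120", ">=121"]
freqs = [10, 17, 19, 12, 3]

n = sum(freqs)
cumulative = list(accumulate(freqs))  # running totals: 10, 10+17, 10+17+19, ...
cum_percent = [round(100 * c / n, 1) for c in cumulative]

for group, f, c, p in zip(labels, freqs, cumulative, cum_percent):
    print(f"{group:>6} {f:>3} {c:>3} {p:>6.1f}%")
```

The last running total always equals n, and the last percentage is always 100%, which is a quick check on the arithmetic.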


Cross-Tabulation

• Everything so far has been a table of a single variable

• Sometimes you want to look at how two variables relate to each other in one sample

• A cross-tabulation (crosstab) shows the frequency of every combination of the categories of two variables


Cross Tab Example

• Does drinking alcohol affect the number of accidents people have on motorbikes?

• What are the two variables?
• The two variables are accidents and drinking
• If there was a big party and you breathalysed 500 people as they left, you could determine whether each of them was above or below the drink-drive limit. You could then ask them the next day whether they had an accident on their way home.


Cross Tab Example

Accident on way home? | Above the alcohol limit: Yes | Above the alcohol limit: No
Yes | 40 | 2
No | 116 | 342
Total | 156 | 344


Cross Tab Example

• You can then convert the counts into percentages by dividing each cell by its column total (156 people above the limit, 344 below it).

• It is very easy to see that over 99% of the drivers below the limit did not have an accident, while more than 1 in 4 of the drivers above the limit did

• Don't drink and drive!

Accident on way home? | Above the alcohol limit: Yes | Above the alcohol limit: No
Yes | 25.6% | 0.6%
No | 74.4% | 99.4%
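A cross-tabulation is just a count of each combination of the two variables. The Python sketch below assumes hypothetical raw records, one (above limit?, accident?) pair per person, rebuilt from the counts in the first cross tab, then computes column percentages; `column_percentages` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw records rebuilt from the cross tab counts, one per person:
# (above_alcohol_limit, accident_on_way_home) for the 500 people breathalysed.
records = ([("Yes", "Yes")] * 40 + [("Yes", "No")] * 116
           + [("No", "Yes")] * 2 + [("No", "No")] * 342)

# The cross-tabulation is the count of each combination.
crosstab = Counter(records)


def column_percentages(crosstab, above_limit):
    """Accident / no accident as a percentage of one column's total."""
    column_total = crosstab[(above_limit, "Yes")] + crosstab[(above_limit, "No")]
    return {accident: round(100 * crosstab[(above_limit, accident)] / column_total, 1)
            for accident in ("Yes", "No")}


for above_limit in ("Yes", "No"):
    print(f"Above limit = {above_limit}:", column_percentages(crosstab, above_limit))
```

Dividing by the column total answers "what fraction of each drinking group had an accident?"; dividing by row totals would answer a different question.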


Charts

• Charts are a good way of describing data
• Categorical data are easily plotted as:
– Pie chart
– Bar chart
– Clustered bar chart
– Stacked bar chart


Pie Charts

• Good: for categorical nominal data, easy to make, easy to understand

• Bad: can only show one variable, so you need a separate pie chart for each variable; confusing if many categories are used


Simple Bar Chart

• Good: for categorical nominal data, easy to make, easy to understand

• Bad: only one variable

• Note: the bars must all be the same width, with spaces between them


Clustered Bar Chart

• Very similar to a simple bar chart but allows you to compare sub-groups, e.g. boys and girls

• Good for comparing category sizes between groups, e.g. blonde boys and blonde girls


Stacked Bar Chart

• Good for comparing total number of subjects in each group, e.g. all boys and all girls


Quantitative Charts

• Bar charts can also be used to graph discrete quantitative data

• But for continuous quantitative data it is better to use a histogram

• Cumulative quantitative data can be charted with a step chart or a frequency curve


Histograms

• Frequency histogram
• Uses data that are grouped together to save space

• There are no gaps between the bars, because the variable is continuous

• Bad: can only show one variable at a time


Frequency Curve

• For cumulative data you can make a frequency curve

• Continuous quantitative data is assumed to have a smooth continuum of values

• This should make a nice, smooth curve - the cumulative frequency curve

• This is also known as an ogive


Frequency Curve - Snakes

• So if we take the snakes:

Length of snake / cm | Frequency (number of snakes), n = 61 | Cumulative frequency of snakes | % Cumulative frequency of snakes
≤30 | 10 | 10 | 10/61 = 16.4%
31-60 | 17 | 27 | 27/61 = 44.3%
61-90 | 19 | 46 | 46/61 = 75.4%
91-120 | 12 | 58 | 58/61 = 95.1%
≥121 | 3 | 61 | 61/61 = 100%

[Chart: cumulative frequency curve of snake length, with % cumulative frequency of snakes on the y-axis and length group on the x-axis]


Shapes

• OK, now we have charts: how do you describe data from the shape of the graph?

• A uniform distribution is evenly spread across its range

• A normal curve represents a perfectly symmetrical distribution
– Also known as a "bell shape"

• Then you have skewed distributions, with a longer tail to the left or to the right
– A longer tail to the left is a negative skew
– A longer tail to the right is a positive skew

• Bimodal distributions have two humps


Normal distribution
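The symmetry of the normal curve can be seen numerically with Python's `statistics.NormalDist` (Python 3.8 or later). The mean of 170 and standard deviation of 10 below are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist  # Python 3.8+

# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Symmetry: the curve is equally high at equal distances either side of the mean.
print(heights.pdf(160) == heights.pdf(180))  # True
# The peak of the bell is at the mean.
print(heights.pdf(170) > heights.pdf(160))   # True
# Exactly half of the distribution lies below the mean.
print(heights.cdf(170))                      # 0.5
```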


Skew

• A measure of asymmetry
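One common way to put a number on skew is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean. The Python sketch below uses this unadjusted form as an assumption; statistics packages often report an adjusted version, which gives slightly different values for small samples. A result near zero suggests symmetry, a positive value a longer right tail, and a negative value a longer left tail.

```python
def skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (unadjusted form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd moment about the mean
    m3 = sum((x - mean) ** 3 for x in data) / n  # 3rd moment about the mean
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Pig weights at market (kg) from the earlier raw data table.
pig_weights = [120, 210, 110, 209, 205, 164, 145, 177, 185, 184, 180,
               183, 182, 190, 198, 134, 140, 156, 154, 201, 200]

print(skewness([1, 2, 3]))      # symmetric data
print(skewness([1, 1, 1, 10]))  # long tail to the right
print(skewness(pig_weights))    # tail of lighter pigs to the left
```

For the pig weights the result is negative, which matches the grouped table: most pigs sit in the heavier groups, with a tail of lighter pigs.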


Bimodal distribution


Summary so far…

• Types of data and variables
• Ways to put these data in tables
• Ways to put these data in charts
• Ways to examine the shape of the data
• Next: TOPIC 2
– Using numbers to summarise the data
– Prevalence and incidence


References

• This lecture is based on David Bowers' "Medical Statistics from Scratch: An Introduction for Health Professionals"

• Bowers, D. (2008) Medical Statistics from Scratch: An Introduction for Health Professionals. USA: Wiley-Interscience.