Page 1: Statistics for the Health Scientist: Basic Statistics II

Topic 2: Descriptive Statistics Continued

Dr Luke Kane, April 2014

Topic 2: Descriptive Statistics

Page 2: Statistics for the Health Scientist: Basic Statistics II

Outline

• Descriptive Statistics – continued!
  – Numerical descriptions of data
  – Transformation
  – Prevalence and Incidence
• Recap of BIMODAL Distribution (as requested)

Page 3: Statistics for the Health Scientist: Basic Statistics II

Bimodal Distribution

• One peak = UNImodal
• Two peaks = Bimodal

• Usually means there is a mix of two distributions
  – But there are examples of genuinely bimodal data:
  – The size of certain species of ant
  – Hormone levels
  – Age of lymphoma incidence

Page 4: Statistics for the Health Scientist: Basic Statistics II

Objectives

• Understand numerical ways of describing data
  – Including:
    • Median, mode, mean
    • Range, interquartile range, standard deviation

• Have a vague understanding of transformation
• Calculate prevalence and incidence

Page 5: Statistics for the Health Scientist: Basic Statistics II

Describing data with numbers

• Two characteristics of data can be measured with a single numeric value:
  – The value around which the data clusters
    • Known as a summary measure of location
  – The value which measures the degree to which the data has spread out
    • Known as a summary measure of spread

• Summary measures of location are:
  – the mode, the median, the mean and percentiles

• Summary measures of spread are:
  – the range, the interquartile range, the standard deviation

Page 6: Statistics for the Health Scientist: Basic Statistics II

Summary Measures of Location

• The value around which most of the data falls
• Median, mode, mean
• Which one you choose depends on type of variable

Page 7: Statistics for the Health Scientist: Basic Statistics II

The Mode: Common-ness

• The value which has the highest frequency
  – i.e. occurs the most often
• A measure of common-ness

Weight of pigs at market / kg    Number of pigs (frequency), n = 21
≤110                             1
111-130                          2
131-150                          3
151-170                          3
171-190                          7
191-210                          6
≥211                             1
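
As a rough sketch of how the modal class could be read off the table above, the frequencies are stored in a dictionary (the variable names are illustrative, not from the slides) and the class with the highest count is picked out:

```python
# Frequency table from the slide: weight class (kg) -> number of pigs.
pig_weights = {
    "≤110": 1,
    "111-130": 2,
    "131-150": 3,
    "151-170": 3,
    "171-190": 7,
    "191-210": 6,
    "≥211": 1,
}

# For grouped data, the mode is the class with the highest frequency.
modal_class = max(pig_weights, key=pig_weights.get)
print(modal_class, pig_weights[modal_class])
```

Here the modal class is 171-190 kg, with 7 pigs.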

Page 8: Statistics for the Health Scientist: Basic Statistics II

The Median: Central-ness

• A measure of central-ness
• Arrange all values in order of size; the median is the middle value
• Half the values are less than it, half are more
• If there are two middle values, average them
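
A small sketch of both cases, using made-up values and Python's standard `statistics` module: with an odd number of values the median is the single middle value, and with an even number it is the average of the two middle values.

```python
from statistics import median

# Hypothetical values, deliberately unsorted; median() sorts them internally.
odd_count = [7, 3, 9, 1, 5]        # sorted: 1 3 5 7 9    -> middle value is 5
even_count = [7, 3, 9, 1, 5, 11]   # sorted: 1 3 5 7 9 11 -> middle values are 5 and 7

print(median(odd_count))    # 5
print(median(even_count))   # 6.0, the average of 5 and 7
```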

Page 9: Statistics for the Health Scientist: Basic Statistics II

The Mean

• The average
• Uses all of the data
• Affected by skewness and outliers
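
To illustrate the effect of an outlier, here is a minimal sketch with invented lengths of hospital stay (the numbers are assumptions for illustration only). Adding one very long stay pulls the mean up sharply, while the median barely moves.

```python
from statistics import mean, median

lengths_of_stay = [2, 3, 3, 4, 5]        # hypothetical days in hospital
with_outlier = lengths_of_stay + [43]    # one very long stay added

print(mean(lengths_of_stay), median(lengths_of_stay))   # 3.4 and 3
print(mean(with_outlier), median(with_outlier))         # 10 and 3.5
```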

Page 10: Statistics for the Health Scientist: Basic Statistics II

N-Tiles

• n-tiles are percentiles, deciles and quintiles
• A way of dividing data into equal groups
• Percentiles (1%) divide the data into 100
• Deciles (10%) into 10
• Quintiles (20%) into 5
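
One way to see this is with `statistics.quantiles`, which returns the cut points between groups. The sketch below uses the integers 1 to 100 as stand-in data; dividing into n groups always needs n - 1 cut points.

```python
from statistics import quantiles

values = list(range(1, 101))   # hypothetical data: the integers 1 to 100

quintile_cuts = quantiles(values, n=5)       # 4 cut points  -> 5 groups
decile_cuts = quantiles(values, n=10)        # 9 cut points  -> 10 groups
percentile_cuts = quantiles(values, n=100)   # 99 cut points -> 100 groups

print(len(quintile_cuts), len(decile_cuts), len(percentile_cuts))
```

Different software uses slightly different interpolation rules for the cut points themselves, so exact values can differ a little between packages.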

Page 11: Statistics for the Health Scientist: Basic Statistics II

Choosing the Right Measure of Location

                    Summary measure of location
Type of variable    Mode    Median            Mean
Nominal             Yes     No                No
Ordinal             Yes     Yes               No
Quant discrete      Yes     Yes, if skewed    Yes
Quant continuous    No      Yes, if skewed    Yes

• Mode is not suited to quantitative continuous data, as each value may occur only once

• Median is not suited to categorical nominal data, as there is no order to the values

• You cannot average categorical data as it’s not made up of real numbers

Page 12: Statistics for the Health Scientist: Basic Statistics II

Summary Measures of Spread

• Range, interquartile range, standard deviation
• Range
  – Distance from smallest value to largest
• Interquartile range
  – The range of the middle 50% of the data
• Standard deviation
  – Roughly the average distance of the data values from the overall mean
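
A minimal sketch computing all three measures on a small invented data set. Two assumptions are worth flagging: the quartiles use the "inclusive" interpolation method (other methods give slightly different cut points), and `pstdev` is the population standard deviation (divide by n); `statistics.stdev` would give the sample version (divide by n - 1).

```python
from statistics import pstdev, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean = 5

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: upper quartile minus lower quartile.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Standard deviation: square root of the mean squared distance from the mean.
sd = pstdev(data)

print(data_range, iqr, sd)
```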

Page 13: Statistics for the Health Scientist: Basic Statistics II

Range

Page 14: Statistics for the Health Scientist: Basic Statistics II

Poem – to help you remember!

Page 15: Statistics for the Health Scientist: Basic Statistics II

Interquartile Range

• Range is very sensitive to outliers

• Chop off top 25% and bottom 25%
  – The range of what is left is the interquartile range
• Ignores 50% of the data…
• Can use an ogive…
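
The sensitivity of the range to outliers can be sketched as follows. The weights are invented and the helper names are illustrative; replacing the largest value with an extreme one changes the range dramatically but leaves the interquartile range untouched.

```python
from statistics import quantiles

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile (inclusive method assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

weights = [61, 64, 66, 68, 70, 72, 74, 77, 80]   # hypothetical weights in kg
with_outlier = weights[:-1] + [180]              # largest value replaced by an outlier

print(value_range(weights), value_range(with_outlier))
print(interquartile_range(weights), interquartile_range(with_outlier))
```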

Page 16: Statistics for the Health Scientist: Basic Statistics II

IQR and an Ogive

Page 17: Statistics for the Health Scientist: Basic Statistics II

An extra chart - Boxplots

• Now we know about quartiles

• Before we talk about standard deviation…

• Boxplots provide a graphical summary of quartile values, minimum and maximum values and outliers

Page 18: Statistics for the Health Scientist: Basic Statistics II

Boxplots

Page 19: Statistics for the Health Scientist: Basic Statistics II

Standard Deviation (s.d.)

• Uses all of the data
• S.d. measures the spread of individual results around a mean of all the results
• 68 – 95 – 99.7 rule in normal distribution
  – 68% of data within 1 sd of mean, 95% within 2 sd, 99.7% within 3 sd
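
The proportions behind this rule can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k sd of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

This prints 68.3%, 95.4% and 99.7%. The rule only holds for data that are approximately normally distributed.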

Page 20: Statistics for the Health Scientist: Basic Statistics II

Choosing the Right Measure of Spread

                    Summary measure of spread
Type of variable    Range    Interquartile range    Standard deviation
Nominal             No       No                     No
Ordinal             Yes      Yes                    No
Quantitative        Yes      Yes, if skewed         Yes

• Measures of spread not helpful with nominal categorical data

• Sd not appropriate with ordinal data as it’s non-numeric
• Standard deviation goes with the mean
• Interquartile range goes with the median

Page 21: Statistics for the Health Scientist: Basic Statistics II

Transformation

• Normal distribution looks nice
  – BUT not all data is normally distributed
  – Real world is more complicated!

• You can transform data to make it more normal

• For example, take the log of the data
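
As a hedged illustration of a log transformation, the sketch below uses invented, strongly right-skewed values. On the raw scale the mean sits far above the median; after taking base-10 logs the values are symmetric and the two agree. Note that a log transformation only works for values greater than zero.

```python
import math
from statistics import mean, median

raw = [1, 10, 100, 1000, 10000]          # hypothetical right-skewed values
logged = [math.log10(x) for x in raw]    # roughly 0, 1, 2, 3, 4

print(mean(raw), median(raw))        # mean far above the median: skewed
print(mean(logged), median(logged))  # mean and median agree: symmetric
```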

Page 22: Statistics for the Health Scientist: Basic Statistics II

Prevalence and Incidence

• Prevalence is the number of existing cases at a certain time and place

• Incidence is the number of new cases in a certain time period and place

• What do we mean by certain time and place?

Page 23: Statistics for the Health Scientist: Basic Statistics II

Time & Place

• You must always define the time period
• You must always define the place
  – Place = specific population
  – Time = specific period of time
• …Cambodian population in 2014
• …Plantation workers in Mondulkiri in June-August 2013
• …Irish immigrants in America 1850-1950

Page 24: Statistics for the Health Scientist: Basic Statistics II

Prevalence

• Amount of disease in a specific population at a particular time

• Prevalence is the probability that any one individual in the population has the disease
  – E.g. 65 cases of a rash in a population of 598
    • 65/598 = 10.9%
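
The rash example can be written as a one-line calculation. The function name `prevalence` is just an illustrative choice:

```python
def prevalence(cases, population):
    """Proportion of the population that has the disease at the given time."""
    return cases / population

rash = prevalence(cases=65, population=598)
print(f"{rash:.1%}")   # 10.9%
```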

Page 25: Statistics for the Health Scientist: Basic Statistics II

Incidence

• New cases
  – Can think of it as the RISK of getting a disease during a specific time
    = new cases / initial disease-free population
  – Can be risk of death, risk of disease, risk of transmitting a disease, could even be RISK of winning a lottery
• What is the incidence of malaria if there were 176 new cases in a healthy population of 9888 in 2003?
  – 176/9888 = 1.78%, i.e. risk of malaria is nearly 2%
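
The malaria example follows the same pattern, except that the denominator is the population that was disease-free at the start of the period. The function and parameter names below are illustrative:

```python
def incidence_risk(new_cases, disease_free_at_start):
    """Risk of developing the disease during the period:
    new cases divided by the people who were disease-free at the start."""
    return new_cases / disease_free_at_start

malaria_2003 = incidence_risk(new_cases=176, disease_free_at_start=9888)
print(f"{malaria_2003:.2%}")   # 1.78%
```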

Page 26: Statistics for the Health Scientist: Basic Statistics II

Incidence & Prevalence

• Incidence and prevalence are usually expressed as a %

• You can also express them as per 1000 population, as per 10,000 population or per 100,000 population

• Don’t get mixed up!
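
Converting between these units is a matter of multiplying or dividing by the population base. A small sketch, with illustrative helper names, using the malaria figure (176/9888) and the Cambodian TB figure (411 per 100,000) from these slides:

```python
def per_population(proportion, per=100_000):
    """Express a proportion (e.g. 0.0178) as cases per `per` people."""
    return proportion * per

def as_percent(rate, per=100_000):
    """Express a rate per `per` people as a percentage."""
    return rate / per * 100

malaria_per_1000 = per_population(176 / 9888, per=1000)
tb_percent = as_percent(411, per=100_000)

print(round(malaria_per_1000, 1))   # 17.8 per 1,000, i.e. 1.78%
print(round(tb_percent, 3))         # 0.411%
```

Always state the base alongside the number: 17.8 per 1,000 and 1.78% are the same risk.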

Page 27: Statistics for the Health Scientist: Basic Statistics II

Incidence – TB in SE Asia

• Here is a real example of incidence:
  – This is the incidence of TB per 100,000 in SE Asia, 2009-2013, i.e. NEW cases

Country         TB incidence (per 100,000)
Cambodia        411
Laos            204
Vietnam         147
Thailand        119

Country         TB incidence (per 100,000)
South Africa    1003
Sweden          7

Data from World Bank, 2014. http://data.worldbank.org/indicator/SH.TBS.INCD

Page 28: Statistics for the Health Scientist: Basic Statistics II

Prevalence & Incidence: Example

• Calculate the proportion of women infected with HIV at each clinic:

• Is this prevalence or incidence?

Clinic        Women seen at antenatal clinic, Oct 2013    HIV infected    Proportion infected
Phnom Penh    412                                         5               1.21%
Battambang    179                                         3               1.68%
Siem Reap     264                                         2               0.76%

Answer: Prevalence
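
The three clinic proportions can be checked with a short loop. The dictionary layout is an illustrative choice; the numbers are those given in the table.

```python
# Clinic -> (women seen at antenatal clinic in Oct 2013, HIV infected)
clinics = {
    "Phnom Penh": (412, 5),
    "Battambang": (179, 3),
    "Siem Reap": (264, 2),
}

# Existing infections among women seen, so this is a prevalence.
proportions = {
    name: infected / seen for name, (seen, infected) in clinics.items()
}

for name, proportion in proportions.items():
    print(f"{name}: {proportion:.2%}")
```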

Page 29: Statistics for the Health Scientist: Basic Statistics II

Summary

• Numerical descriptions of data
  – Summary measures of location
    • Median
    • Mode
    • Mean
    • N-tiles
  – Summary measures of spread
    • Range
    • Interquartile range
    • Standard deviation
• Prevalence and Incidence
• Transformation

Page 30: Statistics for the Health Scientist: Basic Statistics II

Questions?

Thank You!

Next lesson: How do we get the data?

Study design, sampling etc. Probability, risks, odds
