1 Dr. Michael Brown © Epidemiology Dept., Michigan State Univ. Lecture 2 – Descriptive Statistics...

Lecture 2 – Descriptive Statistics

Michael Brown MD, MSc

Professor, Epidemiology and Emergency Medicine

Credit to Michael P. Collins, MD, MS

EPI-546 Block I

Objectives - Concepts

Classification of data
Distributions of variables
Measures of central tendency and dispersion
Criteria for abnormality
Sampling
Regression to the mean

Objectives - Skills

Distinguish and apply the forms of data types.
Define mean, median, and mode and locate on a skewed distribution chart.
Apply the concept of the standard deviation to specific circumstances.
Explain why a strategy for sampling is needed.
Recognize the phenomenon of regression to the mean when it occurs or is described.

Clinical Measurement – 2 kinds of data

Categorical
Interval

Distinction -

Interval = “the interval between successive values is equal, throughout the scale”

Clinical Measurement – subtypes of data

Categorical
  Nominal
  Ordinal
Interval
  Discrete
  Continuous

Nominal data: no order

Alive vs. dead
Male vs. female
Rabies vs. no rabies
Blood group O, A, B, AB
Resident of Michigan, Ohio, Indiana…

Ordinal scale: natural order, but not interval

1st vs. 2nd vs. 3rd degree burns
Pain scale for migraine headache:
  None, mild, moderate, severe
Glasgow Coma Score (3-15)
Stage of cancer spread – 0 through 4

Clinical Measurement – 2 kinds of data

Categorical
  Nominal
  Ordinal
Interval
  Discrete
  Continuous

Discrete Interval variables: on a “number line”

Number of live births
Number of sexual partners
Diarrheal stools per day
Vision – 20/?

[Figure: number line marked 1, 2, 3]

Continuous variables:

Blood pressure
Weight, or Body Mass Index
Random blood sugar
IQ

Interval: Continuous vs. Discrete

No variable is perfectly continuous – e.g. you never see a BP of 152.47 mmHg
It’s a matter of degree – lots of possible values within the range clinically possible = continuous

Recording data -

Sometimes the variable is intrinsically one type or another – but frequently it is the observer who decides how a variable will be measured and reported

Consider cigarette smoking:

Continuous variable

Underlying (nearly) continuous variable – cigarettes/day
  32, 63, 2,…
However, this level of detail may not be necessary or desirable.

Discrete interval variable

Packs per day (probably rounded off to the nearest whole number)
  2, 1, 0
Cruder - but maybe good enough and more reliably reported

Ordinal categorical variable

Non-smoker vs. light smoker vs. heavy smoker.
May further collapse the pack/day variable.

Nominal categorical variable

Non-smoker vs. former smoker vs. current smoker.
  No obvious order here, just named categories
Ever-smoker vs. never-smoker.
  Dichotomous outcome

So, the form of the variable is often decided by the investigator, not by nature

In fact, the normal vs. abnormal distinction is generally a matter of taking a much richer measure and making it dichotomous.
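
The smoking example can be sketched in code: one underlying count (cigarettes/day) is recorded at progressively coarser levels. This is only an illustration. The 20-cigarette pack, the cut-off between "light" and "heavy", and all function names are assumptions, not part of the lecture.

```python
CIGARETTES_PER_PACK = 20  # assumption: a standard pack size


def packs_per_day(cigarettes: int) -> int:
    """Discrete interval variable: cigarettes/day rounded to the nearest whole pack."""
    return (cigarettes + CIGARETTES_PER_PACK // 2) // CIGARETTES_PER_PACK


def smoking_level(cigarettes: int, heavy_from: int = 20) -> str:
    """Ordinal categorical variable; the cut-off `heavy_from` is hypothetical."""
    if cigarettes == 0:
        return "non-smoker"
    return "heavy smoker" if cigarettes >= heavy_from else "light smoker"


def is_current_smoker(cigarettes: int) -> bool:
    """Dichotomous variable: any current smoking vs. none."""
    return cigarettes > 0


# The cigarettes/day values shown on the slide
for cigarettes in (32, 63, 2):
    print(cigarettes, packs_per_day(cigarettes),
          smoking_level(cigarettes), is_current_smoker(cigarettes))
```

Each step discards detail: the values 32 and 63 stay distinct as packs/day but both become "heavy smoker", and all three become simply "current smoker" once the variable is dichotomous.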

Quick Quiz Slide

What kind of a variable is religion? – Protestant, Catholic, Islamic, Judaism. . .
What kind is Body Mass Index (weight divided by height²)?
What is alcohol intake if classed as none, < 2 drinks/day, and > 2 drinks/day?

First question when meeting with statistician:

1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests

Test                Comparison             Principal Assumptions
Student's t test    Means of two groups    Continuous variable, normally distributed, equal variance
Wilcoxon rank sum   Medians of two groups  Continuous variable
Chi-square          Proportions            Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact      Proportions            Categorical variable
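
For the two tests of proportions, the cell-count assumption in the table can be written as a simple rule of thumb. The sketch below applies the slide's wording (more than 5 patients in every cell) to the observed counts; many texts apply a similar threshold to the expected counts instead. The function name and the example tables are hypothetical.

```python
def proportion_test_for(table):
    """Suggest a test for comparing proportions in a table of counts.

    Rule of thumb from the slide: chi-square needs more than 5 patients
    in every cell; otherwise fall back to Fisher's exact test.
    """
    smallest_cell = min(count for row in table for count in row)
    return "chi-square" if smallest_cell > 5 else "Fisher's exact"


# Hypothetical 2x2 tables: rows = groups, columns = outcome yes / no
print(proportion_test_for([[12, 8], [9, 15]]))   # chi-square
print(proportion_test_for([[3, 9], [10, 4]]))    # Fisher's exact
```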

Objectives - Concepts

Classification of data
Distributions of variables
Measures of central tendency and dispersion
Criteria for abnormality
Sampling
Regression to the mean

Distributions of continuous variables

A way to display the individual-to-individual variation in some clinical measure.
Consider the example in Fletcher using PSA levels:

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

[Figure: normal distribution curve; x-axis: variable, y-axis: frequency. Source: www.msu.edu/user/sw/statrev/images/normal01.gif]

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

The “nicest” distribution

Is the normal, or Gaussian, distribution – the “bell-shaped curve”.

If we want to summarize a frequency distribution, there are two major aspects to include:

Central tendency
Dispersion

Principles of Epidemiology, 2nd edition. CDC.

Principles of Epidemiology, 2nd edition. CDC.

Measures of Central Tendency:

Mean
Median
Mode

Consider this data: Parity (how many babies have you had?) among 19 women:

0,2,0,0,1,3,1,4,1,8,2,2,0,1,3,5,1,7,2

Mean (Arithmetic)

Add up all the values and divide by N
  43 / 19 = 2.26

Median

The middle value
Must first sort the data and put in order:
  0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
  The middle (10th of 19) value is 2.

Mode

The most common value
  0,0,0,0,1,1,1,1,1,2,2,2,2,3,3,4,5,7,8
  The value 1 occurs most often (5 times).
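
The three measures can be checked on the parity data with Python's standard `statistics` module. This is a minimal sketch; the variable name `parity` is just a label for the 19 values from the slide.

```python
from statistics import mean, median, mode

# Parity among the 19 women, in the order given on the slide
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

print(len(parity))              # 19
print(sum(parity))              # 43
print(round(mean(parity), 2))   # 2.26
print(median(parity))           # 2
print(mode(parity))             # 1
```

The mean (2.26) is pulled above the median (2) and the mode (1) by the few women with high parity, which is the pattern expected in a distribution with a long right tail.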

In a normal distribution, all three are equal

Parametric statistical methods assume a distribution with known shape (e.g. normal or Gaussian distribution)

[Figure: frequency distribution; x-axis: variable, y-axis: frequency]

Quick Quiz Slide

If the mode is “100” and the mean is “80” – what can you tell me about the median?

[Figure: skewed frequency distribution; x-axis: variable, y-axis: frequency; mean marked at 80, mode marked at 100]
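
A small made-up data set shows the pattern behind the quiz: when a long left tail drags the mean below the mode, the median typically falls between the two. The values below are hypothetical, chosen only so that the mean is 80 and the mode is 100. The ordering is typical of single-peaked skewed distributions, not a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical left-skewed values (a few low scores, most near 100)
scores = [10, 50, 70, 90, 90, 100, 100, 100, 110]

print(mean(scores))    # 80
print(median(scores))  # 90
print(mode(scores))    # 100
```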

Dispersion

Standard Deviation - most common measure used for normal or near normal distributions.
Defined by a statistical formula, but remember that:
  The mean +/- one SD contains about 2/3 of the observations.
  The mean +/- 2 SD’s includes about 95% of the observations.
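
The two rules of thumb can be verified for an exactly normal distribution with `statistics.NormalDist` from the standard library. This is a sketch under that assumption; real clinical data are only approximately normal, and the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"{fraction_within(1):.3f}")  # 0.683, i.e. about 2/3
print(f"{fraction_within(2):.3f}")  # 0.954, i.e. about 95%
```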

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

M J Campbell, Statistics at Square One, 9th Ed, 1997.

So, how about this definition of “abnormal” for total serum cholesterol: A value higher than the mean + 1 S.D.?

How many people would fall beyond that cut-off?
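
If cholesterol were exactly normally distributed (an assumption made only for this sketch), the share of people above a cut-off follows directly from the normal curve. The population of 1000 and the function name are hypothetical.

```python
from statistics import NormalDist


def fraction_above(k_sd: float) -> float:
    """Fraction of a normal distribution lying above mean + k_sd SDs."""
    return 1.0 - NormalDist().cdf(k_sd)


population = 1000  # hypothetical number of people screened

print(round(fraction_above(1) * population))  # 159, about 16% labelled "abnormal"
print(round(fraction_above(2) * population))  # 23, about 2.3%
```

A cut-off at the mean + 1 SD would label roughly one person in six as abnormal, which is why a purely statistical definition needs to be weighed against the other criteria for abnormality.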

Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998. 

So what’s the “best” definition of abnormality?

Fletcher lists three:
  Being unusual
    Greater than 2 SD from mean
  Sick
    Observation regularly associated with disease
  Treatable
    Consider abnormal only if treatment of the condition represented by the measurement leads to improved outcome

Miura et al, Archives Int Med 2001; 161:1504.

If you were to design a study to define an abnormal DBP for adult females in the US, how would you do it?

Measure DBP in every adult female in the US?
  Then define abnormal as above 2 SD from mean?

Sampling

Impossible to measure the BP of everyone, so must take measurements of a representative sample of subjects.
Random sample
  May miss important subgroup (ethnicity for example)
  May need to obtain a larger sample from these important subgroups and select subjects at random within subgroup
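
Selecting subjects at random within each subgroup is usually called stratified random sampling. The sketch below uses a made-up population in which group "B" is only 5% of subjects; a simple random sample of 60 would be expected to include about 3 of them, whereas sampling within strata guarantees a chosen number. The function name, group labels, and sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` subjects at random from each subgroup."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical population: 950 subjects in group "A", 50 in group "B"
population = [("A", i) for i in range(950)] + [("B", i) for i in range(50)]

sample = stratified_sample(population, stratum_of=lambda s: s[0], per_stratum=30)
print({name: len(chosen) for name, chosen in sample.items()})  # {'A': 30, 'B': 30}
```

Because the small subgroup is deliberately over-represented, estimates for the whole population would need to be weighted back to the true subgroup proportions.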

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

Hanna C, Greenes D. How Much Tachycardia in Infants Can Be Attributed to Fever? Ann Emerg Med June 2004