1 Dr. Michael Brown© Epidemiology Dept., Michigan State Univ.
Lecture 2 – Descriptive Statistics
Michael Brown MD, MSc
Professor, Epidemiology and Emergency Medicine
Credit to Michael P. Collins, MD, MS
EPI-546 Block I

Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean

Objectives - Skills
- Distinguish and apply the forms of data types.
- Define mean, median, and mode and locate on a skewed distribution chart.
- Apply the concept of the standard deviation to specific circumstances.
- Explain why a strategy for sampling is needed.
- Recognize the phenomenon of regression to the mean when it occurs or is described.

Clinical Measurement – 2 kinds of data
- Categorical
- Interval

Distinction -
Interval = “the interval between successive values is equal, throughout the scale”

Clinical Measurement – subtypes of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Nominal data: no order
- Alive vs. dead
- Male vs. female
- Rabies vs. no rabies
- Blood group O, A, B, AB
- Resident of Michigan, Ohio, Indiana…

Ordinal scale: natural order, but not interval
- 1st vs. 2nd vs. 3rd degree burns
- Pain scale for migraine headache: None, mild, moderate, severe
- Glasgow Coma Score (3-15)
- Stage of cancer spread – 0 through 4

Clinical Measurement – 2 kinds of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Discrete interval variables: on a “number line”
- Number of live births
- Number of sexual partners
- Diarrheal stools per day
- Vision – 20/?
[Figure: number line marked 1, 2, 3]

Continuous variables:
- Blood pressure
- Weight, or Body Mass Index
- Random blood sugar
- IQ

Interval: Continuous vs. Discrete
- No variable is perfectly continuous – e.g. you never see a BP of 152.47 mmHg
- It’s a matter of degree – lots of possible values within the range clinically possible = continuous

Recording data -
- Sometimes the variable is intrinsically one type or another – but frequently it is the observer who decides how a variable will be measured and reported
- Consider cigarette smoking:

Continuous variable
- Underlying (nearly) continuous variable – cigarettes/day: 32, 63, 2,…
- However, this level of detail may not be necessary or desirable.

Discrete interval variable
- Packs per day (probably rounded off to the nearest whole number): 2, 1, 0
- Cruder - but maybe good enough and more reliably reported

Ordinal categorical variable
- Non-smoker vs. light smoker vs. heavy smoker.
- May further collapse the pack/day variable.

Nominal categorical variable
- Non-smoker vs. former smoker vs. current smoker.
  - No obvious order here, just named categories
- Ever-smoker vs. never-smoker.
  - Dichotomous outcome
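
The smoking example above can be sketched in code: one underlying count (cigarettes/day) is re-expressed as a discrete, an ordinal, and a dichotomous variable. This is only a minimal Python sketch. The function names, the 20-cigarette pack size, and the cut-off between light and heavy smokers are assumptions for illustration, not values from the lecture.

```python
def packs_per_day(cigarettes_per_day, pack_size=20):
    """Discrete interval variable: packs/day rounded to the nearest whole number."""
    # int(x + 0.5) rounds halves up; Python's round() would round 0.5 down to 0
    return int(cigarettes_per_day / pack_size + 0.5)

def smoking_category(cigarettes_per_day):
    """Ordinal variable: non-smoker < light smoker < heavy smoker."""
    if cigarettes_per_day == 0:
        return "non-smoker"
    # Assumed cut-off: up to 20 cigarettes/day counts as "light"
    return "light smoker" if cigarettes_per_day <= 20 else "heavy smoker"

def current_smoker(cigarettes_per_day):
    """Dichotomous variable: does the person currently smoke at all?"""
    return cigarettes_per_day > 0

for cigs in [32, 63, 2, 0]:
    print(cigs, packs_per_day(cigs), smoking_category(cigs), current_smoker(cigs))
```

Each step discards detail: the coarser forms can always be derived from the finer one, but not the other way round. Former-smoker or ever-smoker status cannot be derived from current consumption at all; it needs a separate history question.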

So, the form of the variable is often decided by the investigator, not by nature

In fact, the normal vs. abnormal distinction is generally a matter of taking a much richer measure and making it dichotomous.

Quick Quiz Slide
- What kind of a variable is religion? – Protestant, Catholic, Islamic, Judaism. . .
- What kind is Body Mass Index (weight divided by height²)?
- What is alcohol intake if classed as none, ≤ 2 drinks/day, and > 2 drinks/day?

First question when meeting with statistician:
1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests

Test                Comparison              Principal Assumptions
Student's t test    Means of two groups     Continuous variable, normally distributed, equal variance
Wilcoxon rank sum   Medians of two groups   Continuous variable
Chi-square          Proportions             Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact      Proportions             Categorical variable
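
To make the chi-square row of the table concrete, here is a minimal Python sketch of a Pearson chi-square test on a 2x2 table of counts. The counts, the function name, and the group labels are made up for illustration. The cell-size check is applied to the expected counts, which is a common convention and an assumption about how the "more than 5 per cell" rule is meant; no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no continuity correction).

    Returns (statistic, p_value, smallest_expected_count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    smallest_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            statistic += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, smallest_expected

# Hypothetical counts: rows = treatment A / B, columns = improved / not improved
counts = [[20, 10], [10, 20]]
chi2, p, smallest = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
if smallest < 5:
    print("Small cell counts: consider Fisher's exact test instead")
```

In practice a statistics package would be used for this; the sketch only shows where the proportions and the cell-size assumption enter the calculation.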

Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean

Distributions of continuous variables
- A way to display the individual-to-individual variation in some clinical measure.
- Consider the example in Fletcher using PSA levels:

[Figure from: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure: frequency distribution curve; x-axis: variable, y-axis: frequency. Source: www.msu.edu/user/sw/statrev/images/normal01.gif]

[Figure from: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

The “nicest” distribution
Is the normal, or Gaussian, distribution – the “bell-shaped curve”.

If we want to summarize a frequency distribution, there are two major aspects to include:
- Central tendency
- Dispersion

[Two figures from: Principles of Epidemiology, 2nd edition. CDC.]

Measures of Central Tendency:
- Mean
- Median
- Mode

Consider this data: Parity (how many babies have you had?) among 19 women:
0,2,0,0,1,3,1,4,1,8,2,2,0,1,3,5,1,7,2

Mean (Arithmetic)
- Add up all the values and divide by N
- 43 / 19 = 2.26

Median
- The middle value
- Must first sort the data and put in order:
0,0,0,0,1,1,1,1,1,[2],2,2,2,3,3,4,5,7,8

Mode
- The most common value
0,0,0,0,[1,1,1,1,1],2,2,2,2,3,3,4,5,7,8
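
The three measures can be checked against the parity data with Python's standard `statistics` module. This is a small illustrative sketch; the variable names are arbitrary.

```python
import statistics

# Parity among the 19 women, as listed on the slide
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

mean = statistics.mean(parity)      # sum of values / N = 43 / 19
median = statistics.median(parity)  # middle (10th) value after sorting
mode = statistics.mode(parity)      # most frequent value

print(f"N = {len(parity)}, sum = {sum(parity)}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

For these data the mean (2.26) is larger than the median (2), which in turn is larger than the mode (1): a few women with high parity pull the mean to the right.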

In a normal distribution, all three are equal

Parametric statistical methods assume a distribution with known shape (e.g. normal or Gaussian distribution)

[Figure: frequency distribution; x-axis: variable, y-axis: frequency]

Quick Quiz Slide
- If the mode is “100” and the mean is “80” – what can you tell me about the median?

[Figure: skewed frequency distribution; x-axis: variable, y-axis: frequency; the mode is marked at 100 and the mean at 80]
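
As a rule of thumb (not a theorem), in a unimodal skewed distribution the median falls between the mean and the mode. The sketch below uses a small made-up data set, chosen only so that its mean is 80 and its mode is 100, to show that ordering.

```python
import statistics

# Hypothetical left-skewed values chosen so that mean = 80 and mode = 100
values = [10, 50, 70, 90, 90, 100, 100, 100, 110]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

print(f"mean = {mean}, median = {median}, mode = {mode}")
# The long left tail pulls the mean furthest; the median sits in between
print(mean < median < mode)
```

Here the median is 90, between the mean and the mode. A different data set with the same mean and mode would give a different median, so only the ordering should be read from this example.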

Dispersion
- Standard Deviation - most common measure used for normal or near normal distributions.
- Defined by a statistical formula, but remember that:
  - The mean +/- one SD contains about 2/3 of the observations.
  - The mean +/- 2 SD’s includes about 95% of the observations.
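
The two rules of thumb above can be checked by simulation. The sketch below draws values from a normal distribution and counts how many fall within 1 and 2 SDs of the mean. The mean of 120, the SD of 15, the sample size, and the seed are arbitrary illustration values, not data from the lecture.

```python
import random
import statistics

random.seed(546)  # fixed seed so the run is repeatable

# Simulated "measurements": 50,000 draws from a normal distribution
# (mean 120 and SD 15 are arbitrary illustration values)
sample = [random.gauss(120, 15) for _ in range(50_000)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample SD: sqrt(sum((x - mean)^2) / (n - 1))

def fraction_within(k):
    """Fraction of observations within k SDs of the mean."""
    inside = sum(1 for x in sample if abs(x - mean) <= k * sd)
    return inside / len(sample)

within_1sd = fraction_within(1)
within_2sd = fraction_within(2)
print(f"within 1 SD: {within_1sd:.3f}")  # close to 0.68, i.e. about 2/3
print(f"within 2 SD: {within_2sd:.3f}")  # close to 0.95
```

The rule depends on the shape of the distribution: for strongly skewed data the same intervals can contain quite different fractions of the observations.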

[Figure from: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure from: M J Campbell, Statistics at Square One, 9th Ed, 1997.]

So, how about this definition of “abnormal” for total serum cholesterol: A value higher than the mean + 1 S.D.?

How many people would fall beyond that cut-off?
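
If the measurement were exactly normally distributed (an assumption; real cholesterol values are only approximately so), the share of people above such a cut-off follows directly from the normal curve. A minimal sketch using the standard library's `NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying above mean + k SD
above_1sd = 1 - standard_normal.cdf(1)
above_2sd = 1 - standard_normal.cdf(2)

print(f"above mean + 1 SD: {above_1sd:.1%}")
print(f"above mean + 2 SD: {above_2sd:.1%}")
```

Under that assumption roughly 16% of people, about one in six, would be labelled abnormal by a mean + 1 SD cut-off, compared with a little over 2% for mean + 2 SD.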

[Figure from: Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998.]

So what’s the “best” definition of abnormality?
Fletcher lists three:
- Being unusual
  - Greater than 2 SD from mean
- Sick
  - Observation regularly associated with disease
- Treatable
  - Consider abnormal only if treatment of the condition represented by the measurement leads to improved outcome

[Figure from: Miura et al, Archives Int Med 2001; 161:1504.]

If you were to design a study to define an abnormal DBP for adult females in the US, how would you do it?
- Measure DBP in every adult female in the US?
  - Then define abnormal as above 2 SD from mean?

Sampling
- Impossible to measure the BP of everyone, so must take measurements of a representative sample of subjects.
- Random sample
  - May miss important subgroup (ethnicity for example)
  - May need to obtain a larger sample from these important subgroups and select subjects at random within subgroup
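
Selecting subjects at random within each subgroup is usually called stratified random sampling. The sketch below is one minimal way to do it in Python. The population, the group labels "A" and "B", the function name, and the sample size of 30 per group are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = {}
    for name, members in strata.items():
        # Never ask for more people than the stratum contains
        k = min(n_per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 950 women in group "A", 50 in a small group "B"
population = [{"id": i, "group": "A" if i < 950 else "B"} for i in range(1000)]

chosen = stratified_sample(population, lambda p: p["group"], n_per_stratum=30, seed=1)
for group, people in chosen.items():
    print(group, len(people))
```

A simple random sample of 60 from this population would be expected to contain only about 3 women from group "B"; sampling within strata guarantees 30. Because the small group is deliberately over-represented, estimates for the whole population then need to be weighted back to the true group sizes.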

[Figure from: Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.]

[Figure from: Hanna C, Greenes D. How Much Tachycardia in Infants Can Be Attributed to Fever? Ann Emerg Med June 2004]