Medical Statistics as a Science

Medical Statistics:
Extrapolates from the data collected to make general conclusions about the larger population from which the data sample was derived
Allows general conclusions to be made from limited amounts of data
To do this we must assume that all data are randomly sampled from an infinitely large population, then analyse this sample and use the results to make inferences about the population
Statistical Analysis in a Simple Experiment
1. Define the population of interest
2. Randomly select a sample of subjects to study (use exclusion criteria, but define a precise patient population)
3. Half the subjects receive one treatment and the other half another treatment (usually placebo)
4. Measure baseline variables in each group (e.g. age, APACHE II) to ensure randomisation was successful
5. Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment
Outline
Power
Basic Sample Size Information
Examples (see text for more)
Changes to the basic formula
Multiple comparisons
Poor proposal sample size statements
Conclusion and Resources
Types of descriptive statistics:
Measures of central tendency
Graphs
Measures of variability
Data
Categorical data: values belong to categories
Nominal data: there is no natural order to the categories, e.g. blood groups
Ordinal data: there is a natural order, e.g. Adverse Events (Mild/Moderate/Severe/Life Threatening)
Binary data: there are only two possible categories, e.g. alive/dead
Numerical data: the value is a number (either measured or counted)
Continuous data: measurement is on a continuum, e.g. height, age, haemoglobin
Discrete data: a "count" of events, e.g. number of pregnancies
Descriptive Statistics: concerned with summarising or describing a sample, e.g. mean, median
Inferential Statistics: concerned with generalising from a sample, to make estimates and inferences about a wider population, e.g. t test, chi-square test
Why do we need to study statistics?
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment
Statistical Terms
Mean: the average of the data; sensitive to outlying data
Median: the middle of the data; not sensitive to outlying data
Mode: the most commonly occurring value
Range: the spread of the data, from the smallest to the largest value
IQ range (interquartile range): the spread of the middle 50% of the data (upper quartile minus lower quartile); commonly used for skewed data
Standard deviation: a single number which measures how much the observations vary around the mean
Symmetrical data: data that follow a normal distribution (mean = median = mode); report mean, standard deviation and n
Skewed data: not normally distributed (mean ≠ median ≠ mode); report median and IQ range
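To make these definitions concrete, here is a minimal Python sketch using the standard-library `statistics` module on a small hypothetical data set (the values are invented for illustration). It shows how one outlying value pulls the mean but barely moves the median. `statistics.quantiles` uses its default "exclusive" method, so other software may report slightly different quartiles.

```python
from statistics import mean, median, mode, stdev, quantiles

def describe(data):
    """Return the summary measures defined above for a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles ('exclusive' method by default)
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": stdev(data),  # sample standard deviation (n - 1 denominator)
    }

# Hypothetical values; the added 30 is an outlier.
without_outlier = [2, 3, 3, 4, 5, 6]
with_outlier = without_outlier + [30]

for label, values in (("without outlier", without_outlier), ("with outlier", with_outlier)):
    s = describe(values)
    print(f"{label}: mean={s['mean']:.2f}, median={s['median']}, IQR={s['iqr']}")
```

Adding the single outlier roughly doubles the mean (3.83 to 7.57), while the median only moves from 3.5 to 4. This is why the median and IQ range are preferred for skewed data.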
Standard Normal Distribution
Mean ± 1 SD encompasses 68% of observations
Mean ± 2 SD encompasses 95% of observations
Mean ± 3 SD encompasses 99.7% of observations
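The 68/95/99.7 figures follow from the normal cumulative distribution function. The fraction of a normal distribution lying within k SDs of the mean is erf(k/√2), so a quick check needs only `math.erf`. It also shows that "95%" corresponds more precisely to ±1.96 SD (±2 SD covers about 95.4%):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {100 * fraction_within(k):.1f}%")
```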
1. Experimental Design
Convenience Sampling: use results that are easy to get
Stratified Sampling: draw a sample from each stratum
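Stratified sampling can be sketched in a few lines of Python. The patient records and the `stratum` field below are hypothetical. The point is that a fixed number of subjects is drawn at random from each stratum, not from the population as a whole.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, per_stratum, seed=None):
    """Draw `per_stratum` records at random (without replacement) from each stratum.

    Raises ValueError if a stratum has fewer than `per_stratum` members.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical population: 60 patients in three age bands.
patients = [{"id": i, "stratum": ("18-39", "40-64", "65+")[i % 3]} for i in range(60)]
chosen = stratified_sample(patients, lambda p: p["stratum"], per_stratum=5, seed=1)
print(sorted(p["id"] for p in chosen))
```

Fixing `seed` only makes the draw reproducible. Drop it for a fresh random sample each time.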
Basic concepts
Homogeneity: all individuals have similar values or belong to the same category.
Example: all individuals are Chinese women, middle-aged (30 to 40 years old), working in a textile mill: homogeneity in nationality, gender, age and occupation.
Variation: the differences between individuals, e.g. in height or weight
Steps in Statistical Testing
Null hypothesis H0: there is no difference between the groups
Alternative hypothesis H1: there is a difference between the groups
Collect data
Calculate a test statistic, e.g. t test, chi-square
Interpret the P value and confidence intervals
P value ≤ 0.05: reject H0
P value > 0.05: do not reject H0
Draw conclusions
Population and sample
Population: the whole collection of individuals that one intends to study
Sample: a representative part of the population
Randomization: an important way to make the sample representative
Probability
Measures the possibility of occurrence of a random event.
A: random event; P(A): probability of the random event A
P(A) = 1 if an event always occurs
P(A) = 0 if an event never occurs
2. Descriptive Statistics & Distributions
Parameter: population quantity
Statistic: summary of the sample
Inference for parameters: use the sample
Central tendency: mean (average), median (middle value)
Variability: variance (measure of variation); standard deviation (sd): square root of the variance; standard error (se): sd of the estimate; median, quartiles, min, max, range, boxplot
Proportion
Normal distribution
Standard normal distribution: mean 0, variance 1
Z-test for means; t-test for means if the sd is unknown
Meaning of P
P value: the probability of observing a result as extreme as, or more extreme than, the one actually observed, from chance alone (i.e. if H0 were true)
Lets us decide whether or not to reject the null hypothesis

P > 0.05             Not significant
P = 0.01 to 0.05     Significant
P = 0.001 to 0.01    Very significant
P < 0.001            Extremely significant
3. Inference for Means
(SAS Analyst screenshot: click ‘File’ to import data and create the SAS data set, or to open an existing SAS data set; click ‘Statistics’ to select the statistical procedure; click ‘Solution’ to create a project to run a statistical test.)
Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
Nonparametric alternative to the two-sample t-test
The populations don’t need to be normal
H0: the two samples come from populations with equal medians
H1: the two samples come from populations with different medians
Mann-Whitney U-Test Procedure
1. Temporarily combine the two samples into one big sample, then replace each sample value with its rank
2. Find the sum of the ranks for either one of the two samples
3. Calculate the value of the z test statistic
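The first two steps of this procedure (pool, rank, sum) can be sketched in Python as follows. Tied values receive the average of the ranks they occupy, which is the usual convention. The two samples in the usage line are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend `end` across a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def rank_sums(sample1, sample2):
    """Step 1: pool the samples and rank them. Step 2: sum the ranks within each sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])

r1, r2 = rank_sums([3.1, 4.5, 4.5, 7.0], [2.2, 5.0, 8.3])  # hypothetical data
print(r1, r2)
```

A useful check is that R1 + R2 always equals N(N + 1)/2, where N = n1 + n2. Here that is 7 × 8 / 2 = 28.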
T Test
The t test checks whether two samples are likely to have come from the same or different populations
Used on continuous variables
Example: age of patients in the APC study (APC/placebo)

            PLACEBO       APC
mean age    60.6 years    60.5 years
SD          ±16.5         ±17.2
n           840           850
95% CI      59.5-61.7     59.3-61.7

What is the P value? 0.01, 0.05, 0.10, 0.90, 0.99
P = 0.903: not significant; no evidence that the patients come from different populations (the groups were designed to be matched by randomisation, so this is no surprise!)
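For samples this large the t distribution is practically identical to the standard normal, so the P value and the confidence intervals can be reproduced from the summary statistics alone. The sketch below is a check under stated assumptions, not the study's own analysis. It uses a normal approximation with unpooled standard errors, via the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample test of equal means from summary statistics (normal approximation)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_diff
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided
    return z, p

def ci95(mean, sd, n):
    """Approximate 95% CI for a single group mean."""
    half = 1.96 * sd / math.sqrt(n)
    return mean - half, mean + half

z, p = two_sample_z(60.6, 16.5, 840, 60.5, 17.2, 850)  # placebo vs APC
print(f"z = {z:.3f}, P = {p:.3f}")
print("placebo 95%% CI: %.1f-%.1f" % ci95(60.6, 16.5, 840))
print("APC 95%% CI: %.1f-%.1f" % ci95(60.5, 17.2, 850))
```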
T Test: SAFE “Serum Albumin”
Q: Are these albumin levels different?
H0 = levels are the same (any difference is there by chance)
H1 = levels are too different to have occurred purely by chance
Statistical test: t test
P < 0.0001 (extremely significant)
Reject the null hypothesis (H0) and accept the alternative hypothesis (H1): if both samples came from the same overall group, a difference this large would occur by chance less than 1 time in 10 000, therefore we can say they are very likely to be different

            PLACEBO       ALBUMIN
n           3500          3500
mean        28            30
SD          10            10
95% CI      27.7-28.3     29.7-30.3
Effect of Sample Size Reduction
A smaller sample size (one tenth the size) causes a wider CI (less confident where the mean is)
P = 0.008 (i.e. approx 0.01): P is significant, but less so
This influence of sample size on the ability to find any particular difference statistically significant is a major consideration in study design

            PLACEBO       ALBUMIN
n           350           350
mean        28            30
SD          10            10
95% CI      27.0-29.0     29.0-31.0
Reducing Sample Size (again)
Using an even smaller sample size (now 1/100) gives much wider confidence intervals
P = 0.41 (not significant any more)
A SMALLER STUDY has LOWER POWER to find any particular difference statistically significant (mean and SD unchanged)
POWER: the ability of a study to detect an actual effect or difference

            PLACEBO       ALBUMIN
n           35            35
mean        28            30
SD          10            10
95% CI      24.6-31.4     26.6-33.4
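The three albumin comparisons differ only in n, so the pattern of P values can be reproduced in a short loop. This sketch uses a normal approximation. For n = 35 it gives P ≈ 0.40, slightly below the 0.41 quoted above, which is consistent with the heavier tails of the t distribution at small n.

```python
import math
from statistics import NormalDist

def compare(n, mean_placebo=28, mean_albumin=30, sd=10):
    """z, two-sided P and per-group 95% CI half-width for two groups of size n each."""
    se_diff = math.sqrt(2 * sd ** 2 / n)  # equal SD and n in both groups
    z = (mean_albumin - mean_placebo) / se_diff
    p = 2 * NormalDist().cdf(-abs(z))
    half_ci = 1.96 * sd / math.sqrt(n)
    return z, p, half_ci

results = {n: compare(n) for n in (3500, 350, 35)}
for n, (z, p, half) in results.items():
    print(f"n = {n:>4}: z = {z:5.2f}, P = {p:.2g}, group CI = mean +/- {half:.2f}")
```

Each tenfold cut in n shrinks z by a factor of √10 ≈ 3.2, even though the means and SDs never change.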
Mann-Whitney U-Test, Example
Numbers in parentheses are the ranks, beginning with a rank of 1 assigned to the lowest value of 17.7.
R1 and R2: sums of ranks
Hypothesis: the group medians are different
H0: men and women have the same median BMI
H1: men and women have different median BMIs

$$\mu_R = \frac{n_1(n_1+n_2+1)}{2} = \frac{13(13+12+1)}{2} = 169$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}} = \sqrt{\frac{(13)(12)(13+12+1)}{12}} = 18.385$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{187 - 169}{18.385} = 0.98$$

Since |z| = 0.98 < 1.96, the p-value is greater than 0.05, thus we do not reject H0 at α = 0.05.
There is no significant difference in BMI between men and women.
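The rank-sum arithmetic can be checked in Python using only the values from the example (R = 187, n1 = 13, n2 = 12). The two-sided P value is added using the normal approximation. No correction for tied ranks is applied, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def rank_sum_z(r, n1, n2):
    """Normal approximation for the Wilcoxon rank-sum statistic R of sample 1."""
    mu_r = n1 * (n1 + n2 + 1) / 2
    sigma_r = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r - mu_r) / sigma_r
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided
    return mu_r, sigma_r, z, p

mu_r, sigma_r, z, p = rank_sum_z(187, 13, 12)
print(f"mu_R = {mu_r}, sigma_R = {sigma_r:.3f}, z = {z:.2f}, P = {p:.2f}")
```

The resulting P ≈ 0.33 is well above 0.05, matching the decision not to reject H0.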
SAS Programming for Mann-Whitney U-Test Procedure
Data steps: the same as slide 21.
Procedure steps:
Click ‘Solutions’, then ‘Analysis’, then ‘Analyst’
Click ‘File’, then ‘Open By SAS Name’
Select the SAS data set and click ‘OK’
Click ‘Statistics’, then ‘ANOVA’, then ‘Nonparametric One-Way ANOVA’
Select the ‘Dependent’ and ‘Independent’ variables respectively, choose the test of interest, and click ‘OK’
(SAS Analyst screenshot: click ‘File’ to open the SAS data set, click ‘Statistics’ to select the statistical procedure, then select the dependent and independent variables.)
Notation for paired t-test
$d$ = individual difference between the two values of a single matched pair
$\mu_d$ = mean value of the differences $d$ for the population of paired data
$\bar d$ = mean value of the differences $d$ for the paired sample data
$s_d$ = standard deviation of the differences $d$ for the paired sample data
$n$ = number of pairs
Example: Systolic Blood Pressure
OC: oral contraceptive
ID Without OC’s With OC’s Difference
1 115 128 13
2 112 115 3
3 107 106 -1
4 119 128 9
5 115 122 7
6 138 145 7
7 126 132 6
8 105 109 4
9 104 102 -2
10 115 117 2
Hypothesis: the group means are different
$H_0: \mu_d = 0$ vs. $H_1: \mu_d \neq 0$
Significance level: α = 0.05
Degrees of freedom (df): $n - 1 = 9$
Test statistic:

$$t = \frac{\bar d}{s_d/\sqrt{n}} = \frac{4.8}{4.57/\sqrt{10}} = 3.32$$

P-value: 0.009, thus reject H0 at α = 0.05
The data support the claim that oral contraceptives affect the systolic bp.
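The paired t-test on the blood-pressure table can be reproduced with the Python standard library alone. The `statistics` module has no t distribution, so this sketch integrates the Student t density numerically (Simpson's rule) to obtain the two-sided P value. In practice a statistics package would normally be used instead.

```python
import math
from statistics import mean, stdev

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P value for Student's t, integrating the t density from 0 to |t|
    with Simpson's rule (`steps` must be even)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3      # P(0 < T < |t|)
    return 1 - 2 * area       # equals 2 * P(T > |t|)

def paired_t(before, after):
    """Paired t-test on the differences (after - before)."""
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1, t_two_sided_p(t, n - 1)

without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

d_bar, s_d, t, df, p = paired_t(without_oc, with_oc)
print(f"mean d = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}, P = {p:.3f}")
```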
Confidence interval for matched pairs
100(1-α)% CI:

$$\left(\bar d - t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}},\ \ \bar d + t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}}\right)$$

95% CI for the mean difference of the systolic bp:

$$\bar d \pm t_{9,0.025}\,\frac{s_d}{\sqrt{10}} = 4.8 \pm 2.26 \times \frac{4.57}{\sqrt{10}} = 4.8 \pm 3.27$$

(1.53, 8.07)
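Once the critical value is known, the interval is simple arithmetic. The standard library has no t quantile function, so this sketch takes t(9, 0.025) = 2.262 from a t table as an input. That value is an assumption of the example and matches the 2.26 used for this interval.

```python
import math

def paired_mean_ci(d_bar, s_d, n, t_crit):
    """CI for the mean of paired differences: d_bar +/- t_crit * s_d / sqrt(n)."""
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width

low, high = paired_mean_ci(d_bar=4.8, s_d=4.57, n=10, t_crit=2.262)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The interval excludes 0, which agrees with rejecting H0 at α = 0.05.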
(SAS Analyst screenshot: click ‘File’ to open the SAS data set, click ‘Statistics’ to select the statistical procedure, and put the two group variables into ‘Group 1’ and ‘Group 2’.)
Chi Square Test
Proportions or frequencies
Binary data, e.g. alive/dead
PROWESS Study: primary endpoint: 28-day all-cause mortality

           ALIVE          DEAD           TOTAL          % DEAD
PLACEBO    581 (69.2%)    259 (30.8%)    840 (100%)     30.8
APC        640 (75.3%)    210 (24.7%)    850 (100%)     24.7
TOTAL      1221 (72.2%)   469 (27.8%)    1690 (100%)

Perform chi-square test: P = 0.006 (very significant)
If there were no true difference between the groups, a difference in mortality this large would arise by chance only about 6 times in 1000
Reduction in death rate = 30.8% - 24.7% = 6.1%, i.e. an absolute 6.1% lower chance of dying in the APC group
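A 2 × 2 chi-square test needs only the four cell counts. The Python sketch below computes expected counts from the row and column totals. For 1 degree of freedom it converts the statistic to a P value through the normal distribution, because a chi-square variable with 1 df is the square of a standard normal.

With Yates' continuity correction the sketch gives P ≈ 0.006 for the PROWESS table, in line with the figure above. Without the correction it gives P ≈ 0.005. Which variant the original analysis used is not stated.

```python
import math
from statistics import NormalDist

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2 x 2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # continuity correction
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # With 1 df, chi2 is the square of a standard normal, so P = 2 * P(Z > sqrt(chi2)).
    p = 2 * NormalDist().cdf(-math.sqrt(chi2))
    return chi2, p

# PROWESS: rows = placebo, APC; columns = alive, dead
chi2, p = chi_square_2x2(581, 259, 640, 210)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

Applied to the one-tenth table (58, 26, 64, 21), the same function gives P well above 0.05 with or without the correction.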
Reducing Sample Size
Same results but using a much smaller sample size (one tenth)

           ALIVE          DEAD           TOTAL          % DEAD
PLACEBO    58 (69.2%)     26 (30.8%)     84 (100%)      30.8
APC        64 (75.3%)     21 (24.7%)     85 (100%)      24.7
TOTAL      122 (72.2%)    47 (27.8%)     169 (100%)

Reduction in death rate = 6.1% (still the same)
Perform chi-square test: P = 0.39
If there were no true difference, a difference in mortality at least this large would arise by chance about 39 times in 100, therefore the results are not significant
Again, the power of a study to find a difference depends a lot on sample size, for binary data as well as continuous data
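One way to see why the smaller study fails is to approximate its power. Purely for illustration, the sketch below assumes that the true 28-day mortality rates equal the observed 30.8% and 24.7%. It then applies the usual normal-approximation power formula for comparing two proportions at two-sided α = 0.05, ignoring the negligible chance of rejecting in the wrong direction.

Under these assumptions the full sample has roughly 80% power and the one-tenth sample roughly 14%.

```python
import math
from statistics import NormalDist

def approx_power(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # probability that the test statistic exceeds the critical value,
    # given the assumed true difference |p1 - p2|
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)

full = approx_power(0.308, 840, 0.247, 850)
tenth = approx_power(0.308, 84, 0.247, 85)
print(f"power with n = 840/850: {full:.2f}; with n = 84/85: {tenth:.2f}")
```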
Summary
Size matters: BIGGER IS BETTER
Spread matters: SMALLER IS BETTER
Bigger difference: EASIER TO FIND
Smaller difference: MORE DIFFICULT TO FIND
To find a small difference you need a big study
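The summary can be made quantitative with the standard sample-size formula for comparing two means with equal groups: n per group = 2(z_{1-α/2} + z_{1-β})² σ² / Δ². The Python sketch below uses two-sided α = 0.05, 80% power and σ = 10 as in the albumin example. The target differences Δ are hypothetical. Halving the difference to be detected roughly quadruples the required sample size.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a difference `delta` in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

for delta in (4, 2, 1):
    print(f"difference {delta}: about {n_per_group(delta, sd=10)} patients per group")
```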