Statistical inference
Its application for health science research
Bandit Thinkhamrop, Ph.D. (Statistics)
Department of Biostatistics and Demography
Faculty of Public Health
Khon Kaen University
Begin at the conclusion
Type of the study outcome: Key for selecting appropriate statistical methods
Study outcome
– Dependent variable or response variable
– Focus on the primary study outcome if there is more than one
Type of the study outcome
– Continuous
– Categorical (dichotomous, polytomous, ordinal)
– Numerical (Poisson) count
– Event-free duration
The outcome determines the statistics
Outcome       Statistic                                  Regression method
Continuous    Mean, median                               Linear regression
Categorical   Proportion (prevalence or risk)            Logistic regression
Count         Rate per “space”                           Poisson regression
Survival      Median survival, risk of events at T(t)    Cox regression
Statistics quantify errors for judgments
– Parameter estimation [95%CI]
– Hypothesis testing [P-value]
Common types of the statistical goals
– Single measurements (no comparison)
– Difference (compared by subtraction)
– Ratio (compared by division)
– Prediction (diagnostic test or predictive model)
– Correlation (examine a joint distribution)
– Agreement (examine concordance or similarity between pairs of observations)
Dependency of the study outcome requires special statistical methods to handle it
Outcome       Statistic                                  Standard method
Continuous    Mean, median                               Linear regression
Categorical   Proportion (prevalence or risk)            Logistic regression
Count         Rate per “space”                           Poisson regression
Survival      Median survival, risk of events at T(t)    Cox regression
When the outcomes are dependent: mixed model, multilevel model, GEE
Back to the conclusion
Outcome       Statistic
Continuous    Mean, median
Categorical   Proportion (prevalence or risk)
Count         Rate per “space”
Survival      Median survival, risk of events at T(t)
Appropriate statistical methods give: magnitude of effect, 95% CI, P-value
Answer the research question based on the lower or upper limit of the CI
Always report the magnitude of effect and its confidence interval
Absolute effects:
– Mean, mean difference
– Proportion or prevalence, rate or risk, rate or risk difference
– Median survival time
Relative effects:
– Relative risk, rate ratio, hazard ratio
– Odds ratio
Other magnitudes of effect:
– Correlation coefficient (r), intra-class correlation (ICC)
– Kappa
– Diagnostic performance
– Etc.
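The absolute and relative effects listed above can be sketched for a dichotomous outcome as follows. The counts are hypothetical, chosen to match the 80% versus 50% cure rates in groups of 50 used later in these slides, and the variable names are illustrative only.

```python
# Hypothetical 2x2 data: 40 of 50 cured on Drug A, 25 of 50 cured on Drug B.
cured_a, n_a = 40, 50
cured_b, n_b = 25, 50

# Proportion with the outcome in each group.
risk_a = cured_a / n_a
risk_b = cured_b / n_b

# Absolute effect: compared by subtraction.
risk_difference = risk_a - risk_b

# Relative effects: compared by division.
relative_risk = risk_a / risk_b
odds_a = cured_a / (n_a - cured_a)
odds_b = cured_b / (n_b - cured_b)
odds_ratio = odds_a / odds_b

print(f"Risk difference: {risk_difference:.2f}")
print(f"Relative risk:   {relative_risk:.2f}")
print(f"Odds ratio:      {odds_ratio:.2f}")
```

Each of these point estimates would still need its own confidence interval before it is reported.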
Touch the variability (uncertainty) to understand statistical inference
id        A (X)   (X − X̄)   (X − X̄)²
1         2       −2        4
2         2       −2        4
3         0       −4        16
4         2       −2        4
5         14      10        100
Sum (Σ)   20      0         128
Mean      4       0         32.0 (variance: 128 divided by n − 1 = 4)
SD = 5.66, Median = 2
Mean: X̄ = ΣX / n = (2 + 2 + 0 + 2 + 14) / 5 = 20 / 5 = 4
Sorted data: 0, 2, 2, 2, 14 (median = 2)
Variance = SD²
Standard deviation = SD
Touch the variability (uncertainty) to understand statistical inference
For data set A (2, 2, 0, 2, 14):
– Measures of central tendency: mean = 4, median = 2
– Measure of variation: SD = 5.66
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where n − 1 is the degrees of freedom.
Standard deviation (SD) = the average distance between each data item and the mean
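As a worked check of the SD formula, the arithmetic for data set A (2, 2, 0, 2, 14) can be written out step by step. This only restates the numbers already given in the table.

```latex
\bar{X} = \frac{2 + 2 + 0 + 2 + 14}{5} = \frac{20}{5} = 4
\qquad
\sum (X - \bar{X})^2 = 4 + 4 + 16 + 4 + 100 = 128
\qquad
SD = \sqrt{\frac{128}{5 - 1}} = \sqrt{32} \approx 5.66
```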
Same mean BUT different variation
Data set   Values           Sum (Σ)   Mean   SD     Median   Description
A          2, 2, 0, 2, 14   20        4      5.66   2        Heterogeneous data, skewed distribution
B          0, 3, 4, 5, 8    20        4      2.92   4        Heterogeneous data, symmetric distribution
C          4, 3, 5, 4, 4    20        4      0.71   4        Homogeneous data, symmetric distribution
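The comparison of the three data sets can be reproduced with a minimal sketch using Python's standard `statistics` module. The dictionary layout and names are illustrative; `statistics.stdev` uses the n − 1 denominator, as the slides do.

```python
import statistics

# The three five-item data sets from the slides; all sum to 20.
datasets = {
    "A": [2, 2, 0, 2, 14],
    "B": [0, 3, 4, 5, 8],
    "C": [4, 3, 5, 4, 4],
}

summary = {}
for name, values in datasets.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample SD, i.e. sum of squared deviations divided by n - 1.
        "sd": round(statistics.stdev(values), 2),
    }

for name, stats in summary.items():
    print(name, stats)
```

The means are identical, so only the SD (and, for A, the gap between mean and median) reveals how differently the three sets are spread.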
Facts about Variation
Because of variability, repeated samples will NOT obtain the same statistic such as mean or proportion:
– Statistics vary from study to study because of the role of chance
– Hard to believe that the statistic is the parameter
– Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study
Data varied widely = heterogeneous data
Heterogeneous data require a large sample size to achieve a conclusive finding
The Histogram
[Figure: histograms of data set A (2, 2, 0, 2, 14) and data set B (4, 3, 5, 4, 4), each on a horizontal axis from 0 to 14]
The Frequency Curve
[Figure: frequency curves for the same two data sets on the same 0 to 14 axis]
Area Under The Frequency Curve
[Figure: area under the frequency curves for the same two data sets]
Central Limit Theorem
– Distribution of the raw data: may be right-skewed, symmetric, or left-skewed
– Distribution of the sampling mean: the means X̄1, X̄2, X̄3, …, X̄n of repeated large samples are normally distributed (the theoretical Normal distribution)
– Raw data: many X, summarised by X̄ and SD
– Sampling means: many X̄, summarised by their own mean and SE
– Standardized, for whatever n: mean = 0, standard deviation = 1
Standard deviation of the sampling mean = standard error (SE), estimated by
$$SE = \frac{SD}{\sqrt{n}}$$
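The behaviour of the sampling mean can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the raw data are drawn from a right-skewed exponential distribution with mean 4 (and therefore SD 4), the sample size is 25, and the seed and number of repeated samples are arbitrary.

```python
import random
import statistics

random.seed(42)  # arbitrary seed, for reproducibility only

POP_MEAN = 4.0    # assumed population: exponential, so its SD equals its mean
N = 25            # size of each sample
N_SAMPLES = 5000  # number of repeated samples

# Draw many samples and keep the mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.expovariate(1 / POP_MEAN) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)  # empirical standard error
theoretical_se = POP_MEAN / N ** 0.5          # SD / sqrt(n)

print(f"Mean of the sample means: {mean_of_means:.2f}")
print(f"SD of the sample means:   {sd_of_means:.2f}")
print(f"SD / sqrt(n):             {theoretical_se:.2f}")
```

Although the raw data are skewed, the spread of the sample means should come out close to SD / √n, which is what the SE formula estimates from a single sample.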
(Theoretical) Normal Distribution
– Mean ± 1SD: 68.27% of AUC
– Mean ± 2SD: 95.45% of AUC
– Mean ± 3SD: 99.73% of AUC
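These areas follow from the Normal distribution itself and can be checked with the error function from the standard library. The helper name `area_within` is illustrative.

```python
import math

def area_within(k_sd: float) -> float:
    """Area under the Normal curve within mean ± k_sd standard deviations."""
    return math.erf(k_sd / math.sqrt(2))

# Percent of the area under the curve (AUC) within 1, 2, and 3 SD.
# Some printed tables round the first value to 68.26.
areas = {k: round(100 * area_within(k), 2) for k in (1, 2, 3)}

for k, pct in areas.items():
    print(f"Mean ± {k}SD: {pct}% of AUC")
```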
Sample: n = 25, X̄ = 52, SD = 5
From sample to population:
– Parameter estimation [95%CI]
– Hypothesis testing [P-value]
$$SE = \frac{SD}{\sqrt{n}} = \frac{5}{\sqrt{25}} = \frac{5}{5} = 1$$
Z = 2.58 (99%), Z = 1.96 (95%), Z = 1.64 (90%)
Sample: n = 25, X̄ = 52, SD = 5, SE = 1
Parameter estimation (from sample to population)
[95%CI]: 52 − 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.
Z = 2.58, Z = 1.96, Z = 1.64
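The interval above can be reproduced with a minimal sketch using the sample values from the slide (n = 25, mean 52, SD 5). It uses the Normal approximation, as the slide does; variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sample_mean, sd = 25, 52, 5

# Standard error of the mean: SD / sqrt(n).
se = sd / math.sqrt(n)

# Z multiplier for a two-sided 95% interval (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * se
upper = sample_mean + z * se

print(f"SE = {se:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Replacing 0.975 with 0.995 or 0.95 gives the multipliers for 99% and 90% intervals.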
Sample: n = 25, X̄ = 52, SD = 5, SE = 1
Hypothesis testing (from sample to population)
H0: μ = 55
HA: μ ≠ 55
$$Z = \frac{55 - 52}{1} = 3$$
P-value = 1 − 0.9973 = 0.0027
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is 0.0027.
[Figure: normal curve centred at 55 with −3SE and +3SE marked; the sample mean of 52 lies at −3SE]
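The P-value in this example is the two-sided area of the Normal curve lying more than 3 SE from the hypothesised mean. A minimal sketch, using the slide's numbers and illustrative variable names:

```python
from statistics import NormalDist

sample_mean = 52
null_mean = 55  # H0: population mean = 55
se = 1          # standard error from the slide (5 / sqrt(25))

# Distance between the observed mean and the null value, in SE units.
z = (sample_mean - null_mean) / se

# Two-sided P-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Z = {z:.1f}")
print(f"Two-sided P-value = {p_value:.4f}")
```

The sign of Z depends only on the direction of subtraction; the two-sided P-value is the same either way.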
P-value vs. 95%CI (1)
An example of a study with dichotomous outcome
A study compared cure rate between Drug A and Drug B
Setting:
– Drug A = Alternative treatment
– Drug B = Conventional treatment
Results:
– Drug A: n1 = 50, Pa = 80%
– Drug B: n2 = 50, Pb = 50%
Pa − Pb = 30% (95%CI: 26% to 34%; P = 0.001)
P-value vs. 95%CI (2)
Pa − Pb = 30% (95%CI: 26% to 34%; P < 0.05)
[Figure: the confidence interval on an axis whose two directions are labelled Pa > Pb and Pb > Pa]
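One common way to obtain a confidence interval for a difference between two proportions is the Wald (Normal approximation) interval, sketched below. The slides do not say which method produced their interval, so the method and the function name `risk_difference_ci` are assumptions. With the stated sample sizes of 50 per group, this approximation gives an interval wider than the 26% to 34% printed on the slide, so the slide's figures are perhaps best read as illustrative.

```python
import math
from statistics import NormalDist

def risk_difference_ci(p1, n1, p2, n2, level=0.95):
    """Wald confidence interval for p1 - p2 (assumed method, not from the slides)."""
    diff = p1 - p2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

# Cure rates from the slide: Drug A 80% of 50, Drug B 50% of 50.
diff, lower, upper = risk_difference_ci(0.80, 50, 0.50, 50)

print(f"Pa - Pb = {diff:.0%} (95% CI: {lower:.1%} to {upper:.1%})")
```

Whatever the method, the reading is the same: the conclusion rests on whether the lower limit of the interval still represents a meaningful benefit.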
P-value vs. 95%CI (3)
[Figure] Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99
P-value vs. 95%CI (4)
[Figure] There was a statistically significant difference between the two groups.
P-value vs. 95%CI (5)
[Figure] There was no statistically significant difference between the two groups.
P-value vs. 95%CI
Save tips:
– Always report the 95%CI with the p-value; do NOT report the p-value alone
– Always interpret based on the lower or upper limit of the confidence interval; the p-value is optional
– Never interpret a p-value > 0.05 as an indication of no difference or no association; only the CI can provide this message.
Additional Notes
Alpha (α) and Beta (β)
Alpha (α)
– Type I error
– The probability that a statistical test will reject the null hypothesis when the null hypothesis is true
– Making a false positive decision
Beta (β)
– Type II error
– The probability that a statistical test will NOT reject the null hypothesis when the null hypothesis is false
– Making a false negative decision
Power (1 − β)
– The probability that a statistical test will reject the null hypothesis when the null hypothesis is false
– That is, the probability of NOT committing a Type II error or a false negative decision
– The higher the power, the lower the Type II error
– Analogous to the sensitivity of a diagnostic test
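The relationship between α, β, and power can be made concrete for a two-sided Z-test. The sketch below is an illustration under assumptions: the true difference of 3 units and the SE of 1 are borrowed from the earlier worked example as hypothetical inputs, and `z_test_power` is an illustrative helper, not a library function.

```python
from statistics import NormalDist

def z_test_power(true_diff: float, se: float, alpha: float = 0.05) -> float:
    """Power of a two-sided Z-test when the true difference is true_diff."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold, about 1.96
    shift = true_diff / se                      # true difference in SE units
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = z_test_power(true_diff=3, se=1)  # hypothetical inputs
beta = 1 - power                         # probability of a Type II error

print(f"Power = {power:.3f}, beta = {beta:.3f}")
```

When the true difference is zero, the same function returns α, which is the false positive rate the test was designed to have.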
Alpha (α) and Beta (β) cont.
Significance level
– A statement of how unlikely a result must be, if the null hypothesis is true, to be considered significant.
– Needs to be declared in advance, before looking at the data, preferably before data collection
– Three most commonly used criteria of probabilities: 0.05 (5%, 1 in 20), 0.01 (1%, 1 in 100), and 0.001 (0.1%, 1 in 1000)
P-value
– The probability of obtaining a result as extreme as, or more extreme than, the one observed, given that the null hypothesis is true
– Quantified from the data and the hypothesis
– This is the evidence for deciding whether or not to reject the null hypothesis
– Reject the null hypothesis if the p-value is less than the predefined level of significance, and do not reject otherwise
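The meaning of the significance level can be illustrated by simulation: when the null hypothesis is true, the decision rule above should reject in roughly α of all studies. This is a sketch under stated assumptions, not part of the original slides: a Normal population with mean 55 and known SD 5, samples of 25, and an arbitrary seed and number of simulated studies.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2024)  # arbitrary seed, for reproducibility only

ALPHA = 0.05       # significance level, declared in advance
MU0 = 55           # null hypothesis value, which is also the true mean here
SIGMA = 5          # assumed known population SD
N = 25             # sample size per study
N_STUDIES = 4000   # number of simulated studies

se = SIGMA / math.sqrt(N)

def two_sided_p(sample_mean: float) -> float:
    """Two-sided P-value of a Z-test against MU0."""
    z = (sample_mean - MU0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Apply the decision rule to many studies in which H0 is true.
rejections = 0
for _ in range(N_STUDIES):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    if two_sided_p(fmean(sample)) < ALPHA:
        rejections += 1

false_positive_rate = rejections / N_STUDIES
print(f"Rejected H0 in {false_positive_rate:.1%} of studies where H0 was true")
```

Every rejection in this simulation is a Type I error, since the data were generated with the null hypothesis true.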
Q & A
Thank you