DESCRIPTION
A presentation I gave as part of the Saudi Board of Community Medicine, Western Region. It simplifies the ideas behind hypotheses and hypothesis testing, and covers several approaches to choosing the most suitable statistical test for a study.
Advanced Biostatistics Simplified
PREPARED & PRESENTED BY:
DR. M. ALHEFZI
DR. B. ALHEJAILI
DR. N. ALOTAIBI
DR. M. ALGOTHAMI
DR. A. KHALAWI
DR. S. ALGHAMDI
SBCM | R1 | Taif
WHY BIOSTAT ?!
Collection
Summarization
Analysis – inference.
Interpretation of the results
Abhaya Indrayan (2012). Medical Biostatistics. CRC Press. ISBN 978-1-4398-8414-0. (QR-code above).
Philosophy behind Hypothesis
What is a hypothesis?
CHANCE?!
Mill’s Canons / Methods – Agreement, Difference, Concomitant Variation, Residues
Am I right or wrong?!
Is it the truth?!
SIGNIFICANCE
• BIAS?
• CONFOUNDING?
• CHANCE?
• CAUSE / EFFECT?
• GENERALIZABILITY!
My Hypothesis (Ha)
TEST!
In other words …
Hypothesis → Test Hypothesis → Measure Assoc. → Sig. → Reject or FTR (fail to reject).
So, what language do we speak in biostat?
MATH?
MEAN, MEDIAN, MODE, RANGE …
AREA UNDER THE CURVE, VARIANCE, SD …
MEDICINE?
EXPOSURE, DISEASE, OUTCOME, EFFECTIVENESS, PREVENTION
RELATIVE RISK, ABSOLUTE RISK
Biostatisticians’ language
MEAN (μ).
MEDIAN.
MODE.
AREA UNDER THE CURVE: Variance.
SD (σ).
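The measures listed above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the blood pressure readings are invented for illustration and are not from the presentation. Note that μ and σ denote population parameters, while the code below computes the usual sample estimates (n - 1 denominator).

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118, 122, 122, 125, 130, 135, 142]

mean = statistics.mean(readings)          # arithmetic mean
median = statistics.median(readings)      # middle value of the sorted data
mode = statistics.mode(readings)          # most frequent value
variance = statistics.variance(readings)  # sample variance (n - 1 denominator)
sd = statistics.stdev(readings)           # sample SD, the square root of the variance

print(f"mean={mean:.1f} median={median} mode={mode} variance={variance:.1f} sd={sd:.1f}")
```

If the data are a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.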
Biostatisticians’ language
Standard Deviation (SD)
Photo courtesy of Judy Davidson, DNP, RN
WE MAKE MISTAKES!
TO LIMIT THEM, WE SET RANGES FOR CHANCE AND DEFINE OUR CRITICAL LIMITS, SO THAT WE END UP WITH A MASTERPIECE OF EVIDENCE!
H0
p-value vs. α level
CI *
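As a rough illustration of the two ideas on this slide, the sketch below builds a z-based confidence interval for a mean and applies the "p-value below α" rule. The function names and all numbers are hypothetical, and the interval assumes the population SD is known.

```python
from statistics import NormalDist

def mean_ci(sample_mean, sigma, n, confidence=0.95):
    """z-based confidence interval for a mean when the population SD is known."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin

def is_significant(p_value, alpha=0.05):
    """Reject H0 when the p-value falls below the preset alpha level."""
    return p_value < alpha

# Invented example: sample mean 128, known sigma 15, n = 100.
low, high = mean_ci(sample_mean=128.0, sigma=15.0, n=100)
print(f"95% CI: {low:.2f} to {high:.2f}")
print(is_significant(0.03), is_significant(0.20))
```

A 95% CI that excludes the value stated in H0 corresponds to a two-sided p-value below α = .05 for the same z-test.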
Test Hypothesis
Test Hypothesis
ASSUMPTIONS.
STEPS.
TESTS.
Test Hypothesis
ASSUMPTIONS – Differ for each test.
LARGE SAMPLE SIZE.
NORMAL (GAUSSIAN) DISTRIBUTION.
HOMOGENEITY.
NO MULTICOLLINEARITY.
KNOWN (μ & σ).
INDEPENDENCE.
Test Hypothesis
STEPS – 7 steps of hypothesis testing.
1) RQ ?
2) H0 & H1
3) TEST & ASSUMPTIONS.
4) α LEVEL, P-VALUE.
5) TEST STATISTIC (DF).
6) DECISION.
7) CONCLUSION (YES/NO).
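The seven steps above can be sketched with a one-sample z-test. The research question and every number here are invented for illustration, and the z-test is only one possible choice; it assumes a known σ, independent observations and a large sample. A z-test has no degrees of freedom, so the DF part of step 5 applies to tests such as t or χ2.

```python
from statistics import NormalDist

# 1) RQ (hypothetical): does the clinic's mean systolic BP differ from 120 mmHg?
# 2) H0: mu = 120; H1: mu != 120 (two-sided).
mu0 = 120.0

# 3) Test & assumptions: one-sample z-test (known sigma, independence, large n).
sigma, n, sample_mean = 12.0, 64, 123.0  # invented values

# 4) Alpha level, fixed before looking at the data.
alpha = 0.05

# 5) Test statistic and its two-sided p-value.
z = (sample_mean - mu0) / (sigma / n ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 6) Decision: reject H0 or fail to reject (FTR).
reject_h0 = p_value < alpha

# 7) Conclusion in plain words.
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```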
Test Hypothesis
TEST STATISTICS
Dependency Concept
Input = Indep. VA = Exposure
Output = Dep. VA = Outcome / Disease
Each member in this group is exclusively linked to it.
Output changes whenever input does.
Data Analysis
• Univariate: summarizing percentages, averages…
• Bivariate: 2 VA
• Multivariate: controlling confounding
Controlling confounding:
• Randomization.
• Restriction.
• Matching.
• Stratification.
Statistical Tests
Parametric Tests
• Student’s t-test
• Paired Samples t-test
• ANOVA
• Correlation
• Regression
Statistical Tests
Non-Parametric Tests
• Chi-Square (χ2)
• Wilcoxon
• Mann-Whitney (U Test)
• Kruskal Wallis
• Logistic Regression
Choosing a Bivariate test

Indep. VA (input, exposure)    Dependent VA (outcome, output)
                               2 Cat.    >2 Cat.    Continuous
2 Cat.                         χ2        χ2         t-test
>2 Cat.                        χ2        χ2         ANOVA
Continuous                     t-test    ANOVA      Correlation, Linear Regression
Continuous Data
Comparing 2 Gps: t-test
Comparing >2 Gps: ANOVA
Assoc. 2 Gps: Pearson Correlation
Prediction: Regression
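For the first row of the list above, here is a minimal sketch of the independent-samples (Student's) t statistic with a pooled variance. The function name and the data are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; the p-value would come from a t table or a statistics package.

```python
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Invented measurements for two independent groups.
t_stat, df = pooled_t([1, 2, 3], [4, 5, 6])
print(f"t = {t_stat:.3f}, df = {df}")
```

The sign of t only reflects which group is listed first; a two-sided test looks at its absolute value.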
Ordinal Data
Comparing 2 Gps: Mann-Whitney (U) test; Wilcoxon (Pre-Post).
Comparing >2 Gps: Kruskal Wallis
Assoc. 2 Gps: Spearman’s ρ
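To show why the Mann-Whitney test suits ordinal data, the sketch below computes the U statistic by direct pair counting: it only asks which of two values is larger, never by how much. The function name and the pain scores are invented for illustration, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): pairwise wins for each group, with ties counted as 0.5."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical pain scores (ordinal scale) in two independent groups.
u_a, u_b = mann_whitney_u([3, 4, 5], [1, 2, 3])
print(u_a, u_b, "test statistic U =", min(u_a, u_b))
```

The smaller of the two values is what is usually compared against a critical-value table.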
Categorical Data
Test of frequency (χ2)
How often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity)
Examples:
- Do negative ads change how people vote?
- Is there a relationship between marital status and health insurance coverage?
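A minimal sketch of the χ2 test on a 2×2 table, loosely modelled on the marital status and insurance example; the counts are invented. The statistic sums (observed - expected)² / expected over the cells. The p-value line uses the exact upper-tail formula for 1 degree of freedom only, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper-tail area of chi-square with 1 df
    return chi2, p_value

# Hypothetical counts: rows = married / single, columns = insured / uninsured.
observed = [[30, 10], [20, 20]]
chi2, p_value = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```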
Choosing the Best Statistical Test (By @alhefzi)
PMT = parametric test; NPMT = non-parametric test. Each row lists PMT / NPMT.

Comparing the difference between groups
- Cat. VA (2) → Cont. VA: Independent sample (t-test) / Mann-Whitney (U test)
- Cont. Dep. VA, same group: Paired Sample (t-test) / Wilcoxon
- Cat. VA (>2) → Cont. VA: One Way ANOVA / Kruskal Wallis
- Cat. VA → Cat. VA: Chi-Square (χ2), McNemar

Association / Strength of Relationship
- Cont. VA ↔ Cont. VA: Pearson (r) / Spearman’s ρ

Prediction
- Cont. VA from Cont. or Cat.: SLR (Bivariate)
- Cont. VA from Cont. + Other VAs: MLR
- Cat. VA from >1 Other VAs: Logistic Regression
Considerations
Normal Distribution & Sample Size
Large sample size ().
Judge the shape of the distribution by inspection.
Otherwise, run a Kolmogorov-Smirnov test to check normality.
An NPMT with a large sample size () is less powerful than a PMT.
Gaussian Distribution ().
NPMT with Gaussian distribution, “small” sample size (). (small, Non-Gaussian) ( p-value).
PMT with Non-Gaussian distribution (): relies on the CLT.
PMT with Non-Gaussian distribution, “small” sample size (): the CLT won’t work, so the p-value is inaccurate.
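As a rough illustration of the Kolmogorov-Smirnov check mentioned above, this sketch computes only the KS distance D between the sample and a normal curve fitted to it. The function name and both data sets are invented. Because the mean and SD are estimated from the same sample, the standard KS p-value tables are too lenient, so the sketch deliberately reports D without a p-value.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical step just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Invented samples: one roughly symmetric, one strongly right-skewed.
d_symmetric = ks_statistic_normal([-1.5, -1, -0.5, -0.25, 0, 0.25, 0.5, 1, 1.5])
d_skewed = ks_statistic_normal([1, 1, 1, 2, 2, 3, 4, 8, 40])
print(f"D symmetric = {d_symmetric:.3f}, D skewed = {d_skewed:.3f}")
```

A larger D means the data sit further from the fitted Gaussian shape.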
Considerations
1 or 2 sided p-value
H0 ().
H0 assumes equal population means; any observed discrepancy is attributed to chance!!
Question: WHICH p-value is larger and why (1 or 2 sided)?
i.e. when formulating your Ha, consider the “larger” critical p-value accordingly!
Go for 1 sided (if)
You have formulated a “directional” hypothesis.
Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
Go for 2 sided (if)
You are unsure or in doubt of your hypothesis direction.
Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
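One way to answer the question above numerically: for the same test statistic, the two-sided p-value counts both tails, so it is twice the one-sided value when the effect lies in the hypothesized direction. The sketch uses a z statistic of 1.8, an invented value, and a hypothetical helper name.

```python
from statistics import NormalDist

def one_and_two_sided_p(z):
    """One-sided (Ha: effect > 0) and two-sided p-values for a z statistic."""
    one_sided = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_sided, two_sided

one_sided, two_sided = one_and_two_sided_p(1.8)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

With α = .05 this example is significant one-sided but not two-sided, which is why the direction has to be fixed before data collection.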
2-tailed test
Biostatisticians’ language
The critical value is the number that separates the “blue zone” from the middle (± 1.96 in this example).
In a t-test, the t score needs to be in the “dark-blue zone” to be statistically significant.
If α = .05, then 2.5% of the area is in each tail.
1-tailed test
Biostatisticians’ language
The critical value is either + or -, but not both.
e.g. in a t-test, you would have statistical significance (p < .05) if t ≥ 1.645.
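The critical values quoted for the 1-tailed and 2-tailed tests (1.645 and ± 1.96) can be reproduced from the standard normal curve. A small sketch, assuming α = .05; strictly these are z values, and t critical values are somewhat larger for small samples and approach them as the degrees of freedom grow.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, SD 1

# 2-tailed: alpha is split, 2.5% in each tail.
two_tailed_critical = standard_normal.inv_cdf(1 - alpha / 2)
# 1-tailed: the whole 5% sits in one tail.
one_tailed_critical = standard_normal.inv_cdf(1 - alpha)

print(f"2-tailed: +/- {two_tailed_critical:.3f}")
print(f"1-tailed: {one_tailed_critical:.3f}")
```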
Chi-Square (χ2) – as an example
Biostatisticians’ language
Any number squared is non-negative.
Therefore, the area under the curve starts at 0 and goes to infinity (∞).
To be statistically significant, the χ2 value needs to be in the upper 5% of the area (α = .05).
It compares the observed frequency to what we expected.
Published on STAT 100 - Statistical Concepts and Reasoning (QR-code above)
Considerations
Regression or Correlation

                             Correlation       Regression
Cause-effect relationship                      X & Y are important to be set; swapping X & Y in the curve gives different results
In Gaussian distribution     Pearson           SLR, MLR
NPMT                         Spearman’s rho    Logistic Regression
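The point about swapping X and Y can be checked numerically. In this sketch (invented data, hypothetical function names) Pearson's r is identical whichever variable comes first, while the least-squares slope changes when the roles of predictor and outcome are exchanged.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]   # invented exposure values
y = [2, 4, 5, 4, 5]   # invented outcome values

print(f"r(x, y) = {pearson_r(x, y):.4f}, r(y, x) = {pearson_r(y, x):.4f}")
print(f"slope of y on x = {slope(x, y):.2f}, slope of x on y = {slope(y, x):.2f}")
```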
End of Part I
Thank you… QUESTIONS?
@alhefzi