Advanced Biostatistics - Simplified


PREPARED & PRESENTED BY:

DR. M. ALHEFZI

DR. B. ALHEJAILI

DR. N. ALOTAIBI

DR. M. ALGOTHAMI

DR. A. KHALAWI

DR. S. ALGHAMDI

SBCM | R1 | Taif


WHY BIOSTAT ?!

• Collection

• Summarization

• Analysis (inference)

• Interpretation of the results

Abhaya Indrayan (2012). Medical Biostatistics. CRC Press. ISBN 978-1-4398-8414-0.


Philosophy behind Hypothesis

What is a hypothesis?

CHANCE?!

Mill’s Canons (Methods): Agreement, Difference, Concomitant Variation, Residues


Am I right or wrong?!

Is it the truth?!


SIGNIFICANCE

• BIAS?

• CONFOUNDING?

• CHANCE?

• CAUSE / EFFECT?

• GENERALIZABILITY!


My Hypothesis (Ha)

TEST!


In other words …

Hypothesis Test

Hypothesis → Measure of association → Significance → Reject H0 or fail to reject (FTR) H0.


So, what language do we speak in biostat?

MATH?

MEAN, MEDIAN, MODE, RANGE …

AREA UNDER THE CURVE, VARIANCE, SD …

MEDICINE?

EXPOSURE, DISEASE, OUTCOME, EFFECTIVENESS, PREVENTION

RELATIVE RISK, ABSOLUTE RISK


Biostatisticians’ language

MEAN (μ).

MEDIAN.

MODE.

AREA UNDER THE CURVE.

VARIANCE (σ²).

SD (σ).
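A minimal Python sketch of the measures listed above, using only the standard-library `statistics` module. The blood-pressure readings are invented for illustration; `variance` and `stdev` use the sample (n - 1) formulas, and the SD is the square root of the variance.

```python
import statistics

# Hypothetical sample: systolic blood pressure readings (mmHg), invented for illustration
bp = [118, 122, 122, 130, 135, 141, 150]

mean = statistics.mean(bp)          # arithmetic average
median = statistics.median(bp)      # middle value of the sorted data
mode = statistics.mode(bp)          # most frequent value
variance = statistics.variance(bp)  # sample variance (n - 1 denominator)
sd = statistics.stdev(bp)           # sample SD = square root of the sample variance

print(f"mean={mean:.1f} median={median} mode={mode} "
      f"variance={variance:.1f} SD={sd:.1f}")
```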


Biostatisticians’ language

Standard Deviation (SD)
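Assuming the data follow a normal (Gaussian) distribution, each multiple of the SD marks off a fixed share of the area under the curve. This sketch computes those shares with the standard library's `NormalDist`; it illustrates the familiar 68-95-99.7 pattern and is not taken from the original slide.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1
shares = {}

for k in (1, 2, 3):
    # share of the area under the curve within ±k SD of the mean
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"within ±{k} SD: {shares[k]:.2%}")
```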


Photo courtesy of Judy Davidson, DNP, RN


WE MAKE MISTAKES!

TO AVOID THEM, WE NEED TO SET RANGES FOR CHANCE AND FIX OUR CRITICAL LIMITS IN ADVANCE, SO THAT WE END UP WITH A MASTERPIECE OF EVIDENCE!

H0

p-value vs. α level

CI (confidence interval)
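A sketch of how H0, the p-value, the α level and the CI fit together, assuming a one-sample z-test with a known population SD (σ). All numbers are invented. With a two-sided test at α = .05, rejecting H0 and a 95% CI that excludes the H0 value go hand in hand.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary data (invented): sample mean, known population SD, sample size
sample_mean, sigma, n = 134.0, 15.0, 36
mu0 = 128.0      # value of the mean under H0
alpha = 0.05     # set before looking at the data

se = sigma / sqrt(n)                             # standard error of the mean
z = (sample_mean - mu0) / se                     # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value

z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = .05
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)

print(f"z={z:.2f}, p={p_value:.4f}, reject H0: {p_value < alpha}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f}; contains mu0: {ci[0] <= mu0 <= ci[1]}")
```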


Hypothesis Testing


Hypothesis Testing

ASSUMPTIONS.

STEPS.

TESTS.


Hypothesis Testing

ASSUMPTIONS (these differ for each test):

LARGE SAMPLE SIZE.

NORMAL (GAUSSIAN) DISTRIBUTION.

HOMOGENEITY OF VARIANCE.

NO MULTICOLLINEARITY.

KNOWN (μ & σ).

INDEPENDENCE.


Hypothesis Testing

STEPS: the 7 steps of hypothesis testing.

1) RESEARCH QUESTION (RQ)?

2) H0 & H1.

3) TEST & ASSUMPTIONS.

4) α LEVEL, P-VALUE.

5) TEST STATISTIC (DEGREES OF FREEDOM, DF).

6) DECISION.

7) CONCLUSION (YES/NO).
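The seven steps above can be walked through in code. This is a minimal sketch, assuming a two-sample z-test with a known σ in each group (the "KNOWN (μ & σ)" assumption) and invented summary data; when σ is unknown, a t-test with its degrees of freedom would be used instead.

```python
from math import sqrt
from statistics import NormalDist

# 1) RQ: do two groups differ in mean systolic BP? (hypothetical example)
# 2) H0: mu1 = mu2   vs   H1: mu1 != mu2 (two-sided)
# 3) Test: two-sample z-test; ASSUMES independent groups, normal data, known sigma
mean1, sigma1, n1 = 132.0, 12.0, 50   # invented summary data, group 1
mean2, sigma2, n2 = 127.0, 12.0, 50   # invented summary data, group 2

# 4) alpha level, fixed before looking at the data
alpha = 0.05

# 5) Test statistic (a z-test has no degrees of freedom; a t-test would)
se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
z = (mean1 - mean2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 6) Decision
reject_h0 = p_value < alpha

# 7) Conclusion (yes/no)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("Yes, the groups differ" if reject_h0 else "No: fail to reject H0")
```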


Hypothesis Testing

TEST STATISTICS


Dependency Concept

Input = Independent variable (Indep. VA) = Exposure

Output = Dependent variable (Dep. VA) = Outcome (Disease)

Each member of this group is exclusively linked to it.

The output changes whenever the input does.


Data Analysis

• Univariate: summarizing percentages, averages…

• Bivariate: 2 VA.

• Multivariate: controlling confounders.

Ways to control confounding:

• Randomization.

• Restriction.

• Matching.

• Stratification.

Statistical Tests

Parametric Tests

Student’s t-test

Paired Samples t-test

ANOVA

Correlation

Regression

Statistical Tests

Non-Parametric Tests

Chi-Square (χ²)

Wilcoxon

Mann-Whitney (U Test)

Kruskal-Wallis

Logistic Regression


Choosing a Bivariate test

| Indep. VA (input, exposure) ↓ / Dependent VA (outcome, output) → | 2 Cat. | >2 Cat. | Continuous |
| 2 Cat. | χ² | χ² | t-test |
| >2 Cat. | χ² | χ² | ANOVA |
| Continuous | t-test | ANOVA | Correlation, Linear Regression |

Continuous Data

Comparing 2 groups: t-test

Comparing >2 groups: ANOVA

Association between 2 variables: Pearson Correlation

Prediction: Regression

Ordinal Data

Comparing 2 groups: Mann-Whitney (U) test; Wilcoxon (pre-post).

Comparing >2 groups: Kruskal-Wallis

Association between 2 variables: Spearman’s ρ
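Spearman's ρ is Pearson's r computed on ranks, which is why it suits ordinal data. A minimal sketch with hypothetical helper functions `ranks`, `pearson` and `spearman` (names chosen for illustration) and invented scores; tied values receive the average of their ranks.

```python
from statistics import mean


def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    # Spearman's rho = Pearson's r applied to the ranks
    return pearson(ranks(x), ranks(y))


# Hypothetical ordinal data (invented): pain score (0-10) vs. satisfaction grade (1-5)
pain = [2, 4, 4, 6, 8, 9]
satisfaction = [5, 4, 4, 3, 2, 1]
print(f"Spearman's rho = {spearman(pain, satisfaction):.3f}")
```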

Categorical Data

Test of frequency (χ²)

How often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity).

Examples:

- Do negative ads change how people vote?

- Is there a relationship between marital status and health insurance coverage?
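A minimal sketch of a χ² test of independence for a 2x2 table, in the spirit of the marital status / health insurance question. The counts are invented and no continuity (Yates) correction is applied. To stay within the standard library, the p-value uses the fact that a χ² with 1 degree of freedom is the square of a standard normal z.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical 2x2 counts (invented): rows = married / not married,
# columns = insured / uninsured
observed = [[60, 40],
            [45, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total  # count expected under H0
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # = 1 for a 2x2 table

# df = 1 only: chi2 = z**2, so the upper-tail p-value is the two-sided z p-value
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
critical = NormalDist().inv_cdf(0.975) ** 2  # about 3.84 at alpha = .05

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}, critical = {critical:.2f}")
```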


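Choosing the Best Statistical Test (by @alhefzi)

PMT = parametric test; NPMT = non-parametric test.

Comparing the difference between groups

| Variables | PMT | NPMT |
| Cat. VA (2 groups) vs. Cont. VA | Independent samples t-test | Mann-Whitney (U test) |
| Cont. Dep. VA, same group (paired) | Paired samples t-test | Wilcoxon |
| Cat. VA (>2 groups) vs. Cont. VA | One-way ANOVA | Kruskal-Wallis |
| Cat. VA vs. Cat. VA | Chi-Square (χ²) | McNemar |

Association / Strength of Relationship

| Variables | PMT | NPMT |
| Cont. VA vs. Cont. VA | Pearson (r) | Spearman’s ρ |

Prediction

| Outcome | Predictors | Test |
| Cont. VA | Cont. or Cat. VA | SLR (simple linear regression, bivariate) |
| Cont. VA | Cont. + other VAs | MLR (multiple linear regression) |
| Cat. VA | >1 other VAs | Logistic Regression |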
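The chart can also be read as a lookup table. This hypothetical `choose_test` helper encodes only the comparison and association rows of the chart (including its pairing of χ² with McNemar), keyed by (independent VA, dependent VA, paired?), and returns the PMT or NPMT option. The short VA codes are assumptions made for illustration, and any combination the chart does not cover raises an error instead of guessing.

```python
# Hypothetical lookup table encoding the chart.
# Keys: (independent VA, dependent VA, paired?)   Values: (PMT, NPMT)
# VA codes (assumed): "cat2" = 2 groups, "cat>2" = more than 2 groups,
# "cat" = categorical, "cont" = continuous; paired=True = same group measured twice
CHART = {
    ("cat2", "cont", False): ("Independent samples t-test", "Mann-Whitney (U test)"),
    ("cat2", "cont", True): ("Paired samples t-test", "Wilcoxon"),
    ("cat>2", "cont", False): ("One-way ANOVA", "Kruskal-Wallis"),
    ("cat", "cat", False): ("Chi-Square", "McNemar"),
    ("cont", "cont", False): ("Pearson (r)", "Spearman's rho"),
}


def choose_test(indep, dep, paired=False, parametric=True):
    """Return the PMT (or NPMT) listed in the chart for this combination."""
    key = (indep, dep, paired)
    if key not in CHART:
        raise ValueError(f"combination not covered by the chart: {key}")
    pmt, npmt = CHART[key]
    return pmt if parametric else npmt


print(choose_test("cat>2", "cont"))                                 # One-way ANOVA
print(choose_test("cat2", "cont", paired=True, parametric=False))  # Wilcoxon
```

For example, `choose_test("cat2", "cont", parametric=False)` returns `"Mann-Whitney (U test)"`.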

Considerations

Normal Distribution & Sample Size

Large sample size: judge the shape by inspection. Otherwise, use the Kolmogorov-Smirnov test to check normality.

NPMT with a Gaussian distribution and a large sample size: less powerful than a PMT.

NPMT with a Gaussian distribution and a “small” sample size: much less powerful, so p-values tend to be too high.

PMT with a non-Gaussian distribution and a large sample size: acceptable, thanks to the Central Limit Theorem (CLT).

PMT with a non-Gaussian distribution and a “small” sample size: the CLT won’t work, giving an inaccurate p-value.

Considerations

1- or 2-sided p-value

H0 (μ1 = μ2): based on equal population means; any observed discrepancy is due to chance!!

Question: WHICH p-value is larger, 1- or 2-sided, and why?

i.e., when formulating your Ha, take the “larger” p-value into account accordingly!

Go for 1-sided if:

You have formulated a “directional” hypothesis.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.

Go for 2-sided if:

You are unsure of your hypothesis’s direction.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
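The question about which p-value is larger can be checked numerically: for the same test statistic, the 2-sided p-value counts both tails, so it is twice the 1-sided one. This sketch uses a hypothetical z = 1.80 (an invented value, assumed to lie in the direction Ha predicted) to show how the two choices can land on opposite sides of α = .05.

```python
from statistics import NormalDist

z = 1.80  # hypothetical observed test statistic, in the direction Ha predicted

p_one_sided = 1 - NormalDist().cdf(z)             # area in the predicted tail only
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails

print(f"1-sided p = {p_one_sided:.3f}, 2-sided p = {p_two_sided:.3f}")
```

This is also why the direction must be fixed before data collection: choosing 1-sided after seeing the data halves the p-value after the fact.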


Biostatisticians’ language

2-tailed test

The critical value is the number that separates the “blue zone” from the middle (±1.96 in this example).

In a t-test, to be statistically significant, the t score needs to fall in the “dark-blue zone”.

If α = .05, then 2.5% of the area is in each tail.


Biostatisticians’ language

1-tailed test

The critical value is either + or -, but not both.

E.g., in a t-test where Ha predicts a positive difference, you would have statistical significance (p < .05) if t ≥ 1.645.
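The critical values quoted on the 2-tailed and 1-tailed slides come from the standard normal distribution and can be reproduced with the standard library's `NormalDist.inv_cdf`. Note the assumption: these are z values; a t-test with few degrees of freedom has somewhat larger critical values.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

two_tailed = z.inv_cdf(1 - alpha / 2)  # alpha split: 2.5% in each tail
one_tailed = z.inv_cdf(1 - alpha)      # all 5% in the upper tail

print(f"2-tailed: ±{two_tailed:.3f}   1-tailed: +{one_tailed:.3f}")
```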


Biostatisticians’ language

Chi-Square (χ²): as an example

Any squared number is non-negative.

Therefore, the area under the curve starts at 0 and extends to infinity (∞).

To be statistically significant, χ² needs to fall in the upper 5% of the area (α = .05).

It compares the observed frequencies with the expected ones.

Published on STAT 100 - Statistical Concepts and Reasoning.

Considerations

Regression or Correlation

Cause-effect relationship: in regression, which variable is X and which is Y must be set; swapping X & Y on the plot gives different results.

| | Correlation | Regression |
| Gaussian distribution | Pearson | SLR, MLR |
| NPMT | Spearman’s rho | Logistic Regression |
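A sketch of the point about swapping X & Y, with invented age and blood-pressure data and hypothetical helpers `pearson_r` and `ols_slope`. Pearson's r is symmetric in X and Y, while the least-squares slope for predicting Y from X differs from the slope for predicting X from Y (their product equals r²).

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Least-squares slope for predicting y from x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Hypothetical data (invented): age (years) and systolic BP (mmHg)
age = [30, 40, 50, 60, 70]
sbp = [118, 121, 130, 134, 147]

print(f"r(age, sbp) = {pearson_r(age, sbp):.3f}, r(sbp, age) = {pearson_r(sbp, age):.3f}")
print(f"slope sbp~age = {ols_slope(age, sbp):.3f}, slope age~sbp = {ols_slope(sbp, age):.3f}")
```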


End of Part I

Thank you! QUESTIONS?

@alhefzi
