Statistical Analysis for Educational Outcomes Measurement in CME

Jason J. Olivieri, MPH

Director, Outcomes Services

Imedex, LLC

(404) 319-9782

www.assessCME.wordpress.com

Statistical Questions in CME

1. Was there an educational effect?

2. If so, how big was the effect?

3. How does this effect compare with other CME activities?

Was there an educational effect?

Statistical tests of significance

• Assess whether any observed differences (e.g., pre- vs. post-activity) are likely to be due to random chance.

• Observed differences that are unlikely to be due to random chance are considered “statistically significant” and suggest a true difference between groups that could be associated with participation in the educational activity.

• Statistical significance is reported as a “P value”. By convention, a P value ≤ .05 is considered statistically significant. Generally speaking, a P value ≤ .05 means that, if there were truly no difference between the groups, a difference at least as large as the one observed would occur 5% of the time or less by random chance alone.

Statistical tests of significance: Choosing the right test

• Define variable type (typically either categorical or ordinal)

• Is the comparison group data (e.g., pre vs. post) paired or unpaired?

• What is the sample size?

Statistical tests of significance

• Categorical variables (e.g., knowledge “test” question)

– Unpaired data (comparison groups not matched)

• Chi-square test (expected count in every cell ≥ 5)

• Fisher’s exact test (expected count in any cell < 5)

– Paired data (matched comparison groups)

• McNemar’s test

Knowledge Change

Which of the following is NOT a complication associated with bisphosphonate therapy?

[Bar chart: percentage of respondents selecting each answer, pre-activity (n = 111) vs. post-activity (n = 157)]

– Arm pain: 19% pre, 8% post

– Renal dysfunction: 10% pre, 4% post

– Osteonecrosis of the jaw: 6% pre, 2% post

– Atypical fx of the femur: 17% pre, 1% post

– Peripheral neuropathy (correct): 48% pre, 85% post

Increase of 37 percentage points in correct responses

Competency Change (case vignette)

Frontline therapy for a former smoker with symptomatic advanced stage adenocarcinoma of the lung (EGFR+)

[Bar chart: percentage of respondents selecting each option, control (n = 65) vs. participants (n = 65)]

– Erlotinib alone (evidence-based, preferred): 51% control, 75% participants

– Erlotinib + bevacizumab: 12% control, 6% participants

– Carboplatin-paclitaxel & bevacizumab: 19% control, 3% participants

– Carboplatin-pemetrexed & bevacizumab: 12% control, 9% participants

– Full house: erlotinib + carboplatin-paclitaxel & bevacizumab: 6% control, 6% participants

Increase of 24 percentage points in the evidence-based choice


• Ordinal variables (e.g., rating scale)

– Unpaired data (comparison groups not matched)

• Mann-Whitney U test

– Paired data (matched comparison groups)

• Wilcoxon signed-rank test
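The test-selection rules on these slides can be written as a small lookup. This is only a sketch: the function name `choose_test` and its parameters are hypothetical, and the threshold of 5 is read here as the smallest expected cell count, which is an assumption about what “samples” means on the slide.

```python
def choose_test(variable_type: str, paired: bool, min_expected_count: float = 5.0) -> str:
    """Return the name of the significance test suggested by the slides.

    variable_type: "categorical" or "ordinal"
    paired: True if pre/post responses are matched per respondent
    min_expected_count: smallest expected cell count (assumed meaning of the
        "samples >= 5" rule; only used for unpaired categorical data)
    """
    if variable_type == "categorical":
        if paired:
            return "McNemar's test"
        if min_expected_count >= 5:
            return "Chi-square test"
        return "Fisher's exact test"
    if variable_type == "ordinal":
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    raise ValueError(f"unsupported variable type: {variable_type!r}")


# Anonymous pre/post knowledge question with reasonably large groups
print(choose_test("categorical", paired=False, min_expected_count=24.0))
# Matched pre/post rating-scale responses
print(choose_test("ordinal", paired=True))
```

Anonymous surveys cannot be matched by respondent, so they fall on the unpaired branch.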

Knowledge (self-efficacy) Change

Confidence in performing seven clinical tasks in adenotonsillectomy (1 = not at all confident, 7 = extremely confident)

[Bar chart: mean confidence rating per task, pre (n = 57) vs. post (n = 49); mean ratings ranged from 2.9 to 6.5]

– Achieving hemostasis with Coblation for adenotonsillectomy

– Using Coblation for adenoidectomy

– Using Coblation for complete tonsillectomy

– Using Coblation for partial tonsillectomy

– Using a microdebrider for adenoidectomy

– Using a microdebrider for intracapsular tonsillectomy

– Selecting an appropriate surgical technique for adenotonsillectomy for specific indications in patients

Competency Change

Using bevacizumab-based combo therapy for non-squamous NSCLC

[Bar chart: current vs. planned use of bevacizumab-based combo therapy in nonsquamous NSCLC; Current (mean = 2.5, n = 44), Planned (mean = 3.1, n = 39)]

– 1 = Never: 32% current, 14% planned

– Not very often: 9% current, 18% planned

– Sometimes: 41% current, 23% planned

– Often: 14% current, 32% planned

– 5 = Always: 5% current, 14% planned

Overall 26% increase in planned vs. current use

Once you know which statistical test to use, where do you go?

www.VassarStats.net

Statistical test of significance: Categorical variable example

• Participants in a CME activity were administered a paper-based pre- and post-activity survey which contained five knowledge “test” questions based on the CME content

• Survey participant responses were anonymous (i.e., pre/post not matched)

• You want to determine if the proportion of respondents answering the question correctly pre- vs. post-activity is significantly different

Knowledge Change

Which of the following is NOT a complication associated with bisphosphonate therapy?

[Bar chart: 48% correct pre-activity (n = 111) vs. 85% correct post-activity (n = 157)]



Calculating significance for a categorical variable

• Determine # of correct / incorrect answers for each group (e.g., pre- and post-activity)

– Pre-activity: .48*111 = 53 correct / 58 incorrect

– Post-activity: .85*157 = 133 correct / 24 incorrect

• Plug these values into online calculator (www.vassarstats.net)

– Click “frequency data”

– Click “Chi-Square, Cramer’s V and Lambda”
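For readers who want to check the online calculator, the same 2x2 comparison can be sketched in plain Python. The function name `chi_square_2x2` is hypothetical. The post-activity incorrect count is taken as 157 − 133 = 24. The sketch uses the uncorrected Pearson chi-square, so it may differ slightly from a continuity-corrected figure.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (chi2, p_value, cramers_v)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    cramers_v = math.sqrt(chi2 / n)  # equals |phi| for a 2x2 table
    return chi2, p_value, cramers_v


# Pre-activity: 53 correct / 58 incorrect (n = 111)
# Post-activity: 133 correct / 24 incorrect (n = 157)
chi2, p, v = chi_square_2x2(53, 58, 133, 24)
print(f"chi-square = {chi2:.2f}, P = {p:.2g}, Cramer's V = {v:.3f}")
print("significant at .05" if p <= 0.05 else "not significant at .05")
```

Under these assumptions the chi-square comes out at roughly 42, with P far below .0001.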

Knowledge Change

Which of the following is NOT a complication associated with bisphosphonate therapy?

The difference in the proportion of respondents answering this question correctly pre- vs. post-activity (48% vs. 85%) was unlikely to be due to random chance, P < .0001

Statistical test of significance: Ordinal variable example

• Participants in a CME activity were asked via ARS to rate their pre- and post-activity use of five clinical practice strategies tied to the CME content

• Survey participant responses were anonymous (i.e., pre/post not matched)

• You want to determine whether pre- vs. post-activity ratings are significantly different

Competency Change

Using bevacizumab-based combo therapy for non-squamous NSCLC

[Bar chart: current (mean = 2.5, n = 44) vs. planned (mean = 3.1, n = 39) use, rated from 1 = Never to 5 = Always]






Calculating significance for an ordinal variable

• Go to www.vassarstats.net

– Click “ordinal data”

– Click “Mann-Whitney U test”

• Enter in the total number of pre-activity (“sample A”) and post-activity (“sample B”) respondents

• Copy and paste pre- and post-activity responses into “sample A” and “sample B” boxes

• Click “Import data to data cells”

• Click “Calculate from Raw Data”
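The same test can be sketched offline in plain Python. The response counts below are hypothetical. They are reconstructed from the rounded chart percentages under the assumption of 44 respondents in each group, although the chart lists n = 39 for planned use, so they illustrate the procedure rather than reproduce the raw data. The function names are also hypothetical. The sketch uses the normal approximation with a tie correction, which may differ slightly from the online calculator's method.

```python
import math
from collections import Counter


def mann_whitney_u(sample_a: list[int], sample_b: list[int]) -> tuple[float, float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation with a
    tie correction and no continuity correction; returns (u_a, z, p_value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    counts = Counter(sample_a + sample_b)

    # Tied responses share the average (mid) rank of the positions they occupy.
    midrank = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = next_rank + (t - 1) / 2.0
        next_rank += t

    rank_sum_a = sum(midrank[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0

    mean_u = n_a * n_b / 2.0
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    sd_u = math.sqrt(n_a * n_b / 12.0 * ((n + 1) - tie_term))

    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return u_a, z, p_value


def expand(counts_by_rating: dict[int, int]) -> list[int]:
    """Turn {rating: count} into a flat list of individual responses."""
    return [rating for rating, k in counts_by_rating.items() for _ in range(k)]


# Hypothetical counts (1 = never ... 5 = always), assumed 44 respondents per group.
current = expand({1: 14, 2: 4, 3: 18, 4: 6, 5: 2})
planned = expand({1: 6, 2: 8, 3: 10, 4: 14, 5: 6})

u, z, p = mann_whitney_u(current, planned)
print(f"U = {u:.1f}, z = {z:.2f}, two-sided P = {p:.3f}")
```

With these assumed counts the two-sided P value is close to .02.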

Competency Change

Using bevacizumab-based combo therapy for non-squamous NSCLC

The difference between pre- and post-activity ratings was statistically significant, P = .02

That’s it! You’re now able to answer whether there was an educational effect.

Now on to quantifying the magnitude of that effect…

CME activity: Didactic presentation followed by case-based discussion

EOM plan: Pre- vs. post-activity assessment via ARS or paper survey

Data: 8 case vignette or clinical practice strategy questions

Example paper survey question (clinical practice strategy): Using bevacizumab-based combo therapy for non-squamous NSCLC


How do we summarize this data?

And how do we then compare this result to results of other activities?

What is effect size?

• Quantifies the magnitude of effect (maximum expected range: -3 to +3)

• Difference in means (e.g., pre-test vs. post-test) divided by the pooled standard deviation, i.e., the square root of the average of the two group variances (Cohen’s d)

• Enables the comparison of CME effects across activities on a common dimensionless scale

• Calculated from comparison data (e.g., pre/post, post/control) linked directly to CME content

– Knowledge questions

– Case vignettes

– Self-reported frequency of use of key clinical practice strategies
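Written out, the definition above takes the following form. The subscripts are notation introduced here for illustration, and the equal weighting of the two group variances is an assumption.

```latex
d = \frac{\bar{x}_{\text{post}} - \bar{x}_{\text{pre}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{s_{\text{pre}}^{2} + s_{\text{post}}^{2}}{2}}
```

When group sizes differ substantially, a pooled variance weighted by sample size is a common alternative.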

How is it calculated?

Calculating effect size

• Can be done using only MS Excel® and free, online resources

• Approach dependent upon variable type:

– ordinal (e.g., clinical practice strategy)

– categorical (e.g., case vignette)

Example paper survey question (clinical practice strategy): Using bevacizumab-based combo therapy for non-squamous NSCLC

Calculating effect size for an ordinal variable (e.g., clinical practice strategy)

• Calculate average and standard deviation for each group (e.g., pre- and post-activity)

– Pre-activity: mean (SD) = 2.5 (1.3)

– Post-activity: mean (SD) = 3.1 (1.2)

• Plug these values into an online calculator (http://www.uccs.edu/~lbecker/)

Clinical practice strategy (CPS) use rating (1 = never, 5 = always)

CPS #1 pre-test: mean 2.5, standard deviation ±1.3

CPS #1 post-test: mean 3.1, standard deviation ±1.2

Effect size (Cohen’s d): .48*

*d = .2 (small effect), d = .5 (medium effect), d = .8 (large effect)
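The .48 above can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. It assumes the pooled standard deviation is the square root of the average of the two variances, and it reads Cohen's benchmarks as thresholds, which is one possible labeling convention.

```python
import math


def cohens_d(mean_pre: float, sd_pre: float, mean_post: float, sd_post: float) -> float:
    """Cohen's d using the square root of the average of the two variances."""
    pooled_sd = math.sqrt((sd_pre**2 + sd_post**2) / 2.0)
    return (mean_post - mean_pre) / pooled_sd


def cohen_label(d: float) -> str:
    """Label d by the largest of Cohen's benchmarks (.2, .5, .8) it has reached."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


d = cohens_d(mean_pre=2.5, sd_pre=1.3, mean_post=3.1, sd_post=1.2)
print(f"d = {d:.2f} ({cohen_label(d)})")  # prints "d = 0.48 (small)"
```

A value of .48 sits just under the medium benchmark, so a threshold-based label reports it as small even though it is close to medium.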

How is effect size interpreted? Ordinal variable example

How big was the educational effect?

Expressed in standard deviation units: The average score of a post-test respondent was .48 standard deviations above the average score of a pre-test respondent

Effect sizes are proportional (.48 is twice as much effect as .24)

How does the effect compare to other activities?

Cohen (1988): .2 = small, .5 = medium, .8 = large

Wolf (1986): .25 = educationally significant, .50 = clinically significant

Example ARS question (case vignette): Frontline therapy for a former smoker with symptomatic advanced stage adenocarcinoma of the lung (EGFR+)

Calculating effect size for a categorical variable (e.g., case vignette)

• Determine # of correct / incorrect answers for each group (e.g., pre- and post-activity)

– Pre-activity: .51*65 = 33 correct / 32 incorrect

– Post-activity: .75*65 = 49 correct / 16 incorrect

• Plug these values into online calculator (www.vassarstats.net)

– Click “frequency data”

– Click “Chi-Square, Cramer’s V and Lambda”

Calculating effect size for a categorical variable, continued

• Visit www.lyonsmorris.com/ma1/index.cfm

• Select “Correlation coefficient (r) to Effect Size”

• Enter Cramer’s V (.2550)

• Enter total number of pre-activity and post-activity respondents
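The two calculator steps can be sketched together in plain Python. The counts are taken as 33 correct / 32 incorrect and 49 correct / 16 incorrect (65 respondents per group). The conversion d = 2r / sqrt(1 − r²) is a standard r-to-d formula. Whether the online calculator uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math


def cramers_v_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cramer's V for the 2x2 table [[a, b], [c, d]], from the uncorrected
    Pearson chi-square; for a 2x2 table this equals |phi|."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.sqrt(chi2 / n)


def r_to_d(r: float) -> float:
    """Convert a correlation-type coefficient r to a standardized mean difference d."""
    return 2.0 * r / math.sqrt(1.0 - r**2)


# First group: 33 correct / 32 incorrect; second group: 49 correct / 16 incorrect
v = cramers_v_2x2(33, 32, 49, 16)
effect = r_to_d(v)
print(f"Cramer's V = {v:.4f}, effect size d = {effect:.2f}")
```

Putting the result on the same d scale as the ordinal example is what allows the two question types to be compared and later summarized together.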

How effective was your live CME in 2013?

10 live CME activities

8 ARS questions per activity

80 slides

10 effect sizes

One summary effect size for live CME

Overall EIS (by format), June 2010 – Sept 2013

n=9

Cohen J 1988 (ref. 1). Small effect = .2, Medium effect = .5, Large effect = .8

Mansouri & Lockyer 2007 (ref. 2). Knowledge effect size = .6

Drexel et al 2011 (ref. 3). Competence effect size = .85

Casebeer et al 2010 (ref. 4). Knowledge effect size (eLearning) = .82

1. Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

2. Mansouri & Lockyer. J Contin Educ Health Prof 2007;27:6-15.

3. Drexel et al. Int J Chron Obstruct Pulmon Dis 2011;6:297-307.

4. Casebeer et al. BMC Med Educ 2010;10:42.

Questions?

Resource: www.assessCME.wordpress.com