Evaluation workshop

‘How to evaluate your own work’
Dr. Catrin Eames

Centre for Mindfulness Research and Practice

Workshop for the ‘Mindfulness Now’ conference, CMRP, Bangor University

9th-11th April, 2011

• Rationale for conducting your own evaluations
• How to manage the numbers
• Suggested evaluation material
• Scoring, inputting and analysing
• Presentation of results

• To evaluate whether an intervention is worth doing (efficacy trial)
• To evaluate whether it works in real-life settings (effectiveness trial)
• To establish for whom it might work and why (moderators and mediators of outcome)
• To establish what service users think about the intervention (qualitative methodologies)

• To challenge beliefs

What we feel might be true anecdotally is not always supported by large scale research studies

• To explore new applications

Research plays a role in exploring whether existing interventions can be applied to new sub-groups

• NHS Trusts or organisations may require you to conduct evaluation
• It is imperative to be able to demonstrate improvement associated with your groups
• Helps to maintain funding and/or win new funding

The most important decisions to make when considering evaluation are:

1) Design
2) Evaluation measures

• It is very important that you have baseline (before) and outcome measures (after)
• It is also important that you have the same measures on everyone
• Use evidence-based interventions
• Deliver intervention with fidelity

Be aware of ethical issues
• Consent, information sheet, free to withdraw
• Protection of participants, e.g. wellbeing
• Anonymity
• Data storage
• Disposal of data

One group post-test only design
• Audit of satisfaction with a service

One group pre-test/post-test design
• Common design in clinical practice
• Problem of attributing change to treatment (i.e., causality)

Non-equivalent groups post-test only design
• No pre-test data available
• Cannot assume similarity before treatment

Non-equivalent groups pre-test/post-test design
• Often one group is control
• Classic effectiveness study design

Comparison against norms

Published data in other studies

For many of our evaluations we have used:

Demographic Questionnaire
Beck Depression Inventory
Hospital Anxiety and Depression Scale
Five Facet Mindfulness Questionnaire
WHO Well-being Index 5
Warwick-Edinburgh Mental Well-being Scale (WEMWBS)

What do you need to know?

Do you want to compare outcomes of:

• Older versus younger participants?
• Males versus females?
• Different areas?
• Any other ideas?

• Working/unemployed?
• Prior mood disorder history?
• Progression/take up training/employment
• Been on another course/taster?
• Cultural background/family history
• Teacher effect on outcomes
• Gender
• Level of engagement prior to course

• Family income
• Rurality - access issues
• First language in the home / how many languages?
• Any current medication?

• Mean & SD
• Change scores
• Effect sizes
• Excel

• Inputting data
• Analysing data
• Graphs/chart production

• Writing up results

• For evaluation purposes you are most interested in change from start to end.

• Easiest way is to look at MEAN difference
• Add up all baseline scores and divide by number of participants, do same for follow-up.

Standard Deviation: the standard deviation is the most commonly used measure of statistical dispersion. Simply put, it measures how spread out the values in a data set are.
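
As a minimal sketch of the mean and SD calculation described above, the following Python uses only the standard library. The scores are made-up illustrative values, not workshop data. `statistics.stdev` computes the sample standard deviation, which is the same quantity Excel's STDEV returns.

```python
from statistics import mean, stdev

# Hypothetical total scores for five participants (illustrative only).
baseline = [40, 45, 50, 55, 60]
follow_up = [48, 50, 55, 60, 62]

baseline_mean = mean(baseline)      # add up the scores, divide by the number of participants
follow_up_mean = mean(follow_up)    # same calculation for follow-up
mean_change = follow_up_mean - baseline_mean

baseline_sd = stdev(baseline)       # sample SD: how spread out the baseline scores are
follow_up_sd = stdev(follow_up)

print(f"Baseline:  M = {baseline_mean:.2f}, SD = {baseline_sd:.2f}")
print(f"Follow-up: M = {follow_up_mean:.2f}, SD = {follow_up_sd:.2f}")
print(f"Mean change = {mean_change:.2f}")
```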

[Figure: normal distribution curve marking the mean and plus/minus 1, 2 and 3 SD]

Even simple spreadsheet programmes like Excel will allow you to conduct simple statistics

          Intervention (N = 19)                  Control (N = 11)
Gender    16 female, 3 male                      9 female, 2 male
Age       M = 41.89 (SD = 13.05), range 24-64    M = 44.54 (SD = 11.60), range 24-58

• FREE!
• 2-5 minutes to complete
• 14 positively phrased items
• Total score (min 14, max 70)
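
The total score can be sketched as a simple sum. The code below assumes each of the 14 items is scored from 1 to 5, which is what a 14 to 70 range implies, and that the scale in question is the WEMWBS reported in the results table. The function name and the example responses are hypothetical.

```python
def wemwbs_total(item_scores):
    """Sum 14 item scores (each assumed to be 1-5) into a total between 14 and 70."""
    if len(item_scores) != 14:
        raise ValueError("expected exactly 14 item scores")
    if any(score not in (1, 2, 3, 4, 5) for score in item_scores):
        raise ValueError("each item score must be a whole number from 1 to 5")
    return sum(item_scores)

# Hypothetical responses from one participant (illustrative only).
responses = [3, 4, 2, 3, 3, 4, 5, 3, 2, 4, 3, 3, 4, 3]
total = wemwbs_total(responses)
print(f"Total score = {total}")
```

Rejecting incomplete or out-of-range questionnaires at this stage helps keep data-entry slips out of the group means.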

Cohen’s 1988 guidelines: difference between means divided by pooled SD. 0.3 = clinically useful change, 0.5 = medium effect, 0.8 = large effect

            Mean before   SD before   Mean after   SD after   Mean change   Pooled SD   Effect size
WEMWBS-I    14.16         3.66        17.84        3.30       3.68          3.48        1.06
WEMWBS-C    15.18         3.49        14.45        3.50       -0.72         3.50        -0.21
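
The intervention (WEMWBS-I) figures can be reproduced with a few lines of Python. This is a sketch that assumes, as the reported numbers suggest, that the "pooled SD" here is the simple average of the before and after SDs. The variable names are illustrative.

```python
# Values taken from the WEMWBS-I (intervention) figures above.
mean_before, sd_before = 14.16, 3.66
mean_after, sd_after = 17.84, 3.30

mean_change = mean_after - mean_before   # change score
pooled_sd = (sd_before + sd_after) / 2   # simple average of the two SDs (assumption)
effect_size = mean_change / pooled_sd

print(f"Mean change = {mean_change:.2f}")
print(f"Pooled SD   = {pooled_sd:.2f}")
print(f"Effect size = {effect_size:.2f}")
```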

• Change scores are useful
• Easy and simple way of evaluating change
• Change scores should demonstrate improvements in behaviour outcome

Cohen’s d - difference between the means of two groups divided by the pooled SD of both groups

Glass’s delta - difference between the means of two groups divided by the SD of the control group

Note: both of these can be used to look at post-treatment group differences or treatment group pre and post differences

Cohen’s d

(Mean of intervention - Mean of control) / ((SD of intervention + SD of control) / 2)

Glass’s delta

(Mean of intervention - Mean of control) / (SD of control)
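
A minimal Python sketch of the two formulas as written above. The function names are hypothetical, and the "pooled" SD follows the simple average used in this workshop. Many textbooks instead pool by taking the square root of the (sample-size weighted) average of the two variances, which gives a similar answer when the SDs are close. The example compares the post-treatment WEMWBS means and SDs of the two groups reported earlier.

```python
def cohens_d_simple(mean_intervention, mean_control, sd_intervention, sd_control):
    """Difference between means divided by the average of the two SDs."""
    pooled_sd = (sd_intervention + sd_control) / 2
    return (mean_intervention - mean_control) / pooled_sd

def glass_delta(mean_intervention, mean_control, sd_control):
    """Difference between means divided by the control group's SD."""
    return (mean_intervention - mean_control) / sd_control

# Post-treatment WEMWBS values from the results reported earlier.
d = cohens_d_simple(17.84, 14.45, 3.30, 3.50)
delta = glass_delta(17.84, 14.45, 3.50)
print(f"Cohen's d = {d:.2f}, Glass's delta = {delta:.2f}")
```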

There were XX participants in total from two group conditions (Intervention N = XX, Control N = XX). The mean age was XX (range XX-XX, SD = XX).

At baseline the two groups DID/DID NOT differ significantly on XX/YY. The mean at baseline was ?? (SD = XX) and at follow-up was ?? (SD = XX). The mean change score was therefore ?? with an effect size of ??. This study suggests the intervention has impacted on participants’ self-reported well-being. Furthermore, this change is statistically significant as demonstrated by t-test analyses, t(20) = 2.61, p < .05.

• Title
• Abstract (summary)
• Introduction
• Method
    Participants
    Intervention
    Measures
    Design
• Results
• Discussion
• References

• 21-item self-report inventory measuring the severity of characteristic attitudes & symptoms associated with depression
• Each item contains four possible responses which range in severity from 0 (I do not feel sad) to 3 (I am so sad or unhappy that I can’t stand it)
• Score of 10-18 = mild to moderate depression
• Score of 19-29 = moderate to severe depression
• Score of 30-63 = severe depression

Purchase from: http://www.pearson-uk.com
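
As a sketch of how the severity bands above could be applied when scoring, the hypothetical function below maps a total score to a band. It assumes the total is the sum of the 21 items, each scored 0 to 3, giving a possible range of 0 to 63. No label is given above for scores under 10, so none is invented here.

```python
def depression_band(total_score):
    """Map a total score (0-63) to the severity bands listed above."""
    if not 0 <= total_score <= 63:
        raise ValueError("total score must be between 0 and 63")
    if total_score >= 30:
        return "severe depression"
    if total_score >= 19:
        return "moderate to severe depression"
    if total_score >= 10:
        return "mild to moderate depression"
    return "below the listed bands"  # no label is given for scores under 10

# Hypothetical totals (illustrative only).
for score in (7, 12, 24, 35):
    print(score, "->", depression_band(score))
```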

• 39-item self-report questionnaire used to assess five different facets of mindful awareness.

• non-reactivity to inner experience,
• observing,
• acting-with-awareness,
• describing and
• non-judging of experience.

• 5-point Likert scale (1 = never or very rarely true; 5 = very often or always true).

• Rationale for conducting your own evaluations
• How to manage the numbers
• Evaluation measures
• Scoring, inputting and analysing
• Presentation of results

• Mean score: =AVERAGE(data)

• Standard deviation: =STDEV(data)

• T-test: Data Analysis -> t-test

• Independent = Different groups
• Paired = Matched groups
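
To illustrate the difference between the two kinds of t-test, here is a standard-library Python sketch that computes only the t statistic (and, for the paired case, the degrees of freedom), not the p-value, which Excel's Data Analysis tool reports for you. The scores are made up, the function names are hypothetical, and the independent version shown is Welch's form, which does not assume equal variances.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic and degrees of freedom for the same participants measured twice."""
    if len(before) != len(after):
        raise ValueError("paired data need one before and one after score per participant")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1

def welch_t(group_a, group_b):
    """t statistic for two different groups, without assuming equal variances."""
    standard_error = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical scores (illustrative only).
before = [10, 12, 14, 16, 18]
after = [13, 14, 17, 18, 22]

t_paired, df = paired_t(before, after)   # treats the lists as matched pairs
t_independent = welch_t(after, before)   # treats the lists as two separate groups
print(f"Paired: t({df}) = {t_paired:.2f}")
print(f"Independent (Welch): t = {t_independent:.2f}")
```

With these example numbers the paired statistic is much larger than the independent one, because pairing removes the variability between participants. This is why choosing the test that matches your design matters.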
