
Advanced Analytics, LLC

AgreeStat 2015.1 for Excel

Windows/ Mac

User’s Guide

April 6, 2015

Advanced Analytics, LLC

Maryland, USA


Proprietary Notice

Advanced Analytics, LLC owns both this software program and its documentation. Both the program and documentation are copyrighted, with all rights reserved by Advanced Analytics, LLC.

The correct bibliographical reference for this document is as follows:

AgreeStat 2015.1 for Excel Windows/Mac User's Guide, Advanced Analytics, Maryland, USA

Copyright Notice

Copyright © 2010-2014, Advanced Analytics, LLC. All rights reserved.

Advanced Analytics, LLC

PO Box 2696

Gaithersburg, MD 20886-2696

USA

Published by Advanced Analytics, LLC.

No part of this document or the related files may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording, or otherwise) without the prior written permission of the publisher.

Advanced Analytics, LLC
PO Box 2696
Gaithersburg, MD 20886-2696
e-mail: [email protected]

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. However, it is sold with the understanding that the publisher assumes no responsibility for errors, inaccuracies or omissions. The publisher is not engaged in rendering any professional services. A competent professional person should be sought for expert assistance.


CONTENTS

Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Welcome to AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . . . . . 2

Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

What’s New in AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . 5

Help, and Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Chapter 2. Using AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Starting AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Organizing your Data in a Two-Rater Study . . . . . . . . . . 11

Your Data in a 2-Rater Study . . . . . . . . . . . . . . . . . . . . . . . . . 15

Your Data in a Multiple-Rater Study . . . . . . . . . . . . . . . . . .19

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

PART I: CHANCE-CORRECTED AGREEMENT COEFFICIENTS . . . . . . . . . . . . . . . . . . . . 27

Chapter 3. Chance-Corrected Measures of Agreement for Two Raters . . . . . . . . . . . . . . . 29

Contingency Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Two Columns of Raw Scores . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Chapter 4. Chance-Corrected Agreement Measures for Three Raters or More . . . . . . . . . . 53

Analysis of Raw Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Analysis of the Distribution of Raters . . . . . . . . . . . . . . . . . 59

Statistical Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71



Chapter 5. Using Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Pre-Defined Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Custom Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

PART II: INTRACLASS CORRELATION COEFFICIENTS . . . 82

Chapter 6. Intraclass Correlation Coefficients with AgreeStat 2015.1 . . . . . . . . . . . . . . . . 83

Organizing Your Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Describing the Experimental Design. . . . . . . . . . . . . . . . . . . .88

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Chapter 7. Intraclass Correlation Coefficients: The Statistical Calculations . . . . . . . . . . . . 94

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Each Target Rated by a Different Set of Raters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

Each Rater Rates a Different Group of Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Each Rater Rates all Subjects with Rater-Subject Interaction (Subject & Rater Effects are Random) . . . 104

Each Rater Rates all Subjects with Rater-Subject Interaction (Random Subject Factor & Fixed Rater Factor) . . . 109

Each Rater Rates all Subjects without Rater-Subject Interaction (Subject and Rater as Random Effects) . . . 113

Each Rater Rates all Subjects without Rater-Subject Interaction (Random Subject Effect & Fixed Rater Effect) . . . 118

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


CHAPTER 1

INTRODUCTION

1.1. Welcome to AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . . . . 2

1.2. Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3. What’s New in AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . 4

Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Custom Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4

Standard Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4

1.4. Help, and Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Online Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

HTML Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


1.1. WELCOME TO AgreeStat 2015.1

AgreeStat 2015.1 is a self-automated workbook that contains a VBA¹ program, and therefore requires no installation. It is a point-and-click program that is easy to use, and that computes many well-known inter-rater reliability coefficients along with their standard errors. AgreeStat 2015.1 is designed to be compatible with Excel 2010, 2007, and 2003 under the Windows operating system. It has not been tested on Excel versions earlier than Excel 2003.

AgreeStat 2015.1 is the ideal solution for your analysis of inter-rater reliability data. Its familiar Excel user interface gives you point-and-click access to data manipulation, graphing, and statistics. With AgreeStat 2015.1 you can compute chance-corrected agreement coefficients as well as Intraclass Correlation Coefficients (ICC). Chance-corrected coefficients are those proposed in the literature, such as Cohen's Kappa, Krippendorff's Alpha, Gwet's AC1, and more. The intraclass correlation coefficients are more generalized versions of the coefficients described by Shrout and Fleiss (1979) or McGraw and Wong (1996). Unlike the ICCs discussed in these papers, AgreeStat 2015.1 can handle replicates and missing values.

In a typical AgreeStat 2015.1 session, you can accomplish the following tasks:

• Select the workbook and worksheet containing your data, or enter your data in a contingency table displayed on the screen.

• Choose to compute intraclass correlation coefficients or chance-corrected agreement coefficients.

• Specify the ANOVA² model needed to compute the intraclass correlation coefficients.

• Specify the specific agreement coefficients that you want to compute from a list containing many coefficients proposed in the literature.


• Choose to run a weighted or an unweighted analysis for chance-corrected agreement coefficients.

• Select a set of weights from a list of pre-defined weights proposed in the literature for chance-corrected agreement coefficients.

• Supply your own custom weights tailored to your interpretation of the nature of disagreements.

• Select the sub-group analysis, and choose specific domains or sub-groups to analyze. You may even select a few raters that will be included in the analysis.

• Retrieve your results from an "Output" worksheet.

¹VBA = Visual Basic for Applications
²ANOVA = Analysis of Variance

1.2. INSTALLATION

AgreeStat 2015.1 requires no installation. However, it is generally delivered in the form of a zip file, AgreeStat.zip, which contains the following two files:

• AgreeStat2013.1.xlsm, the Excel self-automated workbook containing the VBA program.

• AgreeStatGuide.pdf, the user's guide.


1.3. WHAT'S NEW IN AgreeStat 2015.1?

Sub-group Analysis

AgreeStat 2015.1 now allows you to submit separate analyses for multiple domains in batch mode. This feature can be particularly useful in situations where ratings are collected for distinct groups of subjects that must be analyzed separately.

Multivariate Analysis

AgreeStat 2015.1 now allows you to analyze several characteristics simultaneously. This feature may prove useful if subjects are rated with respect to several characteristics, and agreement among raters must be evaluated separately for each of the characteristics.

Missing Data

The treatment of missing data has been improved in many ways. For example, one may now submit a dataset of ratings without ensuring that each subject was rated or that a rating is associated with each rater. AgreeStat 2015.1 reviews the input data and cleans it before processing it.

Selection of Raters

For studies involving multiple raters, AgreeStat 2015.1 now allows you to select all of them, or deselect those raters you want to exclude from the analysis. Therefore, you can create an extract of your dataset from within AgreeStat 2015.1 without having to do lots of cutting and pasting within Excel.

Saving Results

AgreeStat 2015.1 now gives you an easy way to back up your results after each analysis prior to running subsequent analyses. This can be done by clicking the "Back up this Output" command button located on the top row of the AgreeStat 2015.1 main Output sheet.


Windows Office 2013

Those using the Windows version of AgreeStat 2015.1 with Office 2013 or a more recent edition of Office must have their rating data in separate worksheets of the same workbook as AgreeStat 2015.1. They will not be able to select other workbooks from within AgreeStat 2015.1. This is due to the fact that, beginning with Office 2013, Microsoft decided to introduce the Single Document Interface (SDI) to its Office suites, as opposed to the old Multiple Document Interface (MDI). The SDI feature of Office 2013 makes each Excel workbook a single instance, and other workbooks cannot be part of that same instance. Consequently, an AgreeStat 2015.1 form can no longer manipulate other workbooks with ease.

1.4. WHAT'S NEW IN AGREESTAT 2011.1?

Custom Weights

When the ratings assigned to subjects are ordinal, the standard recommendation is to use a weighted analysis with kappa-like agreement coefficients. Although many pre-defined weights have been proposed in the literature, some researchers have strong opinions about the amount of credit some disagreements should receive toward agreement. AgreeStat 2015.1 allows these researchers to ignore the existing standard weights and to specify their own custom weights.

Standard Errors

AgreeStat 2015.1 computes standard errors and confidence intervals for all unweighted and weighted coefficients. When the number of raters exceeds 2, AgreeStat 2015.1 can produce standard errors based on the sampling of subjects, as well as standard errors based on the sampling of subjects and raters.

Intraclass Correlation

AgreeStat 2015.1 now implements many Intraclass Correlation Coefficients, including those discussed by Shrout & Fleiss (1979). Note that AgreeStat 2015.1 can handle replicates as well as missing values, in addition to computing confidence intervals and p-values.


1.5. HELP, AND SUPPORT

There are a variety of ways to master the AgreeStat 2015.1 program. This section describes the support resources available to AgreeStat 2015.1 users.

User's Guide

This User's Guide is certainly the most detailed resource that AgreeStat 2015.1 users can have. It contains several examples of datasets and step-by-step procedures for analyzing them with AgreeStat 2015.1.

Tips Buttons

AgreeStat 2015.1 also offers a context-sensitive help system that can be accessed by clicking the Tips? button on the different forms, as shown in Figure 1.1.

Figure 1.1. AgreeStat 2015.1 help form


Bibliography

[1] McGraw, K. O., and Wong, S. P. (1996), "Forming Inferences About Some Intraclass Correlation Coefficients," Psychological Methods, 1, 30-46.

[2] Shrout, P. E., and Fleiss, J. L. (1979), "Intraclass Correlations: Uses in Assessing Rater Reliability," Psychological Bulletin, 86, 420-428.


CHAPTER 2

USING AGREESTAT 2015.1

2.1. Starting AgreeStat 2015.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2. Organizing your Data in a Two-Rater Study . . . . . . . . .11

Your Contingency Table for a 2-Rater Study . . . . . . . . . . 12

Creating the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Your Raw Data for a 2-Rater Study . . . . . . . . . . . . . . . . . . . 15

Reading the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3. Your Data in a Multiple-Rater Study . . . . . . . . . . . . . . . . 19

Your Raw Data for 3 Raters or More . . . . . . . . . . . . . . . . . . 20

Reading the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Your Distribution of Raters for 3 Raters or More . . . . . . 23

Reading the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


2.1. STARTING AgreeStat 2015.1

AgreeStat 2015.1 is a self-automated workbook that contains 2 worksheets named START and Output, as shown in Figure 2.1. Because this workbook is locked, you cannot add another worksheet to it. Moreover, AgreeStat 2015.1 will always open on the START worksheet.

To launch AgreeStat 2015.1, click on the "Start AgreeStat" button on the START worksheet (see Figure 2.2). This action will open up the main AgreeStat 2015.1 form shown in Figure 2.3.

Figure 2.1: The 2 Worksheets of AgreeStat 2015.1

Figure 2.2: The START worksheet of AgreeStat 2015.1


2.2. ORGANIZING YOUR DATA IN A TWO-RATER STUDY

After launching AgreeStat 2015.1 you will be presented with the form shown in Figure 2.3. You must decide whether you want to conduct a two-rater analysis or an analysis with three raters or more by selecting the appropriate tab.

The "2-Rater" form offers two possible formats in which you may organize your data: the contingency table format and the raw score format. The format of your choice is specified by selecting the corresponding radio button on the blue strip. An example dataset for each of the 2 formats is shown beneath its associated radio button.

Figure 2.3: The Main Form of AgreeStat 2015.1

The contingency table option only allows you to compute chance-corrected agreement coefficients such as Cohen's Kappa, Scott's Pi, Gwet's AC1, Krippendorff's alpha, and others. The "2 Columns of Scores" option, on the other hand, allows you to compute both the chance-corrected coefficients and the intraclass correlation coefficients. Intraclass correlation coefficients are discussed in detail in chapters 6 and 7. This latter format can also be used to analyze data containing missing ratings (i.e. a dataset with subjects that are rated by a single rater), and to perform sub-group analysis by specifying the group membership of each subject, as shown in the example raw scores (see Figure 2.6). The sub-group analysis performs separate analyses for each individual group and for all groups combined.

YOUR CONTINGENCY TABLE FOR A 2-RATER STUDY

Consider a reliability study that takes place in the emergency room of a hospital, where 2 abstractors must rate 100 pregnant women who are admitted with either abdominal pain or vaginal bleeding. The exercise consists of reviewing their medical records and classifying their pregnancy into one of the following 3 categories:

• Ectopic   • Abnormal Intrauterine (AIU)   • Normal Intrauterine (NIU).

If both abstractors rate each of the 100 women, then there is no missing rating in your data, and you may organize the ratings in a contingency table as shown in Table 2.1. This format is by no means a requirement, since an alternative format using raw scores, discussed later, is available and often recommended.

Table 2.1: Distribution of 100 pregnant women by pregnancy type and by abstractor

                         Abstractor 2
    Abstractor 1    Ectopic    AIU    NIU    Total
    Ectopic              13      0      0       13
    AIU                   0     20      7       27
    NIU                   0      4     56       60
    Total                13     24     63      100

Researchers often organize their ratings in a contingency table before reporting them in publications. Therefore, this table may well be all you have for analysis. AgreeStat 2015.1 provides a template for capturing such a table, and an option to copy an existing table from an Excel worksheet and paste it into the contingency table template of AgreeStat 2015.1.

Describing a Contingency Table in AgreeStat 2015.1

To analyze the contingency table 2.1, proceed as follows:

• From the screen of Figure 2.3, click on the "Execute" button after selecting the "Contingency Table" radio button. Clicking on the Execute button will open the multiple-page form shown in Figure 2.4.

• On the "Input Data" page, specify the number of categories first. To manually capture Table 2.1 data, click on the combo box where the number 2 appears, and select number 3. This combo box gives you the opportunity to select any integer value from 2 to 99. That is, you may analyze data that are classified into up to 99 categories.

• Note that any motion of the cursor over the left gray frame used for category definition automatically updates the category labels associated with "Rater A" and "Rater B". If after defining all categories the table cells are disabled, or the category labels associated with raters A and B are not updated, simply move the cursor over the gray category definition frame to resolve the problem.

• Figure 2.5 shows what the "Input Data" form will look like after all the data is captured.

• When a table cell contains 0, you have the choice to key in the number 0 or to leave it empty.

• Each time the cursor points to a particular table cell, a tool tip is automatically displayed showing the rater A and rater B categories associated with that particular cell. For example, Figure 2.5 shows the tool tip "(NIU, AIU)", which indicates that the cursor was pointing to the cell containing 4.


Figure 2.4: The Initial Look of the “Input Data” Form

Figure 2.5: The Final Look of the “Input Data” Form


YOUR RAW DATA FOR A TWO-RATER STUDY

Consider a simple study where 2 observers named Ben and Gerry are classifying a set of 12 items into one of 5 categories¹ labeled as a, b, c, d, and e. The outcome of this study is shown in Figure 2.6. While Ben classified all 12 items into a category, Gerry only classified 11 of the 12 items, omitting to classify item #10. Let us also assume that the 12 participating subjects belong to two groups named G1 and G2, and that the subgroup analysis may be of interest to the researcher. The subgroup analysis will allow the researcher to conduct inter-rater reliability analysis separately for each group, to see if Ben and Gerry agree more on subjects from one group than on subjects from the other group.

• The reliability data of Figure 2.6 is stored in the same workbook as AgreeStat, in a worksheet named "CAC(Ben & Gerry Scores)." AgreeStat may allow you to store that data in an independent workbook for some versions of Office.

Figure 2.6: Ratings from the 2 observers Ben and Gerry

• To be processed properly, the missing data must be represented in the Excel worksheet in the form of an empty cell (a cell that contains no character, not even the 'space' character). Any character (blank or not) used to code the missing data will be treated by AgreeStat 2015.1 as a legitimate category.

¹Note that only chance-corrected coefficients allow for both numeric and alphabetic scores. Intraclass correlation calculations require the scores to be numeric exclusively (see chapter 6 for a more detailed discussion of intraclass correlation).

NOTE 2.1.

• If there are missing ratings in your data, then it is your raw data in the format shown in Figure 2.6 that you must supply to AgreeStat 2015.1, and not the contingency table.

• The "Items" column in Figure 2.6 is not needed for computing chance-agreement coefficients. This column becomes mandatory only if you want to compute the intraclass correlation coefficients.
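For readers who like to picture the layout outside of Excel, here is a small illustrative sketch in Python/pandas (not part of AgreeStat) of the two-column raw-score layout with one rating left genuinely empty; the values are made up and are not the actual Figure 2.6 data.

import numpy as np
import pandas as pd

# Illustrative only: made-up ratings in the two-column raw-score layout.
# A missing rating is an empty cell (here NaN), never a blank or a dash.
ratings = pd.DataFrame({
    "Group": ["G1", "G1", "G2", "G2"],   # optional, used for sub-group analysis
    "Items": [1, 2, 3, 4],               # mandatory only for intraclass correlation
    "Ben":   ["a", "b", "c", "e"],
    "Gerry": ["a", "b", np.nan, "e"],    # item 3 left unrated by the second rater
})
print(ratings)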

Reading the Dataset

To supply the raw scores of Figure 2.6 to AgreeStat 2015.1, follow the steps shown below. Since the scores in Figure 2.6 are alphabetic, you should naturally select the "Chance-corrected Agreement Coefficients (CAC)" radio button, because intraclass correlation coefficients require the scores to be numeric (see Figure 2.7).

• From the screen of Figure 2.3, click on the "Execute" button after selecting the "2 Columns of Raw Scores" radio button in the right blue strip. This action displays the two-page form of Figure 2.7.

• (Skip this step if using Windows Office 2013 or a more recent edition.) The two-page form of Figure 2.7 contains the "Input Data" page that is discussed in this section, and the "Options (CAC)" page that will be discussed in chapter 3. The "Select Workbook" combo box of the "Input Data" page² contains all workbooks opened within the same Excel instance as AgreeStat 2015.1. Click on this combo box, and select the workbook containing your data.

²Those using the Windows version of AgreeStat 2015.1 under Office 2013 or a more recent edition will not see this combo box, and need to have their data in a worksheet that is part of the same workbook as AgreeStat 2015.1.


• Following the selection of the data workbook, all of its worksheets will be displayed in the "Worksheets" list box, and the content of the selected worksheet will appear in the background. To analyze the Ben-Gerry data, select the "CAC(Ben & Gerry Scores)" worksheet as shown in Figure 2.8. Users of Windows Office 2013 or a newer edition will only see the "Worksheets" list box without the combo box. Note that only the Ben & Gerry data in blue, in columns D and C, are needed for the purpose of evaluating the overall extent of agreement between the two raters.

• Using the 2 refEdit controls on the right side of the Input Data page, select the 2 columns of data, one column at a time.

• To select Ben's ratings, click inside the first refEdit control (labeled as "Select Rater1's Scores"), then move the cursor to Ben's data on the worksheet and highlight all the ratings, including the column name "Ben." You will include the name Ben because the "Label in First Row" checkbox is selected. If you deselect this checkbox, then the name Ben should not be included in your selection. Repeat this process with Gerry's data.

• After specifying all input ratings, the final look of the "Input Data" form is shown in Figure 2.8. If you click on the Execute button, AgreeStat 2015.1 will display the results in the "Output" worksheet. Note that only the "Ben" and "Gerry" columns of data have been selected. There is no need to select any other columns at this stage.


Figure 2.7: The Initial Look of the “Input Data” Form

Subgroup Analysis

To analyze the Ben-Gerry ratings separately by subject group, the first thing to do is to check the "Sub-Group Analysis" checkbox, as shown in Figure 2.9. As a result, a new refEdit control labeled as "Select the Column of Group Names" is displayed. This is where you must specify the group membership of each subject, by selecting the Group column of data as shown in Figure 2.10.

Intraclass Correlation

To calculate intraclass correlation coefficients, you will specify your data to AgreeStat 2015.1 as shown in Figure 2.11. The data set used in this figure is contained in the worksheet named "ICC(2 Raters)", which is part of the AgreeStat workbook. This data set is made up of the "Target" column, which contains subject names, the "Group" column showing the group membership of the targets, and 2 columns of numeric scores. Assuming that the sub-group analysis is desired, (1) the "Group" column will be selected first using the "yellow" combo box, then (2) the "Target" column will be selected, and finally (3) the 2 columns of numeric scores will be selected simultaneously in the "green" combo box. The "Options (ICC)" page is discussed in chapter 6.


Figure 2.8: The Final Look of the "Input Data" Form After Specifying Figure 2.6 Data

Figure 2.9: A Look at the "Input Data" Form for Subgroup Analysis


Figure 2.10: A Look at the "Input Data" Form after Specifying Figure 2.6 Data for Group Analysis

Figure 2.11: The “Input Data” Form for Computing the Intraclass Correlation


2.3. YOUR DATA IN A MULTIPLE-RATER STUDY

After launching AgreeStat 2015.1 you will be presented with the main form shown in Figure 2.3. If your study involves 3 raters or more, then you must select the "3 Raters or More" tab to obtain the form shown in Figure 2.12.

Figure 2.12: The Main Form of AgreeStat 2015.1

The "3 Raters or More" form gives you two possible formats for describing your input data:

• The first format (the "Columns of Raw Scores" format) consists of a series of adjacent columns of raw scores, each column representing the ratings that one observer assigned to the subjects. Table 2.2 shows an example of such a format. The "Group" column is NOT mandatory UNLESS you are going to do subgroup analysis, in which case you must use it so that AgreeStat 2015.1 can identify the group membership of each subject. The "Units" column is NOT mandatory UNLESS you want to compute intraclass correlation coefficients (ICC), in which case it becomes mandatory. For chance-agreement coefficients, the "Units" column identifying subjects is not needed, although you may still want to keep it for personal reference.

• The second format (the "Distribution of Raters by Category" format) consists of a series of adjacent columns, each representing one category, and containing the distribution of raters across subjects. This format can be used only to compute chance-corrected agreement coefficients.

• The scores shown in Table 2.2 are alphabetic. This should lead to the calculation of chance-corrected agreement coefficients only. These scores must be numeric in order to request the intraclass correlation coefficients.

Table 2.2: Ratings assigned to 14 units by 4 raters

Group   Units   Rater1   Rater2   Rater3   Rater4
G2        1       a        a        a
G1        2       b        b        c        b
G2        3       c        c        c        c
G1        4       c        c        c        c
G1        5       b        b        b        b
G1        6       a        b        c        d
G2        7       d        d        d        d
G1        8       a        a        b        a
G2        9       b        b        b        b
G2       10       e        e        e
G1       11       a        a
G1       12       c
G2       13       b        b        c        b
G2       14       b        a        c        b

YOUR RAW DATA FOR THREE RATERS OR MORE


Consider Table 2.2, which summarizes the outcome of an inter-rater reliability experiment where 4 observers must rate 14 units belonging to two groups G1 and G2, by classifying them into one of 5 possible categories labeled as a, b, c, d, and e.

NOTE 2.2.

Table 2.2 contains several missing values. These missing values must be represented with empty cells only, for them to be treated properly.

Since these scores are alphabetic, they must be analyzed with chance-corrected agreement coefficients, which only require you to supply the columns labeled as Rater1, Rater2, Rater3, and Rater4 to AgreeStat 2015.1 (the "Units" column should not be supplied).

Reading the Dataset

• The raw data of Table 2.2 is described in a worksheet named CAC(Group Analysis) (see Example 9) in the same workbook as AgreeStat (c.f. Figure 2.13).

Figure 2.13: The Raw Data for 3 Raters or More


• From the form of Figure 2.12, select the "Column of Raw Data" radio button, and click on the Execute button. This action will open up a dialog form that you should complete as shown in Figure 2.14 for the global analysis (i.e. all groups combined) of Figure 2.13 data. For the subgroup analysis, the dialog form should be completed as shown in Figure 2.15. The selected "Method of Analysis" is "Chance-corrected Agreement Coefficients." This is recommended due to the alphabetic nature of the scores.

For the calculation of intraclass correlation coefficients, refer to chapter 6 for further details. However, the general approach for specifying the ratings is the same for chance-agreement and intraclass correlation coefficients. The main difference, as mentioned earlier, is the use of subject labels (i.e. the "Units" column for the data used in this chapter), which is mandatory when calculating intraclass correlation coefficients (it is how you identify different trials or replicates).

Figure 2.14: Final Look of “Input Data” Form for Global Analysis


Figure 2.15: Final Look of “Input Data” Form for Subgroup Analysis

YOUR DISTRIBUTION OF RATERS FOR THREE RATERS OR MORE

Table 2.2 reliability data could be re-organized in the form of a distribution of raters by subject and category, as shown in Table 2.3 (c.f. worksheet "CAC(Group Analysis)" in the AgreeStat workbook - Example #10). It follows from Table 2.3 that 3 raters classified unit #1 into category a, while unit #2 was classified into category b by 3 raters, and into category c by 1 rater. Organizing your ratings in the form of a distribution of raters by subject and category presents one advantage and 2 disadvantages:

• The advantage is obtained when the number of categories is small and the number of raters is large. The dataset takes less space on paper and in the computer memory, since it will have fewer columns in the latter format than in the former. Both gains in space are minor, unless you want to present your dataset in a journal article, or in a book where space is limited. Nevertheless, this format has been used in the literature numerous times (see Fleiss, 1971, for example).

• One disadvantage is the loss of information. Table 2.3 indicates that only 1 rater classified unit #12 into category c, but does not tell which rater. That makes it impossible to compute some agreement coefficients, such as the generalized Kappa coefficient of Conger (1980). Moreover, some researchers may be interested in the agreement coefficient's variance due to the sampling of raters. This can be computed only if rater-level information is available.

• As a second disadvantage, the distribution of raters by subject and by category makes it impossible to compute intraclass correlation coefficients, even with numeric scores. It only allows for the calculation of chance-corrected agreement coefficients.

Once again, the "Group" column is not mandatory unless you want to do subgroup analysis. The subgroup analysis requires each unit to be assigned to a group, and the different groups can then be analyzed separately. The "Unit" column, on the other hand, is used here for reference only, and is never part of any analysis.

Table 2.3: Distribution of Raters by Subject and Category

                     Categories
Group   Unit     a    b    c    d    e
G2        1      3    0    0    0    0
G1        2      0    3    1    0    0
G2        3      0    0    4    0    0
G1        4      0    0    4    0    0
G1        5      0    4    0    0    0
G1        6      1    1    1    1    0
G2        7      0    0    0    4    0
G1        8      3    1    0    0    0
G2        9      0    4    0    0    0
G2       10      0    0    0    0    3
G1       11      2    0    0    0    0
G1       12      0    0    1    0    0
G2       13      0    3    1    0    0
G2       14      1    2    1    0    0
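The transformation from the raw-score layout of Table 2.2 to the distribution-of-raters layout of Table 2.3 is purely mechanical: for each unit, count how many raters chose each category. The short Python sketch below (an outside illustration, not an AgreeStat feature; the helper name and the two data rows are made up) shows one way to do it.

import pandas as pd

def raters_by_category(raw, rater_cols, categories):
    """For each unit, count how many raters chose each category."""
    counts = (raw[rater_cols]
              .apply(lambda row: row.dropna().value_counts(), axis=1)
              .reindex(columns=categories)
              .fillna(0)
              .astype(int))
    return pd.concat([raw[["Group", "Units"]], counts], axis=1)

# Two made-up rows in the Table 2.2 layout (None = empty cell / missing rating):
raw = pd.DataFrame({
    "Group":  ["G1", "G2"],
    "Units":  [1, 2],
    "Rater1": ["a", "b"],
    "Rater2": ["a", "b"],
    "Rater3": ["b", None],
    "Rater4": ["a", "b"],
})
print(raters_by_category(raw, ["Rater1", "Rater2", "Rater3", "Rater4"],
                         list("abcde")))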


Reading the Dataset

To supply the data of Table 2.3 to AgreeStat 2015.1, you would proceed as follows:

• Create a data table similar to that of Example #10 in the AgreeStat worksheet named CAC(Group Analysis), as shown in Figure 2.16.

• From the form of Figure 2.12, select the "Distribution of Raters" radio button, and click on the Execute button. This action displays a dialog form to be completed as in Figure 2.17.

In Figure 2.16, note that the "Units" column of data is not included in the selection of data.

• If you want to do subgroup analysis, then your data will need to be specified as shown in Figure 2.18. The "Sub-Group Analysis" checkbox on the top right side of this figure must be checked first.

Figure 2.16: The Distribution of Raters by Unit and Category


Figure 2.17: Final Look of "Input Data" Form for the Global Analysis of Figure 2.16 Data

Figure 2.18: Final Look of "Input Data" Form for the Subgroup Analysis of Figure 2.16 Data


Bibliography

[1] Conger, A. J. (1980), "Integration and Generalization of Kappas for Multiple Raters," Psychological Bulletin, 88, 322-328.

[2] Fleiss, J. L. (1971), "Measuring Nominal Scale Agreement Among Many Raters," Psychological Bulletin, 76, 378-382.


PART I

CHANCE-CORRECTED AGREEMENT COEFFICIENTS


CHAPTER 3

CHANCE-CORRECTED MEASURES OF AGREEMENT FOR 2 RATERS

3.1. Contingency Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Nominal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Ordinal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32

Miscellaneous Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .36

Statistical Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2. Two Columns of Raw Scores . . . . . . . . . . . . . . . . . . . . . . . . .42

Nominal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Ordinal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44

Statistical Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


3.1. CONTINGENCY TABLE

Suppose that you want to analyze the ratings shown in Table 2.1 of chapter 2 (see Example #2 in the "CAC(3x3 Table)" worksheet of the AgreeStat workbook). This data represents the distribution of 100 human subjects by abstractor and by pregnancy type. The problem consists of calculating the extent of agreement between the 2 abstractors. Intuitively, this agreement is high if the 2 abstractors consistently categorize the pregnancies into the same type, which will result in large diagonal counts.

There are 2 ways in which you may look at the pregnancy type:

• Nominal Data. The 3 pregnancy types could be seen as nominal data. That is, these are 3 labels associated with 3 unrelated pregnancy conditions, where agreement occurs only when the 2 abstractors categorize a pregnancy into the exact same group type.

• Ordinal Data. The 3 pregnancy types are seen as ordinal data. That is, some disagreements are seen as being more serious than others. The less serious disagreements are then perceived as being partial agreements, as opposed to the full agreement in the case of a perfect match.

NOMINAL DATA

Analysis

If you want to analyze your ratings as nominal data, then after capturing Table 2.1 data as shown in Figure 2.5, click on the "Execute" command button, and look at the results on the "Output" sheet.

The output of this analysis is shown in Figure 3.1. The first output table contains the input data augmented with marginal counts and proportions. The second output table contains the 6 agreement coefficients implemented in AgreeStat 2015.1, along with their standard errors and 95% confidence intervals.


All these agreement coefficients are corrected for chance agreement except the percent agreement, which represents essentially the relative number of subjects that both raters classified into the exact same category. The percent agreement is displayed to help researchers evaluate the overall impact of the chance-agreement correction.
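As a quick arithmetic check, the percent agreement implied by Table 2.1 can be read directly off its diagonal:

$$p_a = \frac{13 + 20 + 56}{100} = 0.89.$$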

Figure 3.1: AgreeStat 2015.1 Output for the Analysis of Table 2.1 Data

Results & Interpretation

The chance-corrected agreement coefficients vary from 0.79624 for Scott's Pi to 0.8493 for Gwet's AC1. Note that these numbers are based on a single sample of 100 subjects. That is, another sample of 100 subjects will surely produce a different set of agreement coefficients. If we decided to use many such independent samples of 100 subjects, then we should be able to produce a series of values for each coefficient. The Standard Error, labeled as StdErr in the table, is a statistical measure that indicates how far we would expect any given coefficient value to stray from the overall average.

Ideally, you will want the standard error to represent 15% of the value of the agreement coefficient or less. Otherwise, the agreement coefficient is considered to be imprecise.

Confidence Interval

The 95% confidence interval is intimately related to the notion of standard error. It represents a range of values (from the lower confidence bound to the upper confidence bound) supposed to contain the "true" agreement coefficient with 95% certainty. For example, it follows from Figure 3.1 that our estimation of the extent of agreement between raters A and B based on Cohen's Kappa is 0.796409. However, due to the variation induced by the sampling of subjects, the true value of kappa may be as low as 0.68 and as high as 0.913. You may change the confidence level from 95% to something else. A more detailed discussion of the notion of confidence interval may be found in Gwet (2011a) or any other good introductory statistics book.

ORDINAL DATA

Analysis

You may feel that not all disagreements should be considered equal. Perhaps 2 raters classifying the same pregnancy as Ectopic and IUP(A) will produce a more serious disagreement than if they had classified that pregnancy as IUP(A) and IUP(N). In such a case, you will want to treat the pregnancy type as ordinal data, and conduct a weighted analysis. You may want to see Gwet (2010) for a more detailed discussion of the notion of weighting.


After capturing your Table 2.1 data as shown in Figure 2.5, click on the "Options" tab to display the "Options" form of Figure 3.2. You may choose one of the predefined weights, or define your own custom weights. Note that the "Predefined Weights" and "Custom Weights" radio buttons will remain disabled until you select the "Weighted Analysis" checkbox.

Quadratic Weights

For illustration purposes, select the "Quadratic Weights" option as shown in Figure 3.2, then click the Execute button. In addition to the unweighted analysis results of Figure 3.1, you will also get the weighted analysis of Figure 3.3.

Figure 3.2: Analysis Options for the Contingency Table

The first table in Figure 3.3 contains the quadratic weights used in the analysis. To calculate these weights, AgreeStat 2015.1 numbers the three categories sequentially with the integer values 1, 2, 3, following the order in which these categories were listed in Figure 2.5 of chapter 2. The quadratic weight associated with the pair of categories (1, 2), for example (i.e. "Ectopic" and "IUP(A)"), is calculated as follows:

$$\mathrm{WEIGHT}(1,2) = 1 - \frac{(1-2)^2}{(1-3)^2} = 1 - 1/4 = 0.75,$$

where the numbers 1 and 3 in the denominator $(1-3)^2$ represent the smallest and the largest category values. The value of 4 associated with the denominator remains the same for the other categories.

According to this weighting scheme, an Ectopic-IUP(A) disagreement receives a 0.75 agreement credit, indicating that it is less serious than an Ectopic-IUP(N) disagreement, which receives a 0 agreement credit and is considered a total disagreement. This may or may not be how you want to analyze your data. If the Ectopic-IUP(A) disagreement is instead more serious than Ectopic-IUP(N), then you could resolve the problem by ordering the categories in Figure 2.5 as IUP(A), IUP(N), Ectopic.

You could explore the other predefined weights in chapter 5 to see if they may better fit your application. But what if you want to assign a 0.5 agreement credit to the IUP(A)-IUP(N) disagreement, and 0 credit to any other form of disagreement? There is no predefined set of weights that will do precisely that. In such a case, AgreeStat 2015.1 allows you to supply your own custom weights.

Figure 3.3: Weighted Analysis of Table 2.1 Data


Custom Weights

Suppose you want to analyze Table 2.1 data with your own set of weights, where the IUP(A)-IUP(N) (and a fortiori IUP(N)-IUP(A)) disagreement receives a weight of 0.5, and any other disagreement a weight of 0. Since no predefined set of weights allows that, you can define your own weights with AgreeStat 2015.1.

From the "Options" form of Figure 3.2, select the "Specify Custom Weights" radio button, after selecting the "Weighted Analysis Requested?" checkbox. This action will add a new "Enter Custom Weights" page to the form, as shown in Figure 3.4.

You will notice on the custom weight table that the diagonal cells are frozen. This is because these diagonal cells represent full agreements and must receive a weight of 1, which cannot be changed. You can now assign a weight of 0.5 or 0 to the appropriate cells, as seen in Figure 3.4. The resulting weighted analysis is shown in Figure 3.5.

It appears that this new set of custom weights has increased the magnitude of the weighted agreement coefficients dramatically. It is essential for each set of custom weights to be justified. Otherwise, the agreement coefficients may be artificially inflated.


Figure 3.4: Defining Custom Weights for Table 2.1 Data

Figure 3.5: Weighted Analysis of Table 2.1 Data Based on Custom Weights

MISCELLANEOUS OPTIONS

Confidence Level

As you may see in Figure 3.2, the "Options" page allows you to create confidence intervals with a confidence level other than 95%, using the "Confidence Level" text box. To specify a 90% confidence level, you should enter 90, and not 0.90 or 90%. If you supply a number that is smaller than 50, or that equals or exceeds 100, then AgreeStat 2015.1 will automatically replace your input with 95. Typical values for the confidence level are 85%, 90%, 95% or 99%.


Sampling Fraction

The sampling fraction represents the proportion of the whole population of subjects that is represented in the subject sample. When specified, it will reduce the magnitude of the standard error. In many reliability studies the subject population size may be unknown, in which case the sampling fraction should be set to 0. Setting the sampling fraction to 0 means you are assuming the subject sample size to be negligible compared to the population size.

Agreement Coefficients

The "Agreement Coefficients" frame gives you the opportunity to select which agreement coefficients you would like to produce. Eliminating the coefficients you are not interested in may speed up the processing time slightly.

STATISTICAL CALCULATIONS

Weights

Before calculating the weights, AgreeStat 2015.1 always determines first whether the category names are of alphabetic or numeric type. Suppose that a reliability study uses 10 numeric categories $x_1, x_2, x_3, \cdots, x_{10}$, and that $x_1$ is the smallest number and $x_{10}$ the largest. The quadratic weights are calculated for any 2 numbers $x_k$ and $x_l$ as follows:

$$w_{kl} = \begin{cases} 1 - \dfrac{(x_k - x_l)^2}{(x_{10} - x_1)^2}, & \text{if } k \neq l,\\[1.5ex] 1, & \text{if } k = l. \end{cases} \tag{3.1.1}$$

This use of numeric categories allows you to analyze interval as well as ratio data with the agreement coefficients implemented in AgreeStat 2015.1.


If, on the other hand, the reliability study uses 10 alphabetic categories, then these categories will be numbered using the first 10 integer values $1, 2, 3, \cdots, 10$, in the order they are supplied to AgreeStat 2015.1 (see Figure 2.5 of chapter 2). The quadratic weights are calculated for 2 categories k and l as follows:

$$w_{kl} = \begin{cases} 1 - \dfrac{(k - l)^2}{(10 - 1)^2}, & \text{if } k \neq l,\\[1.5ex] 1, & \text{if } k = l. \end{cases} \tag{3.1.2}$$
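As an illustration of equations (3.1.1) and (3.1.2), here is a minimal Python sketch (not an AgreeStat routine) that builds the quadratic weight matrix from a list of category values, either the numeric scores themselves or simply 1, ..., q for alphabetic categories.

import numpy as np

def quadratic_weights(values):
    """Quadratic weight matrix per equations (3.1.1)/(3.1.2)."""
    x = np.asarray(values, dtype=float)
    span_sq = (x.max() - x.min()) ** 2                  # (x_q - x_1)^2
    w = 1.0 - np.subtract.outer(x, x) ** 2 / span_sq    # off-diagonal weights
    np.fill_diagonal(w, 1.0)                            # k = l: full agreement
    return w

# Three alphabetic categories numbered 1, 2, 3 (e.g. Ectopic, IUP(A), IUP(N)):
print(quadratic_weights([1, 2, 3]))
# [[1.   0.75 0.  ]
#  [0.75 1.   0.75]
#  [0.   0.75 1.  ]]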

Agreement Coefficients

The procedures for calculating the different agreement coefficients and their variances based on a contingency table are described in the next few pages. Only the weighted coefficients and their variances will be described. The unweighted coefficient is a special case of the weighted one, and can be obtained by using the identity set of weights (for identity weights, diagonal elements are 1 and off-diagonal elements are 0). The symbol $w_{kl}$ will denote the weight associated with categories k and l, $p_{k+}$ is the kth row marginal proportion, and $p_{+l}$ the lth column marginal proportion (see Table 3.1). Moreover, $\bar{p}_{+k}$ and $\bar{p}_{l+}$ are two weighted proportions calculated as follows:

$$\bar{p}_{+k} = \sum_{l=1}^{q} w_{kl}\,p_{+l}, \qquad\text{and}\qquad \bar{p}_{l+} = \sum_{k=1}^{q} w_{kl}\,p_{k+}.$$

Table 3.1: Distribution of Subjects by Rater and by Category

                          Rater B
Rater A       1      2    ...     l    ...     q     Total
1           p11    p12    ...   p1l    ...   p1q     p1+
...                        ...                        ...
k           pk1    pk2    ...   pkl    ...   pkq     pk+
...                        ...                        ...
q           pq1    pq2    ...   pql    ...   pqq     pq+
Total       p+1    p+2    ...   p+l    ...   p+q       1


Cohen's Kappa

Cohen's Kappa coefficient (Cohen, 1960, 1968) is defined as follows:

$$\kappa = \frac{p_a - p_e}{1 - p_e},$$

where $p_a = \sum_{k,l} w_{kl}\,p_{kl}$ and $p_e = \sum_{k,l} w_{kl}\,p_{k+}\,p_{+l}$. The variance of the weighted kappa is given by,

$$v(\kappa) = \frac{1-f}{n(1-p_e)^2}\left\{\sum_{k,l=1}^{q} p_{kl}\Bigl[w_{kl} - (1-\kappa)\bigl(\bar{p}_{+k} + \bar{p}_{l+}\bigr)\Bigr]^2 - \Bigl[p_a - 2(1-\kappa)p_e\Bigr]^2\right\}.$$

This variance is equivalent to that published by Fleiss, Cohen, and Everitt (1969).
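The formulas above translate directly into a few lines of code. The Python sketch below is an outside-of-AgreeStat illustration (not the program's own VBA code) that computes the weighted kappa and its standard error from a q x q table of joint proportions; with identity weights and the Table 2.1 proportions it returns a kappa of about 0.796 and a standard error of about 0.059, consistent with the value and confidence interval quoted earlier in the Results & Interpretation paragraph.

import numpy as np

def weighted_kappa(p, w, n, f=0.0):
    """Weighted kappa and its standard error from joint proportions p (q x q)."""
    p, w = np.asarray(p, float), np.asarray(w, float)
    p_row, p_col = p.sum(axis=1), p.sum(axis=0)          # p_{k+}, p_{+l}
    pa = (w * p).sum()
    pe = (w * np.outer(p_row, p_col)).sum()
    kappa = (pa - pe) / (1.0 - pe)
    pbar_col = w @ p_col                                  # weighted column marginals
    pbar_row = p_row @ w                                  # weighted row marginals
    bracket = w - (1.0 - kappa) * (pbar_col[:, None] + pbar_row[None, :])
    var = (1.0 - f) / (n * (1.0 - pe) ** 2) * (
        (p * bracket ** 2).sum() - (pa - 2.0 * (1.0 - kappa) * pe) ** 2)
    return kappa, np.sqrt(var)

# Unweighted kappa for Table 2.1 (identity weights), n = 100 subjects:
p21 = np.array([[13, 0, 0], [0, 20, 7], [0, 4, 56]]) / 100.0
print(weighted_kappa(p21, np.eye(3), n=100))   # roughly (0.796, 0.059)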

Gwet's AC1/AC2

Gwet's AC2 coefficient, which is the weighted version of AC1 (Gwet, K. L., 2010), is defined as follows:

$$\mathrm{AC}_2 = \frac{p_a - p_e}{1 - p_e},$$

where $p_a = \sum_{k,l} w_{kl}\,p_{kl}$, $p_e = \dfrac{T_w}{q(q-1)}\sum_{k=1}^{q} \pi_k(1-\pi_k)$, $\pi_k = (p_{k+} + p_{+k})/2$, and $T_w$ is the sum of all weights $w_{kl}$ associated with all categories. The variance of AC2 is given by,

$$v(\mathrm{AC}_2) = \frac{1-f}{n(1-p_e)^2}\left\{\sum_{k,l=1}^{q} p_{kl}\left[w_{kl} - \frac{2(1-\mathrm{AC}_2)\,T_w}{q(q-1)}\left(1 - \frac{\pi_k + \pi_l}{2}\right)\right]^2 - \Bigl[p_a - 2(1-\mathrm{AC}_2)p_e\Bigr]^2\right\},$$

where f is the subject sampling fraction.


Scott's π

Scott's π coefficient (Scott, 1955) is defined as follows:

$$\pi = \frac{p_a - p_e}{1 - p_e},$$

where $p_a = \sum_{k,l} w_{kl}\,p_{kl}$ and $p_e = \sum_{k,l} w_{kl}\,\pi_k\pi_l$. The variance of π is given by,

$$v(\pi) = \frac{1-f}{n(1-p_e)^2}\left\{\sum_{k,l=1}^{q} p_{kl}\Bigl[w_{kl} - (1-\pi)\bigl(\bar{p}_k + \bar{p}_l\bigr)\Bigr]^2 - \Bigl[p_a - 2(1-\pi)p_e\Bigr]^2\right\}.$$


Krippendorff’sAlpha

Krippendorff’s alpha coefficienta is defined as follows:

α =p′a − pe1− pe

, where p′a = (1− εn)pa + εn,

εn = 1/2n, pa =∑k,l

wklpkl, and pe =

q∑k=1

q∑l=1

wklπkπl, and n

the subject sample size. The variance of α is given by,

v(α)

=1− f

n(1− pe)2

{q∑k,l

pkl

[(1− εn)wkl − (1− α)(pk + pl)

]2−

[(1− εn)pa − 2(1− α)pe

]2},

Note: The coefficient was proposed by Krippendorff (1980); the formulation above is the one proposed by Gwet (2011b).
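A corresponding numpy sketch for the alpha coefficient (point estimate and variance); the treatment of the weighted marginal terms follows the reconstruction above and the names are illustrative only.

```python
import numpy as np

def kripp_alpha_table(p, w, n, f=0.0):
    """Weighted Krippendorff alpha and its variance from a q x q matrix p
    of joint proportions with no missing ratings."""
    eps = 1.0 / (2.0 * n)
    pa = np.sum(w * p)
    pa_prime = (1.0 - eps) * pa + eps
    pik = (p.sum(axis=1) + p.sum(axis=0)) / 2.0        # pi_k
    pe = float(pik @ w @ pik)                          # sum_kl w_kl pi_k pi_l
    alpha = (pa_prime - pe) / (1.0 - pe)
    pi_row = w @ pik                                   # weighted pi_{k+}
    pi_col = w.T @ pik                                 # weighted pi_{+l}
    cell = ((1.0 - eps) * w - (1.0 - alpha) * (pi_row[:, None] + pi_col[None, :])) ** 2
    var = (1.0 - f) / (n * (1.0 - pe) ** 2) * (
        np.sum(p * cell) - ((1.0 - eps) * pa - 2.0 * (1.0 - alpha) * pe) ** 2)
    return alpha, var
```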

Brennan-Prediger

The Brennan-Prediger coefficient (see note below) is defined as follows:

$$BP = \frac{p_a - p_e}{1 - p_e},$$

where $p_a = \sum_{k,l} w_{kl}\,p_{kl}$ and $p_e = \dfrac{1}{q^2}\sum_{k,l} w_{kl}$. The variance of BP is given by

$$v(BP) = \frac{1-f}{n(1-p_e)^2}\bigg(\sum_{k,l}^{q} w_{kl}^2\,p_{kl} - p_a^2\bigg).$$

Note: The unweighted version was proposed by Brennan & Prediger (1981); Gwet (2010) proposed the weighted version.
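Because its chance-agreement term depends only on the weights and the number of categories, the Brennan-Prediger coefficient is the shortest to sketch (hypothetical helper, same conventions as above):

```python
import numpy as np

def brennan_prediger(p, w, n, f=0.0):
    """Weighted Brennan-Prediger coefficient and its variance from a q x q
    matrix p of joint proportions, following the formulas above."""
    q = p.shape[0]
    pa = np.sum(w * p)
    pe = w.sum() / q ** 2                  # average weight over all q^2 cells
    bp = (pa - pe) / (1.0 - pe)
    var = (1.0 - f) / (n * (1.0 - pe) ** 2) * (np.sum(w ** 2 * p) - pa ** 2)
    return bp, var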


Percent Agreement

The percent agreement is defined as follows:

$$p_a = \sum_{k,l} w_{kl}\,p_{kl}.$$

Its variance is given by

$$v(p_a) = \frac{1-f}{n}\bigg(\sum_{k,l}^{q} w_{kl}^2\,p_{kl} - p_a^2\bigg).$$

Confidence Intervals

The lower bound (LB) and upper bound (UB) of the confidence interval associated with an agreement coefficient $\hat{\theta}$ are calculated as

$$LB = \max\big(0,\ \hat{\theta} - z_{\alpha/2}\,StdErr(\hat{\theta})\big), \qquad UB = \min\big(1,\ \hat{\theta} + z_{\alpha/2}\,StdErr(\hat{\theta})\big),$$

where $1-\alpha$ is the confidence level (e.g. $1-\alpha = 0.95$) and $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
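A minimal sketch of the truncated interval, using the normal percentile as in the formula above (AgreeStat's own output may use a different percentile, so the numbers need not match its tables exactly):

```python
from statistics import NormalDist

def confidence_interval(coeff, stderr, conf_level=0.95):
    """Truncated confidence interval [LB, UB] for an agreement coefficient."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)   # z_{alpha/2}
    return max(0.0, coeff - z * stderr), min(1.0, coeff + z * stderr)

print(confidence_interval(0.657, 0.162))   # hypothetical coefficient and standard error
```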


3.2. TWO COLUMNS OF RAW SCORES

Suppose that you want to analyze the ratings shown in Figure 2.6 of chapter 2 (this is the Example 3 data found in the worksheet "CAC(Ben & Gerry Scores)", which is part of the AgreeStat workbook). The problem is to calculate the extent of agreement between Ben and Gerry with respect to the classification of 12 items into 5 categories a, b, c, d, and e. Since this data contains a missing rating (Gerry did not rate item #10), it must be supplied to AgreeStat 2015.1 in the form of 2 columns of raw ratings as shown in Figure 2.6, and not as a contingency table. A contingency table is appropriate only when there is no missing data.

The raw scores to be supplied to AgreeStat 2015.1 must be stored in a worksheet. Users of AgreeStat for Windows are advised to create that worksheet in the same workbook as AgreeStat if using Office 2013 or a newer version. Users of Windows/Office versions prior to 2013 may create a separate workbook if needed. It is helpful to know whether you want to analyze your ratings as nominal data, or as ordinal, interval or ratio data.

• Nominal Data. The 5 categories are assumed to have no order structure and are used mainly as labels. All disagreements are considered total disagreements, and the concept of partial agreement does not apply. Only the unweighted analysis is performed on such data.

• Ordinal Data. You may also want to treat the Figure 2.6 ratings as ordinal data, which requires a weighted analysis.

• Interval/Ratio Data. The categories a, b, c, d, and e can be replaced with interval or ratio data (i.e. numbers such as 23.78, 21.33, 11.26, ...) and analyzed using appropriate weights.

NOMINAL DATA


Analysis. If you want to analyze your ratings as nominal data, then after capturing your Figure 2.6 data as shown in Figure 2.8, click on the "Execute" command button and look at the results in the "Output" sheet.

The output of this analysis is shown in Figure 3.6. The first output table is a contingency table based on the input data, showing the distribution of the 12 items by rater (Ben and Gerry) and by category. You will notice the "Missing" row and "Missing" column, as well as the "Total" row and "Total" column. The "Missing" row is filled with zeroes, an indication that observer Ben did not generate any missing value. The "Missing" column, on the other hand, shows a "1" in row d: Gerry did not rate one item that Ben classified into category d. The numbers in brackets represent marginal proportions; [25%], for example, means that Ben classified 25% of all items into category a.

AgreeStatPro 2013.3
MODULE: Two-Rater Chance-Corrected Agreement Coefficients (Time: 12:14:38 PM. Date: Thursday, March 26, 2015)

DISTRIBUTION OF SUBJECTS BY RATER AND CATEGORY
                              Gerry
Ben        a        b        c        d        e        Missing   Total
a          1        1        0        1        0        0         3   [25%]
b          0        2        0        0        0        0         2   [16.7%]
c          0        0        3        0        0        0         3   [25%]
d          0        1        0        1        0        1         3   [25%]
e          0        0        0        0        1        0         1   [8.3%]
Missing    0        0        0        0        0        0         0   [0%]
Total      1        4        3        2        1        1         12  [100%]
          [8.3%]   [33.3%]  [25%]    [16.7%]  [8.3%]   [8.3%]         [100%]

INTER-RATER RELIABILITY COEFFICIENTS AND ASSOCIATED PRECISION MEASURES
Unweighted Agreement Coefficients
METHOD                  Coeff.    StdErr     p-Value     95% C.I.
Cohen's Kappa           0.65714   0.162291   1.918E-03   0.3 to 1
Gwet's AC1              0.66141   0.166876   2.221E-03   0.294 to 1
Scott's Pi              0.64951   0.172542   3.131E-03   0.27 to 1
Krippendorff's Alpha    0.66489   0.163887   2.298E-03   0.3 to 1
Brennan-Prediger        0.65909   0.167852   2.366E-03   0.29 to 1
Percent Agreement       0.72727   0.134282   2.114E-04   0.432 to 1

Figure 3.6: AgreeStat 2015.1 output for the analysis of Figure 2.6 data

The second output table contains the 6 agreement coefficients implemented in AgreeStat 2015.1 along with their standard errors and 95% confidence intervals. Although these coefficients are moderately high, their standard errors are unduly large, representing more than 25% of the coefficient values. This is mainly due to the number of subjects included in the sample (12) being too small. The large standard errors lead to wide 95% confidence intervals, such as that of Scott's Pi, which ranges from 0.27 to 1. Such a result is not very useful, because it provides little information about the location of the "true" extent of agreement between Ben and Gerry.
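The contingency table of Figure 3.6, including the "Missing" column and the bracketed marginal percentages, can be reproduced from two columns of raw ratings with a short pandas sketch. The ratings below are made up for illustration and are not the actual Figure 2.6 data; categories that never occur (such as an empty Missing row) simply do not appear in the crosstab.

```python
import pandas as pd

# Illustrative ratings only (None marks a missing rating)
ben   = ["a", "b", "c", "d", "a", "c", "b", "d", "e", "d", "a", "c"]
gerry = ["a", "b", "c", "d", "b", "c", "b", "b", "e", None, "d", "c"]

df = pd.DataFrame({"Ben": ben, "Gerry": gerry}).fillna("Missing")
table = pd.crosstab(df["Ben"], df["Gerry"], margins=True, margins_name="Total")
print(table)                                   # counts, including the Missing column
print((table / len(df) * 100).round(1))        # the bracketed marginal percentages
```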

Subgroup Analysis

Let us assume that you want to perform a subgroup analysis of the Ben-Gerry data. Each of the 12 subjects in that dataset was assigned to one of two groups, G1 and G2. This data must be described as shown in Figure 3.7. You will notice that the "Units" column was not needed to fully describe the Ben-Gerry reliability data to AgreeStat. Click on the "Execute" command button and look at the results in the "Output" sheet.

Figure 3.7: Describing the Ben-Gerry data by group in AgreeStat 2015.1

Figures 3.8, 3.9, and 3.10 describe the subgroup analysis output that AgreeStat produces. Figures 3.8 and 3.9 show separate analyses of the group G1 and group G2 subjects, while Figure 3.10 shows the analysis of all subjects (both groups combined).

AgreeStatPro 2013.3
MODULE: Two-Rater Chance-Corrected Agreement Coefficients (Time: 4:29:50 PM. Date: Thursday, March 26, 2015)

Group: G1

DISTRIBUTION OF SUBJECTS BY RATER AND CATEGORY
                              Gerry
Ben        a       b        c        d        e      Missing   Total
a          0       1        0        0        0      0         1   [14.3%]
b          0       2        0        0        0      0         2   [28.6%]
c          0       0        1        0        0      0         1   [14.3%]
d          0       1        0        1        0      1         3   [42.9%]
e          0       0        0        0        0      0         0   [0%]
Missing    0       0        0        0        0      0         0   [0%]
Total      0       4        1        1        0      1         7   [100%]
          [0%]   [57.1%]  [14.3%]  [14.3%]   [0%]  [14.3%]         [100%]

INTER-RATER RELIABILITY COEFFICIENTS AND ASSOCIATED PRECISION MEASURES
Unweighted Agreement Coefficients
METHOD                  Coeff.    StdErr     p-Value     95% C.I.
Cohen's Kappa           0.53333   0.239691   6.772E-02   -0.053 to 1
Gwet's AC1              0.60132   0.230817   4.038E-02   0.037 to 1
Scott's Pi              0.49157   0.298518   1.507E-01   -0.239 to 1
Krippendorff's Alpha    0.53191   0.274257   1.101E-01   -0.173 to 1
Brennan-Prediger        0.58333   0.240563   5.152E-02   -0.005 to 1
Percent Agreement       0.66667   0.19245    1.340E-02   0.196 to 1

Figure 3.8: AgreeStat 2015.1 output for the subgroup analysis of the Figure 2.6 data (Group G1)

AgreeStatPro 2013.3
MODULE: Two-Rater Chance-Corrected Agreement Coefficients (Time: 4:29:50 PM. Date: Thursday, March 26, 2015)

Group: G2

DISTRIBUTION OF SUBJECTS BY RATER AND CATEGORY
                              Gerry
Ben        a       b      c       d       e      Missing   Total
a          1       0      0       1       0      0         2   [40%]
b          0       0      0       0       0      0         0   [0%]
c          0       0      2       0       0      0         2   [40%]
d          0       0      0       0       0      0         0   [0%]
e          0       0      0       0       1      0         1   [20%]
Missing    0       0      0       0       0      0         0   [0%]
Total      1       0      2       1       1      0         5   [100%]
          [20%]   [0%]  [40%]   [20%]   [20%]   [0%]            [100%]

INTER-RATER RELIABILITY COEFFICIENTS AND ASSOCIATED PRECISION MEASURES
Unweighted Agreement Coefficients
METHOD                  Coeff.    StdErr     p-Value     95% C.I.
Cohen's Kappa           0.72222   0.218897   2.995E-02   0.114 to 1
Gwet's AC1              0.75758   0.22017    2.627E-02   0.146 to 1
Scott's Pi              0.71429   0.238837   4.031E-02   0.051 to 1
Krippendorff's Alpha    0.74286   0.214953   2.591E-02   0.146 to 1
Brennan-Prediger        0.75000   0.223607   2.846E-02   0.129 to 1
Percent Agreement       0.80000   0.178885   1.106E-02   0.303 to 1

Figure 3.9: AgreeStat 2015.1 output for the subgroup analysis of the Figure 2.6 data (Group G2)

AgreeStatPro 2013.3
MODULE: Two-Rater Chance-Corrected Agreement Coefficients (Time: 4:29:50 PM. Date: Thursday, March 26, 2015)

Group: Overall

(The contingency table and the unweighted agreement coefficients for the combined data are identical to those shown in Figure 3.6.)

Figure 3.10: AgreeStat 2015.1 output for the subgroup analysis of the Figure 2.6 data (both groups combined)

ORDINAL DATA

Analysis. You may consider that some disagreements are more severe than others, with less severe disagreements representing partial agreements. This issue is addressed by conducting a weighted analysis in which partial agreements receive larger weights than more severe disagreements.

You may conduct the weighted analysis of raw data with AgreeStat 2015.1 using the same approach as for the weighted analysis of contingency tables, explained in section 3.1 under the heading "ORDINAL DATA."


STATISTICAL CALCULATIONS

Weights. The weights in the case of raw data are still calculated as discussed in section 3.1. More details about the use of weights in AgreeStat 2015.1 can be found in chapter 5.

Cohen’sKappa

AgreeStat 2015.1 computes the weighted Cohen’s kappa whenthe data contains missing values as, κ = (pa − pe)/(1− pe), wherepa and pe are given by,

pa =

q∑k,l

wklp′kl, and pe =

q∑k,l

wklp′k+p

′+l. (3.2.3)

Let $p_x$ be the relative number of missing ratings generated by both raters combined. Then $p'_{kl} = p_{kl}/(1-p_x)$, where $p_{kl} = n_{kl}/n$ is the relative number of subjects that raters A and B classified into categories k and l respectively. Moreover, $p'_{k+} = p_{k+}/(1-p_{x+})$ and $p'_{+l} = p_{+l}/(1-p_{+x})$, where:

• $p_{k+}$ = proportion of subjects that rater A classified into category k, and $p_{+l}$ = proportion of subjects that rater B classified into category l.

• $p_{x+}$ and $p_{+x}$ represent the proportions of missing values generated by raters A and B respectively.

The standard error of the weighted Cohen’s Kappa is obtained asthe square root of its variance, which is defined as follows:

v(κ) =1− fn

1

n− 1

n∑i=1

(ui − u)2, (3.2.4)

where n is the number of subjects rated by at least one rater, andf the sampling fraction (i.e. the fraction of the subject populationrepresented in the sample). If unknown then AgreeStat 2015.1assigns a value of 0 to it. Moreover, ui = u1i + u2i, where u1i andu2i are defined as follows:


$$u_{1i} = \frac{1}{(1-p_x)(1-p_e)}\sum_{k,l}^{q} w_{kl}\,a^{(i)}_{kl},$$

and

$$u_{2i} = -\frac{1-\kappa}{1-p_e}\bigg(\sum_{k=1}^{q} \bar{p}_{+k}\,b^{(i)}_{k+} + \sum_{k=1}^{q} \bar{p}_{k+}\,b^{(i)}_{+k}\bigg),$$

where $a^{(i)}_{kl} = \delta^{(i)}_{kl} - (1-\delta^{(i)}_x)\,p'_{kl}$, and $\delta^{(i)}_{kl} = 1$ if raters A and B classify subject i into categories k and l respectively. Furthermore, $\delta^{(i)}_x = 1$ if subject i was not rated by either rater, and $\delta^{(i)}_x = 0$ otherwise.

• $b^{(i)}_{k+} = \dfrac{\delta^{(i)}_{k+} - (1-\delta^{(i)}_{x+})\,p'_{k+}}{1-p_{x+}}$

• $b^{(i)}_{+l} = \dfrac{\delta^{(i)}_{+l} - (1-\delta^{(i)}_{+x})\,p'_{+l}}{1-p_{+x}}$

• Note that $\delta^{(i)}_{x+} = 1$ (resp. $\delta^{(i)}_{+x} = 1$) if rater A (resp. rater B) did not score subject i, and 0 otherwise.

• The weighted proportions $\bar{p}_{+k}$ and $\bar{p}_{l+}$ are defined as follows:

$$\bar{p}_{+k} = \sum_{l=1}^{q} w_{kl}\,p'_{+l}, \qquad \bar{p}_{l+} = \sum_{k=1}^{q} w_{kl}\,p'_{k+}.$$
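The point estimate of equation (3.2.3) can be sketched as follows. The variance terms $u_{1i}$ and $u_{2i}$ are omitted for brevity, and $p_x$ is computed here as the proportion of subjects with at least one missing rating, which is one reading of the definition above; the function and variable names are illustrative assumptions.

```python
import numpy as np

def weighted_kappa_missing(a, b, cats, w):
    """Point estimate of the weighted Cohen's kappa of equation (3.2.3)
    from two lists of raw ratings a and b (None = missing rating)."""
    n, q = len(a), len(cats)
    idx = {c: i for i, c in enumerate(cats)}
    p = np.zeros((q, q))                       # joint proportions p_kl
    pa_row = np.zeros(q)                       # p_{k+}, rater A marginals
    pb_col = np.zeros(q)                       # p_{+l}, rater B marginals
    miss_a = miss_b = miss_any = 0
    for x, y in zip(a, b):
        if x is None: miss_a += 1
        else:         pa_row[idx[x]] += 1
        if y is None: miss_b += 1
        else:         pb_col[idx[y]] += 1
        if x is None or y is None: miss_any += 1
        else:                      p[idx[x], idx[y]] += 1
    p, pa_row, pb_col = p / n, pa_row / n, pb_col / n
    px, px_a, px_b = miss_any / n, miss_a / n, miss_b / n
    p_adj  = p / (1.0 - px)                    # p'_kl
    pa_adj = pa_row / (1.0 - px_a)             # p'_{k+}
    pb_adj = pb_col / (1.0 - px_b)             # p'_{+l}
    pa = np.sum(w * p_adj)
    pe = np.sum(w * np.outer(pa_adj, pb_adj))
    return (pa - pe) / (1.0 - pe)
```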

Scott’sPi

AgreeStat 2015.1 computes the weighted Scott’s Pi when thedata contains missing values as, PI = (pa − pe)/(1 − pe), where paand pe are given by,

pa =

q∑k,l

wklp′kl, and pe =

q∑k,l

wklπ′kπ′l. (3.2.5)

Let px be the relative number of subjects that have a missing rating.Therefore, p′kl = pkl/(1 − px), where pkl = nkl/n is the relativenumber of subjects that raters A and B classified into categories kand l respectively. Moreover, π′k = (p′k+ + p′+k)/2, p′k+ = pk+/(1−px+), and p′+l = p+l/(1− p+x), where,


• $p_{k+}$ = proportion of subjects that rater A classified into category k, and $p_{+l}$ = proportion of subjects that rater B classified into category l.

• $p_{x+}$ and $p_{+x}$ represent the proportions of subjects that are missing a rating from raters A and B respectively.

The standard error of the weighted Scott’s Pi is obtained as thesquare root of its variance, which is defined as follows:

v(PI) =1− fn

1

n− 1

n∑i=1

(ui − u)2, (3.2.6)

where n is the number of subjects rated by at least one rater, andf the sampling fraction (i.e. the fraction of the subject populationrepresented in the sample). If unknown then AgreeStat 2015.1assigns a value of 0 to it. Moreover, ui = u1i + u2i, where u1i andu2i are defined as follows:

$$u_{1i} = \frac{1}{(1-p_x)(1-p_e)}\sum_{k,l}^{q} w_{kl}\,a^{(i)}_{kl}, \qquad u_{2i} = -\frac{1-PI}{1-p_e}\sum_{k=1}^{q} \bar{\pi}_k\,b^{(i)}_k,$$

where $a^{(i)}_{kl} = \delta^{(i)}_{kl} - (1-\delta^{(i)}_x)\,p'_{kl}$, and $\delta^{(i)}_x = 1$ if subject i was not rated by either rater, and $\delta^{(i)}_x = 0$ otherwise.

• $\bar{\pi}_k = (\bar{\pi}_{k+} + \bar{\pi}_{+k})/2$, and $b^{(i)}_k = b^{(i)}_{+k} + b^{(i)}_{k+}$,

• $b^{(i)}_{k+} = \dfrac{\delta^{(i)}_{k+} - (1-\delta^{(i)}_{x+})\,p'_{k+}}{1-p_{x+}}$ and $b^{(i)}_{+k} = \dfrac{\delta^{(i)}_{+k} - (1-\delta^{(i)}_{+x})\,p'_{+k}}{1-p_{+x}}$,

• Note that $\delta^{(i)}_{x+} = 1$ (resp. $\delta^{(i)}_{+x} = 1$) if rater A (resp. rater B) did not score subject i, and 0 otherwise.

Moreover, $\bar{\pi}_{k+}$ and $\bar{\pi}_{+l}$ are defined as follows:

$$\bar{\pi}_{k+} = \sum_{l=1}^{q} w_{kl}\,\pi'_l, \qquad \bar{\pi}_{+l} = \sum_{k=1}^{q} w_{kl}\,\pi'_k. \qquad (3.2.7)$$


Gwet’sAC1

AgreeStat 2015.1 computes the weighted Gwet’s AC1 when thedata contains missing values as, AC1 = (pa − pe)/(1 − pe), wherepa and pe are given by,

pa =

q∑k,l

wklp′kl, and pe =

Twq(q − 1)

q∑k=1

π′k(1− π′k). (3.2.8)

Let px be the relative number of subjects that have a missing rating.Therefore, p′kl = pkl/(1 − px), where pkl = nkl/n is the relativenumber of subjects that raters A and B classified into categories kand l respectively. Moreover, π′k = (p′k+ + p′+k)/2, p′k+ = pk+/(1−px+), and p′+l = p+l/(1− p+x), where,

• $p_{k+}$ = proportion of subjects that rater A classified into category k, and $p_{+l}$ = proportion of subjects that rater B classified into category l.

• $p_{x+}$ and $p_{+x}$ represent the proportions of subjects that are missing a rating from raters A and B respectively.

The standard error of the weighted Gwet’s AC1 is obtained as thesquare root of its variance, which is defined as follows:

v(AC1) =1− fn

1

n− 1

n∑i=1

(ui − u)2, (3.2.9)

where n is the number of subjects rated by at least one rater, andf the sampling fraction (i.e. the fraction of the subject populationrepresented in the sample). If unknown then AgreeStat 2015.1assigns a value of 0 to it. Moreover, ui = ai + 2(1 − AC1)ei, withai and ei being defined as,

$$a_i = \frac{1}{(1-p_x)(1-p_e)}\sum_{k=1}^{q}\sum_{l=1}^{q} w_{kl}\big[\delta^{(i)}_{kl} - (1-\delta^{(i)}_x)\,p'_{kl}\big],$$

$$e_i = \frac{T_w}{q(q-1)(1-p_e)}\sum_{k=1}^{q} \pi_k\,b^{(i)}_k,$$

where $\delta^{(i)}_x = 1$ if subject i was not rated by either rater, and $\delta^{(i)}_x = 0$ otherwise, and $b^{(i)}_k = \big(b^{(i)}_{+k} + b^{(i)}_{k+}\big)/2$, where:

• $b^{(i)}_{k+} = \dfrac{\delta^{(i)}_{k+} - (1-\delta^{(i)}_{x+})\,p'_{k+}}{1-p_{x+}}$,

• $b^{(i)}_{+k} = \dfrac{\delta^{(i)}_{+k} - (1-\delta^{(i)}_{+x})\,p'_{+k}}{1-p_{+x}}$.

• Note that $\delta^{(i)}_{x+} = 1$ (resp. $\delta^{(i)}_{+x} = 1$) if rater A (resp. rater B) did not score subject i, and 0 otherwise.


Brennan-Prediger

AgreeStat 2015.1 computes the weighted Brennan-Prediger (BP) coefficient when the data contains missing values as $BP = (p_a - p_e)/(1 - p_e)$, where $p_a$ and $p_e$ are given by

$$p_a = \sum_{k,l}^{q} w_{kl}\,p'_{kl}, \qquad p_e = \frac{1}{q^2}\sum_{k,l} w_{kl}. \qquad (3.2.10)$$

Let $p_x$ be the relative number of subjects that have a missing rating. Then $p'_{kl} = p_{kl}/(1-p_x)$, where $p_{kl} = n_{kl}/n$ is the relative number of subjects that raters A and B classified into categories k and l respectively.

The standard error of the weighted BP coefficient is obtained as the square root of its variance, which is defined as follows:

$$v(BP) = \frac{v(p_a)}{(1-p_e)^2},$$

where $v(p_a)$ is the variance of the percent agreement $p_a$, defined as follows:

$$v(p_a) = \frac{1-f}{n}\,\frac{1}{n-1}\sum_{i=1}^{n} a_i^2, \qquad a_i = \frac{1}{1-p_x}\sum_{k,l} w_{kl}\big(\delta^{(i)}_{kl} - (1-\delta^{(i)}_x)\,p'_{kl}\big).$$


Krippendorff’sAlpha

AgreeStat 2015.1 computes the weighted Krippendorff's alpha by first excluding from the analysis all subjects with a missing rating. The resulting number of subjects in the complete dataset is n. The coefficient is then calculated as $\alpha = (p'_a - p_e)/(1 - p_e)$, where $p'_a = (1-\epsilon_n)\,p_a + \epsilon_n$, with $\epsilon_n = 1/(2n)$, and $p_a$ and $p_e$ given by

$$p_a = \sum_{k,l}^{q} w_{kl}\,p_{kl}, \qquad p_e = \sum_{k,l}^{q} w_{kl}\,\pi_k\,\pi_l. \qquad (3.2.11)$$

Here $p_{kl} = n_{kl}/n$ is the relative number of subjects that raters A and B classified into categories k and l respectively, and $\pi_k = (p_{k+} + p_{+k})/2$, where:

• $p_{k+}$ = proportion of subjects that rater A classified into category k, and $p_{+l}$ = proportion of subjects that rater B classified into category l.

The standard error of the weighted Krippendorff's alpha coefficient is obtained as the square root of its variance, which is defined as follows:

$$v(\alpha) = \frac{1-f}{n}\,\frac{1}{n-1}\sum_{i=1}^{n}(u_i - \bar{u})^2, \qquad (3.2.12)$$

where n is the number of subjects rated by at least one rater, and f the sampling fraction (i.e. the fraction of the subject population represented in the sample). If unknown, AgreeStat 2015.1 assigns it a value of 0. Moreover, $u_i = u_{1i} + u_{2i}$ is defined as follows:

$$u_{1i} = \frac{1}{1-p_e}\sum_{k,l}^{q} w_{kl}\,a^{(i)}_{kl}, \qquad u_{2i} = -\frac{1-\alpha}{1-p_e}\sum_{k=1}^{q} \bar{\pi}_k\,b^{(i)}_k,$$


where $a^{(i)}_{kl} = (1-\epsilon_n)\big(\delta^{(i)}_{kl} - p_{kl}\big)$,

• $\bar{\pi}_k = (\bar{\pi}_{k+} + \bar{\pi}_{+k})/2$, and $b^{(i)}_k = b^{(i)}_{+k} + b^{(i)}_{k+}$,

• $b^{(i)}_{k+} = \delta^{(i)}_{k+} - p_{k+}$ and $b^{(i)}_{+k} = \delta^{(i)}_{+k} - p_{+k}$.

Moreover, $\bar{\pi}_{k+}$ and $\bar{\pi}_{+l}$ are defined as follows:

$$\bar{\pi}_{k+} = \sum_{l=1}^{q} w_{kl}\,\pi_l, \qquad \bar{\pi}_{+l} = \sum_{k=1}^{q} w_{kl}\,\pi_k. \qquad (3.2.13)$$
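Since this coefficient drops incomplete subjects before anything else is computed, its point estimate is easy to sketch from two raw columns (listwise deletion; hypothetical helper, variance omitted):

```python
import numpy as np

def kripp_alpha_two_raters(a, b, cats, w):
    """Weighted Krippendorff alpha for two raters, dropping subjects with a
    missing rating first, as described above (point estimate only)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n, q = len(pairs), len(cats)
    idx = {c: i for i, c in enumerate(cats)}
    p = np.zeros((q, q))
    for x, y in pairs:
        p[idx[x], idx[y]] += 1.0 / n           # p_kl on the complete data
    pik = (p.sum(axis=1) + p.sum(axis=0)) / 2.0
    eps = 1.0 / (2.0 * n)
    pa_prime = (1.0 - eps) * np.sum(w * p) + eps
    pe = float(pik @ w @ pik)
    return (pa_prime - pe) / (1.0 - pe)
```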


Bibliography

[1] Brennan, R. L., and Prediger, D. J. (1981). "Coefficient Kappa: some uses, misuses, and alternatives." Educational and Psychological Measurement, 41, 687-699.

[2] Cohen, J. (1960). "A coefficient of agreement for nominal scales." Educational and Psychological Measurement, 20, 37-46.

[3] Cohen, J. (1968). "Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit." Psychological Bulletin, 70, 213-220.

[4] Fleiss, J. L., Cohen, J., and Everitt, B. S. (1969). "Large sample standard errors of kappa and weighted kappa." Psychological Bulletin, 72, 323-327.

[5] Gwet, K. L. (2010). Handbook of Inter-Rater Reliability (2nd Edition), Advanced Analytics, LLC, Maryland, USA.

[6] Gwet, K. L. (2011a). The Practical Guide to Statistics, Advanced Analytics, LLC, Maryland, USA.

[7] Gwet, K. L. (2011b). "On the Krippendorff's Alpha Coefficient." Submitted for publication in Communication Methods and Measures.

[8] Krippendorff, K. (1980). Content Analysis: An Introduction to its Methodology, Chapter 12. Sage, Beverly Hills, CA.

[9] Scott, W. A. (1955). "Reliability of content analysis: the case of nominal scale coding." Public Opinion Quarterly, XIX, 321-325.


CHAPTER 4

CHANCE-CORRECTED AGREEMENT MEASURES FOR 3 RATERS OR MORE

4.1. Analysis of Raw Data
     Nominal Data
     Ordinal Data
4.2. Benchmarking Agreement Coefficients
4.3. Analysis of the Distribution of Raters
     Nominal Data
     Ordinal Data
4.4. Statistical Calculations
     Agreement Coefficients & Subject Variance
     Fleiss' Kappa
     Gwet's AC1
     Krippendorff's α
     Conger's Kappa
     Brennan-Prediger
     Total Variance due to the Sampling of Subjects & Raters
     Confidence Intervals
Bibliography


To analyze the extent of agreement among 3 raters or more, you have two options for describing your input data in AgreeStat 2015.1.

• The first option consists of organizing your data in the form of adjacent columns of raw ratings. Table 2.2 of chapter 2 shows an example of such a format, where 4 adjacent columns named Rater1, Rater2, Rater3, and Rater4 contain the raw ratings that 4 raters assigned to the 12 units they rated.

• The second option consists of organizing your data in the form of the distribution of raters by subject and category, as shown in Table 2.3 of chapter 2. The typical cell representing subject i and category k contains the number of raters that classified subject i into category k.

4.1. ANALYSIS OF RAW DATA

Suppose you want to analyze the data of Table 2.2, which can also be found in the AgreeStat worksheet named "CAC(Group Analysis)" as Example #9. This table contains ratings from 4 raters, with many missing values. Section 2.3 describes how that data should be described in AgreeStat 2015.1.

Once the ratings are specified in AgreeStat 2015.1, you need to decide whether you want to analyze them as nominal data or as ordinal, interval, or ratio data. Should you consider a weighted or an unweighted analysis? The answer depends on whether some disagreements are more serious than others.

• Nominal Data. Consider the 5 categories a, b, c, d, and e as nominal data, and conduct the unweighted analysis only, if you believe that all disagreements are equally severe and none can be considered a partial agreement.

• Ordinal Data. If a disagreement involving categories a and b is seen as less serious than one involving categories a and d, for example, then you should consider a weighted analysis using one of the weights discussed in chapter 5.


NOMINAL DATA

Analysis. If you decide to analyze your ratings as nominal data, then after capturing your Table 2.2 data as shown in Figure 2.11, click on the "Execute" command button and look at the results in the "Output" sheet.

The output of this analysis is shown in Figure 4.1. The first output table shows the distribution of units by rater and category. For example, it follows from this table that Rater1 classified 3 units into category A (note that categories are automatically sorted in ascending order). When the "Total" column of this table contains different values, it is an indication of the presence of missing data. The second output table contains the 6 agreement coefficients implemented in AgreeStat 2015.1, along with their standard errors, 95% confidence intervals, and p-values.

AgreeStatPro 2013.3
MODULE: Chance-Corrected Agreement Coefficients (Time: 6:47:05 AM. Date: Friday, March 27, 2015)

DISTRIBUTION OF SUBJECTS BY RATER AND SCORE/CATEGORY
                  Category
Raters     A     B     C     D     E     Total
Rater1     3     5     2     1     0     11
Rater2     3     5     2     1     1     12
Rater3     1     3     7     1     1     13
Rater4     3     5     2     2     1     13
Average    2.5   4.5   3.3   1.3   0.8   12.20

UNWEIGHTED ANALYSIS
                                  Inference/Subjects                        Inference/Subjects & Raters
METHOD                 Coefficient  StdErr    95% C.I.         p-Value     StdErr    95% C.I.     p-Value
Conger's Kappa         0.65859      0.14507   0.345 to 0.972   5.553E-04   0.21968   0.184 to 1   1.028E-02
Gwet's AC1             0.68460      0.13353   0.396 to 0.973   1.944E-04   0.20896   0.233 to 1   6.017E-03
Fleiss' Kappa          0.65725      0.14992   0.333 to 0.981   7.391E-04   0.22893   0.163 to 1   1.312E-02
Krippendorff's Alpha   0.63181      0.13833   0.333 to 0.931   5.280E-04   0.23059   0.134 to 1   1.686E-02
Brennan-Prediger       0.67949      0.13624   0.385 to 0.974   2.484E-04   0.21248   0.22 to 1    6.995E-03
Percent Agreement      0.74359      0.11577   0.493 to 0.994   2.260E-05   0.17440   0.367 to 1   9.235E-04

Figure 4.1: AgreeStat 2015.1 output for the unweighted analysis of Table 2.2 data

You will notice the two series of standard errors and confidence intervals output by AgreeStat 2015.1. The first set of standard errors is calculated with respect to the sampling of subjects only (the raters are considered fixed); it measures precision with respect to the universe of subjects only. The second series of standard errors and confidence intervals is calculated with respect to both the sampling of subjects and that of raters; it measures precision with respect to both the rater and subject universes.

• Standard errors and confidence intervals with respect to the sampling of subjects should be used when the evaluation of the extent of agreement is limited to the specific raters that participated in the reliability experiment.

• Standard errors and confidence intervals with respect to the sampling of subjects and raters should be used if the conclusions of the study will apply to the entire universes of subjects and raters from which the samples were selected.

ORDINAL DATA

Analysis. If you want to analyze your ratings as ordinal data, then after capturing your Table 2.2 data as shown in Figure 2.11, do the following (this is an example of the use of linear weights):

• Click on the "Options" tab to display the form shown in Figure 4.2.

• Select the "Weighted Analysis Requested?" checkbox.

• Select the "Predefined Weights" radio button.

• Select "Linear Weights" from the list box.

After all these steps, the "Options" form should appear as shown in Figure 4.2.

The partial output of this analysis is shown in Figure 4.3. The first table contains the "Linear Weights". The second output table contains the 6 weighted agreement coefficients implemented in AgreeStat 2015.1 along with their standard errors and 95% confidence intervals.


Figure 4.2: Selection of linear weights for the weighted analysis of Table 2.2

AgreeStat 2013.3
MODULE: Chance-Corrected Agreement Coefficients (Time: 5:00:13 PM. Date: Friday, March 27, 2015)

(The distribution of subjects by rater and score/category and the unweighted analysis are the same as in Figure 4.1.)

WEIGHTED ANALYSIS
Linear Weights
       A      B      C      D      E
A      1      0.75   0.5    0.25   0
B      0.75   1      0.75   0.5    0.25
C      0.5    0.75   1      0.75   0.5
D      0.25   0.5    0.75   1      0.75
E      0      0.25   0.5    0.75   1

WEIGHTED COEFFICIENTS
                                  Inference/Subjects                        Inference/Subjects & Raters
METHOD                 Coefficient  StdErr    95% C.I.         p-Value     StdErr    95% C.I.     p-Value
Conger's Kappa         0.73758      0.15035   0.413 to 1       2.872E-04   0.17580   0.358 to 1   1.048E-03
Gwet's AC2             0.81745      0.10098   0.599 to 1       1.964E-06   0.12030   0.558 to 1   1.272E-05
Fleiss' Kappa          0.74424      0.14648   0.428 to 1       2.107E-04   0.17458   0.367 to 1   9.246E-04
Krippendorff's Alpha   0.71924      0.12389   0.452 to 0.987   6.122E-05   0.16322   0.367 to 1   7.089E-04
Brennan-Prediger       0.79968      0.10831   0.566 to 1       5.319E-06   0.12898   0.521 to 1   3.218E-05
Percent Agreement      0.91987      0.07924   0.749 to 1       3.105E-08   0.08404   0.738 to 1   6.248E-08

Figure 4.3: Output for the weighted analysis with Linear Weights


Custom Weights

Suppose that you now want to conduct a weighted analysis using custom weights that assign a weight of 1 to all full agreements on the diagonal, a weight of 0.5 to all disagreements that involve consecutive categories (e.g. A-B or C-D), and a weight of 0 to all other types of disagreements.

Selecting the "custom weights" option and describing these weights in AgreeStat is accomplished as shown in Figures 4.4 and 4.5. The output based on these custom weights is shown in Figure 4.6. This output shows the custom weights used in the analysis, the agreement coefficients selected, their standard errors, their confidence intervals, and the associated p-values.

You will notice that two types of statistical inference are performed: (1) inference with respect to the selection of subjects only, and (2) inference with respect to the random selection of both subjects and raters. When done with respect to the selection of subjects only, the inference generalizes to the universe of subjects, while its validity is limited to the specific group of raters that participated in the reliability experiment.

Figure 4.4: Selection of custom weights for the weighted analysis


For the inference to generalize to both universes of subjects and raters, it must be done with respect to both factors.

Figure 4.5: Input of custom weights for the weighted analysis of Table 2.2 data

AgreeStat 2013.3
MODULE: Chance-Corrected Agreement Coefficients (Time: 5:48:21 PM. Date: Friday, March 27, 2015)

(The distribution of subjects by rater and score/category and the unweighted analysis are the same as in Figure 4.1.)

WEIGHTED ANALYSIS
Custom weights
       A      B      C      D      E
A      1      0.5    0      0      0
B      0.5    1      0.5    0      0
C      0      0.5    1      0.5    0
D      0      0      0.5    1      0.5
E      0      0      0      0.5    1

WEIGHTED COEFFICIENTS
                                  Inference/Subjects                        Inference/Subjects & Raters
METHOD                 Coefficient  StdErr    95% C.I.         p-Value     StdErr    95% C.I.     p-Value
Conger's Kappa         0.71613      0.14041   0.413 to 1       2.037E-04   0.18020   0.327 to 1   1.588E-03
Gwet's AC2             0.76808      0.11505   0.52 to 1        1.526E-05   0.15105   0.442 to 1   2.091E-04
Fleiss' Kappa          0.72015      0.14459   0.408 to 1       2.514E-04   0.18576   0.319 to 1   1.909E-03
Krippendorff's Alpha   0.69456      0.12884   0.416 to 0.973   1.231E-04   0.18293   0.299 to 1   2.221E-03
Brennan-Prediger       0.75962      0.11910   0.502 to 1       2.426E-05   0.15604   0.423 to 1   3.071E-04
Percent Agreement      0.84615      0.09300   0.645 to 1       5.320E-07   0.11319   0.602 to 1   4.658E-06

Figure 4.6: AgreeStat 2015.1 output for the weighted analysis of Table 2.2 data with custom weights


4.2. BENCHMARKING AGREEMENT COEFFICIENTS

AgreeStat 2015.1 allows you to benchmark the calculated agreement coefficients so that their magnitude can be interpreted. You have the option to choose one of the benchmarking methods proposed by Landis & Koch (1977), Altman (1991), and Fleiss (1981), as shown in Figure 4.7.

After clicking on the "Execute" command button, AgreeStat 2015.1 will output, among other statistics, the table shown in Figure 4.8. The cells highlighted in yellow determine how the associated coefficient should be interpreted. For example, the third cell associated with Conger's Kappa, containing the number 0.95337, is associated with the benchmark interval (0.4, 0.6), indicating a "Moderate" extent of agreement.

Figure 4.7: Specifying the benchmarking model in AgreeStat 2015.1

What does the number 0.95337 stand for, and how is it calculated? The number 0.95337 represents the likelihood that Conger's kappa belongs to one of the top 3 intervals on the Landis-Koch benchmark scale. That is, we are 95.33% certain that the extent of agreement as measured by Conger's kappa exceeds 0.4. For each benchmark interval (c.f. the first column in Figure 4.8), AgreeStat 2015.1 first calculates the probability that Conger's kappa belongs to it. These probabilities are then cumulated from the top interval (0.8 to 1) downwards. The number 0.95337 is Conger's kappa's cumulative probability associated with the interval (0.4, 0.6). The third cell is highlighted because it is the first cell whose cumulative probability exceeds the cumulative probability threshold specified on the form shown in Figure 4.7. This threshold, which is in spirit similar to a confidence level, can be modified by the user.

Figure 4.8: Benchmarking Results
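The cumulative-probability rule just described can be sketched as follows. The interval labels, the normal approximation, and the function name are illustrative assumptions, so the result will be close to, but not necessarily identical with, AgreeStat's 0.95337.

```python
import math

def norm_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Landis-Koch style scale, ordered from the top interval downwards (assumed labels)
SCALE = [(0.8, 1.0, "Almost perfect"), (0.6, 0.8, "Substantial"),
         (0.4, 0.6, "Moderate"), (0.2, 0.4, "Fair"),
         (0.0, 0.2, "Slight"), (-1.0, 0.0, "Poor")]

def benchmark(coeff, stderr, threshold=0.95, scale=SCALE):
    """Cumulate interval membership probabilities from the top of the scale
    and return the first interval whose cumulative probability reaches the
    threshold, as described in the text."""
    cumulative = 0.0
    for lo, hi, label in scale:
        cumulative += norm_cdf(hi, coeff, stderr) - norm_cdf(lo, coeff, stderr)
        if cumulative >= threshold:
            return label, round(cumulative, 5)
    return scale[-1][2], round(cumulative, 5)

print(benchmark(0.65859, 0.14507))   # Conger's kappa and StdErr from Figure 4.1
```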

4.3. ANALYSIS OF THE DISTRIBUTION OF RATERS

This section shows you how to analyze the extent of agreement among 3 raters or more when your data is organized in the form of the distribution of raters by subject and category. We will use the Table 2.3 data (found in Example 10 of the AgreeStat workbook) to illustrate the analysis of ratings presented in this format.

NOMINAL DATA


Analysis. If you want to analyze your ratings as nominal data, then after capturing your Table 2.3 data as shown in Figures 2.17 and 2.18, click on the "Execute" command button and look at the results in the "Output" sheet.

The output of this analysis is shown in Figure 4.9. The first output table shows a number of general statistics by category, including the number and percent of subjects classified into a category by at least one rater. Categories b and c are the most popular among subjects, while category b was the most popular among raters.

AgreeStat 2013.3
MODULE: Chance-Corrected Agreement Coefficients (Time: 2:11:24 PM. Date: Tuesday, March 31, 2015)

SUBJECT AND RATER STATISTICS BY CATEGORY
                                      Category
Statistic                        a      b      c      d      e
Number of Subjects in Category   5      7      7      2      1
Percent Subjects in Category     35.7%  50.0%  50.0%  14.3%  7.1%
Average Number of Raters         2      2.57   1.86   2.5    3
Minimum Number of Raters         0      0      0      0      0
Maximum Number of Raters         3      4      4      4      3

UNWEIGHTED ANALYSIS
                                     Inference/Subjects
METHOD                 Coefficient   StdErr    95% C.I.         p-Value
Gwet's AC1             0.68460       0.13353   0.396 to 0.973   1.944E-04
Fleiss' Kappa          0.65725       0.14992   0.333 to 0.981   7.391E-04
Krippendorff's Alpha   0.63181       0.13833   0.333 to 0.931   5.280E-04
Brennan-Prediger       0.67949       0.13624   0.385 to 0.974   2.484E-04
Percent Agreement      0.74359       0.11577   0.493 to 0.994   2.260E-05

Figure 4.9: Output for the unweighted analysis of Table 2.3 data

The second output table shows 5 unweighted agreement coefficients along with their standard errors, confidence intervals, and p-values calculated with respect to the sampling of subjects only. This data format does not allow for the calculation of standard errors with respect to the sampling of subjects and raters simultaneously.

ORDINAL DATA


Weighted Analysis

Even if the ratings are in the form of a distribution of raters by subject and category, they may still be analyzed as ordinal, interval or even ratio data using one of the predefined sets of weights or custom weights. If you want to analyze the Table 2.3 data with "Quadratic" weights, for example, then after describing your data as shown in Figures 2.17 and 2.18, select "Quadratic Weights" in the same way "Linear Weights" was selected in Figure 4.2. Then click "Execute" and look at the results in the Output sheet (see Figure 4.10).

The output of this weighted analysis includes the unweighted analysis of Figure 4.9, as well as the two tables of Figure 4.10. The first output table of Figure 4.10 contains the "Quadratic" weights associated with the 5 categories a, b, c, d, and e. The second table contains the weighted agreement coefficients based on the quadratic weights.

AgreeStat 2013.3
MODULE: Chance-Corrected Agreement Coefficients (Time: 4:04:31 PM. Date: Tuesday, March 31, 2015)

(The subject and rater statistics by category and the unweighted analysis are the same as in Figure 4.9.)

WEIGHTED ANALYSIS
Quadratic Weights
       a        b        c        d        e
a      1        0.9375   0.75     0.4375   0
b      0.9375   1        0.9375   0.75     0.4375
c      0.75     0.9375   1        0.9375   0.75
d      0.4375   0.75     0.9375   1        0.9375
e      0        0.4375   0.75     0.9375   1

WEIGHTED COEFFICIENTS
                                     Inference/Subjects
METHOD                 Coefficient   StdErr    95% C.I.      p-Value
Gwet's AC2             0.90073       0.08743   0.712 to 1    1.272E-07
Fleiss' Kappa          0.82012       0.14326   0.511 to 1    6.998E-05
Krippendorff's Alpha   0.79810       0.11384   0.552 to 1    9.191E-06
Brennan-Prediger       0.88141       0.09415   0.678 to 1    3.845E-07
Percent Agreement      0.97035       0.07641   0.805 to 1    1.056E-08

Figure 4.10: AgreeStat 2015.1 output for the weighted analysis of Table 2.3 data with Quadratic weights

4.4. STATISTICAL CALCULATIONS


This section describes the procedures for calculating the different agreement coefficients and their variances when the number of raters is 3 or more. Only the weighted coefficients and their variances are described. The unweighted coefficient is a special case of the weighted one, and can be obtained by using the identity set of weights (for identity weights, diagonal elements are 1 and off-diagonal elements are 0). The symbol $w_{kl}$ represents the weight associated with categories k and l.

• Table 4.1 is an abstract representation of a table of raw ratings from r raters who have rated n subjects. For example, $c_{ig}$ represents the particular category (or score) that rater g assigned to subject i.

• Table 4.2 is an abstract representation of the distribution of r raters by subject and by category (or score). It is assumed that n subjects are classified by r raters into one of q possible categories. The symbol $r_{ik}$ represents the number of raters who classified subject i into category k, and $r_i$ is the number of raters who assigned a category to subject i (a conversion sketch from the Table 4.1 layout to this layout follows Table 4.3 below).

Table 4.1: Ratings of n subjects by r raters

                  Raters
Subject    1      ...    g      ...    r
1          c11    ...    c1g    ...    c1r
...        ...           ...           ...
i          ci1    ...    cig    ...    cir
...        ...           ...           ...
n          cn1    ...    cng    ...    cnr

Table 4.2: Distribution of r raters by subject and by category

                  Category
Subject    1      ...    k      ...    q      Total
1          r11    ...    r1k    ...    r1q    r1
...        ...           ...           ...    ...
i          ri1    ...    rik    ...    riq    ri
...        ...           ...           ...    ...
n          rn1    ...    rnk    ...    rnq    rn
Average    r+1    ...    r+k    ...    r+q    r

• Table 4.3 is an abstract representation of the distribution of n subjects by rater and by category. The symbol $n_{gk}$ represents the number of subjects that rater g has classified into category k, and $n_g$ is the number of subjects that rater g has scored. This table is particularly useful for calculating Conger's kappa coefficient (Conger, 1980).

Table 4.3: Distribution of n subjects by rater and category

                  Category
Rater      1      ...    k      ...    q      Total
1          n11    ...    n1k    ...    n1q    n1
...        ...           ...           ...    ...
g          ng1    ...    ngk    ...    ngq    ng
...        ...           ...           ...    ...
r          nr1    ...    nrk    ...    nrq    nr
Average    n+1    ...    n+k    ...    n+q    n

AGREEMENT COEFFICIENTS & VARIANCE DUE TO THE SAMPLING OF SUBJECTS

This section describes the various coefficients implemented in AgreeStat 2015.1 for calculating agreement among 3 raters or more, and their associated variances with respect to the sampling of subjects only.


Fleiss’Kappa

Fleiss’ Kappa (Fleiss, 1971) (Fleiss (1971) proposed the un-weighted kappa, and Gwet (2014) introduced the weighted ver-sion) is defined as follows:

κ̂f = (pa − pe)/(1− pe),

where, pa is defined as follows:

pa =1

n′

n′∑i=1

pa|i, pa|i =

q∑k=1

rik(r?ik − 1

)ri(ri − 1)

, r?ik =

q∑l=1

wklril,

where n′ is the number of subjects scored by 2 raters or more.The chance-agreement probability pe is given by,

p′e =∑k,l

wklπkπl, where πk =1

n

n∑i=1

rikri·
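The point estimate of the weighted Fleiss kappa can be sketched directly from the Table 4.2 matrix; the helper below is illustrative (variance omitted) and assumes every subject has been rated by at least one rater.

```python
import numpy as np

def fleiss_kappa_weighted(rik, w):
    """Weighted Fleiss kappa from the n x q matrix rik of Table 4.2,
    following the definitions above (point estimate only)."""
    rik = np.asarray(rik, dtype=float)
    n, q = rik.shape
    ri = rik.sum(axis=1)                                 # raters per subject
    r_star = rik @ w.T                                   # r*_ik = sum_l w_kl r_il
    num = (rik * (r_star - 1.0)).sum(axis=1)
    den = ri * (ri - 1.0)
    pa_i = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    rated2 = ri >= 2
    pa = pa_i[rated2].sum() / rated2.sum()               # average over the n' subjects
    frac = np.divide(rik, ri[:, None], out=np.zeros_like(rik), where=ri[:, None] > 0)
    pik = frac.sum(axis=0) / n                           # pi_k = (1/n) sum_i r_ik / r_i
    pe = float(pik @ w @ pik)                            # sum_kl w_kl pi_k pi_l
    return (pa - pe) / (1.0 - pe)

# 3 subjects, 3 categories, identity weights (unweighted Fleiss kappa)
rik = [[3, 1, 0], [0, 4, 0], [1, 1, 2]]
print(fleiss_kappa_weighted(rik, np.eye(3)))
```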

VARIANCE

The variance estimator published by Fleiss (1971) is incorrect, since it is based on the assumption of no agreement among raters. The variance estimator implemented in AgreeStat is based on equation (33) of Gwet (2008a). The standard error of Fleiss' Kappa is the square root of its variance, which is defined as follows:

$$v(\hat{\kappa}_f) = \frac{1-f}{n}\,\frac{1}{n-1}\sum_{i=1}^{n}\big(\kappa^{\star}_{f|i} - \hat{\kappa}_f\big)^2,$$

where

$$\kappa^{\star}_{f|i} = \kappa_{f|i} - 2\,(1-\hat{\kappa}_f)\,\frac{p_{e|i} - p_e}{1 - p_e},$$

• $\kappa_{f|i} = \begin{cases}(n/n')(p_{a|i} - p_e)/(1-p_e), & \text{if } r_i \geq 2,\\ 0, & \text{otherwise,}\end{cases}$

• $p_{e|i} = \displaystyle\sum_{k=1}^{q} \pi^{\star}_k\,r_{ik}/r_i$, with $\pi^{\star}_k = \displaystyle\sum_{l=1}^{q} w_{kl}\,\pi_l$.


Gwet’sAC1

Gwet’s AC1 (Gwet, 2008a) is defined as follows:

AC1 = (pa − pe)/(1− pe),

where, pa is defined as follows:

pa =1

n′

n′∑i=1

pa|i, pa|i =

q∑k=1

rik(rik+ − 1

)ri(ri − 1)

, rik+ =

q∑l=1

wklril,

where n′ is the number of subjects scored by 2 raters or more.The chance-agreement probability pe is given by,

p′e =Tw

q(q − 1)

q∑k=1

πk(1− πk), where Tw =∑k,l

wkl.

VARIANCE

The standard error of Gwet’s AC1 is the square root of its vari-ance, which is defined as follows:

v(AC1) =1− fn

1

n− 1

n∑i=1

(AC∗1|i − AC1

)2,

where,

AC∗1|i = AC1|i − 2(1− AC1)pe|i − pe1− pe

,

•• AC1|i =

{(n/n′)(pa|i − pe)/(1− pe), if ri ≥ 2,

0, otherwise,

• pe|i =Tw

q(q − 1)

q∑k=1

rikri

(1− πk).


Krippendorff’sAlpha

Let $n'$ represent the number of subjects rated by 2 observers or more (all subjects rated by only 1 observer or fewer are excluded). The Alpha coefficient (Krippendorff, 1980; Gwet, 2011) is defined as follows:

$$\alpha = (p_a - p_e)/(1 - p_e),$$

where $p_a = (1-\epsilon_n)\,p'_a + \epsilon_n$ (with $\epsilon_n = 1/(n'\bar{r})$), and $p'_a$ is defined as follows:

$$p'_a = \frac{1}{n'}\sum_{i=1}^{n'} p_{a|i}, \qquad p_{a|i} = \sum_{k=1}^{q} \frac{r_{ik}\big(r_{ik+} - 1\big)}{\bar{r}(r_i - 1)}, \qquad r_{ik+} = \sum_{l=1}^{q} w_{kl}\,r_{il}.$$

The chance-agreement probability $p_e$ is given by

$$p_e = \sum_{k,l} w_{kl}\,\pi_k\,\pi_l, \qquad \text{where } \pi_k = \frac{1}{n'}\sum_{i=1}^{n'}\frac{r_{ik}}{\bar{r}}.$$

Note that $\bar{r}$ is the average of the $r_i$ values over subjects with $r_i \geq 2$ only, and that the $\pi_k$ associated with $\alpha$ is different from that of Fleiss' Kappa and Gwet's AC1. The outcome is nevertheless the same when there is no missing rating.

VARIANCE

The variance of Krippendorff’s alpha is defined as follows:

v(α) =1− fn

1

n− 1

n∑i=1

(α∗i − α

)2,

where, α∗i = αi − (1− α)(pe|i − pe)/(1− pe) and,αi = (paεn|i − pe)/(1− pe),

•• paεn|i = (1 − εn)[pa|i − pa(ri − r)/r

]+ εn, where pa|i =

q∑k=1

rik(rik+ − 1

)r(ri − 1)

, pa =1

n

n∑i=1

pa|i, pe|i =

q∑k=1

πkrikr− πk(ri −

r)/r, πk = (πk+ + π+k)/2, and

Page 81: F AdvancedAnalyt cs,LLC AgreeStat 2015.1 for …agreestat.com/documents/agreestatguide.pdfHTML Help .....5 - 1 - 1.1. Welcome to AgreeStat 2015.1 - 2 - 1.1. WELCOME TO AgreeStat 2015.1

4.4. Statistical Calculations - 76 -

πk+ =

q∑l=1

wklπl, π+l =

q∑k=1

wklπk.

Conger’sKappa

Conger’s version of Kappa (Conger, A.J. 1980, and Gwet, K.L.,2010) is defined as follows:

κc = (pa − pe)/(1− pe),

where pa is defined as follows:

pa =1

n′

n′∑i=1

pa|i, pa|i =

q∑k=1

rik(rik+ − 1

)ri(ri − 1)

, rik+ =

q∑l=1

wklril,

where n′ is the number of subjects that were rated by 2 ratersor more. The chance-agreement probability pe is given by,

pe =∑k,l

wkl(p+kp+l − skl/r

), where p+k =

1

r

r∑g=1

pgk,

pgk = ngk/ng, and skl =1

r − 1

(r∑

g=1

pgkpgl − rp+kp+l

).
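The chance-agreement term of Conger's kappa differs from the other coefficients in that it uses the per-rater classification probabilities $p_{gk}$ of Table 4.3. A small sketch of that term only (hypothetical counts and function name):

```python
import numpy as np

def conger_pe(ngk, ng, w):
    """Chance-agreement probability p_e of Conger's kappa from the r x q
    matrix ngk of Table 4.3 (ng = number of subjects scored by each rater)."""
    pgk = np.asarray(ngk, dtype=float) / np.asarray(ng, dtype=float)[:, None]
    r = pgk.shape[0]
    p_plus = pgk.mean(axis=0)                                  # p_{+k}
    skl = (pgk.T @ pgk - r * np.outer(p_plus, p_plus)) / (r - 1.0)
    return float(np.sum(w * (np.outer(p_plus, p_plus) - skl / r)))

# Hypothetical counts: 3 raters, 3 categories, each rater scored 10 subjects
ngk = [[4, 4, 2], [5, 3, 2], [3, 5, 2]]
print(conger_pe(ngk, [10, 10, 10], np.eye(3)))
```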

VARIANCE

The standard error of Conger’s kappa is the square root of itsvariance, which is calculated as follows:

v(κc) =(1− f)

n

1

n− 1

n∑i=1

(κ∗(i)c − κc

)2,

where,κ∗(i)c = κ

(i)c − 2(1− κc)(pe|i − pe)/(1− pe),

•• κ(i)c =

{(n/n′)(pa|i − pe)/(1− pe), if ri ≥ 2,

0, otherwise,

• pe|i =1

r(r − 1)

r∑g=1

q∑k=1

w(i)gk

(rp+k − pgk

).

Page 82: F AdvancedAnalyt cs,LLC AgreeStat 2015.1 for …agreestat.com/documents/agreestatguide.pdfHTML Help .....5 - 1 - 1.1. Welcome to AgreeStat 2015.1 - 2 - 1.1. WELCOME TO AgreeStat 2015.1

4.4. Statistical Calculations - 77 -

• w(i)gk =

q∑l=1

(wkl+wlk)δ(i)gl , with δ

(i)gl being 1 if rater g classifies

subject i into category l, and 0 otherwise.

Brennan-Prediger

The Brennan-Prediger agreement coefficient (Brennan & Prediger, 1981) is defined as follows:

$$BP = (p_a - p_e)/(1 - p_e),$$

where $p_a$ is defined as follows:

$$p_a = \frac{1}{n'}\sum_{i=1}^{n'} p_{a|i}, \qquad p_{a|i} = \sum_{k=1}^{q}\frac{r_{ik}(r_{ik+} - 1)}{r_i(r_i - 1)}, \qquad r_{ik+} = \sum_{l=1}^{q} w_{kl}\,r_{il},$$

where $n'$ is the number of subjects scored by 2 raters or more. The chance-agreement probability $p_e$ is given by,

$$p_e = \frac{1}{q^2}\sum_{k,l} w_{kl}.$$
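Because the Brennan-Prediger chance-agreement term depends only on the weights and the number of categories, the coefficient is straightforward to compute. The sketch below (illustrative only, not AgreeStat 2015.1's implementation) follows the $p_a$ and $p_e$ formulas above, starting from an $n' \times q$ matrix of counts $r_{ik}$.

```python
import numpy as np

def brennan_prediger(counts, weights=None):
    """Weighted Brennan-Prediger coefficient, following the p_a and
    p_e formulas above.  counts[i, k] = number of raters who put
    subject i in category k (subjects with fewer than 2 raters are
    assumed to have been dropped).  Illustrative sketch."""
    counts = np.asarray(counts, dtype=float)
    n, q = counts.shape
    w = np.eye(q) if weights is None else np.asarray(weights, dtype=float)

    ri = counts.sum(axis=1)                        # raters per subject
    rik_plus = counts @ w.T                        # weighted counts r_{ik+}
    pa = ((counts * (rik_plus - 1)).sum(axis=1) / (ri * (ri - 1))).mean()

    pe = w.sum() / q ** 2                          # T_w / q^2
    return (pa - pe) / (1 - pe)
```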

VARIANCE

The standard error of the BP coefficient is the square root of its variance, which is defined as follows:

$$v(BP) = \frac{1-f}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\bigl(BP_i - BP\bigr)^2,$$

where,

$$BP_i = \begin{cases}(n/n')(p_{a|i} - p_e)/(1 - p_e), & \text{if } r_i \ge 2,\\ 0, & \text{otherwise.}\end{cases}$$

TOTAL VARIANCE DUE TO THE SAMPLING OF SUBJECTS & RATERS


Total Variance

The total variance that accounts for the sampling of subjects and raters is obtained by summing the variance due to the sampling of subjects and the variance due to the sampling of raters. The variance due to the sampling of subjects is calculated as discussed earlier in this section, and that due to the sampling of raters is obtained using the jackknife method.

Let $\theta$ be an arbitrary agreement coefficient, and $v_s(\theta)$ and $v_r(\theta)$ its respective variances due to subject and rater sampling. The total variance $v_t(\theta)$ (sometimes called the unconditional variance) is given by:

$$v_t(\theta) = v_s(\theta) + v_r(\theta),$$

where $v_s(\theta)$ is calculated as discussed earlier in this section, and $v_r(\theta)$ is calculated as follows:

$$v_r(\theta) = (1-f)\,\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\theta^{(i)} - \theta\bigr)^2,$$

where $\theta^{(i)}$ represents the agreement coefficient calculated using all subjects in the sample except subject $i$. That is, the calculation of $\theta^{(i)}$ is based on a sub-sample of size $n-1$ extracted from the initial sample of $n$ subjects.
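The following sketch illustrates the delete-one jackknife computation described by the $v_r(\theta)$ formula above; it is a generic illustration (names such as `jackknife_variance` and `coef_fn` are not AgreeStat functions), with the coefficient function supplied by the caller.

```python
import numpy as np

def jackknife_variance(records, coef_fn, f=0.0):
    """Delete-one jackknife variance following the v_r(theta) formula
    above: records is a list with one entry per sampling unit, coef_fn
    computes the agreement coefficient from such a list, and f is the
    sampling fraction.  Illustrative names, not AgreeStat's API."""
    n = len(records)
    theta = coef_fn(records)
    theta_del = np.array([coef_fn(records[:i] + records[i + 1:])
                          for i in range(n)])
    return (1 - f) * (n - 1) / n * np.sum((theta_del - theta) ** 2)
```

The total (unconditional) variance is then obtained by adding this quantity to the subject-level variance computed earlier in this section.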

CONFIDENCE INTERVALS

AgreeStat 2015.1 forces the two bounds of a confidence interval to lie between 0 and 1. Let $1-\alpha$ be the chosen confidence level, and $z_{\alpha/2}$ the $100(1-\alpha/2)$th percentile of the Standard Normal distribution. The lower confidence bound (LB) and upper confidence bound (UB) of the $100(1-\alpha)\%$ confidence interval associated with an agreement coefficient $\theta$ are calculated as follows:

$$LB = \max\bigl(0,\ \theta - z_{\alpha/2}\,\mathrm{StdErr}(\theta)\bigr), \qquad UB = \min\bigl(1,\ \theta + z_{\alpha/2}\,\mathrm{StdErr}(\theta)\bigr).$$
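A minimal sketch of this clamping rule is shown below (illustrative only; `confidence_interval` is not an AgreeStat 2015.1 function):

```python
from scipy.stats import norm

def confidence_interval(theta, std_err, conf_level=0.95):
    """Confidence interval for an agreement coefficient, with both
    bounds forced into [0, 1] as described above."""
    z = norm.ppf(1 - (1 - conf_level) / 2)   # 100(1 - alpha/2)th percentile
    lb = max(0.0, theta - z * std_err)
    ub = min(1.0, theta + z * std_err)
    return lb, ub

print(confidence_interval(0.62, 0.08))       # roughly (0.463, 0.777)
```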


Bibliography

[1] Altman, D. G. (1991). Practical Statistics for Medical Research. Chapman and Hall.

[2] Brennan, R. L., and Prediger, D. J. (1981). "Coefficient Kappa: some uses, misuses, and alternatives." Educational and Psychological Measurement, 41, 687-699.

[3] Conger, A. J. (1980). "Integration and Generalization of Kappas for Multiple Raters," Psychological Bulletin, 88, 322-328.

[4] Fleiss, J. L. (1971). "Measuring nominal scale agreement among many raters," Psychological Bulletin, 76, 378-382.

[5] Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. John Wiley & Sons.

[6] Gwet, K. L. (2008a). "Computing inter-rater reliability and its variance in the presence of high agreement." British Journal of Mathematical and Statistical Psychology, 61, 29-48.

[7] Gwet, K. L. (2014). Handbook of Inter-Rater Reliability (4th Edition), Advanced Analytics, LLC, Maryland, USA.

[8] Gwet, K. L. (2011). "On the Krippendorff's Alpha Coefficient." Submitted for publication in Communication Methods and Measures.

[9] Krippendorff, K. (1980). Content Analysis: An Introduction to its Methodology, Chapter 12. Sage, Beverly Hills, CA.

[10] Landis, J. R., and Koch, G. (1977). "The Measurement of observer agreement for categorical data," Biometrics, 33, 159-174.


CHAPTER 5

USING WEIGHTS

5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .74

Identity Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.2. Pre-Defined Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Ordinal Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75

Quadratic Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Linear Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Radical Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78

Ratio Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .79

Circular Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Bipolar Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.3. Custom Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81


5.1. INTRODUCTION

AgreeStat 2015.1 gives you the opportunity to use 7 different pre-defined weights. However, if none of these predefined weights meets your needs, you can opt to supply your own custom weights. Section 5.2 provides a detailed review of the 7 predefined weights and gives examples of each of them. Section 5.3 discusses the use of custom weights.

All calculations in AgreeStat 2015.1, whether the analysis is weighted or unweighted, are based on the use of weights. When the unweighted analysis is requested, AgreeStat 2015.1 uses what is called the identity weights, where all diagonal elements are equal to 1, and all off-diagonal elements are equal to 0.

Identity Weights

The identity weight $w_{kl}$ associated with 2 categories $k$ and $l$ is defined as follows:

$$w_{kl} = \begin{cases}1, & \text{if } k = l,\\ 0, & \text{otherwise.}\end{cases} \qquad (5.1.1)$$

EXAMPLE

Suppose that your reliability study consists of classifying subjects into 5 nominal categories named A, B, C, D, and E.

Table 5.1: Identity Weights

        A   B   C   D   E
   A    1   0   0   0   0
   B    0   1   0   0   0
   C    0   0   1   0   0
   D    0   0   0   1   0
   E    0   0   0   0   1

Each time you request the unweighted analysis, AgreeStat 2015.1 creates these identity weights and performs a weighted analysis with them.


5.2. PRE-DEFINED WEIGHTS

Ordinal Weights

The ordinal weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = \begin{cases}1 - M_{kl}/M_{\max}, & \text{if } k \ne l,\\ 1, & \text{if } k = l.\end{cases} \qquad (5.2.2)$$

$M_{kl} = \#\{(i,j) : \min(k,l) \le i < j \le \max(k,l)\}$ represents the number of pairs $(i,j)$ (with $i < j$) that can be formed with numbers between $\min(k,l)$ and $\max(k,l)$, and $M_{\max}$ is its maximum value over all $k$ and $l$. This number can be calculated as follows:

$$M_{kl} = \binom{\max(k,l) - \min(k,l) + 1}{2}. \qquad (5.2.3)$$

EXAMPLE

Suppose that your reliability study consists of classifying subjects into 5 nominal categories named A, B, C, D, and E. When you request a weighted analysis with ordinal weights, AgreeStat 2015.1 creates the set of weights shown in the following table:

Table 5.2: Simple Ordinal Weights

        A     B     C     D     E
   A   1.0   0.9   0.7   0.4   0.0
   B   0.9   1.0   0.9   0.7   0.4
   C   0.7   0.9   1.0   0.9   0.7
   D   0.4   0.7   0.9   1.0   0.9
   E   0.0   0.4   0.7   0.9   1.0

It appears from Table 5.2 that an A-B or B-A disagreement carries a substantial weight of 0.9, and that the weight decreases as the 2 categories become farther apart.

To compute the ordinal weights of Table 5.2, AgreeStat 2015.1 first assigns the values 1, 2, 3, 4, and 5 to the categories A, B, C, D, and E. For this example, $M_{\max} = \binom{5-1+1}{2} = 10$. The weight 0.9 associated with categories A and B is labeled $w_{12}$ (1 is for A, and 2 is for B), and is calculated as $w_{12} = 1 - M_{12}/M_{\max}$, where $M_{12} = \binom{2-1+1}{2} = 1$ following equation 5.2.3. Consequently, $w_{12} = 1 - 1/10 = 1 - 0.1 = 0.9$.
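The worked example above can be reproduced with a short script. The sketch below (illustrative, not AgreeStat 2015.1 code) builds the full ordinal weight matrix for $q$ ordered categories from equations 5.2.2 and 5.2.3:

```python
from math import comb
import numpy as np

def ordinal_weights(q):
    """Ordinal weights for q ordered categories indexed 1..q, using
    M_kl = C(max(k,l) - min(k,l) + 1, 2) and w_kl = 1 - M_kl / M_max
    (equations 5.2.2 and 5.2.3)."""
    m = np.array([[comb(abs(k - l) + 1, 2) for l in range(1, q + 1)]
                  for k in range(1, q + 1)], dtype=float)
    return 1 - m / m.max()

print(np.round(ordinal_weights(5), 1))   # reproduces Table 5.2
```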

• The use of ordinal weights requires the categories to be at least of ordinal type. One should be able to rank all categories from the smallest to the largest.

• If your ratings are supplied to AgreeStat 2015.1 in the form of a contingency table, or in the form of a distribution of raters by subject and category, then it is your responsibility to list the categories in ascending order. AgreeStat 2015.1 does not re-rank the categories for those 2 formats.

• If your ratings are supplied to AgreeStat 2015.1 in the form of raw data (2 columns or more), then these ratings will automatically be sorted in ascending order and weighted as shown in Table 5.2.

• The actual values of the ratings do not affect the magnitude of the ordinal weights. Only their ranks do.

Quadratic Weights

The quadratic weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = \begin{cases}1 - \dfrac{(k-l)^2}{\max\limits_{i,j}(i-j)^2}, & \text{if } k \ne l,\\[1ex] 1, & \text{if } k = l.\end{cases} \qquad (5.2.4)$$

The values taken by quadratic weights depend on whether the ratings are of alphabetic or numeric type. If the ratings are alphabetic, then they are numbered sequentially from 1 to the number of categories, and these numbers will be used to create the weights. If the ratings are already numeric, then these rating values are used for calculating the weights.

Again, AgreeStat 2015.1 will automatically sort raw data (i.e. when you provide the ratings assigned by raters to each subject).


For contingency tables and distributions of raters, you will need to supply the categories in ascending order, as you want them to be weighted.

EXAMPLES

Tables 5.3 and 5.4 provide 2 examples that show the extent to which quadratic weights may differ depending on the type of input data. In Table 5.3 the categories are alphabetic and are replaced by the integer values 1, 2, 3, 4, and 5 before using equation 5.2.4. In Table 5.4 the categories are numeric and are used as such in equation 5.2.4.

Table 5.3: Quadratic Weights

        A        B        C        D        E
   A    1        0.9375   0.75     0.4375   0
   B    0.9375   1        0.9375   0.75     0.4375
   C    0.75     0.9375   1        0.9375   0.75
   D    0.4375   0.75     0.9375   1        0.9375
   E    0        0.4375   0.75     0.9375   1

Table 5.4: Quadratic Weights

         1.7      2.2      2.5      3.9      5.7
   1.7   1        0.9844   0.96     0.6975   0
   2.2   0.9844   1        0.9944   0.8194   0.2344
   2.5   0.96     0.9944   1        0.8775   0.36
   3.9   0.6975   0.8194   0.8775   1        0.7975
   5.7   0        0.2344   0.36     0.7975   1
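The following sketch (illustrative only) reproduces Tables 5.3 and 5.4 from equation 5.2.4, showing how the same formula yields different weights for integer ranks and for numeric category values:

```python
import numpy as np

def quadratic_weights(values):
    """Quadratic weights (equation 5.2.4) for the category values in
    `values`: either the integer ranks 1..q for alphabetic categories,
    or the numeric category values themselves."""
    v = np.asarray(values, dtype=float)
    d2 = (v[:, None] - v[None, :]) ** 2
    return 1 - d2 / d2.max()

print(np.round(quadratic_weights([1, 2, 3, 4, 5]), 4))            # Table 5.3
print(np.round(quadratic_weights([1.7, 2.2, 2.5, 3.9, 5.7]), 4))  # Table 5.4
```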

Linear Weights

The linear weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = \begin{cases}1 - \dfrac{|k-l|}{\max\limits_{i,j}|i-j|}, & \text{if } k \ne l,\\[1ex] 1, & \text{if } k = l,\end{cases} \qquad (5.2.5)$$

where | · | represents the absolute value function.

The values taken by linear weights depend on whether the ratings are of alphabetic or numeric type. If the ratings are alphabetic, then they are numbered sequentially from 1 to the number of categories, and these numbers will be used to create the weights. If the ratings are already numeric, then these rating values are used for calculating the weights.

Again, AgreeStat 2015.1 will automatically sort raw data (i.e. when you provide the ratings assigned by raters to each subject).

Radical Weights

The radical weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = \begin{cases}1 - \dfrac{\sqrt{|k-l|}}{\max\limits_{i,j}\sqrt{|i-j|}}, & \text{if } k \ne l,\\[1ex] 1, & \text{if } k = l.\end{cases} \qquad (5.2.6)$$

The values taken by radical weights depend on whether the ratings are of alphabetic or numeric type. If the ratings are alphabetic, then they are numbered sequentially from 1 to the number of categories, and these numbers will be used to create the weights. If the ratings are already numeric, then these rating values are used for calculating the weights.

Again, AgreeStat 2015.1 will automatically sort raw data (i.e. when you provide the ratings assigned by raters to each subject).
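Since the linear and radical weights differ from the quadratic weights only in the distance function used, both can be sketched in a few lines (illustrative only, not AgreeStat 2015.1's implementation):

```python
import numpy as np

def linear_weights(values):
    """Linear weights (equation 5.2.5), from integer ranks or numeric values."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return 1 - d / d.max()

def radical_weights(values):
    """Radical weights (equation 5.2.6), from integer ranks or numeric values."""
    v = np.asarray(values, dtype=float)
    d = np.sqrt(np.abs(v[:, None] - v[None, :]))
    return 1 - d / d.max()
```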

Ratio Weights

The ratio weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = 1 - \bigl[(k-l)/(k+l)\bigr]^2, \qquad (5.2.7)$$

where $k$ and $l$ are 2 categories. Again, the values taken by $k$ and $l$ depend on whether the categories are alphabetic or numeric. For alphabetic categories, $k$ and $l$ take the integer values that are assigned to the categories sequentially from 1 to the number of categories. For numeric categories, $k$ and $l$ take the same values as the categories.


Circular Weights

The circular weights implemented in AgreeStat 2015.1 are defined as follows:

• If the sine function's argument is in degrees,

$$w_{kl} = 1 - \bigl(\sin\bigl[180(k-l)/U\bigr]\bigr)^2, \qquad (5.2.8)$$

• If the sine function's argument is in radians,

$$w_{kl} = 1 - \bigl(\sin\bigl[\pi(k-l)/U\bigr]\bigr)^2, \qquad (5.2.9)$$

where $U = q_{\max} - q_{\min} + 1$. Note that $q_{\max}$ and $q_{\min}$ are the largest and the smallest values on the scoring scale, respectively.

$k$ and $l$ are 2 categories. Again, the values taken by $k$ and $l$ depend on whether the categories are alphabetic or numeric. For alphabetic categories, $k$ and $l$ take the integer values that are assigned to the categories sequentially from 1 to the number of categories. For numeric categories, $k$ and $l$ take the same values as the categories.

Bipolar Weights

The bipolar weights implemented in AgreeStat 2015.1 are defined as follows:

$$w_{kl} = 1 - \frac{(k-l)^2}{(k+l-2q_{\min})(2q_{\max}-k-l)}, \qquad (5.2.10)$$

where $k$ and $l$ are 2 categories. Again, the values taken by $k$ and $l$ depend on whether the categories are alphabetic or numeric. For alphabetic categories, $k$ and $l$ take the integer values that are assigned to the categories sequentially from 1 to the number of categories. For numeric categories, $k$ and $l$ take the same values as the categories.
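The circular and bipolar weights can be sketched in the same way (illustrative only); note that the bipolar formula is indeterminate (0/0) on the diagonal at the two scale endpoints, so the sketch sets the diagonal to 1 explicitly, consistent with the convention that $w_{kk} = 1$:

```python
import numpy as np

def circular_weights(values):
    """Circular weights (equation 5.2.9, sine argument in radians)."""
    v = np.asarray(values, dtype=float)
    u = v.max() - v.min() + 1                      # U = qmax - qmin + 1
    d = v[:, None] - v[None, :]
    return 1 - np.sin(np.pi * d / u) ** 2

def bipolar_weights(values):
    """Bipolar weights (equation 5.2.10); diagonal cells set to 1."""
    v = np.asarray(values, dtype=float)
    qmin, qmax = v.min(), v.max()
    k, l = v[:, None], v[None, :]
    denom = (k + l - 2 * qmin) * (2 * qmax - k - l)
    with np.errstate(divide="ignore", invalid="ignore"):
        w = 1 - (k - l) ** 2 / denom
    np.fill_diagonal(w, 1.0)
    return w
```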

5.3. CUSTOM WEIGHTS


The use of custom weights in AgreeStat 2015.1 is simple. Once you open the custom weight dialog form shown in Figure 5.1, you only need to specify the weights in the cells provided for this purpose.

Figure 5.1: AgreeStat 2015.1's Custom Weight Form

• All diagonal cells are disabled and each of them contains a value of 1. Therefore, you cannot modify these values. There is no fundamental justification for assigning a value other than 1 to a diagonal cell.

• Placing the cursor over a cell will display a tool tip that describes the 2 associated category labels (see Figure 5.1). This feature can be very convenient for entering weights in a large table with many categories.

• The off-diagonal cells can a priori take any value. However, we recommend assigning values between 0 and 1 to these cells. This can be done without loss of generality.


PART II

INTRACLASS CORRELATION COEFFICIENT


CHAPTER 6

INTRACLASS CORRELATION COEFFICIENTS WITH AGREESTAT 2015.1

6.1. ORGANIZING YOUR DATA . . . . . . . . . . . . . . . . . . . . . . . . .84

TWO-RATER ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

ANALYSIS OF THREE RATERS OR MORE . . . . . . . . 86

6.2. DESCRIBING THE EXPERIMENTAL DESIGN . . . . . 88

ANALYSIS OF MEAN RATINGS . . . . . . . . . . . . . . . . . . . . . 91

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


Most researchers using the intraclass correlation coefficient to quantify the extent of agreement among observers are typically interested in one of the coefficients discussed by Shrout and Fleiss (1979) or McGraw and Wong (1996). These coefficients are essentially what is implemented in AgreeStat 2015.1, although the setting is more general in the following sense:

• Shrout & Fleiss (1979), as well as McGraw & Wong (1996), have not considered the use of replicates^a, which represent the number of measurements that each rater takes on each subject. AgreeStat 2015.1 can handle an arbitrary number of replicates. The use of replicates is very common in practice.

• These authors have not considered the important issue of missing values. The handling of missing values in the context of intraclass correlation coefficients requires graduate-level mathematics as described in Searle (1997). AgreeStat 2015.1 does not remove missing values. Instead, it uses the rigorous statistical approach recommended by Searle (1997) and others.

^a Not considering replicates means that they assume each rater produces a single score per subject.

6.1. ORGANIZING YOUR DATA

Let us consider the data of Table 6.1^a, which represent the ratings that 4 judges assigned to 6 targets. Although this particular dataset does not contain any missing values, AgreeStat 2015.1 can handle missing data properly (the mathematical details for doing this are provided in chapter 7).

AgreeStat 2015.1 allows you to compute various intraclass correlation coefficients to evaluate the extent of agreement between 2 judges, among all raters, or among any number of raters of your choice. The first thing to do is to create your own Excel workbook containing all 5 columns of data of Table 6.1, and open AgreeStat 2015.1 in the same Excel instance.

^a These data are from Shrout & Fleiss (1979).

Table 6.1: Four Ratings on 6 Targets

                   Judges
   Target   J1   J2   J3   J4
      1      9    2    5    8
      2      6    1    3    2
      3      8    4    6    8
      4      7    1    2    6
      5     10    5    6    9
      6      6    2    4    7

TWO-RATER ANALYSIS

"Raw Scores" Option

If your goal is to measure the extent of agreement between 2 particular judges, then after launching AgreeStat 2015.1 you must choose the "2 Columns of Raw Scores" radio button as shown in Figure 6.1.

Figure 6.1: Selection of the “raw scores” option for 2-rater analysis

Describe Data

After clicking the "Execute" command button of Figure 6.1, AgreeStat 2015.1 will display the form shown in Figure 6.2. The first thing to do is to choose the "Method of Analysis." In this case, you will select the "Intraclass Correlation Coefficients (ICC)" radio button. Then proceed as follows:

• Select the workbook containing your data, then the worksheet.

• In the 2 combo boxes on the right, you will select separately the first column of subject (or target) names, and then both columns of ratings as shown in Figure 6.2.

Figure 6.2: Describing the ratings of Judges 1 and 2 in AgreeStat 2015.1

ANALYSIS OF THREE RATERS OR MORE

"Raw Scores" Option

If your goal is to measure the extent of agreement among all 4 judges reported in Table 6.1, then after launching AgreeStat 2015.1 you must choose the "3 Raters or More" tab, then select the "Columns of Raw Scores" radio button as shown in Figure 6.3. Only this selection will give you the option to compute the intraclass correlation coefficient.

Figure 6.3: Selecting the “raw scores” option for the 3-rater analysis

Describe Data

After clicking the "Execute" command button of Figure 6.3, AgreeStat 2015.1 will display the form shown in Figure 6.4. The first thing to do is choose the "Method of Analysis." In this case, you will select the "Intraclass Correlation Coefficients (ICC)" radio button. Then you should proceed as follows:

• Select the workbook containing your data from the list of workbooks in the "Select Workbook" combo box (note that your workbook will appear there only if it is opened in the same Excel instance as AgreeStat 2015.1). Then select the correct worksheet from the list box of worksheets.

• In the 2 combo boxes on the right side of the form, you will select separately the first column of subject (or target) names (these are the names in red), and then all 4 columns of ratings as shown in Figure 6.4. Note that AgreeStat 2015.1 will only keep the first 7 characters of the subject name.

Figure 6.4: Describing the ratings of all 4 judges of Table 6.1 in AgreeStat 2015.1


6.2. DESCRIBING THE EXPERIMENTAL DESIGN

From Figure 6.2 (for a 2-judge analysis) or Figure 6.4 (for the 4-judge analysis), you must select the "Options (ICC)" tab (see Figure 6.5) in order to provide further information regarding the way the data should be analyzed. What you describe here is not the outcome of your experiment (i.e. your collected data). Instead, it is the experimental design that you describe. What you must specify is how the raters are expected to score the subjects.

The first thing to do at this stage is to select one of the 3 radio buttons within the "Inter/Intra-rater reliability model" frame on the right side of the form.

▸ The first radio button, labeled "Each target rated by a different set of raters," should be selected if 2 targets or more in Table 6.1 were rated by a different set of 4 judges. In this case, only inter-rater reliability (and not intra-rater reliability) can be calculated. Intra-rater reliability represents the degree of self-consistency and can be calculated only if one judge scores many targets.

Figure 6.5: Experimental Design underlying Table 6.1 Data Representing Individual Ratings

▸ The second radio button, labeled "Each rater rates a different group of targets," should be selected if some of the 4 judges of Table 6.1 have rated different sets of 6 targets. In this case, only intra-rater reliability can be calculated (and not the inter-rater reliability). Inter-rater reliability represents the degree of consistency among raters, which requires the judges to rate the same targets; a condition not guaranteed by this design.

▸ The third radio button, labeled "Each rater rates all targets," should be selected if each of the 4 judges of Table 6.1 has scored all 6 targets. This is known in the jargon of experimental design as a factorial design.

The selection of this radio button will automatically activate the list box below containing 2 items, which allows you to further specify the nature of the rater effect. If the group of judges who produced the ratings was selected from a larger group of judges to which the results must be projected, then you must select the option "Raters selected from a larger population." Otherwise, select the second option entitled "Raters are the only ones of interest." This latter option is selected when you only care about the 4 judges who provided the ratings.

RESULTS

Figure 6.6 shows the results of the analysis done on Table 6.1 data with the specifications of Figure 6.5. The user may experiment with alternative model specifications to discover what the impact would be.

▸ The descriptive statistics allow you to verify that AgreeStat 2015.1 read the input data correctly. The maximum and minimum numbers of replicates are calculated based on the number of times a target name appears in the target column. Hence the need to label target names in this first column properly. The maximum number of characters of a target name cannot exceed 7.

▸ The "Total number of measurements" excludes missing values. This provides another opportunity to verify the integrity of the dataset that was actually analyzed.

▸ The variance components are used in the calculation of the inter-rater and intra-rater reliability coefficients (see chapter 7 for a mathematical description of these coefficients). The intraclass correlation coefficients representing inter-rater and intra-rater reliability are highlighted in yellow.

Figure 6.6: Results of ICC Analysis of Table 6.1 Data as Individual Ratings, with AgreeStat 2015.1


ANALYSIS OF MEAN RATINGS

If the data being analyzed represent averages of a fixed number of measurements^a instead of representing individual ratings, then the calculation of inter-rater and intra-rater reliability coefficients will be affected. To account for this, you should select the checkbox labeled "Mean Rating is the Unit of Analysis" as shown in Figure 6.7. This selection will enable the text box labeled "Number of obs the mean is based on," which allows you to specify the number of observations used to compute the average.

Figure 6.7: Experimental Design underlying Table 6.1 Data, which Represent Mean Ratings

RESULTS

Figure 6.8 shows the results of the analysis done on Table 6.1 data with the specifications of Figure 6.7. It is assumed here that each rating represents the average of 4 measurements. This has the effect of reducing the variation in measurement due to the error, thereby increasing the intraclass correlation coefficient.

^a For example, in Table 6.1, J1, J2, J3, and J4 may each represent a group of 3 raters reporting their average rating for each subject.

Figure 6.8: Results of ICC Analysis of Table 6.1 Data as Average Ratings, with AgreeStat 2015.1


Bibliography

[1] McGraw, K. O., and Wong, S. P. (1996). "Forming Inferences About Some Intraclass Correlation Coefficients," Psychological Methods, 1, 30-46.

[2] Searle, S. R. (1997). Linear Models (Wiley Classics Library), Wiley-Interscience: John Wiley & Sons, Inc.

[3] Shrout, P. E., and Fleiss, J. L. (1979). "Intraclass Correlations: Uses in Assessing Rater Reliability," Psychological Bulletin, 86, 420-428.


CHAPTER 7

INTRACLASS CORRELATION COEFFICIENTS: THE STATISTICAL CALCULATIONS

7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

7.2. Each Target Rated by a Different Set of Raters . . . . . . . . . . . . . . . . . . 96

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

7.3. Each Rater Rates a Different Group of Subjects . . . . . . . . . . . . . . . . .100

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

7.4. Each Rater Rates all Subjects with Rater-Subject Interaction . . 104

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

7.5. Each Rater Rates All Subjects with Rater-Subject Interaction and Fixed Rater Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .109

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

7.6. Each Rater Rates all Subjects without Rater-Subject Interaction 113

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.7. Each Rater Rates all Subjects without Rater-Subject Interaction and Fixed Rater Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .118

The Individual Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

The Mean Rating Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .121


7.1. INTRODUCTION

This section describes the procedures implemented in AgreeStat 2015.1 for calculating the intraclass correlation coefficient under various reliability models. The reliability models considered here are those discussed by Shrout & Fleiss (1979) and by McGraw & Wong (1996). However, these authors confined themselves to the simple situation where there is only one measurement taken per subject and per rater, and where there is no missing value. AgreeStat 2015.1, however, considers the more general and more practical framework where some values are missing, and where $m_{ij}$ measurements are associated with rater $j$ and subject $i$.

AgreeStat 2015.1 also treats the case where each measurement represents the average of a fixed number $m$ of measurements (see Shrout & Fleiss, 1979, or McGraw & Wong, 1996).

• Table 7.1 is an abstract representation of a table of ratings from $r$ raters who have rated $n$ subjects. For example, $y_{ijk}$ represents the $k$th measurement that rater $j$ assigned to subject $i$.

• $m_{ij}$ is the number^a of measurements taken by rater $j$ on subject $i$.

^a Shrout & Fleiss (1979) as well as McGraw & Wong (1996) have considered the special situation where $m_{ij} = 1$. AgreeStat 2015.1, however, can handle the more practical and realistic situation where $m_{ij}$ can take any integer value from 0 on, a value of 0 being an indication that rater $j$ did not rate subject $i$.


Table 7.1: Ratings of $n$ subjects by $r$ raters, with $m$ measurements taken per subject and per rater

                               Rater
   Subject      1       ...      j       ...      r
      1       y_111     ...    y_1j1     ...    y_1r1
      ...      ...      ...     ...      ...     ...
      1      y_11m11    ...   y_1jm1j    ...   y_1rm1r
      ...      ...      ...     ...      ...     ...
      i       y_i11     ...    y_ij1     ...    y_ir1
      ...      ...      ...     ...      ...     ...
      i      y_i1mi1    ...   y_ijmij    ...   y_irmir
      ...      ...      ...     ...      ...     ...
      n       y_n11     ...    y_nj1     ...    y_nr1
      ...      ...      ...     ...      ...     ...
      n      y_n1mn1    ...   y_njmnj    ...   y_nrmnr

7.2. EACH TARGET RATED BY A DIFFERENT SET OF RATERS

7.2.1 THE INDIVIDUAL RATING ANALYSIS

The Model

The rating of each subject by a different group of raters is usually described by the following one-way ANOVA model:

$$y_{ijk} = \mu + t_i + e_{ijk}, \qquad (7.2.1)$$

where $y_{ijk}$ is the $k$th measurement associated with subject $i$ and rater $j$, $\mu$ is the overall population mean of the ratings, $t_i$ is the target effect, and $e_{ijk}$ is the residual term ($i = 1,\ldots,n$, $j = 1,\ldots,r$, $k = 1,\ldots,m_{ij}$). This random error term combines the joint and inseparable effects of rater, rater-target interaction, and random error.

• $t_i$ is a random variable assumed to follow the Normal distribution $\mathcal{N}(0, \sigma_t^2)$.

• $e_{ijk}$ is a random variable assumed to follow the Normal distribution $\mathcal{N}(0, \sigma_e^2)$.

• $e_{ijk}$ and $t_i$ are 2 independent random variables.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.2.1 is defined as $\gamma_r = \mathrm{Corr}(y_{ijk}, y_{ij'k}) = \sigma_t^2/(\sigma_t^2 + \sigma_e^2)$, which represents the correlation coefficient between 2 different raters $j$ and $j'$ for the same subject $i$ and the same measurement $k$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as suggested by Shrout & Fleiss (1979). That is,

$$\mathrm{ICC}(1,1) = \frac{\mathrm{MST} - \mathrm{MSE}}{\mathrm{MST} + (rm - 1)\mathrm{MSE}}, \qquad (7.2.2)$$

where MST and MSE represent the mean of squares due to the target and the error, respectively, and are defined as follows:

$$\mathrm{MST} = \frac{rm}{n-1}\sum_{i=1}^{n}\bigl(\bar{y}_{i\cdot\cdot} - \bar{y}\bigr)^2, \qquad (7.2.3)$$

$$\mathrm{MSE} = \frac{1}{n(rm-1)}\sum_{j=1}^{r}\sum_{i=1}^{n}\sum_{k=1}^{m}\bigl(y_{ijk} - \bar{y}_{i\cdot\cdot}\bigr)^2, \qquad (7.2.4)$$

where $\bar{y}_{i\cdot\cdot}$ and $\bar{y}$ are given by:

$$\bar{y}_{i\cdot\cdot} = \frac{1}{rm}\sum_{j=1}^{r}\sum_{k=1}^{m} y_{ijk}, \qquad \bar{y} = \frac{1}{nrm}\sum_{j=1}^{r}\sum_{i=1}^{n}\sum_{k=1}^{m} y_{ijk}.$$
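As an illustration of equations 7.2.2 through 7.2.4, the sketch below (not AgreeStat 2015.1's own code) computes ICC(1,1) for balanced data stored in an $n \times r \times m$ array, and checks it against the Table 6.1 ratings (one measurement per cell), for which Shrout & Fleiss (1979) report a value of about 0.17:

```python
import numpy as np

def icc_1_1_balanced(y):
    """ICC(1,1) for balanced data: y is an n x r x m array where
    y[i, j, k] is the k-th measurement by rater j on subject i
    (equations 7.2.2 to 7.2.4).  Illustrative sketch."""
    n, r, m = y.shape
    subj = y.mean(axis=(1, 2))                       # subject means
    grand = y.mean()                                 # grand mean
    mst = r * m / (n - 1) * np.sum((subj - grand) ** 2)
    mse = np.sum((y - subj[:, None, None]) ** 2) / (n * (r * m - 1))
    return (mst - mse) / (mst + (r * m - 1) * mse)

# Table 6.1 ratings (6 targets x 4 judges, one measurement per cell)
table_6_1 = np.array([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]], float)
print(round(icc_1_1_balanced(table_6_1[:, :, None]), 2))   # about 0.17
```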

Unbalanced Data^a. Inter-rater reliability in this case is calculated as follows:

$$\mathrm{ICC}(1,1) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_e^2}, \qquad (7.2.5)$$

where $\hat{\sigma}_e^2$ and $\hat{\sigma}_t^2$ are given by,

$$\hat{\sigma}_e^2 = \frac{T_{2y} - T_{2t}}{M - n}, \qquad \hat{\sigma}_t^2 = \frac{T_{2t} - T_y^2/M - (n-1)\hat{\sigma}_e^2}{M - k_4},$$

where $M$, $k_4$, $T_{2y}$, $T_y$, and $T_{2t}$ are defined as,

$$M = \sum_{i=1}^{n}\sum_{j=1}^{r} m_{ij}, \qquad k_4 = \sum_{j=1}^{r}\sum_{i=1}^{n}\frac{m_{ij}^2}{m_{\cdot j}}, \qquad T_{2t} = \sum_{i=1}^{n} y_{i\cdot\cdot}^2/m_{i\cdot},$$

$$T_{2y} = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}^2, \qquad T_y = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}.$$

^a These expressions are from Searle (1997), page 474.
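When ratings are missing, the same coefficient can be obtained from the T-sums defined above. The sketch below (illustrative only; the triplet-based input format and the function name are assumptions, not AgreeStat 2015.1 conventions) follows those expressions directly:

```python
import numpy as np

def icc_1_1_unbalanced(ratings):
    """ICC(1,1) with missing values, following the expressions above.
    `ratings` is a list of (subject, rater, value) triplets; subject
    and rater labels may be any hashable values.  Illustrative sketch."""
    subjects = sorted({s for s, _, _ in ratings})
    raters = sorted({g for _, g, _ in ratings})
    s_idx = {s: i for i, s in enumerate(subjects)}
    r_idx = {g: j for j, g in enumerate(raters)}
    n, r = len(subjects), len(raters)

    m = np.zeros((n, r))                     # m_ij: replicate counts
    y_sum = np.zeros(n)                      # subject totals y_i..
    t2y = ty = 0.0
    for s, g, val in ratings:
        i, j = s_idx[s], r_idx[g]
        m[i, j] += 1
        y_sum[i] += val
        t2y += val ** 2
        ty += val

    big_m = m.sum()
    mi = m.sum(axis=1)                       # m_i.
    mj = m.sum(axis=0)                       # m_.j
    k4 = (m ** 2 / mj).sum()
    t2t = (y_sum ** 2 / mi).sum()

    var_e = (t2y - t2t) / (big_m - n)
    var_t = (t2t - ty ** 2 / big_m - (n - 1) * var_e) / (big_m - k4)
    return var_t / (var_t + var_e)
```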

7.2.2 THE MEAN RATING ANALYSIS

The Model

If the input ratings represent averages of $k$ individual ratings (e.g. $j$ represents a group of $k$ raters), then

$$y_{ij} = \mu + t_i + e_{ij}, \qquad \text{where } e_{ij} = \frac{1}{k}\sum_{l=1}^{k} e_{ijl}, \qquad (7.2.6)$$

$i = 1,\ldots,n$, $j = 1,\ldots,r$; $t_i$ follows the Normal distribution $\mathcal{N}(0, \sigma_t^2)$, $e_{ij}$ follows the Normal distribution $\mathcal{N}(0, \sigma_e^2/k)$, and $t_i$ and $e_{ij}$ are independent.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.2.6 is defined as $\gamma_r = \mathrm{Corr}(y_{ij}, y_{ij'}) = \sigma_t^2/(\sigma_t^2 + \sigma_e^2/k)$, which represents the correlation coefficient between 2 different raters $j$ and $j'$ for the same subject $i$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as suggested by Shrout & Fleiss (1979). That is,

$$\mathrm{ICC}(1,k) = \frac{\mathrm{MST} - \mathrm{MSE}}{\mathrm{MST} + (r/k - 1)\mathrm{MSE}}, \qquad (7.2.7)$$

where MST and MSE are defined as in equations 7.2.3 and 7.2.4, with the exception that $m = 1$ and the subscript $k$ is omitted.

Unbalanced Data. When there are missing data, the inter-rater reliability is given by,

$$\mathrm{ICC}(1,k) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_e^2/k}, \qquad (7.2.8)$$

where $\hat{\sigma}_e^2$ and $\hat{\sigma}_t^2$ are defined as follows:

$$\hat{\sigma}_e^2 = \frac{T_{2y} - T_{2t}}{M - n}, \qquad \hat{\sigma}_t^2 = \frac{T_{2t} - T_y^2/M - (n-1)\hat{\sigma}_e^2}{M - k_4},$$

where $M$, $k_4$, $T_{2y}$, $T_y$, and $T_{2t}$ are defined as,

$$M = \sum_{i=1}^{n} m_i, \qquad k_4 = \frac{1}{M}\sum_{i=1}^{n} m_i^2, \qquad T_{2t} = \sum_{i=1}^{n} y_{i\cdot}^2/m_i,$$

$$T_{2y} = \sum_{i=1}^{n}\sum_{j=1}^{m_i} y_{ij}^2, \qquad T_y = \sum_{i=1}^{n}\sum_{j=1}^{m_i} y_{ij}.$$

$m_i$ represents the number of rating groups that scored subject $i$.

Intra-Rater Reliability

If your reliability study is designed in such a way that the rater effect is not measurable because each rater does not necessarily score all $n$ targets in Table 7.1, then the intra-rater reliability will be $\gamma_a = \mathrm{Corr}(y_{ijk}, y_{i'jk}) = 0$. It means, for all practical purposes, that it is impossible to measure it. A study with a different design will be necessary to accomplish this task.

7.3. EACH RATER RATES A DIFFERENT GROUP OF SUBJECTS

7.3.1 THE INDIVIDUAL RATING ANALYSIS

The Model

The rating by each rater of a different group of subjects is described by the following one-way ANOVA model:

$$y_{ijk} = \mu + r_j + e_{ijk}, \qquad (7.3.9)$$

where $r_j$ is the rater effect, and $e_{ijk}$ is the residual term ($i = 1,\ldots,n$, $j = 1,\ldots,r$, $k = 1,\ldots,m_{ij}$).

• $r_j$ is a random variable assumed to follow the Normal distribution $\mathcal{N}(0, \sigma_r^2)$.

• $e_{ijk}$ is a random variable assumed to follow the Normal distribution $\mathcal{N}(0, \sigma_e^2)$.

• $e_{ijk}$ and $r_j$ are 2 independent random variables.

Inter-Rater Reliability

If your reliability study is designed in such a way that the subject effect is not measurable because each rater rates a different group of subjects, then the inter-rater reliability will be $\gamma_r = \mathrm{Corr}(y_{ijk}, y_{ij'k}) = 0$. It means, for all practical purposes, that it is impossible to measure it.

Intra-Rater Reliability

The intra-rater reliability based on equation 7.3.9 is defined as $\gamma_a = \mathrm{Corr}(y_{ijk}, y_{i'jk}) = \sigma_r^2/(\sigma_r^2 + \sigma_e^2)$, which represents the correlation coefficient between 2 different subjects $i$ and $i'$ for the same rater $j$ and the same measurement $k$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the intra-rater reliability from the data as follows:

$$\mathrm{ICC}_a(1,1) = \frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (nm - 1)\mathrm{MSE}}, \qquad (7.3.10)$$

where MSR and MSE represent the mean of squares due to the rater and the error, respectively, and are defined as follows:

$$\mathrm{MSR} = \frac{nm}{r-1}\sum_{j=1}^{r}\bigl(\bar{y}_{\cdot j\cdot} - \bar{y}\bigr)^2, \qquad (7.3.11)$$

$$\mathrm{MSE} = \frac{1}{r(nm-1)}\sum_{j=1}^{r}\sum_{i=1}^{n}\sum_{k=1}^{m}\bigl(y_{ijk} - \bar{y}_{\cdot j\cdot}\bigr)^2, \qquad (7.3.12)$$

where $\bar{y}_{\cdot j\cdot}$ and $\bar{y}$ are given by:

$$\bar{y}_{\cdot j\cdot} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{k=1}^{m} y_{ijk}, \qquad \bar{y} = \frac{1}{rnm}\sum_{j=1}^{r}\sum_{i=1}^{n}\sum_{k=1}^{m} y_{ijk}.$$

Unbalanced Data. Intra-rater reliability in this case is calculated as follows:

$$\mathrm{ICC}_a(1,1) = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_r^2 + \hat{\sigma}_e^2}, \qquad (7.3.13)$$

where $\hat{\sigma}_e^2$ and $\hat{\sigma}_r^2$ are given by,

$$\hat{\sigma}_e^2 = \frac{T_{2y} - T_{2r}}{M - r}, \qquad \hat{\sigma}_r^2 = \frac{T_{2r} - T_y^2/M - (r-1)\hat{\sigma}_e^2}{M - k_4},$$

where $M$, $k_4$, $T_{2y}$, $T_y$, and $T_{2r}$ are defined as,

$$M = \sum_{i=1}^{n}\sum_{j=1}^{r} m_{ij}, \qquad k_4 = \sum_{i=1}^{n}\frac{1}{m_{i\cdot}}\sum_{j=1}^{r} m_{ij}^2, \qquad T_{2r} = \sum_{j=1}^{r} y_{\cdot j\cdot}^2/m_{\cdot j},$$

$$T_{2y} = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}^2, \qquad T_y = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}.$$

7.3.2 MEAN RATING AS UNIT OF ANALYSIS

The Model

If the input ratings to be analyzed represent averages of $k$ individual ratings from each rater, then the following ANOVA model is used:

$$y_{ij} = \mu + r_j + e_{ij}, \qquad \text{where } e_{ij} = \frac{1}{k}\sum_{l=1}^{k} e_{ijl}, \qquad (7.3.14)$$

where $i = 1,\ldots,n$, $j = 1,\ldots,r$; $r_j$ follows the Normal distribution $\mathcal{N}(0, \sigma_r^2)$, $e_{ij}$ follows the Normal distribution $\mathcal{N}(0, \sigma_e^2/k)$, and $r_j$ and $e_{ij}$ are independent.

The inter-rater reliability under model 7.3.14 is 0, for the same reasons evoked in section 7.3.1.

Intra-Rater Reliability

The intra-rater reliability based on equation 7.3.14 is defined as $\gamma_a = \mathrm{Corr}(y_{ij}, y_{i'j}) = \sigma_r^2/(\sigma_r^2 + \sigma_e^2/k)$, which represents the correlation coefficient between 2 different targets $i$ and $i'$ for the same rater $j$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the intra-rater reliability from the data as suggested by Shrout & Fleiss (1979). That is,

$$\mathrm{ICC}_a(1,k) = \frac{\mathrm{MSR} - \mathrm{MSE}}{\mathrm{MSR} + (r/k - 1)\mathrm{MSE}}, \qquad (7.3.15)$$

where MSR and MSE are defined as in equations 7.3.11 and 7.3.12, with the exception that $m = 1$ and the subscript $k$ is omitted.

Unbalanced Data. When there are missing data, the intra-rater reliability is given by,

$$\mathrm{ICC}_a(1,k) = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_r^2 + \hat{\sigma}_e^2/k}, \qquad (7.3.16)$$

where $\hat{\sigma}_e^2$ and $\hat{\sigma}_r^2$ are defined as follows:

$$\hat{\sigma}_e^2 = \frac{T_{2y} - T_{2r}}{M - r}, \qquad \hat{\sigma}_r^2 = \frac{T_{2r} - T_y^2/M - (r-1)\hat{\sigma}_e^2}{M - k_4},$$

where $M$, $k_4$, $T_{2y}$, $T_y$, and $T_{2r}$ are defined as,

$$M = \sum_{j=1}^{r} m_j, \qquad k_4 = \frac{1}{M}\sum_{j=1}^{r} m_j^2, \qquad T_{2r} = \sum_{j=1}^{r} y_{\cdot j}^2/m_j,$$

$$T_{2y} = \sum_{j=1}^{r}\sum_{i=1}^{m_j} y_{ij}^2, \qquad T_y = \sum_{j=1}^{r}\sum_{i=1}^{m_j} y_{ij}.$$

Note that $m_j$ represents the number of subjects that rater $j$ has rated.


7.4. EACH RATER RATES ALL SUBJECTS WITH RATER-SUBJECT INTERACTION (Subject & Rater Effects are Random)

7.4.1 THE INDIVIDUAL RATING ANALYSIS

The Model

This situation is usually described by the following two-way ANOVA model:

$$y_{ijk} = \mu + t_i + r_j + (rt)_{ij} + e_{ijk}, \qquad (7.4.17)$$

where $t_i$ is subject $i$'s effect, $r_j$ is rater $j$'s effect, and $(rt)_{ij}$ is the interaction effect between subject $i$ and rater $j$ ($i = 1,\ldots,n$, $j = 1,\ldots,r$, $k = 1,\ldots,m_{ij}$).

• $t_i$ is random and follows $\mathcal{N}(0, \sigma_t^2)$.
• $r_j$ is random and follows $\mathcal{N}(0, \sigma_r^2)$.
• $(rt)_{ij}$ is random and follows $\mathcal{N}(0, \sigma_{rt}^2)$.
• $e_{ijk}$ is random and assumed to follow $\mathcal{N}(0, \sigma_e^2)$.
• $e_{ijk}$, $t_i$, and $r_j$ are mutually independent.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.4.17 is defined as $\gamma_r = \mathrm{Corr}(y_{ijk}, y_{ij'k}) = \sigma_t^2/(\sigma_t^2 + \sigma_r^2 + \sigma_{rt}^2 + \sigma_e^2)$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as suggested by Shrout & Fleiss (1979), and McGraw & Wong (1996):

$$\mathrm{ICC}(2,1) = \frac{\mathrm{MST} - \mathrm{MSI}}{\mathrm{MST} + r(\mathrm{MSR} - \mathrm{MSI})/n + (r-1)\mathrm{MSI} + r(m-1)\mathrm{MSE}}, \qquad (7.4.18)$$

where MSR, MST, MSI, and MSE are defined as follows:

$$\mathrm{MSE} = \frac{1}{rn(m-1)}\sum_{j=1}^{r}\sum_{i=1}^{n}\sum_{k=1}^{m}\bigl(y_{ijk} - \bar{y}_{ij\cdot}\bigr)^2, \qquad (7.4.19)$$

$$\mathrm{MSR} = \frac{nm}{r-1}\sum_{j=1}^{r}\bigl(\bar{y}_{\cdot j\cdot} - \bar{y}\bigr)^2, \qquad \mathrm{MST} = \frac{rm}{n-1}\sum_{i=1}^{n}\bigl(\bar{y}_{i\cdot\cdot} - \bar{y}\bigr)^2, \qquad (7.4.20)$$

$$\mathrm{MSI} = \frac{m}{(r-1)(n-1)}\sum_{j=1}^{r}\sum_{i=1}^{n}\bigl(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}\bigr)^2. \qquad (7.4.21)$$

If $m = 1$ then we should set $\mathrm{MSE} = 0$.
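A direct transcription of equations 7.4.18 through 7.4.21 for balanced data is sketched below (illustrative only, not AgreeStat 2015.1's implementation); with one measurement per cell ($m = 1$ and MSE set to 0), applying it to the Table 6.1 ratings gives about 0.29, the ICC(2,1) value reported by Shrout & Fleiss (1979).

```python
import numpy as np

def icc_2_1_balanced(y):
    """ICC(2,1) for balanced data with replicates: y is an n x r x m
    array (equations 7.4.18 to 7.4.21).  When m = 1, MSE is set to 0
    as stated above.  Illustrative sketch."""
    n, r, m = y.shape
    cell = y.mean(axis=2)                       # cell means
    subj = y.mean(axis=(1, 2))                  # subject means
    rater = y.mean(axis=(0, 2))                 # rater means
    grand = y.mean()                            # grand mean

    mst = r * m / (n - 1) * np.sum((subj - grand) ** 2)
    msr = n * m / (r - 1) * np.sum((rater - grand) ** 2)
    msi = m / ((r - 1) * (n - 1)) * np.sum(
        (cell - subj[:, None] - rater[None, :] + grand) ** 2)
    mse = 0.0 if m == 1 else np.sum((y - cell[:, :, None]) ** 2) / (r * n * (m - 1))

    return (mst - msi) / (mst + r * (msr - msi) / n
                          + (r - 1) * msi + r * (m - 1) * mse)
```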

Unbalanced Data. Inter-rater reliability in this case is calculated as follows:

$$\mathrm{ICC}(2,1) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{rt}^2 + \hat{\sigma}_e^2}, \qquad (7.4.22)$$

where $\hat{\sigma}_e^2$, $\hat{\sigma}_t^2$, $\hat{\sigma}_r^2$, and $\hat{\sigma}_{rt}^2$ are defined below,

$$\hat{\sigma}_e^2 = (T_{2y} - T_{2rt})/(M - \lambda_0), \qquad (7.4.23)$$

$$\hat{\sigma}_{rt}^2 = \Bigl\{(M - k'_1)\delta_r + (k_3 - k'_2)\delta_t - \bigl[T_{2t} - T'_{2y} - (n-1)\hat{\sigma}_e^2\bigr]\Bigr\}\Big/\bigl(M - k'_1 - k'_2 + k'_{23}\bigr), \qquad (7.4.24)$$

$$\hat{\sigma}_r^2 = \delta_t - \hat{\sigma}_{rt}^2, \qquad (7.4.25)$$

$$\hat{\sigma}_t^2 = \delta_r - \hat{\sigma}_{rt}^2, \qquad (7.4.26)$$

where $T'_{2y} = T_y^2/M$, $k'_1 = k_1/M$, $k'_2 = k_2/M$, $k'_{23} = k_{23}/M$,

▸ $\lambda_0$ = number of non-empty subject-rater cells $(i,j)$. For all $\lambda_0$ cells, $m_{ij} > 0$.

▸ $\delta_t = \bigl[T_{2rt} - T_{2t} - (\lambda_0 - n)\hat{\sigma}_e^2\bigr]/(M - k_3)$

▸ $\delta_r = \bigl[T_{2rt} - T_{2r} - (\lambda_0 - r)\hat{\sigma}_e^2\bigr]/(M - k_4)$

If $m = 1$, the error and interaction terms are confounded: $\hat{\sigma}_{rt}^2$ should take the value computed for $\hat{\sigma}_e^2$ in equation 7.4.23, and $\hat{\sigma}_e^2$ must be set to 0.

The quantities $M$, $k_1$, $k_2$, $k_3$, $k_4$, $k_{23}$, $T_{2t}$, $T_{2r}$, $T_{2rt}$, $T_{2y}$, and $T_y$ are defined as

$$M = \sum_{i=1}^{n}\sum_{j=1}^{r} m_{ij}, \qquad k_1 = \sum_{i=1}^{n} m_{i\cdot}^2, \qquad k_2 = \sum_{j=1}^{r} m_{\cdot j}^2,$$

$$k_3 = \sum_{i=1}^{n}\sum_{j=1}^{r}\frac{m_{ij}^2}{m_{i\cdot}}, \qquad k_4 = \sum_{j=1}^{r}\sum_{i=1}^{n}\frac{m_{ij}^2}{m_{\cdot j}}, \qquad k_{23} = \sum_{i=1}^{n}\sum_{j=1}^{r} m_{ij}^2,$$

$$T_{2t} = \sum_{i=1}^{n} y_{i\cdot\cdot}^2/m_{i\cdot}, \qquad T_{2r} = \sum_{j=1}^{r} y_{\cdot j\cdot}^2/m_{\cdot j}, \qquad T_{2rt} = \sum_{i=1}^{n}\sum_{j=1}^{r} y_{ij\cdot}^2/m_{ij},$$

$$T_{2y} = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}^2, \qquad T_y = \sum_{i=1}^{n}\sum_{j=1}^{r}\sum_{k=1}^{m_{ij}} y_{ijk}.$$

Intra-Rater Reliability

The intra-rater reliability based on equation 7.4.17 is defined as $\gamma_a = \mathrm{Corr}(y_{ijk}, y_{i'jk}) = \sigma_r^2/(\sigma_t^2 + \sigma_r^2 + \sigma_{rt}^2 + \sigma_e^2)$, which represents the correlation coefficient between 2 subjects $i$ and $i'$ for the same rater $j$ and the same measurement $k$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the intra-rater reliability from the data as follows:

$$\mathrm{ICC}_a(2,1) = \frac{\mathrm{MSR} - \mathrm{MSI}}{\mathrm{MSR} + n(\mathrm{MST} - \mathrm{MSI})/r + (n-1)(\mathrm{MSI} - \mathrm{MSE})}, \qquad (7.4.27)$$

where MSR, MST, MSI, and MSE are given by equations 7.4.20, 7.4.21, and 7.4.22.

Unbalanced Data. When your dataset contains missing values, the intra-rater reliability coefficient is calculated as follows:

$$\mathrm{ICC}_a(2,1) = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_t^2 + \hat{\sigma}_r^2 + \hat{\sigma}_{rt}^2 + \hat{\sigma}_e^2}, \qquad (7.4.28)$$

where $\hat{\sigma}_r^2$, $\hat{\sigma}_t^2$, $\hat{\sigma}_{rt}^2$, and $\hat{\sigma}_e^2$ are given by equations 7.4.23 through 7.4.27.


7.4.2 THE MEAN RATING ANALYSIS

The Model

If the input ratings are averages of $k$ individual ratings (e.g. the mean of $k$ randomly chosen raters), we can use the following ANOVA model:

$$y_{ij} = \mu + t_i + r_j + (rt)_{ij} + e_{ij}, \qquad (7.4.29)$$

where the rater effect $r_j$ may represent the joint effect of the $k$ raters in the group that produced the mean rating.

• $j = 1,\ldots,r$ (it is assumed that $r$ groups of $k$ raters participated in the experiment),
• $r_j$ follows the Normal distribution $\mathcal{N}(0, \sigma_r^2/k)$,
• $(rt)_{ij}$ follows the Normal distribution $\mathcal{N}(0, \sigma_{rt}^2/k)$,
• $e_{ij}$ follows the Normal distribution $\mathcal{N}(0, \sigma_e^2/k)$,
• $t_i$, $r_j$, $(rt)_{ij}$, and $e_{ij}$ are mutually independent.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.4.30 is defined as $\gamma_r = \mathrm{Corr}(y_{ij}, y_{ij'}) = \sigma_t^2/\bigl[\sigma_t^2 + (\sigma_r^2 + \sigma_{rt}^2 + \sigma_e^2)/k\bigr]$.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as suggested by Shrout & Fleiss (1979). That is,

$$\mathrm{ICC}(2,k) = \frac{\mathrm{MST} - \mathrm{MSI}}{\mathrm{MST} + r(\mathrm{MSR} - \mathrm{MSI})/(nk) + (r-k)\mathrm{MSI}/k}, \qquad (7.4.30)$$

where MST, MSR, and MSI are defined as in equations 7.4.20 and 7.4.21, with the exception that $m = 1$.

Unbalanced Data. When there are missing data, the inter-rater reliability is given by,

$$\mathrm{ICC}(2,k) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \bigl(\hat{\sigma}_r^2 + \hat{\sigma}_{rt}^2 + \hat{\sigma}_e^2\bigr)/k}, \qquad (7.4.31)$$

where $\hat{\sigma}_e^2$, $\hat{\sigma}_r^2$, and $\hat{\sigma}_{rt}^2$ are defined by equations 7.4.23 through 7.4.27.

Intra-Rater Reliability

For the mean rating analysis, the intra-rater reliability is defined as

$$\gamma_a = \frac{\sigma_r^2/k}{\sigma_t^2 + (\sigma_r^2 + \sigma_{rt}^2 + \sigma_e^2)/k}. \qquad (7.4.32)$$

Balanced Data.

$$\mathrm{ICC}_a(2,k) = \frac{\mathrm{MSR} - \mathrm{MSI}}{\mathrm{MSR} + (n-1)\mathrm{MSI} + nk(\mathrm{MST} - \mathrm{MSI})/r}. \qquad (7.4.33)$$

Unbalanced Data.

$$\mathrm{ICC}_a(2,k) = \frac{\hat{\sigma}_r^2/k}{\hat{\sigma}_t^2 + (\hat{\sigma}_r^2 + \hat{\sigma}_{rt}^2 + \hat{\sigma}_e^2)/k}, \qquad (7.4.34)$$

where $\hat{\sigma}_e^2$, $\hat{\sigma}_r^2$, $\hat{\sigma}_t^2$, and $\hat{\sigma}_{rt}^2$ are defined by equations 7.4.23 through 7.4.27.


7.5. EACH RATER RATES ALL SUBJECTS WITH RATER-SUBJECT INTERACTION (Random Subject Factor & Fixed Rater Factor)

7.5.1 THE INDIVIDUAL RATING ANALYSIS

The Model

This situation is usually described by the following mixed-effect two-way ANOVA model:

$$y_{ijk} = \mu + t_i + r_j + (rt)_{ij} + e_{ijk}, \qquad (7.5.35)$$

where $t_i$ is the random effect of subject $i$, $r_j$ the fixed effect of rater $j$, $(rt)_{ij}$ the random interaction effect between subject $i$ and rater $j$, and $e_{ijk}$ is the residual term ($i = 1,\ldots,n$, $j = 1,\ldots,r$, $k = 1,\ldots,m_{ij}$).

• $r_j$ and $(rt)_{ij}$ satisfy the following conditions:

$$\sum_{j=1}^{r} r_j = 0, \qquad \sum_{j=1}^{r} (rt)_{ij} = 0, \qquad (rt)_{ij} \sim \mathcal{N}(0, \sigma_{rt}^2).$$

• $t_i$ is a random variable that follows the Normal distribution $\mathcal{N}(0, \sigma_t^2)$.

• $e_{ijk}$ is a random variable assumed to follow the Normal distribution $\mathcal{N}(0, \sigma_e^2)$.

• $e_{ijk}$, $(rt)_{ij}$, and $t_i$ are mutually independent random variables.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.5.17 is defined as $\gamma_r = \mathrm{Corr}(y_{ijk}, y_{ij'k}) = \bigl(\sigma_t^2 - \sigma_{rt}^2/(r-1)\bigr)/\bigl(\sigma_t^2 + \sigma_{rt}^2 + \sigma_e^2\bigr)$, which represents the correlation coefficient between 2 raters $j$ and $j'$ for the same subject $i$ and the same measurement $k$.

Balanced Data. When there is no missing data and $m \ge 2$, AgreeStat 2015.1 calculates the inter-rater reliability from the data as suggested by Shrout & Fleiss (1979)^a, and McGraw & Wong (1996):

$$\mathrm{ICC}(3,1) = \frac{(\mathrm{MST} - \mathrm{MSI}) - (\mathrm{MSI} - \mathrm{MSE})/(r-1)}{\mathrm{MST} + r(\mathrm{MSI} - \mathrm{MSE}) + (rm-1)\mathrm{MSE}}, \qquad (7.5.36)$$

where MST, MSI, and MSE represent respectively the mean of squares due to the target, the rater-target interaction, and the error, and are defined by equations 7.3.19, 7.3.20, and 7.3.21. If $m = 1$ then assume a model with no interaction (see section 7.7 for further details).

Unbalanced Data. Inter-rater reliability in this case is calculated as follows:

$$\mathrm{ICC}(3,1) = \frac{\hat{\sigma}_t^2 - \hat{\sigma}_{rt}^2/(r-1)}{\hat{\sigma}_t^2 + \hat{\sigma}_{rt}^2 + \hat{\sigma}_e^2}. \qquad (7.5.37)$$

If $m > 1$ then $\hat{\sigma}_e^2$, $\hat{\sigma}_t^2$, and $\hat{\sigma}_{rt}^2$ are defined as follows:

$$\hat{\sigma}_e^2 = (T_{2y} - T_{2rt})/(M - \lambda_0), \qquad (7.5.38)$$

$$\hat{\sigma}_{rt}^2 = \bigl[T_{2rt} - RSS - (\lambda_0 - n - r + 1)\hat{\sigma}_e^2\bigr]/h_6, \qquad (7.5.39)$$

$$\hat{\sigma}_t^2 = \bigl[T_{2rt} - T_{2r} - (\lambda_0 - r)\hat{\sigma}_e^2\bigr]/(M - k_4) - (r-1)\hat{\sigma}_{rt}^2/r, \qquad (7.5.40)$$

where $k_4 = \sum_{j=1}^{r}\sum_{i=1}^{n} m_{ij}^2/m_{\cdot j}$. The expressions for $RSS$ and $h_6$ bear some complexity and are defined below.

^a Our equation is actually more general, since Shrout & Fleiss only considered the case of one replicate per rater (i.e. $m = 1$).

▸ Define the $(r-1)\times(r-1)$ matrices $F_i = (f^{(i)}_{jj'})$ for $i = 1,\ldots,n$ and $j, j' = 1,\ldots,r-1$. The element $f^{(i)}_{jj'}$ is defined as follows:

$$f^{(i)}_{jj'} = \begin{cases}(m_{ij}^2/m_{i\cdot})(\lambda_i + m_{i\cdot} - 2m_{ij}), & \text{if } j = j',\\ (m_{ij}m_{ij'}/m_{i\cdot})(\lambda_i - m_{ij} - m_{ij'}), & \text{if } j \ne j',\end{cases}$$

where $\lambda_i$ is given by:

$$\lambda_i = \sum_{j=1}^{r} m_{ij}^2/m_{i\cdot}.$$

▸ Define the $(r-1)\times(r-1)$ matrix $C = (c_{jj'})$ for $j, j' = 1,\ldots,r-1$:

$$c_{jj'} = \begin{cases}m_{\cdot j} - \displaystyle\sum_{i=1}^{n}\frac{m_{ij}^2}{m_{i\cdot}}, & \text{if } j = j',\\[2ex] -\displaystyle\sum_{i=1}^{n}\frac{m_{ij}m_{ij'}}{m_{i\cdot}}, & \text{if } j \ne j'.\end{cases}$$

▸ Define the $r$-dimensional vector $b = (b_j)$, where $b_j$ is defined by,

$$b_j = y_{\cdot j\cdot} - \sum_{i=1}^{n} m_{ij}\,y_{i\cdot\cdot}.$$

▸ $RSS = T_{2t} + b^{\top}C^{-1}b$.

▸ $h_6 = M - k^{*}$, where $k^{*}$ is given by,

$$k^{*} = \sum_{i=1}^{n}\lambda_i + \mathrm{tr}\Bigl(C^{-1}\sum_{i=1}^{n} F_i\Bigr).$$

Intra-Rater Reliability

The intra-rater reliability based on equation 7.5.36 is given by $\gamma_a = \mathrm{Corr}(y_{ijk}, y_{i'jk}) = 0$, which represents the correlation coefficient between 2 subjects $i$ and $i'$ for the same rater $j$ and the same measurement $k$. That is, when the rater factor is fixed, the intra-rater reliability cannot be calculated.

7.5.2 THE MEAN RATING ANALYSIS

The Model For mean ratings, we will use the following ANOVA model:

yij = µ+ ti + rj + (rt)ij + eij, (7.5.41)

where ti is the random subject effect, and rj the joint effect ofk raters in group j.

• ti follows N (0, σ2t ), (rt)ij follows N (0, σ2

rt/k),

• eij followsN (0, σ2e/k), and ti, (rt)ij, and eij, are mutually

independent,

•r∑j=1

rj = 0, andr∑j=1

(rt)ij = 0.

Inter-RaterReliability

The inter-rater reliability based on equation 7.5.42 is definedas γr = Corr(yij, yij′) = (σ2

t−σ2rt/(k(r−1)))/

[σ2t +(σ2

rt+σ2e)/k

].

It is impossible to estimate this coefficient if interaction ispresent (i.e. σ2

rt > 0). However, if the interaction is absent,then the coefficient takes the form γr = σ2

t /(σ2t + σ2

e/k).

Balanced Data (no interaction)If there is no missing data, AgreeStat 2015.1 calculates inter-rater reliability from data as suggested by Shrout & Fleiss(1979). That is,

ICC(3,k) =MST−MSE

MST + (r − 1)MSE(7.5.42)

where, MST, and MSE are defined as in equations 7.5.20 and7.5.21 with the exception that m = 1.


Unbalanced Data (no interaction)

When there are missing data, inter-rater reliability is given by

ICC(3,k) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_e^2/k},   (7.5.43)

where σ̂e² and σ̂t² are defined by equations 7.5.38 and 7.5.40. Note that model 7.5.41 is treated as a model without interaction, since the error and interaction terms are confounded.

Intra-Rater Reliability

The intra-rater reliability for this model is 0. This model is inappropriate for computing intra-rater reliability because the rater is fixed.

7.6. EACH RATER RATES ALL SUBJECTS WITHOUT RATER-SUBJECT INTERACTION (Subject and Rater as Random Effects)

7.6.1 THE INDIVIDUAL RATING ANALYSIS

The Model
This situation is usually described by the following two-way ANOVA model:

y_{ijk} = \mu + t_i + r_j + e_{ijk},   (7.6.44)

• t_i is a random variable that follows the Normal distribution N(0, σt²).
• r_j is a random variable that follows the Normal distribution N(0, σr²).
• e_{ijk} is a random variable assumed to follow the Normal distribution N(0, σe²).
• e_{ijk}, t_i, and r_j are mutually independent random variables.


Inter-Rater Reliability

The inter-rater reliability based on equation 7.6.44 is defined as \gamma_r = Corr(y_{ijk}, y_{ij'k}) = \sigma_t^2/(\sigma_t^2 + \sigma_r^2 + \sigma_e^2), which represents the correlation coefficient between two raters j and j' for the same subject i and the same measurement k.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as follows:

ICC(2,1) = \frac{MST - MSE}{MST + r(MSR - MSE)/n + (rm - 1)MSE},   (7.6.45)

where MSR, MST, and MSE represent respectively the mean squares due to the rater, the target, and the error. These mean squares are defined as follows:

MSE = \frac{1}{rnm - r - n + 1} \sum_{i,j,k} (y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y})^2,   (7.6.46)

MSR = \frac{nm}{r - 1} \sum_{j=1}^{r} (\bar{y}_{\cdot j\cdot} - \bar{y})^2,   (7.6.47)

MST = \frac{rm}{n - 1} \sum_{i=1}^{n} (\bar{y}_{i\cdot\cdot} - \bar{y})^2.   (7.6.48)
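The whole balanced-data computation of equations 7.6.45 through 7.6.48 fits in a short Python sketch (NumPy assumed); the function name and the layout of the ratings array are our own conventions for illustration, not AgreeStat's:

```python
import numpy as np

def icc2_1_balanced(y):
    """Equations 7.6.45-7.6.48 for a complete n x r x m array of ratings.

    y[i, j, k] is the k-th replicate rating of subject i by rater j.
    """
    y = np.asarray(y, dtype=float)
    n, r, m = y.shape
    grand = y.mean()
    yi = y.mean(axis=(1, 2))   # subject means  y-bar_i..
    yj = y.mean(axis=(0, 2))   # rater means    y-bar_.j.

    mse = ((y - yi[:, None, None] - yj[None, :, None] + grand) ** 2).sum() \
          / (r * n * m - r - n + 1)                        # eq. 7.6.46
    msr = n * m * ((yj - grand) ** 2).sum() / (r - 1)      # eq. 7.6.47
    mst = r * m * ((yi - grand) ** 2).sum() / (n - 1)      # eq. 7.6.48

    return (mst - mse) / (mst + r * (msr - mse) / n + (r * m - 1) * mse)   # eq. 7.6.45
```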

Unbalanced Data. Inter-rater reliability in this case is calculated as follows:

ICC(2,1) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_r^2 + \hat{\sigma}_e^2},   (7.6.49)

where σ̂e², σ̂t², and σ̂r² are defined as follows:

\hat{\sigma}_e^2 = \frac{\lambda_2(T_{2y} - T_{2t}) + \lambda_1(T_{2y} - T_{2r}) - (T_{2y} - T'_{2y})}{\lambda_2(M - n) + \lambda_1(M - r) - (M - 1)},   (7.6.50)

\hat{\sigma}_t^2 = [T_{2y} - T_{2r} - (M - r)\hat{\sigma}_e^2]/(M - k_4),   (7.6.51)

\hat{\sigma}_r^2 = [T_{2y} - T_{2t} - (M - n)\hat{\sigma}_e^2]/(M - k_3),   (7.6.52)


\lambda_1 = (M - k'_1)/(M - k_4),   (7.6.53)

\lambda_2 = (M - k'_2)/(M - k_3),   (7.6.54)

k'_1 = k_1/M, where k_1 = \sum_{i=1}^{n} m_{i\cdot}^2,   (7.6.55)

k'_2 = k_2/M, where k_2 = \sum_{j=1}^{r} m_{\cdot j}^2,   (7.6.56)

k_3 = \sum_{i=1}^{n} \sum_{j=1}^{r} m_{ij}^2/m_{i\cdot},   k_4 = \sum_{i=1}^{n} \sum_{j=1}^{r} m_{ij}^2/m_{\cdot j}.   (7.6.57)
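The count-based quantities of equations 7.6.53 through 7.6.57 depend only on the table of replicate counts m_ij, so they are easy to compute directly. The sketch below is ours (NumPy assumed, and every subject and every rater is assumed to have at least one rating); the T-statistics appearing in equations 7.6.50 through 7.6.52 are defined earlier in the chapter and are not reproduced here:

```python
import numpy as np

def unbalanced_count_terms(m):
    """Equations 7.6.53-7.6.57 from an (n x r) array of replicate counts m_ij."""
    m = np.asarray(m, dtype=float)
    M = m.sum()
    mi_dot = m.sum(axis=1)       # m_i.
    m_dot_j = m.sum(axis=0)      # m_.j

    k1p = (mi_dot ** 2).sum() / M                   # eq. 7.6.55
    k2p = (m_dot_j ** 2).sum() / M                  # eq. 7.6.56
    k3 = (m ** 2 / mi_dot[:, None]).sum()           # eq. 7.6.57
    k4 = (m ** 2 / m_dot_j[None, :]).sum()          # eq. 7.6.57
    lam1 = (M - k1p) / (M - k4)                     # eq. 7.6.53
    lam2 = (M - k2p) / (M - k3)                     # eq. 7.6.54
    return k3, k4, lam1, lam2
```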

Intra-Rater Reliability

The intra-rater reliability coefficient is given by

\gamma_a = Corr(y_{ijk}, y_{i'jk}) = \frac{\sigma_r^2}{\sigma_t^2 + \sigma_r^2 + \sigma_e^2}.   (7.6.58)

Balanced Data
The intra-rater reliability coefficient is calculated as follows:

ICCa(2,1) = \frac{MSR - MSE}{MSR + n(MST - MSE)/r + (nm - 1)MSE},   (7.6.59)

where MST, MSR, and MSE are defined by equations 7.6.48, 7.6.47, and 7.6.46 respectively.
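As with the inter-rater case, equation 7.6.59 is simple enough to verify with a small helper (our own sketch; the mean squares are assumed to have been computed already per equations 7.6.46 through 7.6.48):

```python
def icc_a_2_1_balanced(mst, msr, mse, n, r, m):
    """Equation 7.6.59: intra-rater ICCa(2,1) for balanced data."""
    return (msr - mse) / (msr + n * (mst - mse) / r + (n * m - 1) * mse)
```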

Unbalanced Data
The intra-rater reliability coefficient is calculated as follows:

ICCa(2,1) = \frac{\hat{\sigma}_r^2}{\hat{\sigma}_r^2 + \hat{\sigma}_t^2 + \hat{\sigma}_e^2}.   (7.6.60)


7.6.2 THE MEAN RATING ANALYSIS

The Model
For mean ratings, the following ANOVA model could be used:

y_{ij} = \mu + t_i + r_j + e_{ij},   (7.6.61)

• r_j follows the Normal distribution N(0, σr²/k),
• e_{ij} follows the Normal distribution N(0, σe²/k),
• t_i, r_j, and e_{ij} are mutually independent.

Inter-Rater Reliability

For mean ratings, the inter-rater reliability coefficient is given by

\gamma_r = Corr(y_{ij}, y_{ij'}) = \frac{\sigma_t^2}{\sigma_t^2 + (\sigma_r^2 + \sigma_e^2)/k}.   (7.6.62)

Balanced Data
The inter-rater reliability coefficient is calculated as follows:

ICC(2,k) = \frac{MST - MSE}{MST + r(MSR - MSE)/(nk) + (r/k - 1)MSE},   (7.6.63)

where MST, MSR, and MSE are defined as follows:

MST = \frac{r}{n - 1} \sum_{i=1}^{n} (\bar{y}_{i\cdot} - \bar{y})^2,   (7.6.64)

MSR = \frac{n}{r - 1} \sum_{j=1}^{r} (\bar{y}_{\cdot j} - \bar{y})^2,   (7.6.65)


MSE = \frac{1}{(n - 1)(r - 1)} \sum_{i=1}^{n} \sum_{j=1}^{r} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2.   (7.6.66)
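A short Python sketch of equations 7.6.63 through 7.6.66 follows (NumPy assumed; the naming and data layout are our own illustrative conventions):

```python
import numpy as np

def icc2_k_balanced(y, k):
    """Equations 7.6.63-7.6.66 for an n x r table of mean ratings.

    y[i, j] is the mean of k individual ratings of subject i by rater group j.
    """
    y = np.asarray(y, dtype=float)
    n, r = y.shape
    grand = y.mean()
    yi = y.mean(axis=1)
    yj = y.mean(axis=0)

    mst = r * ((yi - grand) ** 2).sum() / (n - 1)                                       # eq. 7.6.64
    msr = n * ((yj - grand) ** 2).sum() / (r - 1)                                       # eq. 7.6.65
    mse = ((y - yi[:, None] - yj[None, :] + grand) ** 2).sum() / ((n - 1) * (r - 1))    # eq. 7.6.66

    return (mst - mse) / (mst + r * (msr - mse) / (n * k) + (r / k - 1) * mse)          # eq. 7.6.63
```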

Unbalanced Data
The inter-rater reliability coefficient is calculated as follows:

ICC(2,k) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + (\hat{\sigma}_r^2 + \hat{\sigma}_e^2)/k},   (7.6.67)

where σ̂t², σ̂r², and σ̂e² are defined by equations 7.6.51, 7.6.52, and 7.6.50 respectively.

Intra-Rater Reliability

The intra-rater reliability coefficient for mean ratings is given by

\gamma_a^{*} = Corr(y_{ij}, y_{i'j}) = \frac{\sigma_r^2/k}{\sigma_t^2 + (\sigma_r^2 + \sigma_e^2)/k}.   (7.6.68)

Balanced Data
The intra-rater reliability coefficient is calculated as follows:

ICCa(2,k) = \frac{MSR - MSE}{MSR + nk(MST - MSE)/r + (n - 1)MSE},   (7.6.69)

where MST, MSR, and MSE are defined by equations 7.6.64, 7.6.65, and 7.6.66 respectively.
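Equation 7.6.69 can again be checked with a one-line helper (our own sketch; the mean squares are assumed to be available, e.g. computed per equations 7.6.64 through 7.6.66):

```python
def icc_a_2_k_balanced(mst, msr, mse, n, r, k):
    """Equation 7.6.69: intra-rater ICCa(2,k) for balanced mean-rating data."""
    return (msr - mse) / (msr + n * k * (mst - mse) / r + (n - 1) * mse)
```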

Unbalanced Data
The intra-rater reliability coefficient is calculated as follows:

ICCa(2,k) = \frac{\hat{\sigma}_r^2/k}{\hat{\sigma}_r^2 + (\hat{\sigma}_t^2 + \hat{\sigma}_e^2)/k},   (7.6.70)

where σ̂t², σ̂r², and σ̂e² are defined by equations 7.6.51, 7.6.52, and 7.6.50 respectively.


7.7. EACH RATER RATES ALL SUBJECTS WITHOUT RATER-SUBJECT INTERACTION (Random Subject Effect & Fixed Rater Effect)

7.7.1 THE INDIVIDUAL RATING ANALYSIS

The Model
This situation is usually described by the following two-way ANOVA model:

y_{ijk} = \mu + t_i + r_j + e_{ijk},   (7.7.71)

• t_i is a random variable that follows the Normal distribution N(0, σt²).
• r_j is a fixed rater effect that satisfies r_1 + ... + r_r = 0.
• e_{ijk} is a random variable assumed to follow the Normal distribution N(0, σe²).
• e_{ijk} and t_i are independent random variables.

Inter-Rater Reliability

The inter-rater reliability based on equation 7.7.71 is defined as \gamma_r = Corr(y_{ijk}, y_{ij'k}) = \sigma_t^2/(\sigma_t^2 + \sigma_e^2), which represents the correlation coefficient between two raters j and j' for the same subject i and the same measurement k.

Balanced Data. When there is no missing data, AgreeStat 2015.1 calculates the inter-rater reliability from the data as follows:

ICC(3,1) = \frac{MST - MSE}{MST + (rm - 1)MSE},   (7.7.72)

where MST and MSE represent respectively the mean squares due to the target and the error. These mean squares are defined by equations 7.6.48 and 7.6.46 respectively.
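Equation 7.7.72 only involves MST and MSE, so a minimal helper suffices (our own sketch, with the mean squares computed as in equations 7.6.46 and 7.6.48):

```python
def icc3_1_no_interaction_balanced(mst, mse, r, m):
    """Equation 7.7.72: ICC(3,1), fixed raters, no rater-subject interaction."""
    return (mst - mse) / (mst + (r * m - 1) * mse)
```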


Unbalanced Data. Inter-rater reliability in this case is calculated as follows:

ICC(3,1) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_e^2},   (7.7.73)

where σ̂e² and σ̂t² are defined as follows:

\hat{\sigma}_e^2 = (T_{2y} - RSS)/(M - n - r + 1),   (7.7.74)

\hat{\sigma}_t^2 = [R - T_{2r} - (n - 1)\hat{\sigma}_e^2]/h_7,   (7.7.75)

where h_7 = M - k_4, while k_4 and RSS are defined in the paragraphs following equation 7.5.40.

Intra-Rater Reliability

The intra-rater reliability cannot be calculated for mixed models where the rater effect is fixed. It is consistently equal to 0.

7.7.2 THE MEAN RATING ANALYSIS

The Model
If the input ratings to be analyzed represent averages of k individual ratings (e.g., the mean rating from k randomly chosen raters), then the following ANOVA model could be used:

y_{ij} = \mu + t_i + r_j + e_{ij},   (7.7.76)

where the rater effect r_j, for example, may represent the joint effect of the j-th group of k raters that produced the mean rating.

• i = 1, ..., n, and j = 1, ..., r (it is assumed that r separate groups of k raters each participate in the reliability experiment),


• r_j represents the fixed effect of group j,
• e_{ij} follows the Normal distribution N(0, σe²/k),
• t_i and e_{ij} are mutually independent.

Inter-Rater Reliability

For mean ratings, the inter-rater reliability coefficient is given by

\gamma_r = Corr(y_{ij}, y_{ij'}) = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_e^2/k}.   (7.7.77)

Balanced Data
The inter-rater reliability coefficient is calculated as follows:

ICC(3,k) = \frac{MST - MSE}{MST + (r/k - 1)MSE},   (7.7.78)

where MST and MSE are defined as follows:

MST = \frac{r}{n - 1} \sum_{i=1}^{n} (\bar{y}_{i\cdot} - \bar{y})^2,   (7.7.79)

MSE = \frac{1}{(n - 1)(r - 1)} \sum_{i=1}^{n} \sum_{j=1}^{r} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2.   (7.7.80)
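A compact Python sketch of equations 7.7.78 through 7.7.80 follows (NumPy assumed; the naming and data layout are ours, for illustration only):

```python
import numpy as np

def icc3_k_balanced_from_table(y, k):
    """Equations 7.7.78-7.7.80 for an n x r table of mean ratings y (fixed raters)."""
    y = np.asarray(y, dtype=float)
    n, r = y.shape
    grand = y.mean()
    yi = y.mean(axis=1)
    yj = y.mean(axis=0)

    mst = r * ((yi - grand) ** 2).sum() / (n - 1)                                       # eq. 7.7.79
    mse = ((y - yi[:, None] - yj[None, :] + grand) ** 2).sum() / ((n - 1) * (r - 1))    # eq. 7.7.80
    return (mst - mse) / (mst + (r / k - 1) * mse)                                      # eq. 7.7.78
```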

Unbalanced Data
The inter-rater reliability coefficient is calculated as follows:

ICC(3,k) = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_e^2/k}.   (7.7.81)

Intra-Rater Reliability

The intra-rater reliability cannot be calculated for mixed models where the rater effect is fixed. It is consistently equal to 0.


Bibliography

[1] McGraw, K. O., and Wong, S. P. (1996), "Forming Inferences About Some Intraclass Correlation Coefficients," Psychological Methods, 1, 30-46.

[2] Shrout, P. E., and Fleiss, J. L. (1979), "Intraclass Correlations: Uses in Assessing Rater Reliability," Psychological Bulletin, 86, 420-428.

[3] Searle, S. R. (1997), Linear Models (Wiley Classics Library), Wiley-Interscience: John Wiley & Sons, Inc.


Subject Index