A Bifactor Model of Burnout? An Item Response Theory Analysis of the Maslach Burnout Inventory - Human Services Survey

Wright State University

CORE Scholar

2016

David Andrew Periard, Wright State University

Part of the Industrial and Organizational Psychology Commons

Repository Citation: Periard, David Andrew, "A Bifactor Model of Burnout? An Item Response Theory Analysis of the Maslach Burnout Inventory - Human Services Survey" (2016). Browse all Theses and Dissertations. 1534. https://corescholar.libraries.wright.edu/etd_all/1534


A BIFACTOR MODEL OF BURNOUT? AN ITEM RESPONSE THEORY ANALYSIS OF THE MASLACH BURNOUT INVENTORY – HUMAN SERVICES SURVEY

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

By

DAVID PERIARD

M.S., Wright State University, 2014

B.S., Le Moyne College, 2008

2016

Wright State University

WRIGHT STATE UNIVERSITY

GRADUATE SCHOOL

JUNE 9, 2016

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY David Periard ENTITLED A Bifactor Model of Burnout? An Item Response Theory Analysis of the Maslach Burnout Inventory – Human Services Survey BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Gary Burns, Ph.D.

Dissertation Director

Scott Watamaniuk, Ph.D.

Graduate Program Director

Debra Steele-Johnson, Ph.D.

Chair, Department of Psychology

Final Examination

David LaHuis, Ph.D.

Joseph Houpt, Ph.D.

Nathan Bowling, Ph.D.

Robert E.W. Fyffe, Ph.D.

Vice President for Research and

Dean, Graduate School


ABSTRACT

Periard, David. Ph.D., Industrial/Organizational Psychology and Human Factors, Wright State University, 2016. A Bifactor Model of Burnout? An Item Response Theory Analysis of the Maslach Burnout Inventory – Human Services Survey.

Burnout is a syndrome that results from chronic stress. It is composed of emotional exhaustion, depersonalization, and reduced personal accomplishment. The Maslach Burnout Inventory – Human Services Survey (MBI-HSS; Maslach, Jackson, & Leiter, 1996) is the most popular measure of burnout. Unfortunately, the MBI-HSS has flaws, including highly correlated traits and low subscale reliabilities. Using item response theory, I tested a bifactor model for the MBI-HSS based on the work of Mészáros, Ádám, Svabó, Szigeti, and Urbán (2014). Bifactor models specify a general factor that underlies all the items within a scale and specific factors that underlie the subscale items, with all factors orthogonal. I found that the bifactor model fit better than the traditional correlated traits model. I also introduce a method for decomposing item and test information in multidimensional item response theory, along with a new method of displaying test information. Finally, I recommend that only the general burnout dimension of the MBI-HSS be reported, because the subscales are unreliable.


TABLE OF CONTENTS

I. INTRODUCTION
  Burnout
    History of Burnout
    Underlying Processes of Burnout
    Outcomes of Burnout
    Structure of Burnout
  Bifactor Models
    Benefits of Bifactor Models
  Item Response Theory and Standard Errors
    Item Parameters
      Item Discrimination
        Directional Discrimination
      Item Difficulty
    Information
      Item Information
      Test Information
  Present Research
II. METHOD
  Participants
  Measures
  Analyses
  Software
III. RESULTS
  Descriptive Statistics and Inter-Item Correlations
  Model Comparisons
  Item Fit
  IRT Parameters
  Item Information
  Test Information
    General Burnout
    Depersonalization
    Emotional Exhaustion
    Personal Accomplishment
  Rodriguez, Reise, and Haviland (2015a) Analyses
    Explained Common Variance
    Percent Uncontaminated Correlations
    Omega Coefficients
      Omega
      Omega Hierarchical
      Omega Subscale
      Omega Hierarchical Subscale
    Construct Reliability
  Supplemental Analyses
    Supplemental Analyses Results
IV. DISCUSSION
  Scoring Recommendations
  Summary
    Importance of this Study
  Strengths and Limitations
  Future Directions
  Conclusion
V. REFERENCES

LIST OF TABLES

1. Descriptive statistics for the MBI-HSS
2. Model fit comparisons for unidimensional, correlated traits, and bifactor models using the entire MBI-HSS
3. Mean |SRC| values for the MBI-HSS items
4. Raw Bifactor Graded Response Model Parameters for the MBI-HSS items
5. Converted Item Parameters for the MBI-HSS
6. Directional Discriminations for the items of the MBI-HSS
7. Standardized Factor Loadings for the items of the MBI-HSS
8. Results from the Rodriguez, Reise, and Haviland (2015) Analyses
9. Supplemental Analyses: Descriptive Statistics for the VA 360-Degree Feedback Instrument
10. Supplemental Analyses: Sample Characteristics of the VA 360-Degree feedback sample
11. Supplemental Analyses: Impact of Burnout on Communication Competency Ratings
12. Supplemental Analyses: Impact of Burnout on Interpersonal Effectiveness Competency Ratings
13. Supplemental Analyses: Impact of Burnout on Critical Thinking Competency Ratings
14. Supplemental Analyses: Impact of Burnout on Organizational Stewardship Competency Ratings
15. Supplemental Analyses: Impact of Burnout on Veteran and Customer Focus Competency Ratings
16. Supplemental Analyses: Impact of Burnout on Personal Mastery Competency Ratings
17. Supplemental Analyses: Impact of Burnout on Leading People Competency Ratings
18. Supplemental Analyses: Impact of Burnout on Building Coalitions Competency Ratings
19. Supplemental Analyses: Impact of Burnout on Leading Change Competency Ratings
20. Supplemental Analyses: Impact of Burnout on Results Driven Competency Ratings
21. Supplemental Analyses: Impact of Burnout on Global Perspective Competency Ratings
22. Supplemental Analyses: Impact of Burnout on Business Acumen Competency Ratings

LIST OF FIGURES

1. Examples of Different Possible Structures of Burnout
2. Corrgram of the Correlations Between the Items of the MBI-HSS
3. DP1 Item Information Clamshell Plot
8. EE1 Item Information Clamshell Plot
11. EE4 Item Information Clamshell Plot
17. PA1 Item Information Clamshell Plot
25. Depersonalization Test Information Clamshell Plot
26. Emotional Exhaustion Test Information Clamshell Plot
27. Personal Accomplishment Test Information Clamshell Plot
28. General Burnout Information Provided by the Depersonalization Subscale
29. Depersonalization Test Information
30. General Burnout Information Provided by the Emotional Exhaustion Subscale
31. Emotional Exhaustion Test Information
32. General Burnout Information Provided by the Personal Accomplishment Subscale
33. Personal Accomplishment Test Information
34. Marginal General Burnout Information Plots

ACKNOWLEDGMENT

I would like to thank my advisor, Dr. Gary Burns, and my dissertation committee, Drs. Nathan Bowling, David LaHuis, and Joe Houpt, for all their help and support during the course of this dissertation. All of them were always available to discuss any questions I had about the project, and they served as a "reality check" on the information decomposition. Also, Gary did an excellent job of keeping me on track and helping me explain concepts effectively.

I would also like to thank the VHA National Center for Organization Development. As an intern there, I have learned an incredible amount about applied topics, and my colleagues have always been willing to answer questions and discuss analyses. They also provided the data for this dissertation. Without those data, this project would have been impossible.

Finally, I need to thank my friends and family. The support I have received during graduate school and while writing my dissertation has been incredible. Without my friends and family, I would not be where I am today.

DEDICATION

I dedicate this dissertation to my wife, Deanna, and my children, Jack and Jade. Without their endless love and support, this project would never have been finished. Their understanding when I needed to work late or go in to work on weekends made this project possible.

Running head: BIFACTOR MODEL OF BURNOUT


A Bifactor Model of Burnout? An IRT Analysis of the Maslach Burnout Inventory – Human Services Survey

Burnout has become an increasingly popular construct in organizational research. Researchers have linked its components (emotional exhaustion, depersonalization, and feelings of reduced personal accomplishment) to a number of personal and organizational outcomes. These outcomes include decreased job satisfaction (e.g., Wolpin, Burke, & Greenglass, 1991), turnover intentions (e.g., Kim & Kao, 2014), and decreased job performance (e.g., Leiter, Harvie, & Frizzel, 1998). Given these relationships with important criteria, it is important to ensure that burnout is measured appropriately.

A number of instruments are used to measure burnout, including (but not limited to) the Copenhagen Burnout Inventory (Kristensen, Borritz, Villadsen, & Christensen, 2005), the Oldenburg Burnout Inventory (Halbesleben & Demerouti, 2005), and the Shirom-Melamed Burnout Questionnaire (Melamed, Kushnir, & Shirom, 1992). However, the most popular measure of burnout is the Maslach Burnout Inventory (MBI; Maslach, Jackson, & Leiter, 1996). According to Schaufeli and Enzmann (1998, p. 188), 90% of the burnout research conducted up to that point had used the MBI. The MBI has multiple versions, including the General Survey and the Human Services Survey (MBI-HSS; Maslach et al., 1996). This project focuses on the MBI-HSS.


The MBI-HSS has been the subject of numerous psychometric evaluations. These include studies using confirmatory factor analysis (for a summary, see Worley, Vassar, Wheeler, & Barnes, 2008) and reliability generalization (Wheeler, Vassar, Worley, & Barnes, 2011). However, the MBI-HSS has not been analyzed using item response theory (IRT). An IRT analysis is important because it gives more detailed information about how well the items measure the components of burnout. It also shows the trait level at which each item is most informative for estimating a person's standing on the trait. In addition, Mészáros, Ádám, Svabó, Szigeti, and Urbán (2014) recently tested a bifactor model with the Hungarian version of the MBI-HSS and found that it fit better than the original model.

This dissertation responds to Mészáros et al.'s (2014) recent findings and to uses of the MBI-HSS that depart from its manual's directions (described below). It seeks to clarify the structure of the MBI-HSS and to evaluate the performance of its individual items. This study will make three contributions to the literature. First, it will test Mészáros and colleagues' (2014) bifactor model on the English version of the MBI-HSS. Second, it will subject the MBI-HSS to an IRT analysis. Third, it will introduce a novel method for decomposing bifactor IRT item and test information. This method allows standard errors to be calculated for each factor of the model separately, rather than only for the test as a whole. These contributions are important, and necessary, to ensure that both academics and practitioners measure burnout appropriately. In addition, the decomposition of item and test information for IRT bifactor models is a tool that researchers across psychology can use. It also provides a more accurate picture of the measurement precision of bifactor scales.

Burnout

History of Burnout. Burnout was identified and defined by two groups of researchers working independently. First, Freudenberger (1974) identified a set of symptoms among workers at a free clinic, which he termed 'burn-out'. These included physical symptoms, such as fatigue and susceptibility to illness, and behavioral symptoms, such as irritability, paranoia, rigidity, and depressed mood (Freudenberger, 1974/1975). He noticed that the symptoms usually appeared about a year after a worker began working at the clinic. They were especially prevalent among the most dedicated and committed workers (Freudenberger, 1975).

Maslach and Pines (1977) also developed a construct called burnout, based on their observations of workers in a child-care setting. According to Maslach and her colleagues, burnout results from working in a stressful environment. It is composed of three facets: emotional exhaustion, depersonalization/cynicism, and reduced personal accomplishment (Maslach & Jackson, 1984; Maslach & Pines, 1977). Emotional exhaustion is the core of burnout. It is a lack of physical and emotional resources due to extended work stress, which results in a lack of positive emotions (Maslach & Pines, 1977). Depersonalization refers to treating other people as objects and failing to see them as having feelings (Maslach & Pines, 1977). Finally, reduced personal accomplishment is a subjective feeling that one is not accomplishing as much as one used to (Maslach & Jackson, 1981). Freudenberger did not label the symptoms of burnout the same way Maslach did. He did, however, note similar symptoms occurring in a specific order:

"What happens is that the harder he works, the more frustrated he becomes; and the more frustrated he is, the more exhausted, the more bitchy, the more cynical in outlook and behavior—and, of course, the less effective in the very things he so wishes to accomplish." (Freudenberger, 1975, p. 74)

It is important to note that Maslach and her colleagues originally theorized that burnout could occur only in people who work in the human services (Maslach & Jackson, 1984). Freudenberger (1975), by contrast, was open to the idea that people outside the human services could suffer from burnout. Over time, research has shown that workers in all occupations can suffer from burnout (e.g., Golembiewski, Boudreau, Sun, & Luo, 1998; Schutte, Toppinen, Kalimo, & Schaufeli, 2000). As burnout was generalized to occupations that do not directly serve people, depersonalization was renamed cynicism so that it would apply to the wider population. However, the MBI-HSS retains the depersonalization label (Maslach et al., 1996).

Underlying Models of the Burnout Process. There are two dominant theories of the process underlying the development of burnout: the Job Demands-Resources model (Demerouti, Bakker, Nachreiner, & Schaufeli, 2001) and Conservation of Resources theory (Hobfoll, 1989). Both theories rest on the premise that burnout occurs when the demands on a person become too great and deplete that person's resources. As its name suggests, the Job Demands-Resources model states that each job has demands and resources. Job demands are fairly straightforward (e.g., meeting with clients, deadlines; Bakker & Demerouti, 2014). Demerouti and colleagues' (2001) definition of job resources, however, requires some explanation. Job resources are any aspects of the job or person that reduce job demands, stimulate personal growth, or help a person achieve goals. Job resources can be internal, such as cognitive ability, helpful personality traits, and skills, or external. External job resources can be social or organizational. Social resources include support from family and friends. Organizational resources are positive aspects of the job, such as job control and participation in decision making (Demerouti et al., 2001). According to the Job Demands-Resources model, burnout is a self-defense mechanism that people use when job demands overwhelm their resources. To regain resources and prevent further resource loss, the person emotionally detaches from the job and becomes more cynical about it.

Conservation of Resources theory (Hobfoll, 1989) is very similar to the Job Demands-Resources model. Like that model, it postulates that burnout occurs when a person's resources are depleted. However, it defines resources differently. According to Hobfoll (1989; Hobfoll & Lilly, 1993), resources are anything that a person values or can use to gain more resources. There are four types of resources:

- objects, which are valued for their physical characteristics;
- conditions, which are states and statuses such as tenure or a happy marriage;
- personal characteristics, such as beneficial personality traits and cognitive ability; and
- energies, such as money or time, which can be used to acquire other resources.

In Conservation of Resources theory, stress occurs under three conditions: losing resources, being threatened with the loss of resources, or failing to regain resources after investing them (Hobfoll, 1989; Hobfoll & Lilly, 1993).

One of the major differences between Conservation of Resources theory and the Job Demands-Resources model is their scope. Demerouti et al. (2001) developed the Job Demands-Resources model specifically as a model of burnout. Conservation of Resources theory, on the other hand, is a model of stress in general. That said, when applied to burnout, the models are largely the same: burnout occurs when demands at work deplete resources.

Outcomes of Burnout. Researchers have linked the components of burnout (i.e., emotional exhaustion, depersonalization, and personal accomplishment) to a number of important organizational and personal outcomes. Demerouti, Bakker, and Leiter (2014) found a negative relationship between emotional exhaustion and task performance. Other research has shown that emotional exhaustion is positively related to both turnover intention and absenteeism (Parker & Kulik, 1995). Researchers have also linked the three facets of burnout to musculoskeletal disorders and cardiovascular disease (Honkonen et al., 2006). Emotional exhaustion and depersonalization were positively related to the prevalence of the two medical conditions, whereas personal accomplishment was negatively related to their prevalence. Burnout, defined as emotional exhaustion and depersonalization, also predicted the onset of depressive symptoms in dentists (Hakanen & Schaufeli, 2012). A meta-analysis by Nahrgang, Morgeson, and Hofmann (2011) defined burnout as "worker anxiety, health, and depression, and work-related stress" (p. 6). Defined this way, burnout was positively related to accidents and injuries.

Researchers discuss the relationship between 'burnout' and other constructs (e.g., Hakanen & Schaufeli, 2012; Maslach et al., 2001, p. 406). These researchers include three of the world's most prominent burnout researchers: Christina Maslach, Michael Leiter, and Susan Jackson. It is important to note, however, that the model contains no overall burnout dimension. Instead, 'burnout' refers to the collection of emotional exhaustion, depersonalization, and personal accomplishment.

The manual for the MBI-HSS states that "given our limited knowledge about the relationships between the three aspects of burnout, the scores for each subscale are considered separately and are not combined into a single, total score" (Maslach et al., 1996, p. 5, emphasis in original). Thus, users of any of the Maslach Burnout Inventories do not receive a burnout score. They receive scores only for the three subscales (Maslach et al., 1996). Others have used the term "burnout" even more loosely, applying it to a whole host of strains from the stress literature (e.g., Nahrgang et al., 2011).

This lack of an overall burnout score makes it problematic to discuss the relationship between burnout and other constructs. Are researchers referring to all of the subscales together or only to some of them? In fact, a number of studies fail to find relationships between all three components and outcomes (e.g., Demerouti et al., 2014; Parker & Kulik, 1995). Referring to the relationship between burnout and outcomes is therefore misleading (e.g., Hakanen & Schaufeli, 2012). An overall burnout dimension is needed before researchers can meaningfully refer to the relationship between burnout and other constructs.

This has not stopped researchers from discussing the relationship between burnout and other constructs. As mentioned above, the meta-analysis by Nahrgang et al. (2011) found a relationship between burnout and accidents, but what does that mean? Their burnout construct included anxiety, depression, and other components that are not part of the traditional model of burnout. Even setting that aside, their use of a single burnout construct does not match traditional burnout theory.

Nahrgang et al. (2011) are not alone in using a single burnout score. Another example is a meta-analysis of locus of control by Wang, Bowling, and Eschleman (2010). They analyzed the relationship between two types of locus of control and burnout. However, they never defined what they considered burnout or how it related to Maslach's three-component model. A third meta-analysis that uses a single burnout variable is Crawford, LePine, and Rich (2010). In their method section, Crawford et al. (2010) state that the studies in their meta-analysis measured burnout "nearly exclusively using some form of the Maslach Burnout Inventory" (p. 839). Even so, they treat burnout as a single construct, contrary to the MBI's scoring recommendations and previous burnout theory.

These three meta-analyses are only a few of the studies that show a gap between burnout theory and how researchers and practitioners use burnout. As mentioned above, the burnout construct proposed by Maslach and colleagues has no overall burnout dimension (i.e., it is a correlated traits model). Practitioners and researchers, however, use burnout as if there were an overall dimension, as in a second-order factor model or a bifactor model (e.g., Crawford et al., 2010; Nahrgang et al., 2011; Wang et al., 2010). This raises an important question: how do the results of researchers examining a global "burnout" factor fit into the burnout literature?

Structure of Burnout. Previous research on the structure of burnout has focused on a correlated traits model. This model has emotional exhaustion, depersonalization, and personal accomplishment factors and no overarching burnout factor (Worley et al., 2008). However, a meta-analysis of the factor structure of the MBI-HSS found that these factors are correlated (Worley et al., 2008). The mean correlations were as follows:

- emotional exhaustion and depersonalization: .56;
- emotional exhaustion and personal accomplishment: .30; and
- depersonalization and personal accomplishment: .35.

Such a pattern of correlations suggests the presence of a common factor (Thompson, 2004). If the common factor is not modeled, the relationships between other constructs and the facets of burnout are muddled. The shared variance among the factors confounds these relationships. Consider the relationship between emotional exhaustion and turnover intent. It is impossible to rule out that this relationship is due in part to the other, correlated factors (i.e., depersonalization and personal accomplishment). A bifactor model, in contrast, specifies a common factor that accounts for the correlations among the factors and leaves the remaining factors uncorrelated. This clarifies the relationships between the factors of burnout and external criteria. It also allows researchers to test true differential prediction by the factors.

Part of the reason for the focus on a correlated traits model could be that researchers cannot test a second-order model for burnout. A three-factor model clearly fits better than a single-factor model (Worley et al., 2008). That result, however, is not a test of a second-order model, which would provide support for a "burnout" factor. Testing the fit of a second-order model requires at least four first-order factors (Rindskopf & Rose, 1988). Burnout has only three first-order factors. The model is therefore just identified at the second-order level: three second-order loadings are estimated from only three factor correlations, leaving zero degrees of freedom. As a result, the second-order structure cannot be tested, because the second-order model will fit identically to the correlated traits model (Rindskopf & Rose, 1988, p. 53). An alternative to the second-order model that avoids this problem is a bifactor model.
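A minimal numerical sketch can show why three first-order factors leave the second-order structure untestable. It assumes the second-order factor variance is fixed to 1. Under that assumption, the implied correlation between first-order factors i and j is the product of their second-order loadings. The sketch plugs in the Worley et al. (2008) mean correlations and treats them as magnitudes. That is an assumption: personal accomplishment is taken to be reverse-keyed relative to the other two factors. Three correlations determine three loadings exactly, so the model reproduces the correlations with no misfit.

```python
import math


def second_order_loadings(r12, r13, r23):
    """Closed-form second-order loadings for exactly three first-order factors.

    With the second-order factor variance fixed to 1, the implied correlation
    between first-order factors i and j is gamma_i * gamma_j. Three equations,
    three unknowns: the solution is exact, so the model has zero misfit.
    """
    g1 = math.sqrt(r12 * r13 / r23)
    g2 = math.sqrt(r12 * r23 / r13)
    g3 = math.sqrt(r13 * r23 / r12)
    return g1, g2, g3


# Mean inter-factor correlations from Worley et al. (2008), used as
# magnitudes (assumption: personal accomplishment is reverse-keyed).
r_ee_dp, r_ee_pa, r_dp_pa = 0.56, 0.30, 0.35
g_ee, g_dp, g_pa = second_order_loadings(r_ee_dp, r_ee_pa, r_dp_pa)
print(f"loadings: EE {g_ee:.3f}, DP {g_dp:.3f}, PA {g_pa:.3f}")

# The implied correlations reproduce the observed ones exactly.
implied = (g_ee * g_dp, g_ee * g_pa, g_dp * g_pa)
for observed, fitted in zip((r_ee_dp, r_ee_pa, r_dp_pa), implied):
    print(f"observed {observed:.2f}  implied {fitted:.2f}")

# Degrees of freedom at the second-order level.
n_first_order = 3
moments = n_first_order * (n_first_order - 1) // 2  # unique factor correlations
free_params = n_first_order                         # one loading per factor
print("df =", moments - free_params)
```

With four first-order factors, the same count gives six correlations against four loadings, which leaves two degrees of freedom. That is the situation in which Rindskopf and Rose (1988) say a test becomes possible. For some correlation patterns, the closed-form solution gives a loading above 1, which is an inadmissible solution. That is a boundary problem, not a test of fit.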

Bifactor Models

Bifactor models were originally developed by Holzinger and Swineford (1937). They specify that a general factor underlies all items in a test. In addition to the general factor, there are specific factors on which subsets of the items load. These specific factors are orthogonal both to one another and to the general factor (Holzinger & Swineford, 1937; Reise, 2012). Holzinger and Swineford's work was closely tied to Spearman's (1904) work on general intelligence and was an instrumental part of the Spearman-Holzinger Unitary Trait Study. Holzinger and Swineford's (1939) work with the bifactor model provided support for a general factor of intelligence and five group factors: spatial relationships, verbal, perceptual speed, recognition, and associative memory. All of these factors were modeled as orthogonal to one another. Similarly, in a bifactor model of the MBI-HSS, all of the items load on a general "burnout" factor. In addition, each set of subscale items loads on its own orthogonal specific factor:

- the emotional exhaustion items load on an emotional exhaustion specific factor;
- the depersonalization items load on a depersonalization specific factor; and
- the personal accomplishment items load on a personal accomplishment specific factor.

Figure 1 contains sample diagrams of the bifactor, second-order, and correlated traits models to make the differences between them more intuitive.
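The structure just described can be written as a loading pattern. The sketch below is illustrative only. It assumes the standard MBI-HSS item counts: nine emotional exhaustion, five depersonalization, and eight personal accomplishment items. It also uses hypothetical standardized loadings of .6 on the general factor and .4 on each specific factor; these are not estimates from this study. With orthogonal factors (Φ = I), the implied item covariance matrix is Σ = ΛΛ' + Θ. Items on the same subscale therefore share both general and specific variance, whereas items on different subscales are linked only through the general factor.

```python
import numpy as np

# Assumed MBI-HSS subscale sizes: 9 EE, 5 DP, 8 PA items (22 in total).
subscales = {"EE": 9, "DP": 5, "PA": 8}
factors = ["General"] + list(subscales)  # General, EE, DP, PA

n_items = sum(subscales.values())
pattern = np.zeros((n_items, len(factors)), dtype=int)
pattern[:, 0] = 1  # every item loads on the general factor
start = 0
for k, (name, size) in enumerate(subscales.items(), start=1):
    pattern[start:start + size, k] = 1  # ...and on exactly one specific factor
    start += size

# Hypothetical standardized loadings, for illustration only.
lam_general, lam_specific = 0.6, 0.4
Lambda = pattern * np.where(np.arange(len(factors)) == 0, lam_general, lam_specific)

# Orthogonal factors: Phi is the identity, so Sigma = Lambda Phi Lambda' + Theta.
Phi = np.eye(len(factors))
common = Lambda @ Phi @ Lambda.T
Theta = np.diag(1.0 - np.diag(common))  # uniquenesses chosen so item variances are 1
Sigma = common + Theta

print(pattern.sum(axis=0))  # items per factor
print(round(Sigma[0, 1], 2), round(Sigma[0, 9], 2))  # EE-EE vs. EE-DP covariance
```

With these values, two emotional exhaustion items are expected to covary at .52. An emotional exhaustion item and a depersonalization item are expected to covary at .36. In a real analysis, the loadings would be free parameters rather than fixed constants.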

Bifactor models have existed since the 1930s and have played an instrumental role in how we view the structure of intelligence. Only recently, however, have they become a popular option for modeling complex, multidimensional psychological phenomena. Researchers have used bifactor models to clarify the structure of mental ability (e.g., Gustafsson & Balke, 1993), quality of life (Chen et al., 2006), attention-deficit/hyperactivity disorder (Martel, von Eye, & Nigg, 2010), and personality (e.g., Chen, Hayes, Carver, Laurenceau, & Zhang, 2012).

A key question when working with bifactor models is how to interpret their factors. Chen et al. (2012) explained their bifactor model of extraversion as follows. The general extraversion factor captures all of the common variance shared among the six facets of extraversion (e.g., warmth, gregariousness). The six specific factors capture the unique variance in the facet-specific items that is uncontaminated by the general extraversion factor (Chen et al., 2012). Thus, the gregariousness specific factor is the portion of the variance in the gregariousness items that is unrelated to the rest of extraversion.

Separating the unique gregariousness variance from the variance shared with the rest of extraversion allowed Chen et al. (2012) to model more accurately how gregariousness (and the other facets of extraversion) relates to outcomes. For example, they first used an individual score approach, which models the association of each facet with external variables. With that approach, they found a positive relationship between gregariousness and positive affect. In contrast, when they modeled the same relationship with a bifactor model, they found a negative relationship. This shows that including variance shared with other factors can drastically change the apparent relationship between two constructs and confound a construct's nomological net.
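A small simulation can reproduce the kind of sign reversal Chen et al. (2012) report. The generating model below is hypothetical and is not fitted to their data. In it, the outcome increases with a general factor G but decreases with an orthogonal facet-specific factor S. For simplicity, the sketch correlates the outcome with the simulated S directly, rather than with factor scores estimated from a fitted bifactor model. The observed facet score mixes G and S and correlates positively with the outcome, while the specific factor correlates negatively.

```python
import numpy as np

rng = np.random.default_rng(2016)
n = 10_000

# Hypothetical generating model (not Chen et al.'s data): a general factor G
# and an orthogonal specific factor S, both standard normal.
G = rng.standard_normal(n)
S = rng.standard_normal(n)

# An observed facet score mixes general and specific variance plus error.
facet = G + S + 0.3 * rng.standard_normal(n)

# The outcome rises with G but falls with the facet-specific variance S.
outcome = 0.8 * G - 0.3 * S + 0.5 * rng.standard_normal(n)

r_facet = np.corrcoef(facet, outcome)[0, 1]   # "individual score" approach
r_specific = np.corrcoef(S, outcome)[0, 1]    # specific factor, G excluded

print(f"facet score vs outcome:     r = {r_facet:+.2f}")
print(f"specific factor vs outcome: r = {r_specific:+.2f}")
```

The population values follow from the coefficients. The facet-outcome covariance is 0.8 - 0.3 = 0.5, whereas the specific factor-outcome covariance is -0.3. The facet score therefore looks positively related to the outcome even though its unique part is negatively related.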


Bifactor models have many other benefits beyond clarifying relationships between constructs.

Bifactor Model Benefits. Bifactor models have a number of benefits over second-order models. First, each item loads onto the general factor directly rather than through the primary factors. As a result, there are more observations for the general factor, which allows for identification and statistical tests of the general factor even when there are few primary factors (Chen, West, & Sousa, 2006). In the case of the MBI-HSS, a second-order model has only six observations for the second-order burnout factor. In contrast, the bifactor model has 253 observations for the general factor, which allows for statistical testing of the second-order structure.
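The two counts above match the number of unique variances and covariances among p indicators, p(p + 1)/2, with p = 3 first-order factors and p = 22 items. Reading "observations" this way is an inference from the numbers, not something the paragraph states. A minimal Python check follows; the function name is hypothetical.

```python
def unique_moments(p: int) -> int:
    """Number of unique variances and covariances among p indicators: p(p + 1) / 2."""
    return p * (p + 1) // 2


second_order = unique_moments(3)   # three first-order factors feeding the burnout factor
bifactor = unique_moments(22)      # 22 MBI-HSS items loading directly on the general factor
print(second_order, bifactor)      # 6 253
```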

Bifactor models also allow for better exploration of relationships between the specific factors and their items (Chen et al., 2006). In a second-order model, modeling the relationship between a specific factor and its items requires the disturbances (Chen et al., 2006; Gustafsson & Balke, 1993). In a bifactor model, each factor is independent of the others, so the factor loadings can be used to determine the relationship between the items and each factor (Chen et al., 2006; Rindskopf & Rose, 1988).

A third benefit of bifactor models is the orthogonality of the factors. Because all of the factors in a bifactor model are orthogonal, the relationships of the factors with external variables are straightforward to interpret. To use the example of burnout, in a bifactor model the emotional exhaustion, depersonalization, personal accomplishment, and general burnout factors are uncorrelated with one another. Because the factors are uncorrelated, the relationship between emotional exhaustion and absenteeism would be uncontaminated by depersonalization, personal accomplishment, and the general burnout factor. The general burnout factor accounts for the relationships among the specific factors (Chen et al., 2006; Holzinger & Swineford, 1937). Such information would help researchers decide whether it is appropriate to talk about “burnout” as a whole or whether the appropriate focus is on the components of burnout. In contrast, a second-order model confounds the relationship between emotional exhaustion and absenteeism. To prevent linear dependencies with the common burnout factor, the researcher must use the externalized residuals of the emotional exhaustion factor as the predictor rather than the emotional exhaustion factor itself (p. 197, Chen et al., 2006; p. 414, Gustafsson & Balke, 1993).

A fourth benefit of bifactor models is their relative computational simplicity in IRT. Because each item loads on only two factors, computing the unconditional probabilities for the bifactor model never requires more than a two-dimensional integral (Eq. 13; Gibbons et al., 2007). Multidimensional IRT models, on the other hand, need an additional dimension of integration for each dimension in the model in order to compute the unconditional probability (Eq. 10; Gibbons et al., 2007). A multidimensional IRT model with six dimensions would therefore require a six-dimensional integral, a herculean task even for today’s computers. A bifactor IRT model with six specific factors and a general factor would still require only two-dimensional integrals.
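The dimension-reduction argument can be checked numerically. The sketch below makes three simplifying assumptions: dichotomous 2PL items stand in for the graded response model, the parameters are hypothetical, and integration uses NumPy's Gauss-Hermite rule (`numpy.polynomial.hermite_e.hermegauss`). It illustrates the idea and is not the routine used by the mirt package. Given the general factor, the specific factors are independent, so the integral factors into nested one-dimensional sums and only two-dimensional grids are ever built. A brute-force version that builds the full four-dimensional grid gives the same value.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def std_normal_quadrature(n_quad):
    """Gauss-Hermite nodes/weights rescaled so the weights integrate against N(0, 1)."""
    x, w = hermegauss(n_quad)
    return x, w / np.sqrt(2.0 * np.pi)


def p_item(u, a_g, a_s, d, theta_g, theta_s):
    """Probability of binary response u under a bifactor 2PL item (broadcasts over thetas)."""
    p1 = 1.0 / (1.0 + np.exp(-(a_g * theta_g + a_s * theta_s + d)))
    return p1 if u == 1 else 1.0 - p1


def marginal_nested(u, a_g, a_s, d, spec, n_spec, n_quad=15):
    """Marginal probability of pattern u using only 2-D (general x one specific) grids."""
    x, w = std_normal_quadrature(n_quad)
    tg, ts = np.meshgrid(x, x, indexing="ij")   # rows: general nodes, cols: specific nodes
    per_general = np.ones(n_quad)
    for s in range(n_spec):
        lik = np.ones((n_quad, n_quad))
        for i in np.flatnonzero(spec == s):
            lik *= p_item(u[i], a_g[i], a_s[i], d[i], tg, ts)
        per_general *= lik @ w                   # integrate out specific factor s
    return float(per_general @ w)                # integrate out the general factor


def marginal_full(u, a_g, a_s, d, spec, n_spec, n_quad=15):
    """Same probability via a brute-force (1 + n_spec)-dimensional tensor grid."""
    x, w = std_normal_quadrature(n_quad)
    dims = 1 + n_spec
    thetas = np.meshgrid(*([x] * dims), indexing="ij")
    weights = np.prod(np.meshgrid(*([w] * dims), indexing="ij"), axis=0)
    lik = np.ones_like(weights)
    for i in range(len(u)):
        lik *= p_item(u[i], a_g[i], a_s[i], d[i], thetas[0], thetas[1 + spec[i]])
    return float(np.sum(weights * lik))


# Hypothetical parameters: 6 items, 3 specific factors with 2 items each
u = np.array([1, 0, 1, 1, 0, 1])
a_g = np.array([1.2, 0.9, 1.5, 0.8, 1.1, 0.7])
a_s = np.array([0.6, 0.8, 0.5, 0.9, 0.4, 1.0])
d = np.array([0.2, -0.5, 0.0, 0.3, -0.2, 0.1])
spec = np.array([0, 0, 1, 1, 2, 2])

nested = marginal_nested(u, a_g, a_s, d, spec, n_spec=3)
full = marginal_full(u, a_g, a_s, d, spec, n_spec=3)
print(nested, full)
```

With 15 nodes per dimension, the nested version evaluates three grids of 225 points each. The brute-force version evaluates 15^4 = 50,625 points, and that gap grows exponentially with each added specific factor.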

Bifactor models also specify a different relationship between the general factor

and the items than the second-order model. In a traditional, reduced second-order model,

the second-order factor influences items indirectly through the first-order factors: a full

mediation model (Chen et al., 2006). Bifactor models, on the other hand, specify a direct relationship between the general factor and the items. That is, each item contains variance related to both the general factor and its specific factor.

Item Response Theory and Standard Errors

The psychometric approach of IRT is composed of many models that aim to

establish the relationship between latent traits and item responses (de Ayala, 2009;

Embretson & Reise, 2000). Item response theory constitutes a significant departure from

traditional psychometrics, called classical test theory or true score theory (CTT; de Ayala,

2009; Embretson & Reise, 2000). There are a large number of differences between IRT

and CTT (for accessible overviews, see Baker, 2001 and Embretson & Reise, 2000), but

for this paper I will focus mostly on the treatment of standard errors.

Classical test theory assumes that the standard error is constant across all trait

levels (e.g., Hambleton & Jones, 1993; Embretson & Reise, 2000). In other words, CTT

assumes that if two individuals with radically different trait levels take the same test, their

test scores will have the same amount of error. For example, if two individuals—one

who is very burned out and another who is not burned out at all—take the MBI-HSS,

CTT states that the scores are equally accurate for both individuals.

Item response theory does not make this assumption. In IRT, the standard error of

a test fluctuates across the trait range based on how much psychometric information (i.e.,

the reciprocal of the error variance of the estimators; Baker, 2001) is available at a

specific point or range of the latent trait. This allows for a more accurate estimation of a

person’s ability or trait level. In the case of the previous example with the two people

with different burnout levels, the accuracy of the burnout subscale scores would depend on how much psychometric information was available at their respective trait levels.

Most common IRT models assume that the instrument is unidimensional (de Ayala, 2009; Embretson & Reise, 2000; Slocum-Gori & Zumbo, 2011); however, there are also multidimensional IRT models (Reckase, 2009) and bifactor IRT models (Reise, 2012; Reise, Morizot, & Hays, 2007). All of these classes of models assume that the number of underlying trait dimensions matches the number of dimensions for which the model is designed.

Item Parameters

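The formula for the multidimensional graded response model is (Stucky, Thissen, & Edelen, 2013):

$$T_i^*(u_i \ge l \mid \boldsymbol{\theta}_j) = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_i'\boldsymbol{\theta}_j + d_{il}\right)\right]}$$

where $T_i^*$ is the probability that person j selects response category l or higher on item i. This probability is conditional on the person’s trait levels ($\boldsymbol{\theta}_j$). It depends on the item’s vector of slope parameters on each latent trait ($\mathbf{a}_i'$) and the item’s intercept (threshold) for category l ($d_{il}$). The probability of choosing a specific response category is the probability of responding in that category or higher (l) minus the probability of responding in the next higher category or above (l + 1):

$$T_i(u_i = l \mid \boldsymbol{\theta}_j) = T_i^*(u_i \ge l \mid \boldsymbol{\theta}_j) - T_i^*(u_i \ge l + 1 \mid \boldsymbol{\theta}_j)$$

with $T_i^*(u_i \ge 0 \mid \boldsymbol{\theta}_j) = 1$ for the lowest category and $T_i^*(u_i \ge 7 \mid \boldsymbol{\theta}_j) = 0$ above the highest category of the 0 to 6 scale.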
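The two formulas above can be sketched in a few lines of Python. The item below is hypothetical: it loads only on general burnout and emotional exhaustion, and its six intercepts decrease so that the cumulative probabilities decrease across categories. The function name and parameter values are illustrative assumptions, not estimates from the MBI-HSS.

```python
import numpy as np


def grm_category_probs(theta, a, d):
    """Category probabilities for one multidimensional graded response item.

    theta : (K,) trait levels for one person
    a     : (K,) item slopes (zero for factors the item does not load on)
    d     : (m,) intercepts for categories 1..m, decreasing in m
    Returns an (m + 1,) vector of probabilities for categories 0..m.
    """
    z = a @ theta + d                                # one logit per category boundary
    t_star = 1.0 / (1.0 + np.exp(-z))                # T*(u >= l) for l = 1..m
    t_star = np.concatenate(([1.0], t_star, [0.0]))  # T*(u >= 0) = 1, T*(u >= m + 1) = 0
    return t_star[:-1] - t_star[1:]                  # T(u = l) = T*(>= l) - T*(>= l + 1)


# Hypothetical bifactor item with slopes on (GB, EE, DP, PA)
a = np.array([1.8, 0.9, 0.0, 0.0])
d = np.array([2.5, 1.2, 0.3, -0.6, -1.5, -2.8])  # six intercepts for a 0-6 scale
theta = np.array([0.5, -0.2, 0.0, 0.0])
probs = grm_category_probs(theta, a, d)
print(probs.round(3), probs.sum())
```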

Multidimensional IRT models (and bifactor IRT models as a special case) require extra steps to compute item parameters analogous to unidimensional IRT parameters. Below, I convert each parameter in turn into a format similar to that of unidimensional IRT. I expected, as an extension of Mészáros et al. (2014), that my analyses would show the bifactor model provides the best fit for the MBI-HSS. I therefore focus on the bifactor model below.

Item Discrimination. The proposed bifactor model has four dimensions (k = 1 to 4: the general factor, emotional exhaustion, depersonalization, and personal accomplishment). Each item (i) therefore has four discrimination parameters ($a_{ik}$), which indicate the item’s discrimination on each of the four latent traits. Because this is a bifactor model, each item has only two non-zero discrimination parameters. I computed the multidimensional discrimination parameter ($A_{i\,\max}$) with the following formula (p. 284; De Ayala, 2009):

$$A_{i\,\max} = \sqrt{\sum_{k=1}^{K} a_{ik}^{2}}$$

$A_{i\,\max}$ is the steepest slope of the item response surface in the direction of the item’s difficulty parameters (p. 118, Reckase, 2009). As a side note, this is an application of the Pythagorean theorem. With two non-zero slopes, $A_{i\,\max}$ is the length of the hypotenuse of a right triangle whose legs are the two slopes.

Next, it is necessary to compute the angle of the direction of maximum discrimination relative to the latent traits. In the bifactor model, the latent traits are all orthogonal to each other and thus have 90° angles between them. The angle of the item ($\omega_{ik\,\max}$) relative to latent trait k is computed using the following formula (p. 284; De Ayala, 2009):

$$\omega_{ik\,\max} = \cos^{-1}\left(\frac{a_{ik}}{A_{i\,\max}}\right)$$

Because each item loads only on the general factor and a single specific factor, each item has only one reported $\omega_{ik}$ value, its angle relative to the general burnout dimension. The angle relative to its specific factor is the 90° complement.
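A minimal sketch of $A_{i\,\max}$ and $\omega_{ik\,\max}$ follows, using a hypothetical item whose only non-zero slopes are on general burnout and emotional exhaustion. The helper names are illustrative. The zero slopes on the other two specific factors produce 90° angles, and the two non-zero angles sum to 90°.

```python
import numpy as np


def mdisc(a):
    """Multidimensional discrimination: A_i,max = sqrt(sum_k a_ik^2)."""
    return float(np.sqrt(np.sum(np.square(a))))


def angles_max_deg(a):
    """Angle (degrees) between the direction of maximum discrimination and each trait axis."""
    a = np.asarray(a, dtype=float)
    return np.degrees(np.arccos(a / mdisc(a)))


# Hypothetical bifactor item with slopes on (GB, EE, DP, PA); only GB and EE are non-zero
a = [1.8, 0.9, 0.0, 0.0]
A_max = mdisc(a)
omega = angles_max_deg(a)
print(round(A_max, 3), omega.round(1))
```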

Directional Discrimination. To allow direct comparison of the discrimination values of the items within each subscale, I computed directional discriminations ($A_{i\omega}$; p. 285, De Ayala, 2009) for each item in 10-degree increments relative to the general burnout latent trait. The formula for $A_{i\omega}$ for a given set of angles ($\omega_{ik}$) is as follows (p. 285, De Ayala, 2009):

$$A_{i\omega} = \sum_{k=1}^{K} a_{ik}\cos(\omega_{ik})$$
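Consider a bifactor item in the plane of the general factor and its specific factor, with a direction at angle ω from the general burnout axis. That direction lies at 90° − ω from the specific axis. The sum above therefore reduces to $a_{GB}\cos\omega + a_{SF}\sin\omega$. The sketch below evaluates it at the ten angles 0°, 10°, …, 90° for hypothetical slopes. At the angle of maximum discrimination it returns $A_{i\,\max}$.

```python
import numpy as np


def directional_discrimination(a_gb, a_sf, omega_deg):
    """A_i(omega) = a_GB cos(omega) + a_SF cos(90 - omega), omega measured from the GB axis."""
    w = np.radians(omega_deg)
    return a_gb * np.cos(w) + a_sf * np.sin(w)


angles = np.arange(0, 91, 10)   # 0, 10, ..., 90 degrees
a_gb, a_sf = 1.8, 0.9           # hypothetical slopes
A_dir = directional_discrimination(a_gb, a_sf, angles)
for ang, val in zip(angles, A_dir):
    print(f"{ang:2d} deg: {val:.3f}")
```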

Item difficulty. Each item has six step parameters ($d_{im}$), where m indicates the step. These step parameters can be converted to step difficulty parameters ($B_{im}$) using the following formula (p. 121; Reckase, 2009):

$$B_{im} = \frac{-d_{im}}{A_{i\,\max}}$$

The step difficulty parameters can be interpreted in the same way as unidimensional difficulty parameters, in the direction of the item’s maximum discrimination ($\omega_{ik\,\max}$; Reckase, 2009).
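The unidimensional interpretation can be checked directly. Consider a person located $B_{im}$ units along the direction of maximum discrimination. That person has a logit $\mathbf{a}_i'\boldsymbol{\theta} + d_{im}$ of zero, so the cumulative probability is 0.5, just as at a unidimensional threshold. The slopes and intercepts below are hypothetical.

```python
import numpy as np

a = np.array([1.8, 0.9])                          # hypothetical (GB, SF) slopes
d = np.array([2.5, 1.2, 0.3, -0.6, -1.5, -2.8])   # hypothetical intercepts d_i1..d_i6
A_max = np.sqrt(np.sum(a**2))
B = -d / A_max                                     # step difficulties B_im
direction = a / A_max                              # unit vector of maximum discrimination

# At a point B_im units along that direction, a'theta = B_im * A_max = -d_im,
# so the cumulative probability T*(u >= m) equals 0.5.
t_star_at_B = np.array([1.0 / (1.0 + np.exp(-(a @ (b * direction) + dm)))
                        for dm, b in zip(d, B)])
print(B.round(3), t_star_at_B.round(3))
```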

Information

Item Information. Unlike in unidimensional IRT, each item has multiple item characteristic curves (coalesced into an item characteristic surface; Reckase, 2009). Also, in contrast to unidimensional IRT, the information provided by an item in a bifactor model depends on the person’s levels on both the general burnout dimension ($\theta_{GB}$) and the specific dimension ($\theta_{SF}$). The amount of information provided by an item can be computed for any angle between the latent traits using the following formula (p. 122, Reckase, 2009):

$$I_{iA\omega}(\theta_{GB}, \theta_{SF}) = P(\theta_{GB}, \theta_{SF})\,Q(\theta_{GB}, \theta_{SF})\left(\sum_{k=1}^{K} a_{ik}\cos(\omega_{ik})\right)^{2} = P(\theta_{GB}, \theta_{SF})\,Q(\theta_{GB}, \theta_{SF})\,A_{i\omega}^{2}$$

where $Q = 1 - P$.

One way to examine the information provided by an item is to examine its maximum information, that is, the information provided along the direction of maximum discrimination (Reckase, 2009). It is computed with the following formula (p. 123; Reckase, 2009):

$$I_{i\,\max}(\theta_{GB}, \theta_{SF}) = P(\theta_{GB}, \theta_{SF})\,Q(\theta_{GB}, \theta_{SF})\,A_{i\,\max}^{2}$$
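The two information formulas above are written with P and Q. The sketch therefore treats the item as dichotomous, using a hypothetical 2PL-style item in the GB and SF plane. A graded item would need the graded response model's own information function in place of P·Q. The sketch evaluates information at θ = (0, 0) for the ten 10-degree angles and along the direction of maximum discrimination. The maximum-direction value is at least as large as any of the ten.

```python
import numpy as np


def item_info_direction(theta_gb, theta_sf, a_gb, a_sf, d, omega_deg):
    """Directional information P * Q * A_i(omega)^2 for a dichotomous compensatory item."""
    p = 1.0 / (1.0 + np.exp(-(a_gb * theta_gb + a_sf * theta_sf + d)))
    w = np.radians(omega_deg)
    a_dir = a_gb * np.cos(w) + a_sf * np.sin(w)   # A_i(omega), omega measured from the GB axis
    return p * (1.0 - p) * a_dir**2


a_gb, a_sf, d = 1.8, 0.9, 0.3                      # hypothetical item parameters
omega_max = np.degrees(np.arctan2(a_sf, a_gb))     # direction of maximum discrimination
angles = np.arange(0, 91, 10)
info = item_info_direction(0.0, 0.0, a_gb, a_sf, d, angles)
info_max = item_info_direction(0.0, 0.0, a_gb, a_sf, d, omega_max)
print(info.round(3), round(float(info_max), 3))
```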

By taking advantage of the bifactor model’s orthogonal structure, we can decompose an item’s information, in this example along the direction of maximum discrimination, into information for general burnout and information for the item’s specific factor. To do this, we treat the total information as the hypotenuse of a right triangle with angle $\omega_{i,GB\,\max}$ relative to the general burnout trait. We can then use trigonometry to find the horizontal (general burnout) component of the total information:

$$I(\theta_{GB}, \theta_{SF})_{\max,\,GB} = I(\theta_{GB}, \theta_{SF})_{\max,\,\text{Total}}\cos(\omega_{i,GB\,\max})$$

Then we can find the vertical (specific factor) component:

$$I(\theta_{GB}, \theta_{SF})_{\max,\,SF} = I(\theta_{GB}, \theta_{SF})_{\max,\,\text{Total}}\sin(\omega_{i,GB\,\max})$$

This process can be completed for any angle between two orthogonal dimensions. To do this, substitute the angle of choice and the information provided by the model along that angle.
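A minimal sketch of the decomposition described above follows; the information value and angle are hypothetical. The two components are the legs of a right triangle. They therefore recombine through the Pythagorean theorem to the total, not by simple addition.

```python
import numpy as np


def decompose_information(info_total, omega_deg):
    """Resolve information along an angle (degrees from the GB axis) into GB and SF components."""
    w = np.radians(omega_deg)
    return info_total * np.cos(w), info_total * np.sin(w)


info_total = 2.0    # hypothetical information along the direction of maximum discrimination
omega_gb = 26.6     # hypothetical angle of that direction from the general burnout axis
i_gb, i_sf = decompose_information(info_total, omega_gb)
print(round(float(i_gb), 3), round(float(i_sf), 3))
```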

Test Information. Determining how much test information the MBI-HSS provides for estimating a person’s θ level on each component of burnout and on the general burnout factor required several steps. First, I computed test information ($TI(\theta_{GB}, \theta_{SF})_\omega$) in 10° increments away from $\theta_{GB}$ in the direction of each specific factor separately. This was done by summing the item information over the n items of the subscale (p. 291, De Ayala, 2009):

$$TI(\theta_{GB}, \theta_{SF})_{\omega} = \sum_{i=1}^{n} I_{iA\omega}(\theta_{GB}, \theta_{SF})$$

For example, I computed the TI(𝜃𝐺𝐵)ω in the emotional exhaustion plane in the range

described above at 10 different angles (0°, 10°, 20°, …, 90°) relative to 𝜃𝐺𝐵. I then

followed the same procedure for depersonalization and personal accomplishment.

After computing $TI(\theta_{GB}, \theta_{SF})_\omega$, I separated the information for $\theta_{GB}$ from the information for each specific factor. To do this, I used the same trigonometry used above with the item information. The TI at a given $\theta_{GB}$ and $\theta_{SF}$ level, along a given angle from $\theta_{GB}$ ($\omega_{GB}$), can be decomposed into two parts. The first is a horizontal component containing only the general burnout test information in the plane of that specific factor:

$$TI(\theta_{GB}, \theta_{SF})_{GB} = TI(\theta_{GB}, \theta_{SF})_{\omega_{GB}}\cos(\omega_{GB})$$

The second is a vertical component containing only the specific factor test information:

$$TI(\theta_{GB}, \theta_{SF})_{SF} = TI(\theta_{GB}, \theta_{SF})_{\omega_{GB}}\sin(\omega_{GB})$$

In order to compute the total amount of information provided by the MBI-HSS for the general burnout dimension ($TI(\theta_{GB}, \theta_{EE}, \theta_{DP}, \theta_{PA})$), one must take into account the person’s levels on all four dimensions:

$$TI(\theta_{GB}, \theta_{EE}, \theta_{DP}, \theta_{PA}) = TI(\theta_{GB}, \theta_{EE})_{GB} + TI(\theta_{GB}, \theta_{DP})_{GB} + TI(\theta_{GB}, \theta_{PA})_{GB}$$

This decomposition of the test information allowed us to compute standard errors for each factor separately, rather than a single standard error value across a range of $\theta_{GB}$ for the test overall. The standard error of estimation for each factor is the square root of the inverse of the test information at a given set of θ levels. For example, for the general burnout dimension, the standard error is:

$$SE(\theta_{GB}, \theta_{EE}, \theta_{DP}, \theta_{PA}) = \frac{1}{\sqrt{TI(\theta_{GB}, \theta_{EE}, \theta_{DP}, \theta_{PA})}}$$

And the standard error for the emotional exhaustion dimension is:

$$SE(\theta_{EE}, \theta_{GB}) = \frac{1}{\sqrt{TI(\theta_{GB}, \theta_{EE})_{SF}}}$$
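The last two steps can be sketched as follows: summing the general burnout components from the three planes, then converting information into standard errors. The test-information values and angles below are hypothetical placeholders, one per specific-factor plane. They are not MBI-HSS results.

```python
import numpy as np

# Hypothetical subscale test information at one set of theta levels, each evaluated
# along one angle omega (degrees from the GB axis) in its own specific-factor plane
ti_by_plane = {
    "EE": (6.0, 20.0),
    "DP": (2.5, 40.0),
    "PA": (3.0, 70.0),
}


def split(ti, omega_deg):
    """Return the (GB component, SF component) of test information along omega."""
    w = np.radians(omega_deg)
    return ti * np.cos(w), ti * np.sin(w)


ti_gb_total = sum(split(ti, w)[0] for ti, w in ti_by_plane.values())
se_gb = 1.0 / np.sqrt(ti_gb_total)  # SE for general burnout
se_sf = {name: 1.0 / np.sqrt(split(ti, w)[1]) for name, (ti, w) in ti_by_plane.items()}
print(round(float(se_gb), 3), {k: round(float(v), 3) for k, v in se_sf.items()})
```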

Present Research

Based on the information above, this dissertation focuses on answering one

research question through four steps:

Research Question: Does a bifactor model of burnout that accounts for the

strong correlations between the burnout dimensions perform better than the traditional

correlated traits model of burnout and are there items on the MBI-HSS that are not

providing useful information?

Step 1: Test the bifactor model used by Mészáros et al. (2014) on the English version of the MBI-HSS. Based on Mészáros et al.’s (2014) work with the Hungarian version of the MBI-HSS and the prevalence of meta-analyses examining a single “burnout” factor, I hypothesize that the bifactor model will fit better than the traditional correlated traits model.

Step 2: Using the model identified in Step 1, conduct an IRT analysis of the MBI-HSS.

Step 3: Using the results from Step 2, calculate standard errors for each specific

factor as well as the general burnout factor by decomposing item and test information

into general and specific factor components.

Step 4: Provide recommendations on MBI-HSS score reporting using the

methods advocated by Rodriguez, Reise, and Haviland (2015a). This approach

recommends examining the coefficient omegas (ω; McDonald, 1999; Reise, 2012),

Explained Common Variance (ECV; Reise, 2012), and percent uncontaminated

correlations (Reise, 2012).
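The Step 4 indices have widely used closed forms, sketched below as they are commonly written for standardized bifactor loadings. The denominator is the model-implied variance of the total score. The loadings are random placeholders, not MBI-HSS estimates. The percent of uncontaminated correlations depends only on the 9/5/8 item split: 157 of the 231 inter-item correlations are between items on different subscales, so the PUC is about .680 regardless of the loadings.

```python
import numpy as np


def bifactor_indices(lam_g, lam_s, groups):
    """Omega total, omega hierarchical, ECV, and PUC from standardized bifactor loadings.

    lam_g  : general-factor loading of each item
    lam_s  : specific-factor loading of each item (one specific factor per item)
    groups : label of each item's specific factor
    """
    lam_g, lam_s, groups = (np.asarray(v) for v in (lam_g, lam_s, groups))
    labels, sizes = np.unique(groups, return_counts=True)
    general = lam_g.sum() ** 2
    specific = sum(lam_s[groups == g].sum() ** 2 for g in labels)
    unique = np.sum(1.0 - lam_g**2 - lam_s**2)        # unique variances of standardized items
    total_var = general + specific + unique            # model-implied variance of the sum score
    omega_total = (general + specific) / total_var
    omega_h = general / total_var
    ecv = np.sum(lam_g**2) / (np.sum(lam_g**2) + np.sum(lam_s**2))
    n_corr = groups.size * (groups.size - 1) // 2       # all inter-item correlations
    n_within = int(np.sum(sizes * (sizes - 1) // 2))    # correlations within a specific factor
    puc = (n_corr - n_within) / n_corr
    return omega_total, omega_h, ecv, puc


groups = np.array([0] * 9 + [1] * 5 + [2] * 8)        # EE, DP, PA item counts in the MBI-HSS
rng = np.random.default_rng(0)
lam_g = rng.uniform(0.4, 0.7, size=groups.size)        # hypothetical loadings
lam_s = rng.uniform(0.2, 0.5, size=groups.size)        # hypothetical loadings
omega_total, omega_h, ecv, puc = bifactor_indices(lam_g, lam_s, groups)
print(round(omega_total, 3), round(omega_h, 3), round(ecv, 3), round(puc, 3))
```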

Method

Participants

The archival sample consists of 8,007 employees at a large federal agency who completed the agency’s 360-degree assessment, which includes the MBI-HSS, between 2008 and 2012. Of the 8,007, I removed 526 for missing data, leaving a final sample of 7,481. The majority of the individuals who completed the MBI-HSS reported that they were between 40 and 49 (32.5%) or between 50 and 59 (31.0%) years of age. Women made up the majority of the sample (62.8%). Finally, the sample was predominantly Caucasian (64.4%) and African-American (25.1%).

Measures

MBI-HSS. The MBI-HSS measures three components of burnout: emotional exhaustion, depersonalization, and personal accomplishment. The Emotional Exhaustion subscale is composed of 9 items (e.g., “I feel burned out from my work”). Depersonalization is measured with 5 items (e.g., “I worry that this job is hardening me emotionally”); finally, the Personal Accomplishment subscale contains 8 items (e.g., “I have accomplished many worthwhile things in this job”). All items are rated on a 7-point frequency scale (0 = Never; 6 = Daily). The scales are scored such that high depersonalization and emotional exhaustion scores are indicative of burnout, while low personal accomplishment scores are signs of burnout. To aid interpretation, all personal accomplishment items were reverse-scored for this study (a response x was recoded as 6 − x) so that high scores on all three scales are undesirable.

Analyses

Determining appropriate IRT model. To determine the appropriate IRT model for the MBI-HSS, I compared the fit of the competing models (i.e., the correlated traits model, the bifactor model, and the unidimensional model). Each model was fit using both Samejima’s (1969) graded response model and Muraki’s (1992) generalized partial credit model. To determine which model fit the data better, I compared the models’ Bayesian Information Criterion (BIC; Schwarz, 1978), Akaike Information Criterion (AIC; Akaike, 1974), RMSEA, and Standardized Root Mean Square Residual (SRMSR; Maydeu-Olivares, 2014). After establishing the proper model, I evaluated item fit, estimated the model parameters, and evaluated test and item information. From the IRT model, I extracted standardized factor loadings in order to complete the analyses recommended by Rodriguez et al. (2015a).

Software

All analyses were completed using the open-source statistical program R (version 3.2.3; R Core Team, 2015). Polychoric correlations and McDonald’s ω were computed using the psych package (Revelle, 2015). All IRT analyses were conducted using the mirt package (Chalmers, 2012). Corrgrams were produced using the corrplot package (Taiyun, 2013).

Results

Descriptive statistics and inter-item correlations

Table 1 contains the descriptive statistics for the MBI-HSS. The means for all the

items were very low: a good sign for the people in the sample. The items do show a

decent amount of variability and both the highest and lowest response categories for each

item were used. As a reminder, the personal accomplishment items were reverse scored

such that a lower score is better.

Rather than create an incomprehensible 22 by 22 table of correlation coefficients,

I created a corrgram (Friendly, 2002; Murdoch & Chow, 1996) to display the polychoric

correlations between each of the MBI-HSS items. A corrgram is a graphical display for

displaying correlation matrices that uses colors and/or shapes to represent the magnitude

and direction of correlations (Friendly, 2002; Murdoch & Chow, 1996). The use of a

corrgram makes it easier to identify patterns and oddities in a correlation matrix than the

traditional table format. Figure 2 is the corrgram for the MBI-HSS. Below the corrgram is a scale explaining the correlation values represented by each color. Red cells indicate a negative correlation; blue cells indicate a positive correlation. The shade of the cell indicates the magnitude of the correlation: the darker the cell color, the stronger the correlation. The corrgram makes clear that the depersonalization and emotional exhaustion items are highly correlated. In contrast, the majority of the personal accomplishment items do not correlate very strongly with items from the other subscales. The exception to this is PA4, which is moderately correlated with the emotional exhaustion items.

Model Comparisons

In order to complete Step 1 and Step 2 and determine the most appropriate model

for the MBI-HSS, I compared the fit of the unidimensional, correlated traits, and bifactor

graded response models and generalized partial credit models (Table 2). First, I

compared the fit of the unidimensional, correlated traits, and bifactor graded response models. Of the three models, the unidimensional model had the worst fit on every index except the CFI. Of note, the unidimensional graded response model’s Akaike information criterion and Bayesian information criterion (AIC = 410,895.8; BIC = 411,961.5), which penalize model complexity, were higher than those of both of the more complex models. These were the correlated traits model (AIC = 398,234.5; BIC = 399,321.0) and the bifactor model (AIC = 394,165.5; BIC = 395,383.4). The SRMSR, which Maydeu-Olivares (2014) described as an effect size for the amount of misfit in a model, was also much higher for the unidimensional graded response model (SRMSR = .110) than for the correlated traits (SRMSR = .081) and bifactor (SRMSR = .050) graded response models.

Not all of the differences in fit indices were as clear-cut as the AIC, BIC, and

SRMSR. The unidimensional graded response model had a much larger RMSEA (RMSEA = .07; CI5% = .07; CI95% = .07) than the other two models. However, the confidence intervals for the correlated traits (RMSEA = .04; CI5% = .04; CI95% = .05) and bifactor (RMSEA = .04; CI5% = .03; CI95% = .04) models overlapped. Finally, the CFI for the correlated traits model (CFI = .96) was worse than that of both the unidimensional (CFI = .98) and bifactor (CFI = .98) models.

To determine whether the differences in model fit were statistically significant, I conducted likelihood ratio tests comparing the unidimensional model to the correlated traits and bifactor models. Both tests were significant. Both the correlated traits model (χ2(3) = 12,667.26, p < .001) and the bifactor model (χ2(22) = 16,774.33, p < .001) fit the observed data better than the unidimensional model. I then compared the bifactor and correlated traits models via a likelihood ratio test (χ2(19) = 4,107.07, p < .001). The significant result indicates that the more complex model, the bifactor model, fits the data better than the correlated traits model.

The generalized partial credit models displayed similar results. The Akaike information criterion and Bayesian information criterion for the unidimensional model (AIC = 413,828.5; BIC = 414,894.2) were worse than those for the correlated traits (AIC = 403,149.9; BIC = 404,236.4) and the bifactor (AIC = 399,154.1; BIC = 400,372.1) generalized partial credit models. The unidimensional model’s SRMSR (SRMSR = .113) was also slightly worse than the SRMSR for the correlated traits model (SRMSR = .109) and much worse than that of the bifactor model (SRMSR = .062). As with the graded response models, I conducted likelihood ratio tests to determine whether the fit of the more complex models was significantly better than that of the unidimensional model. The same pattern emerged as with the graded response models. Both the correlated traits (χ2(3) = 10,684.54, p < .001) and bifactor models (χ2(22) = 14,718.32, p < .001) had significantly better fit than the unidimensional model. The bifactor model also had a better fit than the correlated traits model (χ2(19) = 4,033.78, p < .001).
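The reported AIC, BIC, and likelihood ratio statistics can be cross-checked against one another. This consistency check assumes AIC = −2LL + 2k and BIC = −2LL + k ln N, with N = 7,481. It also assumes particular parameter counts, which are my reading rather than figures stated above. The unidimensional model is assumed to have 154 parameters: 22 items, each with one general slope and six intercepts. The correlated traits model adds three factor correlations (157 parameters). The bifactor model adds 22 specific slopes (176 parameters). These counts reproduce the reported degrees of freedom (3, 22, and 19). Every AIC and χ² value in the code is taken from the text.

```python
import math

N = 7481  # final sample size
# Assumed parameter counts (see lead-in); their differences match the reported df.
n_params = {"unidimensional": 154, "correlated traits": 157, "bifactor": 176}
aic = {
    "GRM": {"unidimensional": 410_895.8, "correlated traits": 398_234.5, "bifactor": 394_165.5},
    "GPCM": {"unidimensional": 413_828.5, "correlated traits": 403_149.9, "bifactor": 399_154.1},
}


def minus2ll(aic_value, k):
    """-2 log-likelihood recovered from AIC = -2LL + 2k."""
    return aic_value - 2 * k


def bic(aic_value, k, n=N):
    """BIC = -2LL + k ln(n)."""
    return minus2ll(aic_value, k) + k * math.log(n)


def lrt(family, simpler, complex_):
    """Likelihood ratio chi-square and df for two nested models from one family."""
    chi2 = (minus2ll(aic[family][simpler], n_params[simpler])
            - minus2ll(aic[family][complex_], n_params[complex_]))
    return chi2, n_params[complex_] - n_params[simpler]


for family in aic:
    for simpler, complex_ in [("unidimensional", "correlated traits"),
                              ("unidimensional", "bifactor"),
                              ("correlated traits", "bifactor")]:
        chi2, df = lrt(family, simpler, complex_)
        print(f"{family}: {simpler} vs {complex_}: chi2({df}) = {chi2:,.1f}")
```

The recovered χ² values agree with the reported ones to within rounding of the AIC values.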

Unlike for the graded response models, the differences in CFI values for the generalized partial credit models mirrored the differences in the fit indices described above. The unidimensional model had the worst fit (CFI = .87), followed by the correlated traits model (CFI = .96), while the bifactor model displayed the best fit (CFI = .98). The RMSEA values for the generalized partial credit models, on the other hand, displayed the same pattern as for the graded response models. The unidimensional model had the highest RMSEA (RMSEA = .08; CI5% = .08; CI95% = .08). The confidence intervals for the correlated traits (RMSEA = .04; CI5% = .04; CI95% = .05) and bifactor (RMSEA = .04; CI5% = .04; CI95% = .04) models overlapped.

Having identified the best fitting models from both the graded response models

and generalized partial credit models—the bifactor model in both cases—I compared the

fit of these two models by examining their AIC and BIC values. The bifactor graded response model (AIC = 394,165.5; BIC = 395,383.4) fit much better than the bifactor generalized partial credit model (AIC = 399,154.1; BIC = 400,372.1). Beyond the AIC and BIC, Table 2 shows that the graded response model had superior fit on every other index except the RMSEA and the CFI. Also, Maydeu-Olivares (2014) recommended that an SRMSR less than or equal to .05 indicates adequate model fit. The bifactor graded response model was the only model to meet that criterion.

Item Fit

To assess item fit, I computed the standardized residual correlations (SRC) for all possible pairwise combinations within each subscale (Table 3). The SRC is the sample correlation between two items minus the model-implied correlation between the two items (Maydeu-Olivares, 2014). Five items (DP1, DP4, PA1, PA2, and PA7; Table 3) had mean |SRC| values within their subscales greater than the .05 cutoff advocated by Maydeu-Olivares (2014). This indicates that the bifactor graded response model does not accurately reproduce the correlations between these items and the other items within their subscales.
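A minimal sketch of the per-item summary described above follows. For each item, it averages the absolute residual correlations with the other items in the same subscale and flags averages above .05. Both correlation matrices are hypothetical. In practice, the observed matrix would be the polychoric correlations and the implied matrix would come from the fitted model.

```python
import numpy as np


def mean_abs_src(observed, implied):
    """Mean |observed - model-implied| correlation of each item with the other items."""
    resid = np.asarray(observed, dtype=float) - np.asarray(implied, dtype=float)
    n = resid.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return np.array([np.abs(resid[r, off_diag[r]]).mean() for r in range(n)])


# Hypothetical 3-item subscale: sample and model-implied correlations
observed = np.array([[1.00, 0.50, 0.40],
                     [0.50, 1.00, 0.30],
                     [0.40, 0.30, 1.00]])
implied = np.array([[1.00, 0.46, 0.47],
                    [0.46, 1.00, 0.34],
                    [0.47, 0.34, 1.00]])
src = mean_abs_src(observed, implied)
flagged = src > 0.05  # cutoff advocated by Maydeu-Olivares (2014)
print(src.round(3), flagged)
```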

IRT parameters

Having established the most appropriate of the tested models—the bifactor graded

response model—I extracted the raw item parameters for each of the MBI-HSS items

(Table 4). There are several interesting pieces of information to note in Table 4. First,

there are differences between the subscales as to their items’ discrimination patterns for

the general burnout dimension. All of the emotional exhaustion items discriminate better

on the general burnout dimension than on the emotional exhaustion dimension. The

personal accomplishment items (with the exception of PA4, discussed below) discriminate better on the personal accomplishment dimension. Similar to the emotional exhaustion items, the depersonalization items discriminate better on the general burnout dimension; however, the difference between their discrimination powers on the two dimensions is less extreme.

Another interesting observation is that two of the emotional exhaustion items

(EE4 and EE8) have negative discrimination values on the emotional exhaustion latent

trait when a bifactor model is used. Part of this may be due to different item stems for

those two items in comparison to the other items; EE4 and EE8 begin with “Working

with people” whereas the remainder of the items begin with “I feel”. The stem “Working

with people” suggests a more external focus for the items, possibly prompting

respondents to think more about their job than their internal emotional state. In contrast,

the stem “I feel” clearly indicates an internal focus. The difference in these item stems

could account for the negative relationships with the other emotional exhaustion items

when the model accounts for the general burnout trait. Additionally, as reported in a

review article by Worley et al. (2008), several studies have found that both EE4 and EE8

do not load on the emotional exhaustion factor as expected from the scale’s construction.

The bifactor model helps elucidate the nature of the relationship between EE4 and EE8

and the remainder of the emotional exhaustion subscale: when the relationship between

all of the MBI-HSS items is modeled via the general burnout factor, EE4 and EE8 are

negatively related to the remainder of the subscale.

It is worth noting that the MBI-HSS manual recommends that when conducting

analyses on the MBI-HSS, item EE8—as well as PA4—should be removed from the

analyses because they have strong cross-loadings (p.11, Maslach et al., 1996). According

to the MBI-HSS manual, the item EE8 cross-loads on depersonalization whereas PA4

cross-loads on emotional exhaustion (Appendix A, Maslach et al., 1996).

As noted above, Table 4 shows that all but one of the PA items (the exception being PA4) discriminate more strongly on the PA dimension than on the general burnout dimension. This finding reflects the weaker correlations between the personal accomplishment subscale and the other two subscales. Maslach et al. (1996) noted that PA4 traditionally cross-loads on emotional exhaustion (p. 11). This explains why PA4 has a higher discrimination value on the general burnout dimension than the other personal accomplishment items. As mentioned above, the general burnout dimension captures the commonality among all of the MBI-HSS items, so the cross-loading of PA4 on emotional exhaustion is captured by the general burnout dimension.

The parameters in Table 4 were used in turn to calculate 𝐴𝑖 𝑀𝑎𝑥, 𝜔𝑖𝑘 𝑀𝑎𝑥, and the

step difficulties (𝐵𝑖𝑚) for each item as described in the introduction. The results of these

calculations are in Table 5. As a reminder: 𝐴𝑖 𝑀𝑎𝑥 is the slope of the item response

surface in the direction of the item difficulty parameters; 𝜔𝑖𝑘 𝑀𝑎𝑥 is the angle of 𝐴𝑖 𝑀𝑎𝑥

relative to the general burnout dimension; and the step difficulties are the thresholds for

the response categories for the item in the direction of 𝐴𝑖 𝑀𝑎𝑥. These parameters show how well, at best, an item can discriminate between people with different levels on the dimensions. However, 𝐴𝑖 𝑀𝑎𝑥 and 𝜔𝑖𝑘 𝑀𝑎𝑥 do not give a complete picture of the functioning of the MBI-HSS.

In addition, I computed the directional discriminations for each item at every 10-degree increment between 0 and 90 degrees (Table 6). I chose 10-degree intervals for two reasons. They provide 10 views of the discriminatory power of the items, a balance between too little and too much detail. They are also the intervals recommended by Reckase and McKinley (1991). These directional discrimination coefficients allow for the computation of item and test information coefficients for each of the aforementioned intervals.

Item Information

Because the information provided by the MBI-HSS is conditional on all four dimensions, a table of all its information values would be impractical. For example, consider a table of 𝑇𝐼(𝜃𝐺𝐵, 𝜃𝐸𝐸, 𝜃𝐷𝑃, 𝜃𝑃𝐴) values covering 4 standard deviations below to 4 standard deviations above the mean on each dimension, in one-standard-deviation steps. Such a table would require 9 × 9 × 9 × 9 = 6,561 different θ level combinations, without counting the different angles. Instead, I examined the amount of information provided by the MBI-HSS using graphical methods.

To investigate the amount of information provided by each individual item for

each of the traits, I constructed clamshell plots (e.g., Reckase & McKinley, 1991).

Clamshell plots are graphical tools for displaying the amount of information provided by

each item for each angle interval between two orthogonal dimensions (Figures 3 - 24).

Within each cell of the plot, there are 10 lines—at every 10 degrees from 0 to 90 degrees

relative to the general burnout dimension—that indicate the amount of information being

provided by that item based on the responses of the sample. The line lengths are scaled

such that the length of the line indicates the ratio of the information provided in that cell

to the maximum amount of information provided by any one of the items: item EE1

(Figure 8) has the cell with the most information (4.34), so all lines are scaled relative to

that value. The axes indicate θ trait levels for the general burnout dimension and the

item’s respective dimension (i.e., depersonalization, emotional exhaustion, or personal

accomplishment) from four standard deviations below the mean θ level to four standard

deviations above the mean in one standard deviation intervals.
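The geometry of a clamshell plot can be sketched without a plotting library. The grid runs from −4 to 4 in 1-SD steps on both axes, and every cell gets ten segments at 0°, 10°, …, 90° from the general burnout axis. Each segment's length is proportional to the information in that direction divided by a scaling maximum. The sketch assumes a single hypothetical dichotomous item and uses that item's own maximum. The item-level plots described above instead scale by the largest cell across all items (4.34, for EE1), which can be passed as `max_info`. The resulting segments can be drawn with, for example, matplotlib's `LineCollection`.

```python
import numpy as np


def item_info_direction(theta_gb, theta_sf, a_gb, a_sf, d, omega_deg):
    """Directional information P * Q * A_i(omega)^2 (dichotomous simplification)."""
    p = 1.0 / (1.0 + np.exp(-(a_gb * theta_gb + a_sf * theta_sf + d)))
    w = np.radians(omega_deg)
    return p * (1.0 - p) * (a_gb * np.cos(w) + a_sf * np.sin(w)) ** 2


def clamshell_segments(info_fn, grid, angles_deg, max_info, max_len=0.45):
    """Segments ((x0, y0), (x1, y1)) for a clamshell plot.

    Each segment starts at a (theta_GB, theta_SF) grid point and points along its angle
    (measured from the GB axis); its length is max_len * information / max_info.
    """
    segments = []
    for tg in grid:
        for ts in grid:
            for ang in angles_deg:
                length = max_len * info_fn(tg, ts, ang) / max_info
                w = np.radians(ang)
                segments.append(((tg, ts), (tg + length * np.cos(w), ts + length * np.sin(w))))
    return segments


def info(tg, ts, ang):
    # Hypothetical item: a_GB = 1.8, a_SF = 0.9, d = 0.3
    return item_info_direction(tg, ts, 1.8, 0.9, 0.3, ang)


grid = np.arange(-4, 5)        # -4 to 4 SD in 1-SD steps
angles = np.arange(0, 91, 10)  # 10 lines per cell
max_info = max(info(tg, ts, ang) for tg in grid for ts in grid for ang in angles)
segments = clamshell_segments(info, grid, angles, max_info)
print(len(segments))
```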

The clamshell plots reveal that all of the items provide very little information when individuals have low levels on both dimensions. Several items contribute very little information (less than 1 unit at their maximum on either dimension) for determining individuals’ θ levels: DP1 [Figure 3], DP4 [Figure 6], DP5 [Figure 7], EE7 [Figure 14], PA1 [Figure 17], PA2 [Figure 18], PA4 [Figure 20], PA6 [Figure 22], and PA8 [Figure 24]. The clamshell plots also display graphically what I noted above regarding the discrimination values for the different subscales. The lengths of the lines and their angles relative to the different dimensions show, roughly, on which dimension each item best discriminates. Namely, the emotional exhaustion items discriminate better (and therefore contribute more information) on the general burnout dimension. The personal accomplishment items discriminate better on the personal accomplishment dimension, and the depersonalization items are more balanced.

Test Information

Next, I examined the amount of information provided by the subscales of the MBI-HSS. I created clamshell plots for each subscale (Figures 25–27). The lengths of the lines are scaled relative to the maximum information provided by any of the subscales; in this case, all the subscale-level clamshell plots are scaled relative to the emotional exhaustion subscale’s maximum information. The overall clamshell plots make clear that the emotional exhaustion subscale provides much more information than the other two subscales. The clamshell plots, however, make it difficult to compare the magnitude of the information provided at different angles.
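Under local independence, item information is additive, so a subscale’s test information at trait point $\boldsymbol{\theta}$ in direction α is the sum of its items’ directional information. In notation introduced here for illustration, with $s$ denoting the set of items in a subscale:

$$I_{s,\alpha}(\boldsymbol{\theta}) = \sum_{i \in s} I_{i,\alpha}(\boldsymbol{\theta})$$

The emotional exhaustion subscale therefore sums nine items, personal accomplishment eight, and depersonalization five, which is one reason the subscale clamshells differ in overall size.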

To complete Step 3 and display how much information the subscales provide about the general burnout dimension and about their respective secondary dimensions, I decomposed the test information clamshells into bar graphs (Figures 28–33). The bar graphs with horizontal bars (i.e., Figures 28, 30, & 32) display the amount of information provided about the general burnout dimension; the bar graphs with vertical bars (i.e., Figures 29, 31, & 33) display the amount of information provided about the subscale’s secondary dimension. The bar graphs are arranged in the same manner as the clamshell plots, except that the origins of the lines have shifted. The bars are arranged by angle such that higher angles are farther from the intersection of the θ values (e.g., 0 degrees is at the meeting of the θ values, 10 degrees is next to that bar, and so on). This display lets the viewer see clearly the amount of information provided for each dimension without having to estimate the relative lengths of lines at differing angles.
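One plausible reading of the trigonometric decomposition, offered here only as an assumption since the exact computation is not spelled out in this section, treats each clamshell line of length I at angle α as a vector and projects it onto the two axes: I·cos α toward the general dimension and I·sin α toward the secondary dimension. The function name and the information values below are hypothetical.

```python
import numpy as np

def decompose_information(info_by_angle, angles_deg):
    """Project directional information onto the general (0 deg) and secondary
    (90 deg) axes. Assumption: each line is treated as a vector of length I."""
    info = np.asarray(info_by_angle, dtype=float)
    alpha = np.deg2rad(np.asarray(angles_deg, dtype=float))
    general = info * np.cos(alpha)     # horizontal-bar component
    secondary = info * np.sin(alpha)   # vertical-bar component
    return general, secondary

angles = np.arange(0, 91, 10)
info = np.array([4.0, 3.9, 3.7, 3.3, 2.8, 2.2, 1.6, 1.0, 0.5, 0.2])  # hypothetical
general, secondary = decompose_information(info, angles)
```

Under this reading, the 0-degree bar is entirely general-dimension information and the 90-degree bar is entirely secondary-dimension information, with intermediate angles split between the two.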

To connect information from item response theory to classical test theory’s concept of reliability, Thissen (2000) recommended computing an item response theory estimate of reliability, comparable to classical reliability, as $1 - SE_e^2$ (p. 163, Table 7.1). Because the standard error of a θ estimate is $1/\sqrt{I(\theta)}$, a target reliability ρ requires $I(\theta) = 1/(1 - \rho)$. Using this formula, I calculated the amount of information necessary to achieve reliabilities of .70, .80, and .90 ($I(\theta)$ = 3.33, 5, and 10, respectively) and added lines (red, blue, and green) to the graphs to show where the subscales provide enough information to meet those reliability values. The specific plots for each subscale are discussed in the following sections.
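The information thresholds follow directly from reliability = 1 − SE² with SE = 1/√I(θ), which assumes θ is scaled to unit variance. A quick check of the three cut-offs (function names are illustrative):

```python
def required_information(target_reliability):
    """Test information needed so that 1 - SE^2 reaches the target, with SE = 1/sqrt(I)."""
    return 1.0 / (1.0 - target_reliability)

def reliability_from_information(info):
    """IRT reliability implied by a given amount of test information."""
    return 1.0 - 1.0 / info

for r in (0.70, 0.80, 0.90):
    print(f"reliability {r:.2f} -> I(theta) = {required_information(r):.2f}")
# prints 3.33, 5.00, and 10.00
```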

General burnout.

Figure 28 displays the amount of information provided by the depersonalization subscale about individuals’ general burnout θ level. As Figure 28 shows, there is a small number of combinations of depersonalization and general burnout levels, forming a band in the graph, for which the depersonalization subscale provides enough information to raise the reliability of the general burnout estimate above .80. Outside that band, however, the subscale provides essentially no information about the general burnout dimension.



Figure 30 tells a different story: the emotional exhaustion subscale has a much wider band of information about individuals’ general burnout θ level. Comparing Figures 28 and 30, the emotional exhaustion subscale not only has a wider band of information than the depersonalization subscale but also provides much more information. In fact, where the emotional exhaustion subscale provides the most information about the general burnout factor, it raises the reliability of the general burnout dimension’s estimate well above .90.

Finally, the personal accomplishment subscale provides very little information about the general burnout dimension (Figure 32). As mentioned above, the personal accomplishment items, except PA4, discriminate better on the personal accomplishment dimension than on the general burnout dimension, so it is not surprising that the personal accomplishment subscale provides so little information. The personal accomplishment subscale does have a wide band of information like the emotional exhaustion subscale, but the amount of information provided is small.

To compare the amount of information each subscale provides for the general burnout dimension, it is necessary to collapse each subscale’s information into marginal information by summing the information provided by the subscale across all secondary trait levels. For example, I summed the information provided by the depersonalization subscale across all depersonalization trait levels at each general burnout level and angle relative to general burnout, thus obtaining the marginal information for each general burnout trait level and angle. Figure 34 displays the marginal information for the general burnout dimension faceted by subscale and angle relative to the general burnout dimension. The emotional exhaustion subscale clearly provides the most information about the general burnout dimension. Part of the reason the emotional exhaustion subscale provides more information about the general burnout dimension than the depersonalization subscale is the relative number of items in each subscale: the emotional exhaustion subscale has nine items, whereas the depersonalization subscale has only five. However, the emotional exhaustion subscale provides practically no information at the lowest level of general burnout. This deficit is remedied by the other two subscales, which provide a small amount of information at those low levels of general burnout. The marginal plots also make more evident that the personal accomplishment subscale provides an almost uniform amount of information across the entire general burnout trait range. This is useful for the scale in that it still provides information at general burnout levels not covered by the other subscales. Depersonalization also provides information across the entire general burnout spectrum; however, its information is more concentrated at the upper levels of general burnout.
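The marginalization step amounts to summing a three-way array of information values over its secondary-trait axis. The sketch below uses random placeholder values and a hypothetical array layout (general level × secondary level × angle), not the study’s results.

```python
import numpy as np

# info[g, s, k]: information a subscale provides about the general dimension at
# general-burnout level grid[g], secondary-trait level grid[s], and angle angles[k].
grid = np.arange(-4, 5)
angles = np.arange(0, 91, 10)
rng = np.random.default_rng(0)
info = rng.uniform(0.0, 1.0, size=(grid.size, grid.size, angles.size))  # placeholders

# Marginal information: sum over all secondary-trait levels (axis 1), leaving one
# value per general-burnout level and angle, as plotted in a marginal facet.
marginal = info.sum(axis=1)
```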

One must use caution in interpreting the marginal information. While the marginal information is useful for comparing subscales, the scales will never provide that level of information about an individual’s trait level. To determine the amount of information provided for a person’s general burnout level, one must add the information provided by the emotional exhaustion, depersonalization, and personal accomplishment subscales at the individual’s θ levels on both that subscale’s dimension and the general burnout dimension. In other words, the information provided about a person’s general burnout level is conditional on the other dimensions (Brown & Croudace, 2014). This conditionality makes the marginal information inappropriate for evaluating the information available about an individual’s general burnout level.
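Written out with notation introduced here for illustration, where $I^{s}_{GB}(\theta_{GB}, \theta_s)$ is the information subscale $s$ provides about general burnout at a person’s general and subscale-specific θ levels, the information available for one respondent is:

$$I_{GB}(\boldsymbol{\theta}) = I^{EE}_{GB}(\theta_{GB}, \theta_{EE}) + I^{DP}_{GB}(\theta_{GB}, \theta_{DP}) + I^{PA}_{GB}(\theta_{GB}, \theta_{PA})$$

Each term is read from a single cell of the corresponding subscale’s plot rather than summed over every secondary trait level, which is why the marginal totals overstate what any one person’s responses provide.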

Depersonalization.

Figure 29 displays the test information provided for the depersonalization

dimension. Similar to the information provided by the depersonalization subscale for the

general burnout dimension, there is a thin band of trait levels where the scale provides

enough information to raise the reliability to .80. This is disconcerting, as the other

subscales do not provide any information about depersonalization. Therefore, the only

information we have about an individual’s depersonalization level is from the

depersonalization subscale, and if that subscale is not providing much information, the

ability of the MBI-HSS to determine individuals’ depersonalization levels is mediocre at

best. In fact, outside the band of higher information, there is essentially no information,

meaning that the estimate of a person’s depersonalization level in the low information

areas is very uncertain.

Emotional exhaustion.

The test information for the emotional exhaustion dimension is displayed in

Figure 31. The band of viable information for emotional exhaustion is much wider,

indicating that the MBI-HSS does a better job of placing people at different emotional

exhaustion levels than depersonalization across the spectrum of general burnout and

emotional exhaustion levels. As can be seen in Figure 8, there are also many more points

where the reliability of the subscale is above .80 indicating that the placement of people

on the emotional exhaustion dimension is fairly reliable compared to the

depersonalization subscale. It is unsurprising that the emotional exhaustion items provide



more information about the general burnout dimension than the emotional exhaustion

dimension: there are several items that are more about frustration than exhaustion and

there is one item that even asks specifically about being burned out.

Personal accomplishment.

The personal accomplishment dimension is similar to the depersonalization

dimension in that there are very few points at which the reliability is above .80 (Figure

33). It is different in that the band of information is much wider than that for

depersonalization. Given the small quantities of information provided by the personal

accomplishment subscale, personal accomplishment θ level estimates are unreliable and

should be interpreted with caution.

Rodriguez, Reise, and Haviland (2015a) analyses

Rodriguez, Reise, and Haviland (2015a) recommended a series of analyses to determine the psychometric value of the general and specific factors of a bifactor model: coefficient omegas (ω; McDonald, 1999; Reise, 2012), explained common variance (ECV; Reise, 2012), construct reliability (Hancock & Mueller, 2001), and percent of uncontaminated correlations (Reise, 2012). To compute these values, I first extracted the standardized factor loadings from the model (Table 7). Then, using the formulas in Rodriguez et al. (2015a), I computed the above indices (Table 8).

Explained common variance.

The explained common variance (ECV; Reise, 2012) is the proportion of the common variance explained by the model that is attributable to each of the dimensions. Reise (2012) explains that a bifactor model whose general factor accounts for a large majority (greater than 60%; Reise, Scheines, Widaman, & Haviland, 2013) of the common variance explained by the entire model may be safely treated as unidimensional. The equation for the ECV of the general factor is (Equation 10; Rodriguez et al., 2015a):

$$ECV_{GB} = \frac{\sum \lambda_{GB}^2}{\sum \lambda_{GB}^2 + \sum \lambda_{DP}^2 + \sum \lambda_{EE}^2 + \sum \lambda_{PA}^2}$$

where λ is the standardized factor loading and each sum runs over the items loading on that factor. This equation can be modified for any dimension by replacing the numerator with the dimension of interest. The ECV for the general burnout dimension is .62, meaning that the general burnout dimension accounts for 62% of the common variance explained by the model as a whole. The depersonalization, emotional exhaustion, and personal accomplishment dimensions accounted for 12%, 2%, and 24% of that variance, respectively. This indicates that a large proportion of the variance is explained by the general factor. However, more indices need to be taken into account before declaring the MBI-HSS “unidimensional”.
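A minimal sketch of the ECV calculation, applying the formula to every factor at once. The loadings are hypothetical round numbers, not the Table 7 values; only the item counts (22 general loadings; 9 EE, 5 DP, and 8 PA specific loadings) follow the MBI-HSS.

```python
import numpy as np

def ecv(loadings):
    """Share of explained common variance for each factor.

    loadings maps factor name -> standardized loadings of the items on that factor.
    """
    sums_of_squares = {name: float(np.sum(np.square(lam))) for name, lam in loadings.items()}
    total = sum(sums_of_squares.values())
    return {name: ss / total for name, ss in sums_of_squares.items()}

# Hypothetical loadings (not the study's estimates).
loadings = {
    "GB": np.full(22, 0.6),
    "EE": np.full(9, 0.1),
    "DP": np.full(5, 0.4),
    "PA": np.full(8, 0.5),
}
shares = ecv(loadings)
```

Because every factor’s sum of squared loadings appears in the shared denominator, the four shares always add to 1, just as the reported 62%, 12%, 2%, and 24% do.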

Percent uncontaminated correlations. As part of examining the ECV values, Rodriguez et al. (2015a) recommend examining the percentage of uncontaminated correlations (PUC; Reise, Scheines et al., 2013). The PUC is the proportion of correlations between items that are uncontaminated by multidimensionality. According to Reise, Scheines et al. (2013), both the specific dimension and the general dimension influence correlations between two items within the same subscale. For example, two dimensions influence the correlation between two depersonalization items: the depersonalization dimension and the general burnout dimension. On the other hand, only the general burnout dimension influences the correlation between a depersonalization item and a personal accomplishment item. Reise, Scheines et al. (2013) note that this multidimensional contamination can bias unidimensional factor loadings when a bifactor structure is forced into a unidimensional framework.

The PUC is computed by subtracting the number of correlations within each

subscale from the total number of correlations and then dividing by the total number of

correlations (Rodriguez et al., 2015a):

$$PUC = \frac{\frac{22 \cdot 21}{2} - \left(\frac{9 \cdot 8}{2} + \frac{8 \cdot 7}{2} + \frac{5 \cdot 4}{2}\right)}{\frac{22 \cdot 21}{2}} = .68$$

In other words, 68% of the correlations between items in the MBI-HSS are uninfluenced by multidimensionality. When evaluating bifactor models, the PUC is important because it indicates how many of the correlations directly inform the general factor. As noted by Reise, Scheines et al. (2013), “as PUC increases, the general trait in the bifactor model becomes more and more similar to the single trait estimated in a unidimensional model, especially when the ECV is high” (p. 9). While Reise, Scheines, et al. (2013) do not define a critical PUC value at which a measure can be considered unidimensional, they note that PUC moderates the relationship between ECV and bias in factor loadings. Specifically, as the PUC value increases, the relative bias in factor loadings (comparing the bifactor general factor loadings with the unidimensional loadings) stays low even if the ECV value is low.
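The PUC arithmetic can be checked in a few lines, assuming, as in the MBI-HSS bifactor model, that every item loads on the general factor and on exactly one group factor. The function name is illustrative.

```python
from math import comb

def puc(subscale_sizes):
    """Proportion of item pairs whose correlation reflects only the general factor."""
    n_items = sum(subscale_sizes)
    total_pairs = comb(n_items, 2)                            # 22 * 21 / 2 = 231
    within_pairs = sum(comb(k, 2) for k in subscale_sizes)    # 36 + 28 + 10 = 74
    return (total_pairs - within_pairs) / total_pairs

print(round(puc([9, 8, 5]), 2))  # EE, PA, DP item counts; prints 0.68
```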

Omega coefficients.

Omega. Coefficient omega (ω; McDonald, 1999) is a model-based reliability

estimate from the factor analysis literature. It is a measure of the common variance

divided by the total variance of a scale. For the MBI-HSS as a whole, the equation is

(Equation 11, Reise, 2012):



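$$\omega = \frac{(\sum \lambda_{GB})^2 + (\sum \lambda_{EE})^2 + (\sum \lambda_{DP})^2 + (\sum \lambda_{PA})^2}{(\sum \lambda_{GB})^2 + (\sum \lambda_{EE})^2 + (\sum \lambda_{DP})^2 + (\sum \lambda_{PA})^2 + \text{Error}}$$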

For the MBI-HSS data, the ω for the scale as a whole is .93 (Table 8). However, for the

purposes of this dissertation, ω is not very informative. In order to examine the

reliabilities of the general burnout dimension and subscales, we need to examine omega

hierarchical (𝜔𝐻), omega subscale (𝜔𝑆), and omega hierarchical subscale (𝜔𝐻𝑆; Reise,

2012).

Omega hierarchical. Omega hierarchical is similar to ω, except that the

numerator in the equation only includes the general factor’s variance. It is also similar to

ECV except for the inclusion of the error term in the denominator: ω𝐻 is a measure of the

total variance that is attributable to the general factor as opposed to just the explained

variance (Equation 10; Reise, 2012):

$$\omega_H = \frac{(\sum \lambda_{GB})^2}{(\sum \lambda_{GB})^2 + (\sum \lambda_{EE})^2 + (\sum \lambda_{DP})^2 + (\sum \lambda_{PA})^2 + \text{Error}}$$

For the MBI-HSS, the ωH value equals .76 (Table 8). This indicates that 76% of the variance in total scores on the MBI-HSS is attributable to individual differences in general burnout (Rodriguez et al., 2015a). Rodriguez et al. (2015a) recommend examining the ratio ωH/ω to determine the percentage of reliable variance in MBI-HSS total scores that is attributable to the general burnout dimension; in this case, 82% (.76/.93) of the reliable variance in the MBI-HSS is accounted for by the general burnout dimension.
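The ω and ωH formulas can be sketched directly from standardized loadings. This assumes, as in Rodriguez et al.’s formulation, that the Error term is the sum of item uniquenesses, 1 − h², where h² is the item’s communality; the four-item example and its loadings are hypothetical.

```python
import numpy as np

def omega_and_omega_h(general, specific, subscale):
    """Omega and omega hierarchical for a bifactor model with standardized loadings.

    general, specific: one loading per item (each item loads on one specific factor).
    subscale: the specific-factor label of each item.
    Error is taken here as the sum of item uniquenesses, 1 - h^2.
    """
    general = np.asarray(general, dtype=float)
    specific = np.asarray(specific, dtype=float)
    subscale = np.asarray(subscale)
    general_part = general.sum() ** 2
    specific_part = sum(specific[subscale == s].sum() ** 2 for s in np.unique(subscale))
    error = np.sum(1.0 - general**2 - specific**2)
    total = general_part + specific_part + error
    return (general_part + specific_part) / total, general_part / total

# Hypothetical four-item example with two subscales (not the Table 7 loadings).
omega, omega_h = omega_and_omega_h([0.6, 0.6, 0.6, 0.6], [0.3, 0.3, 0.4, 0.4],
                                   ["A", "A", "B", "B"])
share_general = omega_h / omega   # share of reliable variance due to the general factor
```

The ratio `share_general` is the quantity reported above as 82% for the MBI-HSS.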

Omega subscale. To look more closely at the subscales, I computed ωS for each subscale. ωS is calculated in the same manner as ω, except that it is computed using only the items within the subscale. For example, for the depersonalization subscale the equation is (Equation 12; Reise, 2012):

$$\omega_S = \frac{(\sum \lambda_{GB})^2 + (\sum \lambda_{DP})^2}{(\sum \lambda_{GB})^2 + (\sum \lambda_{DP})^2 + \text{Error}}$$

Table 8 contains the ωS values for each subscale. Emotional exhaustion had an excellent ωS value (ωS = .92), with personal accomplishment close behind (ωS = .85). Depersonalization had the lowest ωS (ωS = .78).

Omega hierarchical subscale. The 𝜔𝐻𝑆 coefficient is similar to the 𝜔𝐻

coefficient except it examines the contribution of the subscale rather than the general

burnout dimension for the items that compose the subscale. For example, 𝜔𝐻𝑆 for the

depersonalization dimension looks at the common variance accounted for by the

depersonalization dimension without the general burnout factor (Equation 7; Rodriguez et

al., 2015a):

$$\omega_{HS} = \frac{(\sum \lambda_{DP})^2}{(\sum \lambda_{GB})^2 + (\sum \lambda_{DP})^2 + \text{Error}}$$

Although all three subscales had acceptable ωS values, the ωHS values reveal that the subscales’ reliability is mostly due to the general burnout dimension (Table 8). In fact, when the general burnout dimension is removed from the emotional exhaustion subscale, ωHS is .00. This is largely the result of two items (EE4 and EE8) that load negatively on the emotional exhaustion dimension, together with the low magnitude of the loadings for the rest of the items. The depersonalization subscale also had a low ωHS value (ωHS = .35), likewise a result of the relatively low loadings on the depersonalization factor. Personal accomplishment had the highest ωHS value (ωHS = .59), but it was still far below conventionally accepted reliability levels.



In the same manner as with the ωH and ω values, I can examine the proportion of reliable variance within each subscale that is accounted for by the subscale dimension (ωHS/ωS). The emotional exhaustion dimension accounted for none of the reliable variance in emotional exhaustion scores. The depersonalization dimension accounted for 45% of the reliable variance in depersonalization scores, and the personal accomplishment dimension accounted for 69% of the reliable variance in personal accomplishment scores.
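The subscale-level indices can be sketched the same way, restricting the sums to one subscale’s items and again assuming Error is the sum of 1 − h² over those items. The loadings are hypothetical; subscale “B” is given one negative specific loading to show how loadings of opposite sign cancel when summed before squaring, the mechanism behind the emotional exhaustion ωHS of .00.

```python
import numpy as np

def subscale_omegas(general, specific, subscale, label):
    """omega_S, omega_HS, and their ratio for the items belonging to one subscale."""
    mask = np.asarray(subscale) == label
    g = np.asarray(general, dtype=float)[mask]
    s = np.asarray(specific, dtype=float)[mask]
    general_part = g.sum() ** 2
    specific_part = s.sum() ** 2       # summed before squaring: signs can cancel
    error = np.sum(1.0 - g**2 - s**2)
    total = general_part + specific_part + error
    omega_s = (general_part + specific_part) / total
    omega_hs = specific_part / total
    return omega_s, omega_hs, omega_hs / omega_s

# Hypothetical loadings; subscale "B" has one negative specific loading.
general = [0.6, 0.6, 0.6, 0.6]
specific = [0.3, 0.3, 0.4, -0.4]
labels = ["A", "A", "B", "B"]
print(subscale_omegas(general, specific, labels, "A"))
print(subscale_omegas(general, specific, labels, "B"))  # omega_HS is 0 here
```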

What becomes clear from the above analyses is that, after partialing out the general burnout dimension, none of the subscales has adequate reliability for use. This corresponds well with the results of the item response theory analyses: the depersonalization and emotional exhaustion subscales provided more information about the general burnout dimension than about their own dimensions, and the general burnout dimension was more reliable than the subscale dimensions.

Construct reliability. Construct reliability, or construct reproducibility, is a measure of the replicability of a latent trait (Hancock & Mueller, 2001; Rodriguez et al., 2015a). In the words of Hancock and Mueller (2001), the measure of construct reliability, H, “is the proportion of variability in the construct explainable by its own indicator variables” (pp. 202–203). Rodriguez et al. (2015) explain construct reliability another way: “construct reliability is a statistical method of judging how well a latent variable is represented by a given set of items (i.e., the quality of the indicators), and, thus, replicable across studies” (p. 7). Mathematically, H is defined as (Equation 9; Rodriguez et al., 2015a):



$$H = \left[1 + \frac{1}{\sum_{i=1}^{k} \frac{\lambda_i^2}{1 - \lambda_i^2}}\right]^{-1}$$

where k equals the total number of items within a scale or subscale. The H values for each dimension can be found in Table 8. For the MBI-HSS, the general burnout dimension has excellent construct reliability (H = .93), indicating that it is likely to replicate across studies (Rodriguez et al., 2015a). Rodriguez, Reise, and Haviland (2015b) suggest that an H above .80 indicates a well-defined latent variable that is likely to replicate. The only other latent variable to meet this mark was personal accomplishment (H = .81). The H values for depersonalization (H = .59) and emotional exhaustion (H = .55) indicate that these latent variables are likely unstable and unlikely to replicate (Rodriguez et al., 2015b).
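A minimal sketch of coefficient H from a factor’s standardized loadings (hypothetical values below). Because H uses squared loadings, a negative loading contributes as much as a positive one; ωHS instead sums loadings before squaring. This is consistent with emotional exhaustion having a nonzero H (.55) while its ωHS is .00.

```python
import numpy as np

def coefficient_h(loadings):
    """Construct reliability H (Hancock & Mueller, 2001) from standardized loadings."""
    lam2 = np.square(np.asarray(loadings, dtype=float))
    signal_to_noise = np.sum(lam2 / (1.0 - lam2))
    return float(1.0 / (1.0 + 1.0 / signal_to_noise))

# Hypothetical loadings on one factor, including a negative one.
print(round(coefficient_h([0.6, 0.6, -0.6, 0.6]), 3))
```

For the general dimension, the loadings passed in would be all 22 general-factor loadings; for a subscale, only that subscale’s specific-factor loadings.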

Supplemental analyses.

In order to demonstrate the practical differences between the correlated traits

model and the bifactor models for the MBI-HSS, I examined the impact of a person’s

self-ratings of burnout on peer, staff, boss, and self competency ratings from the

Department of Veterans Affairs 360-degree feedback instrument (VA-360). The VA-360

measures 12 competencies which are divided into six core competencies (i.e.,

communication, interpersonal effectiveness, critical thinking, organizational stewardship,

veteran and customer focus, and personal mastery) and six leadership competencies (i.e.,

leading people, building coalitions, leading change, results driven, global perspective,

and business acumen). In addition to the items assessing the competencies, the

individuals who request a VA-360 (i.e., the self-raters) also take the MBI-HSS. The

raters from the other groups (i.e., staff, peer, and boss) do not take the MBI-HSS.



The sample contained 19,652 people who had completed the VA-360 on behalf of

themselves or another person (self-raters N = 1,440, peer raters N = 9,432, staff raters N =

6,564, and boss raters N = 2,216). Table 9 contains the descriptive statistics and

Cronbach’s α for each competency by rater group. Table 10 contains the sample ages

and genders broken down by rater group.

I then used the models described in the item response theory results above to compute θ estimates for the individuals who took the MBI-HSS in the VA-360 sample, for both the bifactor and correlated traits models, using expected a posteriori scoring (EAP; de Ayala, 2009). Unlike maximum likelihood estimation, EAP scoring allows the computation of θ values for perfect and zero response patterns, and it shrinks θ estimates toward the mean of the prior distribution (de Ayala, 2009). This resulted in θ values for the four bifactor dimensions (i.e., general burnout, emotional exhaustion, depersonalization, and personal accomplishment) and the three correlated traits dimensions (i.e., emotional exhaustion, depersonalization, and personal accomplishment).
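The EAP idea can be illustrated with a deliberately simplified, unidimensional sketch: the posterior over a grid of quadrature nodes is the standard normal prior times the likelihood of the response pattern, and the estimate is the posterior mean. The bifactor and correlated traits scoring in this study integrates over several dimensions, and the software used is not described in this section; the item parameters and function names below are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, d):
    """Category probabilities of one graded-response item at each quadrature node.

    theta: 1-D array of trait values; a: slope; d: intercepts in decreasing order.
    Returns an array of shape (len(theta), len(d) + 1).
    """
    z = a * theta[:, None] + np.asarray(d, dtype=float)[None, :]
    p_star = 1.0 / (1.0 + np.exp(-z))             # P(X >= k) for k = 1..K-1
    n = theta.size
    p_star = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    return p_star[:, :-1] - p_star[:, 1:]          # P(X = k) for k = 0..K-1

def eap_score(responses, items, n_nodes=81):
    """EAP estimate (posterior mean) and posterior SD under a N(0, 1) prior."""
    nodes = np.linspace(-4.0, 4.0, n_nodes)
    posterior = np.exp(-0.5 * nodes**2)           # unnormalized standard normal prior
    for x, (a, d) in zip(responses, items):
        posterior = posterior * grm_category_probs(nodes, a, d)[:, x]
    posterior = posterior / posterior.sum()
    mean = float(np.sum(nodes * posterior))
    sd = float(np.sqrt(np.sum((nodes - mean) ** 2 * posterior)))
    return mean, sd

# Hypothetical items (slope, decreasing intercepts), four response categories each.
items = [(1.5, [2.0, 0.5, -1.0]), (1.2, [1.5, 0.0, -1.5]), (2.0, [1.0, -0.5, -2.0])]
low = eap_score([0, 0, 0], items)    # all-lowest pattern: finite, negative estimate
high = eap_score([3, 3, 3], items)   # all-highest pattern: finite, positive estimate
```

Both extreme patterns receive finite estimates pulled toward the prior mean of 0, which is the property that motivated EAP over maximum likelihood.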

To analyze the VA-360 ratings, I treated the data as having a two-level structure, with raters nested within target individuals (e.g., boss ratings within an individual). This structure allows the use of multilevel, random coefficient modeling (e.g., Hox, 2010) to investigate the relationship between self-ratings of burnout and the competencies. Because only the target individual (i.e., the self-rater) had burnout scores, burnout was treated as a group-level variable, with the VA-360 ratings from the peer, staff, and boss groups as individual-level variables in two-level regression analyses. Because of convergence issues, all reported random coefficient models were computed with random intercepts but fixed slopes across groups. For the self-ratings, I used multiple regression analyses (e.g., Cohen, Cohen, West, & Aiken, 1999) because there was no nesting and thus no need for the more complicated multilevel analyses.
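One way to write the random-intercept model described above, for rater $i$ nested within target $j$ with the bifactor θ estimates as target-level predictors, is shown below. The symbols are introduced here for illustration; entering all dimensions simultaneously is an assumption, and any covariates in the actual models are not described in this section.

$$\text{Level 1: } y_{ij} = \beta_{0j} + r_{ij}$$

$$\text{Level 2: } \beta_{0j} = \gamma_{00} + \gamma_{01}\hat{\theta}_{GB,j} + \gamma_{02}\hat{\theta}_{EE,j} + \gamma_{03}\hat{\theta}_{DP,j} + \gamma_{04}\hat{\theta}_{PA,j} + u_{0j}$$

with $r_{ij} \sim N(0, \sigma^2)$ and $u_{0j} \sim N(0, \tau_{00})$. Under the correlated traits model, the $\hat{\theta}_{GB,j}$ term is dropped and the three remaining estimates come from that model.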

Supplemental analyses results. Tables 11–22 contain the results of the supplemental analyses. Rather than go through the analyses for each competency, I counted the number of times that each dimension of burnout was a significant predictor of a competency under both the correlated traits and bifactor models. When the correlated traits model was used, emotional exhaustion was a significant predictor of the competencies in 15 of the 48 analyses (12 competencies times four rater groups). In contrast, emotional exhaustion was a significant predictor in only six analyses (a 60% decrease) when the bifactor model was used. Depersonalization was a significant predictor of the competencies in only eight of the 48 analyses with the correlated traits model; with the bifactor model, the number of significant relationships between depersonalization and the competencies fell to two (a 75% decrease). Personal accomplishment was a significant predictor of the competencies in 44 analyses with the correlated traits model and 41 analyses with the bifactor model (a 6.8% decrease). The general burnout dimension of the bifactor model was a significant predictor of the competencies in 41 of the 48 analyses.
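The percentage decreases follow from the counts reported above; a quick check:

```python
def percent_decrease(before, after):
    """Percentage drop from the correlated traits count to the bifactor count."""
    return 100.0 * (before - after) / before

# Significant results out of 48 analyses: (correlated traits, bifactor), from the text.
counts = {
    "emotional exhaustion": (15, 6),
    "depersonalization": (8, 2),
    "personal accomplishment": (44, 41),
}
for dimension, (correlated, bifactor) in counts.items():
    print(f"{dimension}: {percent_decrease(correlated, bifactor):.1f}% decrease")
# prints 60.0%, 75.0%, and 6.8% decreases
```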

Compared to the correlated traits model, two of the three traditional dimensions of burnout, emotional exhaustion and depersonalization, had far fewer significant relationships with the competencies when the bifactor model was used. In other words, the relationships between these dimensions and the competencies were, in many cases, a result of the inter-relationships among the dimensions rather than of the dimensions themselves.

In the bifactor model, the inter-relationships among the dimensions are accounted for by the general burnout dimension, and the specific dimensions are residualized versions of the subscales that are free from correlations with the other dimensions. If the relationships between the competencies and the emotional exhaustion and depersonalization dimensions were due to the subscales themselves, those relationships would have remained significant when the bifactor model was used, and the general burnout dimension would not have been significant.

As noted by Worley and colleagues (2008), the depersonalization and emotional exhaustion dimensions are highly correlated, so the use of a bifactor model greatly influences their relationships with the competencies. The personal accomplishment dimension has a much lower correlation with the other dimensions, so the use of a bifactor model does not alter many of its relationships with the competencies. In addition, the IRT analyses above showed that the personal accomplishment items provide more information and discriminate better on the personal accomplishment dimension than on the general burnout dimension.

These analyses raise important questions about the current nomological net that has been woven around burnout: are the significant relationships found using the correlated traits model actually due to the subscale (e.g., emotional exhaustion), or are they a result of the inter-relations among the subscales? The bifactor model proposed here gives researchers a means to answer that question. In the supplemental analyses presented here, I demonstrated that many of the significant relationships between the dimensions of burnout and the VA-360 competencies were a result of the inter-relations rather than the subscales themselves.

Discussion

Scoring Recommendations

In this study, the bifactor model fit the MBI-HSS better than the correlated traits model. This new model requires a scoring procedure different from the traditional method presented in the scoring manual (i.e., Maslach et al., 1996). As mentioned before, the manual states that “given our limited knowledge about the relationships between the three aspects of burnout, the scores for each subscale are considered separately and are not [emphasis in original] combined into a single, total score” (Maslach et al., 1996, p. 5). Since the manual was published, there has been a large amount of research on the relationships among the subscales (e.g., Worley et al., 2008); it is time to reconsider a general burnout score.

The results of this study demonstrated that there is a strong general factor in the MBI-HSS. Not only did the general factor account for the majority of the common variance explained by the model, it was also highly reliable. I also demonstrated that when the general factor is removed from the subscales, the subscale scores are unreliable and do not account for much of the variance. The Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], National Council on Measurement in Education [NCME], 2014) state that, to report subscale scores in an educational or psychological context, sufficient evidence of their reliability and distinctness must be demonstrated (p. 43). In the case of the MBI-HSS, the analyses (ωHS) demonstrated that none of the subscales met the requirement of being reliable without the general factor. Only the personal accomplishment subscale came close to being reliable enough to be reported; however, its ωHS was still well below traditional cut-offs for acceptable reliability.

The results from the bifactor item response theory analyses provide further

information about the precision of the respective scales. Each of the subscales had only

thin bands in their trait space where they had adequate information to precisely determine

a person’s level on that subscale. In contrast, the area of acceptable precision is much

wider for the general burnout dimension.

In light of these results, I recommend reporting only the general factor score rather than the subscale scores. First, the general burnout dimension is more reliable and more likely to replicate than the subscales. The analyses in this study demonstrated that, when the general burnout dimension is accounted for, none of the subscales is reliable enough to be reported.

Second, the interpretation of the subscale scores from the bifactor model differs from that of the correlated traits model: the subscale scores in the bifactor model are residualized subscale scores after the removal of the general burnout dimension. Interpreting such residualized subscales is not as straightforward as interpreting traditional subscale scores. Whereas the traditional subscale scores capture the relationships among the items of the subscale, the residualized subscale scores capture the relationships among the subscale items after accounting for their communality with the items on the other subscales. For example, the depersonalization subscale score under the correlated traits model represents the person’s level of depersonalization as traditionally reported. The bifactor model’s depersonalization score is the unique variance of the depersonalization subscale after accounting for the subscale’s relationships with emotional exhaustion and personal accomplishment. This difference in the substance of the subscale scores between the correlated traits and bifactor models is likely to be lost on users of the MBI-HSS and to lead to inappropriate use and interpretation of the subscale scores.

According to Reise, Scheines, et al. (2013), a bifactor model can be used to diagnose whether a measure is “unidimensional enough” to be analyzed with a unidimensional structural equation model without creating excessive bias in the parameters. In their simulation, they found that if a bifactor model had a PUC under .80, an ωH above .70, and an ECV over .60, a unidimensional model could be used without introducing more than 10% parameter bias. The current study’s data meet all three criteria (PUC = .68, ωH = .76, ECV = .62). However, Reise, Scheines, et al. (2013) evaluated parameter bias within a factor analysis framework, and it is unclear how their results translate into an IRT framework. The MBI-HSS data also differ from the conditions of the simulation, mainly in the unequal number of items per group factor and the unequal loadings across items, so it is unclear how well these recommendations generalize to these data. With this in mind, the recommendations of Reise, Scheines, et al. (2013) should be interpreted with caution in the case of the MBI-HSS. Nevertheless, given the strength of the general dimension (i.e., the high ECV) and its saturation (i.e., the high ωH), there is still strong evidence for interpreting the general dimension rather than the subscale scores. To evaluate the recommendations of Reise, Scheines, et al. (2013) empirically for the MBI-HSS, I calculated the average relative bias incurred by using a unidimensional model rather than the bifactor model, comparing both the standardized loadings and the discrimination parameters of the unidimensional model with those of the bifactor model’s general burnout dimension. On average, the unidimensional model’s standardized loadings were 6% larger than those of the general burnout dimension, whereas its discrimination parameters were 12% smaller. Measured against the 10% benchmark above, the bias in the loadings is within the limit, while the bias in the discrimination parameters slightly exceeds it.
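The PUC criterion depends only on how many items belong to each group factor, so it can be checked directly from the MBI-HSS layout (9 emotional exhaustion, 5 depersonalization, and 8 personal accomplishment items). The Python sketch below computes PUC and an average relative bias. The bias formula, (unidimensional - general) / general averaged over items, is an assumed definition for illustration; the function names are hypothetical, and the loadings in the second usage line are made up rather than taken from this study.

```python
"""Sketch: PUC for a bifactor layout and an average relative bias between two models."""


def puc(group_sizes):
    """Percentage of uncontaminated correlations, returned as a proportion.

    Item pairs drawn from different group factors are 'uncontaminated': in a
    bifactor model their correlation reflects only the general factor.
    """
    n_items = sum(group_sizes)
    total_pairs = n_items * (n_items - 1) // 2
    within_pairs = sum(k * (k - 1) // 2 for k in group_sizes)
    return (total_pairs - within_pairs) / total_pairs


def average_relative_bias(unidimensional, general):
    """Mean of (unidimensional - general) / general over items (assumed definition)."""
    if len(unidimensional) != len(general) or not general:
        raise ValueError("need two non-empty parameter lists of equal length")
    ratios = [(u - g) / g for u, g in zip(unidimensional, general)]
    return sum(ratios) / len(ratios)


# MBI-HSS layout: 9 emotional exhaustion, 5 depersonalization, 8 personal accomplishment items.
print(round(puc([9, 5, 8]), 2))  # 0.68, i.e., 157 of 231 item pairs
# Hypothetical loadings, not values from this study.
print(round(average_relative_bias([1.06, 0.53], [1.0, 0.5]), 2))  # 0.06
```

The PUC of .68 reproduces the value reported above, which is a useful check that the within-subscale pairs (36 + 10 + 28 = 74 of 231) are being counted the same way.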

Summary

This study examined an alternative model for the MBI-HSS, originally proposed by Mészáros et al. (2014): a bifactor model. The analyses revealed that the bifactor model fit the data better than the correlated traits model. They also showed that the general burnout dimension was the only dimension reliable enough to be reported. In addition, I demonstrated a method, based on basic trigonometry, for decomposing item and test information in bifactor models into information for the general dimension and for the specific (group) dimensions. Using this decomposition, I showed that there are only thin bands within the two-dimensional trait spaces where the subscales have adequate information for precisely determining a person’s level on the corresponding trait.
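A minimal sketch of one way such a decomposition can be written, assuming the multidimensional graded response model in the slope-intercept form reported in Table 4 (a general slope, a specific slope, and decreasing intercepts d1 to d6). Item information along a unit direction u of the two-dimensional trait space equals (a·u)² times a term that does not depend on the direction. Because a·u equals MDISC times the cosine of the angle between the item vector and u, information along the general axis is the cos² share and information along the specific axis is the sin² share, and the two add up to the information in the item’s own direction. The function name and evaluation point are illustrative choices, not the code used in this dissertation.

```python
import math


def grm_directional_information(a, d, theta, direction):
    """Information of one multidimensional graded response item along a unit direction.

    a: slopes (general, specific); d: intercepts d1 > d2 > ... (slope-intercept form);
    theta: point in the trait space; direction: unit vector in that space.
    """
    z = sum(ak * tk for ak, tk in zip(a, theta))
    # Boundary curves P*_k = P(X >= k), padded with P*_0 = 1 and P*_(m+1) = 0.
    p_star = [1.0] + [1.0 / (1.0 + math.exp(-(z + dk))) for dk in d] + [0.0]
    w = [p * (1.0 - p) for p in p_star]  # each boundary curve's derivative, up to the slope
    slope_u = sum(ak * uk for ak, uk in zip(a, direction))  # a . u = MDISC * cos(angle)
    total = 0.0
    for c in range(len(p_star) - 1):
        prob = p_star[c] - p_star[c + 1]  # probability of response category c
        total += (w[c] - w[c + 1]) ** 2 / prob
    return slope_u ** 2 * total


# DP2 from Table 4: general and depersonalization slopes, then six intercepts.
a = (2.83, 2.90)
d = (-0.82, -4.02, -5.69, -7.38, -8.57, -10.41)
theta = (0.0, 0.0)  # average level on both dimensions
angle = math.atan2(a[1], a[0])  # item direction, measured from the general axis

info_gb = grm_directional_information(a, d, theta, (1.0, 0.0))
info_dp = grm_directional_information(a, d, theta, (0.0, 1.0))
info_item = grm_directional_information(a, d, theta, (math.cos(angle), math.sin(angle)))
print(round(info_gb, 3), round(info_dp, 3), round(info_item, 3))
```

Evaluating the two axis directions over a grid of θ values, and summing across items, gives the kind of general and specific information surfaces on which the thin bands described above would appear.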

Importance of this study. This study made many important contributions to both the burnout and the multidimensional item response theory literatures. First, the scoring recommendations above stand in stark contrast to those in the MBI-HSS manual. Contrary to what the manual states, I demonstrated that not only is there a general burnout dimension, but it is also much more reliable than any of the subscales and provides precise estimates of trait level across a broad range of trait-level combinations.

The bifactor model of burnout accounts for the strong correlations between the burnout dimensions found in the literature (e.g., Worley et al., 2008). Studies using the correlated traits model have failed to account for these correlations, interpreting their results as if the subscales were orthogonal. For example, Leiter and Maslach (2009) examined the relationship between turnover intention and burnout in nurses. They found that depersonalization (which they refer to as cynicism) is a significant positive predictor of turnover intention; however, they do not address the correlation between the burnout subscales. In fact, the correlation between depersonalization and emotional exhaustion in their data is .60, which means that the two subscales share 36% of their variance (.60² = .36; Leiter & Maslach, 2009). Given this large proportion of shared variance, it is impossible to determine whether the relationship they found between depersonalization and turnover intention is actually due to depersonalization rather than to emotional exhaustion. The bifactor model demonstrated here could answer that question.
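The shared-variance problem can be made concrete with the standard formula for the standardized weights of two correlated predictors in ordinary least squares regression. This is a conventional observed-score illustration, not the bifactor approach; the only value taken from the text is r = .60, and the correlations with turnover intention are hypothetical.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS weights for two predictors, computed from the three correlations."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


r_dp_ee = 0.60  # depersonalization-exhaustion correlation reported in the text
print(round(r_dp_ee ** 2, 2))  # 0.36 shared variance

# Hypothetical correlations with turnover intention (not from Leiter & Maslach, 2009).
r_turnover_dp, r_turnover_ee = 0.40, 0.35
beta_dp, beta_ee = standardized_betas(r_turnover_dp, r_turnover_ee, r_dp_ee)
print(round(beta_dp, 3), round(beta_ee, 3))  # 0.297 0.172
```

With these made-up inputs, the depersonalization weight (about .30) is clearly smaller than its bivariate correlation (.40), because part of that correlation is carried by emotional exhaustion; reporting only the bivariate relationship hides this.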

One possible practitioner critique of the recommendation to use the general burnout dimension score rather than the subscale scores is the loss of diagnostic richness: instead of three separate dimensions, each with its own predictors and outcomes, we are left with a single dimension. While I agree that moving from three scores to one is not ideal, the three traditionally reported scores are so intertwined (i.e., intercorrelated) that it is inappropriate to report them and treat them as independent. The bifactor model accounts for this intercorrelation and quantifies it in the form of the general burnout dimension. The analyses in this paper demonstrated that, once the intercorrelation is removed, the residualized subscales are unreliable and therefore inappropriate to report (AERA, APA, & NCME, 2014). To make the residualized subscales reliable or precise enough to be reported, the test publisher could develop items that discriminate more highly on the subscale dimensions than on the general burnout dimension.
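The reliability of residualized subscales is commonly summarized with omega-hierarchical-subscale (ωHS), alongside ECV and ωH for the general factor, all computed from standardized bifactor loadings. The sketch below follows the usual formulas for these indices, along the lines of those described by Rodriguez et al. (2015a); the function name is hypothetical, and the loadings in the example are an invented six-item, two-group structure, not MBI-HSS estimates.

```python
from collections import defaultdict


def bifactor_indices(general, specific, groups):
    """ECV, omega, omega-hierarchical, and omega-hierarchical-subscale.

    general[i]:  standardized loading of item i on the general factor
    specific[i]: standardized loading of item i on its own group factor
    groups[i]:   label of item i's group factor
    """
    sum_specific = defaultdict(float)  # sum of group-factor loadings per group
    sum_general = defaultdict(float)   # sum of general loadings per group
    error = defaultdict(float)         # sum of 1 - h^2 per group
    for g, s, label in zip(general, specific, groups):
        sum_specific[label] += s
        sum_general[label] += g
        error[label] += 1.0 - g ** 2 - s ** 2
    general_sq = sum(general) ** 2
    groups_sq = sum(v ** 2 for v in sum_specific.values())
    denom = general_sq + groups_sq + sum(error.values())
    common_general = sum(g ** 2 for g in general)
    common_specific = sum(s ** 2 for s in specific)
    omega_hs = {
        label: sum_specific[label] ** 2
        / (sum_general[label] ** 2 + sum_specific[label] ** 2 + error[label])
        for label in sum_specific
    }
    return {
        "ECV": common_general / (common_general + common_specific),
        "omega": (general_sq + groups_sq) / denom,
        "omegaH": general_sq / denom,
        "omegaHS": omega_hs,
    }


# Hypothetical loadings: six items, two group factors of three items each.
example = bifactor_indices([0.7] * 6, [0.3] * 6, ["A"] * 3 + ["B"] * 3)
print(round(example["ECV"], 3), round(example["omegaH"], 3), round(example["omegaHS"]["A"], 3))
```

In this invented example the general factor dominates (ECV of about .84, ωH of about .81) while each group factor’s ωHS is only .125, the pattern that makes a residualized subscale unsuitable for reporting.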

The use of the general burnout dimension rather than the subscale scores raises other questions about prior research findings as well. Maslach and Leiter (2008) found that one of the best predictors of a person developing burnout was a high level on either emotional exhaustion or depersonalization, but not both. They argue that, because the dimensions are so highly correlated, the state in which a person is high on one subscale and low on the other is unstable (Maslach & Leiter, 2008) and could serve as an early indicator of burnout. Using only the general burnout dimension precludes the use of this “early warning” pattern. However, their analyses also identified other predictors that could be used to anticipate the onset of burnout (e.g., perceptions of fairness), which leaves practitioners with other methods for detecting the beginnings of burnout.

In the introduction, I described several meta-analyses (Crawford et al., 2010; Nahrgang et al., 2011; Wang et al., 2010) that treated burnout as a unidimensional construct. In light of the results of the analyses in this dissertation, a unidimensional conceptualization of burnout is defensible, even though it runs contrary to the instructions in the manual. Although a unidimensional model does not account for the subscales, this dissertation demonstrated that the general burnout dimension is reliable.

In addition to the new scoring recommendations, the bifactor model also revealed more information about items that had been problematic for previous researchers (i.e., EE4, EE8, and PA4). Specifically, after accounting for the relationships among all the items in the form of the general burnout dimension, EE4 and EE8 were negatively related to the rest of the emotional exhaustion items, and PA4, unlike the other items in its subscale, had a higher discrimination value on the general burnout dimension than on the personal accomplishment dimension.
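These item patterns can be read off the bifactor slopes with the multidimensional discrimination and difficulty conversions of Reckase (2009), which appear to underlie the converted parameters in Table 5: MDISC = sqrt(a_GB² + a_S²), B = -d / MDISC, and the item’s angle from the general axis, atan2(a_S, a_GB). A negative angle flags an item that is negatively related to its own subscale, and an angle below 45° flags an item that discriminates more on the general dimension. The sketch uses slopes and intercepts copied from Table 4; because those values are rounded to two decimals, recomputed results can differ slightly from Table 5. The function name is hypothetical.

```python
import math

# (general slope, specific slope, intercepts d1..d6) copied from Table 4.
ITEMS = {
    "EE4": (2.35, -0.71, [0.53, -2.10, -3.41, -4.85, -5.96, -8.15]),
    "EE8": (2.60, -1.17, [-0.43, -3.66, -5.11, -6.66, -7.69, -10.29]),
    "PA4": (1.51, 1.12, [0.93, -2.02, -2.82, -4.34, -5.33, -6.45]),
}


def convert(a_general, a_specific, intercepts):
    """Return MDISC, the category MDIFF values, and the angle (degrees) from the general axis."""
    mdisc = math.hypot(a_general, a_specific)
    mdiff = [-d / mdisc for d in intercepts]
    angle = math.degrees(math.atan2(a_specific, a_general))
    return mdisc, mdiff, angle


for name, params in ITEMS.items():
    mdisc, mdiff, angle = convert(*params)
    # Negative angle: item runs against its subscale; below 45 degrees: general dimension dominates.
    print(f"{name}: MDISC={mdisc:.2f}  B1={mdiff[0]:.2f}  angle={angle:.2f}")
```

EE4 and EE8 come out with negative angles, and PA4 with an angle well below 45°, matching the descriptions above.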

This study also adds to the literature by being the first IRT analysis of the MBI-HSS. The use of IRT provided much more nuanced information about the functioning of each item, as well as of the scale as a whole, than previous analyses based on classical test theory. The IRT analyses revealed that many items (e.g., DP1, DP5, and PA1) provide very little information about the traits with which they are associated. By using IRT, I was also able to identify the trait ranges in which each subscale most precisely determines a person’s trait level, both on the specific dimension (i.e., emotional exhaustion, depersonalization, or personal accomplishment) and on the general burnout dimension. Previous research has focused only on classical test theory, which assumes constant precision across the entire continuum of trait levels.

In addition to these contributions to the burnout literature, this dissertation also adds to the MIRT literature. I demonstrated a method for decomposing item and test information into information for the specific dimension and for the general burnout dimension. This decomposition allows for a more nuanced exploration of scale functioning in bifactor or two-dimensional orthogonal MIRT models. For example, it allowed me to determine that the MBI-HSS subscales, especially the depersonalization subscale, can precisely determine a person’s level on that scale only within a small area of the total trait space. The decomposition also revealed more details about the general burnout dimension: overall, the general burnout dimension had acceptable information across a much wider range of trait levels.

Strengths and Limitations

This project had several strengths. First, the sample was larger than that of any of the structural studies of the MBI-HSS reported in Worley et al. (2008) and much larger than that of Mészáros et al. (2014). This large sample allowed for the use of multidimensional item response theory and made it more likely that the parameter estimates are stable. The sample was also drawn from across an entire large, Cabinet-level federal agency, which provided a geographically and occupationally diverse sample and supports the generalizability of the results.

One limitation of this study was that all of the participants were from the public sector. It is possible that employees in the private sector experience burnout differently, and at different levels, than public sector employees. This is an area in need of more research, but it is beyond the scope of this project. Despite this limitation, the IRT parameters computed in this dissertation should still apply to private sector employees because of the parameter invariance property of IRT (Embretson & Reise, 2000), provided that the model also holds in that population.

A second limitation is the use of a bipolar IRT model rather than a unipolar IRT model (Lucke, 2013, 2014). Bipolar IRT models (e.g., traditional IRT models, including the graded response model) assume that latent trait levels extend from -∞ to ∞, which is reasonable for attitudinal and ability research (Lucke, 2013, 2014). In other words, a bipolar IRT model cannot represent the absence of the latent trait; rather, a θ value of 0 indicates that the individual is at the mean level of the latent trait. Unipolar IRT models assume that the latent trait continuum extends only from 0 to ∞, where 0 indicates the absence of the latent trait (Lucke, 2013, 2014). In the case of burnout, a unipolar model makes theoretical sense: it is possible to be “burnout-free.” Also, the frequency scale used to measure burnout, with “Never” as its lowest response category, implies a unipolar model. Unfortunately, unipolar IRT models are still in their infancy, and current models are not sophisticated enough to conduct a similar analysis of the MBI-HSS. Current unipolar IRT models are strictly unidimensional and could not be used to estimate either the bifactor or the correlated traits model. In addition, unipolar models currently exist only for dichotomous items (Lucke, 2013, 2014). Hopefully, researchers will soon be able to test a unipolar IRT model of the MBI-HSS.
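The contrast between the two kinds of model can be sketched for a single dichotomous item. The unipolar function below assumes a log-logistic form, P(θ) = λθ^α / (1 + λθ^α) for θ ≥ 0, taken here as representative of the unipolar models discussed by Lucke (2013, 2014) rather than as an implementation from those chapters; the function names and parameter values are hypothetical. The sketch also shows that the two models are linked by a change of variable: setting θ = exp(η) turns the log-logistic item into a 2PL item in η with slope α and intercept ln λ, so the unipolar zero point corresponds to η = -∞ on the bipolar scale.

```python
import math


def two_pl(theta, a, d):
    """Bipolar 2PL item in slope-intercept form; theta can be any real number."""
    return 1.0 / (1.0 + math.exp(-(a * theta + d)))


def log_logistic(theta, alpha, lam):
    """Assumed unipolar log-logistic item; theta >= 0 and theta = 0 means no trait."""
    if theta < 0:
        raise ValueError("a unipolar trait cannot be negative")
    odds = lam * theta ** alpha
    return odds / (1.0 + odds)


alpha, lam = 1.5, 0.8  # hypothetical item parameters

# Absence of the trait forces a zero endorsement probability on the unipolar scale...
print(log_logistic(0.0, alpha, lam))  # 0.0
# ...whereas theta = 0 on the bipolar scale is just the population mean.
print(round(two_pl(0.0, alpha, math.log(lam)), 3))  # 0.444

# Change of variable: theta_unipolar = exp(theta_bipolar) maps one model onto the other.
for eta in (-2.0, 0.0, 1.5):
    assert math.isclose(log_logistic(math.exp(eta), alpha, lam), two_pl(eta, alpha, math.log(lam)))
```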

Future Directions

This dissertation suggests several avenues for future research. First, and most importantly, more research needs to be conducted to confirm the bifactor structure found here and in Mészáros et al. (2014). Over two decades of research has been conducted using the correlated traits model of the MBI-HSS; more than two studies are needed to overturn it. In addition, previous research on the correlates of burnout should be reexamined using a bifactor model to see how the relationships differ once the general burnout dimension is taken into account. In most cases, the published articles should contain sufficient information to complete such an initial examination.

Second, the information decomposition used in this dissertation should be examined further. Future research could generalize the decomposition to oblique latent trait structures and explore three-dimensional structures to determine how to decompose the information of more complex models.

Third, the findings on item parameters, item information, and item fit in this study could be used to improve the MBI-HSS. The items identified above as providing little information (e.g., DP1, DP5, and PA1) or as being negatively related to their subscale (i.e., EE4 and EE8) are all good candidates for removal. Twenty years have passed since the last version of the MBI-HSS was released (Maslach et al., 1996); it is time to take another look at the instrument and revise it.

The results in this dissertation also show that the publisher could take the MBI-HSS in three directions. The first is to try to remove the general burnout dimension and return to the correlated traits model. To do this, the publisher could remove items that discriminate better on the general burnout dimension (e.g., EE2, EE4, EE8, DP3) and replace them with new items that are expected to discriminate better on their respective dimensions (e.g., emotional exhaustion) and not to correlate across subscales.

A second direction is to move the MBI-HSS toward a truly unidimensional measure by removing items that discriminate more on their specific (subscale) dimensions than on the general dimension (e.g., PA1, PA3, PA7). These items could be replaced with items that correlate more strongly with items across all of the current subscales. This direction would also mean dropping the subscales and reporting only the general score.
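The first two directions both start from the same comparison: whether an item’s general slope or its specific slope is larger in absolute value (equivalently, whether its direction angle is below or above 45°). The sketch below applies that single rule to the slopes in Table 4. Treating the larger absolute slope as the deciding criterion is a simplifying assumption for illustration; the names are hypothetical, and an actual revision would also weigh item information and fit.

```python
# (general slope, specific slope) for each item, copied from Table 4.
SLOPES = {
    "DP1": (0.97, 0.66), "DP2": (2.83, 2.90), "DP3": (2.40, 1.82), "DP4": (1.12, 0.73),
    "DP5": (0.96, 0.42), "EE1": (3.39, 2.14), "EE2": (3.05, 1.95), "EE3": (2.48, 0.98),
    "EE4": (2.35, -0.71), "EE5": (3.12, 1.19), "EE6": (2.20, 0.84), "EE7": (1.41, 0.59),
    "EE8": (2.60, -1.17), "EE9": (2.12, 0.47), "PA1": (0.08, 0.88), "PA2": (0.61, 1.42),
    "PA3": (0.98, 1.78), "PA4": (1.51, 1.12), "PA5": (1.12, 1.73), "PA6": (1.03, 1.46),
    "PA7": (0.95, 1.63), "PA8": (0.58, 1.27),
}


def split_by_dominant_dimension(slopes):
    """Split items by whether the general or the specific slope is larger in absolute value."""
    general, specific = [], []
    for item, (a_g, a_s) in slopes.items():
        (general if abs(a_g) > abs(a_s) else specific).append(item)
    return general, specific


general_items, specific_items = split_by_dominant_dimension(SLOPES)
print("Drop to move toward correlated traits:", general_items)
print("Drop to move toward unidimensionality:", specific_items)
```

Applied to Table 4, the rule places EE2, EE4, EE8, and DP3 in the first list and PA1, PA3, and PA7 in the second, consistent with the examples above; DP2 is a near tie (2.83 vs. 2.90).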

The third alternative is to embrace the bifactor model for the MBI-HSS. Even if the publisher decided to take this route, improvements could certainly be made. The analyses in this paper revealed items that did not discriminate well on any dimension (e.g., DP5 and PA1), items that had negative relationships with their subscale’s dimension (EE4 and EE8), and items that did not fit the model well (DP1, DP4, PA1, PA2, and PA7). All of these items should be examined and possibly replaced.

Conclusion

This project examined the English version of the most popular measure of burnout, the MBI-HSS, and tested an alternative structure for the instrument that Mészáros et al. (2014) used with the Hungarian translation. The results of this study demonstrated that the bifactor model fits better than the traditional correlated traits model and that the general burnout dimension is far more reliable than the subscales. The analyses in this dissertation also provide detailed information on how well the individual items measure the dimensions of burnout, and they can serve as the starting point for a conversation about how the publisher might revise the MBI-HSS. In addition, this study demonstrated a method for decomposing item and test information in bifactor structures into information for the general factor and the specific factors.


References

Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE

Transactions on Automatic Control, 19(6), 716-723.

American Educational Research Association, American Psychological Association, &

National Council on Measurement in Education (2014). Standards for educational

and psychological testing. Washington, DC: American Educational Research

Association.

Baker, F. B. (2001). The basics of item response theory. ERIC Clearinghouse on

Assessment and Evaluation. For full text:

http://files.eric.ed.gov/fulltext/ED458219.pdf.

Bakker, A. B., & Demerouti, E. (2014). Job Demands-Resources Theory. In P. Y. Chen & C. L. Cooper (Eds.), Wellbeing: A Complete Reference Guide, Volume III: Work and Wellbeing (pp. 37-64). West Sussex, UK: John Wiley & Sons.

Beckstead, J. W. (2002). Confirmatory factor analysis of the Maslach Burnout Inventory

among Florida nurses. International Journal of Nursing Studies, 39(8), 785-792.

Bonifay, W. E., Reise, S. P., Scheines, R., & Meijer, R. R. (2015). When Are

Multidimensional Data Unidimensional Enough for Structural Equation

Modeling? An Evaluation of the DETECT Multidimensionality Index. Structural

Equation Modeling: A Multidisciplinary Journal. Advance online publication. http://dx.doi.org/10.1080/10705511.2014.938596

Brown, A., & Croudace, T. J. (2014). Scoring and Estimating Score Precision Using Multidimensional IRT Models. In S. P. Reise & D. A. Revicki (Eds.), Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment (pp. 307-333). New York, NY: Routledge.

Byrne, B. M. (1993). The Maslach Burnout Inventory: Testing for factorial validity and

invariance across elementary, intermediate and secondary teachers. Journal of

Occupational and Organizational Psychology, 66(3), 197-212.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R

environment. Journal of Statistical Software, 48(6), 1-29.

Chen, F. F., Hayes, A., Carver, C. S., Laurenceau, J. P., & Zhang, Z. (2012). Modeling

general and specific variance in multifaceted constructs: A comparison of the

bifactor model to other approaches. Journal of Personality, 80(1), 219-251.

Chen, F. F., Sousa, K. H., & West, S. G. (2005). Teacher's corner: Testing measurement

invariance of second-order factor models. Structural Equation Modeling, 12(3),

471-492.

Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-

order models of quality of life. Multivariate Behavioral Research, 41(2), 189-225.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple

regression/correlation analysis for the behavioral sciences. New York, NY:

Routledge.

Crawford, E. R., LePine, J. A., & Rich, B. L. (2010). Linking job demands and resources

to employee engagement and burnout: a theoretical extension and meta-analytic

test. Journal of Applied Psychology, 95(5), 834-848.


De Ayala, R. J. (2009). Theory and practice of item response theory. New York, NY:

Guilford Publications.

DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International

Journal of Testing, 13(4), 354-378.

Demerouti, E., Bakker, A., Nachreiner, F., & Schaufeli, W. (2001). The Job Demands-

Resources model of burnout. Journal of Applied Psychology, 86(3), 499-512.

Demerouti, E., Bakker, A. B., & Leiter, M. (2014). Burnout and job performance: the

moderating role of selection, optimization, and compensation strategies. Journal

of Occupational Health Psychology, 19(1), 96-107.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

NJ: Lawrence Erlbaum Associates.

Freudenberger, H. J. (1974). Staff burn‐out. Journal of Social Issues, 30(1), 159-165.

Freudenberger, H. J. (1975). The staff burn-out syndrome in alternative

institutions. Psychotherapy: Theory, Research & Practice, 12(1), 73-82.

Friendly, M. (2002). Corrgrams: Exploratory displays for correlation matrices. The

American Statistician, 56(4), 316-324.

Gibbons, R. D., Bock, R. D., Hedeker, D., Weiss, D. J., Segawa, E., Bhaumik, D. K., ...

Stover, A. (2007). Full-information item bifactor analysis of graded response

data. Applied Psychological Measurement, 31(1), 4-19.

Golembiewski, R. T., Boudreau, R. A., Sun, B. C., & Luo, H. (1998). Estimates of

burnout in public agencies: worldwide, how many employees have which degrees

of burnout, and with what consequences? Public Administration Review, 58(1),

59-65.


Grice, J. W. (2001). Computing and Evaluating Factor Scores. Psychological

Methods, 6(4), 430-450.

Gustafsson, J. E., & Balke, G. (1993). General and specific abilities as predictors of

school achievement. Multivariate Behavioral Research, 28(4), 407-434.

Hakanen, J. J., & Schaufeli, W. B. (2012). Do burnout and work engagement predict

depressive symptoms and life satisfaction? A three-wave seven-year prospective

study. Journal of Affective Disorders, 141(2), 415-424.

Halbesleben, J. R., & Demerouti, E. (2005). The construct validity of an alternative

measure of burnout: Investigating the English translation of the Oldenburg

Burnout Inventory. Work & Stress, 19(3), 208-220.

Hambleton, R. K., & Jones, R. W. (1993). An NCME Instructional Module on

Comparison of Classical Test Theory and Item Response Theory and Their

Applications to Test Development. Educational Measurement: Issues and

Practice, 12(3), 38-47.

Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent

variable systems. In R. Cudeck, S. Du Toit, & D. Sorbom (Eds.). Structural

equation modeling: Present and future—A Festschrift in honor of Karl Joreskog

(pp. 195-216). Lincolnwood, IL: Scientific Software International.

Hobfoll, S. (1989). Conservation of resources: A new attempt at conceptualizing stress.

American Psychologist, 44(3), 513-524.

Hobfoll, S., & Lilly, R. (1993). Resource conservation as a strategy for community

psychology. Journal of Community Psychology, 21(2), 128-148.


Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1), 41-54.

Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-

factor solution. Supplementary Educational Monographs, No. 48. Chicago, IL: Department of Education, University of Chicago.

Hox, J. J. (2010). Multilevel analysis: Techniques and applications. New York, NY:

Routledge.

Kim, H., & Kao, D. (2014). A meta-analysis of turnover intention predictors among US

child welfare workers. Children and Youth Services Review, 47(P3), 214-223.

Kristensen, T. S., Borritz, M., Villadsen, E., & Christensen, K. B. (2005). The

Copenhagen Burnout Inventory: A new tool for the assessment of burnout. Work

& Stress, 19(3), 192-207.

LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance

measures for multilevel models. Organizational Research Methods, 17(4), 433-

451.

Leiter, M., Harvie, P., & Frizzell, C. (1998). The correspondence of patient satisfaction

and nurse burnout. Social Science & Medicine, 47(10), 1611-1617.

Leiter, M. P., & Maslach, C. (2009). Nurse turnover: the mediating role of burnout.

Journal of Nursing Management, 17(3), 331-339.

Lucke, J. F. (2013). Positive trait item response models. In R.E. Millsap, L.A. van der

Ark, D.M. Bolt, & C.M. Woods (Eds.) New Developments in Quantitative

Psychology (pp. 199-213). New York, NY: Springer.


Lucke, J. F. (2014). Unipolar Item Response Models. In S.P. Reise & D.A. Revicki

(Eds.) Handbook of Item Response Theory Modeling: Applications to Typical

Performance Assessment (pp. 272-284). New York, NY: Routledge.

Martel, M. M., Von Eye, A., & Nigg, J. T. (2010). Revisiting the latent structure of

ADHD: Is there a ‘g’ factor? Journal of Child Psychology and Psychiatry, 51(8), 905-914.

Maslach, C., & Jackson, S. E. (1981). The measurement of experienced burnout. Journal

of Occupational Behaviour, 2(2), 99-113.

Maslach, C., & Jackson, S. E. (1984). Patterns of burnout among a national sample of public contact workers. Journal of Health and Human Resources Administration, 7(2), 189-212.

Maslach, C., Jackson, S. E., & Leiter, M. P. (1996). Maslach burnout inventory manual.

Mountain View, CA: Consulting Psychologists Press.

Maslach, C., & Leiter, M. P. (2008). Early predictors of job burnout and engagement.

Journal of Applied Psychology, 93(3), 498-512.

Maslach, C., & Pines, A. (1977). The burn-out syndrome in the day care setting. Child

Care Quarterly, 6(2), 100-113.

Maslach, C., Schaufeli, W. B., & Leiter, M. P. (2001). Job burnout. Annual Review of

Psychology, 52(1), 397-422.

Maydeu-Olivares, A. (2014). Evaluating the Fit of IRT Models. In S. P. Reise & D. A. Revicki (Eds.), Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment (pp. 111-127). New York, NY: Routledge.


McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Lawrence

Erlbaum.

Melamed, S., Kushnir, T., & Shirom, A. (1992). Burnout and risk factors for

cardiovascular diseases. Behavioral Medicine, 18(2), 53-60.

Mészáros, V., Ádám, S., Szabó, M., Szigeti, R., & Urbán, R. (2014). The Bifactor Model

of the Maslach Burnout Inventory-Human Services Survey (MBI-HSS)-An

Alternative Measurement Model of Burnout. Stress and Health: Journal of the

International Society for the Investigation of Stress, 30(1), 82-88.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm.

ETS Research Report Series, 1992(1), i-30.

Murdoch, D. J., & Chow, E. D. (1996). A graphical display of large correlation matrices.

The American Statistician, 50(2), 178-180.

Nahrgang, J. D., Morgeson, F. P., & Hofmann, D. A. (2011). Safety at work: a meta-

analytic investigation of the link between job demands, job resources, burnout,

engagement, and safety outcomes. Journal of Applied Psychology, 96(1), 71-94.

R Core Team (2014). R: A language and environment for statistical computing [computer

software]. R Foundation for Statistical Computing, Vienna, Austria. URL

http://www.R-project.org/

Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.

Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that

measure more than one dimension. Applied Psychological Measurement, 15(4),

361-373.

Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate

Behavioral Research, 47(5), 667-696.

Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling

psychological measures in the presence of multidimensionality. Journal of

Personality Assessment, 95(2), 129-140.

Reise, S. P., Cook, K. F., & Moore, T. M. (2014). Evaluating the Impact of Multidimensionality on Unidimensional Item Response Theory Model Parameters. In S. P. Reise & D. A. Revicki (Eds.), Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment (pp. 13-40). New York, NY: Routledge.

Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations:

Exploring the extent to which multidimensional data yield univocal scale scores.

Journal of Personality Assessment, 92(6), 544-559.

Reise, S. P., Morizot, J., & Hays, R. D. (2007). The role of the bifactor model in

resolving dimensionality issues in health outcomes measures. Quality of Life

Research, 16(1), 19-31.

Reise, S. P., Scheines, R., Widaman, K. F., & Haviland, M. G. (2013).

Multidimensionality and structural coefficient bias in structural equation

modeling: A bifactor perspective. Educational and Psychological Measurement,

73(1), 5-26.

Reise, S. P., Ventura, J., Keefe, R. S., Baade, L. E., Gold, J. M., Green, M. F., ... Bilder,

R. (2011). Bifactor and item response theory analyses of interviewer report scales


of cognitive impairment in schizophrenia. Psychological Assessment, 23(1), 245-

261.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research (Version 1.5.8) [computer software]. Northwestern University, Evanston, Illinois, USA. http://CRAN.R-project.org/package=psych

Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-

order factor analysis. Multivariate Behavioral Research, 23(1), 51-67.

Rodriguez, A., Reise, S. P., & Haviland, M. G. (2015a). Evaluating Bifactor Models:

Calculating and Interpreting Statistical Indices. Psychological Methods. Advance

online publication. http://dx.doi.org/10.1037/met0000045

Rodriguez, A., Reise, S. P., & Haviland, M. G. (2015b). Applying Bifactor Statistical

Indices in the Evaluation of Psychological Measures. Journal of Personality

Assessment, 98(3), 1-15.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of

Statistical Software, 48(2), 1-36.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded

scores. Psychometrika monograph. Richmond, VA: Psychometric Society.

Schaufeli, W., & Enzmann, D. (1998). The burnout companion to study and practice: a

critical analysis. Philadelphia, PA: Taylor & Francis.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions.

Psychometrika, 22(1), 53-61.

Schutte, N., Toppinen, S., Kalimo, R., & Schaufeli, W. (2000). The factorial validity of

the Maslach Burnout Inventory‐General Survey (MBI‐GS) across occupational


groups and nations. Journal of Occupational and Organizational

Psychology, 73(1), 53-66.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2),

461-464.

Slocum-Gori, S. L., & Zumbo, B. D. (2011). Assessing the unidimensionality of

psychological scales: Using multiple criteria from factor analysis. Social

Indicators Research, 102(3), 443-461.

Stucky, B. D., Thissen, D., & Edelen, M. O. (2013). Using Logistic Approximations of

Marginal Trace Lines to Develop Short Assessments. Applied Psychological

Measurement, 37(1), 41-57.

Thissen, D. (2000). Reliability and Measurement. In H. Wainer (Ed.), Computerized Adaptive Testing: A Primer (pp. 159-184). Mahwah, NJ: Lawrence Erlbaum Associates.

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.

Wang, Q., Bowling, N. A., & Eschleman, K. J. (2010). A meta-analytic examination of work and general locus of control. Journal of Applied Psychology, 95(4), 761-768.

Wei, T. (2013). corrplot: Visualization of a correlation matrix (R package version 0.73) [computer software]. http://CRAN.R-project.org/package=corrplot

Wheeler, D. L., Vassar, M., Worley, J. A., & Barnes, L. L. (2011). A reliability generalization meta-analysis of coefficient alpha for the Maslach Burnout Inventory. Educational and Psychological Measurement, 71(1), 231-244.

Wolpin, J., Burke, R., & Greenglass, E. (1991). Is job satisfaction an antecedent or a consequence of psychological burnout? Human Relations, 44(2), 193-209.

Worley, J. A., Vassar, M., Wheeler, D. L., & Barnes, L. L. (2008). Factor structure of scores from the Maslach Burnout Inventory: A review and meta-analysis of 45 exploratory and confirmatory factor-analytic studies. Educational and Psychological Measurement, 68(5), 797-823.


Table 1

Descriptive statistics for the MBI-HSS

Item M SD Minimum Maximum

DP1 0.63 1.05 0 6

DP2 0.80 1.25 0 6

DP3 0.80 1.30 0 6

DP4 0.28 0.73 0 6

DP5 1.65 1.65 0 6

EE1 2.42 1.64 0 6

EE2 2.64 1.74 0 6

EE3 1.87 1.68 0 6

EE4 1.02 1.24 0 6

EE5 1.65 1.55 0 6

EE6 2.47 1.71 0 6

EE7 2.33 1.87 0 6

EE8 0.68 1.00 0 6

EE9 0.78 1.25 0 6

PA1 1.06 1.50 0 6

PA2 0.72 1.18 0 6

PA3 1.01 1.42 0 6

PA4 1.07 1.18 0 6

PA5 0.65 1.06 0 6

PA6 1.47 1.53 0 6

PA7 1.19 1.37 0 6

PA8 1.02 1.36 0 6

Note. N = 7,481.


Table 2

Model fit comparisons for unidimensional, correlated traits, and bifactor models using the entire MBI-HSS.

Model df AIC AICc SABIC BIC LL RMSEA (CI5%, CI95%) CFI SRMSR

Uni-GRM 99 410,895.8 410,902.3 411,472.1 411,961.5 -205,294 .07 (.07, .07) 0.98 0.110

CT-GRM 96 398,234.5 398,241.3 398,822.1 399,321.0 -198,960 .04 (.04, .05) 0.96 0.081

Bifactor-GRM 77 394,165.5 394,174.0 394,824.1 395,383.4 -196,907 .04 (.03, .04) 0.98 0.050

Uni-GPCM 99 413,828.5 413,835.0 414,404.8 414,894.2 -206,760 .08 (.08, .08) 0.87 0.113

CT-GPCM 96 403,149.9 403,156.7 403,737.5 404,236.4 -201,418 .04 (.04, .05) 0.96 0.109

Bifactor-GPCM 77 399,154.1 399,162.7 399,812.8 400,372.1 -199,401 .04 (.04, .04) 0.98 0.062

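Note. N = 7,481; 22 items. Uni = unidimensional model; CT = correlated traits model; GRM = graded response model; GPCM = generalized partial credit model; AICc = corrected AIC; SABIC = sample-size-adjusted BIC; LL = log-likelihood.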


Table 3

Mean |SRC| values for the MBI-HSS items.

Item Mean |SRC|
DP1 0.067*
DP2 0.011
DP3 0.026
DP4 0.055*
DP5 0.022
EE1 0.019
EE2 0.024
EE3 0.018
EE4 0.029
EE5 0.021
EE6 0.027
EE7 0.030
EE8 0.023
EE9 0.035
PA1 0.051*
PA2 0.068*
PA3 0.048
PA4 0.043
PA5 0.042
PA6 0.035
PA7 0.065*
PA8 0.037

Note. Mean |SRC| values computed using only within-subscale item pairs. Values marked with an asterisk are greater than the .05 cut-off suggested by Maydeu-Olivares (2014).


Table 4

Raw Bifactor Graded Response Model Parameters for the MBI-HSS Items.

Item a (GB) a (specific) d1 d2 d3 d4 d5 d6

DP1 0.97 0.66 -0.67 -2.20 -3.04 -4.14 -4.88 -6.65

DP2 2.83 2.90 -0.82 -4.02 -5.69 -7.38 -8.57 -10.41

DP3 2.40 1.82 -0.74 -3.16 -4.32 -5.62 -6.39 -7.87

DP4 1.12 0.73 -1.94 -3.61 -4.50 -5.32 -5.91 -6.98

DP5 0.96 0.42 1.15 -0.53 -1.22 -2.07 -2.64 -4.07

EE1 3.39 2.14 5.57 1.44 -0.32 -2.97 -4.32 -8.52

EE2 3.05 1.95 5.10 1.78 0.19 -2.01 -3.24 -6.95

EE3 2.48 0.98 2.37 -0.18 -1.39 -2.88 -3.85 -6.39

EE4 2.35 -0.71 0.53 -2.10 -3.41 -4.85 -5.96 -8.15

EE5 3.12 1.19 2.81 -0.98 -2.41 -4.14 -5.26 -7.97

EE6 2.20 0.84 3.79 0.89 -0.33 -1.80 -2.67 -5.14

EE7 1.41 0.59 2.04 0.39 -0.39 -1.45 -2.07 -3.63

EE8 2.60 -1.17 -0.43 -3.66 -5.11 -6.66 -7.69 -10.29

EE9 2.12 0.47 -0.47 -2.63 -3.45 -4.53 -5.25 -7.20

PA1 0.08 0.88 -0.06 -1.40 -1.84 -2.66 -3.19 -4.43

PA2 0.61 1.42 -0.57 -2.48 -3.10 -4.08 -4.69 -6.18

PA3 0.98 1.78 -0.22 -1.93 -2.58 -3.90 -4.83 -6.16

PA4 1.51 1.12 0.93 -2.02 -2.82 -4.34 -5.33 -6.45

PA5 1.12 1.73 -0.71 -3.05 -3.76 -5.16 -6.12 -7.41

PA6 1.03 1.46 1.24 -1.17 -1.91 -3.18 -3.92 -5.00

PA7 0.95 1.63 0.66 -1.54 -2.22 -3.65 -4.68 -7.41

PA8 0.58 1.27 0.07 -1.68 -2.23 -3.23 -4.11 -6.25

Note. N = 7,481. Off-dimension slopes (all equal to 0) removed for clarity.



Table 5

Converted Item Parameters for the MBI-HSS.

Item A_i^Max B_i1 B_i2 B_i3 B_i4 B_i5 B_i6 ω_i,GB^Max

DP1 1.17 0.57 1.88 2.59 3.53 4.16 5.67 34.44

DP2 4.05 0.20 0.99 1.40 1.82 2.11 2.57 45.69

DP3 3.01 0.25 1.05 1.43 1.87 2.12 2.61 37.13

DP4 1.34 1.44 2.69 3.36 3.97 4.41 5.20 33.00

DP5 1.05 -1.09 0.51 1.16 1.97 2.51 3.87 23.66

EE1 4.01 -1.39 -0.36 0.08 0.74 1.08 2.12 32.28

EE2 3.62 -1.41 -0.49 -0.05 0.55 0.90 1.92 32.65

EE3 2.66 -0.89 0.07 0.52 1.08 1.45 2.40 21.52

EE4 2.46 -0.22 0.85 1.39 1.97 2.43 3.32 -16.88

EE5 3.34 -0.84 0.29 0.72 1.24 1.58 2.39 20.92

EE6 2.36 -1.61 -0.38 0.14 0.76 1.13 2.18 20.98

EE7 1.53 -1.33 -0.26 0.25 0.94 1.35 2.37 22.48

EE8 2.85 0.15 1.29 1.80 2.34 2.70 3.61 -24.31

EE9 2.17 0.22 1.21 1.59 2.09 2.42 3.32 12.56

PA1 0.88 0.07 1.59 2.08 3.00 3.61 5.01 84.81

PA2 1.55 0.37 1.60 2.01 2.64 3.04 4.00 66.75

PA3 2.03 0.11 0.95 1.27 1.92 2.38 3.03 61.16

PA4 1.88 -0.49 1.07 1.50 2.31 2.84 3.44 36.53

PA5 2.06 0.34 1.48 1.83 2.51 2.97 3.60 57.04

PA6 1.79 -0.69 0.66 1.07 1.78 2.19 2.80 54.91

PA7 1.88 -0.35 0.82 1.18 1.94 2.48 3.93 59.58

PA8 1.40 -0.05 1.21 1.60 2.32 2.95 4.48 65.25

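Note. A_i^Max is the item's maximum discrimination across directions; B_i1 to B_i6 are the corresponding multidimensional difficulties; ω_i,GB^Max is the angle of the maximum-discrimination direction, in degrees relative to the general burnout axis.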
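The converted values in Table 5 are consistent with the usual multidimensional IRT conversions (Reckase's MDISC and MDIFF): the maximum discrimination is the length of the slope vector, each B is the negative intercept divided by that length, and ω is the angle of the slope vector from the general burnout axis. The sketch below applies these conversions to the DP1 and EE4 rows of Table 4 and reproduces their Table 5 entries to rounding; the function name is ours.

```python
import math

def convert_item(a_gb: float, a_s: float, intercepts: list) -> dict:
    """Convert bifactor slope-intercept parameters to maximum discrimination,
    multidimensional difficulties, and the angle from the general factor."""
    mdisc = math.hypot(a_gb, a_s)                  # A_i^Max = sqrt(a_GB^2 + a_S^2)
    mdiff = [-d / mdisc for d in intercepts]       # B_ik = -d_ik / A_i^Max
    angle = math.degrees(math.atan2(a_s, a_gb))    # degrees from the general burnout axis
    return {"A_max": mdisc, "B": mdiff, "omega_gb": angle}

# Raw parameters from Table 4: general slope, specific slope, d1-d6.
dp1 = convert_item(0.97, 0.66, [-0.67, -2.20, -3.04, -4.14, -4.88, -6.65])
ee4 = convert_item(2.35, -0.71, [0.53, -2.10, -3.41, -4.85, -5.96, -8.15])

for name, item in (("DP1", dp1), ("EE4", ee4)):
    print(name, round(item["A_max"], 2), [round(b, 2) for b in item["B"]],
          round(item["omega_gb"], 1))
```

Using `atan2` keeps the sign of the specific slope, so items such as EE4 and EE8, whose specific slopes are negative, receive negative angles as in Table 5.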



Table 6

Directional Discriminations for the Items of the MBI-HSS.

Item A0 A10 A20 A30 A40 A50 A60 A70 A80 A90

DP1 0.97 1.07 1.14 1.17 1.17 1.13 1.06 0.95 0.82 0.66

DP2 2.83 3.29 3.65 3.90 4.03 4.04 3.93 3.69 3.35 2.90

DP3 2.40 2.68 2.88 2.99 3.01 2.93 2.77 2.53 2.21 1.82

DP4 1.12 1.23 1.31 1.34 1.33 1.28 1.19 1.07 0.91 0.73

DP5 0.96 1.02 1.05 1.04 1.01 0.94 0.85 0.73 0.58 0.42

EE1 3.39 3.71 3.92 4.01 3.98 3.82 3.55 3.17 2.70 2.14

EE2 3.05 3.34 3.53 3.62 3.59 3.46 3.22 2.88 2.45 1.95

EE3 2.48 2.61 2.66 2.63 2.52 2.34 2.08 1.76 1.39 0.98

EE4 2.35 2.19 1.96 1.68 1.34 0.96 0.56 0.13 -0.29 -0.71

EE5 3.12 3.28 3.34 3.30 3.15 2.92 2.59 2.19 1.72 1.19

EE6 2.20 2.31 2.36 2.33 2.23 2.06 1.83 1.55 1.21 0.84

EE7 1.41 1.49 1.53 1.52 1.46 1.36 1.21 1.03 0.82 0.59

EE8 2.60 2.35 2.04 1.66 1.23 0.77 0.28 -0.21 -0.70 -1.17

EE9 2.12 2.17 2.15 2.07 1.93 1.72 1.47 1.17 0.83 0.47

PA1 -0.31 -0.17 -0.03 0.12 0.27 0.40 0.52 0.63 0.72 0.79

PA2 -0.03 0.19 0.41 0.61 0.80 0.96 1.09 1.18 1.25 1.27

PA3 0.18 0.46 0.72 0.95 1.16 1.34 1.47 1.56 1.60 1.59

PA4 1.01 1.17 1.29 1.37 1.41 1.41 1.37 1.28 1.16 1.00

PA5 0.35 0.61 0.85 1.07 1.26 1.41 1.51 1.57 1.58 1.54

PA6 0.37 0.59 0.80 0.98 1.13 1.24 1.32 1.36 1.35 1.31

PA7 0.23 0.47 0.71 0.92 1.11 1.26 1.37 1.44 1.47 1.45

PA8 0.02 0.21 0.40 0.58 0.74 0.88 0.99 1.07 1.12 1.13

Note. Column subscripts give the direction, in degrees, relative to the general burnout dimension (0 = general burnout; 90 = the item's specific factor).
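Each entry in Table 6 can be read as the item's slope projected onto a direction in the general-by-specific plane, a_GB cos α + a_S sin α. As a sketch using the DP2 slopes from Table 4, this formula reproduces the DP2 row of Table 6; the function name is hypothetical.

```python
import math

def directional_discrimination(a_gb: float, a_s: float, degrees: float) -> float:
    """Slope of an item in the direction `degrees` away from the general
    burnout axis toward the item's specific factor."""
    rad = math.radians(degrees)
    return a_gb * math.cos(rad) + a_s * math.sin(rad)

# DP2 slopes (general, specific) from Table 4, at the directions used as columns.
dp2 = [round(directional_discrimination(2.83, 2.90, deg), 2) for deg in range(0, 91, 10)]
print(dp2)
```

The maximum of this function over α equals A_i^Max from Table 5, reached at the angle ω_i,GB^Max.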



Table 7

Standardized factor loadings for the items of the MBI-HSS.

Item GB EE DP PA Error

DP1 0.47 0.32 0.68

DP2 0.64 0.66 0.15

DP3 0.69 0.53 0.24

DP4 0.52 0.34 0.62

DP5 0.48 0.21 0.72

EE1 0.78 0.49 0.15

EE2 0.76 0.49 0.18

EE3 0.78 0.31 0.29

EE4 0.79 -0.24 0.32

EE5 0.83 0.32 0.21

EE6 0.76 0.29 0.34

EE7 0.62 0.26 0.55

EE8 0.78 -0.35 0.26

EE9 0.77 0.17 0.38

PA1 0.04 0.46 0.79

PA2 0.27 0.62 0.55

PA3 0.37 0.67 0.41

PA4 0.60 0.44 0.45

PA5 0.42 0.65 0.41

PA6 0.42 0.59 0.48

PA7 0.38 0.64 0.45

PA8 0.27 0.58 0.60

Note. Loadings on off-dimensions removed for clarity.
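The standardized loadings in Table 7 can be recovered from the logistic slopes in Table 4. The sketch below assumes the conventional scaling constant D = 1.702 and the usual slope-to-loading conversion λ = (a/D) / sqrt(1 + Σ(a/D)²), with the error column equal to 1 minus the sum of squared loadings. Under these assumptions it reproduces the DP1, DP2, and PA1 rows to two decimals; the function name is ours.

```python
import math

D = 1.702  # assumed logistic-to-normal-ogive scaling constant

def standardized_loadings(slopes: list) -> tuple:
    """Convert one item's logistic slopes to standardized loadings and a
    residual (error) variance."""
    scaled = [a / D for a in slopes]
    denom = math.sqrt(1.0 + sum(s * s for s in scaled))
    loadings = [s / denom for s in scaled]
    error = 1.0 - sum(l * l for l in loadings)
    return loadings, error

# Slopes (general, specific) from Table 4.
for item, slopes in {"DP1": [0.97, 0.66], "DP2": [2.83, 2.90], "PA1": [0.08, 0.88]}.items():
    loadings, error = standardized_loadings(slopes)
    print(item, [round(l, 2) for l in loadings], round(error, 2))
```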



Table 8

Results from the Rodriguez, Reise, and Haviland (2015) analyses.

Analysis GB EE DP PA

ECV 0.62 0.02 0.12 0.24

ω 0.93

ωH 0.76

ωS 0.92 0.78 0.85

ωHS 0.00 0.35 0.59

Construct Reliability (H) 0.95 0.55 0.59 0.81

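Note. ECV = explained common variance; ω = omega; ωH = omega hierarchical; ωS = omega subscale; ωHS = omega hierarchical subscale. Percent of uncontaminated correlations (PUC) = .68.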
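PUC depends only on how items are grouped: it is the share of unique item-pair correlations that cross subscales and therefore reflect only the general factor in a bifactor model. A short sketch with the MBI-HSS subscale sizes (5 DP, 9 EE, 8 PA items) reproduces the .68 in the note; the function name is ours.

```python
from math import comb

def percent_uncontaminated_correlations(subscale_sizes: list) -> float:
    """Share of unique item pairs whose members belong to different subscales."""
    total_pairs = comb(sum(subscale_sizes), 2)
    within_pairs = sum(comb(size, 2) for size in subscale_sizes)
    return (total_pairs - within_pairs) / total_pairs

# 22 items give 231 pairs; 10 + 36 + 28 = 74 of them fall within a subscale.
puc = percent_uncontaminated_correlations([5, 9, 8])
print(round(puc, 2))
```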



Table 9

Supplemental analyses: descriptive statistics for the VA 360-degree feedback instrument

Peer Staff Boss Self

Competency # of items α M SD α M SD α M SD α M SD

Communication 3 .91 4.85 1.65 .92 4.96 1.75 .89 4.69 1.49 .84 4.73 1.24

Interpersonal Effectiveness 5 .96 5.21 1.59 .96 5.17 1.76 .95 5.11 1.46 .93 5.08 1.21

Critical Thinking 3 .94 4.95 1.73 .95 5.04 1.83 .93 4.90 1.52 .90 4.76 1.28

Org. Stewardship 5 .94 4.82 1.63 .95 4.97 1.72 .92 4.94 1.43 .91 4.85 1.21

Veteran Focus 2 .90 5.11 1.87 .91 5.18 1.92 .87 5.36 1.49 .85 5.16 1.36

Personal Mastery 4 .93 4.87 1.68 .94 4.84 1.84 .92 4.97 1.44 .89 4.89 1.22

Leading People 5 .96 3.63 2.16 .95 4.50 1.99 .94 3.75 1.96 .92 4.09 1.55

Building Coalitions 3 .93 4.56 1.87 .93 4.63 1.96 .90 4.60 1.60 .88 4.47 1.38

Leading Change 3 .92 4.33 2.04 .92 4.71 1.97 .89 4.49 1.69 .87 4.59 1.37

Results Driven 2 .89 4.20 2.17 .88 4.74 2.00 .88 4.32 1.94 .85 4.25 1.60

Global Perspective 3 .95 4.33 2.14 .95 4.65 2.04 .94 4.30 1.87 .92 4.23 1.52

Business Acumen 2 .96 3.48 2.56 .95 4.17 2.43 .94 3.55 2.35 .92 3.55 1.99

Note. Peer N = 9,432, Staff N = 6,564, Boss N = 2,216, Self N = 1,440
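Assuming α in Table 9 is Cronbach's coefficient alpha, each value summarizes how the competency's item variances compare with the variance of the summed score. A minimal sketch follows; the rating matrix in the example is hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(item_scores: list) -> float:
    """Cronbach's alpha for a respondents-by-items matrix given as a list of rows."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings from four raters on a three-item competency.
ratings = [[5, 6, 5], [3, 3, 4], [6, 7, 6], [2, 2, 3]]
print(round(cronbach_alpha(ratings), 3))
```

Population and sample variances give the same alpha as long as one is used consistently, because the n versus n - 1 factor cancels in the ratio.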



Table 10

Supplemental analyses: sample characteristics of VA 360-degree feedback sample

Rater Group Age Frequency Percentage Gender Frequency Percentage

Boss

< 20 1 0% Male 918 41%

20 - 29 17 1% Female 1,180 53%

30 - 39 229 10% NA 118 5%

40 - 49 549 25%

50 - 59 841 38%

60 + 420 19%

NA 159 7%

Peer

< 20 10 0% Male 3,072 33%

20 - 29 352 4% Female 5,751 61%

30 - 39 1,661 18% NA 609 6%

40 - 49 2,503 27%

50 - 59 3,079 33%

60 + 1,069 11%

NA 758 8%

Staff

< 20 12 0% Male 2,056 31%

20 - 29 436 7% Female 4,026 61%

30 - 39 1,264 19% NA 482 7%

40 - 49 1,683 26%

50 - 59 1,887 29%

60 + 684 10%

NA 598 9%

Self

< 20 0 0% Male 544 38%

20 - 29 80 6% Female 867 60%

30 - 39 360 25% NA 29 2%

40 - 49 464 32%

50 - 59 408 28%

60 + 97 7%

NA 31 2%

Note. Peer N = 9,432, Staff N = 6,564, Boss N = 2,216, Self N = 1,440



Table 11

Supplemental analyses: impact of burnout on communication competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²

Staff

GB -0.12 0.03 -3.55 0.01 0.01

EE -0.02 0.04 -0.52 EE -0.05 0.05 -1.08

DP 0.03 0.04 0.73 DP 0.03 0.06 0.54

PA -0.12 0.04 -3.21 PA -0.14 0.04 -3.41

Peer

GB -0.12 0.02 -4.64 0.01 0.01

EE -0.03 0.03 -1.09 EE -0.07 0.04 -1.98

DP 0.02 0.03 0.73 DP 0.03 0.04 0.62

PA -0.08 0.03 -3.04 PA -0.10 0.03 -3.21

Boss

GB -0.06 0.04 -1.50 0.01 0.01

EE -0.02 0.05 -0.33 EE -0.04 0.06 -0.75

DP 0.05 0.05 0.98 DP 0.08 0.07 1.13

PA -0.11 0.04 -2.63 PA -0.13 0.05 -2.76

Self

GB -0.27 0.04 -7.55 0.06 0.06

EE 0.01 0.04 0.36 EE 0.00 0.05 0.01

DP -0.09 0.05 -1.90 DP -0.16 0.06 -2.51

PA -0.16 0.04 -4.26 PA -0.19 0.04 -4.28

Note. Peer N = 9,432, Staff N = 6,564, Boss N = 2,216, Self N = 1,440. Significant b and R² values are bolded and italicized.

R² for the multilevel models (i.e., peer, staff, and boss) is R²_MVP (LaHuis, Hartman, Hakoyama, & Clark, 2014).
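As we understand it, R²_MVP expresses the variance of the fixed-part predicted values as a share of that variance plus the random-intercept variance (τ00) and the residual variance (σ²). The sketch below encodes that reading only; it should be checked against LaHuis et al. (2014) before use, and the predicted values and variance components in the example are hypothetical.

```python
from statistics import pvariance

def r2_mvp(fixed_predictions: list, tau00: float, sigma2: float) -> float:
    """Variance of fixed-part predictions relative to total variance in a
    random-intercept model: var(Xb) / (var(Xb) + tau00 + sigma2).

    This follows our reading of R2_MVP, not a verified implementation.
    """
    var_pred = pvariance(fixed_predictions)
    return var_pred / (var_pred + tau00 + sigma2)

# Hypothetical fixed-part predictions and variance components.
print(r2_mvp([1.0, 2.0, 3.0, 4.0, 5.0], tau00=1.0, sigma2=7.0))
```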



Table 12

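Supplemental analyses: impact of burnout on interpersonal effectiveness competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²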



Staff

GB -0.16 0.04 -4.68 0.01 0.01

EE -0.02 0.04 -0.60 EE -0.05 0.05 -1.10

DP 0.01 0.04 0.13 DP 0.00 0.06 -0.02

PA -0.14 0.04 -3.73 PA -0.17 0.04 -3.80

Peer

GB -0.15 0.03 -5.75 0.01 0.01

EE -0.04 0.03 -1.35 EE -0.09 0.04 -2.38

DP 0.02 0.03 0.52 DP 0.01 0.05 0.30

PA -0.09 0.03 -3.11 PA -0.11 0.03 -3.26

Boss

GB -0.14 0.04 -3.55 0.01 0.01

EE -0.10 0.04 -2.20 EE -0.14 0.06 -2.43

DP 0.00 0.05 -0.01 DP 0.03 0.07 0.47

PA -0.07 0.04 -1.75 PA -0.07 0.05 -1.43

Self

GB -0.32 0.03 -9.59 0.09 0.08

EE 0.08 0.04 2.06 EE 0.02 0.05 0.50

DP -0.07 0.04 -1.66 DP -0.17 0.06 -2.91

PA -0.19 0.04 -5.23 PA -0.25 0.04 -5.94





Table 13

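Supplemental analyses: impact of burnout on critical thinking competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²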



Staff

GB -0.15 0.03 -4.53 0.01 0.01

EE -0.01 0.03 -0.25 EE -0.08 0.05 -1.69

DP 0.05 0.04 1.29 DP 0.04 0.06 0.74

PA -0.11 0.04 -3.11 PA -0.15 0.04 -3.53

Peer

GB -0.12 0.02 -4.90 0.01 0.01

EE -0.03 0.03 -0.90 EE -0.09 0.04 -2.49

DP 0.03 0.03 0.98 DP 0.03 0.04 0.58

PA -0.05 0.03 -1.97 PA -0.07 0.03 -2.29

Boss

GB -0.10 0.04 -2.72 0.01 0.01

EE -0.06 0.05 -1.42 EE -0.15 0.06 -2.69

DP 0.06 0.05 1.25 DP 0.12 0.07 1.80

PA -0.07 0.04 -1.75 PA -0.09 0.05 -1.99

Self

GB -0.28 0.04 -7.62 0.06 0.05

EE 0.04 0.04 1.06 EE -0.03 0.05 -0.54

DP -0.01 0.05 -0.21 DP -0.08 0.06 -1.21

PA -0.18 0.04 -4.48 PA -0.24 0.05 -5.21





Table 14

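Supplemental analyses: impact of burnout on organizational stewardship competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²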



Staff

GB -0.16 0.03 -4.87 0.01 0.01

EE -0.01 0.04 -0.37 EE -0.08 0.05 -1.74

DP 0.04 0.04 0.97 DP 0.03 0.06 0.47

PA -0.10 0.04 -2.99 PA -0.14 0.04 -3.37

Peer

GB -0.12 0.02 -5.22 0.01 0.01

EE -0.04 0.03 -1.40 EE -0.07 0.03 -2.07

DP 0.01 0.03 0.37 DP 0.01 0.04 0.33

PA -0.08 0.03 -3.19 PA -0.10 0.03 -3.35

Boss

GB -0.09 0.04 -2.66 0.01 0.01

EE -0.07 0.04 -1.57 EE -0.12 0.05 -2.28

DP 0.04 0.05 0.98 DP 0.11 0.06 1.72

PA -0.10 0.04 -2.62 PA -0.12 0.04 -2.76

Self

GB -0.32 0.03 -9.22 0.08 0.08

EE 0.09 0.04 2.31 EE 0.02 0.05 0.47

DP -0.05 0.04 -1.18 DP -0.16 0.06 -2.56

PA -0.19 0.04 -5.20 PA -0.26 0.04 -6.01





Table 15

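Supplemental analyses: impact of burnout on veteran and customer focus competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²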



Staff

GB -0.14 0.03 -4.71 0.01 0.01

EE 0.01 0.03 0.21 EE -0.01 0.04 -0.26

DP 0.01 0.04 0.37 DP -0.01 0.05 -0.12

PA -0.15 0.03 -4.54 PA -0.18 0.04 -4.81

Peer

GB -0.09 0.02 -3.98 0.01 0.01

EE -0.02 0.03 -0.81 EE -0.02 0.03 -0.48

DP -0.01 0.03 -0.25 DP -0.01 0.04 -0.33

PA -0.09 0.03 -3.68 PA -0.11 0.03 -3.61

Boss

GB -0.06 0.03 -1.75 0.01 0.01

EE -0.05 0.04 -1.20 EE -0.01 0.05 -0.27

DP -0.02 0.04 -0.42 DP 0.02 0.06 0.29

PA -0.12 0.04 -3.35 PA -0.13 0.04 -3.03

Self

GB -0.35 0.04 -9.42 0.09 0.09

EE 0.08 0.04 1.91 EE 0.06 0.05 1.17

DP -0.11 0.05 -2.35 DP -0.23 0.07 -3.47

PA -0.23 0.04 -5.72 PA -0.29 0.05 -6.20





Table 16

Supplemental analyses: impact of burnout on personal mastery competency ratings.


Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²

Staff

GB -0.14 0.03 -4.27 0.01 0.01

EE -0.02 0.04 -0.62 EE -0.08 0.05 -1.71

DP 0.04 0.04 0.96 DP 0.04 0.06 0.78

PA -0.11 0.04 -3.19 PA -0.14 0.04 -3.49

Peer

GB -0.12 0.02 -4.97 0.01 0.01

EE -0.04 0.03 -1.45 EE -0.09 0.03 -2.59

DP 0.02 0.03 0.64 DP 0.02 0.04 0.43

PA -0.05 0.03 -1.97 PA -0.07 0.03 -2.14

Boss

GB -0.08 0.04 -2.30 0.01 0.01

EE -0.05 0.04 -1.23 EE -0.12 0.05 -2.19

DP 0.06 0.05 1.27 DP 0.10 0.07 1.60

PA -0.07 0.04 -1.82 PA -0.09 0.05 -2.02

Self

GB -0.31 0.03 -8.97 0.08 0.08

EE 0.05 0.04 1.23 EE 0.03 0.05 0.61

DP -0.11 0.04 -2.37 DP -0.22 0.06 -3.54

PA -0.16 0.04 -4.29 PA -0.20 0.04 -4.63





Table 17

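Supplemental analyses: impact of burnout on leading people competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²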



Staff

GB -0.17 0.03 -4.87 0.01 0.01

EE -0.01 0.04 -0.28 EE -0.05 0.05 -0.96

DP -0.01 0.04 -0.11 DP -0.04 0.06 -0.61

PA -0.11 0.04 -2.76 PA -0.13 0.04 -2.92

Peer

GB -0.13 0.03 -4.87 0.01 0.01

EE -0.04 0.03 -1.33 EE -0.09 0.04 -2.34

DP 0.01 0.03 0.43 DP 0.02 0.05 0.46

PA -0.07 0.03 -2.35 PA -0.09 0.03 -2.59

Boss

GB -0.10 0.04 -2.36 0.01 0.01

EE -0.09 0.05 -1.93 EE -0.15 0.06 -2.53

DP 0.04 0.05 0.67 DP 0.11 0.07 1.53

PA -0.07 0.04 -1.63 PA -0.09 0.05 -1.67

Self

GB -0.34 0.04 -8.96 0.07 0.07

EE 0.09 0.04 2.12 EE 0.04 0.06 0.64

DP -0.08 0.05 -1.60 DP -0.22 0.07 -3.22

PA -0.17 0.04 -3.99 PA -0.23 0.05 -4.69





Table 18

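Supplemental analyses: impact of burnout on building coalitions competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²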



Staff

GB -0.16 0.03 -4.76 0.01 0.01

EE -0.01 0.04 -0.21 EE -0.04 0.05 -0.95

DP 0.01 0.04 0.34 DP 0.00 0.06 -0.08

PA -0.12 0.04 -3.42 PA -0.16 0.04 -3.76

Peer

GB -0.11 0.02 -4.34 0.01 0.01

EE -0.05 0.03 -1.88 EE -0.08 0.04 -2.30

DP 0.01 0.03 0.35 DP 0.03 0.04 0.73

PA -0.08 0.03 -2.84 PA -0.09 0.03 -2.89

Boss

GB -0.06 0.04 -1.72 0.01 0.01

EE -0.08 0.04 -1.80 EE -0.12 0.06 -2.09

DP 0.04 0.05 0.73 DP 0.11 0.07 1.59

PA -0.08 0.04 -1.91 PA -0.09 0.05 -1.86

Self

GB -0.31 0.04 -8.38 0.07 0.06

EE 0.08 0.04 1.91 EE 0.00 0.05 0.04

DP -0.04 0.05 -0.76 DP -0.13 0.07 -1.93

PA -0.18 0.04 -4.59 PA -0.26 0.05 -5.50





Table 19

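Supplemental analyses: impact of burnout on leading change competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²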



Staff

GB -0.09 0.04 -2.29 0.01 0.01

EE 0.02 0.04 -0.37 EE -0.01 0.05 -0.23

DP 0.04 0.05 0.79 DP 0.03 0.07 0.51

PA -0.11 0.04 -2.55 PA -0.15 0.05 -3.00

Peer

GB -0.06 0.03 -2.03 0.01 0.01

EE -0.06 0.04 -1.55 EE -0.05 0.05 -1.14

DP 0.03 0.04 0.76 DP 0.09 0.06 1.56

PA -0.14 0.03 -4.11 PA -0.17 0.04 -4.16

Boss

GB 0.00 0.04 0.01 0.01 0.01

EE -0.04 0.05 -0.71 EE -0.02 0.06 -0.29

DP 0.02 0.06 0.39 DP 0.08 0.08 1.01

PA -0.09 0.05 -2.02 PA -0.10 0.05 -1.94

Self

GB -0.29 0.04 -7.34 0.06 0.06

EE 0.09 0.05 1.95 EE 0.04 0.06 0.73

DP -0.04 0.05 -0.79 DP -0.13 0.07 -1.92

PA -0.21 0.04 -4.96 PA -0.28 0.05 -5.75





Table 20

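Supplemental analyses: impact of burnout on results driven competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²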



Staff

GB -0.17 0.03 -5.20 0.01 0.01

EE 0.02 0.04 0.60 EE -0.05 0.05 -1.11

DP 0.03 0.04 0.78 DP -0.01 0.06 -0.14

PA -0.10 0.04 -2.76 PA -0.14 0.04 -3.30

Peer

GB -0.11 0.03 -4.57 0.01 0.01

EE -0.03 0.03 -0.93 EE -0.06 0.04 -1.55

DP 0.00 0.03 0.11 DP 0.01 0.04 0.18

PA -0.08 0.03 -2.93 PA -0.10 0.03 -3.14

Boss

GB -0.07 0.04 -1.62 0.00 0.00

EE -0.04 0.05 -0.77 EE -0.08 0.06 -1.42

DP 0.04 0.05 0.69 DP 0.09 0.07 1.26

PA -0.07 0.04 -1.72 PA -0.10 0.05 -1.92

Self

GB -0.36 0.04 -8.65 0.07 0.06

EE 0.13 0.05 2.68 EE 0.00 0.06 0.03

DP -0.03 0.05 -0.60 DP -0.17 0.07 -2.29

PA -0.15 0.04 -3.35 PA -0.23 0.05 -4.48





Table 21

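Supplemental analyses: impact of burnout on global perspective competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²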



Staff

GB -0.11 0.04 -2.93 0.00 0.00

EE 0.01 0.05 0.20 EE -0.02 0.06 -0.37

DP 0.02 0.05 0.47 DP 0.00 0.07 0.06

PA -0.10 0.04 -2.37 PA -0.14 0.05 -2.74

Peer

GB -0.08 0.03 -2.30 0.00 0.00

EE -0.02 0.04 -0.48 EE -0.04 0.05 -0.87

DP 0.01 0.04 0.33 DP 0.04 0.06 0.65

PA -0.09 0.04 -2.33 PA -0.11 0.04 -2.58

Boss

GB 0.00 0.04 0.01 0.00 0.00

EE -0.04 0.05 -0.71 EE 0.07 0.07 0.92

DP 0.02 0.06 0.39 DP 0.00 0.09 0.05

PA -0.09 0.05 -2.02 PA -0.14 0.06 -2.23

Self

GB -0.32 0.04 -7.35 0.06 0.05

EE 0.10 0.05 2.02 EE 0.01 0.06 0.18

DP 0.00 0.06 -0.02 DP -0.10 0.08 -1.34

PA -0.22 0.05 -4.57 PA -0.31 0.06 -5.58





Table 22

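Supplemental analyses: impact of burnout on business acumen competency ratings.

Bifactor Model Correlated Traits Model

Rater Group Predictor b SE t-value R² Predictor b SE t-value R²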



Staff

GB -0.17 0.04 -4.60 0.01 0.01

EE -0.02 0.04 -0.37 EE -0.04 0.05 -0.88

DP -0.01 0.05 -0.22 DP -0.04 0.06 -0.68

PA -0.10 0.04 -2.52 PA -0.12 0.05 -2.66

Peer

GB -0.10 0.03 -3.38 0.01 0.01

EE -0.06 0.03 -1.89 EE -0.09 0.04 -2.13

DP 0.01 0.04 0.23 DP 0.04 0.05 0.79

PA -0.07 0.03 -2.25 PA -0.08 0.04 -2.22

Boss

GB -0.09 0.05 -1.93 0.01 0.01

EE -0.07 0.06 -1.25 EE -0.14 0.07 -2.08

DP -0.09 0.06 1.48 DP 0.17 0.08 1.98

PA -0.12 0.05 -2.30 PA -0.15 0.06 -2.58

Self

GB -0.31 0.05 -6.38 0.05 0.04

EE -0.06 0.06 1.06 EE -0.03 0.07 -0.46

DP 0.00 0.06 0.04 DP -0.07 0.09 -0.83

PA -0.20 0.05 -3.76 PA -0.28 0.06 -4.55


R² for the multilevel models (i.e., peer, staff, and boss) is R²_MVP (LaHuis, Hartman, Hakoyama, & Clark, 2014).



Figure 1

Examples of possible structures of burnout.

Note. From left to right: correlated traits model, second-order factor model, and bifactor model.



Figure 2

Corrgram of the correlations between the items of the MBI-HSS.



Figure 3

DP1 Item Information Clamshell Plot.
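The clamshell plots show item information along several directions of the general-by-specific space. As a hedged sketch (not the code used to draw the figures), the function below computes directional information for one item under the slope-intercept multidimensional graded response model implied by Table 4, with cumulative probabilities logistic(a_GB θ_GB + a_S θ_S + d_k). Information in direction α is the squared slope in that direction, a_GB cos α + a_S sin α, times a sum over response categories; the function names are ours.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def grm_directional_information(a: tuple, d: list, theta: tuple, degrees: float) -> float:
    """Information of a two-dimensional (general, specific) graded response item
    at point theta, in the direction `degrees` from the general-factor axis."""
    z = a[0] * theta[0] + a[1] * theta[1]
    # Cumulative probabilities P*(X >= k), padded with 1 (k = 0) and 0 (k = K + 1).
    p_star = [1.0] + [logistic(z + dk) for dk in d] + [0.0]
    category_sum = 0.0
    for k in range(len(p_star) - 1):
        p_k = p_star[k] - p_star[k + 1]  # probability of responding in category k
        # Derivative of p_k with respect to z; the slope factor is applied below.
        dp_k = p_star[k] * (1.0 - p_star[k]) - p_star[k + 1] * (1.0 - p_star[k + 1])
        if p_k > 0.0:
            category_sum += dp_k ** 2 / p_k
    rad = math.radians(degrees)
    slope = a[0] * math.cos(rad) + a[1] * math.sin(rad)
    return slope ** 2 * category_sum

# DP1 parameters from Table 4, at a point one unit above the mean on both factors.
dp1_a = (0.97, 0.66)
dp1_d = [-0.67, -2.20, -3.04, -4.14, -4.88, -6.65]
for deg in (0, 45, 90):
    print(deg, round(grm_directional_information(dp1_a, dp1_d, (1.0, 1.0), deg), 3))
```

Evaluating this over a grid of θ points and several angles gives the curves a clamshell plot overlays; summing it over the items of a subscale gives the corresponding test information.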



Figure 4

DP2 Item Information Clamshell Plot.




Figure 5

DP3 Item Information Clamshell Plot.




Figure 6

DP4 Item Information Clamshell Plot.




Figure 7

DP5 Item Information Clamshell Plot.




Figure 8

EE1 Item Information Clamshell Plot.



Figure 9

EE2 Item Information Clamshell Plot.




Figure 10

EE3 Item Information Clamshell Plot.




Figure 11

EE4 Item Information Clamshell Plot.




Figure 12

EE5 Item Information Clamshell Plot.




Figure 13

EE6 Item Information Clamshell Plot.




Figure 14

EE7 Item Information Clamshell Plot.




Figure 15

EE8 Item Information Clamshell Plot.




Figure 16

EE9 Item Information Clamshell Plot.




Figure 17

PA1 Item Information Clamshell Plot.



Figure 18

PA2 Item Information Clamshell Plot.




Figure 19

PA3 Item Information Clamshell Plot.




Figure 20

PA4 Item Information Clamshell Plot.




Figure 21

PA5 Item Information Clamshell Plot.




Figure 22

PA6 Item Information Clamshell Plot.




Figure 23

PA7 Item Information Clamshell Plot.




Figure 24

PA8 Item Information Clamshell Plot.




Figure 25

Depersonalization Test Information Clamshell Plot.



Figure 26

Emotional Exhaustion Test Information Clamshell Plot.



Figure 27

Personal Accomplishment Test Information Clamshell Plot.



Figure 28

General Burnout Information Provided by the Depersonalization Subscale.



Figure 29

Depersonalization Test Information.



Figure 30

General Burnout Information Provided by the Emotional Exhaustion Subscale.



Figure 31

Emotional Exhaustion Test Information.



Figure 32

General Burnout Information Provided by the Personal Accomplishment Subscale.



Figure 33

Personal Accomplishment Test Information.



Figure 34

Marginal General Burnout Information Plots.

Note. Plot is faceted by subscale (columns) and by angle relative to the general burnout factor (rows).
