The EER: IS IT TRANSFORMATIONAL?

A PSYCHOLOGIST TURNED DIPLOMAT TAKES A CRITICAL LOOK AT THE CURRENT EER SYSTEM

Don Kilburg, Ph.D.

Transformational Diplomacy & Employee Evaluation Reports

In a key address at Georgetown University, Secretary of State Condoleezza Rice showcased the buzzwords “Transformational Diplomacy” (January 18, 2006). America needs a “diplomacy that not only reports about the world as it is, but seeks to change the world itself”, the Secretary advised. “We must transform old diplomatic institutions to serve new diplomatic purposes” and we must “prepare” and “challenge” our own diplomats with “new expertise” and “new expectations”, she insisted. As both a diplomat and a psychologist, I realized that if we want to advance this sort of diplomacy, we might need to take a critical look at the tool we use to formally shape ourselves as a diplomatic force: the Employee Evaluation Report, or EER. Does the EER produce diplomats who are true “transformers”, as Secretary Rice has called for?

Of course, just like the answer to most questions in the Department, “it depends”. It depends who you ask. Anecdotal reports from many employees suggest “the system ain’t broke, so don’t fix it”. Other employees are openly critical of the EER system, referring to it as “a game” or a “kabuki dance”. Both tend to agree that the EER is essentially an exercise in “water-walking”, whereby everyone is made to look exceptional, regardless of his or her true performance.

Still others point out that when EERs do not paint a picture of water walking, they either “damn with faint praise” or they damn with the rare candid critique. Damning with faint praise is arguably disingenuous. Supervisors may knowingly code what they write in order to covertly damage an employee they will not overtly criticize. In contrast, damning with the rare candid critique is arguably an accident of who your supervisor happens to be. Such supervisors may pride themselves on breaking the high-praise norm, which essentially results in a disproportionate amount of overt damage to their own employees vis-à-vis their water-walking peers.

In any case, it is not clear how exactly the EER could serve as a tool for transformation. A system wherein the luck of the draw on supervisors’ candor matters so much does not seem fair. A system that over-rewards might serve only to limit the advancement of promising employees by virtue of under-challenging them. Worse still, a system that reinforces bad habits through an inability to effectively categorize employees could actually result in bad employees getting promoted.

“So what?” you say. “The real transformation of the workforce occurs by way of so-called ‘corridor reputation’”, you suppose. In fact, you may be right, but this in turn raises another question: why do we spend so much time and energy on EERs? By some estimates, progress on other work at our missions comes to a grinding halt for one or two full months a year – simply to deal with the business of evaluating ourselves. Are EERs really worth it?

Context

To analyze the value of EERs, I first attempted to review the existing literature on the topic. I asked people in Human Resources if they knew of any systematic review of the EER system. They did not. I asked the American Foreign Service Association if they knew of any study or article that discussed the merits of the EER system. They did not. In fact, no one could point me in the direction of any sort of systematic appraisal of the employee evaluation process. “Could it be that no one had ever systematically evaluated a tool the Department uses so widely?” I asked myself. As a psychologist, the most common question one gets when presenting research findings is: does your measure have demonstrated statistical validity and reliability? I began to wonder: does the EER have demonstrated statistical validity and reliability?

It was not until I started mentioning my idea to conduct a survey of employees’ experiences with the EER that anything resembling preexisting literature on the subject surfaced. A retired Foreign Service Officer with 26 years of service, Eli Lauderdale, told me about a book called “The Foreign Affairs Fudge Factory”, written by John Franklin Campbell in 1971. As the title indicates, the book’s take on the Department’s bureaucratic culture of the early seventies is critical. The introduction to the book starts with a quote from Joseph Kraft: “The fact is that the Department has not been run primarily as a decision-making instrument. It has been run as a fudge factory. The aim has been to make everybody happy, to conciliate interests, to avoid giving offense and rocking the boat.” This quotation certainly resonates, yet I was inclined to dismiss a book written a full 35 years ago. It could not possibly apply to today’s employee evaluation system, I thought.

Then, in the Campbell book, I came across another opinion from the period, expressed clearly by the renowned diplomat George Kennan: “Let me control personnel and I will ultimately control policy. For the part of the machine that recruits and hires and fires and promotes people can soon control the entire shape of the institution, and of our foreign policy.”

I also heard from Doug Ellice, another retired Foreign Service Officer, with 27 years of service under his belt. He sent me a copy of a letter he had written a few years earlier to Director General Ambassador Davis. Mr. Ellice’s commentary about EERs in the letter was, to say the least, scathing. He calls the EER “worthless”, stating that there are “no quantifiable performance measures to be weighed”. He goes on to state that the Rating Officer’s role is “totalitarian”. Ellice maintains in the letter that the Foreign Service is “poisoned by the need officers feel to please their Rating and Reviewing Officers”.

With these anecdotes, but no real research on the topic, I decided to conduct a survey of employees’ experiences with the EER. This would be a first step toward identifying how the EER works or does not work as a tool for developing a workforce that could advance Transformational Diplomacy. A survey is distinct from a poll in that the researcher aims to identify not only people’s basic opinions, but also the relationships between various factors that underlie them, especially by using statistical analysis. The goal of the project was not simply to collect a smattering of complaints, but rather to systematically characterize the collective experience our people have of the EER. The idea was to hold the proverbial mirror to the EER, to evaluate it for what it is worth as a tool (special thanks to Brian Majewski and Isiah Parnell for their encouragement).

Basic Questions & Hypotheses

I wanted to know what factors go into getting a good EER, getting tenured, and getting promoted. I openly surmised that one’s experience with the EER is based on more than just his or her actual work performance in a vacuum: it is also shaped by the approaches of the Rater and Reviewer, the background of the employee, the circumstances of the EER process, etcetera. In short, I hypothesized the EER to be an imperfect tool, one that could be improved for the betterment of the Foreign Service and hence for the advancement of Transformational Diplomacy.

Research Method

Obtaining participants for the on-line “EER Experiences Survey” was challenging. Denied institutional help, I had to distribute the survey almost entirely through private email and website channels, with the exception of limited email forwarding that individual employees undertook through Department email to their own colleagues (special thanks to John Dinkelman for providing website contact names). Fortunately, when employees did come across the survey, the response rate was truly outstanding – employees clearly had something to get off their proverbial chests. As frustrating as it was to have official channels (HR/PE, CDA, AFSA) decline to participate in a survey designed to make the Foreign Service better, I respect their traditionalist reaction and, above all, the fact that they did not ask me to stop the survey as a private-channel endeavor.

The survey was entirely web-based, utilizing a survey tool called “Zoomerang”, found at www.zoomerang.com (please contact me through email if you would like a copy). It consisted of 80 questions, most of which were multiple-choice. It asked about basic demographic information, such as age, sex, and education. It also asked about what sort of EER the respondent got last, for example whether he or she was satisfied with the evaluation and if not, why not. Indeed, by the time the respondent was finished with all 80 questions, he or she had been asked about a full range of items hypothetically related to the EER process and its outcomes. The amount of data generated by the survey was remarkable and cannot easily be summarized here. Key findings will be discussed and additional ones will be reserved for future papers or for responses to specific inquiries. The survey was admittedly designed primarily for Entry Level Officers (ELOs). It was developed using feedback from a focus group consisting of several ELOs from the Consular Section at US Embassy Mexico City. The survey was later expanded so that Specialists and Generalists of all ranks could answer it if they came across it. Though it clearly does not apply as well to these other employees, the goal was to make the survey as inclusive as possible. Some questions are more useful than others in comparing experiences across employment type and pay grade. The survey asked respondents about their most recent EER exclusively. The point of limiting the survey to the most recent EER was to hold constant as best as possible the extraneous effects of memory revision and disproportionate complaining. If each respondent answered the questions with his/her most recent EER in mind, we would have a clearer snapshot of the overall experiences of the respondents as a group, rather than a biased snapshot of whichever good or bad experiences over the years the respondents chose to voice.

Participants

Participation in the survey was remarkably high, and respondents were drawn to a remarkable degree from hires of the Diplomatic Readiness Initiative (DRI). In a month and a half, 644 Foreign Service employees had completed the survey. We cannot know how many employees had access to the survey, since it was passed around electronically. However, the response rate can be considered exceptionally high, given both the relatively small size of the Foreign Service and especially the number of respondents by orientation class (A-100), as a percentage, which was roughly 25%. That is, at least among ELOs, roughly 1 in 4 answered the survey, per incoming class. After cleaning the dataset of incomplete cases, there were 446 Generalists and 189 Specialists whose data could be used in analyses. Because of the relatively low response rate among the Specialists and their inherent variability in job type, I chose to focus on the Generalists for now.

Among the 446 Generalists who answered the survey completely, roughly 90% (389) were entry-level or lower echelon Officers with less than nine years in the Foreign Service. The remaining 10% were too varied in length of service, so I separated them out and focused on the remaining 389 Generalists. The vast majority (71%) of this group was hired during the three-year (2002-2004) DRI, i.e. 278 out of 389. One can appreciate the successful response rate of the survey when considering that the Diplomatic Readiness Initiative was to hire 623 Generalists across 2002-2004 (according to GAO reports) and the survey had 278 DRI hires. That is a sizeable 45% response rate within the DRI cohort.

Most of the Generalists in the subset of 389 came from the 99th A-100 through the 120th. Most were not tenured at the time of the survey, but 150 out of 384 of them (39%) were tenured (5 people did not disclose). The Generalists ranged in age from 24 to 63, with an average of 34 and a mode (most common value) of 31. There were 188 female respondents and 199 male, a roughly even split. The 5 job “cones” were relatively evenly represented in the dataset, in terms of the Generalists’ chosen career tracks. However, the largest share of the Generalists (48%) in the dataset reported their most recent EER was received while doing a Consular job, 14% a Management job, 14% a Political job, 10% a Public Diplomacy job, and only 8% an Economic job. All the following results refer to the subset of the 389 newer Generalists unless otherwise specified.

Results

Most Satisfied with Own EER and Much is Good about the Current System.

In many ways, much is good about the current EER system. Most Generalists surveyed were very satisfied with their own EERs. A full 86% of the Generalists were either somewhat satisfied (29%) or very satisfied (57%) with the final outcomes of their own EER Ratings. A full 82% of these Generalists were either somewhat satisfied (25%) or very satisfied (57%) with the final outcomes of their own EER Reviews. You might say these satisfaction levels are relatively high, given the fact that most of the Generalists received these EERs for doing jobs they by and large did not join the Foreign Service to do. Indeed, according to this group’s data, it is not until one’s fifth year in the Foreign Service that the odds of getting a non-Consular job become greater than “50-50”. Among those in the FS up to two years, 60% had their last EER in a Consular job; up to three years, 54%; up to five years, 52%. At the fifth year mark, 34% of the Generalists in the dataset had a Consular job, the largest of any cone group.

Satisfaction with Raters ran relatively high among the 389 Generalists. Roughly 79% of the Generalists liked their Raters as bosses at least “somewhat” (28%) and most (51%) liked them “very much”. Only 21% reported liking their Raters as bosses “very little”. As much as 92% liked their Raters as people at least “somewhat” (26%) and most (66%) liked them “very much”. Only 8% reported liking their bosses as people “very little”. The figures were very similar for Reviewers.

We can also say that the current EER system is a decent means of getting positive feedback from one’s supervisors and of having something for which to strive. According to the 389 Generalists, most supervisors raise the EER with their subordinates as either a procedural matter or a positive opportunity to earn a reward for a good performance. Very few supervisors raise the EER threateningly with their subordinates. In short, the EER probably has some solid merits as a good incentive to perform well.

It may further speak well of the current EER system that satisfaction with it would not seem to be largely affected by many things that should arguably not affect it. In my analyses, no significant differences were found in levels of satisfaction with the EER system on the basis of the employee’s: age, sex, cone, years of pre-FS work experience, or educational level.

Most Dissatisfied with the Current System and Much is Bad about the System.

Here is the catch. According to this dataset, there is an enormous amount wrong with the current EER system. For starters, most of the Generalists surveyed are quite dissatisfied with the EER system, despite being quite satisfied with their own Ratings and Reviews. A whopping 71% of the 389 Generalists were either neutral about or dissatisfied with the current EER system. The bulk of the Generalists (46%) were either somewhat dissatisfied (27%) or very dissatisfied (18%) with the current EER system. Only 29% of the Generalists were satisfied with the current EER system (24% were somewhat satisfied and 5% were very satisfied). The big issue is that employees feel the EER system is not good for the Service, owing to its perceived inability to effectively evaluate and sort the good from the bad. Even a moment’s glance at the open-ended comments from the Generalists reveals deep criticism of the current EER system. One respondent called the system a “Kabuki dance”. Another called it a “game”. Still another called it a “joke” and “waste of time”. Only a small minority of comments expressed support for the current EER system, hedging their opinions with comments like, “it’s not perfect, but it’s the best we have”.

The single most common complaint appeared to be that “everyone is a water walker”. Many respondents indicated in one way or another that such a “water walking” system of over inflated praise is powerless to remove poor performers and does not discern between average and exceptional work. As one Officer summarized succinctly: “EERs don’t seem to be about management, growth, or performance – and they certainly aren’t reviews or evaluations. They are the way the Foreign Service exemplifies its ‘go along to get along’ personality”.

Another apparent problem is that the procedural regulations for completing EERs are very poorly followed. Raters and Reviewers are not seen as being proactive. Only about half (52%) of the Generalists reported their Raters/Reviewers to be proactive in getting their EERs completed. Only 27% reported that the counseling dates on their EERs were accurate and another 27% reported their counseling dates did “not at all” correspond to any actual counseling dates. Only 44% reported getting a written counseling statement, something that is in theory supposed to document good performance as much as bad. Lastly, one in four of the Generalists reported not getting a Work Requirements Statement on time.

Employees also took issue with having no real outlet to disagree with their Raters and Reviewers. Though most saw their own EERs as being quite good, many were disgruntled over the notion that the EER form and culture left no means of dissent. The personal statement could not be used effectively as a tool for dissent without fear of downgrading, no matter the true quality of the employee’s performance. Most employees reported writing rosy statements and entertaining no viable option for EER grievances.

Some employees were more dissatisfied with the current EER system than others, as a function of identifiable factors. Those who had a lower level of interest in the FS as a career had a higher level of dissatisfaction with the current EER system (mean of 1.4 versus 1.9, on a satisfaction scale of 0 to 4; p<.05). Both groups were relatively dissatisfied with the EER system, but the group with lower interest in the FS as a career was less satisfied with the EER system. Possibly these individuals are less interested in FS careers precisely because they are dissatisfied with the EER system.

In any event, employees higher on proactive-ness in completing their EERs were also less likely to be satisfied with the current EER system (mean of 1.59 versus 1.84, on a satisfaction scale of 0 to 4; p<.05). In both cases, employees were overall dissatisfied with the current EER system. It may be that proactive people are simply more irritated by the perceived sluggishness of the EER system.

Certainly the Generalists were much less satisfied with the EER system and their own EERs as well, if they had reported their counseling dates to be inaccurate and/or that they never received a written counseling session. While the Generalists were quite satisfied with their own EER Ratings and Reviews overall, the more they reported their counseling dates as inaccurate and/or their written counseling sessions as absent, the less likely they were to be as satisfied with their Ratings, Reviews, and the EER system (p<.01 in all cases).

These highly significant findings fly in the face of those who would argue that counseling dates and formats do not matter. It appears that they matter to the extent that employees probably do not feel they were treated as fairly or effectively when such structures are lacking. It is somewhat surprising that absence of a written counseling session did not correlate with greater satisfaction with one’s own EER Ratings and Reviews, given the presumption that written counseling sessions may typically be used punitively. Possibly this presumption is incorrect and/or possibly employees are more satisfied across the board when they are routinely informed of their performance, whether it is good or bad.

In analyses of EER system dissatisfaction in the larger dataset of 446 Officers, including more senior officers with as much as 33 years of experience in the FS, there were also noticeable patterns. Those who were relatively new to the FS (with less than nine years in) were less satisfied with the current EER system than their more experienced counterparts (with more than nine years in); p<.05. That said, neither group was very satisfied with the system. The mean level of satisfaction among the new Officers was 1.7 on a satisfaction scale of 0 to 4 (where 2 is neutral), i.e. the new Officers were mainly dissatisfied. The mean level among more experienced Officers was 2.12; i.e. they were neutral.

Interestingly, when comparing Specialists on EER system satisfaction, it is clear that they are the least satisfied. Analyzing the 175 usable surveys from the Specialists, the mean level of satisfaction for the newer employees was a largely dissatisfied 1.59 (on a scale of 0 to 4). In contrast to the pattern with the Officers, Specialists with more time in the FS were even less satisfied with the EER system compared with their newer counterparts. In fact, the least satisfied of all FS employees were the Specialists with greater than nine years of time in the FS (mean of 1.57).

None of these low levels of satisfaction with our EER system should come as any surprise. As in any survey about a controversial topic, the disaffected may be more likely to respond by virtue of the discontent they wish to vocalize. That said, the sample size in this study is substantial and the survey was worded in quite neutral terms, allowing for the widest range of expression. Most importantly, the bulk of the employees surveyed reported high levels of satisfaction with their own EER Ratings and Reviews – hence their discontent appears to rest mainly on the EER system, as opposed to any general demoralized state.

Many arbitrary factors affect EERs, besides performance.

What may be most disconcerting about the current EER system is that many seemingly arbitrary factors have significant effects on the outcome of one’s EER – things that are not directly related to performance. For one, in my research I found strong evidence that people cannot easily separate their fondness (or lack thereof) for other people as people, versus as fellow employees. If this is true, EERs may be more about winning friendly favor (with bosses exclusively) than performing well on the job per se.

Employees were asked to report how much they liked their Raters and Reviewers and how much they perceived their Raters and Reviewers liked them. Ratings were solicited in all cases in two different categories: “as a person” and “as a boss” (e.g. “how much did you like your Rater as a person? As a boss?”). Results showed that employees’ ratings of their bosses as people strongly correlated with their ratings of their bosses as bosses. The same was true for how employees perceived their bosses to perceive them as people and as subordinates. In short, there is strong evidence that liking and being liked as both a person and as a boss/subordinate are tightly linked. These findings were highly significant using a variety of statistical tests (independent samples t-tests: p<.001; paired samples correlations: p<.001).

To illustrate, let us look at the relationship between two variables: perception of being liked as a person by one’s Rater and perception of being liked as a subordinate by one’s Rater. I divided each of these variables into “high” and “low” groups, wherein the survey respondent was put in the “high” group if he/she reported a 2 on the likeability scale of 0 to 2 and in the “low” group if he/she reported a 0 or 1. Statistical results were highly significant: “low” group employees perceived that their Raters liked them as subordinates significantly less (mean of 1.26) than did “high” group employees (mean of 1.87), as a function of how little or much they perceived the Raters liked them as people.

The same result was found with the inverse model. That is, those who perceived their Raters liked them as subordinates to a “low” extent also perceived their Raters liked them as people to a lower extent (mean of 1.11), compared with their counterparts who perceived their Raters liked them as subordinates to a “high” extent (mean of 1.78).
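To make the kind of comparison described above concrete, here is a minimal sketch in Python of a “high/low” split followed by a group comparison. The column names and toy values are hypothetical stand-ins rather than the actual survey dataset, and an independent-samples t-test is used as one reasonable choice of test.

    # Minimal sketch of the "high/low" split and group comparison described above.
    # Column names and toy values are hypothetical, not the actual survey data.
    import pandas as pd
    from scipy import stats

    df = pd.DataFrame({
        # 0-2 scale: how liked the respondent felt by the Rater "as a person"
        "liked_as_person":      [2, 2, 1, 0, 2, 1, 2, 0, 1, 2],
        # 0-2 scale: how liked the respondent felt by the Rater "as a subordinate"
        "liked_as_subordinate": [2, 2, 1, 1, 2, 1, 2, 0, 1, 2],
    })

    # "High" group = a 2 on the 0-to-2 scale; "low" group = a 0 or 1.
    high = df.loc[df["liked_as_person"] == 2, "liked_as_subordinate"]
    low = df.loc[df["liked_as_person"] < 2, "liked_as_subordinate"]

    # Compare mean "liked as subordinate" scores across the two groups.
    t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)
    print(f"high mean={high.mean():.2f}, low mean={low.mean():.2f}, p={p_value:.3f}")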

Though we cannot say for sure statistically whether there is a causal relationship between liking and being liked as a person and liking and being liked as a boss or subordinate, we can say that probabilistically speaking, one’s likelihood of getting a good EER decreases rapidly if one perceives he/she is not liked on a personal level. Conversely, one’s likelihood of getting a good EER increases rapidly if one perceives he/she is liked on a personal level. In this case, “good” is defined by the level of satisfaction one has with one’s EER. Perceiving oneself as liked by one’s bosses as a person also contributed to greater satisfaction with the EER system as a whole. This can be reasonably said because in all statistical analyses of EER satisfaction in the present study, satisfaction increased as a function of increased perception of being liked as a person. In all cases these findings were highly significant at p<.001. We would not expect to find these statistical relationships if how one was perceived as a person bore little relationship to how one was perceived as a subordinate in the job or a supervisor in the job.

In short, it does not appear from these findings that people can easily separate their feelings toward one another as people and as bosses or subordinates. This may reflect the age-old wisdom of throwing cocktail parties for your bosses and subordinates. It would be fascinating to compare the EER and career successes of employees on the basis of how much they invested over the years in ingratiating themselves socially with their bosses and subordinates. The effect may be more powerful than we can imagine. The mechanism is two-fold: (1) people rate people as more competent if they like them on a personal level and (2) people are disinclined to rate people as incompetent if they like them on a personal level.

One of the more controversial elements of the EER is the Area for Improvement box. I asked employees to tell me what was written in their boxes and how they chose to respond. Most reported that their boxes mentioned Managerial skills (135, or 35%), then Substantive Knowledge skills (107, or 28%), then Communication/Foreign Language skills (90, or 23%), then Interpersonal skills (49, or 13%), then Leadership skills (41, or 11%), and finally Intellectual skills (38, or 10%). These percentages do not total 100 because some respondents had more than one core competency group cited in their Area for Improvement boxes.

Interestingly, respondents varied greatly in the extent to which they thought their Areas for Improvement were accurate reflections of their performance. Taken as a whole it appears that over a third of the Generalists (35%) did not think that their Areas for Improvement were very germane to their actual performance. This would seem to imply there is a relatively large gap between Raters’ and their subordinates’ perceptions of the subordinates’ Areas for Improvement.

Yet what employees had in their Area for Improvement boxes on their last EERs appeared to be related to tenure. The tenured subset of 150 Generalists from the 389 with fewer than nine years in the FS was examined for relationships between Area for Improvement boxes and tenure. The comments in the improvement boxes were coded by “core competency”: Leadership skills, Managerial skills, Interpersonal skills, Communication/Foreign Language skills, Intellectual skills, and Substantive Knowledge. Not one tenured Officer in the dataset of 150 had Leadership skills cited, and not one had Intellectual skills cited, in his/her last EER Area for Improvement. Thirty-eight percent (57) had Managerial skills cited, 29% (44) Substantive Knowledge, 19% (29) Communication/Foreign Language skills, and 11% (17) Interpersonal skills. The numbers do not total neatly because some employees reported multiple Areas for Improvement.

I wanted to see if one’s odds of being tenured on the first review changed as a function of the core competency cited in one’s Area for Improvement box, so I calculated the average number of tenure reviews officers had prior to getting tenure in the subset of 150 tenured officers. There were too few cases to perform legitimate statistical analysis; however, trends emerged. Officers with Managerial skills cited for improvement were reviewed on average the most: 1.44 times; then those with Communication/Foreign Language skills, 1.31; then those with Substantive Knowledge, 1.25; and finally those with Interpersonal skills, 1.00. This may not be a strong finding, but it does run counter to the popular expectation that having Interpersonal skills cited in your Area for Improvement is the “kiss of death”. These data do not support that.

How did the 389 Generalists choose to respond to their Area for Improvement boxes? Few chose to explicitly disagree with their Rater’s assessments: less than 2%. The bulk (40%) of the Generalists surveyed reported that they chose not to respond to the Area for Improvement at all in their EER statement, preferring to ignore it or to wait until the next EER to address it. Another 34% chose to agree with the comments in their Areas for Improvement and to grant the items as “something to work on”. The remaining 25% or so chose to interpret their Area for Improvement positively, with a “spin” or reframe of the item.

These different employee responses to the Area for Improvement appear to have different consequences. We can see evidence of different tenure rates for different responses to the Area for Improvement box by comparing the average tenure review numbers. When asked about the Area for Improvement, those who reported they “interpreted it positively, with a ‘spin’ or reframing of it” were tenured in the lowest average number of reviews (1.24), then those who “agreed with it explicitly, granting it as something to work on” (1.33), then those who “did not address it, preferring to ignore it or wait until next EER” (1.42), and finally those who “disagreed with it explicitly, offering a counterargument” (1.50). It is likely that the more one overtly disagrees with one’s Area for Improvement, the less likely one is well received by tenure panels (and probably promotion panels too).
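As an illustration of how the tenure-review comparison above might be computed, here is a minimal Python sketch. The column names, category labels, and records are hypothetical; the real analysis would run on the survey’s tenured-officer subset.

    # Sketch: average number of tenure reviews by how the employee responded to
    # the Area for Improvement. Labels and records are hypothetical examples.
    import pandas as pd

    tenured = pd.DataFrame({
        "afi_response":   ["reframed", "agreed", "ignored", "reframed", "disagreed",
                           "agreed", "ignored", "reframed", "agreed", "ignored"],
        "tenure_reviews": [1, 1, 2, 1, 2, 1, 1, 2, 2, 1],
    })

    # Mean reviews-before-tenure (and group size) per response style.
    summary = tenured.groupby("afi_response")["tenure_reviews"].agg(["mean", "count"])
    print(summary.sort_values("mean"))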

The extent to which the 389 Generalists “haggled” or negotiated with their Raters/Reviewers was examined, along with the number of rewrite requests the Generalists made of their Raters/Reviewers. Hypothetically one could get a better Rating/Review if one endeavored to elicit changes for the better from his/her Rater/Reviewer. I did not find evidence of that, however. The results I found in these analyses indicate that dissatisfaction with EER Ratings, Reviews, and the EER system goes up as the amount of haggling/negotiating goes up and as the number of rewrite requests goes up. Most likely, haggling/negotiating and rewrite requests rise as a function of dissatisfaction with one’s EER Ratings and Reviews, and this in turn decreases one’s satisfaction with the EER system. For the record, most employees did not haggle/negotiate very much, nor did they request many rewrites. From the larger dataset, some 70% reported not haggling/negotiating at all and the average number of rewrite requests was only one.

It is tempting to conclude from the above that employees do not influence very much the outcomes of their Ratings and Reviews through haggling/negotiating or through multiple rewrite requests. However, it is clear from comparing EER complaints before and after requests for changes that employees have significantly fewer complaints about their EERs in the end – an estimated 30% fewer.

The survey respondents collectively reported 492 complaints of various kinds about their own EER Ratings/Reviews before requesting changes. After requesting changes, the number of total complaints reported dropped to 349, a difference of 143 or roughly 30%. In sum, it is safe to say that requesting changes of your Rater/Reviewer can dramatically reduce complaints you have about your EER, and in turn probably influence your own competitiveness vis-à-vis your peers.

Among seemingly arbitrary factors affecting one’s experience of the EER, I found that sex/gender of the employee comes into play. Though males and females did not differ significantly in their levels of satisfaction with the EER system, they did differ in their levels of satisfaction with their own EER Ratings and Reviews. In a sample of 377 subjects (12 subjects of 389 did not report their sex/gender), there were 188 females and 199 males. The mean satisfaction level for EER Ratings among males was 3.42; for females it was 3.22 (on a scale of 0 to 4). The mean satisfaction level for EER Reviews among males was 3.43; for females it was 3.18. In both cases the figures are significantly different at p<.05. In explaining this sex/gender difference, we might consider general differences in male/female willingness to express satisfaction with one’s performance (or to hide dissatisfaction). Alternatively, perhaps there is a differential in how males are rewarded compared to females, in terms of either what Raters/Reviewers value and/or what they perceive as their employees’ strengths.

Sex/gender differences in EER satisfaction might not be worth speculating about were it not for tenure review patterns. Analyzing the subset of 150 tenured officers from within the 389 Generalists with less than nine years in the FS, I found that females were reviewed for tenure an average of 1.40 times before getting tenured. In contrast males were reviewed an average of 1.28 times before getting tenured. It would seem that males are slightly more likely than females to get tenured on the first review. However, it is difficult to tell whether this was a chance finding given that the sample size of tenured officers was too small to legitimately run inferential statistics on sex/gender and average number of tenure reviews.

One major complaint employees had about the EER system is that their evaluations are subject entirely to the approaches of the individuals who serve as their Raters and Reviewers. I have already noted that most employees surveyed saw their Raters/Reviewers as rather low on proactive-ness when it came to completing the EER process. To investigate the effects of differing levels of perceived proactive-ness among Raters/Reviewers, I divided the data into high and low categories and compared them. In the case of both Raters and Reviewers, there were highly significant effects on employee satisfaction with one’s own EER Rating, EER Review, and the EER system (p<.01). That is, Generalists viewing their Raters and Reviewers as low in proactive-ness were less satisfied all around. This finding would appear to run counter to the notion many Raters/Reviewers have that “so long as it gets done, that is all that matters.” There are clearly deleterious effects to procrastinating on EERs, some of which employees themselves may not even be aware of.

The impact of nurturant leadership on employees is also an important factor in levels of satisfaction with EERs and the EER system. I looked at the extent to which employees perceived their Raters as mentors or coaches “proactively nurturing professional development throughout the rating period” (on a scale of 0 to 4, where 0 is “not at all” and 4 is “all the time”). In all cases, if employees viewed their Raters as low in this nurturing factor, they also reported lower levels of satisfaction with their EER Ratings, Reviews, and with the EER system as a whole; p<.001. That is, those reporting low-nurturing Raters had a mean satisfaction level of 3.14 for their Ratings, compared with 3.68 for those reporting high-nurturing Raters; they had a mean satisfaction level of 3.12 for their Reviews, compared with 3.66 for those reporting high-nurturing Raters; and they had a mean satisfaction level of 1.42 for the EER system, compared with 2.23 for those reporting high-nurturing Raters. In sum, it is very clear that the level of nurturance of the Rater has major effects on how satisfied the employee is with his or her EER and with the EER system in general.

Another common complaint employees had about the EER system in their open-ended responses to the survey was that the quality of EERs is largely driven by the writing styles (or lack thereof) of the Raters and Reviewers writing them. As one way of exploring this hypothesis, I compared outcomes between groups of employees whose Raters and Reviewers had written their EERs in “list” versus “story” format. It has been said that the story format is more powerful in that it provides a chronological narrative of events, as opposed to a mere list or inventory of accomplishments. The story format was much less common in the sample than the list format. Among the subset of 150 tenured officers, taken from the larger group of 389 who had served fewer than nine years in the FS, there were 130 who had list format Ratings and 17 who had story format. Statistical conclusions are difficult with too few data points; however, setting those prerequisites aside, the difference in mean tenure rates between the two groups was significant (p<.05). Those tenured officers who had the story format Rating had an average number of tenure reviews of 1.12, compared with their list format counterparts who were tenured on average in 1.38 reviews.

A similar pattern was found for Reviews. One hundred three officers had list format Reviews and 47 had story format Reviews. Officers with the story format Review had a slightly faster tenure rate: 1.32 versus 1.35 reviews on average. Taken as a whole, it seems that the story format is more advantageous to the employee being rated. Possibly it conveys a more compelling endorsement. The fact that it is less common than the list format might also have implications. Perhaps Raters/Reviewers are less inclined to take the time to write the presumably more labor-intensive story format unless they already have a compelling endorsement in mind for the employee. In any case, it seems clear that the writing style of the Rater/Reviewer does matter and does have important implications for the employee being rated. If the list/story format distinction has implications for the employee’s success, then presumably more subtle forces such as sentence construction, vocabulary, and grammar do as well. The concern for the employee is that these forces are considerably beyond his/her control.

Not surprisingly, the most common complaint that employees had about the EER system is that their EERs were “not well-written in style/form” (32%). The second most common complaint was that the EER was “not done as quickly as it could/should have been” (28%). Only 2% complained there was “too much criticism.”

Surprisingly, 250 people out of 389 (64%) wrote lengthy comments in the open-ended box for this item. There were many different kinds of complaints, most about style and form. Some wrote that their EERs were done too hastily, after much procrastination on the part of the Rater or Reviewer. Interestingly, many respondents complained that their Rater or Reviewer did not actually write the Rating or Review; rather, the employee him- or herself wrote it. A few employees reported that their EERs had Reviewers from the Civil Service or did not have Reviewers at all.

There were a number of variables that I looked at that did not generate any significant findings, and that is perhaps itself significant. For instance, I could not find any evidence that the number of times one took the Foreign Service Written Exam and/or the Oral Assessment made any difference in one’s likelihood of getting tenured on the first review.

Some other factors that showed no relationship to the rate at which one gains tenure were: marital/relationship status, having dependent children or not, doing hardship posts (with the exception of having served in Iraq or Afghanistan, which made tenure on the first review very likely), number of months spent in language training (with the exception of those who had only one EER – no one in the sample got tenured with only one EER), having the box “Candidate is recommended for tenure…” checked, level of promotion aspiration, employee’s pay grade, Rater’s pay grade, and Reviewer’s pay grade.

None of the above factors were statistically related to tenure rate. I do suspect some of them would be related to promotion rate. I undertook to compare promotion rates (pay grade divided by time in the FS, controlling for various factors like starting pay grade), however it was not possible given the lack of survey participation on the part of mid-level and senior-level employees. Perhaps a future study could look at promotion rates.
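For future work along these lines, one possible operationalization of the promotion-rate idea is sketched below in Python. It assumes, purely for illustration, that Generalist grades count down toward FS-01, so advancement is the starting grade minus the current grade; the column names and records are hypothetical.

    # Sketch of a crude "promotion rate": grades advanced per year in the FS,
    # relative to starting grade. Assumes numerically lower grades are more senior
    # (FS-06 ... FS-01). Column names and records are hypothetical.
    import pandas as pd

    df = pd.DataFrame({
        "starting_grade": [4, 4, 5, 6],   # e.g. entered at FS-04
        "current_grade":  [3, 2, 4, 4],   # e.g. currently at FS-03
        "years_in_fs":    [3.0, 6.0, 4.0, 5.0],
    })

    # Grades advanced per year of service.
    df["promotion_rate"] = (df["starting_grade"] - df["current_grade"]) / df["years_in_fs"]
    print(df)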

Because there was concern from employees taking lengthy language courses that they might not be tenured as fast, I took an additional look at the effects of language on tenure rate, with cross-tabulations of the 150 tenured officers. Firstly, not one tenured Officer was still on language probation. Secondly, calculations of means revealed a pattern wherein Officers off probation in more difficult languages were reviewed on average fewer times before being tenured. That is, Officers off probation in “Superhard” languages were reviewed on average 1.25 times, those off probation in “Hard” languages were reviewed on average 1.32 times, and those off probation in “World” languages were reviewed on average 1.34 times. Hence, provided one has at least two EERs, tenure appears to come somewhat more easily the harder the language one speaks.

Conclusions

From one psychologist to another: learning is the detection and correction of error. In researching this paper, I ultimately came across the work of another psychologist, Chris Argyris, who wrote about the Department of State over 30 years ago.

Now a semi-retired Harvard University professor, Argyris was hired by the Department as a consultant in the late 1960s to analyze the Department’s bureaucratic culture. I emailed with Argyris, as a fellow psychologist, and was fascinated to read the papers he sent me. He works for an organizational consulting firm now and uses the Department in his examples of problem cultures! What Argyris pointed out about the Department over three decades ago was that Department leadership “espouses learning yet acts in ways that inhibit learning”. He defined learning as “the detection and correction of error” and said it was “key to effective organizational change and development”. Argyris pointed out that the Department also has a problem of culture in that it “rewards spinning and cover-up”, something that also precludes effective learning and development. Even though I have been in the Foreign Service for less than four years, I was amazed at the poignancy of Argyris’ description of Foreign Service culture, derived from his work back in the 1960s: “As a result of a powerful feedback loop, a process within the Foreign Service culture tends to reinforce the participants to minimize interpersonal threat by minimizing risk-taking, being open and being forthright, as well as minimizing their feelings of responsibility and their willingness to confront conflict openly. This, in turn, tends to reinforce those who have decided to withdraw, play it safe, not make waves, and to do both in their behavior and writing. Under these conditions people soon learn the survival quotient of ‘checking with everyone,’ of developing policies that upset no one, of establishing policies in such a way that the superior takes responsibility for them. It also coerces ‘layering,’ because (1) subordinates staff to be ready for a crisis, (2) more people are needed to make a decision, and (3) protection of one’s bureaucratic skin becomes critical for survival.”

So what does Argyris propose we do then? Argyris cites the now-historical issue of the “slam dunk” case for weapons of mass destruction in Iraq as an error of organizational culture that we can only move beyond by making the enabling of learning a key objective. He does not specify how that should take place, but maintains that “productive reasoning” must be encouraged such that claims in an organization can be tested for validity. Science knows this. Information Technology also knows this. Real progress does not happen without actual, empirical testing of truth claims. Reading Argyris’ work, I came to believe that reforming our EER system is one such way to better enable learning and, in turn, Transformational Diplomacy.

The EER is probably not transformative; much could be improved.

Though the current EER system has much going for it that is positive, I am hard pressed to find much about it that lends itself to transformational diplomacy. The evidence from my research strongly confirms the hypothesis that one’s experience with the EER is based on much more than just his or her actual work performance in a vacuum. Rather, it is based on the approaches of the Rater and Reviewer, the background of the employee, the circumstances of the EER process, and more. Certainly getting an EER one is satisfied with, and being satisfied with the EER system, depend on a variety of seemingly arbitrary factors. Collectively this evidence suggests that the EER is an imperfect tool that could be significantly improved for the betterment of the Foreign Service and hence for the advancement of Transformational Diplomacy.

The current EER system has some basic flaws. One key theme throughout the research findings is that the EER is in large part an exercise in “water walking”, or as it has been referred to before, “apple polishing”. That is, EERs suffer from over-inflated praise and the complicities in it, which may have the effect of undermining true organizational learning. Though there is a critical Area for Improvement box that strives to convey real information, it may primarily be filled with soft items that either do not convey real needs or that serve only to covertly assassinate the employee’s character. Another issue with the Area for Improvement is that it is generated from only one perspective, the Rater’s/Reviewer’s. Some survey participants noted that the Area for Improvement is also not supposed to repeat across EERs, which is itself questionable, given that employees can have enduring issues that take time to correct. The tenure check box section also does not appear to convey much useful information, since it almost invariably recommends the employee for tenure. Besides, nearly all employees get tenure sooner rather than later, regardless of the tenure check box. Finally, the most profound evidence that the current EER system is flawed comes from the many employees who are happy with their own glowing EERs but disenchanted with the system for the same reason: that everyone gets glowing EERs.

I have several concrete recommendations for improving the EER system, some of which come directly from doing this research, others of which come from basic psychometrics. Consider briefly the allegory of the blind men and the elephant, taken from Eastern lore. A group of blind men are asked to each feel part of the same elephant and then to characterize it. Naturally they each describe vastly different things: one describes an elongated trunk-like part, another a floppy ear-like part, another a flat and furry hide, etc. Each is convinced he clearly knows the true qualities of an elephant. The irony is that none of them knows the full reality.

In this regard, evaluating our employees using exclusively qualitative, one-dimensional procedures is fundamentally flawed. You might say that the Rater-Reviewer system provides at least two dimensions, even if no quantitative measure. Yet the Reviewer is merely another top-down perspective, and one that has a power differential over the Rater at that. It is hardly an added, independent dimension. We need an EER system that has both qualitative and quantitative components, as well as multi-dimensional perspectives on the employee. Many support “360-degree” type evaluations (and the Department has, to its credit, begun implementing some such reforms). Given that an employee cannot please all the people all the time in an organization, especially when some of their interests run counter to one another, a system that takes into account pleasing the bosses while also pleasing coworkers and subordinates would be closer to optimal. One would think that Raters/Reviewers would want to know these other dimensions in their subordinates as well – otherwise they could be erroneously recommending for tenure and promotion subordinates who are dysfunctional in these other, arguably just as important, dimensions.

In my survey, I asked employees some questions about the so-called “360-degree”, multi-dimensional evaluation concept and its components. A whopping 92% of the Generalists reported that they would like the Department, at least a “little bit” (11%), to consider changing the EER system to incorporate 360-degree evaluations. The bulk of the respondents (36%) “absolutely” wanted the Department to consider 360-degree evaluations, 25% “very much”, and 20% “a moderate amount”. Only 8% reported that they do “not at all” want the Department to consider incorporating 360-degree evaluations.

The Generalists also had solid ideas about what components of 360-degree evaluations they would endorse. A remarkable 76% wanted to have evaluations of supervisors by subordinates. That was the only solid agreement among the Generalists on 360-degree components. Fifty percent wanted to have evaluations of Americans by FSNs and/or LES employees. Forty-five percent wanted to have evaluations of same-level peers by same-level peers. Only 9% reported they wanted no additional types of evaluations within the 360-degree concept. It seems clear that when the Generalists report being in favor of 360-degree evaluations, they are mainly advocating the bottom-up component wherein subordinates evaluate their supervisors.

Modern psychology has long held that qualitative components, even multi-dimensional ones, are not sufficient on their own – they need to be accompanied by quantitative components to effectively evaluate human behavior. Qualitative and quantitative components of evaluation each contribute critical pieces of information that should be viewed jointly. In the context of EERs, complaints of too much emphasis on qualitative components abound, in the form of objections to the over-importance of writing skills in the EER process. One might therefore think Generalists would be supportive of quantitative measures of performance evaluation. Such measures would clearly not involve writing skills.

However, when the Generalists in the survey were asked the extent to which they would like the Department to consider quantitative measures of performance, the single largest group was “not at all” in favor (45%). Still, a majority of respondents (55%) were at least “a little bit” (13%) in favor of adding some sort of quantitative measure of performance to the EER (“a moderate amount”, 20%; “very much”, 12%; “absolutely”, 10%).

The Generalists were asked about several basic types of quantitative measures to gauge their views on them. They were quite divided in their responses. A majority of 58% wanted some type of quantitative measure added, but the remaining 42% did not want any quantitative measure added. Twenty-nine percent supported scaled “grades” for employees along each of the six core competencies (e.g., John Doe gets a 4.0 in the Leadership skills competency, a 3.0 in the Management skills competency, etcetera). Twenty-seven percent supported percentile rankings (e.g., Suzie Smith is in the top 20% of subordinates I have supervised). Fourteen percent supported “within-the-person” rankings (e.g., Fred Johnson is best in Leadership skills, then Intellectual skills, then Substantive Knowledge, etcetera). Lastly, in an open-ended question about quantitative measures, some Generalists expressed concern that quantitative measures were “too subjective”. Still others expressed that they would like to have some sort of scale quantifying employees vis-à-vis their peers, to avoid the so-called “Lake Wobegon effect”, wherein most people claim to be above average, despite the fact that most people cannot be above average, by definition.

In my own interpretation of the Generalists’ less than full support for quantitative measures, I imagine that they may not all have known about the great benefit of quantitative measures when they are used within context. Some stated in the open-ended comment boxes that they feared quantitative measures because they suspected they would suffer from “grade inflation” just as the qualitative narratives have. Let me just point out for now that there are ways to mitigate that, including collecting data on the average quantitative evaluations given by raters, in order to provide context.
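One minimal sketch of that mitigation, assuming hypothetical rater and score data rather than anything the Department actually collects, is to express each quantitative rating relative to the average that the particular rater gives:

    # Sketch: provide context for quantitative ratings by comparing each score
    # with the average score its Rater gives. All names and values are hypothetical.
    import pandas as pd

    ratings = pd.DataFrame({
        "rater_id":    ["A", "A", "A", "B", "B", "B"],
        "employee_id": ["e1", "e2", "e3", "e4", "e5", "e6"],
        "score":       [4.0, 3.8, 4.0, 3.0, 4.0, 2.5],   # 0-to-4 scale
    })

    # A uniformly generous rater's 4.0 counts for less than a tough rater's 4.0.
    rater_average = ratings.groupby("rater_id")["score"].transform("mean")
    ratings["score_vs_rater_average"] = ratings["score"] - rater_average
    print(ratings)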

We should consider a wide range of possibilities for implementing changes to the current EER system. We certainly do not want to make it any longer or more complicated than it currently is. There should be ample means of both improving the EER system and streamlining it. We should consider reducing the amount of extraneous work that goes into EERs, while making the reduced efforts count more fully. Some means of doing this include not only adding additional dimensions and quantitative input, but also reducing the overall depth of these measures, shifting instead to more frequent, perhaps quarterly, evaluations of a smaller yet wider scale.

One way to both enhance and streamline the EER system could be to implement a new computer program to be used by randomly selected members of a 360-degree rating panel, whose members would privately (and possibly anonymously) enter both qualitative and quantitative information into secured, on-line employee profiles, in a systematic fashion orchestrated by Human Resources sections. Such a computer program would be easy to design and could advance our outdated, analog system significantly. It could also easily track contextual factors of evaluations, like the given evaluators’ average ratings.

Regardless of whether a new system should be more computerized, additional rating components from other angles can be added, while tracking the average ratings of those doing the rating in order to provide context. We might consider invoking the well-established core precepts as a foundation for this, to highlight employee strengths within the individual. Prompting questions could be taken directly from the six core precepts to stimulate quantitative, evaluative responses from 360-degree panel members in order to arrive at scaled ratings for employees, within the core precepts and overall.

We might even systematically derive Areas for Improvement from the output the new scales could generate. The supervisor’s task of crafting the perfect Area for Improvement for the subordinate would then not be so complicated. It would be a matter of saying, “I see that you got your lowest 360-degree score in X core precept; here is what I propose you do to raise that score.” Employees could then of course have the same, system-generated Areas for Improvement across EERs, as they worked to address pervasive problems and hence to better themselves.
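A toy sketch of that idea, using made-up precept scores (the labels follow the six core competencies discussed earlier, but the numbers and the averaging scheme are hypothetical):

    # Sketch: derive a suggested Area for Improvement from an employee's averaged
    # 360-degree scores per core precept. Scores here are hypothetical.
    scores_360 = {
        "Leadership skills": 3.4,
        "Managerial skills": 2.7,
        "Interpersonal skills": 3.6,
        "Communication/Foreign Language skills": 3.1,
        "Intellectual skills": 3.8,
        "Substantive Knowledge": 3.3,
    }

    # The lowest-scored precept becomes the suggested Area for Improvement.
    area_for_improvement = min(scores_360, key=scores_360.get)
    print(f"Suggested Area for Improvement: {area_for_improvement} "
          f"(360-degree average {scores_360[area_for_improvement]:.1f})")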

If the above initiatives proved too risky for the Department to adopt as part of its formal personnel evaluation system, we might consider phasing them in gradually, as part of an informal process that would in time come to be normative and ultimately be institutionalized as a formal evaluation process. The point is that the proposals can be adopted incrementally to manage risks.

We have seen other institutions such as the military and private sector companies updating their performance appraisal systems for the better. Indeed many of my survey respondents pointed out that they had experienced much better systems of employee evaluation in their previous careers, including in the military. To the extent that Secretary Rice has advised that State will be cooperating much more with the military under Transformational Diplomacy, one could argue that our EER system should be at least as good as theirs. In doing this research I did review a number of military evaluation forms. I noted that they were in general more conducive to multi-dimensional and quantitative assessment than State’s forms. I wondered if we might learn something from both the military and the private sector.

To further develop the performance appraisal system at State, some lines of future research could also be pursued. We might wish to check promotion rates across a wide range of employee grades, making use of data on when individuals entered State and what ranks they have attained in how much time and why. We would need greater participation from senior level employees, but if we gained it we could learn much about what we are reinforcing and developing in our human resources as an institution. The corollary to the present study would of course be a study on Raters and Reviewers and their experiences evaluating employees. We might at some point wish to collect data on perceptions of and from FSNs and peer-level coworkers too. Certainly “what we measure, we improve in” and hence we should not expect to easily improve in important factors without first measuring them.

Ultimately, deciding what, if anything, we should do to improve the EER system depends on the answer to the question: what is our goal? Do we really want to transform our workforce to carry out the work of Transformational Diplomacy? Or do we want to continue shaping and reinforcing a “go along to get along” workforce? I take Secretary Rice at face value when she says: “We must transform old diplomatic institutions to serve new diplomatic purposes”, and I submit to you that transforming an old EER system is a key component of reaching that goal.

About the Author

Don Kilburg has been an FSO since 2003. He served in Mexico City and is moving onward to Santo Domingo with his wife Keely. He holds a doctorate in Experimental Social Psychology from DePaul University and a bachelor’s degree in Research Psychology from the University of Illinois. Before coming into the Foreign Service, he was a professor at Eastern Washington University and more recently at Saint Olaf College.

To contact the author: [email protected]
