
Running head: IOWA GAMBLING TASK MODIFIED FOR MILITARY DOMAIN

Iowa Gambling Task Modified for Military Domain

Peter Nesbitt

TRADOC Analysis Center, Monterey

Quinn Kennedy, Jonathan K. Alt, & Ron Fricker

Operations Research Department, Naval Postgraduate School

Title page with All Author Information

Response to Reviewer Comments

Dear Dr. Estrada,

We appreciate the thoughtful comments made by the reviewers. Below and in the revised manuscript, we address each comment. We note that on the statistics-related comments, we sought guidance from one of our department's statisticians, Dr. Ron Fricker. Because of his contributions in addressing these comments (which included re-analyzing the data to double check that it was reasonable to use cluster analysis), he has been added to the manuscript as a co-author. Finally, due to Major Nesbitt's constraints in his current job, I will be the corresponding author.

Sincerely,
Quinn Kennedy

ACTION EDITOR COMMENTS TO AUTHOR(S)

Recommendation: Revise and Resubmit for Further Review

I have read your paper carefully and considered the individual comments from the expert reviews. The topic under investigation is interesting and potentially relevant for publication in Military Psychology. However, there are several concerns that need to be addressed before the manuscript can be considered for publication. Fortunately, these concerns seem amenable to revision. Accordingly, I am recommending revise and resubmit. Below I have highlighted the major concerns raised through the review process:

1) On p. 8, I didn't understand r*-sub-i,t. Is it equal to "Max." in Table 1? Or is it the expected value of the payoff schedule?

We don't use the expected regret in this case since the pay-out schedules are fixed. Instead, we calculated it based on the best choice (in terms of pay-out) available to each subject at each trial (the first term) and the choice (in terms of their realized pay-out) made by the subject at that trial (the second term). The calculation has no relation to Table 1. On page 10, we have clarified the notation as follows: Given K ≥ 2 routes and sequences X_{i,1}, X_{i,2}, ..., X_{i,n} of unknown outcomes associated with each route i = 1, ..., K, at each trial, t = 1, ..., n, participants select a route I and receive the associated outcome X_{I,t}. Let X*_t be the best outcome available to the participant on trial t (Auer and Ortner, 2010),

\[
X^{*}_{t} = \max\left(X_{1,t},\, X_{2,t},\, X_{3,t},\, X_{4,t}\right).
\]

The regret after n plays is

\[
r_{n} = \sum_{t=1}^{n} X^{*}_{t} - \sum_{t=1}^{n} X_{I,t}.
\]

So, for instance, if the best outcome across all routes for a given trial is 5, but the participant chooses a route and receives a 3, his regret for that trial would be 2.

2) Reviewer 2 requests several methodological revisions such as providing effect sizes when possible and using trials 101-200 to index performance.

Effect sizes have been included when possible in the Results section. The main research question that we begin to address in this study is: how do military personnel make the transition from naive decision making, when they know nothing about the task, to experienced decision making, when they feel that they have figured the task out? This point has been clarified in the first paragraph of the Introduction on page 3. Therefore, the inclusion of the early trials in the data analysis is key to addressing the overarching research question. On page 14 of the Discussion, we now point out that high performers were more likely to make this transition, and to make it earlier, than low performers.

3) Reviewer 1 wonders about the generalizability of the reinforcement task to the military domain. Can you describe a situation where a military officer needs to learn from repeated experience?

Military officers learn extensively from repeated experience. As we describe in the paper, they are trained on templates for decision making, but they then must adjust that template based on experience gained through repeated trials. To keep the example in the context of the convoy task, imagine an Army or Marine Corps convoy leader who must decide which route to take each day for a year-long deployment. In this example, the same decision point or trial would be encountered each day with some variability over time, such as time of day, illumination, or weather. Over time the officer would be required to learn which route was more desirable (less risky, most efficient). This situation is now described on page 3. Of course, while this is a land combat-based example, there are a multitude of other situations in which military officers must learn from repeated experience. For example, pilots require reinforcement learning to conduct many critical tasks, from take-off and landing experiences (consider a Naval pilot who regularly has to conduct a landing on an aircraft carrier) to various in-flight tasks. Similarly, consider a Navy surface warfare officer who has to repeatedly conduct underway replenishments. The list is essentially endless.


REVIEWER 1 COMMENTS TO AUTHOR(S)

Recommendation: Revise and Resubmit for Further Review

Summary: This study took a task -- the Iowa Gambling Task -- and modified it so it has face validity with the military. A research question focuses on reinforcement learning. 200 trials were provided to military officers. The authors argue for the development of a new metric, one called regret. They further argue that clustering can be utilized to define high performing groups.

Conceptual & Theoretical Issues. This is an interesting application of taking a task that is used in one domain and transporting it to the military context. I like the fact that military officers were used as the research participants.

1. I wonder, however, about the extent to which reinforcement learning plays a role in the military domain. I imagine it does at some level, but when does an officer have 200 trials for any decision? I am not saying they don't, but you could make it clearer to the reader.

Military officers learn extensively from repeated experience. As we describe in the paper, they are trained on templates for decision making, but they then must adjust that template based on experience gained through repeated trials. To keep the example in the context of the convoy task, imagine an Army or Marine Corps convoy leader who must decide which route to take each day for a year-long deployment. In this example, the same decision point or trial would be encountered each day with some variability over time, such as time of day, illumination, or weather. Over time the officer would be required to learn which route was more desirable (less risky, most efficient). This information has been added on page 3. Of course, while this is a land combat-based example, there are a multitude of other situations in which military officers must learn from repeated experience. For example, pilots require reinforcement learning to conduct many critical tasks, from take-off and landing experiences (consider a Naval pilot who regularly has to conduct a landing on an aircraft carrier) to various in-flight tasks. Similarly, consider a Navy surface warfare officer who has to repeatedly conduct underway replenishments. The list is essentially endless.

2. You utilize the convoy task--are you arguing that every time these folks drive down a road in-country that it is considered a trial? In the real world, would each bend in the road constitute a new trial? I am not trying to be argumentative, I just want to understand the problem.


The image depicts a bend in the road, but is really intended to apply context to the problem of choosing one of four routes. To be clear, each bend in the road to a convoy leader in theater is a new trial: the decision is whether to proceed around the bend or not. In point of fact, these leaders make decisions such as this numerous times on a single convoy trip, and poor decision making can result in death or serious injury. The ability to learn from feedback from the environment and to associate that feedback with actions at each decision situation (reinforcement learning) is essential to military decision making. This information has been added on page 7.

Methodological & Statistical Issues. The sample was well constructed, covering the branches of the services although with different coverage -- a solid sample nonetheless. The convoy task as depicted in the paper appears realistic. The subjects were well assessed in terms of background variables using the digit span, visual acuity, Trails A & B, and demographics.

3. Did you measure if any of the research participants had advanced course training, e.g., war college or the like?

We did not request any information regarding advanced course training. The reviewer raises a good question and we will include that information in our next study.

4. Your cluster analysis resulted in two groups. Just a caution about dichotomization -- see a paper by MacCallum: On the practice of dichotomization of quantitative variables. MacCallum, Robert C.; Zhang, Shaobo; Preacher, Kristopher J.; Rucker, Derek D. Psychological Methods, Vol 7(1), Mar 2002, 19-40. http://dx.doi.org/10.1037/1082-989X.7.1.19

Thank you for the pointer to the MacCallum et al. (2002) paper. In it, the authors make a number of strong assumptions in their comparisons, including perfect multivariate normality and linearity between the independent and dependent variables, as well as perfect measurement. None of these assumptions hold with our data, which is generally ordinal, not necessarily linear, and certainly measured with error. Nonetheless, we compared the results of our t-tests based on dichotomizing the sample with the regression-based methods that MacCallum et al. (2002) advocate, and the new analyses did not change our results or conclusions. For these comparisons, see Table 2.

5. I am curious as to why the standard deviations for the high performers are nearly double those of the low performers in many areas. See First 100 trials: "no trials with friendly damage", "adv selection bias"; Trials 101-200: "no trials with friendly damage". I would have thought higher performers would have had less variability in their scores.

During the first 100 trials, the high performers were more likely to explore the various routes than the low performers. During the second half of the task, high performers were more likely to settle upon Routes 3 and 4, which vary modestly in the amount of friendly damage incurred. In contrast, the low performers were more likely to stick with Route 2, and thus have less variability in number of trials with friendly damage, but a significantly higher number of trials with heavy friendly damage. This pattern is now noted on page 12.

6. Also, what do you think is going on with Route 2 for the low performers? The mean is so high at .407. I would think the expected value would be closer to .25. Is there some sort of bias for selecting Route 2?

Many of the low performers selected Route 2 frequently because it seems to have a good payout schedule -- the first 9 times that Route 2 is selected, it gives 100 enemy damage and 0 friendly damage. However, the 10th time that Route 2 is selected, the subject receives -1250 in friendly damage, such that the mean total damage over the long run is -25. Through trial and error, subjects should eventually realize that this route is not optimal over the long term. Several low performers continued to choose Route 2 despite the heavy friendly damage. The danger of Route 2 is now explained on page 7.

Discussion of Contributions, Limitations and Future Research.

7. The regret metric is interesting. As you define it, it "compares the outcome of participant actions to the outcome generated by playing the optimal policy at each of n trials". I believe a metric like this would be useful, but how often in the real world do we have something like this available? For example, if I am going down a road and receiving feedback from a regret metric, it would indicate that I have knowledge of, or at least feedback from, an optimal strategy.

In a 'real world' decision making situation this won't be known until after the fact, but the intent is to show the utility of this metric as a means of characterizing performance in wargames or training situations where the best path is known. Hence, it is not available to the decision maker for making decisions in real time (if it was, then the best decision/route would be known), but it is an effective metric for evaluating performance after the fact (again, in training situations where the best path is known and can be used in the subsequent analysis).

8. I am thinking there might be two ways to define optimal; one where you know an "optimal route," perhaps one the action plan defines? Otherwise, where does this route come from? The other possibility is one where simply nothing bad happens, e.g., nothing blows up, no ambushes, etc. Is this correct?

Yes, here, optimal route means the one in which nothing bad happens to the subject's forces, but some damage occurs to the enemy. We have clarified the definition of optimal route on page 10.


REVIEWER 2 COMMENTS TO AUTHOR(S)

Recommendation: Conditional Acceptance with Minor Revisions

Summary: The authors examined military decision-making ability using a paradigm based on the Iowa Gambling Task. The task demonstrated that it could distinguish between good and poor performers. Potential applications for personnel selection and training were noted.

Conceptual & Theoretical Issues. The conceptual rationale for the modified Iowa Gambling Task is good.

Methodological & Statistical Issues. In addition to reporting statistical significance, the authors should report effect sizes for the various analyses. Also, it was not clear why the authors chose 2-tailed tests. A directional hypothesis seems more appropriate.

Effect sizes are now provided in the Results section where appropriate. Given that this was our first time testing the convoy task, we used two-tailed tests because we would want to know if, on average, subjects performed significantly more poorly than expected. In this case, it would suggest that the convoy task needed tweaking. We have added this rationale in the paper on the bottom of page 5 and as a footnote on page 11.

Discussion of Contributions, Limitations and Future Research. The paper was well organized and well written.

Other Editorial Concerns.

1. P. 5: Some readers will not be familiar with the Trails A and B or the Digit Span Forwards and Digit Span Backwards tasks. Provide more detail, such as whether these were included in a larger test battery and the reliability of the scores. The reliability should be reported for the current sample on page 7, where the tests are described.

On page 6, we now include a sentence clarifying that only these two sets of cognitive tests were used. The test-retest reliability of Trails A and B ranges from .76 to .94 (Wagner, Helmreich, Dahmen, Lieb, & Tadic, 2011). In the current sample, performance on Trails A and B was strongly positively correlated, as expected (r = .506, p = .003). This information is provided on page 8. Test-retest reliability of the digit span measures ranges from .66 to .89 (Lezak, 1995). In the current sample, performance on digit span forwards and backwards was positively correlated, as expected (r = .350, p = .042). This information is now provided on page 9.

2. P. 6: The authors noted that the number of trials was increased from 100 to 200 because it takes participants a while to detect the pattern of payoffs. Were all trials included in the final score or just the later trials (after learning has occurred)? The write-up suggests that all 200 trials were used to compute total performance. Please clarify.

We have clarified on page 8 that all 200 trials were used to compute total performance.

3. P. 7: Did participants need to achieve a specified level of performance on the Sullivan Eye Chart? Were any participants removed from the study due to poor visual acuity?

Subjects needed to achieve 20/30 on the Snellen eye chart. All subjects met that criterion, and that information is now mentioned on page 9.

4. P. 9: Why was a two-tailed hypothesis tested? Didn't the researchers have a clear idea of the expected direction of the effect?

As stated above, given that this was the first test of a newly created task, we wanted to be conservative in terms of the direction of the effects. Although we hoped that subjects would perform well on the task, we were equally interested to know if subjects did significantly more poorly on the convoy task than expected because it would indicate that the task probably needed tweaking. This rationale is now at the end of the Introduction on page 5.

5. P. 9: As the authors noted, participant performance improved during trials 101-200 (learning effect). I suggest that the researchers test their hypotheses using only trial 101-200 data.

As noted on page 3 in the first paragraph of the Introduction, a key component of understanding military decision making is being able to determine when decision makers transition from naïve decision making, when they know nothing about a task, to experienced decision making, when they act upon the knowledge they've accrued through trial and error. Therefore, it is important to look at participants' decision behavior across all 200 trials to get a sense of when this transition occurred. On page 14 of the Discussion, we now point out that high performers were more likely to make this transition than low performers.

6. Pp. 9-10: In addition to tests of statistical significance and associated probability levels, it would be helpful to report effect sizes (e.g., Cohen's d) to provide readers with a better sense of the magnitude of the effects.

Effect sizes are now included in the Results section where appropriate.


Running head: IOWA GAMBLING TASK MODIFIED FOR MILITARY DOMAIN

Iowa Gambling Task Modified for Military Domain

Masked Manuscript without Author Information


Abstract

One key component of optimal military decision making is that the decision maker demonstrates reinforcement learning. The modification of psychological tasks gives insight into understanding how to effectively train military decision makers and how experienced decision makers arrive at optimal or near optimal decisions. We developed a task modeled after the Iowa Gambling Task (IGT) to measure military decision making performance. This new task focuses on high stakes and uncertain environments particular to military decision making conditions. Thirty-four US military officers from all branches of service completed the tasks, yielding decision data for validation. The new task retains essential characteristics of the foundational task and gives insight into reinforcement learning of military decision makers. Results indicate that the additional metric of regret defines higher performance at a trial-by-trial level, and clustering by multiple metrics defines high performance groups.

Keywords: Iowa Gambling Task, military decision making, wargaming


Iowa Gambling Task Modified for Military Domain

Recently, the U.S. Army has focused on enhancing leader development and decision making to improve the effectiveness of forces in combat (Lopez, 2011). This focus increases the importance of developing a set of tools and procedures that give insight into understanding how to effectively train decision makers and how experienced decision makers arrive at optimal or near optimal decisions. In order to develop these tools and procedures, it is important to understand how military decision makers make the transition from naïve decision making, when they know nothing about the task or environment, to experienced decision making, when they act upon the knowledge they have accrued through trial and error. The purpose of this study is to develop tasks that measure militarily relevant cognitive characteristics and assess active duty military officers' decision making behavior on these tasks, including when officers make the transition from naïve to experienced decision making.

One cognitive characteristic necessary for military personnel to reach optimal decision making is reinforcement learning: the ability to learn from trial and error (Vartanian & Mandel, 2011). Military officers learn extensively from repeated experience. They are trained on templates for decision making, but they adjust that template based on experience gained through repeated trials. For example, a convoy leader must make a decision on which route to take each day for a year-long deployment. The consequences of this decision can impact lives and resources. In this example, the same decision point or trial would be encountered each day with some variability over time, such as time of day, illumination, or weather. Over time and through experience, the officer should learn which route was more desirable (less risky, most efficient). Current laboratory tests for reinforcement learning may not fully capture the conditions under which military decisions are made, including high stakes and uncertain environments. Tasks that tap this cognitive characteristic and simulate realistic military scenarios, such as simple wargames, are needed. We propose that the modification of a well-understood psychological task to include military scenarios is a viable approach for developing tools and procedures that give insight into military decision making.

One common psychological task that measures reinforcement learning is the Iowa Gambling Task (IGT) (Bechara et al., 1994). The IGT was developed to detect decision-making deficits associated with prefrontal damage (Bechara et al., 1994). Persons with prefrontal damage tend to have difficulty detecting the long-term consequences of their decisions and actions. In the IGT, participants receive a loan of $2,000 of play money and are asked to make a series of decisions to maximize the profit on the loan. Each decision entails selecting one card at a time from any of four available decks of cards (decks A-D). All cards give money, and some cards also issue a penalty. Decks differ in the amount of money given on a single trial ($50 or $100) as well as the frequency and severity of penalties ($0 to $1250). Healthy participants should learn through reinforcement learning which decks have the best long-term payoffs (decks C and D) (Bechara et al., 1994; Steingroever et al., 2013). Established measures of decision performance are total money won and an advantageous selection bias (the proportion of good decks selected minus the proportion of bad decks selected).

We seek to statistically model participant decisions in a trial-by-trial manner. We gain insight by approaching our version of the IGT, the convoy task, as a multi-armed bandit problem. The goal in a multi-armed bandit problem is to maximize the total payoff obtained in a sequence of allocations (Lai and Robbins, 1985). The problem is often described as a sequential allocation, sequential sampling, or sequential decision making problem and was inspired by the problem of a gambler facing a collection of slot machines, each with a different and initially unknown probability of winning along with an equally unknown payout schedule (Auer et al., 2002; Szepesvari, 2010; Auer and Ortner, 2010). Our participants only receive information by sampling each option and collecting an observation. The IGT has a specific sequence of payout schedules, providing the means to know at any point in the sequence of trials which deck provides the best reward (although this is still unknown to the participants). It is therefore possible to calculate the difference between the reward achieved by a participant at a given trial and the best possible reward available at that trial. This difference is referred to as regret, an absolute performance metric often used for multi-armed bandit problems (Szepesvari, 2010; Sutton and Barto, 1998).
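As a concrete illustration of the bandit framing only (this is not the model or analysis used in the study), the sketch below shows a generic epsilon-greedy learner choosing among K options with initially unknown payouts; the payout schedules, epsilon value, and function name are hypothetical placeholders, written in Python since that is the language used for the task software.

```python
import random

def epsilon_greedy(payouts, n_trials=200, epsilon=0.1, seed=1):
    """Illustrative epsilon-greedy learner for a K-armed bandit.

    payouts: list of K lists; payouts[i][t] is the (hypothetical) outcome the
    learner would receive for choosing option i on trial t.
    """
    rng = random.Random(seed)
    k = len(payouts)
    counts = [0] * k      # how many times each option has been chosen
    means = [0.0] * k     # running mean outcome per option
    total = 0.0
    for t in range(n_trials):
        if t < k or rng.random() < epsilon:
            choice = rng.randrange(k)                        # explore
        else:
            choice = max(range(k), key=lambda i: means[i])   # exploit current best estimate
        outcome = payouts[choice][t]
        counts[choice] += 1
        means[choice] += (outcome - means[choice]) / counts[choice]  # incremental mean update
        total += outcome
    return total, means
```

The balance between the random (explore) branch and the greedy (exploit) branch is the same exploration-exploitation trade-off that the regret metric is later used to characterize in participants.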

Additionally, cluster analysis allows us to identify distinct groups using regret and other established performance measures. A previous study on the Iowa Gambling Task grouped its sample by the difference in visual processing speed performance as measured by Trails A and Trails B (Barry & Petry, 2008). Identifying high and low performing groups provides a better understanding of what characterizes successful decision making.

The IGT is ideally suited as a model for a simple wargame. The convoy task, in which participants inflict enemy damage or receive friendly damage, is analogous to the IGT: enemy damage corresponds to dollars gained, and friendly damage corresponds to dollars lost. The first goal of this study was to determine if the convoy task successfully elicits reinforcement learning. We predicted participants would demonstrate reinforcement learning by having a total damage score greater than 2000, a positive advantageous selection bias, and by correctly reporting which routes are the safest and most dangerous. However, because we were testing the convoy task for the first time, we were equally interested to know if participants performed significantly more poorly than expected. A secondary goal was to use regret to model and describe decision maker performance at the trial-by-trial level. Regret serves as a means to understand the deviation of each participant's series of decisions from the known optimal path.

Method

Participants

The study collected data from 34 military officers from all branches of service: nine U.S. Army, eleven U.S. Marine Corps, ten U.S. Navy, three U.S. Coast Guard, and one U.S. Air Force. Participant mean age was 35.1 years (s = 4.90) and mean time in service was 12.7 years (s = 4.42). The average time deployed was variable (M = 19.57 months, s = 12.12); one participant did not report their deployment time. Of the 31 participants with deployment experience, the mean time since their last deployment was 38.0 months (s = 25.18), and 19 of those deployments were to combat zones (Iraq or Afghanistan). Over seventy percent of the participants served as staff officers during their most recent deployment. The majority of participants were male (n = 30) and the majority possessed 20/20 or better visual acuity (n = 29). Participants scored within normal ranges on two sets of cognitive measures assessing visual processing speed (Trails A and B; Grant & Berg, 1948) and short term memory (Digit Span Forwards and Backwards; Wechsler, 2008). The Trails A mean score of 22.6 seconds (s = 6.29) and Trails B mean score of 44.0 seconds (s = 20.13) showed normal ranges of performance in visual processing speed. The Digit Span Forwards mean score of 11.44 (s = 2.11) and Digit Span Backwards mean score of 9.5 (s = 2.43) showed normal ranges of performance in working memory. The race and ethnicity of the participants were not noted. All participants had at least an undergraduate degree.


Decision Task

Convoy task (Modified IGT). On a computer screen, participants saw four identical routes, as shown in Figure 1. The image of each route depicts a bend in the road, but is really intended to apply context to the problem of choosing one of four routes. To be clear, each bend in the road to a convoy leader in theater is a new trial: the decision is whether to proceed around the bend or not. Convoy leaders make decisions such as this numerous times on a single convoy trip, and poor decision making can result in death or serious injury. The ability to learn from feedback from the environment and to associate that feedback with actions at each decision situation (reinforcement learning) is essential to military decision making. Participants were instructed that, over several trials, they must decide on which route to send convoys and that, based on each decision, they may inflict enemy damage (good outcome) or receive friendly damage (bad outcome). They were also instructed that the pictures of the routes are identical. Their goal was to achieve the highest possible total damage score by maximizing enemy damage and minimizing the friendly damage accrued over all trials.

The selection of routes as options is similar to the selection of decks in the original IGT. These routes have the same payout format as the decks of the original IGT (Bechara et al., 1994): routes 3 and 4 are considered good; routes 1 and 2 are considered bad. Route 2 is the most dangerous: it usually returns moderate enemy damage, but every 10th time Route 2 is selected, it returns heavy friendly damage of -1250, such that, on average, it returns -25 total damage. Descriptive statistics for each route are found in Table 1.
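To make the long-run value of Route 2 concrete, a quick arithmetic sketch based on the payout pattern described here and in the response to reviewers (nine selections returning +100 with no friendly damage, followed by a tenth returning 100 − 1250 = −1150, the minimum shown in Table 1) gives

\[
\frac{9(100) + (100 - 1250)}{10} = \frac{-250}{10} = -25,
\]

which matches the Route 2 mean reported in Table 1.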

Participants receive immediate feedback on each trial in the form of the current total damage score and the enemy damage and friendly damage that occurred on that trial. The Total Damage score is analogous to CashPile in the IGT, Friendly Damage to $Lost, and Enemy Damage to $Won. As in the IGT, participants start with 2000 units of Total Damage. One difference between the convoy task and the IGT is that we extended the number of trials from 100 (as in the IGT) to 200. In pilot testing of the convoy task, participants needed more than 100 trials to detect the long-term payout. Decision performance variables were measured using typical IGT variables for all 200 trials: Total Damage at trial 200 and advantageous selection bias (proportion of good routes selected minus the proportion of bad routes selected).
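To illustrate how these two performance variables follow from a participant's choices, here is a minimal sketch under the definitions above; the function and variable names are hypothetical and this is not the study's scoring code. Outcomes are the net damage values realized on each trial, and the selection bias follows the stated definition (proportion of good routes chosen minus proportion of bad routes chosen).

```python
GOOD_ROUTES = {3, 4}    # advantageous routes in the long run
BAD_ROUTES = {1, 2}     # disadvantageous routes in the long run
STARTING_DAMAGE = 2000  # starting Total Damage units, analogous to the IGT loan

def score_convoy_task(choices, outcomes):
    """choices: route number (1-4) selected on each trial.
    outcomes: net damage (enemy damage minus friendly damage) realized on each trial."""
    total_damage = STARTING_DAMAGE + sum(outcomes)
    n = len(choices)
    prop_good = sum(route in GOOD_ROUTES for route in choices) / n
    prop_bad = sum(route in BAD_ROUTES for route in choices) / n
    advantageous_selection_bias = prop_good - prop_bad
    return total_damage, advantageous_selection_bias
```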

Measures

Demographics survey. Demographic information regarding age, gender, service branch, rank, and deployment experience was captured in the demographic survey.

Post-task survey. The post-task survey included questions asking participants to rank the routes in the convoy task from safest to most dangerous and questions about their decision strategies.

Trails A, Trails B. Trails A and B test visual processing speed (Grant and Berg, 1948). In Trails A, the numbers 1 through 25 are randomly distributed on a worksheet. The participant starts at 1 and must draw a line to each number in ascending order. Participants are instructed to work as quickly and accurately as they can. In Trails B, participants see both numbers and letters and must connect 1 to A, A to 2, 2 to B, and so on until they reach L and then 12. They also are instructed to work as quickly and accurately as they can. Test-retest reliability on these measures ranges from .76 to .94 (Wagner, Helmreich, Dahmen, Lieb, & Tadic, 2011). In the current sample, performance on Trails A and B was moderately correlated, as expected (r = .506, p = .003). Trails A and B have age- and education-based norms; these norms were used in computing Trails A and B performance in the current sample (Tombaugh, 2004).


Digit Span Forwards and Backwards Tests. The digit span forwards and backwards tests from the Wechsler Adult Intelligence Scale (WAIS-IV) measure working memory and are scored based on age- and education-based norms (Wechsler, 2008). In digit span forwards, the experimenter states a series of digits, starting with two digits, and the participant must repeat them back. The number of digits increases, with two trials per number of digits. The test is discontinued if the participant gives an incorrect response on both trials for a particular number of digits. In digit span backwards, the same procedure is followed, except this time the participant must repeat the digits in reverse order. The maximum score is 16 for forwards and 16 for backwards; the maximum was reached by different participants on each. Test-retest reliability of the digit span measures ranges from .66 to .89 (Lezak, 1995). In the current sample, performance on digit span forwards and backwards was positively correlated, as expected (r = .350, p = .042).

Visual acuity test. Because the decision tasks are visually based, the Snellen eye chart is used to measure visual acuity at the beginning of the experiment. The Snellen eye chart is placed on the wall and consists of 11 lines of block letters, in which each line of letters gets progressively smaller. Participants stand 20 feet from the chart, cover one eye, and read aloud as many lines as they can. They then cover the other eye and read aloud as many lines as they can. The experimenter records the last line that the participant could accurately read for each eye. Participants needed to have at least 20/30 vision to participate in the study.

Environment/Equipment

A purpose-built synthetic environment was developed for the study. The participant sat at a standard desk and completed the tasks as if they were informing, yet removed from, tactical operations in a military operations center. The tasks were developed in consultation with military advisors. The tasks are written in the Python scripting language and presented on a laptop computer running the Windows 7 operating system.

Statistical Modeling Techniques

Regret. In the convoy task, regret is the difference between a participant's single trial outcome and the outcome from the ideal decision given perfect knowledge. In this context, the ideal decision or best possible outcome is one that leads to no damage to friendly forces and some damage to enemy forces. Less regret is better; on any given trial, regret can be zero if the participant selects the best decision. More generally, absolute regret compares the outcome of participant actions to the outcome generated by playing the optimal policy at each of the n trials. Given K ≥ 2 routes and sequences X_{i,1}, X_{i,2}, ..., X_{i,n} of unknown outcomes associated with each route i = 1, ..., K, at each trial, t = 1, ..., n, participants select a route I and receive the associated outcome X_{I,t}. Let X*_t be the best outcome available to the participant on trial t (Auer and Ortner, 2010),

\[
X^{*}_{t} = \max\left(X_{1,t},\, X_{2,t},\, X_{3,t},\, X_{4,t}\right).
\]

The regret after n plays is

\[
r_{n} = \sum_{t=1}^{n} X^{*}_{t} - \sum_{t=1}^{n} X_{I,t}.
\]

So, for instance, if the best outcome across all routes for a given trial is 5, but the participant chooses a route and receives a 3, his regret for that trial would be 2. When used as a performance metric for the convoy task, regret provides insight into aggregate performance over the course of a set of n trials (i.e., total regret). When examined on a per trial basis, regret provides insight into how subjects' performance changes based on feedback from the environment. Thus, as a participant identifies routes with better outcomes, the regret accumulated per trial should decrease. Regret per trial, in particular, can provide a measure of how well a participant is balancing exploration and exploitation, and is a measure of the participant's ability to identify the best route available at a given point in time.
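A minimal sketch of how regret could be computed from the definition above follows, assuming the full payout schedules for all four routes are known to the analyst (as they are in the convoy task); the data structures and names here are hypothetical, not the study's code.

```python
def compute_regret(schedules, choices):
    """schedules: dict mapping route number to its list of per-trial outcomes X_{i,t}
    (what the route would have returned had it been selected on trial t).
    choices: list of route numbers actually selected, one per trial.
    Returns the per-trial regret and the total regret r_n."""
    per_trial = []
    for t, chosen in enumerate(choices):
        best = max(schedules[r][t] for r in schedules)   # X*_t, best outcome available on trial t
        realized = schedules[chosen][t]                  # X_{I,t}, outcome the participant received
        per_trial.append(best - realized)                # regret incurred on this trial
    return per_trial, sum(per_trial)
```

Following the worked example in the text, a trial on which the best available outcome is 5 and the realized outcome is 3 contributes 2 to the total.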

Cluster Analysis. Using Ward hierarchical clustering with Euclidean distance, we separated the population into two clusters (groups) of participants according to the multiple measures of performance. The placement of a participant in a group reflects an aggregate and relative assessment of his or her cognitive ability as a high or low performer.
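The following is a minimal sketch of the Ward/Euclidean clustering step using SciPy, cut at two clusters; the feature matrix is a hypothetical stand-in for the multiple performance measures actually used, and standardizing the columns first would be a reasonable additional step when the measures are on very different scales.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical per-participant performance measures, e.g.,
# [final Total Damage, advantageous selection bias]
X = np.array([
    [4400.0,  0.76],
    [4100.0,  0.60],
    [1500.0, -0.10],
    [1313.0, -0.29],
])

Z = linkage(X, method="ward", metric="euclidean")   # Ward linkage on Euclidean distances
groups = fcluster(Z, t=2, criterion="maxclust")     # cut the dendrogram into two groups
print(groups)  # e.g., [1 1 2 2]: cluster labels separating high from low performers
```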

Procedures

This study was approved by the institution's IRB. Participants attended the laboratory individually for a single testing session. They first completed the approved consent form, then the visual acuity test, demographic survey, Trails A and B, and Digit Span tests; finally, participants completed the convoy task, followed by the post-task survey questions.

Results

Although the mean Total Damage score was above 2000 and the advantageous selection bias was positive, these results were not significant (p's > .05).¹ As would be expected, the Total Damage score was strongly negatively correlated with the number of trials with heavy friendly damage (r = -.87, p < .001), correlated with the frequency of friendly damage (r = .39, p < .05), and very strongly positively associated with advantageous selection bias (r = .97, p < .001). As shown in Table 2, participants successfully distinguished between safe and dangerous routes (χ²(3) = 23.63, p = .005). In a question asking participants to rank order the routes from safest to most dangerous, 42% reported route 4 as the safest, followed by route 3 (27%), whereas 42% of participants reported route 1 as the most dangerous, followed by route 2 (33%).

Table 2 reveals that participants benefited from having 200 trials instead of 100. Results from paired t-tests indicated that the advantageous selection bias improved in trials 101-200 compared to trials 1-100 (t(33) = 2.87, p = .007, effect size = .447), with a trend for participants to learn to avoid heavy friendly damage in the second half of the wargame (t(33) = 1.85, p = .07, effect size = .307). Improvements in decision performance were due to a decrease in route 2 selection (t(33) = 2.70, p = .01, effect size = .425) and an increase in route 3 selection (t(33) = 1.87, p = .07, effect size = .310). Improvements in decision performance over time are captured in Table 2, which indicates that only after about trial 130 did participants' total damage, on average, exceed the baseline of 2000. Table 2 also illustrates the large range of variability in decision performance.

¹ All t-tests were conservatively conducted as two-tailed with a .05 alpha significance level.

Identifying High Performers

We used clustering to separate the participants into two groups by the classic performance measures of Final Damage and the advantageous selection bias. Clustering separates our sample cleanly into two groups: high and low performers (see Table 2). The high performing group comprised 12 participants with a Total Damage score significantly higher (M = 4400) than that of the low performing group (M = 1313; t = 9.438, df = 20, p < .0001, effect size = .904). Similarly, the high performing group's advantageous selection bias was significantly higher (M = 75.7) than the low performing group's (M = -29.0; t = 7.365, df = 18.1, p < .0001, effect size = .866). These group differences are due in part to the low performing group's strong preference for Route 2 throughout the task. In contrast, high performers tended to explore the different routes during the first 100 trials (as illustrated by the larger variability in some measures among the high performers compared to the low performers) and settle on Routes 3 and 4 during the second 100 trials. Mean regret per trial can indicate a single participant's, or a group of participants', ability to identify the best routes available during the task, giving insight into the balance of exploration and exploitation. Figure 2 illustrates mean regret per trial for each group. As the participants explore and develop an expected outcome for each route, they receive damage, shown as high increases of regret during the early trials. We see in Figure 2 that the two groups begin to diverge in their regret at approximately trial 120. By trial 200, there is no overlap between the two groups.

Discussion

Wargames are a preferred method of training military personnel to make optimal military decisions. However, wargames typically are not assessed objectively and may not focus on training the cognitive functions necessary for optimal decision making. The purpose of this study was to take the first steps toward bridging the gap between the study of decision making ability in the field of cognitive psychology and the study of decision making in a military setting. The use of well-known objective assessments to evaluate the effectiveness of training shows great potential. We demonstrate successful modification of the IGT into a military context. Results from the convoy task were consistent with other studies in which healthy adults completed the IGT (Steingroever et al., 2013). Although the total damage score and advantageous selection bias results were not significant, participants correctly reported which routes were safe and which were dangerous. Of note, participants benefited from the additional 100 trials beyond what is typically administered in the IGT: participants' advantageous selection bias significantly increased due to a shift in route selection patterns. Also consistent with previous studies (Steingroever et al., 2013), all decision measures showed large amounts of variability, suggesting that individual differences occur even among healthy participants.


Regret provides a measure of how well a participant is balancing exploration and exploitation, potentially while the participant is engaged in the task. In this study, the use of regret illustrated that high performers made the transition from exploration to exploitation of Routes 3 and 4 much earlier than low performers. Furthermore, this metric can be directly compared across participants. This immediately available and comparable metric may provide a means to develop tasks that provide performance feedback. Clustering allows the categorization of participants into performance groups using multiple metrics, perhaps including regret. Clustering may provide a means to develop task standards that clearly connect observed individual and collective performance to required standards in a cognitive characteristic. These statistical modeling techniques provide the potential to understand participant decision strategy (exploring versus exploiting) and performance during engagement with the task, using all available measures of performance.

This work suggests several future directions that can inform, and make more efficient, the training of optimal military decision making. These directions focus on explaining individual differences in decision performance and identifying the moment at which participants transition from exploration of the environment to exploitation of knowledge obtained about the environment. Future studies will examine military decision making performance in sequential decision making tasks with delayed rewards and more realistic military wargame scenarios.


References

Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2), 235-256.

Auer, P., & Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1), 55-65.

Barry, D., & Petry, N. M. (2008). Predictors of decision-making on the Iowa Gambling Task: Independent effects of lifetime history of substance use disorders and performance on the Trail Making Test. Brain and Cognition, 66(3), 243-252.

Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1), 7-15.

Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4-22.

Lezak, M. D. (1995). Neuropsychological Assessment (3rd ed.). New York: Oxford University Press.

Lopez, T. (2011). Odierno outlines priorities as Army Chief. Army News Service. Retrieved from http://www.defense.gov/News/NewsArticle.aspx?ID=65292

Steingroever, H., Wetzels, R., Horstmann, A., Neumann, J., & Wagenmakers, E. J. (2013). Performance of healthy participants on the Iowa Gambling Task. Psychological Assessment, 25(1), 180.

Sutton, R., & Barto, A. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Szepesvari, C. (2010). Algorithms for Reinforcement Learning. San Francisco: Morgan & Claypool Publishers.

Tombaugh, T. (2004). Trail Making Test A and B: Normative data stratified by age and education. Archives of Clinical Neuropsychology, 19(2), 203-214.

Wagner, S., Helmreich, I., Dahmen, N., Lieb, K., & Tadic, A. (2011). Reliability of three alternate forms of the trail making tests A and B. Archives of Clinical Neuropsychology, 26(4), 314-321.

Wechsler, D. (2008). Wechsler Adult Intelligence Scale Fourth Edition (WAIS-IV). San Antonio, TX: NCS Pearson.


Table 1

               Route 1   Route 2   Route 3   Route 4
Minimum           -250     -1150         0      -200
1st Quartile      -150       100         0        50
Median              25       100        25        50
Mean               -25       -25        25        25
3rd Quartile       100       100        50        50
Maximum            100       100        50        50

Note. Each route has a predetermined payout schedule. Descriptive statistics for each route give insight into the long-term value of each route. These routes have the same payout format as the decks of the original IGT (Bechara et al., 1994): routes 3 and 4 are considered good; routes 1 and 2 are considered bad.


Table 2

Performance Variable                          Group 1 M (s)        Group 2 M (s)

First 100 trials
Total damage score **                         2745.83 (779.41)     1713.64 (718.16)
No. trials with friendly damage               25.67 (8.98)         23.86 (4.71)
No. trials with heavy friendly damage *       2.92 (1.44)          4.00 (1.23)
Advantageous selection bias                   15.67 (36.98)        -17.00 (19.33)

Trials 101-200
Total damage score ***                        1654.17 (554.10)     -400.00 (711.64)
No. trials with friendly damage               29.17 (10.81)        25.27 (4.50)
No. trials with heavy friendly damage ***     1.42 (1.00)          3.95 (1.33)
Advantageous selection bias                   60.00 (27.20)        -12.00 (20.36)

All 200 trials
Total damage score ***                        4400.00 (955.37)     1313.64 (824.36)
No. trials with friendly damage               54.83 (14.99)        49.14 (7.89)
No. trials with heavy friendly damage ***     4.33 (1.83)          7.95 (1.99)
Advantageous selection bias ***               75.67 (42.87)        -29.00 (32.77)
Route 1 ***                                   0.07 (0.06)          0.17 (0.05)
Route 2 ***                                   0.24 (0.11)          0.41 (0.10)
Route 3 **                                    0.37 (0.19)          0.19 (0.07)
Route 4                                       0.32 (0.17)          0.24 (0.08)

Note: * p < 0.05, ** p < 0.01, *** p < 0.001. Summary statistics for the convoy task; Group 1 is the high performing cluster and Group 2 the low performing cluster. Participants benefited from having 200 trials instead of 100. The last four rows show the fraction of each group's selections of each route, with routes 3 and 4 as the true safe routes. P-values indicated by the stars show the results of between-group comparisons.


Figure 1. Screen shot of the convoy task in piloting, a typical participant's view of the task. We see that the participant's last decision caused 100 damage to the enemy (Damage to Enemy Forces) and a loss of -250 to friendly forces (Damage to Friendly Forces), resulting in a trial loss of -150 (not shown). The Accumulated Damage is 2750. A positive Accumulated Damage value is desirable to the participant. Notice that the four routes are represented by the same image even though the outcome of selecting each one is different (see Table 1).


Figure 2. Comparison of each group's regret per trial performance, where the groups were identified by clustering on the Final Damage and advantageous selection bias performance measures. Regret per trial is a measure of how well participants are balancing exploration and exploitation, and is a measure of a participant's ability to identify the best route available at a given point in time. Note how the higher performing group's regret per trial (solid green line) steadily drops after about 50 trials, whereas the lower performing group's (dashed red line) remains at approximately 100. The gray shading is a one standard deviation confidence bound, where the dark gray shows the overlap.