Standards and Tests in the military domain: The arbitrary, the absolute, and the achievable Glenn Fulcher University of Leicester BILC Conference 2014


Page 1:

Standards and Tests in the military domain: The arbitrary, the absolute, and the achievable

Glenn Fulcher, University of Leicester, BILC Conference 2014

Page 2:

1. Standards

2. Criterion-referenced assessment in the military domain

3. Test design processes for domain specific inferencing and decision making

Page 3:

“‘performance’ – if ‘standard’ refers to a performance criterion, e.g. the standard of being able to fly an airplane or negotiate a business transaction, or ‘standard’ refers to a particular ability level or levels, or ‘standard’ refers to a cutscore or cutscores on a distribution” (Davidson et al., 1995, p. 15)

The Arbitrary

Performance Level Descriptors

Page 4:

Meaning 2

Policy Oriented

User Oriented

Rater Oriented

Designer Oriented


Page 5:

Performance Level Descriptors & the Critical Descriptor

ICAO level 4 (Operational) descriptor for “interaction”: “Responses are usually immediate, appropriate, and informative. Initiates and maintains exchanges even when dealing with an unexpected turn of events. Deals adequately with apparent misunderstandings by checking, confirming, or clarifying” (ICAO, 2004, p. A-8).

Page 6:

“And when they are fifty, those who have come through all our practical and intellectual tests with distinction must be brought to their final trial, and made to lift their mind’s eye to look at the source of all light, and see the good itself, which they can take as a pattern for ordering their own life as well as that of society and the individual.”

(Plato, 380 BCE, p. 354)

Page 7:

“‘performance’ – if ‘standard’ refers to a performance criterion, e.g. the standard of being able to fly an airplane or negotiate a business transaction, or ‘standard’ refers to a particular ability level or levels, or ‘standard’ refers to a cutscore or cutscores on a distribution” (Davidson et al., 1995, p. 15)

The Absolute

Page 8:

“A machinist can be categorized as an apprentice, a journeyman, or a master at his trade. The specific behaviors implied by each of these levels of proficiency can be identified and used to describe the specific tasks an individual must be capable of performing before he achieves one of these skill levels. It is in this sense that measures of proficiency can be criterion-referenced …. Measures which assess performance in terms of a criterion standard thus provide information as to the degree of competence attained which is independent of the performance of others” (Glaser & Klaus, 1962, p. 422)

Page 9:

Portraying the Arbitrary as Absolute

“The standard against which an individual’s performance is compared, when measuring in this manner, is the behaviors which define each point along the underlying skill continuum.”

Page 10:

Under this interpretation, the whole of the validity question becomes one of linking or mapping a test to the external standard put in place by a policy-making authority.

(Fulcher, forthcoming)

Page 11:

Mistaken Ideas

“…when we speak of ‘setting performance standards’ we are…referring to the…concrete activity of deriving cut points along a score scale” (Cizek and Bunch, 2007, p. 14).

Climbing the ladder

Social Moderation

Becoming an “adept”

Page 12:

“‘performance’ – if ‘standard’ refers to a performance criterion, e.g. the standard of being able to fly an airplane or negotiate a business transaction, or ‘standard’ refers to a particular ability level or levels, or ‘standard’ refers to a cutscore or cutscores on a distribution” (Davidson et al., 1995, p. 15)

Tannenbaum, R. J. and Baron, P. A. (2010). Mapping TOEIC Test Scores to the STANAG 6001 Language Proficiency Levels. Research Monograph 10-11. Princeton NJ: Educational Testing Service.

Page 13:

Witchcraft

Stakeholder

“Panelists thought that the TOEIC assessment lacked face validity for military personnel. Even though they knew that the assessment was a measure of general English language skills, they were concerned that there was no military context represented” (Tannenbaum and Baron, 2010, p. 18).

Page 14:

“To my knowledge, every attempt to derive a criterion score is either blatantly arbitrary or derives from a set of arbitrary premises.”

Glass (1977/2011, p. 254)

Page 15:

ILR – STANAG 6001 Relationship

Capsule Characterizations

“Socialization” is a “desideratum”

Experientially based

Require benchmark samples and extensive training

Constant institutional contact required

“The ILR approach has permitted successful use of the WENS (well educated native speaker) concept as the ultimate criterion in government for over thirty years” (Lowe, 1986, p. 394).

See Item 1 on Handout

Page 16:

“What I shall call criterion-referenced measures depend upon an absolute standard of quality….”

Glaser (1963: 519)

Page 17:

Models, Frameworks and Test Specifications (Fulcher and Davidson, 2009, p. 127).

The Achievable

Page 18:

“The adequacy of a proficiency test depends upon the extent to which it satisfactorily samples the universe of behaviors which constitute criterion performance. In this sense, a test instrument is said to have content validity; the greater the degree to which the test requires performance representative of the defined universe, the greater is its content validity”

(Glaser & Klaus, 1962: 435; emphasis in the original)

Page 19:

“Men can be tested for English-speaking ability and rated on a scale of A, B, C, D, E. In language the rating E means inability to obey the very simplest commands unless they are repeated and accompanied by gestures, or to answer the simplest questions about name, work, and home unless the questions are repeated and varied. Rating D means an ability to obey very simple commands (e.g., “Sit down,” “Put your hat on the table”), or to reply to very simple questions without the aid of gesture or the need of repetition. Rating C is the level required for simple explanation of drill; rating B is the level of understanding of most of the phrases in the Infantry Drill Regulations; rating A is a very superior level. Men rating D or E in language ability should be classified as non-English”

(Yerkes, 1921, p. 357). See Item 2 on Handout

Page 20:

“For wise and effective industrial placement and occupational guidance, two things at least are absolutely essential: first, definite knowledge of the physical and mental requirements (specification) of the job, and second, equally definite knowledge of the physical and mental characteristics and capacities of the individual to be placed”

(Yoakum and Yerkes, 1920, p. 200)

Page 21:

(Bingham, 1919, p. 12)

See Item 3 on Handout

Page 22:
Page 23:
Page 24:
Page 25:

Abstraction and the Correlational Fallacy

“The nature of the individual test items should be such as to provide specific, recognisable evidence of the examinee’s readiness to perform in a life-situation, where lack of ability to understand and speak extemporaneously might be a serious handicap to safety and comfort, or to the effective execution of military responsibilities”

(Kaulfers, 1944: 137)

Page 26:

The Theory of Criterion Referenced Measurement

“The theory requires that the test environment and circumstances approximate those of the work situation, which, for our students, may be a technical school, a maintenance hangar, an aircraft at 40,000 feet, sometimes even somewhere ten fathoms deep. Those circumstances are pretty hard to duplicate, but it may be possible to set up situations in which the student must understand and respond in English under distractions and psychological pressure” (Cartier, 1968: 28)

Page 27:

Glaser, R. et al. (1954). A Comparative Analysis of Missileman Tasks for Five Guided Missiles: Methodology and Results. Bureau of Naval Personnel Technical Bulletin 55-15. Pittsburgh: American Institute for Research.

Page 28:

(Glaser and Klaus, 1962)

Page 29:

“An absolute interpretation attempts to describe, as clearly as possible, just what it is that the student can or cannot do….The inference being made is an absolute one in the sense that if the assessment domain is well described, you can simply make an inference identifying the proportion of the domain that the student has mastered….Poorly described assessment domains lead to shoddy criterion-referenced interpretations”

(Popham, 2000, pp. 32 – 34)

Page 30:

Framework Document

Models of Competence

Domain Analyses

Test Specification

Scoring Model

Task Model

Delivery Model

Output Model

Presentation Model

Assembly Model

Reporting Standards

Decisions

Policy Mandate

Page 31:

http://languagetesting.info/whatis/scenarios/military.php