

55

Data Collection and Analysis

Tamara van Gog and Fred Paas*

Open University of the Netherlands, Heerlen, the Netherlands

Wilhelmina Savenye

Arizona State University-Tempe, Tempe, Arizona

Rhonda Robinson

Northern Illinois University, DeKalb, Illinois

Mary Niemczyk

Arizona State University-Polytechnic, Mesa, Arizona

Robert Atkinson

Arizona State University-Tempe, Tempe, Arizona

Tristan E. Johnson

Florida State University, Tallahassee, Florida

Debra L. O’Connor

Intelligent Decision Systems, Inc., Williamsburg, Virginia

Remy M. J. P. Rikers

Erasmus University Rotterdam, Rotterdam, the Netherlands

Paul Ayres

University of New South Wales, Sydney, Australia

Aaron R. Duley

National Aeronautics and Space Administration, Ames Research Center, Moffett Field, California

Paul Ward

Florida State University, Tallahassee, Florida

Peter A. Hancock

University of Central Florida, Orlando, Florida

* Tamara van Gog and Fred Paas were lead authors for this chapter and coordinated the various sections comprising this chapter.


CONTENTS

Introduction 766
    Assessment of Learning vs. Performance 766
    Brief Overview of the Chapter Sections 767
Assessment of Individual Learning Processes 767
    Rationale for Using Mixed Methods 768
    Analyzing Learning Using Quantitative Methods and Techniques 768
    Selecting Tests 768
        Validity 769
        Reliability 769
        Evaluating and Developing Tests and Test Items 770
        Scores on Numerically Based Rubrics and Checklists 770
        Measuring Learning Processes in Technology-Mediated Communications 770
        Using Technology-Based Course Statistics to Examine Learning Processes 771
        Measuring Attitudes Using Questionnaires That Use Likert-Type Items 771
    Analyzing Learning Using More Qualitative Methods and Techniques 771
        Grounded Theory 772
        Participant Observation 772
        Nonparticipant Observation 772
        Issues Related to Conducting Observations 773
        Interviews 773
        Document, Artifact, and Online Communications and Activities Analysis 774
        Methods for Analyzing Qualitative Data 774
    Writing the Research Report 775
    Conclusion 775
Assessment of Group Learning Processes 776
    Group Learning Processes Compared with Individual Learning Processes and Group Performance 776
    Methodological Framework: Direct and Indirect Process Measures 777
    Data Collection and Analysis Techniques 778
        Direct Process Data Collection and Analysis 778
        Use of Technology to Capture Group Process 779
        Use of Observations to Capture Group Process 779
        Direct Process Data Analysis 779
        Indirect Process Data Collection and Analysis 780
        Interviews 780
        Questionnaires 780
        Conceptual Methods 781
    General Considerations for Group Learning Process Assessment 782
        Group Setting 782
        Variance in Group Member Participation 782
        Overall Approach to Data Collection and Analysis 782
        Thresholds 782
    Conclusion 782
Assessment of Complex Performance 783
    Assessment Tasks 784
    Assessment Criteria and Standards 784
    Collecting Performance Data 785
        Collecting Performance Outcome (Product) Data 785
        Collecting Performance Process Data 785
    Data Analysis 788
        Analysis of Observation, Eye Movement, and Verbal Protocol Data 788
        Combining Methods and Measures 789
    Discussion 789
Setting Up a Laboratory for Measurement of Complex Performances 789
    Instrumentation and Common Configurations 790
    Design Patterns for Laboratory Instrumentation 790
        Stimulus Presentation and Control Model 791
        Stimulus Presentation and Control Model with External Hardware 793
        Common Paradigms and Configurations 795
        Summary of Design Configurations 796
    General-Purpose Hardware 797
        Data Acquisition Devices 797
    Computers as Instrumentation 789
    Discussion 800
Concluding Remarks 800
References 800

ABSTRACT

The focus of this chapter is on methods of data collection and analysis for the assessment of learning processes and complex performance, the last part of the empirical cycle after theory development and experimental design. In the introduction (van Gog and Paas), the general background and the relation between the chapter sections are briefly described. The section by Savenye, Robinson, Niemczyk, and Atkinson focuses on methods of data collection and analysis for assessment of individual learning processes, whereas the section by Johnson and O'Connor is concerned with methods for assessment of group learning processes. The chapter section by van Gog, Rikers, and Ayres discusses the assessment of complex performance, and the final chapter section by Duley, Ward, Szalma, and Hancock is concerned with setting up laboratories to measure learning and complex performance.

KEYWORDS

Assessment criteria: Describe the aspects of performance that will be assessed.

Assessment of learning: Measuring learning achievement, performance, outcomes, and processes by many means.

Assessment standards: Describe the quality of performance on each of the criteria that can be expected of participants at different stages (e.g., age, grade) based on a participant's past performance (self-referenced), peer group performance (norm-referenced), or an objective standard (criterion-referenced).

Collective data collection: Obtaining data from individual group members; data are later aggregated or manipulated into a representation of the group as a whole.

Complex performance: Refers to real-world activities that require the integration of disparate measurement instrumentation as well as the need for time-critical experimental control.

Direct process measure: Continuous elicitation of data from beginning to end of the (group) process; direct process measures involve videotaping, audiotaping, direct researcher observation, or a combination of these methods.

Group: Two or more individuals working together to achieve a common goal.

Group learning process: Actions and interactions performed by group members during the group learning task.

Holistic data collection: Obtaining data from the group as a whole; as this type of data collection results in a representation of the group rather than individual group members, it is not necessary to aggregate or manipulate data.

Indirect process measure: Discrete measure at a specific point in time during the (group) process; often involves multiple points of data collection; indirect process measures may measure processes, outcomes, products, or other factors related to group process.

Instrumentation: Hardware devices used to assist with the process of data acquisition and measurement.

Mixed-methods research: Studies that rely on quantitative and qualitative as well as other methods for formulating research questions, collecting and analyzing data, and interpreting findings.

Online/offline measures: Online measures are recorded during task performance; offline measures are recorded after task performance.

Process-tracing techniques: Record performance process data such as verbal reports, eye movements, and actions that can be used to make inferences about the cognitive processes or knowledge underlying task performance.

Qualitative research: Sometimes called naturalistic; research on human systems whose hallmarks include researcher as instrument, natural settings, and little manipulation.

Quantitative research: Often conceived of as more traditional or positivistic; typified by experimental or correlational studies. Data and findings are usually represented through numbers and results of statistical tests.

Task complexity: Can be defined subjectively (individual characteristics, such as expertise or perception), objectively (task characteristics, such as multiple solution paths or goals), or as an interaction (individual and task characteristics).

INTRODUCTION

Tamara van Gog and Fred Paas

The most important rule concerning data collection and analysis is: do not attempt to collect or analyze all possible kinds of data. Unless you are conducting a truly explorative study (which is hardly ever necessary nowadays, considering the abundance of literature on most topics), the first part of the empirical cycle, the process of theory development, should result in clear research questions or hypotheses that will allow you to choose an appropriate design to study these. These hypotheses should also indicate the kind of data you will need to collect, that is, the data you have hypotheses about and some necessary control data (e.g., time on task), and together with the design provide some indications as to how to analyze those data (e.g., 2 × 2 factorial design, 2 × 2 MANCOVA). But these are just indications, and many decisions remain to be made. To name just a few issues regarding data collection (for an elaboration on those questions, see, for example, Christensen, 2006; Sapsford and Jupp, 1996): Which participants (human/nonhuman, age, educational background, gender) and how many to use? What and how many tasks or stimuli to present, and on what apparatus? What (control) measures to take? What instructions to give? What procedure to use? When to schedule the sessions?
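As an illustration only (not drawn from any study cited in this chapter), the sketch below shows how a hypothesized 2 × 2 design with a control measure might translate into an analysis script. The column names, factor levels, and generated scores are invented, and a univariate ANCOVA is shown in place of the MANCOVA mentioned above for brevity.

```python
# Illustrative only: a hypothetical 2 x 2 factorial experiment analyzed with
# time on task as a control covariate. Column names and scores are invented.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "feedback":     ["yes", "yes", "no", "no"] * 10,          # hypothetical factor 1
    "pacing":       ["self", "fixed"] * 20,                   # hypothetical factor 2
    "time_on_task": [12.0 + (i % 7) for i in range(40)],      # control measure
    "posttest":     [55.0 + (i * 3) % 20 for i in range(40)], # outcome measure
})

# Both factors, their interaction, and the covariate in one model.
model = ols("posttest ~ C(feedback) * C(pacing) + time_on_task", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```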

Making those decisions is not an easy task, and unfortunately strict guidelines cannot be given, because acceptable answers are highly dependent on the exact nature, background, goals, and context of the study. To give you some direction, it might help to have a look at how these questions have been dealt with in high-quality studies in your domain (which are generally published in peer-reviewed, high-impact journals). Because of the importance and difficulty of finding correct operationalizations of these issues, it is generally advisable to conduct a pilot study to test your data collection and analysis procedures.

In educational research, many studies share the common goal of assessing learning or performance, and the sections in this chapter provide information on methods for collecting and analyzing learning and performance data. Even though learning and performance are conceptually different, many of the data collection and analysis techniques can be used to assess both; therefore, we first discuss the differences between the assessment of learning and the assessment of performance before giving a brief overview of the content of the chapter sections.

Assessment of Learning vs. Performance

The definitions of learning and performance have an important similarity, in that they can be used to refer both to an outcome or product and to a process. The term learning is used to refer to the knowledge or skill acquired through instruction or study (note that this dictionary definition ignores the possibility of informal learning, unless this is encompassed by study), as well as the process of acquiring knowledge or skill through instruction or study. The term performance is used to refer to things accomplished (outcome or product) and to the accomplishment of things (process). Performance implies the use of knowledge rather than merely possessing it. It seems that performance is more closely related to skill than to knowledge acquisition (i.e., learning), but an important difference between the definitions of learning and performance is that performance can be, but is not defined as, a result of instruction or study.

The similarities and differences between these terms have some important implications for educational research. First of all, the fact that both learning and performance can refer to a product and a process enables the use of many different kinds of measures or combinations of measures to assess learning or performance. This can make it quite difficult to compare results of different studies on learning or performance, as they might have assessed different aspects of the same concept and come to very different conclusions.

Second, collection and analysis of data about the knowledge an individual possesses can be used to assess their learning but not their performance. That possessing knowledge does not guarantee the ability to use it has been shown in many studies (see, for example, Ericsson and Lehmann, 1996). Nonetheless, for a long time, educational certification practices were based on this assumption: students received their diplomas after completing a series of courses successfully, and success was usually measured by the amount of knowledge a student possessed. Given that this measure has no one-to-one mapping with successful performance, this practice posed many problems, both for students and employers, when students went to work after their educational trajectory. Hence, in the field of education it is recognized now that knowledge is a necessary but not sufficient condition for performance, and the field is gradually making a shift from a knowledge-based testing culture to a performance-based assessment culture (Birenbaum and Dochy, 1996).

Finally, because performance is not defined as a result of instruction or study, it can be assessed in all kinds of situations, and when applied in instructional or study settings it may be assessed before, during, and after instruction or study phases. Note, though, that in that case only the difference between performance assessed before and after instruction or study is indicative of learning. One should be careful not to interpret gains in performance during instruction or study as indicators of learning, as these may be artifacts of instructional methods (Bjork, 1999).

Brief Overview of the Chapter Sections

The first chapter section, Assessment of Individual Learning Processes by Savenye, Robinson, Niemczyk, and Atkinson, introduces educational technology researchers to the conceptual basis and methods of data collection and analysis for investigating individual learning processes. They discuss the quantitative and qualitative research paradigms and the associated approaches to data collection and analysis. They also point out the benefits of combining quantitative and qualitative approaches by conducting mixed-methods studies.

The second chapter section, Assessment of Group Learning Processes by Johnson and O'Connor, focuses on the study of group learning processes, which is more complex than the study of individual learning processes. They discuss several issues that need to be considered prior to setting up a study of group learning processes, such as holistic vs. collective data collection, direct vs. indirect methods of data collection, aggregation or manipulation of individual data into group-level data, and special considerations for setting up a study of group learning processes.

The third chapter section, Assessment of Complex Performance by van Gog, Rikers, and Ayres, discusses data collection and analysis methods for assessment of complex performance. In line with the two-edged definition of performance as a thing accomplished or accomplishing a thing, they distinguish product and process measures and subdivide the process measures further into online (while working on a task) vs. offline (after task completion) measures. They also discuss the opportunities to combine several different measures and the benefits of doing so.

The fourth and final chapter section, Setting Up a Laboratory for Measurement of Complex Performances by Duley, Ward, Szalma, and Hancock, provides insight into the technical setup of laboratories for the assessment of learning processes and complex performance. Rather than providing a list of available hardware, software, and instruments, they have chosen to take the more sensible approach of familiarizing the reader with setting up configurations for stimulus presentation, control options, and response recording, which are relevant for many laboratory studies.

ASSESSMENT OF INDIVIDUAL LEARNING PROCESSES

Wilhelmina Savenye, Rhonda Robinson,

Mary Niemczyk, and Robert Atkinson

It is the goal of this section to introduce educational technology researchers to the conceptual basis and methods of data collection and analysis for investigating individual learning processes, including both quantitative and qualitative research techniques. Learning processes, of course, may involve both individual and group efforts of learners in the form of strategies and activities designed to facilitate their learning. Though this section focuses on individual processes and performances, using a variety of methods, these may be adapted for group use (see the chapter section by Johnson and O'Connor).

Several assumptions guide this work. Although methods can be suggested here, the researcher must be responsible for understanding the foundational ideas of any study. He or she will want to conduct the study with the utmost attention to quality and therefore will want to turn to specific and detailed texts to learn more deeply how to apply research methods. This section will point the researcher to such references and resources.

The objectives of this section are listed below. It is hoped that after reading this chapter, educational technology researchers will be able to:

• Describe methods and techniques for conducting research on individual learning, and compare qualitative and quantitative methods.

• Describe common problems in conducting and evaluating quantitative and qualitative research methods to examine learning processes.

• Consider issues that contribute to the quality of studies using mixed methods.


Rationale for Using Mixed Methods

The terms quantitative and qualitative are commonly used to describe contrasting research approaches. Typically, quantitative research is considered to be more numbers driven, positivistic, and traditional (Borg and Gall, 1989), while qualitative research is often used interchangeably with terms such as naturalistic, ethnographic (Goetz and LeCompte, 1984), subjective, or post-positivistic. We define qualitative research in this section as research that is devoted to developing an understanding of human systems, be they small, such as a technology-using teacher and his or her students and classroom, or large, such as a cultural system. Quantitative and qualitative methods for data collection derive in some measure from a difference in the way one sees the world, which results in what some consider a paradigm debate; however, in assessing learning processes, both approaches to data collection have importance, and using elements from both approaches can be very helpful. Driscoll (1995) suggested that educational technologists select research paradigms based on what they perceive to be the most critical questions. Robinson (1995) and Reigeluth (1989) concurred, noting the considerable debate within the field regarding suitable research questions and methods. Learning processes are complex and individual. Lowyck and Elen (2004) argued that learning processes are active, constructive, self-regulated, goal oriented, and contextualized. In addition, digital technologies are changing the nature of knowledge and of teaching and learning (Cornu, 2004). It is clear, then, that the methods required to collect and analyze how learning processes work, when they work, and why they work can be drawn from a mixed-method approach. Thus, researchers can investigate carefully and creatively any questions they choose and derive valid data to help understand learning processes using a combination of methods from both perspectives. Although not the main focus of this chapter, it is assumed that researchers will submit all procedures, protocols, instruments, and participation forms to the appropriate human-subjects or ethics review unit within their organizations. In any case, researchers should be specific about how they define the assumptions of the study and why what was done was done; in short, they should be able to enter into the current and upcoming discussions as thoughtful, critical, and creative researchers.

Analyzing Learning Using Quantitative Methods and Techniques

Learning achievement or performance in educational technology research is often the primary outcome measure or dependent variable of concern to the researcher. Learning is therefore often studied using more quantitative measures, including what researchers may call tests, assessments, examinations, or quizzes. These measures may be administered in paper-and-pencil form or may be technology based. If they are technology based, they may be administered at a testing center, with tutors or proctors, or completed on the student's own. In either format, they may be scored by an instructor or tutor or may be automatically scored by the computer (Savenye, 2004a,b). Issues of concern in selecting and developing tests and test items are also relevant when learning is measured en route as performance on practice items. Completion time, often in conjunction with testing, is another learning process variable that can efficiently be examined using quantitative methods.

Learning achievement on complex tasks may also be measured more quantitatively using numerically based rubrics and checklists to evaluate products and performances or to evaluate essays or learner-created portfolios. Rubrics and checklists are also often used to derive quantitative data for measuring learning in online discussions or to build frequencies of behaviors from observations of learning processes, often used in conjunction with more qualitative methods (discussed later in this section). Many computer-based course management systems now routinely collect course statistics that may be examined to determine how learners proceed through instruction and what choices they make as they go. Self-evaluations and other aspects of learning, such as learner attitudes, are more commonly measured using questionnaires. Selected types of quantitative methods for examining learning are discussed in turn:

• Tests, examinations, quizzes (administered via paper or technology, including self-evaluations)

• Rubrics or checklists to measure learner performance

• Measuring learning processes in technology-mediated communications

• Technology-based course statistics

• Attitude measures such as questionnaires using Likert-type items

Selecting Tests

Educational researchers frequently select existing tests to assess how individual learning processes are impacted by a novel educational intervention. During this process, the researchers must be conversant with a number of important concepts, including validity and reliability. In the following sections, these concepts are described in greater detail with a specific focus on what researchers need to know when selecting tests.

Validity

Arguably, the most critical aspect of a test is its quality or validity. Simply put, a test is considered valid if it measures what it was created to measure (Borg and Gall, 1989). A test is generally considered valid if the scores it produces help individuals administering the test make accurate inferences about a particular characteristic, trait, or attribute intrinsic to the test taker. As an example, researchers exploring the relative impact of several learning environments would consider a test valid to the extent to which it helps them make an accurate determination of the relative quality and quantity of learning displayed by the students exposed to the respective learning environments.

Validity is not a unitary concept; in fact, test developers use several widely accepted procedures to document the level of validity of their test, including content-related, criterion-related, and construct-related validity. Content-related validity represents the extent to which the content of a test is a representative sample of the total subject matter content provided in the learning environment. Another type of validity is criterion-related validity, which depicts how closely scores on a given test correspond to or predict performance on a criterion measure that exists outside the test. Unlike content validity, this type of validity yields a numeric value: a correlation coefficient reported on a scale of -1 (perfect negative relationship) to +1 (perfect positive relationship). The third type of validity is construct-related validity, which refers to the extent to which the scores on a test correspond with a particular construct or hypothetical concept originating from a theory.

Also worth mentioning is a relatively unsophisticated type of validity known as face validity, which is based on the outward appearance of the test. Although this is considered a rather rudimentary approach to establishing validity, it is considered important because of its potential impact on the test taker's motivation. In particular, respondents may be reluctant to complete a test without any apparent face validity.

Reliability

Another important concept involved in test selection is reliability. Simply put, reliability refers to the consistency with which a test yields the same results for a respondent across repeated administrations (Borg and Gall, 1989). Assuming that the focus of the test (a particular attribute or characteristic) remains unchanged between test administrations for a given individual, reliability sheds light on the following question: Does the test always yield the same score for an individual when it is administered on several occasions?

Determining and Judging Reliability

The three basic approaches to determining the reliability of a test are test-retest, alternate forms, and internal consistency (Borg and Gall, 1989; Morris et al., 1987). Perhaps the simplest technique for estimating reliability is the test-retest method. With this approach, a test developer simply administers the test twice to the same group of respondents and then calculates the correlation between the two sets of scores. As a general rule, researchers select tests displaying the highest reliability coefficient, because values approaching +1.00 are indicative of a strong relationship between the two sets of respondents' scores; that is, the respondents' relative performance has remained similar across the two testing occasions. Specifically, values above .80 are preferable (Chase, 1999).

Another approach to determining reliability is the alternate forms method, in which two equivalent forms of a test are administered to a group of respondents on two separate occasions and the resulting scores correlated. As with the test-retest method, the higher the reliability coefficient, the more confidence a test administrator can place in the ability of a test to consistently measure what it was designed to measure.

The final method for estimating the reliability of a test is referred to as internal consistency. Unlike the two previous methods, it does not rely on testing the same group of respondents twice to estimate the internal consistency of a test. Instead, the reliability of a test is estimated based on a single test administration, which can be accomplished in two ways: either using the split halves method or using one of the Kuder-Richardson methods, which do not require splitting a test in half.

Limits of Reliability

A number of caveats are associated with reliability. First, it is important to recognize that high reliability does not guarantee validity; in other words, a test can consistently measure what it was intended to measure while still lacking validity. Knowing that a test is reliable does not permit someone to make judgments about its validity. Reliability is, however, necessary for validity, as it impacts the accuracy with which one can draw inferences about a particular characteristic or attribute intrinsic to the test taker. Reliability is impacted by several factors. Test length is the first: all things being equal, shorter tests tend to be less reliable than longer tests, because the latter afford the test developer more opportunities to accurately measure the trait or characteristic under examination. The reliability of a test is also impacted by the format of its items. A general heuristic to remember is that tests constructed with select-type items tend to be more reliable than tests with supply-type or other subjectively scored items.

Evaluating and Developing Tests and Test Items

The construction of learning assessments is one of the most important responsibilities of instructors and researchers. Tests should be comprised of items that represent important and clearly stated objectives and that adequately sample subject matter from all of the learning objectives. The most effective way to ensure adequate representation of items across content, cognitive processes, and objectives is to develop a test blueprint or table of specifications (Sax, 1980). Multiple types of performance measures allow students an opportunity to demonstrate their particular skills in defined areas and to receive varied feedback on their performances; this is particularly important in self-instructional settings, such as online courses (Savenye, 2004a,b). Multiple learning measures in online settings also offer security advantages (Ko and Rossen, 2001).

Tests should also give students the opportunity to respond to different types of item formats that assess different levels of cognition, such as comprehension, application, analysis, and synthesis (Popham, 1991). Different approaches and formats can yield different diagnostic information to instructors as well; for example, well-developed multiple-choice items contain alternatives that represent common student misconceptions or errors. Short-answer item responses can give the instructor information about the student's thinking underlying the answer (Mehrens et al., 1998).

Because the test item is the essential building block of any test, it is critical to determine the validity of the test item before determining the validity of the test itself. Commercial test publishers typically conduct pilot studies (called item tryouts) to get empirical evidence concerning item quality. For these tryouts, several forms of the test are prepared with different subsets of items, so each item appears with every other item. Each form may be given to several hundred examinees. Item analysis data are then calculated, followed by assessment of the performance characteristics of the items, such as item difficulty and item discrimination (i.e., how well the item separates, or discriminates, between those who do well on the test and those who do poorly). The developers discard items that fail to display proper statistical properties (Downing and Haladyna, 1997; Nitko, 2001; Thorndike, 1997).
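The item statistics named above can be illustrated with a small, invented response matrix; the upper/lower grouping used for discrimination here is one common convention, not necessarily the one used by commercial publishers.

```python
# Invented response matrix (1 = correct, 0 = incorrect), one row per examinee.
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
]

totals = [sum(row) for row in responses]
ranked = sorted(range(len(responses)), key=lambda i: totals[i])
n_group = max(1, len(responses) // 3)              # bottom and top thirds of scorers
low, high = ranked[:n_group], ranked[-n_group:]

for item in range(len(responses[0])):
    # Difficulty: proportion of all examinees answering the item correctly.
    difficulty = sum(row[item] for row in responses) / len(responses)
    # Discrimination: how much better the top scorers do on the item than the bottom scorers.
    discrimination = (sum(responses[i][item] for i in high) / len(high)
                      - sum(responses[i][item] for i in low) / len(low))
    print(f"item {item + 1}: difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```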

Scores on Numerically Based Rubrics and Checklists

Assessing performance can be done by utilizing numerically based rubrics and checklists. Typically, two aspects of a learner's performance can be assessed: the product the learner produces and the process a learner uses to complete the product. Either or both of these elements may be evaluated. Because performance tasks are usually complex, each task provides an opportunity to assess students on several learning goals (Nitko, 2001). Performance criteria are the specific behaviors a student should perform to properly carry out a performance or produce a product. The key to identifying performance criteria is to break down the overall performance or product into its component parts. It is important that performance criteria be specific, observable, and clearly stated (Airasian, 1996).

Scoring rubrics are brief, written descriptions of different levels of performance. They can be used to summarize both performances and products. Scoring rubrics summarize performance in a general way, whereas checklists and rating scales can provide specific diagnostic information about student strengths and weaknesses (Airasian, 1996). Checklists usually contain lists of behaviors, traits, or characteristics that are either present or absent, to be checked off by an observer (Sax, 1980). Although they are similar to checklists, rating scales allow the observer to judge performance along a continuum rather than as a dichotomy (Airasian, 1996).
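A minimal sketch of the distinction, using invented criteria: a checklist records each criterion as present or absent, whereas a rating scale places each criterion on a continuum.

```python
# Invented criteria: a checklist is dichotomous, a rating scale is a continuum.
checklist = {                               # 1 = behavior present, 0 = absent
    "states the research question": 1,
    "cites at least two sources": 0,
    "labels all figure axes": 1,
}
rating_scale = {                            # judged from 1 (poor) to 5 (excellent)
    "clarity of argument": 4,
    "organization": 3,
    "use of evidence": 5,
}

print(f"checklist: {sum(checklist.values())} of {len(checklist)} criteria present")
print(f"rating scale mean: {sum(rating_scale.values()) / len(rating_scale):.1f} on a 1-5 continuum")
```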

Measuring Learning Processes in Technology-Mediated Communications

Tiffin and Rajasingham (1995) suggested that education is based on communication. Online technologies, therefore, provide tremendous opportunities for learning and allow us to measure learning in new ways; for example, interactions in online discussions within Internet-based courses may be used to assess students' learning processes. Paulsen (2003) delineated many types of learning activities, including online interviews, online interest groups, role plays, brainstorming, and project groups. These activities, generally involving digital records, will also yield online communication data for research purposes, provided the appropriate ethics and subject guidelines have been followed.

The postings learners create and share may also be evaluated using the types of rubrics and checklists discussed earlier. These are of particular value to learners when they receive the assessment tools early in the course and use them to self-evaluate or to conduct peer evaluations to improve the quality of their work (Savenye, 2006, 2007). Goodyear (2000) reminded us that digital technologies add to the research and development enterprise the capability for multimedia communications.

Another aspect of online discussions of value to researchers is that the types of postings students make and the ideas they discuss can be quantified to illuminate students' learning processes. Chen (2005), in an online course activity conducted with groups of six students who did not know each other, found that learners in a less-structured forum condition posted many more socially oriented postings, although their performance on the task was not lower than that of the students who did not post as many social postings. She also found that the more interactions a group made, the more positive students' attitudes were toward the course.
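Quantifying coded postings of this kind is largely a matter of tallying; the sketch below uses invented group identifiers and codes in the spirit of the analysis described above.

```python
# Invented group identifiers and posting codes assigned by a human coder.
from collections import Counter

coded_postings = [
    (1, "social"), (1, "task"), (1, "social"), (1, "task"),
    (2, "task"),   (2, "task"), (2, "social"), (2, "task"),
]

counts = Counter(coded_postings)
for (group, code), n in sorted(counts.items()):
    print(f"group {group}: {n} {code} postings")
```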

Using Technology-Based Course Statistics To Examine Learning Processes

In addition to recording learners' performance on quizzes, tests, and other assignments, most online course management systems automatically collect numerous types of data, which may be used to investigate learning processes. Such data may include information about exactly which components of the course a learner has completed, on which days, and for how much time. Compilations of these data can indicate patterns of use of course components and features (Savenye, 2004a).
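A hypothetical sketch of compiling such course statistics follows; the log records, component names, and time values are invented, and real course management systems export their own formats.

```python
# Invented log records: (learner, course component, minutes spent).
from collections import defaultdict

log_records = [
    ("s01", "module_1_video", 14), ("s01", "module_1_quiz", 9),
    ("s01", "module_2_video", 11), ("s02", "module_1_quiz", 4),
    ("s02", "module_2_video", 22), ("s02", "module_2_quiz", 12),
]

minutes_per_component = defaultdict(int)
components_per_learner = defaultdict(set)
for learner, component, minutes in log_records:
    minutes_per_component[component] += minutes
    components_per_learner[learner].add(component)

for component, minutes in sorted(minutes_per_component.items()):
    print(f"{component}: {minutes} total minutes")
for learner, components in sorted(components_per_learner.items()):
    print(f"{learner} used {len(components)} components")
```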

Measuring Attitudes Using Questionnaires That Use Likert-Type Items

Several techniques have been used to assess attitudes and feelings of learners in research studies and as part of instruction. Of these methods, Likert-type scales are the most common. Typically, respondents are asked to indicate their strength of feeling toward a series of statements, often in terms of the degree to which they agree or disagree with the position being described. Previous research has found that responding to a Likert-type item is an easier task and provides more information than ranking and paired comparisons. The advantage of a Likert-type item scale is that an absolute level of an individual's responses can be obtained to determine the strength of the attitude (O'Neal and Chissom, 1993).

Thorndike (1997) suggested several factors to consider in developing a Likert-type scale, including the number of steps, odd or even number of steps, and types of anchors. The number of steps in the scale is important as it relates to reliability: the more steps, the greater the reliability. The increase is noticeable up to about seven steps; after this, the reliability begins to diminish, as it becomes difficult to develop meaningful anchors. Five-point scales tend to be the most common. Increasing the number of items can also increase reliability. Although there is considerable debate about this, many researchers hold that better results can be obtained by using an odd number of steps, which provides for a neutral response. The anchors used should fit the meaning of the statements and the goal of the measurement. Common examples include continua such as agree-disagree, effective-ineffective, important-unimportant, and like me-not like me.
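A minimal sketch of scoring a five-point Likert-type scale, including reverse-scoring a negatively worded item; the items and responses are invented.

```python
# Invented items and responses; 1 = strongly disagree ... 5 = strongly agree.
responses = {
    "The online course kept me engaged.": 4,
    "I found the course materials confusing.": 2,   # negatively worded item
    "I would recommend this course to others.": 5,
}
reverse_scored = {"I found the course materials confusing."}

scored = [
    (6 - value) if item in reverse_scored else value  # 6 - x flips a 1-5 scale
    for item, value in responses.items()
]
print(f"mean attitude score = {sum(scored) / len(scored):.2f} on a 1-5 scale")
```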

Analyzing Learning Using More Qualitative Methods and Techniques

Although learning outcomes and processes can be productively examined using the quantitative methods discussed earlier, in a mixed-methods approach many qualitative methods are used to build a deeper understanding of what, why, and how learners learn. With the increasing use of interactive and distance technologies in education and industry, opportunities and at times the responsibility to explore new questions about the processes of learning and instruction have evolved. New technologies also enable researchers to study learners and learning processes in new ways and to expand our views of what we should investigate and how; for example, a qualitative view of how instructors and their students learn through a new technology may yield a view of what is really happening when the technology is used.

As in any research project, the actual research questions guide the selection of appropriate methods of data collection. Once a question or issue has been selected, the choice of qualitative methods falls roughly into the categories of observations, interviews, and document and artifact analysis, although others have conceptualized the methods somewhat differently (Bogdan and Biklen, 1992; Goetz and LeCompte, 1984; Lincoln and Guba, 1985). Qualitative researchers have basically agreed that the human investigator is the primary research instrument (Pelto and Pelto, 1978).

In this section, we begin with one approach to conducting qualitative research: grounded theory. We then discuss specific methods that may be called observations, interviews, and document and artifact analysis. As in all qualitative research, it is also assumed that educational technology researchers will use and refine methods with the view that these methods vary in their degree of interactiveness with participants. The following qualitative methods, along with several research perspectives, are examined next:

• Grounded theory

• Participant observations

• Nonparticipant observations

• Interviews, including group and individual

• Document, artifact, and online communications and activities analysis

Grounded Theory

In their overview of grounded theory, Strauss and Corbin (1994, p. 273) noted that it is "a general methodology for developing theory that is grounded in data systematically gathered and analyzed," adding that it is sometimes referred to as the constant comparative method and that it is applicable as well to quantitative research. In grounded theory, the data may come from observations, interviews, and video or document analysis, and, as in other qualitative research, these data may be considered strictly qualitative or may be quantitative. The purpose of the methodology is to develop theory, through an iterative process of data analysis and theoretical analysis, with verification of hypotheses ongoing throughout the study. The researcher begins a study without completely preconceived notions about what the research questions should be and collects and analyzes extensive data with an open mind. As the study progresses, he or she continually examines the data for patterns, and the patterns lead the researcher to build the theory. The researcher continues collecting and examining data until the patterns continue to repeat and few new patterns emerge. The researcher builds the theory from the data, and the theory is thus built on, or grounded in, the phenomena.

Participant Observation

In participant observation, the observer becomes part of the environment, or the cultural context. The hallmark of participant observation is continual interaction between the researcher and the participants; for example, the study may involve periodic interviews interspersed with observations so the researcher can question the participants and verify perceptions and patterns. Results of these interviews may then determine what will initially be recorded during observations. Later, after patterns begin to appear in the observational data, the researcher may conduct interviews asking the participants about these patterns and why they think they are occurring.

As the researcher cannot observe and record everything, in most educational research studies the investigator determines ahead of time what will be observed and recorded, guided but not limited by the research questions. Participant observation is often successfully used to describe what is happening in a context and why it happens. These are questions that cannot be answered in the standard experiment.

Many researchers have utilized participant observation methods to examine learning processes. Robinson (1994) observed classes using Channel One in a Midwestern middle school; she focused her observations on the use of the televised news show and the reaction to it from students, teachers, administrators, and parents. Reilly (1994) analyzed video recordings of both the researcher and students in a project that involved defining a new type of literacy that combined print, video, and computer technologies. Higgins and Rice (1991) investigated teachers' perceptions of testing. They used triangulation and a variety of methods to collect data; however, a key feature of the study was participant observation. Researchers observed 6 teachers for a sample of 10 hours each and recorded instances of classroom behaviors that could be classified as assessment. Similarly, Moallem (1994) used multiple methods to build an experienced teacher's model of teaching and thinking by conducting a series of observations and interviews over a 7-month period.

Nonparticipant Observation

Nonparticipant observation is one of several methods for collecting data considered to be relatively unobtrusive. Many recent authors cite the early work of Webb et al. (1966) as laying the groundwork for use of all types of unobtrusive measures. Several types of nonparticipant observation have been identified by Goetz and LeCompte (1984). These include stream-of-behavior chronicles recorded in written narratives or using video or audio recordings, proxemics and kinesics (i.e., the study of uses of social space and movement), and interaction analysis protocols, typically in the form of observations of particular types of behaviors that are categorized and coded for analysis of patterns. In nonparticipant observation, observers do not interact to a great degree with those they are observing. The researchers primarily observe and record, using observational forms developed for the study or in the form of extensive field notes; they have no specific roles as participants.

Examples of studies in which observations were conducted that could be considered relatively nonparticipant observation include Savenye and Strand (1989) in the initial pilot test and Savenye (1989) in the subsequent larger field test of a multimedia-based science curriculum. Of most concern during implementation was how teachers used the curriculum. A careful sample of classroom lessons was recorded using video, and the data were coded; for example, teacher questions were coded, and the results indicated that teachers typically used the system pauses to ask recall-level rather than higher-level questions. Analysis of the coded behaviors for what teachers added indicated that most of the teachers in the sample added examples to the lessons that would provide relevance for their own learners. Of particular value to the developers was the finding that teachers had a great degree of freedom in using the curriculum and the students' learning achievement was still high.

In a mixed-methods study, nonparticipant observations may be used along with more quantitative methods to answer focused research questions about what learners do while learning. In a mixed-methods study investigating the effects and use of multimedia learning materials, the researchers collected learning outcome data using periodic tests. They also observed learners as they worked together. These observations were video recorded and the records analyzed to examine many learning processes, including students' level of cognitive processing, exploratory talk, and collaborative processing (Olkinuora et al., 2004). Researchers may also be interested in using observations to study what types of choices learners make while they proceed through a lesson. Klein and colleagues, for instance, developed an observational instrument used to examine cooperative learning behaviors in technology-based lessons (Crooks et al., 1995; Jones et al., 1995; Klein and Pridemore, 1994).

A variation on nonparticipant observations represents a blend with trace-behavior, artifact, or document analysis. This technique, known as read-think-aloud protocols, asks learners to describe what they do and why they do it (i.e., their thoughts about their processes) as they proceed through an activity, such as a lesson. Smith and Wedman (1988) used this technique to analyze learner tracking and choices. Techniques for coding are described by Spradley (1980); however, protocol analysis (Ericsson and Simon, 1984) techniques could be used on the resulting verbal data.

Issues Related to Conducting Observations

Savenye and Robinson (2004, 2005) have suggested several issues that are critical to using observations to study learning. These issues include those related to scope, biases and the observer's role, sampling, and the use of multiple observers. They caution that a researcher can become lost in the multitude of observational data that can be collected, both in person and when using audio or video. They recommend limiting the scope of the study specifically to answering the questions at hand. Observers must be careful not to influence the results of the study; that is, they must not make things happen that they want to happen. Potential bias may be handled by simply describing the researcher's role in the research report, but investigators will want to examine periodically what their role is and what type of influences may result from it. In observational research, sampling becomes not random but purposive (Borg and Gall, 1989). For the study to be valid, the reader should be able to believe that a representative sample of involved individuals was observed. The multiple realities of any cultural context should be represented. If several observers will be used to collect the data, and their data will be compared or aggregated, problems with reliability of data may occur. Observers tend to see and subsequently interpret the same phenomena in many different ways. It becomes necessary to train the observers and to ensure that observers are recording the same phenomena in the same ways. When multiple observers are used and behaviors counted or categorized and tallied, it is desirable to calculate and report inter-rater reliability.
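When coded behaviors from two observers are compared, inter-rater reliability can be reported as percent agreement or as Cohen's kappa (a standard chance-corrected index, named here as one common option rather than the authors' prescription); the codes below are invented.

```python
# Invented behavior codes assigned by two observers to the same eight observed events.
from collections import Counter

rater_a = ["question", "lecture", "question", "praise", "lecture", "question", "praise", "lecture"]
rater_b = ["question", "lecture", "question", "lecture", "lecture", "question", "praise", "lecture"]

# Percent agreement: proportion of events given the same code by both observers.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Cohen's kappa: agreement corrected for the agreement expected by chance.
n = len(rater_a)
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in set(rater_a) | set(rater_b))
kappa = (agreement - expected) / (1 - expected)

print(f"percent agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```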

Interviews

In contrast with the relatively non-interactive, nonparticipant observation methods described earlier, interviews represent a classic qualitative research method that is directly interactive. Interviews may be structured or unstructured and may be conducted in groups or individually. In an information and communication technologies (ICT) study to investigate how ICT can be introduced into the context of a traditional school, Demetriadis et al. (2005) conducted a series of semi-structured interviews over 2 years with 15 teachers/mentors who offered technology training to other teachers.

The cornerstone for conducting good interviews is to be sure one truly listens to respondents and records what they say, rather than the researcher's perceptions or interpretations. This is a good rule of thumb in qualitative research in general. It is best to maintain the integrity of the raw data and to make liberal use of the respondents' own words, including quotes. Most researchers, as a study progresses, also maintain field notes that contain interpretations of patterns to be refined and investigated on an ongoing basis.

Many techniques for structured interviewing, both long established and newly adapted, continue to evolve. One example of such a use of interviews is the Higgins and Rice (1991) study mentioned earlier. In this study, teachers sorted the types of assessment they had named previously in interviews into sets of assessments that were most alike; subsequently, multidimensional scaling was used to analyze these data, yielding a picture of how these teachers viewed testing. Another type of structured interview, mentioned by Goetz and LeCompte (1984), is the interview using projective techniques. Photographs, drawings, and other visuals or objects may be used to elicit individuals’ opinions or feelings.

Instructional planning and design processes have long been of interest to educational technology researchers; for example, using a case-study approach, Reiser and Mory (1991) employed interviews to examine two teachers’ instructional design and planning techniques. One of the models proposed for the design of complex learning is that of van Merriënboer et al. (1992), who developed the four-component model, which subsequently was further developed as the 4C/ID model (van Merriënboer, 1997). Such design models have been effectively studied using mixed methods, including interviews, particularly when those processes relate to complex learning. How expert designers go about complex design tasks has been investigated using both interviews and examination of the designers’ products (Kirschner et al., 2002).

Problem-based instructional design, blending many aspects of curriculum, instruction, and media options (Dijkstra, 2004), could also be productively studied using interviews. Interviews to examine learning processes may be conducted individually or in groups. A specialized group interview method is the focus group (Morgan, 1996), which is typically conducted with relatively similar participants using a structured or semi-structured protocol to examine overall patterns in learning behaviors, attitudes, or interests.

Suggestions for heightening the quality of interviews include employing careful listening and recording techniques; taking care to ask probing questions when needed; keeping the data in their original form, even after they have been analyzed; being respectful of participants; and debriefing participants after the interviews (Savenye and Robinson, 2005).

Document, Artifact, and Online Communications and Activities Analysis

Beyond nonparticipant observation, many unobtrusive methods exist for collecting information about human behaviors. These fall roughly into the categories of document and artifact analyses but overlap with other methods; for example, verbal or nonverbal behavior streams produced during video observations may be subjected to intense microanalysis to answer an almost unlimited number of research questions. Content analysis, as one example, may be done on these narratives. In the Moallem (1994), Higgins and Rice (1991), and Reiser and Mory (1991) studies of teachers’ planning, thinking, behaviors, and conceptions of testing, documents developed by the teachers, such as instructional plans and actual tests, were collected and analyzed.

Goetz and LeCompte (1984) defined artifacts of interest to researchers as things that people make and do. The artifacts of interest to educational technologists are often written, but computer and online trails of behavior are the objects of analysis as well. Examples of artifacts that may help to illuminate research questions include textbooks and other instructional materials, such as media materials; memos, letters, and now e-mail records, as well as logs of meetings and activities; demographic information, such as enrollment, attendance, and detailed information about participants; and personal logs participants may keep.

In studies in educational technology, researchers often analyze the patterns of learner pathways, decisions, and choices made as learners proceed through computer-based lessons (Savenye et al., 1996; Shin et al., 1994). Content analysis of prose in any form may also be considered to fall into this artifact-and-document category of qualitative methodology. Lawless (1994) used concept maps developed by students in the Open University to check for student understanding. Entries in students’ journals were analyzed by Perez-Prado and Thirunarayanan (2002) to learn about students’ perceptions of online and on-ground versions of the same college course. Espey (2000) studied the content of a school district technology plan.

Methods for Analyzing Qualitative Data

One of the major hallmarks of conducting qualitative research is that data are analyzed continually, throughout the study, from conceptualization through the entire data collection phase and into the interpretation and writing phases. In fact, Goetz and LeCompte (1984) described the processes of analyzing and writing together in what they called analysis and interpretation.

Data Reduction

Goetz and LeCompte (1984) described the conceptual basis for reducing and condensing data in an ongoing style as the study progresses. Researchers theorize as the study begins and build and continually test theories based on observed patterns in data. Goetz and LeCompte described the analytic procedures researchers use to determine what the data mean. These procedures involve looking for patterns, links, and relationships. In contrast to experimental research, the qualitative researcher engages in speculation while looking for meaning in data; this speculation will lead the researcher to make new observations, conduct new interviews, and look more deeply for new patterns in this recursive process. It is advisable to collect data in their raw, detailed form and then record patterns. This enables the researcher later to analyze the original data in different ways, perhaps to answer deeper questions than originally conceived. It should be noted that virtually all researchers who use an ethnographic approach advocate writing up field notes immediately after leaving the research site each day. If researchers have collected documents from participants, such as logs, journals, diaries, memos, and letters, these can also be analyzed as raw data. Similarly, official documents of an organization can be subjected to analysis. Collecting data in the form of photographs, films, and videos, produced either by participants or the researcher, has a long tradition in anthropology and education. These data, too, can be analyzed for meaning (Bellman and Jules-Rosette, 1977; Bogaart and Ketelaar, 1983; Bogdan and Biklen, 1992; Collier and Collier, 1986; Heider, 1976; Hockings, 1975).

Coding Data

Early in the study, the researcher will begin to scan recorded data and to develop categories of phenomena. These categories are usually called codes. They enable the researcher to manage data by labeling, storing, and retrieving it according to the codes. Miles and Huberman (1994) suggested that data can be coded descriptively or interpretively. Bogdan and Biklen (1992) recommended reading data over at least several times to begin to develop a coding scheme. In one of the many examples he provided, Spradley (1979) described in extensive detail how to code and analyze interview data, which are semantic data, as are most qualitative data. He also described how to construct domain, structural, taxonomic, and componential analyses.

Data Management

Analysis of data requires continually examining, sorting, and reexamining data. Qualitative researchers use many means to organize, retrieve, and analyze their data. To code data, many researchers simply use notebooks and boxes of paper, which can then be resorted and analyzed on an ongoing basis. Computers have long been used for managing and analyzing qualitative data. Several resources exist to aid the researcher in finding and using software for data analysis and management, including books (Weitzman and Miles, 1995) and websites that discuss and evaluate research software (American Evaluation Association, 2007; Cuneo, 2000; Horber, 2006).

Writing the Research Report

In some respects, writing a report of a study that uses mixed methods may not differ greatly from writing a report summarizing a more traditional experimental study; for example, a standard format for preparing a research report includes an introduction, literature review, description of methods, and presentation of findings, completed by a summary and discussion (Borg and Gall, 1989). A mixed-methods study, however, allows the researcher the opportunity to create sections of the report that may expand on the traditional. The quantitative findings may be reported in the manner of an experimental study (Ross and Morrison, 2004). The qualitative components of research reports typically will be woven around a theme or central message and will include an introduction, core material, and conclusion (Bogdan and Biklen, 1992). Qualitative findings may take the form of a series of themes from interview data or the form of a case study, as in the Reiser and Mory (1991) study. For a case study, the report may include considerable quantification and tables of enumerated data, or it may take a strictly narrative form. Recent studies have been reported in more nontraditional forms, such as stories, plays, and poems that show participants’ views. Suggestions for writing up qualitative research are many (Meloy, 1994; Van Maanen, 1988; Wolcott, 1990).

In addition to the studies mentioned earlier, many excellent examples of mixed-methods studies may be examined to see the various ways in which the results of these studies have been written. Seel and colleagues (2000), in an investigation of mental models and model-centered learning environments, used quantitative learning measures that included pretests, posttests, and a measure of the stability of learning four months after the instruction. They also used a receptive interview technique they called causal explanations to investigate learners’ mental models and learning processes. In this and subsequent studies, Seel (2004) also investigated learners’ mental models of dynamic systems using causal diagrams that learners developed and teach-back procedures, in which a student explains a model to another student and this epistemic discourse is then examined.

Conclusion

The challenges to educational technology researchers who choose to use multiple methods to answer their questions are many, but the outcome of choosing mixed methods has great potential. Issues of validity, reliability, and generalizability are central to experimental research (Ross and Morrison, 2004) and mixed-methods research; however, these concerns are addressed quite differently when using qualitative methods and techniques. Suggestions and criteria for evaluating the quality of mixed-methods studies and research activities may be adapted from those suggested by Savenye and Robinson (2005):

• Learn as much as possible about the context of the study, and build in enough time to conduct the study well.

• Learn more about the methods to be used, and train yourself in these methods.

• Conduct pilot studies whenever possible.
• Use triangulation (simply put, use multiple data sources to yield deeper, truer views of the findings).

• Be ethical in all ways when conducting research.

• Listen carefully to participants, and carefully record what they say and do.

• Keep good records, including audit trails.
• Analyze data continually throughout the study, and consider having other researchers and participants review your themes, patterns, and findings to verify them.

• Describe well all methods, decisions, assumptions, and biases.

• Use the appropriate methods (and balance of methods when using mixed methods); this is the key to successful educational research.

ASSESSMENT OF GROUP LEARNING PROCESSES

Tristan E. Johnson and Debra L. O’Connor

Similar to organizations that rely on groups of workers to address a variety of difficult and challenging tasks (Salas and Fiore, 2004), groups are formed in various learning settings to meet instructional needs as well as to exploit the pedagogical, learning, and pragmatic benefits associated with group learning (Stahl, 2006). In educational settings, small groups have typically been used to promote participation and enhance learning. One of the main reasons for creating learning groups is to facilitate the development of professional skills that are promoted by group learning, such as communication, teamwork, decision making, leadership, valuing others, problem solving, negotiation, thinking creatively, and working as a member of a group (Carnevale et al., 1989).

Group learning processes are the interactions of two or more individuals with each other and their environment with the intent to change knowledge, skill, or attitude. We use the term group to refer to the notion of small groups and not large groups characterized as large organizations (Levine and Moreland, 1990; Woods et al., 2000). Interest in group learning processes can be found not only in traditional educational settings such as elementary and secondary schools but also in workplace settings, including the military, industry, business, and even sports (Guzzo and Shea, 1992; Levine and Moreland, 1990).

There are several reasons to assess group learning processes. These include the need to measure group learning as a process outcome and to capture the learning process to provide feedback to the group, with the intent to improve team interactions and thereby improve overall team performance. Studies looking at group processes have led to improved understanding about what groups do and how and why they do what they do (Salas and Cannon-Bowers, 2000). Another reason to assess group learning processes is to capture highly successful group process behaviors to develop an interaction framework that could then be used to inform the design and development of group instructional strategies. Further, because the roles and use of groups in supporting and facilitating learning have increased, the interest in studying the underlying group mechanisms has increased.

Many different types of data collection and analysis methods can be used to assess group learning processes. The purpose of this section is to describe these methods by: (1) clarifying how these methods are similar to and different from single-learner methods, (2) describing a framework of data collection and analysis techniques, and (3) presenting analysis considerations specific to studying group learning processes along with several examples of existing methodologies.

Group Learning Processes Compared with Individual Learning Processes and Group Performance

Traditional research on learning processes has focused on psychological perspectives using traditional psychological methods. The unit of analysis for these methods emphasizes the behavior or mental activity of an individual, concentrating on learning, instructional outcomes, meaning making, or cognition, all at an individual level (Koschmann, 1996; Stahl, 2006). In contrast, the framework for group research focuses on the research traditions of multiple disciplines, such as communication, information, sociology, linguistics, military, human factors, and medicine, as well as fields of applied psychology such as instructional, educational, social, industrial, and organizational psychology.



As a whole, these disciplines extend the traditional psychological perspectives and seek understanding related to interaction, spoken language, written language, culture, and other aspects related to social situations. Stahl (2006) pointed out that individuals often think and learn apart from others, but learning and thinking in isolation are still conditioned and mediated by important social considerations.

Group research considers various social dimensions but primarily focuses on either group performance or group learning. Group learning research focuses on typical learning settings. We often see children and youth engaged in group learning in a school setting. Adult group learning is found in post-secondary education, professional schools, vocational schools, colleges and universities, and training sessions, as well as in on-the-job training environments. A number of learning methods have been used in all of these settings. A few specific strategies that use groups to facilitate learning include cooperative learning, collaborative learning (Johnson et al., 2000; Salomon and Perkins, 1998), computer-supported collaborative learning (Stahl, 2006), and team-based learning (Michaelsen, 2004). General terms used to refer to multiple-person learning activities include learning groups, team learning, and group learning. Often, these terms and specific strategies are used interchangeably, and sometimes not in the ways just described.

In addition to learning groups, adults engage in group activities in performance (workplace) settings. Although a distinction is made between learning and performance, the processes are similar for groups whose primary intent is to learn and for those focused on performing. The literature on workplace groups (whose primary focus is on completing a task) offers a number of techniques that can be used to study group learning processes, much like the literature on individual learning. Group learning process methods include the various methods typically found when studying individuals and also include additional methodologies unique to studying groups.

Methodological Framework: Direct and Indirect Process Measures

When studying group learning processes, three general categories of measures can be employed: (1) the process or action of a group (direct process measures), (2) a state or a point in time of a group (indirect process measures), and (3) an outcome or performance of a group (indirect non-process measures). Direct process measures are techniques that directly capture the process of a group. These measures are continuous in nature and capture data across time by recording the sound and sight of the group interactions. Examples of these measures include recording the spoken language, written language, and visible interactions. These recordings can be video or audio recordings, as well as observation notes.

Indirect process measures use techniques that indirectly capture group processes. These measures are discrete and capture a state or condition of the group processes at a particular point in time, either during or after group activity. These measures involve capturing group member or observer perceptions and reactions that focus on group processes. Examples of these measures involve interviews, surveys, and questionnaires that focus on explicating the nature of a group learning process at a given point in time. These measures focus on collecting group member responses about the process and are specifically not a direct observation of the process.

Indirect non-process measures capture group learning data relating to outcomes, products, or performance. These are not measures of the actual process but are measures that might be related to group processes. They may include group characteristics such as demographics, beliefs, efficacy, preferences, size, background, experience, diversity, and trust (Mathieu et al., 2000). These types of measures have the potential to explicate and support the nature of group learning processes. These measures are focused on collecting products or performance scores as well as soliciting group member responses about group characteristics. They are not a direct observation of the group learning process. Examples of these measures include performance scores, product evaluations, surveys, questionnaires, interview transcripts, and group member knowledge structures. These measures focus on explicating the nature of a given group’s non-process characteristics.

When considering how to assess group learning processes, many of the techniques are very similar to or the same as those used to study individuals. Group learning process measures can be collected at both the individual and group levels (O’Neil et al., 2000; Webb, 1982). Because the techniques can be similar or identical for individuals and groups, some confusion may arise when it is realized that individual-level data are not in a form that can be analyzed; the data must be group-level data (group dataset; see Figure 55.1) for analysis.

When designing a study on group learning processes, various measurement techniques can be used depending on the type of questions being asked. Although numerous possibilities are associated with the assessment of group learning processes, three elements must be considered when deciding on what techniques to use: data collection, data manipulation, and data analysis.

Data collection techniques involve capturing or eliciting data related to group learning processes at an individual or group level. Data collected at the group level (capturing group interactions or eliciting group data) yield holistic group datasets (Figure 55.1). When the collected data are in this format, it is not necessary to manipulate the data; in this form, the data are ready to be analyzed. If, however, data are collected at the individual level, then the data must be manipulated, typically via aggregation (Stahl, 2006), to create a dataset that represents the group (collective group dataset) prior to data analysis (Figure 55.1). Collecting data at the individual level involves collecting individual group members’ data and then transforming the individual data into an appropriate form (collective group dataset) for analysis (see Figure 55.1). This technique of creating a collective group dataset from individual data is similar to a process referred to as analysis-constructed dataset creation (O’Connor and Johnson, 2004). In this form, the data are ready to be analyzed.
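To make the distinction concrete, the sketch below aggregates hypothetical individual-level ratings into a single collective group dataset, here simply by averaging each group’s members. The data, group labels, and the choice of the mean as the aggregation rule are illustrative assumptions, not the procedure of any particular study cited here.

```python
from statistics import mean

# Hypothetical individual-level data: one rating per group member
individual_data = [
    {"group": "A", "member": "a1", "rating": 4},
    {"group": "A", "member": "a2", "rating": 5},
    {"group": "A", "member": "a3", "rating": 3},
    {"group": "B", "member": "b1", "rating": 2},
    {"group": "B", "member": "b2", "rating": 3},
]

def to_collective_group_dataset(records, value_key="rating"):
    """Aggregate individual-level records into one value per group (the mean),
    yielding a collective group dataset ready for group-level analysis."""
    by_group = {}
    for record in records:
        by_group.setdefault(record["group"], []).append(record[value_key])
    return {group: mean(values) for group, values in by_group.items()}

collective_dataset = to_collective_group_dataset(individual_data)
print(collective_dataset)  # e.g., {'A': 4, 'B': 2.5}
```

Whether a simple mean is an appropriate aggregation rule is itself a substantive decision, as discussed later under thresholds.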

Data Collection and Analysis Techniques

The different group learning process assessment techniques can be classified based on the type of measure (continuous or discrete). The corresponding analytical techniques that can be used depend on the collected data. Many techniques have been used to assess groups. The following section presents the major categories of techniques based on their ability to measure group processes directly (continuous measures) or indirectly (discrete measures). Table 55.1 summarizes the nature of data collection, manipulation, and analysis for the three major groupings of measurement techniques: direct process measures and the two variations of indirect process measures.

Direct Process Data Collection and Analysis

Direct process measurement techniques focus specifically on capturing the continuous process interactions in groups (O’Neil et al., 2000). These techniques include measures of auditory and visual interactions.

Figure 55.1 Alternative process measures for assessing group learning processes. [Figure shows two panels. Direct process measures: (A) direct capture of group-level process data over time (group interactions), yielding a holistic group dataset. Indirect process measures: (A) indirect elicitation of group-level process data (group constructs) at a specific point in time, yielding a holistic group dataset; (B) indirect elicitation of individual-level data at a specific point in time, with analysis of the individual data to construct a collective group dataset.]



Several data collection and data analysis techniques are related to measuring group learning processes directly. The two key techniques for capturing actions and language are (1) technology and (2) observation. Using technology to capture group processes can provide researchers with data different from the observation data. Researchers can combine the use of technology and observation simultaneously to capture group processes (O’Neil et al., 2000; Paterson et al., 2003). These data can be analyzed in the captured form or transcribed into a text form.

Use of Technology to Capture Group Process

Spoken Language Processes

Techniques to capture a group’s spoken language involve either audio recording or video recording (Schweiger, 1986; Willis, 2002) the spoken language that occurs during group interactions (Pavitt, 1998). This can also include the spoken language of group members as they explain their thinking during group processes in the form of a think-aloud protocol (Ericsson and Simon, 1993).

Written Language Processes

Group learning processes are typically thought of as relying on spoken language, but new communication tools are available that allow groups to communicate and interact using written language. Examples include chat boards, whiteboards (although these are not limited to written language), and discussion boards. Also, computer-supported collaborative learning (CSCL) environments are computer-based network systems that support group learning interactions (Stahl, 2006).

Visible Processes

Techniques to capture a group’s visible interactions include video recording of the behaviors and actions that occur in group interactions (Losada, 1990; Prichard, 2006; Schweiger, 1986; Sy, 2005; Willis et al., 2002).

Use of Observations to Capture Group Process

Although the use of technology may capture data with a high level of realism, some group events can be better captured by humans because of their ability to observe more than what can be captured by technology. Observations ideally are carried out with a set of carefully developed observation protocols to help focus the observers and to teach them how to describe key process events. Observers are a good source for capturing various types of information (Patton, 2001), such as settings, human and social environments, group activities, style and types of language used, nonverbal communication, and events that are out of the ordinary. Observational data, for example, are important for studying group learning processes (Battistich et al., 1993; Lingard, 2002; Sy, 2005; Webb, 1982; Willis et al., 2002). The type of information typically captured includes location, organization, activities, and behaviors (Battistich et al., 1993; Losada, 1990), as well as the frequency and quality of interactions (Battistich, 1993).

Direct Process Data Analysis

Data that are a direct measure of group processes are captured in a holistic format that is ready to be analyzed (Figure 55.1 and Table 55.1).

TABLE 55.1 Summary of Measurement Techniques Used to Assess Group Learning Processes

Direct Process Measure Techniques (Holistic Group Dataset)
Data collection: Directly capturing group learning processes involves techniques that are used by all group members at the same time.
Data manipulation: Not needed, because data are captured at the group level (holistic group dataset).
Data analysis: Continuous process techniques focus on interactions of group members, generating qualitative and quantitative findings associated with continuous measures.

Indirect Process Measure Techniques (Holistic Group Dataset)
Data collection: Indirectly eliciting group learning processes involves techniques that are used by all group members at the same time.
Data manipulation: Not needed, because data are captured at the group level (holistic group dataset).
Data analysis: Discrete process techniques depend on dataset characteristics (focus on process or performance); they can include qualitative and quantitative data analysis techniques associated with discrete measures.

Indirect Process Measure Techniques (Collective Group Dataset)
Data collection: Indirectly eliciting group learning processes involves techniques that are used by each group member separately.
Data manipulation: Individual data are aggregated to create a dataset that represents the group (analysis constructed).
Data analysis: Discrete process techniques depend on dataset characteristics (focus on process or performance); they can include qualitative and quantitative data analysis techniques associated with discrete measures.



Several analytical techniques are available that can be used for analyzing group data, particularly direct process data. The following list is a representative sample of the analysis techniques applied to spoken or written language, visible interaction, and observational data: sequential analysis of group interactions (Bowers, 2006; Jeong, 2003; Rourke et al., 2001), analysis of interaction communication (Bales, 1950; Qureshi, 1995), communication analysis (Bowers et al., 1998), anticipation ratio (Eccles and Tenenbaum, 2004), in-process coordination (Eccles and Tenenbaum, 2004), discourse analysis (Aviv, 2003; Hara et al., 2000), content analysis (Aviv, 2003; Hara et al., 2000), cohesion analysis (Aviv, 2003), and protocol analysis (Ericsson and Simon, 1980, 1993). Techniques for visible interactions also include behavior time series analysis (Losada, 1990). This analysis involves looking at dominant vs. submissive, friendly vs. unfriendly, or task-oriented vs. emotionally expressive behavior. For observational data, researchers focus on various qualitative techniques associated with naturalistic observations (Adler and Adler, 1994; Patton, 2001). Some common tasks associated with this type of analysis include group and character sequence analysis and assertion evaluation (Garfinkel, 1967; Jorgensen, 1989).
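As one concrete illustration of this family of techniques, sequential analysis often begins by tabulating how frequently one type of coded interaction event follows another. The sketch below builds such a transition-frequency matrix from a hypothetical sequence of coded group messages; the codes and the sequence are invented, and the sketch is not a reproduction of the procedures used in the studies cited above.

```python
from collections import defaultdict

# Hypothetical coded sequence of group interaction events, in chronological order
event_sequence = ["question", "answer", "elaboration", "question",
                  "answer", "disagreement", "elaboration", "agreement"]

def transition_matrix(sequence):
    """Count how often each event code is immediately followed by each other code."""
    counts = defaultdict(lambda: defaultdict(int))
    for current_event, next_event in zip(sequence, sequence[1:]):
        counts[current_event][next_event] += 1
    return counts

matrix = transition_matrix(event_sequence)
for source, targets in matrix.items():
    for target, count in targets.items():
        print(f"{source} -> {target}: {count}")
```

Transition counts of this kind can then be tested against chance expectations or compared across groups or conditions.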

Indirect Process Data Collection and Analysis

Many data collection techniques are related to measuring group learning processes indirectly. Indirect group process, characteristic, and product measurement techniques elicit group information at a specific point in time. These discrete measures do not capture group processes directly but elicit data that describe group processes or process-related data, such as group characteristics or group outcomes (things that may have a relation to the group processes). The three key types of data related to group learning processes are indirect group process data, group characteristic data, and group product data, within which specific factors can be measured. Indirect group process data describe group processes and can include factors such as group communication (verbal/nonverbal), group actions, group behaviors, group performance, and group processes. Group characteristic data, relating to group processes, include factors such as group knowledge, group skills, group efficacy, group attitudes, group member roles, group environment, and group leadership. The key elicitation techniques for both of these types of indirect data include interviews, questionnaires, and conceptual methods (Cooke et al., 2000). Each technique can be focused on group process or group characteristics. After reviewing methods to analyze group processes, we discuss methods for analyzing group products.

Interviews

Interviews are a good technique for collecting general data about a group. The various types of interviewing techniques include unstructured interviews (Lingard, 2002) and more structured interviews, which are guided by a predetermined format that can be either rigid or loosely constrained. Structured interviews require more time to develop but are more systematic (Cooke et al., 2000). Interviews are typically conducted with a single person at a time; however, it is not uncommon to conduct a focus group, in which the entire group is simultaneously interviewed. In a focus group, a facilitator interviews by leading a free and open group discussion (Myllyaho et al., 2004). The analysis of interview data requires basic qualitative data analysis techniques (Adler and Adler, 1994; Patton, 2001). Conducting interviews can be straightforward, but the analysis of the data relies heavily on the interviewer’s interpretations (Langan-Fox, 2000). Key steps in analyzing interviews are coding the data for themes (Lingard, 2002) and then studying the codes for meaning. Each phrase is closely examined to discover important concepts and reveal overall relationships. For a more holistic approach to analysis, a group interview technique can be used to discuss findings and to generate collective meaning given specific questions (Myllyaho et al., 2004). Content analysis is commonly used to analyze written statements (Langan-Fox and Tan, 1997). Other key analysis techniques focus on process analysis (Fussell et al., 1998; Prichard, 2006), specifically looking at discussion topics, group coordination, group cognitive overload, and analysis of task process. Other group characteristic analysis techniques could include role analysis and power analysis (Aviv, 2003).

Questionnaires

Questionnaires are a commonly used technique to collect data about group processes (O’Neil et al., 2000; Sy, 2005; Webb, 1982; Willis et al., 2002). Similar to highly structured interviews, questionnaires can also look at relationship-oriented processes and task-oriented processes (Urch Druskat and Kayes, 2000). Questionnaires can be either closed ended or open ended (Alavi, 1994). Open-ended questionnaires are more closely related to a structured interview; the data collected using this format can be focused on group processes as well as group characteristics. Closed-ended questionnaires offer a limited set of responses. The limited responses involve some form of scale that could be nominal, ordinal, interval, or ratio. Data from this format have a limited ability to capture group process data, but this is the typical format for collecting data associated with group characteristics such as social space, group efficacy, group skills, group attitudes, group member roles, leadership, and group knowledge. Data from questionnaires can be analyzed much like interview data if the items are open ended. If the questionnaire is closed ended, then the instrument must be scrutinized for reliability prior to data analysis. Assuming sufficient evidence of reliability, analyzing data from closed-ended questionnaires involves interpreting a measurement based on a particular theoretical construct. The types of data analysis techniques that are appropriate depend on the type of scale used in a questionnaire (nominal, ordinal, interval, or ratio).
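One common way to scrutinize the reliability of a closed-ended, multi-item scale is to estimate its internal consistency with Cronbach’s alpha. The sketch below is a minimal illustration using an invented respondents-by-items matrix of ratings; it is not tied to any instrument cited in this section.

```python
import numpy as np

def cronbachs_alpha(item_scores):
    """Estimate internal consistency for a respondents-by-items score matrix.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)"""
    scores = np.asarray(item_scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of respondents' totals
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 group members x 4 closed-ended items (1-5 scale)
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]

print(f"Cronbach's alpha: {cronbachs_alpha(responses):.2f}")
```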

Conceptual Methods

Conceptual methods involve assessing individual or group understanding about a given topic. Several data collection techniques are utilized to elicit knowledge structures. A review of the literature by Langan-Fox et al. (2000) found that knowledge in teams has been investigated by several qualitative and quantitative methods, including various elicitation techniques (e.g., cognitive interviewing, observation, card sorting, causal mapping, pairwise ratings) and representation techniques (e.g., MDS, distance ratio formulas, Pathfinder) that utilize aggregate methods.

One of the most common methods for assessing group knowledge is the use of concept maps (Herl et al., 1999; Ifenthaler, 2005; O’Connor and Johnson, 2004; O’Neil et al., 2000). Through concept mapping, the similarity of group mental models can be measured in terms of the proportion of nodes and links shared between one concept map (mental model) and another (Rowe and Cooke, 1995). Several researchers believe that group knowledge and group processes are linked. Research has shown that specific group interactions such as communication and coordination mediate the development of group knowledge and thus mediate group performance (Mathieu et al., 2000). Group interactions coupled with group shared knowledge are a predominant force in the construct of group cognition. As teammates interact, they begin to share knowledge, thus enabling them to interpret cues in similar ways, make compatible decisions, and take proper actions (Klimoski and Mohammed, 1994). Group shared knowledge can help group members explain other members’ actions, understand what is occurring with the task, develop accurate expectations about future member actions and task states, and communicate meanings efficiently.
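To illustrate the kind of proportion-of-shared-elements measure mentioned above, the sketch below compares two concept maps represented as sets of nodes and undirected links and reports the proportion of elements the maps share. Representing maps as node and link sets, and computing the shared proportion relative to the union of both maps, are simplifying assumptions for illustration rather than the exact formula of Rowe and Cooke (1995).

```python
def map_elements(nodes, links):
    """Represent a concept map as a set of nodes plus undirected links."""
    return set(nodes) | {frozenset(link) for link in links}

def shared_proportion(map_a, map_b):
    """Proportion of nodes and links shared between two concept maps,
    relative to all elements appearing in either map."""
    return len(map_a & map_b) / len(map_a | map_b)

# Hypothetical concept maps from two group members
member_1 = map_elements(
    nodes=["motivation", "feedback", "performance"],
    links=[("motivation", "performance"), ("feedback", "performance")],
)
member_2 = map_elements(
    nodes=["motivation", "feedback", "performance", "practice"],
    links=[("feedback", "performance"), ("practice", "performance")],
)

print(f"Shared proportion: {shared_proportion(member_1, member_2):.2f}")
```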

Analyzing knowledge data can certainly involve qualitative methods. These methods tend to offer more detail and depth of information than might be found through statistical analyses (Miles and Huberman, 1994; Patton, 2001). Using qualitative analysis, we obtain greater understanding about the relationships between concepts within the context of the individual mental model. We also gain better insight into the sharedness of understanding between group members. Quantitative data analysis techniques provide researchers with tools to draw inferences about change in group knowledge as well as to test statistically for a change or variation in knowledge structures.

Several methods have been developed to analyze data regarding group knowledge. Most of them include an elicitation and an analysis component. Some techniques use mixed methods, such as the analysis-constructed shared mental model (ACSMM) technique (O’Connor and Johnson, 2004), DEEP (Spector and Koszalka, 2004), and social network analysis (Qureshi, 1995). Other methods are quantitative in nature, such as SMD (Ifenthaler, 2005), Model Inspection Trace of Concepts and Relations (MITOCAR) (Pirnay-Dummer, 2006), multidimensional scaling (MDS), the distance ratio formula, and Pathfinder (Cooke et al., 2000).

Group product data are the artifacts created from a group interaction. Group products typically do not capture the process that a group undertook to create the product but are evidence of the group’s abilities. Many research studies that claim to study group processes capture only group product data. This is due in part to the claim that is made regarding the link between group products and group processes and characteristics (Cooke et al., 2000; Lesh and Dorr, 2003; Mathieu et al., 2000; Salas and Cannon-Bowers, 2001; Schweiger, 1986). Although some evidence suggests this relationship in a few areas, more research is required to substantiate this claim.

Analysis of group product data involves techniques used when analyzing individual products. Analyzing the quality of a product can be facilitated by the use of specified criteria. These criteria are used to create a product rating scale. Rating scales can include numerical scales, descriptive scales, or checklists. Numerical scales present a range of numbers (usually sequential) that are defined by a label on either end. Each item is rated according to the numerical scale. There is no specific definition of what the various numbers mean, except for the indicators at the ends of the scale; for example, a scale from 1 (very weak) to 5 (very strong) is very subjective but relatively easy to create. Descriptive scales are similar but focus on verbal statements. Numbers can be assigned to each statement. Statements are typically in a logical order. A common example of a descriptive scale is “strongly disagree (1), disagree (2), neutral (3), agree (4), and strongly agree (5).” A checklist can be developed to delineate specific qualities for a given criterion. This can provide a high level of reliability because a specific quality is presented and the rater simply indicates whether an item is present or not. Ensuring the validity of a checklist requires a careful task analysis.
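As a small illustration of the checklist approach, the sketch below scores a group product against a list of yes/no quality criteria and reports the proportion of criteria met. The criteria and ratings are invented for illustration and would in practice come from the task analysis described above.

```python
# Hypothetical checklist: criterion -> whether the rater judged it present
checklist_ratings = {
    "states the design problem clearly": True,
    "identifies the target learners": True,
    "specifies measurable objectives": False,
    "aligns assessment with objectives": True,
    "cites relevant sources": False,
}

criteria_met = sum(checklist_ratings.values())
total_criteria = len(checklist_ratings)
print(f"Criteria met: {criteria_met}/{total_criteria} "
      f"({criteria_met / total_criteria:.0%})")
```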

General Considerations for Group Learning Process Assessment

In the assessment of group learning processes, researchers should consider several issues. These issues fall into four categories: (1) group setting, (2) variance in group member participation, (3) overall approach to data collection and analysis, and (4) thresholds.

Group Setting

To a somewhat lesser degree than the other three issues, group settings should be considered when determining the best approach and methods for a particular study. Finalizing which techniques to use may depend on whether the groups will be working in a group learning setting or individually and then coming together as a group at various points. Some groups may meet in face-to-face settings or other settings that allow for synchronous interactions; however, distributed groups have technology-enabled synchronous and asynchronous tools, or asynchronous interactions only. These variations in group setting can influence the selection of specific group learning process assessment methods.

Variance in Group Member Participation

When collecting multiple sets of data over time, researchers should consider how they will deal with variance in group member participation (group members absent during data collection or new members joining the group midstream). There are benefits and consequences for any decision made, but it is necessary to determine whether or not all data collected will be used, regardless of the group members present at the time of data collection. Researchers who choose not to use all data might consider using only those data submitted by group members who were present during all data collection sessions (O’Connor and Johnson, 2004). If data analysis will be based on a consistent number of group members, it will be necessary to consider how to handle data from groups that may not have the same group members present at each measurement point. Also, with fluctuations in group composition, it is important to consider the overall group demographics and possible influences of individual group members on the group as a whole.

Overall Approach to Data Collection and Analysis

In a holistic approach, individual group members work together and one dataset represents the group as a whole; however, the process of group interaction naturally changes how individual group members think. The alternative is to capture individual measures and perform some type of aggregate analysis to represent the group; however, researchers should consider whether or not the aggregate is a true representation of the group itself.

Thresholds

When using indirect measures that require aggregation or manipulation of data prior to analysis, researchers will have to consider such issues as similarity scores. These scores define the parameters for deciding whether responses from one individual group member are similar to the responses from other group members (O’Connor and Johnson, 2004; Rentsch and Hall, 1994; Rentsch et al., in press); for example, will a rating of 3.5 on a 5-point scale be considered similar to a rating of 4.0 or a rating of 3.0? When aggregating individual data into a representation of the group, will the study look only at groups where a certain percentage of the group responded to measures (Ancona and Caldwell, 1991)? How will what is similar or shared across individual group members be determined? Will the analysis use counts (x number of group members) or a percentage of the group (e.g., 50%)? What level of similarity or sensitivity will be used to compare across groups (O’Connor and Johnson, 2004): 50%? 75%? What about the level of mean responses in questionnaires (Urch Druskat and Kayes, 2000)? Many thresholds that must be considered when assessing group learning processes and analyzing group data are not concerns when studying individuals.
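The sketch below shows one way such threshold decisions can be made explicit in analysis code: ratings from individual group members are judged as shared only if they fall within a chosen tolerance of one another, and a group counts as sharing the response only if a chosen percentage of members agree. The tolerance of 0.5 scale points and the 75% cutoff are arbitrary illustrative choices, not recommendations from the cited work.

```python
def shared_response(member_ratings, tolerance=0.5, required_share=0.75):
    """Decide whether a group 'shares' a rating: a rating counts as shared when
    enough members (required_share) fall within `tolerance` of the group median."""
    ratings = sorted(member_ratings)
    median = ratings[len(ratings) // 2]
    close_enough = [r for r in ratings if abs(r - median) <= tolerance]
    return len(close_enough) / len(ratings) >= required_share

# Hypothetical 5-point ratings from two groups
group_a = [3.5, 4.0, 4.0, 3.5]   # tightly clustered
group_b = [1.0, 3.0, 5.0, 4.0]   # widely spread

print("Group A shares the response:", shared_response(group_a))  # True
print("Group B shares the response:", shared_response(group_b))  # False
```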

Conclusion

Assessment of group learning processes is more complex than assessment of individual learning processes because of the additional considerations necessary for selecting data collection and analysis methods. As in most research, the “very type of experiment set up by researchers will determine the type of data and therefore what can be done with such data in analysis and interpretation” (Langan-Fox et al., 2004, p. 348).



Indeed, it is logical to allow the specific research questions to drive the identification of data collection methods. The selection of research questions and subsequent identification of data collection methods will naturally place limitations on suitable data analysis methods. Careful planning for the study of group learning processes, from the selection of direct or indirect assessment measures to considering the possible influences group characteristics may have on group learning processes, is essential.

Because of the many possible combinations of methods and techniques available for studying group learning processes, some feel that research has not yet done enough to determine the best methods for studying groups (Langan-Fox et al., 2000). Many group learning process studies consider only outcome measures and do not directly study group learning processes (Worchel et al., 1992). Others look only at portions of the group process or attempt to assess group learning processes through a comparison of discrete measures of the group over time. Still other methods for data collection and analysis of group data are under development (Seel, 1999). No one best method has been identified for analyzing group learning process data, so we suggest that studies consider utilizing multiple methods to obtain a more comprehensive picture of group learning processes. If we are to better understand the notion of group learning processes and utilize that understanding in the design, implementation, and management of learning groups in the future, then we must address the basic issues related to conceptualization and measurement (Langan-Fox et al., 2004).

ASSESSMENT OF COMPLEX PERFORMANCE

Tamara van Gog, Remy M. J. P. Rikers, and Paul Ayres

This chapter section discusses assessment of complex performance from an educational research perspective, in terms of data collection and analysis. It begins with a short introduction on complex performance and a discussion of the various issues related to selecting and defining appropriate assessment tasks, criteria, and standards that give meaning to the assessment. Although many of the issues discussed here are also important for performance assessment in educational practice, readers particularly interested in this topic might want to refer, for example, to Chapter 44 in this Handbook or the edited books by Birenbaum and Dochy (1996) and Segers et al. (2003). For a discussion of laboratory setups for data collection, see the section by Duley et al. in this chapter.

Complex performance can be defined as performance on complex tasks; however, definitions of task complexity differ. Campbell (1988), in a review of the literature, categorized complexity as primarily subjective (psychological), objective (a function of objective task characteristics), or an interaction between objective and subjective (individual) characteristics. Campbell reported that the subjective perspective emphasized psychological dimensions such as task significance and identity. Objective definitions, on the other hand, consider the degree of structuredness of a task or the possibility of multiple solution paths (Byström and Järvelin, 1995; Campbell, 1988). When the process of task performance can be described in detail a priori (very structured), a task is considered less complex; in contrast, when there is a great deal of uncertainty, it is considered highly complex. Similarly, complexity can vary according to the number of possible solution paths. When there is just one correct solution path, a task is considered less complex than when multiple paths can lead to a correct solution or when multiple solutions are possible.

For the interaction category, Campbell (1988) argued that both the problem solver and the task are important. By defining task complexity in terms of cognitive load (Chandler and Sweller, 1991; Sweller, 1988; Sweller et al., 1998), an example of this interaction can readily be shown. From a cognitive load perspective, complexity is defined by the number of interacting information elements a task contains, which have to be handled simultaneously in working memory. As such, complexity is influenced by expertise (i.e., a subjective, individual characteristic); what may be a complex task for a novice may be a simple task for an expert, because a number of elements have been combined into a cognitive schema that can be handled as a single element in the expert’s working memory. Tasks that are highly complex according to the objective definition (i.e., lack of structuredness and multiple possible solution paths) will also be complex under the interaction definition; however, according to the latter definition, even tasks with a high degree of structuredness or one correct solution path can be considered complex, given a high number of interacting information elements or low performer expertise.

In this chapter section, we limit our discussion to methods of assessment of complex performance on cognitive tasks. It is important to note throughout this discussion that the methods described here can be used to assess (improvements in) complex performance both during training and after training, depending on the research questions one seeks to address. After training, performance assessment usually has the goal of assessing learning, which is a goal of many studies in education and instructional design. If one seeks to assess learning, one must be careful not to conclude that participants have learned simply because their performance improved during training. As Bjork (1999) points out, depending on the training conditions, high performance gains during training may not be associated with learning, whereas low performance gains may be. It is important, therefore, to assess learning on retention or transfer tasks instead of on practice tasks. Selection of appropriate assessment tasks is an important issue, which is addressed in the next section.

Assessment Tasks

An essential step in the assessment of performance is the identification of a collection of representative tasks that capture those aspects of the participant’s knowledge and skills that a study seeks to address (Ericsson, 2002). Important factors for the representativeness of the collection of assessment tasks are the authenticity, number, and duration of tasks, all of which are highly influenced by the characteristics of the domain of study.

Selecting tasks that adequately capture performance often turns out to be very difficult. Selecting atypical or artificial tasks may even impede learners in demonstrating their true level of understanding. Traditional means of evaluating learners’ knowledge or skills have been criticized because they often fail to demonstrate that learners can actually do something in real life or in their future workplace with the knowledge and skills they have acquired during their training (see, for example, Anderson et al., 1996; Shepard, 2000; Thompson, 2001).

The argument for the use of authentic tasks to assess learners’ understanding has a long tradition. It started in the days of John Dewey (1916) and continues to the present day (Merrill, 2002; van Merriënboer, 1997); however, complete authenticity of assessment tasks may be difficult to realize in research settings, because the structuredness of the domain plays a role here. For structured domains such as chess and bridge, the same conditions can be reproduced in a research laboratory as those under which performance normally takes place; for less structured or ill-structured domains, this is difficult or even impossible to do (Ericsson and Lehmann, 1996). Nonetheless, one can always strive for a high degree of authenticity. Gulikers et al. (2004) defined authentic assessment as a five-dimensional construct (i.e., task, social context, physical context, form/result, and criteria) that can vary from low to high on each of the dimensions.

The number of tasks in the collection and their duration are important factors influencing the reliability and generalizability of a study. Choosing too few tasks or tasks of too short a duration will negatively affect reliability and generalizability. On the other hand, choosing a large number of tasks or tasks of a very long duration will lead to many practical problems and might exhaust both participants and researchers. In many complex domains (e.g., medical diagnosis), it is quite common and often inevitable to use a very small set of cases because of practical circumstances and because detailed analysis of learners’ responses to these complex problems is very difficult and time consuming (Ericsson, 2004; Ericsson and Smith, 1991). Unfortunately, however, there are no golden rules for determining the adequate number of tasks to use or their duration, because the important factors are highly dependent on the domain and specific context (Van der Vleuten and Schuwirth, 2005). It is often easier to identify a small collection of representative tasks that capture the relevant aspects of performance in highly structured domains (e.g., physics, mathematics, chess) than in ill-structured domains (e.g., political science, medicine), where a number of interacting complex skills are required.

Assessment Criteria and Standards

The term assessment criteria refers to a description of the elements or aspects of performance that will be assessed, and the term assessment standards refers to a description of the quality of performance (e.g., excellent/good/average/poor) on each of those aspects that can be expected of participants at different stages (e.g., age, grade) (Arter and Spandel, 1992). As Woolf (2004) pointed out, however, the term assessment criteria is often used in the definition of standards as well. Depending on the question one seeks to answer, different standards can be used, such as a participant’s past performance (self-referenced), peer group performance (norm-referenced), or an objective standard (criterion-referenced), and there are different methods for setting standards (Cascallar and Cascallar, 2003). Much of the research on criteria and standard setting has been conducted in the context of educational practice for national (or statewide) school tests (Hambleton et al., 2000) and for highly skilled professions, such as medicine, where the stakes of setting appropriate standards are very high (Hobma et al., 2004; Van der Vleuten and Schuwirth, 2005). Although formulation of good criteria and standards is extremely important in educational practice, where certification is the prime goal, it is no less important in educational research settings. What aspects of performance are measured and what standards are set have a major impact on the generalizability and value of a study.



The degree to which the domain is well structured influences not only the creation of a collection of representative tasks but also the definition of criteria, the setting of standards, and the interpretation of performance in relation to standards. In highly structured domains, such as mathematics or chess, assessing the quality of the learner’s response is often fairly straightforward and unproblematic. In less structured domains, however, it is often much more difficult to identify clear standards; for example, a music student’s interpretation of a piano concerto is more difficult to assess than the student’s technical performance on the piece. The former contains many more subjective elements (e.g., taste) or cultural differences than the latter.

Collecting Performance Data

No one best method for complex performance assessment exists, and it is often advisable to use multiple measures or methods in combination to obtain as complete a picture as possible of the performance. A number of methods are described here for collecting performance outcome (product) and performance process data. Methods are classified as online (during task performance) or offline (after task performance). Which method or combination of methods is the most useful depends on the particular research question, the possible constraints of the research context, and the domain. In ill-structured domains, for example, the added value of process measures may be much higher than in highly structured domains.

Collecting Performance Outcome (Product) Data

Collecting performance outcome data is quite straightforward. One takes the product of performance (e.g., an electrical circuit that was malfunctioning but is now repaired) and scores it along the defined criteria (e.g., do all the components function as they should, individually and as a whole?). Instead of assigning points for correct aspects, one can count the number of errors and analyze the types of errors made. However, especially for assessment of complex performance, collecting performance product data alone is not very informative. Taking into account the process leading up to the product, and the cognitive costs at which it was obtained, provides equally if not more important information.

Collecting Performance Process Data

Time on Task or Speed

An important indication of the level of mastery of a particular task is the time needed to complete it. According to the power law of practice (Newell and Rosenbloom, 1981; VanLehn, 1996), the time needed to complete a task decreases in proportion to the time spent in practice, raised to some power. Newell and Rosenbloom (1981) found that this law operates across a broad range of tasks, from solving geometry problems to keyboard typing. To account for the power law of practice, several theories have been put forward. Anderson's ACT-R explains the speed-up by assuming that slow declarative knowledge is transformed into fast procedural knowledge (Anderson, 1993; Anderson and Lebiere, 1998). Another explanation suggested that speed-up is the result of repeated encounters with meaningful patterns (Ericsson and Staszewski, 1989); that is, as a result of frequent encounters with similar elements, these elements will no longer be perceived as individual units but will be perceived as a meaningful whole (i.e., chunk). In addition to chunking, automation processes (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977) occur with practice that allow for faster and more effortless performance. In summary, as expertise develops, equal performance can be attained in less time; therefore, it is important to collect time-on-task data to assess improvements in complex performance.
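
As an illustration of how the power law of practice can be examined in time-on-task data, the following sketch fits T = a * N^(-b), where T is the time needed on the Nth practice trial, by linear regression on the log-log transformed data. The practice times are invented example values, not data from any study cited here.

```python
# Minimal sketch: fitting the power law of practice, T = a * N**(-b),
# to hypothetical time-on-task data via log-log linear regression.
import numpy as np

trial = np.arange(1, 11)                                        # practice trials 1..10
time_sec = np.array([62, 45, 38, 33, 30, 27, 26, 24, 23, 22])   # invented completion times

# log(T) = log(a) - b * log(N), so fit a straight line in log-log space
slope, intercept = np.polyfit(np.log(trial), np.log(time_sec), 1)
a, b = np.exp(intercept), -slope

print(f"Estimated T = {a:.1f} * N^(-{b:.2f})")
predicted = a * trial ** (-b)   # fitted completion times for each trial
```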

Cognitive Load

The same processes of chunking and automation that are associated with decreases in the time required to perform a task are also responsible for decreases in the cognitive load imposed by performing the task (Paas and van Merriënboer, 1993; Yeo and Neal, 2004). Cognitive load can be measured using both online and offline techniques. The cognitive capacity that is allocated to performing the task is defined as mental effort, which is considered to reflect the actual cognitive load a task imposes (Paas and van Merriënboer, 1994a; Paas et al., 2003). A subjective but reliable technique for measuring mental effort is having individuals provide self-ratings of the amount of mental effort invested. A single-scale subjective rating instrument can be used, such as the nine-point rating scale developed by Paas (1992), or a multiple-scale instrument, such as the NASA Task Load Index (TLX), which was used, for example, by Gerjets et al. (2004, 2006). As subjective cognitive load measures are usually recorded after each task or after a series of tasks has been completed, they are usually considered to be offline measurements, although there are some exceptions; for example, Ayres (2006) required participants to rate cognitive load at specific points within tasks.

Objective online measures include physiological measures such as heart-rate variability (Paas and van Merriënboer, 1994b), eye-movement data, and secondary-task procedures (Brünken et al., 2003). Because they are taken during task performance, those online measures can show fluctuations in cognitive load during task performance. It is notable, however, that Paas and van Merriënboer (1994b) found the heart-rate variability measure to be quite intrusive as well as insensitive to subtle fluctuations in cognitive load. The subjective offline data are often easier to collect and analyze and provide a good indication of the overall cognitive load a task imposed (Paas et al., 2003).

Actions: Observation and Video Records

Process-tracing techniques are very well suited to assessing the different types of actions taken during task performance, some of which are purely cognitive, whereas others result in physical actions, because the "data that are recorded are of a pre-specified type (e.g., verbal reports, eye movements, actions) and are used to make inferences about the cognitive processes or knowledge underlying task performance" (Cooke, 1994, p. 814). Ways to record data that allow the inference of cognitive actions are addressed in the following sections. The following options are available for recording the physical actions taken during task performance: (1) trained observers can write down the actions taken or check them off on an a priori constructed list (use multiple observers), (2) a (digital) video record of the participants' performance can be made, or (3) for computer-based tasks, an action record can be made using screen recording software or software that logs key presses and coordinates of mouse clicks.
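
As a rough sketch of option (3), the code below logs timestamped key presses and mouse clicks to a file. It assumes the third-party pynput package is installed; the output file name is a placeholder, and a real study would also record trial and participant identifiers.

```python
# Minimal sketch: timestamped logging of key presses and mouse clicks,
# assuming the pynput package. Listener callbacks run in separate threads,
# so a lock guards the shared CSV writer.
import csv
import threading
import time
from pynput import keyboard, mouse

log_lock = threading.Lock()
log_file = open("action_log.csv", "w", newline="")
writer = csv.writer(log_file)
writer.writerow(["timestamp", "event", "detail"])

def log(event, detail):
    with log_lock:
        writer.writerow([time.time(), event, detail])

def on_press(key):
    log("key_press", str(key))

def on_click(x, y, button, pressed):
    if pressed:
        log("mouse_click", f"{button} at ({x}, {y})")

keyboard_listener = keyboard.Listener(on_press=on_press)
mouse_listener = mouse.Listener(on_click=on_click)
keyboard_listener.start()
mouse_listener.start()

time.sleep(60)              # record for the duration of the task (here, 60 sec)
keyboard_listener.stop()
mouse_listener.stop()
log_file.close()
```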

Attention and Cognitive Actions: Eye-Movement Records

Eye tracking (Duchowski, 2003), that is, recording eye-movement data while a participant is working on a (usually, but not necessarily, computer-based) task, can also be used to gather online performance process data but is much less used in educational research than the above methods. Eye-movement data give insights into the allocation of attention and provide a researcher with detailed information on what a participant is looking at, for how long, and in what order. Such data allow inferences to be made about cognitive processes (Rayner, 1998), albeit cautious inferences, as the data do not provide information on why a participant was looking at something for a certain amount of time or in a certain order. Attention can shift in response to exogenous or endogenous cues (Rayner, 1998; Stelmach et al., 1997). Exogenous shifts of attention occur mainly in response to environmental features or changes in the environment (e.g., if something brightly colored were to start flashing in the corner of a computer screen, your attention would be drawn to it). Endogenous shifts are driven by knowledge of the task, of the environment, and of the importance of available information sources (i.e., influenced by expertise level) (Underwood et al., 2003). In chess, for example, it was found that experts fixated proportionally more on relevant pieces than non-expert players (Charness et al., 2001). In electrical circuits troubleshooting, van Gog et al. (2005a) also found that participants with higher expertise fixated more on a fault-related component during problem orientation than participants with lower expertise.* Haider and Frensch (1999) used eye-movement data to corroborate their information-reduction hypothesis, which states that with practice people learn to ignore task-redundant information and limit their processing to task-relevant information.

* Note that the expertise differences between groups were relatively small (i.e., this was not an expert–novice study), suggesting that eye-movement data may be a useful tool in investigating relatively subtle expertise differences or expertise development.

On tasks with many visual performance aspects (e.g., troubleshooting technical systems), eye-movement records may therefore provide much more information than video records. Some important problem-solving actions may be purely visual or cognitive, but those will show up in an eye-movement record, whereas a video record will only allow inferences of visual or cognitive actions that resulted in manual or physical actions (van Gog et al., 2005b). In addition to providing information on the allocation of attention, eye-movement data can also give information about the cognitive load that particular aspects of task performance impose; for example, whereas pupil dilation (Van Gerven et al., 2004) and fixation duration (Underwood et al., 2004) are known to increase with increased processing demands, the length of saccades (i.e., rapid eye movements from one location to another; see Duchowski, 2003) is known to decrease. (For an in-depth discussion of eye-movement data and cognitive processes, see Rayner, 1998.)

Thought Processes and Cognitive Actions: Verbal Reports

Probably the most widely used verbal reporting techniques are concurrent and retrospective reporting (Ericsson and Simon, 1993). As their names imply, concurrent reporting is an online technique, whereas retrospective reporting is an offline technique. Concurrent reporting, or thinking aloud, requires participants to verbalize all thoughts that come to mind during task performance. Retrospective reporting requires participants to report the thoughts they had during task performance immediately after completing it. Although there has been considerable debate over the use of verbal reports as data, both methods are considered to allow valid inferences to be made about the cognitive processes underlying task performance, provided that verbalization instructions and prompts are carefully worded (Ericsson and Simon, 1993).

Instructions and prompts should be worded in such a way that the evoked responses will not interfere with the cognitive processes as they occur during task performance; for example, instructions for concurrent reporting should tell participants to think aloud and verbalize everything that comes to mind but should not ask them to explain any thoughts. Prompts should be as unobtrusive as possible. Prompting participants to "keep thinking aloud" is preferable over asking them "what are you thinking?" because the latter would likely evoke self-reflection and, hence, interfere with the cognitive processes. Deviations from these instructional and prompting techniques can change either the actual cognitive processes involved or the processes that were reported, thereby compromising the validity of the reports (Boren and Ramey, 2000; Ericsson and Simon, 1993). Magliano et al. (1999), for example, found that instructions to explain, predict, associate, or understand during reading influenced the inferences from the text that participants generated while thinking aloud. Although the effect of instructions on cognitive processes is an interesting topic of study, when the intention is to elicit reports of the actual cognitive processes as they would occur without intervention, Ericsson and Simon's (1993) guidelines for wording instructions and prompts should be adhered to.

Both reporting methods can result in verbal protocols that allow for valid inferences about cognitive processes; however, the potential for differences in the information they contain must be considered when choosing an appropriate method for answering a particular research question. According to Taylor and Dionne (2000), concurrent protocols mostly seem to provide information on actions and outcomes, whereas retrospective protocols seem to provide more information about "strategies that control the problem solving process" and "conditions that elicited a particular response" (p. 414). Kuusela and Paul (2000) reported that concurrent protocols contained more information than retrospective protocols, because the latter often contained only references to the effective actions that led to the solution. van Gog et al. (2005b) investigated whether the technique of cued retrospective reporting, in which a retrospective report is cued by a replay of a record of eye movements and mouse/keyboard operations made during the task, would combine the advantages of concurrent (i.e., more action information) and retrospective (i.e., more strategic and conditional information) reporting. They found that both concurrent and cued retrospective reporting resulted in more action information, as well as in more strategic and conditional information, than retrospective reporting without a cue.

Contrary to expectations, concurrent reporting resulted in more strategic and conditional information than retrospective reporting. This may (1) reflect a genuine difference from Taylor and Dionne's results, (2) have been due to different operationalizations of the information types in the coding scheme used, or (3) have been due to the use of a different segmentation method than those used by Taylor and Dionne (2000).

An explanation for the finding that concurrent reports result in more information on actions than retrospective reports may be that concurrent reporting occurs online rather than offline. Whereas concurrent reports capture information available in short-term memory during the process, retrospective reports reflect memory traces of the process retrieved from short-term memory when tasks are of very short duration or from long-term memory when tasks are of longer duration (Camps, 2003; Ericsson and Simon, 1993). It is likely that only the correct steps that have led to attainment of the goal are stored in long-term memory, because only these steps are relevant for future use. This is why having participants report retrospectively based on a record of observations or intermediate products of their problem-solving process is known to lead to better results (due to fewer omissions) than retrospective reporting without a cue (van Gog et al., 2005b; Van Someren et al., 1994). Possibly, the involvement of different memory systems might also explain Taylor and Dionne's (2000) finding that retrospective protocols seem to contain more conditional and strategic information. This knowledge might have been used during the process but may have been omitted in concurrent reporting as a result of the greater processing demands this method places on short-term memory (Russo et al., 1989). Although this explanation is tentative, there are indications that concurrent reporting may become difficult to maintain under high cognitive load conditions (Ericsson and Simon, 1993). Indeed, participants in van Gog et al.'s study who experienced a higher cognitive load (i.e., reported investment of more mental effort) in performing the tasks indicated during a debriefing after the experiment that they disliked concurrent reporting and preferred cued retrospective reporting (van Gog, 2006).


Neuroscientific Data

An emerging and promising area of educational research is the use of neuroscience methodologies to study (changes in) brain functions and structures directly, which can provide detailed data on learning processes, memory processes, and cognitive development (see, for example, Goswami, 2004; Katzir and Paré-Blagoev, 2006). Methods such as magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG), positron-emission tomography (PET), and single-photon emission computed tomography (SPECT) provide (indirect) measures of neuronal activity. The reader is referred to Katzir and Paré-Blagoev (2006) for a discussion of these methods and examples of their use in educational research.

Data Analysis

Analyzing performance product, time on task, and mental effort data (at least when the subjective rating scales are used) is a very straightforward process, so it is not discussed here. In this section, analysis of observation, eye movement, and verbal protocol data is discussed, as well as the analysis of combined methods/measures.

Analysis of Observation, Eye Movement, and Verbal Protocol Data

Observation Data

Coding and analysis of observation data can take many different forms, again depending on the research question. Coding schemes are developed based on the performance aspects (criteria) one wishes to assess and sometimes may incorporate evaluation of performance aspects. Whether coding is done online (during performance, by observers) or offline (after performance, based on video, screen capture, or mouse-keyboard records), the use of multiple observers or raters is important for determining reliability of the coding. Quantitative analysis of the coded data can take the form of comparing frequencies, appropriateness (e.g., number of errors), or sequences of actions (e.g., to an ideal or expert sequence) and interpreting the outcome in relation to the set standard.
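
As one simplified illustration of sequence comparison, the sketch below computes the Levenshtein edit distance between a participant's coded action sequence and a hypothetical expert reference sequence. The action codes are invented placeholders, and other sequential techniques (e.g., lag-sequential analysis) may be more appropriate depending on the research question.

```python
# Minimal sketch: comparing a coded action sequence against a reference
# (e.g., expert) sequence using Levenshtein edit distance.
def edit_distance(seq_a, seq_b):
    """Number of insertions, deletions, and substitutions turning seq_a into seq_b."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[-1][-1]

expert = ["inspect", "measure", "isolate_fault", "replace", "test"]      # placeholder codes
participant = ["inspect", "replace", "measure", "replace", "test"]
print(edit_distance(participant, expert))   # smaller = closer to the expert sequence
```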

Several commercial and noncommercial software programs have been developed to assist in the analysis of action data;* for example, Observer (Noldus et al., 2000) is commercial software for coding and analysis of digital video records; NVivo (Bazeley and Richards, 2000) is commercial software for accessing, shaping, managing, and analyzing non-numerical qualitative data; Multiple Episode Protocol Analysis (MEPA) (Erkens, 2002) is free software for annotating, coding, and analyzing both nonverbal and verbal protocols; and ACT Pro (Fu, 2001) can be used for sequential analysis of protocols of discrete user actions such as mouse clicks and key presses.

* Please note that this is not an exhaustive overview and that we have no commercial or other interest in any of the programs mentioned here.

Eye-Movement Data

For analysis of fixation data, it is important to identify the gaze data points that together represent fixations. This is necessary because during fixation the eyes are not entirely motionless; small tremors and drifts may occur (Duchowski, 2003). According to Salvucci (1999), the three categories of fixation identification methods are based on velocity, dispersion, or region. Most eye-tracking software allows for defining values for the dispersion-based method, which identifies fixation points as a minimum number of data points that are grouped closely together (i.e., fall within a certain dispersion, defined by pixels) and last a minimum amount of time (duration threshold). Once fixations have been defined, defining areas of interest (AoIs) in the stimulus materials will make analysis of the huge data files more manageable by allowing summaries of fixation data to be made for each AoI, such as the number of fixations, the mean fixation duration, and the total time spent fixating. Furthermore, a chronological listing of fixations on AoIs can be sequentially analyzed to detect patterns in viewing behavior.
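
To make the dispersion-based method concrete, the following sketch identifies fixations from raw gaze samples using a simplified I-DT-style procedure in the spirit of Salvucci (1999). The threshold values are arbitrary examples that would have to be tuned to the eye tracker and setup used, and commercial eye-tracking software provides equivalent built-in routines.

```python
# Minimal sketch of dispersion-based fixation identification.
# Gaze samples are (time_ms, x, y) tuples; thresholds are example values.
def dispersion(points):
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def identify_fixations(samples, max_dispersion=50, min_duration=100):
    fixations = []   # (start_ms, end_ms, centroid_x, centroid_y)
    i = 0
    while i < len(samples):
        # start with the smallest window spanning the minimum duration
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= len(samples):
            break
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # grow the window until the dispersion threshold would be exceeded
            while j + 1 < len(samples) and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1   # no fixation starts here; move on by one sample
    return fixations
```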

Verbal Protocol Data

When verbal protocols have been transcribed, they can be segmented and coded. Segmentation based on utterances is highly reliable because it uses pauses in natural speech (Ericsson and Simon, 1993); however, many researchers apply segmentation based on meaning (Taylor and Dionne, 2000). In this case, segmentation and coding become intertwined, and the reliability of both should be evaluated. It is, again, important to use multiple raters (at least on a substantial subset of data) and determine the reliability of the coding scheme. The standard work by Ericsson and Simon (1993) provides a wealth of information on verbal protocol coding and analysis techniques. The software program MEPA (Erkens, 2002) can assist in the development of a coding scheme for verbal data, as well as in analysis of coded data with a variety of quantitative or qualitative methods.
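
One common way to quantify the reliability of such coding when two raters code the same segments is Cohen's kappa, sketched below. The segment codes are invented placeholders, and dedicated statistics packages offer equivalent and more general routines (e.g., for more than two raters).

```python
# Minimal sketch: Cohen's kappa for two raters who each assigned one code
# per protocol segment. The codes below are invented placeholders.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

rater_1 = ["action", "strategy", "action", "condition", "action", "strategy"]
rater_2 = ["action", "strategy", "condition", "condition", "action", "action"]
print(round(cohens_kappa(rater_1, rater_2), 2))
```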



Combining Methods and Measures

As mentioned before, there is not a preferred single method for the assessment of complex performances. By combining different methods and measures, a more complete or a more detailed picture of performance will be obtained; for example, various process-tracing techniques such as eye tracking and verbal reporting can be collected and analyzed in combination with other methods of assessment (van Gog et al., 2005a). Different product and process measures can easily be combined, and it can be argued that some of them should be combined, because a simple performance score* ignores the fact that, with expertise development, time on task and cognitive load decrease, whereas performance increases.

* This term is somewhat ambiguous, as we have previously classified mental effort and time-on-task data as performance process data. We feel they should be regarded as such; however, in the literature performance score is often used to refer to the grade assigned to a solution or solution procedure, which is the sense in which the term is used in this subsection.

Consider the example of a student who attains the same performance score on two comparable tasks that are spread over time, where cognitive load measures indicate that the learner had to invest a lot of mental effort to complete the task the first time and little the second. Looking only at the performance score, one might erroneously conclude that no progress was made, whereas the learner actually made a subtle step forward, because reduced cognitive load means that more capacity can be devoted to further learning.

The mental efficiency measure developed by Paas and van Merriënboer (1993) reflects this relation: Higher performance with less mental effort invested to attain that performance results in higher efficiency. This measure is obtained by standardizing performance and mental effort scores, and then subtracting the mean standardized mental effort score (zE) from the mean standardized performance score (zP) and dividing the outcome by the square root of 2:

Efficiency = (zP − zE) / √2

When tasks are performed under time constraints, the combination of mental effort and performance measures will suffice; however, when time on task is self-paced, it is useful to include the additional time parameter in the efficiency measure (making it three-dimensional), where zT is the mean standardized time-on-task score (Paas et al., 2003; Tuovinen and Paas, 2004):

Efficiency = (zP − zE − zT) / √3
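
The sketch below computes both the two-dimensional and the three-dimensional efficiency scores from performance, mental effort, and time-on-task data. The numbers are invented example values, and z-scores are taken across the listed participants; in practice, the reference group for standardization depends on the design.

```python
# Minimal sketch: two- and three-dimensional efficiency
# (Paas and van Merriënboer, 1993; Tuovinen and Paas, 2004), invented data.
import numpy as np

performance = np.array([6.0, 8.0, 5.0, 9.0, 7.0])    # e.g., test scores
effort = np.array([7.0, 4.0, 8.0, 3.0, 5.0])          # e.g., 9-point mental effort ratings
time_on_task = np.array([340, 260, 390, 220, 300])    # seconds

def z(scores):
    return (scores - scores.mean()) / scores.std(ddof=1)

zP, zE, zT = z(performance), z(effort), z(time_on_task)
efficiency_2d = (zP - zE) / np.sqrt(2)
efficiency_3d = (zP - zE - zT) / np.sqrt(3)

for p, e2, e3 in zip(range(1, 6), efficiency_2d, efficiency_3d):
    print(f"participant {p}: 2D efficiency = {e2:+.2f}, 3D efficiency = {e3:+.2f}")
```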

Discussion

Much of the research into learning and instruction involves assessment of complex performances of cognitive tasks. The focus of this chapter section was on data collection and analysis methods that can be used for such assessments. First, the important issues related to selecting an appropriate collection of assessment tasks and defining appropriate assessment criteria and standards were discussed. Then, different ways of collecting performance product and process data, using online (during task performance) or offline (after task performance) measurements, were described. Analysis techniques were discussed and, given the lack of a single preferred method for complex performance assessment, ways to combine measures were suggested that will foster a more complete or more detailed understanding of complex performance.

This chapter section aimed to provide an overview of the important issues in assessment of complex performance on cognitive tasks and of available data collection and analysis techniques for such assessments, rather than any definite guidelines. The latter would be impossible when writing for a broad audience, because what constitutes an appropriate collection of tasks, appropriate criteria and standards, and appropriate data collection and analysis techniques is highly dependent on the research question one seeks to address and on the domain in which one wishes to do so. We hope that this overview, along with other chapter sections, provides the reader with a starting point for further development of rewarding and informative studies.

SETTING UP A LABORATORY FOR MEASUREMENT OF COMPLEX PERFORMANCES

Aaron R. Duley, Paul Ward, and Peter A. Hancock

This chapter section describes how to set up laboratories for the measurement of complex performance. Complex performance in this context does not exclusively refer to tasks that are inherently difficult to perform; rather, the term is used here in a broader sense to refer to the measurement of real-world activities that require the integration of disparate measurement instrumentation as well as the need for time-critical experimental control. We have assumed that our primary readership is comprised of graduate students and research faculty, although the chapter addresses issues relevant to all who seek a better understanding of behavioral response.


The central theme of this section relates to laboratory instrumentation. Because instrumentation is a requisite element for complex performance measurement, a common problem encountered by researchers is how to overcome the various technical hurdles that often discourage the pursuit of difficult research objectives. Thus, creating a testing environment suitable to address research questions is a major issue when planning any research program; however, searching the literature for resources relating to laboratory instrumentation configurations yields a surprisingly scant number of references and resources that address these issues. Having made just such an attempt for the purposes of this section, we can attest that articulating a general-purpose exposition on laboratory setup is a challenging endeavor. This pursuit is made more difficult by addressing a naturally ambiguous topic such as complex performance; nevertheless, our section looks to provide the bearings needed to resolve such questions. In particular, we cover stimulus presentation and control alternatives, as well as hardware choices for signal routing and triggering, while offering solutions for commonly encountered problems when attempting to assemble such a laboratory. Some portions of this section are moderately technical, but every attempt has been made to ensure that the content is appropriate for our target audience.

Instrumentation and Common Configurations

Psychology has a long legacy of employing tools and instrumentation to support scientific inquiry. The online Museum of the History of Psychological Instrumentation, for example, has illustrations of over 150 devices used by early researchers to visualize organ function and systematically investigate human psychological processes and behavior (see http://www.chss.montclair.edu/psychology/museum/museum.htm). At this museum, one can view such devices as an early Wundt-style tachistoscope or the Rotationsapparatus für Komplikations-Versuche (rotary apparatus for complication studies). Titchener, a student of Wundt, continued this tradition in his own laboratory at Cornell University and described the building requirements and the costs associated with items needed for establishing the ideal psychological laboratory (Titchener, 1900, pp. 252–253):

For optics, there should be two rooms, light and dark, facing south and north respectively, and the latter divided into antechamber and inner room. For acoustics, there should be one large room, connected directly with a small, dark, and (so far as is possible without special construction) sound-proof chamber. For haptics, there should be a moderately sized room, devoted to work on cutaneous pressure, temperature, and pain, and a larger room for investigations of movement perceptions. Taste and smell should each have a small room, the latter tiled or glazed, and so situated that ventilation is easy and so does not involve the opening of doors or transom-windows into the building. There should, further, be a clock-room, for the time-registering instruments and their controls; and a large room for the investigations of the bodily processes and changes underlying affective consciousness.

Instrumentation is a central component of complex performance measurement; however, the process by which one orchestrates several devices in the broader context of addressing an experimental question is indeed challenging. Modern-day approaches reflect a paradigm shift with respect to early psychological procedures. Traditionally, a single instrument would be used for an entire experiment. Complex performance evaluation, however, often entails situations where the presentation of a stimulus is controlled by one computer, while supplementary instrumentation collects a stream of other data on a second or perhaps yet a third computer. Certainly, an ideal testing solution would allow one to minimize the time needed to set up an experiment and maximize the experimental degree of automation, thus minimizing investigator intervention, without compromising the scientific integrity of the experiment. Nevertheless, the measurement of complex performance is often in conflict with this idyllic vision. It is not sufficient for contemporary researchers simply to design experiments. They are also required to have access to the manpower and the monetary or computational resources necessary to translate a scientific question into a tenable methodological test bed.

Design Patterns for Laboratory Instrumentation

Design patterns represent structured solutions for such recurring assessment problems (Gamma et al., 1995). The formal application of design patterns as abstract blueprints for common challenges has relevance for laboratory instrumentation configuration and equipment purchasing decisions. Although research questions vary, experiments will regularly share a comparable solution. These commonalities are important to identify, as the ability to employ a single set of tools has distinct advantages compared to solutions tailored for only one particular problem. Such advantages include cost savings, instrumentation sharing, instrumentation longevity, and laboratory scalability (e.g., the capacity to run multiple experiments simultaneously).


The purpose of the following section is to provide a level of abstraction for instrumentation configurations commonly encountered in the design of experiments related to complex performance. This approach is favored over simply providing a list of items and products that every laboratory should own. We acknowledge the considerable between-laboratory variability regarding research direction, instrumentation, and expertise; therefore, we focus primarily on instrumentation configuration and architecture as represented by design patterns common to a broad array of complex performance manipulations.

Given that an experiment will often require the manipulation of stimuli in a structured way, and seeing the impracticality of comprehensively covering all research design scenarios, the following assumptions are made: (1) stimuli are physically presented to participants, (2) some stimulus properties are required to be under experimental control (e.g., presentation length), (3) measurable responses by the participant are required, and (4) control or communication of secondary instrumentation may also be necessary. These assumptions accommodate a broad spectrum of possible designs, and from these assumptions several frameworks can be outlined.

Stimulus Presentation and Control Model

Figure 55.2 depicts the simplest of the design patterns, which we term the stimulus presentation and control (SPC) model. The SPC model is a building block for more complex configurations. The basic framework of the SPC model includes the presentation layer and the stimulus, control, and response layer. The presentation layer represents the medium used to physically display a stimulus to a participant (e.g., monitor, projector, speaker, headphones). The stimulus, control, and response layer (SCRL) encapsulates a number of interrelated functions central to complex performance experimentation, such as the experimental protocol logic, and is the agent that coordinates and controls experimental flow and, potentially, participant response. Broadly speaking, SCRL-type roles include stimulus manipulation and timing, instrument logging and coordination, and response logging, in addition to experiment procedure management.

As the SCRL often contains the logic necessary to execute the experimental paradigm, it is almost always implemented in software; thus, the SCRL application is assumed to operate on a computing target (e.g., desktop, portable digital assistant), which is represented by the dashed box in Figure 55.2. As an example implementation of the SPC model, consider a hypothetical experiment in which participants are exposed to a number of visual stimuli for 6 sec each. Each visual stimulus occurs after a fixed foreperiod of 1 sec and a subsequent fixation cross (i.e., the point at which participants are required to direct their gaze) presented for 500 msec. Each visual stimulus is followed by a 2-sec inter-trial interval (ITI). The only requirement of the participant is to view the visual stimuli for the duration of their presentation. How do we implement this experiment?

Figure 55.2 Stimulus presentation and control model. (The diagram shows the presentation layer, e.g., a monitor connected via VGA or DVI, driven by the stimulus, control, and response layer: an SCRL application running on a computing target.)

This problem has several possible solutions. A monitor (presentation layer) and Microsoft PowerPoint (SCRL) would easily accomplish the task; however, the SPC model is suitable for handling an extensive arrangement of experimental designs, so additional procedural requirements increase the need for added SCRL functionality. Consider an experiment where both a monitor and speakers are required to present the stimuli. This basic pattern still reflects an SPC model, and PowerPoint could be configured to present auditory and visual stimuli within a strict set of parameters. On the other hand, a real experiment would likely require that the foreperiod, fixation cross, and ITI appear with variable and not fixed timing. Presentation applications like PowerPoint, however, are not specifically designed for experimentation. As such, limitations are introduced as experimental designs become more elaborate. One solution to this problem is to use the Visual Basic for Applications (VBA) functionality embedded in Microsoft Office; however, requiring features such as variable timing, timing determinism (i.e., executing a task in the exact amount of time specified), support for randomization and counterbalancing, response acquisition, and logging illustrates the advantages of obtaining a flexible SCRL application equipped for research pursuits.
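
As one illustration (a sketch only, not a vetted experiment script), the hypothetical trial structure described above (1-sec foreperiod, 500-msec fixation cross, 6-sec stimulus, 2-sec ITI) could be scripted in PsychoPy, one of the freeware packages listed in Table 55.2 below. The image file names and window settings are placeholders, and the simple core.wait() timing shown here is less precise than frame-based timing.

```python
# Minimal PsychoPy sketch of the hypothetical SPC-model experiment.
from psychopy import visual, core

win = visual.Window(size=(1024, 768), color=[0, 0, 0], units="pix")  # mid-grey screen
fixation = visual.TextStim(win, text="+", height=40)
stimuli = ["stim1.png", "stim2.png"]        # placeholder image files

for image_file in stimuli:
    stim = visual.ImageStim(win, image=image_file)

    win.flip()               # blank screen: 1-sec foreperiod
    core.wait(1.0)

    fixation.draw()          # fixation cross for 500 msec
    win.flip()
    core.wait(0.5)

    stim.draw()              # visual stimulus for 6 sec
    win.flip()
    core.wait(6.0)

    win.flip()               # blank screen: 2-sec inter-trial interval
    core.wait(2.0)

win.close()
core.quit()
```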

A number of commercial and freeware applications have been created over the past several decades to assist researchers with SCRL-type functions. The choice to select one application over another may have much to do with programming requirements, the operating system platform, protocol requirements, or all of the above. Table 55.2 provides a list of some of the SCRL applications that are available for psychological and psychophysical experiments. Additional information for these and other SCRL applications can be found in Florer (2007); the descriptions in the table are text taken directly from the product narratives provided by Florer (2007).

TABLE 55.2 SCRL-Type Applications (each entry gives the name, type, and platform, followed by a description)

Cogent 2000/Cogent Graphics (Freeware; Windows): Complete PC-based software environment for functional brain mapping experiments; contains commands useful for presenting scanner-synchronized visual stimuli (Cogent Graphics), auditory stimuli, mechanical stimuli, and taste and smell stimuli. It is also used in monitoring key presses and other physiological recordings from the subject.

DMDX (Freeware; Windows): Win 32-based display system used in psychology laboratories around the world to measure reaction times to visual and auditory stimuli.

E-Prime (Commercial; Windows): Suite of applications to design, generate, run, collect data, edit, and analyze the data; includes (1) a graphical environment that allows visual selection and specification of experimental functions, (2) a comprehensive scripting language, and (3) data management and analysis tools.

Flashdot (Freeware; Windows, Linux): Program for generating and presenting visual perceptual experiments that require a high temporal precision. It is controlled by a simple experiment building language and allows experiment generation with either a text or a graphical editor.

FLXLab (Freeware; Windows, Linux): Program for running psychology experiments; capabilities include presenting text and graphics, playing and recording sounds, and recording reaction times via the keyboard or a voice key.

PEBL (Psychology Experiment Building Language) (Freeware; Linux, Windows, Mac): New language specifically designed to be used to create psychology experiments.

PsychoPy (Freeware; Linux, Mac): Psychology stimulus software for Python; combines the graphical strengths of OpenGL with the easy Python syntax to give psychophysics a free and simple stimulus presentation and control package.

PsyScope (Freeware; Mac): Interactive graphic system for experimental design and control on the Macintosh.

PsyScript (Freeware; Linux, Mac): Application for scripting psychology experiments, similar to SuperLab, MEL, or E-Prime.

PyEPL (Python Experiment-Programming Library) (Freeware; Linux, Mac): Library for coding psychology experiments in Python; supports presentation of both visual and auditory stimuli, as well as both manual (keyboard/joystick) and sound (microphone) input as responses.

Realtime Experiment Interface (Freeware; Linux): Extensible hard real-time platform for the development of novel experiment control and signal-processing applications.

SuperLab (Commercial; Windows, Mac): Stimulus presentation software with features that support the presentation of multiple types of media as well as rapid serial visual presentation paradigms and eye tracking integration, among other features.

A conventional programming language is best equipped for SCRL functionality. This alternative to the applications listed in Table 55.2 may be necessary for experiments where communication with external hardware, interfacing with external code, querying databases, or program performance requirements are a priority, although some of the SCRL applications listed in Table 55.2 provide varying degrees of these capabilities (e.g., E-Prime, SuperLab).

The prospect of laboratory productivity can outweigh the flexibility and functionality afforded by a programming language; for example, from a laboratory management perspective, it is reasonable for all laboratory members to have a single platform from which they create experiments. Given the investment required to familiarize oneself with a programming language, the single platform option can indeed be challenging to implement in practice. Formulating a laboratory in this manner does allow members to share and reuse previous testing applications or utilize knowledge about the use and idiosyncrasies of an SCRL application.

Despite the learning curve, a programming language has potential benefits that cannot be realized by turnkey SCRL applications. As mentioned, high-level programming languages offer inherently greater flexibility. Although it is important to consider whether the SCRL application can be used to generate a particular test bed, one must also consider the analysis requirements following initial data collection. The flexibility of a programming language can be very helpful in this regard. One might also consider the large support base in the form of books, forums, and websites dedicated to a particular language, which can mitigate the problems that may arise during the learning process.

Stimulus Presentation and Control Model with External Hardware

Communicating with external hardware is essential to complex performance design. Building upon the basic SPC framework, Figure 55.3 depicts the SPC model with support for external hardware (SPCxh). Figure 55.3 illustrates a scenario where the SCRL controls both monitor and headphone output. The SCRL also interfaces with an eye tracker and a bio-instrumentation device via the parallel port and a data acquisition device (DAQ), respectively. DAQ devices are an important conduit for signal routing and acquisition, and we discuss them in greater detail later. The extensions of the SPCxh over the SPC model are the interface and instrumentation layers. A good argument can be made for another interface layer to exist between the presentation layer and the SCRL, but for our purposes the interface layer specifically refers to the physical connection that exists between the SCRL and the instrumentation layer. The SPCxh is thus derived from the basic SPC model, with two additional layers: one to represent the external hardware and a second to interface that hardware with the SCRL.

Figure 55.3 SPC model with external hardware. (The diagram adds an interface layer, e.g., a parallel port and a DAQ device, and an instrumentation layer, e.g., an eye tracker and bio-instrumentation, to the presentation layer and the SCRL application, here LabVIEW, running on a computing target.)

It is important to emphasize that the SPC and SPCxh models are only examples. We recognize that an actual implementation of any one model will most certainly differ among laboratories. The main purpose of illustrating the various arrangements in this manner is to address, in an abstract sense, the major question of how complex performance design paradigms are arranged. Once the necessary components have been identified for a specific research objective, the next step is determining the specific hardware and software needed to realize that goal.

It is imperative to understand the connection between a given model (abstraction) and its real-world counterpart. Using the example described above, consider an experiment that requires the additional collection of electrocortical activity in response to the appearance of the visual stimulus. This type of physiological data collection is termed event-related potential, as we are evaluating brain potentials time-locked to some event (i.e., appearance of the visual stimulus in this example). Thus, we need to mark in the physiological record where this event appears for offline analysis. Figure 55.4 depicts one method to implement this requirement. On the left, the SPCxh model diagram is illustrated for the current scenario. A monitor is used to display the stimulus. A programming language called LabVIEW provides the SCRL functionality. Because the bio-instrumentation supports digital input/output (i.e., hardware that allows one to send and receive digital signals), LabVIEW utilizes the DAQ device to output digital markers to the digital input ports of the bio-instrumentation while also connecting to the instrument to collect the physiological data over the network. We are using the term bio-instrumentation here to refer to the hardware used for the collection and assessment of physiological data. The picture on the right portrays the instantiation of the diagram. It should be observed that the diagram is meant to represent tangible software and hardware entities. Although LabVIEW is used in this example as our SCRL application, any number of alternatives could have also been employed to provide the linkage between the software used in our SCRL and the instrumentation.

Figure 55.4 A real-world example of the SPCxh model. (A LabVIEW application on the computing target drives the monitor, sends digital markers through a USB DAQ device to the digital input/output lines of the bio-instrumentation, and collects the physiological data over the network.)

A number of derivations can also be organized from the SPCxh model; for example, in many cases, the SCRL may contain only the logic needed to execute the experiment but not the application program interfaces (APIs) required to directly control a vendor's hardware. In these situations, it may be necessary to run the SCRL application alongside (e.g., on another machine) the vendor-specific hardware application. Figure 55.5 depicts this alternative, where the vendor-specific hardware executes its procedures at the same time as the SCRL application. Because the layers for the SPCxh are the same as in Figure 55.3, Figure 55.5 depicts only the example instantiation of the model and not the layers of the SPCxh model. The SPCxh example in Figure 55.5 is a common configuration because hardware vendors do not always supply software interfaces that can be used by an external application. The major difference between the two options, as shown, is that the second option would require a total of three computing targets: one to execute the SCRL application and for stimulus presentation, one to execute the bio-instrumentation device software, and a third to execute the eye tracker device software.

Figure 55.5 SCRL application operating concurrently with vendor device software. (Option 1: a single computing target runs the SCRL application alongside the device connections. Option 2: three computing targets are used, one running the SCRL application for stimulus presentation and two running the vendor device software for the eye tracker and the bio-instrumentation.)

A very common question is how to synchronize the SCRL application and the device instrumentation. As with the previous example, the method of choice here is via the DAQ device for communication with the bio-instrumentation and through the parallel port for the eye tracker; however, the option for event marking and synchronization is only applicable if it is supported by the particular piece of instrumentation. Furthermore, the specific interface (e.g., digital input/output, serial/parallel) is dependent on what the manufacturer has made available for the end-user. Given this information, one should ask the following questions prior to investing resources in any one instrument or SCRL alternative. First, what type of limitations will be encountered when attempting to interface a particular instrument with my current resources? That is, does the manufacturer provide support for external communication with other instruments or applications? Second, does my SCRL application of choice support a method to communicate with my external hardware if this option is available? Third, does the device manufacturer provide or sell programming libraries or application program interfaces if data collection has to be curtailed in some particular way? Fourth, what are the computational requirements to run the instrumentation software? Will the software be so processor intensive that it requires sole execution on one dedicated machine?

Common Paradigms and Configurations

A number of common paradigms exist in psychological research, from recall and recognition paradigms to interruption-type paradigms. Although it is beyond the scope of this section to provide an example configuration for each, we have selected one example that is commonly used by many contemporary researchers: the secondary task paradigm. Both the SPC and SPCxh models are sufficient for experiments employing this paradigm; however, a common problem can occur when the logic for the primary and secondary tasks is implemented as mutually exclusive entities. An experiment that employs a simulator for the primary task environment can be viewed as a self-contained SPCxh model containing a simulation presentation environment (presentation layer), simulation control software (SCRL application), and simulation-specific hardware (interface and instrumentation layers). The question now is how can we interface the primary task (operated by the simulator in this example) with another SCRL application that contains the logic for the secondary task?


Figure 55.6 contains a graphical representation of a possible configuration: an SPCxh model for the simulator communicating via a network interface that is monitored from the SCRL application on the secondary task side. On the left side of Figure 55.6 is the primary task configuration, and on the right side is the secondary task configuration. It should be noted that the simulator control software, our SCRL application, or the device-specific software does not necessarily need to be executed on separate computers; however, depending on the primary or secondary task, one may find that the processor and memory requirements necessitate multiple computers. On the secondary task side, the diagram represents a fairly complex role for the SCRL application. As shown, the SCRL application has output responsibilities to a monitor and headphones while also interfacing with an eye tracker via the serial port, interfacing with the simulator via the network, and sending two lines of digital output information via the DAQ device.

Figure 55.6 Secondary task paradigm. (Left: the primary task, a simulator presentation environment with its own control software and hardware, exposing a network interface. Right: the secondary task, an SCRL application on its own computing target driving a monitor and headphones and interfacing with an eye tracker, a DAQ device with digital output lines, bio-instrumentation, and the simulator's network interface.)

Numerous complex performance test beds require that a primary and secondary task paradigm be used to explicate the relationship among any number of processes. In the field of human factors, in particular, it is not uncommon for an experiment to employ simulation for the primary task and then connect a secondary task to events that may occur during the primary task. A major problem often encountered in simulation research is that simulators are often closed systems; nevertheless, most simulators can be viewed as SPCxh arrangements with a presentation layer of some kind, an application that provides SCRL function, and the simulator control hardware itself. If one wishes to generate secondary task events based on the occurrence of specific events in the simulation, the question then becomes one of how we might go about configuring such a solution when there is no ready-made entry point between the two systems (i.e., primary task system and secondary task system). The secondary task side of the diagram shows the SPCxh model for the secondary task, which is responsible for interacting with a number of additional instruments. The diagram also shows a connecting line between the secondary system's network interface and the network interface of the primary task as controlled by the simulation. Because simulation manufacturers will often make technotes available that specify how the simulator may communicate with other computers or hardware, one can often ascertain this information for integration with the secondary task's SCRL.

Summary of Design Configurations

The above examples have been elaborated in limited detail. Primarily, information pertaining to how one would configure an SCRL application for external software and hardware communication has been excluded; providing that level of specification is not practical given the sheer number of SCRL options available to researchers. As well, the diagrams do not convey the role that the SCRL application plays when interfacing with instrumentation. It may be the case that the SCRL application plays a minimal role in starting and stopping the instrument and does not command full control of the instrument via a program interface; nevertheless, one should attempt to understand the various configurations because they do appear with great regularity in complex performance designs. Finally, it is particularly important to consider some of the issues raised here when making purchasing decisions about a given instrument, interface, or application.

General-Purpose Hardware

It is evident when walking through a home improvement store that countless tools (e.g., a hammer) have proved their effectiveness for an almost limitless number of tasks. Analogously, various general-purpose tools are exceedingly useful for complex performance research; thus, the purpose of the following sections is to discuss some of these tools and their role in complex performance evaluation.

Data Acquisition Devices

Given the ubiquity of DAQ hardware in the examples above, it is critical that one have a general idea of the functionality that a DAQ device can provide. DAQ devices are the research scientist's Swiss Army knife and are indispensable tools in the laboratory. DAQ hardware completes the bridge between the SCRL and the array of instrumentation implemented in an experiment; that is, the DAQ hardware gives a properly supported SCRL application a number of useful functions for complex performance measurement. To name a few examples, DAQ devices offer a means of transmitting important data between instruments, support mechanisms to coordinate complex actions or event sequences, provide deterministic timing via a hardware clock, and provide methods to synchronize independently operating devices.

A common use for DAQ devices is to send and receive digital signals; however, to frame this application within the context of complex performance design, it is important that one be familiar with a few terms. An event refers to any information as it occurs within an experiment; for example, an event may mark the onset or offset of a stimulus, a participant's response, or the beginning or end of a trial. A common design obstacle requires that we know an event's temporal appearance for the purpose of subsequent analyses or, alternatively, to trigger supplementary instrumentation. The term trigger is often paired with event to describe an action associated with the occurrence of an event. In some cases, event triggers may be completely internal to a single SCRL application, but in other instances event triggers may include external communications between devices or systems.

Data acquisition devices have traditionally been referred to as A/D boards (analog-to-digital boards) because of their frequent use in signal acquisition. A signal, in this context, loosely refers to any measurable physical phenomenon. Signals can be divided into two primary classes. Analog signals can vary continuously over an infinite range of values, whereas digital signals contain information in discrete states. To remember the difference between these two signals, visualize two graphs, one where someone's voice is recorded (analog) and another plotting when a light is switched on or off (digital).

The traditional role of a DAQ device was to acquire and translate measured phenomena into binary units that can be represented by a digital device (e.g., computer, oscilloscope). Suppose we need to record the force exerted on a force plate. A DAQ device could be configured such that we could sample the data derived from the force plate at a rate of 1 msec per sample and subsequently record those data to a computer or logging instrument.
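
As a rough sketch of this kind of acquisition, the code below samples one analog input channel at 1,000 samples per second (i.e., 1 msec per sample). It assumes a National Instruments device visible as "Dev1" and the nidaqmx Python package; the channel name, voltage range, and duration are placeholders to be adapted to the actual hardware.

```python
# Minimal sketch: finite analog input acquisition with an NI DAQ device,
# assuming the nidaqmx package. Values below are illustrative placeholders.
import nidaqmx
from nidaqmx.constants import AcquisitionType

with nidaqmx.Task() as task:
    task.ai_channels.add_ai_voltage_chan("Dev1/ai0", min_val=-10.0, max_val=10.0)
    task.timing.cfg_samp_clk_timing(rate=1000,                    # 1 msec per sample
                                    sample_mode=AcquisitionType.FINITE,
                                    samps_per_chan=5000)          # 5 sec of data
    samples = task.read(number_of_samples_per_channel=5000)
    # samples is a list of 5000 voltage readings to log or stream to disk
```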

One should note that the moniker device, as a replacement for board, is more appropriate given that modern DAQ alternatives are not always self-contained boards but may connect to a computing target in a few different ways. DAQ devices are available for a number of bus types. A bus, in computing vernacular, refers to a method of transmission for digital data (e.g., USB, FireWire, PCI). The DAQ device pictured in Figure 55.4, for example, is designed to connect to the USB port of a computer.

Despite their traditional role in signal acquisition (e.g., analog input), most DAQ devices contain an analog output option. Analog output reverses the process of an A/D conversion and can be used to convert digital data into analog data (D/A conversion). Analog output capabilities are useful for a variety of reasons; for example, an analog output signal can be used to produce auditory stimuli, control external hardware, or output analog data to supplementary instrumentation. The primary and secondary task example above illustrates one application for analog output. Recall that in the SPCxh model described earlier the simulator was a closed system interfaced with the SCRL application via a network interface. Suppose that it was necessary to proxy events as they occurred in the simulation to secondary hardware. A reason for this approach might be as simple as collapsing data into a single measurement file; for example, suppose we want to evaluate aiming variability in a weapons simulation in tandem with a physiological measure. One strategy would require that we merge the separate streams of data after all events have been recorded; alternatively, by employing the analog output option, we could route data from the simulator and proxy it via the SCRL application controlling the DAQ device connected to our physiological recording device.

In addition to analog output, DAQ functionality for digital input/output is a critical feature for solving various complex performance measurement issues. Recall the example above where an experiment required that the SCRL application tell the physiological control software when the visual stimulus was displayed. This was accomplished by the SCRL application sending a digital output from the DAQ device to the digital input port on the bio-instrument. Figure 55.7 depicts this occurrence from the control software for the bio-instrument. The figure shows one channel of data recorded from a participant's scalp (i.e., EEG); another, digital channel represents the onset and offset of the visual stimulus.

When configuring digital signals, it is also important to understand that an event can be defined in terms of the leading or the trailing edge of the digital signal. As depicted in Figure 55.7, the leading edge (also called the rising edge) refers to the first positive deflection of the digital waveform, while the trailing edge (also called the falling edge) refers to the negative-going portion of the waveform. This distinction is important because, in many cases, secondary instrumentation will provide an option to begin or end recording from the leading or trailing edge; that is, if we mistakenly begin recording on the trailing edge when the critical event occurs on the leading edge, then the secondary instrument may be triggered late or not at all. Another term native to digital events is transistor–transistor logic (TTL), which is often used to express digital triggering that operates within specific parameters. TTL refers to a standard whereby a given instrument's digital line will change state (e.g., on to off) if the incoming input is within a given voltage range. If the voltage supplied to the digital line is 0 volts, the line is off; if it is supplied 5 volts, the line is on.
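
A small sketch may help to clarify the leading/trailing distinction. The function below scans a sampled digital channel and labels each transition as a leading (rising) or trailing (falling) edge; the 2-volt threshold separating "low" from "high" is an assumption chosen for a TTL-style 0-/5-volt line.

def detect_edges(samples, times, threshold=2.0):
    """Classify each transition of a sampled digital (TTL-style) channel.

    samples: list of voltages sampled from the digital line
    times:   matching list of sample times (seconds)
    threshold: voltage above which the line is treated as 'high' (assumed ~2 V here)
    Returns a list of (time, 'leading' | 'trailing') tuples.
    """
    edges = []
    previous_high = samples[0] > threshold
    for t, v in zip(times[1:], samples[1:]):
        high = v > threshold
        if high and not previous_high:
            edges.append((t, "leading"))    # rising edge: e.g., stimulus onset
        elif previous_high and not high:
            edges.append((t, "trailing"))   # falling edge: e.g., stimulus offset
        previous_high = high
    return edges

# Example: a 0-V/5-V square pulse sampled every millisecond.
times = [i / 1000 for i in range(8)]
samples = [0, 0, 5, 5, 5, 0, 0, 0]
print(detect_edges(samples, times))   # [(0.002, 'leading'), (0.005, 'trailing')]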

Event triggering is an extremely important constituent of complex performance experimentation. Knowing when an event occurs may be vital for data analysis or for triggering subsequent events. The example here depicts a scenario in which we are interested in knowing the onset of a visual stimulus so we can analyze our EEG signal for event-related changes. The channel carrying the square wave tells us when the event occurred, with the leading edge representing the onset of the visual stimulus (5 volts) and the falling edge reflecting its offset (0 volts). A setup similar to that demonstrated in Figure 55.4 could easily be configured to produce such an example. Although this example shows only two channels, a real-world testing scenario may have several hundred to indicate when certain events occur. One strategy is to define different events as different channels: one channel may represent the visibility of a stimulus (on or off), another may represent a change in its color, and yet another may indicate any number of other events. An alternative solution is to configure the SCRL application to send data to a single channel and then create a coding scheme to reflect different events (e.g., 0 volts, stimulus hidden; 2 volts, stimulus visible; 3 volts, color changed to black; 4 volts, color changed to white). This approach reduces the number of channels required and frees the remaining digital channels for other uses.
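
The single-channel coding scheme just described can be decoded offline in a few lines of code. The sketch below maps the example voltages from the text back to event labels; the tolerance value and the list-based channel format are assumptions made for illustration.

# Decode a voltage-coded event channel into labeled events, using the example
# scheme from the text (0 V, 2 V, 3 V, 4 V).
EVENT_CODES = {0: "stimulus hidden", 2: "stimulus visible",
               3: "color changed to black", 4: "color changed to white"}

def decode_events(times, volts, tolerance=0.5):
    """Return (time, label) pairs at each change of the coded channel."""
    events = []
    last_code = None
    for t, v in zip(times, volts):
        # Snap the measured voltage to the nearest defined code, if close enough.
        code = min(EVENT_CODES, key=lambda c: abs(c - v))
        if abs(code - v) > tolerance:
            continue  # noise or a transition sample; ignore it
        if code != last_code:
            events.append((t, EVENT_CODES[code]))
            last_code = code
    return events

times = [0.000, 0.001, 0.002, 0.003, 0.004]
volts = [0.02, 1.98, 2.01, 3.97, 0.05]
print(decode_events(times, volts))
# [(0.0, 'stimulus hidden'), (0.001, 'stimulus visible'),
#  (0.003, 'color changed to white'), (0.004, 'stimulus hidden')]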

Figure 55.7 Event triggering and recording. (The figure shows an EEG signal channel together with a digital channel whose leading edge marks visual stimulus onset and whose trailing edge marks visual stimulus offset.)


The utility of digital event triggering cannot be overstated. Although its application to the measurement of complex performance requires a small degree of technical expertise, the ability to implement event triggering affords a great degree of experimental flexibility. Under certain circumstances, however, analog triggering may also be appropriate. Consider an experiment in which the threshold of a participant's voice serves as the eliciting event for a secondary stimulus. In this case, it is necessary to determine whether the SCRL application and DAQ interface support this type of triggering, because such an approach offers greater flexibility for some design configurations.
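
Analog triggering of this kind can be prototyped in software before committing to a particular DAQ interface. The sketch below implements a simple voice key that fires the first time the sampled signal magnitude exceeds a threshold; the sample values, the threshold, and the present_secondary_stimulus() callback are illustrative assumptions, and a production setup might perform the comparison in hardware for tighter timing.

def present_secondary_stimulus():
    print("secondary stimulus presented")

def voice_key(samples, threshold=0.3, on_trigger=present_secondary_stimulus):
    """Return the index of the first sample whose magnitude exceeds the threshold."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            on_trigger()
            return i
    return None

# Silence followed by the onset of speech (arbitrary normalized amplitudes).
audio = [0.01, 0.02, 0.01, 0.05, 0.42, 0.55]
print(voice_key(audio))   # prints the trigger message, then 4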

Purchasing a DAQ Device

The following questions are relevant to purchasing a DAQ device for complex performance research. First, is the DAQ device going to be used to collect physiological or other data? If the answer is yes, one should understand that the price of a DAQ device is primarily a function of its resolution, speed, form factor, and the number of input/output channels supported. Although a discussion of unipolar vs. bipolar data acquisition is beyond the scope of this chapter, the reader should consult Olansen and Rosow (2002) and Stevenson and Soejima (2005) for additional information on how this may affect the final device choice.

Second, what resolution does the application require? Device resolution refers to the fidelity with which a DAQ device can resolve analog signals; that is, what is the smallest detectable change that the device can discriminate? When choosing a board, resolution is described in terms of bits, and a 16-bit board can resolve signals with greater fidelity than a 12-bit board. Raising 2 to the number of bits shows why: a 12-bit board has 2^12, or 4,096, possible values, while a 16-bit board has 2^16, or 65,536, possible values. The answer to this question is also a function of a few other factors (e.g., signal range, amplification), and a complete understanding of these issues should be established prior to deciding on any one device. As well, prior to investing in a higher resolution board, which will cost more than a lower resolution counterpart, one should evaluate what resolution is actually appropriate for the application. If the primary application of the DAQ device is digital event triggering, then it is important to purchase a device that can handle as many digital channels as a particular research design requires.
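
As a worked example of what those bit counts mean in practice, the snippet below computes the width of one digitizer code, the smallest detectable voltage change, for 12-bit and 16-bit devices over an assumed ±5-volt input range; the range is an assumption, and gain or range settings on a real device would change these figures.

# One code width = input span / 2^bits; the ±5 V span is assumed for illustration.
def code_width(bits, v_min=-5.0, v_max=5.0):
    return (v_max - v_min) / (2 ** bits)

for bits in (12, 16):
    print(f"{bits}-bit over ±5 V: {2 ** bits} codes, "
          f"~{code_width(bits) * 1000:.3f} mV per code")
# 12-bit over ±5 V: 4096 codes, ~2.441 mV per code
# 16-bit over ±5 V: 65536 codes, ~0.153 mV per code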

Third, does the design require analog output capabilities? Unlike digital channels, which are generally reconfigurable for input or output, analog channels are not, so it is important to know in advance how many analog output channels a DAQ device supports. Fourth, does the testing environment require timing determinism finer than about 1 msec, which software timing alone cannot reliably provide? For these scenarios, researchers might want to consider a DAQ device that supports hardware timing. For additional information about A/D board specifications and other factors that may affect purchasing decisions, see Staller (2005).

Computers as Instrumentation

Computers are essential to the modern laboratory. Their value is evident when one considers their versatile role in the research process; consequently, the computer's ubiquity in the sciences can account for significant costs. Because academic institutions often hold contracts with large original equipment manufacturers, computing systems are competitively priced and warranties ensure maintenance for several years. Building a system from the ground up is also a viable option that will often provide a cost-effective alternative to purchasing through an original equipment manufacturer. Although the prospect of assembling a computer may sound daunting, the process is really quite simple, and numerous books and websites are dedicated to the topic (see, for example, Hardwidge, 2006). Customization is one of the greatest advantages of building a machine, and because only necessary components are purchased, overall cost is usually reduced. On the other hand, a major disadvantage of this approach is the time associated with reviewing the components, assembling the hardware, and installing the necessary software.

Most new computers will likely be capable of handling the majority of laboratory tasks; however, one should have a basic understanding of a computer's major components to make an informed purchasing decision when planning complex performance test beds. This is important because one can potentially save considerable resources, which can then be allocated to other equipment, while being assured that the computer is adequate for a given experimental paradigm.

The following questions should be considered whether building or buying a complete computing system. First, does the number of expansion ports accommodate the input boards that may be needed to interface instrumentation? For example, if an instrument interfaced with an application via a network port and we wanted to maintain the ability to network with other computers or access the Internet, it would be important to confirm that the computer's motherboard had a sufficient number of slots to accommodate this addition. Furthermore, because DAQ devices are often sold as input boards, the same logic applies to them.

Computers have evolved from general-purpose machines to machines with specific aptitudes for particular tasks. A recent development is that vendors now market particular computing systems for gaming vs. video-editing vs. home-entertainment purposes. To understand the reasons behind these configurations, we strongly advocate developing a basic understanding of how certain components facilitate particular tasks. Although space prevents us from accomplishing this within this chapter, it is important to realize that computing performance can alter timing determinism, particularly in complex performance environments.

Discussion

The challenge of understanding the various technical facets of laboratory setup and configuration represents a major hurdle when the assessment of some complex performance is a central objective. This section has discussed the common problems and design configurations that one may encounter when setting up such a laboratory. This approach, abstract in some respects, was not intended to illustrate the full range of design configurations available for complex performance evaluation; rather, the common configurations discussed here should only be viewed as general-purpose architectures, independent of new technologies that may emerge. After developing an understanding of the various design configurations, one must determine the specific hardware and software that are required to address the research question. The purpose of providing a few design configurations here was to emphasize that, in many complex performance testing environments, one must specify what equipment or software will fill the roles of presentation, stimulus control and response, instrumentation, and their interfaces.

CONCLUDING REMARKS

Setting up laboratories for the measurement of complex performances can indeed be a challenging pursuit; however, becoming knowledgeable about the solutions and tools available to aid in achieving the research objectives is rewarding on a number of levels. The ability to identify and manipulate multiple software and hardware components allows quick and effective transitioning from a research question into a tenable methodological test bed.

REFERENCES

Adler, P. A. and Adler, P. (1994). Observational techniques. InHandbook of Qualitative Research, edited by N. K. Denzinand Y. S. Lincoln, pp. 377–392. Thousand Oaks, CA: Sage.*

Airasian, P. W. (1996). Assessment in the Classroom. New York:McGraw-Hill.

Alavi, M. (1994). Computer-mediated collaborative learning:an empirical evaluation. MIS Q., 18, 159–174.

American Evaluation Association. (2007). Qualitative software,www.eval.org/Resources/QDA.htm.

Ancona, D. G. and Caldwell, D. F. (1991). Demography andDesign: Predictors of New Product Team Performance, No.3236-91. Cambridge, MA: MIT Press.

Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ:Lawrence Erlbaum Associates.

Anderson, J. R. and Lebiere, C. (1998). The Atomic Componentsof Thought. Mahwah, NJ: Lawrence Erlbaum Associates.

Anderson, J. R., Reder, L. M., and Simon, H. A. (1996). Situatedlearning and education. Educ. Res., 25(4), 5–11.

Arter, J. A. and Spandel, V. (1992). An NCME instructionalmodule on: using portfolios of student work in instructionand assessment. Educ. Meas. Issues Pract., 11, 36–45.

Aviv, R. (2003). Network analysis of knowledge constructionin asynchronous learning networks. J. Asynch. Learn. Netw.,7(3), 1–23.

Ayres, P. (2006). Using subjective measures to detect variationsof intrinsic cognitive load within problems. Learn. Instruct.,16, 389–400.*

Bales, R. F. (1950). Interaction Process Analysis: A Method forthe Study of Small Groups. Cambridge, MA: Addison-Wes-ley.

Battistich, V., Solomon, D., and Delucchi, K. (1993). Interactionprocesses and student outcomes in cooperative learninggroups. Element. School J., 94(1), 19–32.

Bazeley, P. and Richards, L. (2000). The NVivo QualitativeProject Book. London: SAGE.

Bellman, B. L. and Jules-Rosette, B. (1977). A Paradigm forLooking: Cross-Cultural Research with Visual Media. Nor-wood, NJ: Ablex Publishing.

Birenbaum, M. and Dochy, F. (1996). Alternatives in assessmentof achievements, learning processes and prior knowledge.Boston, MA: Kluwer.

Bjork, R. A. (1999). Assessing our own competence: heuristicsand illusions. In Attention and Performance. Vol. XVII. Cog-nitive Regulation of Performance: Interaction of Theory andApplication, edited by D. Gopher and A. Koriat, pp.435–459. Cambridge, MA: MIT Press.

Bogaart, N. C. R. and Ketelaar, H. W. E. R., Eds. (1983).Methodology in Anthropological Filmmaking: Papers of theIUAES Intercongress, Amsterdam, 1981. Gottingen, Ger-many: Edition Herodot.

Bogdan, R. C. and Biklen, S. K. (1992). Qualitative Researchfor Education: An Introduction to Theory and Methods, 2nded. Boston, MA: Allyn & Bacon.*

Boren, M. T. and Ramey, J. (2000). Thinking aloud: reconcilingtheory and practice. IEEE Trans. Prof. Commun., 43,261–278.

Borg, W. R. and Gall, M. D. (1989). Educational Research: AnIntroduction, 5th ed. New York: Longman.

Bowers, C. A. (2006). Analyzing communication sequences forteam training needs assessment. Hum. Factors, 40, 672–678.*

Bowers, C. A., Jentsch, F., Salas, E., and Braun, C. C. (1998).Analyzing communication sequences for team trainingneeds assessment. Hum. Factors, 40, 672–678.*

Bridgeman, B., Cline, F., and Hessinger, J. (2004). Effect ofextra time on verbal and quantitative GRE scores. Appl.Meas. Educ., 17(1), 25–37.

Brünken, R., Plass, J. L., and Leutner, D. (2003). Direct mea-surement of cognitive load in multimedia learning. Educ.Psychol., 38, 53–61.


Byström, K. and Järvelin, K. (1995). Task complexity affectsinformation seeking and use. Inform. Process. Manage., 31,191–213.

Campbell, D. J. (1988). Task complexity: a review and analysis.Acad. Manage. Rev., 13, 40–52.*

Camps, J. (2003). Concurrent and retrospective verbal reportsas tools to better understand the role of attention in secondlanguage tasks. Int. J. Appl. Linguist., 13, 201–221.

Carnevale, A., Gainer, L., and Meltzer, A. (1989). WorkplaceBasics: The Skills Employers Want. Alexandria, VA: Amer-ican Society for Training and Development.

Cascallar, A. and Cascallar, E. (2003). Setting standards in theassessment of complex performances: the optimisedextended-response standard setting method. In OptimisingNew Modes of Assessment: In Search of Qualities and Stan-dards, edited by M. Segers, F. Dochy, and E. Cascallar, pp.247–266. Dordrecht: Kluwer.

Chandler, P. and Sweller, J. (1991). Cognitive load theory andthe format of instruction. Cogn. Instruct., 8, 293–332.*

Charness, N., Reingold, E. M., Pomplun, M., and Stampe, D. M.(2001). The perceptual aspect of skilled performance inchess: evidence from eye movements. Mem. Cogn., 29,1146–1152.*

Chase, C. I. (1999). Contemporary Assessment for Educators.New York: Longman.

Chen, H. (2005). The Effect of Type of Threading and Level ofSelf-Efficacy on Achievement and Attitudes in OnlineCourse Discussion, Ph.D. dissertation. Tempe: Arizona StateUniversity.

Christensen, L. B. (2006). Experimental Methodology, 10th ed.Boston, MA: Allyn & Bacon.

Collier, J. and Collier, M. (1986). Visual Anthropology: Pho-tography as a Research Method. Albuquerque, NM: Univer-sity of New Mexico Press.

Cooke, N. J. (1994). Varieties of knowledge elicitation tech-niques. Int. J. Hum.–Comput. Stud., 41, 801-849.

Cooke, N. J., Salas E., Cannon-Bowers, J. A., and Stout R. J.(2000). Measuring team knowledge. Hum. Factors, 42,151–173.*

Cornu, B. (2004). Information and communication technologytransforming the teaching profession. In Instructional Design:Addressing the Challenges of Learning Through Technologyand Curriculum, edited by N. Seel and S. Dijkstra, pp.227–238. Mahwah, NJ: Lawrence Erlbaum Associates.*

Crooks, S. M., Klein, J. D., Jones, E. K., and Dwyer, H. (1995).Effects of Cooperative Learning and Learner Control Modesin Computer-Based Instruction. Paper presented at the Asso-ciation for Communications and Technology Annual Meet-ing, February 8–12, Anaheim, CA.

Cuneo, C. (2000). WWW Virtual Library: Sociology Software,http://socserv.mcmaster.ca/w3virtsoclib/software.htm

Demetriadis, S., Barbas, A., Psillos, D., and Pombortsis, A.(2005). Introducing ICT in the learning context of traditionalschool. In Preparing Teachers to Teach with Technology,edited by C. Vrasidas and G. V. Glass, pp. 99–116. Green-wich, CO: Information Age Publishers.

Dewey, J. (1916/1966). Democracy and Education: An Intro-duction to the Philosophy of Education. New York: FreePress.

Dijkstra, S. (2004). The integration of curriculum design, instruc-tional design, and media choice. In Instructional Design:Addressing the Challenges of Learning Through Technologyand Curriculum, edited by N. Seel and S. Dijkstra, pp.145–170. Mahwah, NJ: Lawrence Erlbaum Associates.*

Downing, S. M. and Haladyna, T. M. (1997). Test item devel-opment: validity evidence from quality assurance proce-dures. Appl. Meas. Educ., 10(1), 61–82.

Driscoll, M. P. (1995). Paradigms for research in instructionalsystems. In Instructional Technology: Past, Present andFuture, 2nd ed., edited by G. J. Anglin, pp. 322–329. Engle-wood, CO: Libraries Unlimited.*

Duchowski, A. T. (2003). Eye Tracking Methodology: Theoryand Practice. London: Springer.

Eccles, D. W. and Tenenbaum, G. (2004). Why an expert teamis more than a team of experts: a social-cognitive conceptu-alization of team coordination and communication in sport.J. Sport Exer. Psychol., 26, 542–560.

Ericsson, K. A. (2002). Attaining excellence through deliberatepractice: insights from the study of expert performance. InThe Pursuit of Excellence Through Education, edited by M.Ferrari, pp. 21–55. Hillsdale, NJ: Lawrence Erlbaum Asso-ciates.

Ericsson, K. A. (2004). Deliberate practice and the acquisitionand maintenance of expert performance in medicine andrelated domains. Acad. Med., 79(10), 70–81.*

Ericsson, K. A. and Lehmann, A. C. (1996). Expert and excep-tional performance: evidence for maximal adaptation to taskconstraints. Annu. Rev. Psychol., 47, 273–305.

Ericsson, K. A. and Simon, H. A. (1980). Verbal reports as data.Psychol. Rev., 87, 215–251.*

Ericsson, K. A. and Simon, H. A. (1984). Protocol Analysis:Verbal Reports as Data. Cambridge, MA: MIT Press.*

Ericsson, K. A. and Simon, H. A. (1993). Protocol Analysis:Verbal Reports as Data, rev. ed. Cambridge, MA: MITPress.

Ericsson, K. A. and Smith, J., Eds. (1991). Toward a GeneralTheory of Expertise: Prospects and Limits. Cambridge,U.K.: Cambridge University Press.

Ericsson, K. A. and Staszewski, J. J. (1989). Skilled memoryand expertise: mechanisms of exceptional performance. InComplex Information Processing: The Impact of Herbert A.Simon, edited by D. Klahr and K. Kotovsky, pp. 235–267.Hillsdale, NJ: Lawrence Erlbaum Associates.

Erkens, G. (2002). MEPA: Multiple Episode Protocol Analysis,Version 4.8, http://edugate.fss.uu.nl/mepa/index.htm.

Espey, L. (2000). Technology planning and technology integra-tion: a case study. In Proceedings of Society for InformationTechnology and Teacher Education International Confer-ence 2000, edited by C. Crawford et al., pp. 95–100. Ches-apeake, VA: Association for the Advancement of Computingin Education.

Florer, F. (2007). Software for Psychophysics, http://vision.nyu.edu/Tips/FaithsSoftwareReview.html.

Fu, W.-T. (2001). ACT-PRO action protocol analyzer: a tool foranalyzing discrete action protocols. Behav. Res. MethodsInstrum. Comput., 33, 149–158.

Fussell, S. R., Kraut, R. E., Lerch, F. J., Shcerlis, W. L.,McNally, M. M., and Cadiz, J. J. (1998). Coordination,Overload and Team Performance: Effects of Team Commu-nication Strategies. Paper presented at the Association forComputing Machinery Conference on Computer SupportedCooperative Work, November 14–18, Seattle, WA.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (2005).Design Patterns: Elements of Reusable Object-OrientedSoftware. Addison-Wesley: Reading, MA.

Garfinkel, H. (1967). Studies in Ethnomethodology: A Returnto the Origins of Ethnomethodology. Englewood Cliffs, NJ:Prentice Hall.


Gerjets, P., Scheiter, K., and Catrambone, R. (2004). Designinginstructional examples to reduce cognitive load: molar ver-sus modular presentation of solution procedures. Instruct.Sci., 32, 33–58.*

Gerjets, P., Scheiter, K., and Catrambone, R. (2006). Can learn-ing from molar and modular worked examples be enhancedby providing instructional explanations and prompting self-explanations? Learn. Instruct., 16, 104–121.

Goetz, J. P. and LeCompte, M. D. (1984). Ethnography andQualitative Design in Educational Research. Orlando, FL:Academic Press.*

Goodyear, P. (2000). Environments for lifelong learning: ergo-nomics, architecture and educational design. In Integratedand Holistic Perspectives on Learning, Instruction, andTechnology: Understanding Complexity, edited by J. M.Spector and T. M. Anderson, pp. 1–18. Dordrecht: Kluwer.*

Goswami, U. (2004). Neuroscience and education. Br. J. Educ.Psychol., 74, 1–14.

Gulikers, J. T. M., Bastiaens, T. J., and Kirschner, P. A. (2004).A five-dimensional framework for authentic assessment.Educ. Technol. Res. Dev., 52(3), 67–86.*

Guzzo, R. A. and Shea, G. P. (1992). Group performance andintergroup relations in organizations. In Handbook of Indus-trial and Organizational Psychology Vol. 3, 2nd ed., editedby M. D. Dunnette and L. M. Hough, pp. 269–313. PaloAlto, CA: Consulting Psychologists Press.

Haider, H. and Frensch, P. A. (1999). Eye movement during skillacquisition: more evidence for the information reductionhypothesis. J. Exp. Psychol. Learn. Mem. Cogn., 25, 172–190.

Hambleton, R. K., Jaegar, R. M., Plake, B. S., and Mills, C.(2000). Setting performance standards on complex educa-tional assessments. Appl. Psychol. Meas., 24, 355–366.

Hara, N., Bonk, C. J., and Angeli, C. (2000). Content analysisof online discussion in an applied educational psychologycourse. Instruct. Sci., 28, 115–152.

Hardwidge, B. (2006). Building Extreme PCs: The Complete Guide to Computer Modding. Cambridge, MA: O'Reilly Media.

Heider, K. G. (1976). Ethnographic Film. Austin, TX: TheUniversity of Texas Press.

Herl, H. E., O’Neil, H. F., Chung, G. K. W. K., and Schacter,J. (1999). Reliability and validity of a computer-basedknowledge mapping system to measure content understand-ing. Comput. Hum. Behav., 15, 315–333.

Higgins, N. and Rice, E. (1991). Teachers’ perspectives oncompetency-based testing. Educ. Technol. Res. Dev., 39(3),59–69.

Hobma, S. O., Ram, P. M., Muijtjens, A. M. M., Grol, R. P. T. M.,and Van der Vleuten, C. P. M. (2004). Setting a standard forperformance assessment of doctor–patient communicationin general practice. Med. Educ., 38, 1244–1252.

Hockings, P., Ed. (1975). Principles of Visual Anthropology.The Hague: Mouton Publishers.

Horber, E. (2006). Qualitative Data Analysis Links, http://www.unige.ch/ses/sococ/qual/qual.html.

Ifenthaler, D. (2005). The measurement of change: learning-dependent progression of mental models. Technol. Instruct.Cogn. Learn., 2, 317–336.*

Jeong, A. C. (2003). The sequential analysis of group interactionand critical thinking in online threaded discussions. Am. J.Distance Educ., 17(1), 25–43.*

Johnson, D. W., Johnson, R. T., and Stanne, M. B. (2000).Cooperative Learning Methods: A Meta-Analysis, http://www.co-operation.org/pages/cl-methods.html.*

Jones, E. K., Crooks, S., and Klein, J. (1995). Development of aCooperative Learning Observational Instrument. Paper pre-sented at the Association for Educational Communications andTechnology Annual Meeting, February 8–12, Anaheim, CA.

Jorgensen, D. L. (1989). Participant Observation: A Methodol-ogy for Human Studies. London: SAGE.*

Katzir, T. and Paré-Blagoev, J. (2006). Applying cognitive neu-roscience research to education: the case of literacy. Educ.Psychol., 41, 53–74.

Kirschner, P., Carr, C., van Merrienboer, J., and Sloep, P. (2002). Howexpert designers design. Perform. Improv. Q., 15(4), 86–104.

Klein, J. D. and Pridemore, D. R. (1994). Effects of orientingactivities and practice on achievement, continuing motiva-tion, and student behaviors in a cooperative learning envi-ronment. Educ. Technol. Res. Dev., 41(4), 41–54.*

Klimoski, R. and Mohammed, S. (1994). Team mental model:construct or metaphor. J. Manage., 20, 403–437.

Ko, S. and Rossen, S. (2001). Teaching Online: A PracticalGuide. Boston, MA: Houghton Mifflin.

Koschmann, T. (1996). Paradigm shifts and instructional tech-nology. In Computer Supportive Collaborative Learning:Theory and Practice of an Emerging Paradigm, edited byT. Koschmann, pp. 1–23. Mahwah, NJ: Lawrence ErlbaumAssociates.*

Kuusela, H. and Paul, P. (2000). A comparison of concurrentand retrospective verbal protocol analysis. Am. J. Psychol.,113, 387–404.

Langan-Fox, J. (2000). Team mental models: techniques, meth-ods, and analytic approaches. Hum. Factors, 42, 242–271.*

Langan-Fox, J. and Tan, P. (1997). Images of a culture in tran-sition: personal constructs of organizational stability andchange. J. Occup. Org. Psychol., 70, 273–293.

Langan-Fox, J., Code, S., and Langfield-Smith, K. (2000). Teammental models: techniques, methods, and analyticapproaches. Hum. Factors, 42, 242–271.

Langan-Fox, J., Anglim, J., and Wilson, J. R. (2004). Mentalmodels, team mental models, and performance: process,development, and future directions. Hum. Factors Ergon.Manuf., 14, 331–352.

Lawless, C. J. (1994). Investigating the cognitive structure ofstudents studying quantum theory in an Open Universityhistory of science course: a pilot study. Br. J. Educ. Technol.,25, 198–216.

Lesh, R. and Dorr, H. (2003). A Models and Modeling Perspec-tive on Mathematics Problem Solving, Learning, and Teach-ing. Mahwah, NJ: Lawrence Erlbaum Associates.*

Levine, J. M. and Moreland, R. L. (1990). Progress in smallgroup research. Annu. Rev. Psychol., 41, 585–634.

Lincoln, Y. S. and Guba, E. G. (1985). Naturalistic Inquiry.Beverly Hills, CA: SAGE.

Lingard, L. (2002). Team communications in the operatingroom: talk patterns, sites of tension, and implications fornovices. Acad. Med., 77, 232–237.

Losada, M. (1990). Collaborative Technology and Group Pro-cess Feedback: Their Impact on Interactive Sequences inMeetings. Paper presented at the Association for ComputingMachinery Conference on Computer Supported CooperativeWork, October 7–10, Los Angeles, CA.

Lowyck, J. and Elen, J. (2004). Linking ICT, knowledgedomains, and learning support for the design of learningenvironments. In Instructional Design: Addressing the Chal-lenges of Learning Through Technology and Curriculum,edited by N. Seel, and S. Dijkstra, pp. 239–256. Mahwah,NJ: Lawrence Erlbaum Associates.*


Magliano, J. P., Trabasso, T., and Graesser, A. C. (1999). Stra-tegic processing during comprehension. J. Educ. Psychol.,91, 615–629.

Mathieu, J. E., Heffner, T. S., Goodwin, G. F., Salas, E., andCannon-Bowers, J. A. (2000). The influence of shared men-tal models on team process and performance. J. Appl. Psy-chol., 85, 273–283.

Mehrens, W.A., Popham, J. W., and Ryan, J. M. (1998). Howto prepare students for performance assessments. Educ.Measure. Issues Pract., 17(1), 18–22.

Meloy, J. M. (1994). Writing the Qualitative Dissertation:Understanding by Doing. Hillsdale, NJ: Lawrence ErlbaumAssociates.

Merrill, M. D. (2002). First principles of instruction. Educ.Technol. Res. Dev., 50(3), 43–55.*

Michaelsen, L. K., Knight, A. B., and Fink, L. D. (2004). Team-Based Learning: A Transformative Use of Small Groups inCollege Teaching. Sterling, VA: Stylus Publishing.

Miles, M. B. and Huberman, A. M. (1994). Qualitative DataAnalysis: An Expanded Sourcebook, 2nd ed. Thousand Oaks,CA: SAGE.

Miles, M. B. and Weitzman, E. A. (1994). Appendix: choosingcomputer programs for qualitative data analysis. In Quali-tative Data Analysis: An Expanded Sourcebook, 2nd ed.,edited by M. B. Miles and A. M. Huberman, pp. 311–317.Thousand Oaks, CA: SAGE.

Moallem, M. (1994). An Experienced Teacher’s Model ofThinking and Teaching: An Ethnographic Study on TeacherCognition. Paper presented at the Association for Educa-tional Communications and Technology Annual Meeting,February 16–20, Nashville, TN.

Morgan, D. L. (1996). Focus Groups as Qualitative ResearchMethods, 2nd ed. Thousand Oaks, CA: SAGE.

Morris, L. L., Fitz-Gibbon, C. T., and Lindheim, E. (1987). Howto Measure Performance and Use Tests. Newbury Park, CA:SAGE.

Myllyaho, M., Salo, O., Kääriäinen, J., Hyysalo, J., and Kosk-ela, J. (2004). A Review of Small and Large Post-MortemAnalysis Methods. Paper presented at the 17th Interna-tional Conference on Software and Systems Engineeringand their Applications, November 30–December 2, Paris,France.

Newell, A. and Rosenbloom, P. (1981). Mechanisms of skillacquisition and the law of practice. In Cognitive Skills andTheir Acquisition, edited by J. R. Anderson, pp. 1–56. Hills-dale, NJ: Lawrence Erlbaum Associates.

Nitko, A. (2001). Educational Assessment of Students, 3rd ed.Upper Saddle River, NJ: Prentice Hall.

Noldus, L. P. J. J., Trienes, R. J. H., Hendriksen, A. H. M.,Jansen, H., and Jansen, R. G. (2000). The Observer Video-Pro: new software for the collection, management, and pre-sentation of time-structured data from videotapes and digitalmedia files. Behav. Res. Methods Instrum. Comput., 32,197–206.

O’Connor, D. L. and Johnson, T. E. (2004). Measuring teamcognition: concept mapping elicitation as a means of con-structing team shared mental models in an applied setting.In Concept Maps: Theory, Methodology, Technology, Pro-ceedings of the First International Conference on ConceptMapping Vol. 1, edited by A. J. Cañas, J. D. Novak, and F.M. Gonzalez, pp. 487–493. Pamplona, Spain: Public Uni-versity of Navarra.*

Olansen, J. B. and Rosow, E. (2002). Virtual Bio-Instrumentation. Upper Saddle River, NJ: Prentice Hall.

Olkinuora, E., Mikkila-Erdmann, M., and Nurmi, S. (2004).Evaluating the pedagogical value of multimedia learningmaterial: an experimental study in primary school. InInstructional Design: Addressing the Challenges of Learn-ing Through Technology and Curriculum, edited by N. Seeland S. Dijkstra, pp. 331–352. Mahwah, NJ: LawrenceErlbaum Associates.

O’Neal, M. R. and Chissom, B. S. (1993). A Comparison ofThree Methods for Assessing Attitudes. Paper presented atthe Annual Meeting of the Mid-South Educational ResearchAssociation, November 10–12, New Orleans, LA.

O’Neil, H. F., Wang, S., Chung, G., and Herl, H. E. (2000).Assessment of teamwork skills using computer-based team-work simulations. In Aircrew Training and Assessment,edited by H. F. O’Neil and D. H. Andrews, pp. 244–276.Mahwah, NJ: Lawrence Erlbaum Associates.*

Paas, F. (1992). Training strategies for attaining transfer ofproblem-solving skill in statistics: a cognitive load approach.J. Educ. Psychol., 84, 429–434.

Paas, F. and van Merriënboer, J. J. G. (1993). The efficiency ofinstructional conditions: an approach to combine mental-effortand performance measures. Hum. Factors, 35, 737–743.*

Paas, F. and van Merriënboer, J. J. G. (1994a). Instructionalcontrol of cognitive load in the training of complex cognitivetasks. Educ. Psychol. Rev., 6, 51–71.

Paas, F. and van Merriënboer, J. J. G. (1994b). Variability ofworked examples and transfer of geometrical problem-solv-ing skills: a cognitive load approach. J. Educ. Psychol., 86,122–133.

Paas, F., Tuovinen, J. E., Tabbers, H., and Van Gerven, P. W.M. (2003). Cognitive load measurement as a means toadvance cognitive load theory. Educ. Psychol., 38, 63–71.

Paterson, B., Bottorff, J., and Hewatt, R. (2003). Blendingobservational methods: possibilities, strategies, and chal-lenges. Int. J. Qual. Methods, 2(1), article 3.

Patton, M. Q. (2001). Qualitative Research and EvaluationMethods, 3rd ed. Thousand Oaks, CA: SAGE.

Paulsen, M. F. (2003). An overview of CMC and the onlineclassroom in distance education. In Computer-MediatedCommunication and the Online Classroom, edited by Z. L.Berge and M. P. Collins, pp. 31–57. Cresskill, NJ: HamptonPress.

Pavitt, C. (1998). Small Group Discussion: A TheoreticalApproach, 3rd ed. Newark: University of Delaware (http://www.udel.edu/communication/COMM356/pavitt/).

Pelto, P. J. and Pelto, G. H. (1978). Anthropological Research:The Structure of Inquiry, 2nd ed. Cambridge, U.K.: Cam-bridge University Press.

Perez-Prado, A. and Thirunarayanan, M. (2002). A qualitativecomparison of online and classroom-based sections of acourse: exploring student perspectives. Educ. Media Int.,39(2), 195–202.

Pirnay-Dummer, P. (2006). Expertise und modellbildung: Mito-car [Expertise and Model Building: Mitocar]. Ph.D. disser-tation. Freiburg, Germany: Freiburg University.

Popham, J. W. (1991). Appropriateness of instructor’s test-prep-aration practices. Educ. Meas. Issues Pract., 10(4), 12–16.

Prichard, J. S. (2006). Team-skills training enhances collabora-tive learning. Learn. Instruct., 16, 256–265.

Qureshi, S. (1995). Supporting Electronic Group Processes: ASocial Perspective. Paper presented at the Association forComputing Machinery (ACM) Special Interest Group onComputer Personnel Research Annual Conference, April6–8, Nashville, TN.


Rayner, K. (1998). Eye movements in reading and informationprocessing: 20 years of research. Psychol. Bull., 124,372–422.

Reigeluth, C. M. (1989). Educational technology at the cross-roads: new mindsets and new directions. Educ. Technol. Res.Dev., 37 (1), 67–80.*

Reilly, B. (1994). Composing with images: a study of high schoolvideo producers. In Proceedings of ED-MEDIA 94: Educa-tional Multimedia and Hypermedia. Charlottesville, VA:Association for the Advancement of Computing in Education.

Reiser, R. A. and Mory, E. H. (1991). An examination of thesystematic planning techniques of two experienced teachers.Educ. Technol. Res. Dev., 39(3), 71–82.

Rentsch, J. R. and Hall, R. J., Eds. (1994). Members of GreatTeams Think Alike: A Model of Team Effectiveness andSchema Similarity among Team Members, Vol. 1, pp. 22–34.Stamford, CT: JAI Press.

Rentsch, J. R., Small, E. E., and Hanges, P. J. (in press). Cog-nitions in organizations and teams: What is the meaning ofcognitive similarity? In The People Make the Place, editedby B. S. B. Schneider. Mahwah, NJ: Lawrence ErlbaumAssociates.

Robinson, R. S. (1994). Investigating Channel One: a case studyreport. In Watching Channel One, edited by De Vaney, pp.21–41. Albany, NY: SUNY Press.

Robinson, R. S. (1995). Qualitative research: a case for casestudies. In Instructional Technology: Past, Present andFuture, 2nd ed., edited by G. J. Anglin, pp. 330–339. Engle-wood, CO: Libraries Unlimited.

Ross, S. M. and Morrison, G. R. (2004). Experimental researchmethods. In Handbook of Research on Educational Commu-nications and Technology, 2nd ed., edited by D. Jonassen,pp. 1021–1043. Mahwah, NJ: Lawrence Erlbaum Associ-ates.

Rourke, L., Anderson, T., Garrison, D. R., and Archer, W.(2001). Methodological issues in the content analysis ofcomputer conference transcripts. Int. J. Artif. Intell. Educ.,12, 8–22.

Rowe, A. L. and Cooke, N. J. (1995). Measuring mental models:choosing the right tools for the job. Hum. Resource Dev. Q.,6, 243–255.

Russo, J. E., Johnson, E. J., and Stephens, D. L. (1989). Thevalidity of verbal protocols. Mem. Cogn., 17, 759–769.

Salas, E. and Cannon-Bowers, J. A. (2000). The anatomy ofteam training. In Training and Retraining: A Handbook forBusiness, Industry, Government, and the Military, edited byS. T. J. D. Fletcher, pp. 312–335. New York: Macmillan.

Salas, E. and Cannon-Bowers, J. A. (2001). Special issue pref-ace. J. Org. Behav., 22, 87–88.

Salas, E. and Fiore, S. M. (2004). Why team cognition? Anoverview. In Team Cognition: Understanding the FactorsThat Drive Process and Performance, edited by E. Salas andS. M. Fiore. Washington, D.C.: American PsychologicalAssociation.

Salomon, G. and Perkins, D. N. (1998). Individual and socialaspects of learning. In Review of Research in Education,Vol. 23, edited by P. Pearson and A. Iran-Nejad, pp. 1–24.Washington, D.C.: American Educational Research Asso-ciation.*

Salvucci, D. D. (1999). Mapping eye movements to cognitiveprocesses [doctoral dissertation, Carnegie Mellon Univer-sity]. Dissert. Abstr. Int., 60, 5619.

Sapsford, R. and Jupp, V. (1996). Data Collection and Analysis.London: SAGE.

Savenye, W. C. (1989). Field Test Year Evaluation of the TLTGInteractive Videodisc Science Curriculum: Effects on Stu-dent and Teacher Attitude and Classroom Implementation.Austin, TX: Texas Learning Technology Group of the TexasAssociation of School Boards.

Savenye, W. C. (2004a). Evaluating Web-based learning sys-tems and software. In Curriculum, Plans, and Processes inInstructional Design: International Perspectives, edited byN. Seel and Z. Dijkstra, pp. 309–330. Mahwah, NJ:Lawrence Erlbaum Associates.

Savenye, W. C. (2004b). Alternatives for assessing learning inWeb-based distance learning courses. Distance Learn., 1(1),29–35.*

Savenye, W. C. (2006). Improving online courses: what is inter-action and why use it? Distance Learn., 2(6), 22–28.

Savenye, W. C. (2007). Interaction: the power and promise ofactive learning. In Finding Your Online Voice: Stories Toldby Experienced Online Educators, edited by M. Spector.Mahwah, NJ: Lawrence Erlbaum Associates.

Savenye, W. C. and Robinson, R. S. (2004). Qualitativeresearch issues and methods: an introduction for instruc-tional technologists. In Handbook of Research on Educa-tional Communications and Technology, 2nd ed., edited byD. Jonassen, pp. 1045–1071. Mahwah, NJ: LawrenceErlbaum Associates.

Savenye, W. C. and Robinson, R. S. (2005). Using qualitativeresearch methods in higher education. J. Comput. HigherEduc., 16(2), 65–95.

Savenye, W. C. and Strand, E. (1989). Teaching science usinginteractive videodisc: results of the pilot year evaluation ofthe Texas Learning Technology Group Project. In EleventhAnnual Proceedings of Selected Research Paper Presenta-tions at the 1989 Annual Convention of the Association forEducational Communications and Technology in Dallas,Texas, edited by M. R. Simonson and D. Frey. Ames, IA:Iowa State University.

Savenye, W. C., Leader, L. F., Schnackenberg, H. L., Jones, E.E. K., Dwyer, H., and Jiang, B. (1996). Learner navigationpatterns and incentive on achievement and attitudes in hyper-media-based CAI. Proc. Assoc. Educ. Commun. Technol.,18, 655–665.

Sax, G. (1980). Principles of Educational and PsychologicalMeasurement and Evaluation, 2nd ed. Belmont, CA: Wad-sworth.

Schneider, W. and Shiffrin, R. M. (1977). Controlled and auto-matic human information processing. I. Detection, search,and attention. Psychol. Rev., 84, 1–66.

Schweiger, D. M. (1986). Group approaches for improving stra-tegic decision making: a comparative analysis of dialecticalinquiry, devil’s advocacy, and consensus. Acad. Manage. J.,29(1), 51–71.

Seel, N. M. (1999). Educational diagnosis of mental models:assessment problems and technology-based solutions. J.Struct. Learn. Intell. Syst., 14, 153–185.

Seel, N. M. (2004). Model-centered learning environments: the-ory, instructional design, and effects. In InstructionalDesign: Addressing the Challenges of Learning ThroughTechnology and Curriculum, edited by N. Seel and S. Dijk-stra, pp. 49–73. Mahwah, NJ: Lawrence Erlbaum Associates.

Seel, N. M., Al-Diban, S., and Blumschein, P. (2000). Mentalmodels and instructional planning. In Integrated and Holis-tic Perspectives on Learning, Instruction, and Technology:Understanding Complexity, edited by J. M. Spector and T.M. Anderson, pp. 129–158. Dordrecht: Kluwer.*


Segers, M., Dochy, F., and Cascallar, E., Eds. (2003). Optimis-ing New Modes of assessment: In Search of Qualities andStandards. Dordrecht: Kluwer.

Shepard, L. (2000). The role of assessment in a learning culture.Educ. Res., 29(7), 4–14.

Shiffrin, R. M. and Schneider, W. (1977). Controlled and auto-matic human information processing. II. Perceptual learning,automatic attending, and a general theory. Psychol. Rev., 84,127–190.*

Shin, E. J., Schallert, D., and Savenye, W. C. (1994). Effects oflearner control, advisement, and prior knowledge on youngstudents’ learning in a hypertext environment. Educ. Tech-nol. Res. Dev., 42(1), 33–46.

Smith, P. L. and Wedman, J. F. (1988). Read-think-aloud pro-tocols: a new data source for formative evaluation. Perform.Improv. Q., 1(2), 13–22.

Spector, J. M. and Koszalka, T. A. (2004). The DEEP Method-ology for Assessing Learning in Complex Domains. Arling-ton, VA: National Science Foundation.*

Spradley, J. P. (1979). The Ethnographic Interview. New York:Holt, Rinehart and Winston.*

Spradley, J. P. (1980). Participant Observation. New York: Holt,Rinehart and Winston.*

Stahl, G. (2006). Group Cognition: Computer Support forBuilding Collaborative Knowledge. Cambridge, MA: MITPress.*

Staller, L. (2005). Understanding analog to digital converter specifications [electronic version]. Embedded Syst. Design, February, 24, http://www.embedded.com/.

Stelmach, L. B., Campsall, J. M., and Herdman, C. M. (1997).Attentional and ocular movements. J. Exp. Psychol. Hum.Percept. Perform., 23, 823–844.

Stevenson, W. G. and Soejima, K. (2005). Recording techniques for electrophysiology. J. Cardiovasc. Electrophysiol., 16, 1017–1022.

Strauss, A. L. and Corbin, J. M. (1994) Grounded theory meth-odology: an overview. In Handbook of Qualitative Research,edited by N. K. Denzin and Y. Lincoln, pp. 273–285. Thou-sand Oaks, CA: SAGE.*

Sweller, J. (1988). Cognitive load during problem solving:effects on learning. Cogn. Sci., 12, 257–285.*

Sweller, J., van Merriënboer, J. J. G., and Paas, F. (1998).Cognitive architecture and instructional design. Educ. Psy-chol. Rev., 10, 251–295.

Sy, T. (2005). The contagious leader: Impact of the leader’smood on the mood of group members, group affective tone,and group processes. J. Appl. Psychol., 90(2), 295–305.

Taylor, K. L. and Dionne, J. P. (2000). Accessing problem-solving strategy knowledge: the complementary use of con-current verbal protocols and retrospective debriefing. J.Educ. Psychol., 92, 413–425.

Thompson, S. (2001). The authentic standards movement andits evil twin. Phi Delta Kappan, 82(5), 358–362.

Thorndike, R. M. (1997). Measurement and Evaluation in Psy-chology and Education, 6th ed. Upper Saddle River, NJ:Prentice Hall.*

Tiffin, J. and Rajasingham, L. (1995). In Search of the VirtualClass: Education in an Information Society. London: Rou-tledge.

Titchener, E. B. (1900). The equipment of a psychological lab-oratory. Am. J. Psychol., 11, 251–265.*

Tuovinen, J. E. and Paas, F. (2004). Exploring multidimensionalapproaches to the efficiency of instructional conditions.Instruct. Sci., 32, 133–152.

Underwood, G., Chapman, P., Brocklehurst, N., Underwood, J.,and Crundall, D. (2003). Visual attention while driving:sequences of eye fixations made by experienced and novicedrivers. Ergonomics, 46, 629–646.

Underwood, G., Jebbett, L., and Roberts, K. (2004). Inspectingpictures for information to verify a sentence: eye movementsin general encoding and in focused search. Q. J. Exp. Psy-chol., 57, 165–182.

Urch Druskat, V. and Kayes, D. C. (2000). Learning versusperformance in short-term project teams. Small Group Res.,31, 328–353.

Van der Vleuten, C. P. M. and Schuwirth, L. W. T. (2005).Assessing professional competence: from methods to pro-grammes. Med. Educ., 39, 309–317.

Van Gerven, P. W. M., Paas, F., van Merriënboer, J. J. G., andSchmidt, H. (2004). Memory load and the cognitive pupil-lary response in aging. Psychophysiology, 41, 167–174.

van Gog, T. (2006). Uncovering the Problem-Solving Processto Design Effective Worked Examples. Ph.D. dissertation.Heerlen: Open University of the Netherlands.

van Gog, T., Paas, F., and van Merriënboer, J. J. G. (2005a).Uncovering expertise-related differences in troubleshootingperformance: combining eye movement and concurrent ver-bal protocol data. Appl. Cogn. Psychol., 19, 205–221.*

van Gog, T., Paas, F., van Merriënboer, J. J. G., and Witte, P.(2005b). Uncovering the problem-solving process: cued ret-rospective reporting versus concurrent and retrospectivereporting. J. Exp. Psychol. Appl., 11, 237–244.

Van Maanen, J. (1988). Tales of the Field: On Writing Ethnog-raphy. Chicago, IL: The University of Chicago Press.

van Merriënboer, J. J. G. (1997). Training Complex CognitiveSkills: A Four-Component Instructional Design Model forTechnical Training. Englewood Cliffs, NJ: EducationalTechnology Publications.*

van Merriënboer, J. J. G., Jelsma, O., and Paas, F. (1992).Training for reflective expertise: a four-component instruc-tional design model for complex cognitive skills. Educ. Tech-nol. Res. Dev., 40(2), 1042–1629.

Van Someren, M. W., Barnard, Y. F., and Sandberg, J. A. C.(1994). The Think Aloud Method: A Practical Guide to Mod-eling Cognitive Processes. London: Academic Press.

VanLehn, K. (1996). Cognitive skill acquisition. Annu. Rev.Psychol., 47, 513–539.*

Wainer, H. (1989). The future of item analysis. J. Educ. Meas.,26(2), 191–208.

Webb, E. J., Campbell, D. T., Schwartz, R. D., and Sechrest, L.(1966). Unobtrusive Measures: Nonreactive Research in theSocial Sciences. Chicago, IL: Rand McNally.

Webb, N. M. (1982). Student interaction and learning in smallgroups. Rev. Educ. Res., 52(3), 421–445.

Weitzman, E. A. and Miles, M. B. (1995). A Software Source-book: Computer Programs for Qualitative Data Analysis.Thousand Oaks, CA: SAGE.

Willis, S. C., Bundy, C., Burdett, K., Whitehouse, C. R., andO’Neill, P. A. (2002). Small-group work and assessment ina problem-based learning curriculum: a qualitative and quan-titative evaluation of student perceptions of the process ofworking in small groups and its assessment. Med. Teacher,24, 495–501.

Wolcott, H. F. (1990). Writing Up Qualitative Research. New-bury Park, CA: SAGE.*

Woods, D. R., Felder, R. M., Rugarcia, A., and Stice, J. E.(2000). The future of engineering education. Part 3. Devel-opment of critical skills. Chem. Eng. Educ., 34, 108–117.


Woolf, H. (2004). Assessment criteria: reflections on currentpractices. Assess. Eval. Higher Educ., 29, 479–493.*

Worchel, S., Wood, W., and Simpson, J. A., Eds. (1992).Group Process and Productivity. Newbury Park, CA:SAGE.

Yeo, G. B. and Neal, A. (2004). A multilevel analysis of effort,practice and performance: effects of ability, conscientious-ness, and goal orientation. J. Appl. Psychol., 89, 231–247.*

* Indicates a core reference.