Assessing Outcomes and Processes of Student Collaboration

Assessing Outcomes and Processesof Student Collaboration

Peter F. Halpin

April 19, 2016

Joint work with: Alina von Davier, Yoav Bergner,Jiangang Hao, Lei Liu (ETS); Jacqueline Gutman (NYU)

1 / 89

Outline

Part 1: Wherefore assessments involving collaboration?

I Set up the current perspective: performance assessments

I Selective review of research on small group productivity

Part 2: Outcomes of collaboration

I Combining psychometric models with research on small groupproductivity

I Testing models against observed team performance

Part 3: Processes of collaborationI Focus on chat data (for now!)

I Modeling engagement among collaborators using temporalpoint processesHalpin, von Davier, Hao, & Lui (under review). Journal of Educational

Measurement.

2 / 89

Outline








I Modeling engagement among collaborators using temporalpoint processesHalpin, von Davier, Hao, & Lui (under review). Journal of Educational

Measurement.

3 / 89

Outline








I Modeling engagement among collaborators using temporalpoint processes1

1Halpin, von Davier, Hao, & Lui (under review). Journal of Educational Measurement.

4 / 89

Part 1: Why?

I 21st-century skills, non-cognitive skills, soft skills,hard-to-measure skills, social skills, ...

I Theme: traditional educational tests target a relatively narrowset of constructs

I Analyses of US labour markets indicate that such skills arevalued by employers (Burrus et al.,2013; Deming, 2015)

I There is a salient demand for assessments of a broader rangeof student competencies

5 / 89

Part 1: Why?





6 / 89

Part 1: Why?





7 / 89

With apologies to Dr. Duckworth...

upenn.app.box.com/8itemgrit

8 / 89

upenn.app.box.com/8itemgrit

Self-reports

I Self-report measures often do not require the respondent toexhibit the skills about which we wish to make inferences

→ Unsuitable for supporting consequential decisions ineducational settings2

2cf. Duckworth, & Yeager. (2015). Measurement matters: Assessing personal qualities other than cognitive

ability for educational purposes. Educational Researcher, 44(4), 237-251.

9 / 89

Educational assessments

, Reliability and generalizability in traditional content domains

/ Current psychometric models don’t seem entirely appropriateto “next generation assessments”

I e.g., IRT models don’t use process data

/ Collateral damage: teaching to the test, test anxiety,bubble-filling, ...

I NY opt-out movement: 20% of students (parents) boycottedstate test last year

10 / 89






I NY opt-out movement: 20% of students (parents) boycottedstate test last year

11 / 89






I NY opt-out movement: 20% of students (parents) boycottedstate test last year3

3www.wnyc.org/story/

new-york-city-students-make-modest-gains-state-tests-opt-out-numbers-triple/

12 / 89

www.wnyc.org/story/new-york-city-students-make-modest-gains-state-tests-opt-out-numbers-triple/

www.wnyc.org/story/new-york-city-students-make-modest-gains-state-tests-opt-out-numbers-triple/

Performance assessments4

4Davey, Ferrara, Holland, Shavelson, Webb, & Wise (2015). Psychometric Considerations for the Next

Generation of Performance Assessment. Princeton, NJ. p. 10

13 / 89

Collaboration as a modality of performance assessment

I Small group interactions are a highly-valued educationalpractice

I The Jigsaw Classroom (Aronson et al., 1978; jigsaw.org)

I Group-worthy tasks (Cohen et al., 1999)

I The use of information technology to support studentcollaboration is well established

I CSCL (e.g., Hmelo-Silver et al., 2013)

I The use of group work in assessment contexts has a relativelylong-standing history

I e.g., Webb, 1995; 2015

14 / 89

jigsaw.org

Collaboration as a modality of performance assessment

I Small group interactions are a highly-valued educationalpractice

I The Jigsaw Classroom (Aronson et al., 1978; jigsaw.org)

I Group-worthy tasks (Cohen et al., 1999)

I The use of information technology to support studentcollaboration is well established

I CSCL (e.g., Hmelo-Silver et al., 2013)

I The use of group work in assessment contexts has a relativelylong-standing history

I e.g., Webb, 1995; 2015

15 / 89

jigsaw.org

Intellective tasks

I Defined as having a demonstrably“correct” answer with respect toan agreed upon system ofknowledge

I Differentiated from decision /judgement tasks on a continuumof demonstrability (Laughlin2011)

I Differentiated from mixed-motivetasks in that the goals andoutcomes are the same for allmembers McGrath’s (1984) group task circumplex

16 / 89

Lorge & Solomon 19555

5Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20 (2), p. 141

17 / 89

Lorge & Solomon 19556

6Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20 (2), p. 141

18 / 89

Smoke and Zajonc 19627

If p is the probability that a given individual member is correct, thegroup has a probability h(p) of being correct, where h(p) is afunction of p depending upon the type of decision scheme acceptedby the group. We shall call h(p) a decision function. Intuitively, itwould seem that a decision scheme is desirable to the extent thatit surpasses p.

7On the reliability of group judgements and decisions. In Mathematical methods for small group processes

(Eds. Criswell, Solomon, Suppes), p. 322

19 / 89

Schiflett 19798

8Towards a general model of group productivity. Psychological Bulletin, 86 (1), pp. 67-68

20 / 89

Summary

I Building on research on small groups:

I Intellective tasks (vs decision tasks)

I Cooperative group interactions (vs competitive ormixed-motive)

I Describing group outcomes via decision / functions thatdepend on characteristics of individuals

I But with a focus on:

I Letting probability of success vary over individuals (e.g., viaability)

I Describing relevant task characteristics (e.g., via difficulty)

I The performance of individual groups rather than groups inaggregate

21 / 89

Outcomes of collaboration: A basic scenario

I Two students each write a conventional math assessment

I Their math ability is estimated to be θj and θk

I The two students then work together on a secondconventional math assessment

I What do we expect about their performance on the secondtest, based on the first?

22 / 89

Collaboration as a psychometric question

I Traditional psychometric models assume conditionalindependence of the items

p(xj | θj) =

N∏i

p(xij | θj) (1)

I Traditional psychometric models also assume that theresponses of two (or more) persons are independent

p(xj xk | θj θk) = p(xj | θj) p(xk | θk) (2)

I When people work together does equation (2) hold?

23 / 89

“Working together” in terms of scoring rules9

I For binary items and pairs of responses, consider:

I The conjunctive rule

xijk =

{1 if xij = 1 and xik = 10 otherwise

I The disjunctive rule

xijk =


I More possibilities, especially for items with > 2 responses or groupswith > 2 collaborators

9cf. Steiner’s 1966 classification of task types

24 / 89

Scoring rules vs decision functions

I Scoring rules describe what “counts” as a correct groupresponse

I Under control of the test designer10

I Decision functions describe the strategies adopted by a team

I Under control of the team

I Basic research strategyI Assume a certain scoring rule

I Consider plausible models for team strategies

I Test the models against data

10Maris & van der Maas (2012). Speed-accuracy response models: scoring rules based on response time and

accuracy. Psychometrika, 77 (4), 615-633

25 / 89

Scoring rules vs decision functions

I Scoring rules describe what “counts” as a correct groupresponse

I Under control of the test designer10

I Decision functions describe the strategies adopted by a team

I Under control of the team

I Basic research strategyI Assume a certain scoring rule

I Consider plausible models for team strategies

I Test the models against data

10Maris & van der Maas (2012). Speed-accuracy response models: scoring rules based on response time and

accuracy. Psychometrika, 77 (4), 615-633

26 / 89

“Working together” in terms of scoring rules

I For binary items and pairs of responses, consider:

I The conjunctive rule

xijk =


I The disjunctive rule

xijk =


I More possibilities, especially for items with > 2 responses or groupswith > 2 collaborators

27 / 89

Defining successful pairwise collaboration

I The independence model

Eind[xijk | θj θk] = E[xij | θj ] E[xik | θk]

I Successful collaboration

E[xijk | θj θk] > Eind[xijk | θj θk]

I Unsuccessful collaboration

E[xijk | θj θk] < Eind[xijk | θj θk]

I Note: these definitions are item- and dyad- specific

28 / 89

Defining successful pairwise collaboration

I The independence model

Eind[xijk | θj θk] = E[xij | θj ] E[xik | θk]

I Successful collaboration

E[xijk | θj θk] > Eind[xijk | θj θk]

I Unsuccessful collaboration

E[xijk | θj θk] < Eind[xijk | θj θk]

I Note: these definitions are item- and dyad- specific

29 / 89

Some models for successful collaboration

I Minimum individual performance (disruptive team member)

Emin[xijk | θj θk] = min{E[xij | θj ], E[xik | θk]}

I Maximum individual performance (cheating / tutor)

Emax[xijk | θj θk] = max{E[xij | θj ], E[xik | θk]}

I “True collaboration”

E[xijk | θj θk] ≥ max{E[xij | θj ], E[xik | θk]}

30 / 89








31 / 89








32 / 89

A model for “true collaboration”

I An additive model

Eadd[xijk | θj θk] = E[xij | θj ] + E[xik | θk]− E[xijk | θj θk]

I Recalling E[xijk | θj θk] > E[xij | θj ]E[xik | θk], define anadditive independence (AI) model

EAI [xijk | θj θk] = E[xij | θj ] + E[xik | θk]− E[xij | θj ]E[xik | θk]

≥ Eadd[xijk | θj θk]

I AI is an upper bound on any “more interesting” additive model forsuccessful collaboration

33 / 89

A model for “true collaboration”

I An additive model

Eadd[xijk | θj θk] = E[xij | θj ] + E[xik | θk]− E[xijk | θj θk]

I Recalling E[xijk | θj θk] > E[xij | θj ]E[xik | θk], define anadditive independence (AI) model

EAI [xijk | θj θk] = E[xij | θj ] + E[xik | θk]− E[xij | θj ]E[xik | θk]

≥ Eadd[xijk | θj θk]

I AI is an upper bound on any “more interesting” additive model forsuccessful collaboration

34 / 89

More on AI model

I Can also be written as:

EAI [xijk | θj θk] = E[xij | θj ] (1− E[xik | θk])+ E[xik | θk] (1− E[xij | θj ])+ E[xij | θj ]E[xik | θk]

I Which has an interpretation in terms of three cases

35 / 89

More on AI model

I And is also equivalent to Lorge & Solomon’s Model A

EAI [xijk | θj θk] = 1− (1− E[xij | θj ])(1− E[xik | θk])

I Except the “probability an individual can solve the problem”now depends on both the individual and the problem

36 / 89

More on AI model11

I We probably want some constraints on what counts as a goodcollaborative IRF

I Easy to show that AI satisfies latent monotonicity, if theindividual IRFs do (trivial for other models also)

11Holland & Rosenbaum (1986). Conditional Association and Unidimensionality in Monotone Latent Variable

Models. The Annals of Statistics, 14 (4), 1523 – 1543

37 / 89

More on AI model11

I We probably want some constraints on what counts as a goodcollaborative IRF

I Easy to show that AI satisfies latent monotonicity, if theindividual IRFs do (trivial for other models also)

11Holland & Rosenbaum (1986). Conditional Association and Unidimensionality in Monotone Latent Variable

Models. The Annals of Statistics, 14 (4), 1523 – 1543

38 / 89

AI: latent monotonicity

Assumptions:

f(x) ≥ f(x′) for x > x′ and 0 ≤ g(y) ≤ 1

Show:

f(x) + g(y)− f(x) g(y) ≥ f(x′) + g(y)− f(x′) g(y)

Contradiction:

f(x) + g(y)− f(x) g(y) <f(x′) + g(y)− f(x′) g(y)

→ f(x)− f(x′) <g(y) (f(x)− f(x′))

39 / 89

AI: example IRF12

theta1

prob

12Using 2PL model for individual IRFs with α = 1 and β = 0

40 / 89

Models abound!

I Basic idea: write down IRFs for collaboration based onassumed-to-be-known individual abilities (and itemparameters)

I But how do we characterize empirical team performance?

41 / 89

Empirical team performance

I We have

I Observed collaborative responses xjk = (x1jk, x1jk, . . . , xmjk)

I A model for individual performance on the m (conventional)math items

I So we can get “team theta,” e.g.,

θjk = argmaxθ

{L0(xjk | θ)} (3)

I Where L0 is the likelihood of the model calibrated onindividual performance (reference model)

42 / 89

Empirical team performance

I We have

I Observed collaborative responses xjk = (x1jk, x1jk, . . . , xmjk)

I A model for individual performance on the m (conventional)math items

I So we can get “team theta,” e.g.,

θjk = argmaxθ

{L0(xjk | θ)} (3)

I Where L0 is the likelihood of the model calibrated onindividual performance (reference model)

43 / 89

Proposed method for testing models

I Testing of different models against reference model

Dmodel = −2 lnLmodel(xjk | θj θk)L0(xjk | θjk)

(4)

I Also a “direct test” of effect of collaboration for eachindividual

D0 = −2 lnL0(xjk | θj)L0(xjk | θjk)

(5)

with effect size δjk =θjk−θjσθ

44 / 89

Proposed method: reference distribution

I Ind and AI models are not nested with reference model → NoWilk’s theorem

I Can use Vuong’s 198913 results for LR with non-nestedmodels, but asymptotic in m

I Good news: we can bootstrap a null distribution for (4) and(5) pretty easily

13Likelihood ratio tests for model selection and non-nested hypotheses. Econometrika, 57(2), 307 – 333.

45 / 89

Bootstrapping the reference distribution

Assuming known item parameters and θj , θk. For r = 1, . . . , R

Step 1 Generate collaborative response patterns x(r)jk from

Emodel[xijk | θj θk]

Step 2 Compute Lmodel(x(r)jk | θj θk)

Step 2 Estimate θ(r)jk for each x

(r)jk ; save L0(x

(r)jk | θjk)

Step 4 Compute D(r)model or D

(r)0

46 / 89

Example 1

I Design

I Pool of pre-calibrated math items (grade 12 NAEP, modifiedto be numeric response)

I Individual “pre-test” → estimate individual abilities

I Collaborative “post-test” → evaluate models, estimate δjk

I Modality of collaboration: online chat

I Limitations:

I Small calibration sample; crowd workers

I Individual and collaborative forms were not counterbalanced(neither in order nor content)

47 / 89

NAEP grade 12 math items, deployed via OpenEdx

48 / 89

AMT crowdworkers (calibration sample)

Variable Levels n %∑

%

Gender Female 155 46.5 46.5Male 178 53.5 100.0

Age 18-30 117 35.2 35.230-40 129 38.9 74.140-55 71 21.4 95.555+ 15 4.5 100.0

Education Some Grade School 3 0.9 0.9High School Diploma 49 14.7 15.6Some College 118 35.4 51.0Bachelor’s Degree 132 39.6 90.7Master’s Degree 22 6.6 97.3Ph.D or Advanced Degree 9 2.7 100.0

Country United States 313 94.0 94.0India 16 4.8 98.8Canada 3 0.9 99.7United Kingdom 1 0.3 100.0

English First Lang Yes 321 96.4 96.4No 12 3.6 100.0

49 / 89

Deltas

-2

0

2

-2 0 2Individual Theta

Col

labo

rativ

e Th

eta

Collaborative vs Individual Performance

50 / 89

Model tests: Sanity check using individual pre-test

Figure reports P(Dmodel > |obs|) for individual pre-tests scored using conjunctivescoring rule

51 / 89

Model tests: Collaborative data

Figure reports P(Dmodel > |obs|) for collaborative tests scored using conjunctivescoring rule

52 / 89

Ind Model

-2

0

2


Col

labo

rativ

e Th

eta

pairs3

4

12

15

16

35


53 / 89

Min Model

-2

0

2


Col

labo

rativ

e Th

eta

pairs7

8

9

13

29

36


54 / 89

Max Model

-2

0

2


Col

labo

rativ

e Th

eta

pairs2

5

10

14

17

19

22

24

25

26

27

28

30

38

44

45


55 / 89

AI Model

-2

0

2


Col

labo

rativ

e Th

eta

pairs6

18

20

21

23

32

33

37

39

40

41

42

43


56 / 89

Not one of our four models

-2

0

2


Col

labo

rativ

e Th

eta

pairs1

11

31

34


57 / 89

Summary of collaborative outcomes

I Can define, estimate, and test models of collaboration onacademic performance using IRT-based methods

I But how distinct are these models, really?

I Models do not cover all cases

58 / 89

Possible next step – one model to rule them all!

Let w1, w2 ∈ [0, 1] and define the weighted additive independencemodel

EWAI[Xijk | θj θk] = wjPi(θj)Qi(θk) + wkPi(θk)Qi(θj) + Pi(θj)Pi(θk)

I Includes original four and everything in between

I Includes (Pi(θj) + Pi(θk))/2 when w1 = w2 = .5

I Weights describe how well each individual obtains his/her “optimalcollaboration level”

59 / 89

Part 3: What are process data?14

I Any task-related actions of a respondent performed during thecompletion of a task

I In ed tech context, typically associated with time-stamped userlogs (“trace data”)

I All the stuff IRT ignores:

p(x | θ) =∏i

p(xi | θ)

14Halpin & von Davier 2013, Hao, & Lui (under review). Journal of Educational Measurement.

60 / 89





p(x | θ) =∏i

p(xi | θ)


61 / 89





p(x | θ) =∏i

p(xi | θ)


62 / 89

Part 3: What are collaborative process data?

I Ideally a richly detailed recording of the sequence of actionstaken by each team member during the completion of a task

I ATC21S collaborative problem solving prototype items15

I CPS frame16

I Focus today: chat messages sent between online collaborators

15http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf

16In alpha at Computational Psychometrics lab at ETS

63 / 89

http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf

Part 3: What are collaborative process data?

I Ideally a richly detailed recording of the sequence of actionstaken by each team member during the completion of a task

I ATC21S collaborative problem solving prototype items15

I CPS frame16

I Focus today: chat messages sent between online collaborators

15http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf

16In alpha at Computational Psychometrics lab at ETS

64 / 89

http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf

Two perspectives on the analysis of chat / email / etc.

I Text-based analysis of strategy and sentiment

I e.g., Howley, Mayfield, & Rose, 2013; Liu, Hao, von Davier,Kyllonen, & Zapata-Rivera, 2015

I Time series analysis of sending times

I e.g., Barabasi, 2005; Ebel, Mielsch, & Bornholdt, 2002; Halpin& De Boeck, 2013

65 / 89

Temporal point process: basic idea (more tomorrow)17

I Data: events that have negligible duration relative to a periodof observation

I Contrast events with states, regimes

I Basic idea: model the Bernoulli probability of an eventhappening in a small window of time [t, t+ ∆), conditional onthe events that have happened before t ∈ R+.

I “Instantaneous probability” of an event, denoted p(t)

17Daley, D. J., & Vera-Jones. (2003). An introduction to the theory of point processes: Elementary theory and

methods (2nd ed., Vol. 1). New York: Springer.

66 / 89

Temporal point process in interpersonal context

I Modeling p(t) to describe

I How the probability of each person’s actions changes incontinuous time

I How this depends on their previous actions

I Emergent or group-level phenomena like coordination,reciprocity, ...

67 / 89

Chat engagement via Hawkes processes

I Hawkes process provides a means of modeling instantaneousprobabilities in a multivariate context

I Halpin et al. (under review) suggest the response intensityparameter as a measure of engagement of student j with k

αjk > njk/nk (6)

I njk is the expected total number of responses made by studentj to student k is (inferred from model)

I nk is the number of actions of student k (observed)

I Lower bound is tight in practice; not necessary forcomputations

68 / 89

Chat engagement via Hawkes processes

I Aggregating to team (dyad) level

α ≡ α12n2 + α21n1n1 + n2

(7)

I Interpretation: the proportion of all group members’ actions,n1 + n2, that were responded to by any other member duringa collaboration

I See paper for more details, including initial results on SEs ofαjk

69 / 89

Example: Tetralogue

I A simulation-based science game with an embeddedassessment recently developed at ETS (Hao, Liu, von Davier,& Kyllonen, 2015)

1 Dyads work together to learn and make predictions aboutvolcano activity

2 At various points in the simulation, the students are asked toindividually submit their responses to an assessment itemwithout discussing the item

3 Following submission of responses from both students, they areinvited to discuss the question and their answers

4 Lastly, they are given an opportunity to revise their responsesto the item, with the final answers counting towards theteam’s score

70 / 89

Example: Full sample

I 286 dyads solicited via AMT and randomly paired (based onarrival in queue)

I Median reported age was 31.5 years

I 52.5% reported that they were female

I 79.2% reported that they were White.

I Additionally, all participants were required to

I Have an IP address located in the United States

I Self-identify as speaking English as their primary language

I Self-identify as having at least one year of college education

71 / 89

Example: Estimating chat engagement

0

5

10

15

0.0 0.2 0.4 0.6 0.8Alpha

count

Engagement Index

0.00

0.05

0.10

0.15

50 75 100 125Number of Chats of Partner

Sta

ndar

d E

rror

MethodHessian

Lower Bound

Standard Error Against Number of Chats

0.0

0.2

0.4

0.6

0.2 0.4 0.6Alpha

Alpha

Relation with Partner's Index

0

1

2

3

4

0.0 0.2 0.4 0.6Alpha

Diff

eren

ce in

Num

ber o

f Cha

ts

Relation with Number of Chats

Note: Alpha denotes the estimated response intensities from Equation 6. Hessian denotes standard errors obtainedvia the Hessian of the log-likelihood. See appendix of Halpin et al. for Lower Bound. Difference in Number ofChats was scaled using the log of the absolute value of the difference.

72 / 89

Example: Relation with revision on embedded assessment

0.25

0.30

0.35

0.40

Alpha Partner Alpha Team Alpha

Mea

n E

ngag

emen

t

No Revisions

Revisions

Measures of Chat Engagement vs Item Revisions

Note: Comparison of mean levels of engagement indices for individuals who either did or did not revise at least oneresponse after discussion with their partners. Alpha denotes the estimated response intensities from Equation 6;Partner’s Alpha denotes the partner’s response intensity; Team Alpha denotes the team-level index in Equation 7.For the latter, the data are reported for dyads, not individuals, and no revisions means that both individuals on theteam made no revisions. Error bars are 95% confidence intervals on the means.

73 / 89

Example: Relation with revision on embedded assessment

Table 1: Summary of group differences.

Index Group Mean SD N Hedges’ g r

Alpha No Revisions 0.31 0.13 82 –Alpha Revisions 0.36 0.10 66 0.40 .20Partner’s Alpha No Revisions 0.31 0.14 82 –Partner’s Alpha Revisions 0.37 0.14 66 0.44 .21Team Alpha No Revisions 0.27 0.11 26 –Team Alpha Revisions 0.37 0.13 48 0.84 .38

Note: Alpha denotes the estimated response intensities from Equation alpha2; Partner’s Alpha denotes theengagement index of the individual’s partner; Team Alpha denotes the team-level index in Equation 7. Hedges’ gused the correction factor described by Hedges (1981) and r denotes the point-biserial correlation.

74 / 89

Summary of collaborative processes

I Hawkes processes are a feasible model for process data obtained oncollaborative tasks

I Resulting measures of chat engagement are meaningfully related totask performance

I Future modeling work

I Random effects models for simultaneous estimation over multiple groups

I Inclusion of model parameters describing task characteristics

I Analytic expressions for standard errors of model parameters

I Methods for improving optimization with relatively small numbers ofevents

I Integration with text-based analyses (e.g., using marks / time-varyingcovariates)

75 / 89

What’s next

I Integration of task design, outcomes, processes, ... and theory!!

76 / 89

Contact: [email protected]

Support: This research was funded by a postdoctoral fellowship from theSpencer Foundation and an Education Technology grant from NYU Steinhardt.

77 / 89

References not already included in footnotes

Aronson, E., Blaney, N., Stephan, C., Sikes, J., & Snapp, M. (1978). The jigsaw classroom. Beverly Hills, CA:Sage.

Burrus, J., Carlson, J., Bridgeman, B., Golub-smith, M., & Greenwood, R. (2013). Identifying the Most Important21st Century Workforce Competencies : An Analysis of the Occupational Information Network ( O * NET ) (ETSRR-13-21). Princeton, NJ.

Cohen, E. G., Lotan, R. A., Scarloss, B. A., & Arellano, A. R. (1999). Complex instruction: Equity in cooperativelearning classrooms. Theory Into Practice, 38, 80-86.

Davey, T., Ferrara, S., Holland, P. W., Shavelson, R. J., Webb, N. M., & Wise, L. L. (2015). PsychometricConsiderations for the Next Generation of Performance Assessment. Princeton, NJ.

Deming, D. J. (2015). The Growing Importance of Social Skills in the Labor Market. National Bureau of EconomicResearch Working Paper Series, (21473).

Griffin, P., & Care, E. (2015). Assessment and teaching of 21st century skills: Methods and approach. New York:Springer.

Hmelo-Silver, C. E., Chinn, C. A., Chan, C. K., & O?Donnel, A. M. (2013). International handbook ofcollaborative learning. New York: Taylor and Francis.

McGrath, J. E. (1984). Groups: Interaction and performance. (Prentice-Hall, Ed.). Englewood Cliffs, NJ.

Organisation for Economic Co-operation and Development. (2013). PISA 2015 Draft Collaborative ProblemSolving Framework. Retrieved fromhttp://www.oecd.org/pisa/pisaproducts/DraftPISA2015CollaborativeProblemSolvingFramework.pdf

Webb, N. M. (1995). Group Collaboration in Assessment: Multiple Objectives, Processes, and Outcomes.Educational Evaluation and Policy Analysis, 17(2), 239-261.

78 / 89

http://www.oecd.org/pisa/pisaproducts/Draft PISA 2015 Collaborative Problem Solving Framework.pdf

Bootstrapping the reference distribution for Dmodel

Assuming known item parameters and θj , θk. For r = 1, . . . , R

Step 1 Generate collaborative response patterns x(r)jk from

Emodel[xijk | θj θk]

Step 2 Compute Lmodel(x(r)jk | θj θk)

Step 2 Estimate θ(r)jk for each x

(r)jk ; save L0(x

(r)jk | θjk)

Step 4 Compute D(r)model or D

(r)0

79 / 89

Instructions

80 / 89

Jigsaw / information sharing items

81 / 89


82 / 89


83 / 89


84 / 89

Hints / information requesting items

85 / 89


86 / 89


87 / 89

Multiple answer / negotiation items

88 / 89

Documents

Assessing Outcomes and Processes of Student Collaboration