Assessing Outcomes and Processes of Student Collaboration
Peter F. Halpin
April 19, 2016
Joint work with: Alina von Davier, Yoav Bergner, Jiangang Hao, Lei Liu (ETS); Jacqueline Gutman (NYU)
1 / 89
Outline
Part 1: Wherefore assessments involving collaboration?
- Set up the current perspective: performance assessments
- Selective review of research on small group productivity
Part 2: Outcomes of collaboration
- Combining psychometric models with research on small group productivity
- Testing models against observed team performance
Part 3: Processes of collaboration
- Focus on chat data (for now!)
- Modeling engagement among collaborators using temporal point processes [1]
[1] Halpin, von Davier, Hao, & Liu (under review). Journal of Educational Measurement.
4 / 89
Part 1: Why?
- 21st-century skills, non-cognitive skills, soft skills, hard-to-measure skills, social skills, ...
- Theme: traditional educational tests target a relatively narrow set of constructs
- Analyses of US labour markets indicate that such skills are valued by employers (Burrus et al., 2013; Deming, 2015)
- There is a salient demand for assessments of a broader range of student competencies
7 / 89
Self-reports
- Self-report measures often do not require the respondent to exhibit the skills about which we wish to make inferences
→ Unsuitable for supporting consequential decisions in educational settings [2]
[2] cf. Duckworth & Yeager (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237-251.
9 / 89
Educational assessments
(+) Reliability and generalizability in traditional content domains
(−) Current psychometric models don't seem entirely appropriate to "next generation assessments"
  - e.g., IRT models don't use process data
(−) Collateral damage: teaching to the test, test anxiety, bubble-filling, ...
  - NY opt-out movement: 20% of students (parents) boycotted state test last year [3]
[3] www.wnyc.org/story/new-york-city-students-make-modest-gains-state-tests-opt-out-numbers-triple/
12 / 89
Performance assessments [4]
[4] Davey, Ferrara, Holland, Shavelson, Webb, & Wise (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Princeton, NJ. p. 10
13 / 89
Collaboration as a modality of performance assessment
- Small group interactions are a highly valued educational practice
  - The Jigsaw Classroom (Aronson et al., 1978; jigsaw.org)
  - Group-worthy tasks (Cohen et al., 1999)
- The use of information technology to support student collaboration is well established
  - CSCL (e.g., Hmelo-Silver et al., 2013)
- The use of group work in assessment contexts has a relatively long-standing history
  - e.g., Webb, 1995; 2015
15 / 89
Intellective tasks
- Defined as having a demonstrably "correct" answer with respect to an agreed upon system of knowledge
- Differentiated from decision / judgement tasks on a continuum of demonstrability (Laughlin, 2011)
- Differentiated from mixed-motive tasks in that the goals and outcomes are the same for all members
[Figure: McGrath's (1984) group task circumplex]
16 / 89
Lorge & Solomon 1955 [6]
[6] Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 1955, 20 (2), p. 141
18 / 89
Smoke and Zajonc 1962 [7]
If p is the probability that a given individual member is correct, the group has a probability h(p) of being correct, where h(p) is a function of p depending upon the type of decision scheme accepted by the group. We shall call h(p) a decision function. Intuitively, it would seem that a decision scheme is desirable to the extent that it surpasses p.
[7] On the reliability of group judgements and decisions. In Mathematical methods for small group processes (Eds. Criswell, Solomon, Suppes), p. 322
19 / 89
Schiflett 1979 [8]
[8] Towards a general model of group productivity. Psychological Bulletin, 86 (1), pp. 67-68
20 / 89
Summary
- Building on research on small groups:
  - Intellective tasks (vs decision tasks)
  - Cooperative group interactions (vs competitive or mixed-motive)
  - Describing group outcomes via decision functions that depend on characteristics of individuals
- But with a focus on:
  - Letting probability of success vary over individuals (e.g., via ability)
  - Describing relevant task characteristics (e.g., via difficulty)
  - The performance of individual groups rather than groups in aggregate
21 / 89
Outcomes of collaboration: A basic scenario
- Two students each write a conventional math assessment
  - Their math ability is estimated to be $\theta_j$ and $\theta_k$
- The two students then work together on a second conventional math assessment
  - What do we expect about their performance on the second test, based on the first?
22 / 89
Collaboration as a psychometric question
- Traditional psychometric models assume conditional independence of the items
\[ p(x_j \mid \theta_j) = \prod_{i=1}^{N} p(x_{ij} \mid \theta_j) \tag{1} \]
- Traditional psychometric models also assume that the responses of two (or more) persons are independent
\[ p(x_j, x_k \mid \theta_j, \theta_k) = p(x_j \mid \theta_j)\, p(x_k \mid \theta_k) \tag{2} \]
- When people work together, does equation (2) hold?
23 / 89
"Working together" in terms of scoring rules [9]
- For binary items and pairs of responses, consider:
  - The conjunctive rule
\[ x_{ijk} = \begin{cases} 1 & \text{if } x_{ij} = 1 \text{ and } x_{ik} = 1 \\ 0 & \text{otherwise} \end{cases} \]
  - The disjunctive rule
\[ x_{ijk} = \begin{cases} 0 & \text{if } x_{ij} = 0 \text{ and } x_{ik} = 0 \\ 1 & \text{otherwise} \end{cases} \]
- More possibilities, especially for items with > 2 responses or groups with > 2 collaborators
[9] cf. Steiner's 1966 classification of task types
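As a minimal sketch of the two scoring rules for a pair of binary item scores (the function names are illustrative and not taken from the talk):

```python
def conjunctive(x_ij: int, x_ik: int) -> int:
    """Group score is 1 only if both members answered the item correctly."""
    return int(x_ij == 1 and x_ik == 1)


def disjunctive(x_ij: int, x_ik: int) -> int:
    """Group score is 0 only if both members answered the item incorrectly."""
    return int(not (x_ij == 0 and x_ik == 0))


def score_pair(responses_j, responses_k, rule=conjunctive):
    """Apply a scoring rule item by item to two binary response vectors."""
    if len(responses_j) != len(responses_k):
        raise ValueError("response vectors must have the same length")
    return [rule(a, b) for a, b in zip(responses_j, responses_k)]
```

For example, `score_pair([1, 0, 1, 0], [1, 1, 0, 0])` scores only the first item as correct under the conjunctive rule, while passing `rule=disjunctive` scores the first three items as correct.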
24 / 89
Scoring rules vs decision functions
- Scoring rules describe what "counts" as a correct group response
  - Under control of the test designer [10]
- Decision functions describe the strategies adopted by a team
  - Under control of the team
- Basic research strategy
  - Assume a certain scoring rule
  - Consider plausible models for team strategies
  - Test the models against data
[10] Maris & van der Maas (2012). Speed-accuracy response models: scoring rules based on response time and accuracy. Psychometrika, 77 (4), 615-633
26 / 89
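Defining successful pairwise collaboration
- The independence model
\[ E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k] \]
- Successful collaboration
\[ E[x_{ijk} \mid \theta_j, \theta_k] > E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k] \]
- Unsuccessful collaboration
\[ E[x_{ijk} \mid \theta_j, \theta_k] < E_{\mathrm{ind}}[x_{ijk} \mid \theta_j, \theta_k] \]
- Note: these definitions are item- and dyad-specific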
29 / 89
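Some models for successful collaboration
- Minimum individual performance (disruptive team member)
\[ E_{\min}[x_{ijk} \mid \theta_j, \theta_k] = \min\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\} \]
- Maximum individual performance (cheating / tutor)
\[ E_{\max}[x_{ijk} \mid \theta_j, \theta_k] = \max\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\} \]
- "True collaboration"
\[ E[x_{ijk} \mid \theta_j, \theta_k] \ge \max\{E[x_{ij} \mid \theta_j],\, E[x_{ik} \mid \theta_k]\} \]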
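The models above, together with the independence benchmark used to define successful collaboration, can be sketched as functions of the two individual success probabilities `p_j` and `p_k` on an item. The function names and the label returned for the boundary case are illustrative assumptions.

```python
def expected_ind(p_j: float, p_k: float) -> float:
    """Independence model: product of the two individual success probabilities."""
    return p_j * p_k


def expected_min(p_j: float, p_k: float) -> float:
    """Minimum individual performance (the weaker member determines the outcome)."""
    return min(p_j, p_k)


def expected_max(p_j: float, p_k: float) -> float:
    """Maximum individual performance (the stronger member determines the outcome)."""
    return max(p_j, p_k)


def collaboration_outcome(expected_group: float, p_j: float, p_k: float) -> str:
    """Compare an expected group score with the independence benchmark."""
    baseline = expected_ind(p_j, p_k)
    if expected_group > baseline:
        return "successful"
    if expected_group < baseline:
        return "unsuccessful"
    return "independent"  # boundary label chosen here for illustration
```

Because `min(p_j, p_k) >= p_j * p_k` for probabilities, even the minimum model is at or above the independence benchmark, which is consistent with all three being listed as models of successful collaboration.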
32 / 89
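A model for "true collaboration"
- An additive model
\[ E_{\mathrm{add}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j] + E[x_{ik} \mid \theta_k] - E[x_{ijk} \mid \theta_j, \theta_k] \]
- Recalling $E[x_{ijk} \mid \theta_j, \theta_k] > E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k]$, define an additive independence (AI) model
\[ E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j] + E[x_{ik} \mid \theta_k] - E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k] \ge E_{\mathrm{add}}[x_{ijk} \mid \theta_j, \theta_k] \]
- AI is an upper bound on any "more interesting" additive model for successful collaboration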
34 / 89
More on AI model
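- Can also be written as:
\[ E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = E[x_{ij} \mid \theta_j]\,(1 - E[x_{ik} \mid \theta_k]) + E[x_{ik} \mid \theta_k]\,(1 - E[x_{ij} \mid \theta_j]) + E[x_{ij} \mid \theta_j]\, E[x_{ik} \mid \theta_k] \]
- Which has an interpretation in terms of three cases (only j correct, only k correct, both correct)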
35 / 89
More on AI model
- And is also equivalent to Lorge & Solomon's Model A
\[ E_{\mathrm{AI}}[x_{ijk} \mid \theta_j, \theta_k] = 1 - (1 - E[x_{ij} \mid \theta_j])(1 - E[x_{ik} \mid \theta_k]) \]
- Except the "probability an individual can solve the problem" now depends on both the individual and the problem
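The algebraic forms of the AI model given in this part of the talk (inclusion-exclusion, the three-case decomposition, and the Lorge & Solomon Model A form) can be checked numerically. A small sketch, with illustrative function names:

```python
def ai_inclusion_exclusion(p_j: float, p_k: float) -> float:
    """AI model as the sum of individual probabilities minus their product."""
    return p_j + p_k - p_j * p_k


def ai_three_cases(p_j: float, p_k: float) -> float:
    """AI model as: only j correct + only k correct + both correct."""
    return p_j * (1 - p_k) + p_k * (1 - p_j) + p_j * p_k


def ai_model_a(p_j: float, p_k: float) -> float:
    """AI model as one minus the probability that neither member is correct."""
    return 1 - (1 - p_j) * (1 - p_k)


# Compare the three forms on a coarse grid of probabilities.
grid = [i / 10 for i in range(11)]
max_gap = max(
    max(abs(ai_inclusion_exclusion(a, b) - ai_three_cases(a, b)),
        abs(ai_inclusion_exclusion(a, b) - ai_model_a(a, b)))
    for a in grid for b in grid
)
```

Here `max_gap` is zero up to floating-point rounding, and each form is at least as large as `max(p_j, p_k)`, so AI also satisfies the "true collaboration" inequality.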
36 / 89
More on AI model [11]
- We probably want some constraints on what counts as a good collaborative IRF
- Easy to show that AI satisfies latent monotonicity, if the individual IRFs do (trivial for other models also)
[11] Holland & Rosenbaum (1986). Conditional Association and Unidimensionality in Monotone Latent Variable Models. The Annals of Statistics, 14 (4), 1523-1543
38 / 89
AI: latent monotonicity
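Assumptions:
\[ f(x) \ge f(x') \ \text{for } x > x' \quad \text{and} \quad 0 \le g(y) \le 1 \]
Show:
\[ f(x) + g(y) - f(x)\, g(y) \ge f(x') + g(y) - f(x')\, g(y) \]
Contradiction: suppose instead that
\[ f(x) + g(y) - f(x)\, g(y) < f(x') + g(y) - f(x')\, g(y) \]
\[ \Rightarrow \quad f(x) - f(x') < g(y)\, (f(x) - f(x')) \]
This is impossible: if $f(x) = f(x')$ it reads $0 < 0$, and if $f(x) > f(x')$ it requires $g(y) > 1$.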
39 / 89
AI: example IRF [12]
[Figure: AI item response function; axis labels: theta1, prob]
[12] Using 2PL model for individual IRFs with α = 1 and β = 0
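A sketch of how the AI response function could be tabulated under the setup named in the footnote (2PL individual IRFs with α = 1 and β = 0). The logistic form without a scaling constant and the chosen ability values are assumptions made for illustration.

```python
import math


def irf_2pl(theta: float, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def irf_ai(theta_j: float, theta_k: float,
           alpha: float = 1.0, beta: float = 0.0) -> float:
    """Additive independence (AI) model built from two 2PL IRFs."""
    p_j = irf_2pl(theta_j, alpha, beta)
    p_k = irf_2pl(theta_k, alpha, beta)
    return p_j + p_k - p_j * p_k


# Tabulate the AI surface along theta1 for a few (illustrative) partner abilities.
for theta_k in (-2.0, 0.0, 2.0):
    row = [round(irf_ai(theta_j, theta_k), 3) for theta_j in (-2.0, 0.0, 2.0)]
    print(f"theta2 = {theta_k:+.1f}: {row}")
```

Each row is non-decreasing in `theta_j`, which is the latent monotonicity property discussed for the AI model.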
40 / 89
Models abound!
- Basic idea: write down IRFs for collaboration based on assumed-to-be-known individual abilities (and item parameters)
- But how do we characterize empirical team performance?
41 / 89
Empirical team performance
- We have
  - Observed collaborative responses $x_{jk} = (x_{1jk}, x_{2jk}, \ldots, x_{mjk})$
  - A model for individual performance on the m (conventional) math items
- So we can get "team theta," e.g.,
\[ \theta_{jk} = \arg\max_{\theta} \{ L_0(x_{jk} \mid \theta) \} \tag{3} \]
- Where $L_0$ is the likelihood of the model calibrated on individual performance (reference model)
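One way to compute a "team theta" as in equation (3) is a grid search over the reference log-likelihood. The sketch below assumes a 2PL reference model with known item parameters; the grid bounds, step count, and function names are illustrative choices rather than the procedure used in the study.

```python
import math


def irf_2pl(theta: float, alpha: float, beta: float) -> float:
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def log_lik_ref(responses, theta, items):
    """Reference log-likelihood ln L0(x | theta) under local independence.

    `items` is a list of (alpha, beta) pairs, one per item.
    """
    total = 0.0
    for x, (alpha, beta) in zip(responses, items):
        p = irf_2pl(theta, alpha, beta)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def team_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximizer of the reference likelihood.

    For all-correct or all-incorrect patterns the maximizer sits on the
    grid boundary, because the MLE is not finite in those cases.
    """
    grid = [lo + (hi - lo) * s / (steps - 1) for s in range(steps)]
    return max(grid, key=lambda t: log_lik_ref(responses, t, items))
```

For instance, with three hypothetical items `[(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]`, the pattern `[1, 1, 0]` yields a positive estimate and `[1, 0, 0]` a negative one.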
43 / 89
Proposed method for testing models
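- Testing of different models against reference model
\[ D_{\mathrm{model}} = -2 \ln \frac{L_{\mathrm{model}}(x_{jk} \mid \theta_j, \theta_k)}{L_0(x_{jk} \mid \theta_{jk})} \tag{4} \]
- Also a "direct test" of effect of collaboration for each individual
\[ D_0 = -2 \ln \frac{L_0(x_{jk} \mid \theta_j)}{L_0(x_{jk} \mid \theta_{jk})} \tag{5} \]
with effect size $\delta_{jk} = (\theta_{jk} - \theta_j) / \sigma_\theta$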
44 / 89
Proposed method: reference distribution
- Ind and AI models are not nested with reference model → No Wilks' theorem
- Can use Vuong's 1989 [13] results for LR with non-nested models, but asymptotic in m
- Good news: we can bootstrap a null distribution for (4) and (5) pretty easily
[13] Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307-333.
45 / 89
Bootstrapping the reference distribution
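Assuming known item parameters and $\theta_j$, $\theta_k$. For $r = 1, \ldots, R$:
Step 1 Generate collaborative response patterns $x_{jk}^{(r)}$ from $E_{\mathrm{model}}[x_{ijk} \mid \theta_j, \theta_k]$
Step 2 Compute $L_{\mathrm{model}}(x_{jk}^{(r)} \mid \theta_j, \theta_k)$
Step 3 Estimate $\theta_{jk}^{(r)}$ for each $x_{jk}^{(r)}$; save $L_0(x_{jk}^{(r)} \mid \theta_{jk}^{(r)})$
Step 4 Compute $D_{\mathrm{model}}^{(r)}$ or $D_0^{(r)}$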
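The four steps above could be sketched as follows for D_model. This is an illustration under stated assumptions, not the authors' implementation: it uses a 2PL reference model, a grid search for the team theta, and hypothetical names (`bootstrap_d`, `p_collab`) throughout.

```python
import math
import random


def irf(theta, a, b):
    """2PL item response function (assumed form for the reference model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_collab(model, theta_j, theta_k, a, b):
    """Expected collaborative item score under one of the four models."""
    p_j, p_k = irf(theta_j, a, b), irf(theta_k, a, b)
    if model == "ind":
        return p_j * p_k
    if model == "min":
        return min(p_j, p_k)
    if model == "max":
        return max(p_j, p_k)
    if model == "ai":
        return p_j + p_k - p_j * p_k
    raise ValueError(f"unknown model: {model}")


def loglik(xs, ps):
    """Bernoulli log-likelihood of binary responses xs given probabilities ps."""
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(xs, ps))


def max_ref_loglik(xs, items, grid):
    """Maximize the reference log-likelihood ln L0 over a grid of theta values."""
    return max(loglik(xs, [irf(t, a, b) for a, b in items]) for t in grid)


def bootstrap_d(model, theta_j, theta_k, items, n_reps=500, seed=0):
    """Return n_reps bootstrap draws of D_model for one dyad."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.05 * s for s in range(161)]
    ps = [p_collab(model, theta_j, theta_k, a, b) for a, b in items]
    draws = []
    for _ in range(n_reps):
        xs = [1 if rng.random() < p else 0 for p in ps]  # Step 1: simulate pattern
        ll_model = loglik(xs, ps)                         # Step 2: ln L_model
        ll_ref = max_ref_loglik(xs, items, grid)          # Step 3: ln L0 at team theta
        draws.append(-2.0 * (ll_model - ll_ref))          # Step 4: D_model
    return draws
```

An observed D_model for a dyad can then be located within the returned draws, for example by the proportion of draws at least as large as the observed value. How the tail probability is defined is left open here.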
46 / 89
Example 1
- Design
  - Pool of pre-calibrated math items (grade 12 NAEP, modified to be numeric response)
  - Individual "pre-test" → estimate individual abilities
  - Collaborative "post-test" → evaluate models, estimate $\delta_{jk}$
  - Modality of collaboration: online chat
- Limitations:
  - Small calibration sample; crowd workers
  - Individual and collaborative forms were not counterbalanced (neither in order nor content)
47 / 89
NAEP grade 12 math items, deployed via OpenEdx
48 / 89
AMT crowdworkers (calibration sample)
Variable            Levels                       n      %   Cumulative %
Gender              Female                     155   46.5           46.5
                    Male                       178   53.5          100.0
Age                 18-30                      117   35.2           35.2
                    30-40                      129   38.9           74.1
                    40-55                       71   21.4           95.5
                    55+                         15    4.5          100.0
Education           Some Grade School            3    0.9            0.9
                    High School Diploma         49   14.7           15.6
                    Some College               118   35.4           51.0
                    Bachelor's Degree          132   39.6           90.7
                    Master's Degree             22    6.6           97.3
                    Ph.D or Advanced Degree      9    2.7          100.0
Country             United States              313   94.0           94.0
                    India                       16    4.8           98.8
                    Canada                       3    0.9           99.7
                    United Kingdom               1    0.3          100.0
English First Lang  Yes                        321   96.4           96.4
                    No                          12    3.6          100.0
49 / 89
Deltas
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta]
50 / 89
Model tests: Sanity check using individual pre-test
Figure reports P(D_model > |obs|) for individual pre-tests scored using conjunctive scoring rule
51 / 89
Model tests: Collaborative data
Figure reports P(D_model > |obs|) for collaborative tests scored using conjunctive scoring rule
52 / 89
Ind Model
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta; highlighted pairs: 3, 4, 12, 15, 16, 35]
53 / 89
Min Model
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta; highlighted pairs: 7, 8, 9, 13, 29, 36]
54 / 89
Max Model
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta; highlighted pairs: 2, 5, 10, 14, 17, 19, 22, 24, 25, 26, 27, 28, 30, 38, 44, 45]
55 / 89
AI Model
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta; highlighted pairs: 6, 18, 20, 21, 23, 32, 33, 37, 39, 40, 41, 42, 43]
56 / 89
Not one of our four models
[Figure: Collaborative vs Individual Performance; x-axis: Individual Theta, y-axis: Collaborative Theta; highlighted pairs: 1, 11, 31, 34]
57 / 89
Summary of collaborative outcomes
- Can define, estimate, and test models of collaboration on academic performance using IRT-based methods
- But how distinct are these models, really?
- Models do not cover all cases
58 / 89
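Possible next step – one model to rule them all!
Let $w_j, w_k \in [0, 1]$ and define the weighted additive independence model
\[ E_{\mathrm{WAI}}[x_{ijk} \mid \theta_j, \theta_k] = w_j P_i(\theta_j) Q_i(\theta_k) + w_k P_i(\theta_k) Q_i(\theta_j) + P_i(\theta_j) P_i(\theta_k) \]
where $P_i(\theta)$ is the individual IRF for item i and $Q_i(\theta) = 1 - P_i(\theta)$.
- Includes original four and everything in between
- Includes $(P_i(\theta_j) + P_i(\theta_k))/2$ when $w_j = w_k = .5$
- Weights describe how well each individual obtains his/her "optimal collaboration level"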
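A short sketch of the weighted additive independence model, taking Q as 1 − P, shows how particular weights recover the earlier models. The function name `wai` is illustrative.

```python
def wai(p_j: float, p_k: float, w_j: float, w_k: float) -> float:
    """Weighted additive independence model for one item and one dyad.

    p_j, p_k are the individual success probabilities P_i(theta_j), P_i(theta_k);
    w_j, w_k are weights in [0, 1].
    """
    if not (0.0 <= w_j <= 1.0 and 0.0 <= w_k <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    q_j, q_k = 1.0 - p_j, 1.0 - p_k
    return w_j * p_j * q_k + w_k * p_k * q_j + p_j * p_k


p_j, p_k = 0.6, 0.5
special_cases = {
    "ind (w_j = w_k = 0)": wai(p_j, p_k, 0.0, 0.0),   # p_j * p_k
    "AI (w_j = w_k = 1)": wai(p_j, p_k, 1.0, 1.0),    # p_j + p_k - p_j * p_k
    "p_j only (w_j = 1, w_k = 0)": wai(p_j, p_k, 1.0, 0.0),
    "p_k only (w_j = 0, w_k = 1)": wai(p_j, p_k, 0.0, 1.0),
    "mean (w_j = w_k = .5)": wai(p_j, p_k, 0.5, 0.5),
}
```

Putting the unit weight on the stronger member gives the maximum model and putting it on the weaker member gives the minimum model, so which of the two is obtained depends on the ordering of `p_j` and `p_k`.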
59 / 89
Part 3: What are process data? [14]
- Any task-related actions of a respondent performed during the completion of a task
- In ed tech context, typically associated with time-stamped user logs ("trace data")
- All the stuff IRT ignores:
\[ p(x \mid \theta) = \prod_i p(x_i \mid \theta) \]
[14] Halpin & von Davier 2013, Hao, & Liu (under review). Journal of Educational Measurement.
62 / 89
Part 3: What are collaborative process data?
- Ideally a richly detailed recording of the sequence of actions taken by each team member during the completion of a task
  - ATC21S collaborative problem solving prototype items [15]
  - CPS frame [16]
- Focus today: chat messages sent between online collaborators
[15] http://www.atc21s.org/uploads/3/7/0/0/37007163/pd_module_3_nonadmin.pdf
[16] In alpha at Computational Psychometrics lab at ETS
64 / 89
Two perspectives on the analysis of chat / email / etc.
- Text-based analysis of strategy and sentiment
  - e.g., Howley, Mayfield, & Rose, 2013; Liu, Hao, von Davier, Kyllonen, & Zapata-Rivera, 2015
- Time series analysis of sending times
  - e.g., Barabasi, 2005; Ebel, Mielsch, & Bornholdt, 2002; Halpin & De Boeck, 2013
65 / 89
Temporal point process: basic idea (more tomorrow) [17]
- Data: events that have negligible duration relative to a period of observation
  - Contrast events with states, regimes
- Basic idea: model the Bernoulli probability of an event happening in a small window of time $[t, t + \Delta)$, conditional on the events that have happened before $t \in \mathbb{R}^+$
  - "Instantaneous probability" of an event, denoted $p(t)$
[17] Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes: Elementary theory and methods (2nd ed., Vol. 1). New York: Springer.
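In standard point-process notation, the "instantaneous probability" idea is usually formalized as a conditional intensity. The symbols $N$ (the counting process), $\mathcal{H}_t$ (the history of events before $t$), and $\lambda$ are assumptions introduced here and do not appear on the slides:

```latex
\lambda(t) \;=\; \lim_{\Delta \to 0}
  \frac{\Pr\{\, N[t, t+\Delta) = 1 \mid \mathcal{H}_t \,\}}{\Delta},
\qquad
\Pr\{\, N[t, t+\Delta) = 1 \mid \mathcal{H}_t \,\} \;\approx\; \lambda(t)\,\Delta
\quad \text{for small } \Delta .
```

Read this way, the Bernoulli probability for a small window is approximately the intensity times the window length.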
66 / 89
Temporal point process in interpersonal context
- Modeling $p(t)$ to describe
  - How the probability of each person's actions changes in continuous time
  - How this depends on their previous actions
  - Emergent or group-level phenomena like coordination, reciprocity, ...
67 / 89
Chat engagement via Hawkes processes
- Hawkes process provides a means of modeling instantaneous probabilities in a multivariate context
- Halpin et al. (under review) suggest the response intensity parameter as a measure of engagement of student j with k
\[ \alpha_{jk} > n_{jk} / n_k \tag{6} \]
  - $n_{jk}$ is the expected total number of responses made by student j to student k (inferred from model)
  - $n_k$ is the number of actions of student k (observed)
  - Lower bound is tight in practice; not necessary for computations
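To make the role of the response intensity concrete, here is a hedged sketch of a bivariate Hawkes intensity for student j. The exponential response kernel, the baseline rate `mu_j`, the self-excitation term `alpha_jj`, and the decay rate `beta` are all assumptions made for illustration; the parameterization in Halpin et al. may differ.

```python
import math


def hawkes_intensity(t, mu_j, own_times, partner_times, alpha_jj, alpha_jk, beta):
    """Conditional intensity of student j at time t (assumed exponential kernel).

    Each past event at time s < t adds alpha * beta * exp(-beta * (t - s)),
    a kernel that integrates to alpha over (s, infinity).
    """
    rate = mu_j
    rate += sum(alpha_jj * beta * math.exp(-beta * (t - s))
                for s in own_times if s < t)
    rate += sum(alpha_jk * beta * math.exp(-beta * (t - s))
                for s in partner_times if s < t)
    return rate


def expected_responses(partner_times, alpha_jk, beta, t_end):
    """Expected number of j's responses to k's actions within [0, t_end].

    Integrates the response kernel from each partner event up to t_end.
    """
    return sum(alpha_jk * (1.0 - math.exp(-beta * (t_end - s)))
               for s in partner_times if s < t_end)
```

Under these assumptions, responses that would arrive after the end of the observation window are not counted, so `expected_responses(...) / len(partner_times)` falls slightly below `alpha_jk`. That is consistent with the inequality in (6), though the paper's own derivation is not reproduced here.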
68 / 89
Chat engagement via Hawkes processes
- Aggregating to team (dyad) level
\[ \alpha \equiv \frac{\alpha_{12}\, n_2 + \alpha_{21}\, n_1}{n_1 + n_2} \tag{7} \]
- Interpretation: the proportion of all group members' actions, $n_1 + n_2$, that were responded to by any other member during a collaboration
- See paper for more details, including initial results on SEs of $\alpha_{jk}$
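Equation (7) is a weighted average in which each response intensity is weighted by the partner's number of actions. A minimal sketch, with made-up input values:

```python
def team_alpha(alpha_12: float, alpha_21: float, n_1: int, n_2: int) -> float:
    """Team-level engagement index from equation (7).

    alpha_12 is student 1's response intensity to student 2, so it is
    weighted by n_2 (student 2's actions), and vice versa.
    """
    total = n_1 + n_2
    if total == 0:
        raise ValueError("at least one action is required")
    return (alpha_12 * n_2 + alpha_21 * n_1) / total


# Hypothetical dyad: student 1 sent 40 chats, student 2 sent 60.
example = team_alpha(alpha_12=0.3, alpha_21=0.5, n_1=40, n_2=60)
```

With these illustrative inputs the index is (0.3 × 60 + 0.5 × 40) / 100 = 0.38.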
69 / 89
Example: Tetralogue
- A simulation-based science game with an embedded assessment recently developed at ETS (Hao, Liu, von Davier, & Kyllonen, 2015)
  1. Dyads work together to learn and make predictions about volcano activity
  2. At various points in the simulation, the students are asked to individually submit their responses to an assessment item without discussing the item
  3. Following submission of responses from both students, they are invited to discuss the question and their answers
  4. Lastly, they are given an opportunity to revise their responses to the item, with the final answers counting towards the team's score
70 / 89
Example: Full sample
- 286 dyads solicited via AMT and randomly paired (based on arrival in queue)
  - Median reported age was 31.5 years
  - 52.5% reported that they were female
  - 79.2% reported that they were White
- Additionally, all participants were required to
  - Have an IP address located in the United States
  - Self-identify as speaking English as their primary language
  - Self-identify as having at least one year of college education
71 / 89
Example: Estimating chat engagement
[Figure, four panels:
  Engagement Index (x-axis: Alpha, y-axis: count);
  Standard Error Against Number of Chats (x-axis: Number of Chats of Partner, y-axis: Standard Error; methods: Hessian, Lower Bound);
  Relation with Partner's Index (axes: Alpha, Alpha);
  Relation with Number of Chats (x-axis: Alpha, y-axis: Difference in Number of Chats)]
Note: Alpha denotes the estimated response intensities from Equation 6. Hessian denotes standard errors obtained via the Hessian of the log-likelihood. See appendix of Halpin et al. for Lower Bound. Difference in Number of Chats was scaled using the log of the absolute value of the difference.
72 / 89
Example: Relation with revision on embedded assessment
[Figure: Measures of Chat Engagement vs Item Revisions; x-axis: Alpha, Partner Alpha, Team Alpha; y-axis: Mean Engagement; groups: No Revisions, Revisions]
Note: Comparison of mean levels of engagement indices for individuals who either did or did not revise at least one response after discussion with their partners. Alpha denotes the estimated response intensities from Equation 6; Partner's Alpha denotes the partner's response intensity; Team Alpha denotes the team-level index in Equation 7. For the latter, the data are reported for dyads, not individuals, and no revisions means that both individuals on the team made no revisions. Error bars are 95% confidence intervals on the means.
73 / 89
Example: Relation with revision on embedded assessment
Table 1: Summary of group differences.
Index            Group          Mean    SD    N   Hedges' g     r
Alpha            No Revisions   0.31  0.13   82       –
Alpha            Revisions      0.36  0.10   66     0.40      .20
Partner's Alpha  No Revisions   0.31  0.14   82       –
Partner's Alpha  Revisions      0.37  0.14   66     0.44      .21
Team Alpha       No Revisions   0.27  0.11   26       –
Team Alpha       Revisions      0.37  0.13   48     0.84      .38
Note: Alpha denotes the estimated response intensities from Equation 6; Partner's Alpha denotes the engagement index of the individual's partner; Team Alpha denotes the team-level index in Equation 7. Hedges' g used the correction factor described by Hedges (1981) and r denotes the point-biserial correlation.
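For readers who want to see how an effect size of this kind is computed, the sketch below uses the pooled standard deviation and the common approximation to the Hedges (1981) small-sample correction. Whether the authors used the approximate or the exact correction is not stated, so treat that choice as an assumption.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-corrected standardized mean difference (group 2 minus group 1)."""
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_2 - mean_1) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # approximate Hedges (1981) factor
    return correction * d


# Team Alpha row of Table 1, using the rounded values printed in the table.
g_team = hedges_g(0.27, 0.11, 26, 0.37, 0.13, 48)
```

Because the table reports means and standard deviations rounded to two decimals, values recomputed this way only approximately match the printed effect sizes.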
74 / 89
Summary of collaborative processes
- Hawkes processes are a feasible model for process data obtained on collaborative tasks
- Resulting measures of chat engagement are meaningfully related to task performance
- Future modeling work
  - Random effects models for simultaneous estimation over multiple groups
  - Inclusion of model parameters describing task characteristics
  - Analytic expressions for standard errors of model parameters
  - Methods for improving optimization with relatively small numbers of events
  - Integration with text-based analyses (e.g., using marks / time-varying covariates)
75 / 89
What’s next
- Integration of task design, outcomes, processes, ... and theory!!
76 / 89
Contact: [email protected]
Support: This research was funded by a postdoctoral fellowship from the Spencer Foundation and an Education Technology grant from NYU Steinhardt.
77 / 89
References not already included in footnotes
Aronson, E., Blaney, N., Stephan, C., Sikes, J., & Snapp, M. (1978). The jigsaw classroom. Beverly Hills, CA: Sage.
Burrus, J., Carlson, J., Bridgeman, B., Golub-Smith, M., & Greenwood, R. (2013). Identifying the Most Important 21st Century Workforce Competencies: An Analysis of the Occupational Information Network (O*NET) (ETS RR-13-21). Princeton, NJ.
Cohen, E. G., Lotan, R. A., Scarloss, B. A., & Arellano, A. R. (1999). Complex instruction: Equity in cooperative learning classrooms. Theory Into Practice, 38, 80-86.
Davey, T., Ferrara, S., Holland, P. W., Shavelson, R. J., Webb, N. M., & Wise, L. L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Princeton, NJ.
Deming, D. J. (2015). The Growing Importance of Social Skills in the Labor Market. National Bureau of Economic Research Working Paper Series, (21473).
Griffin, P., & Care, E. (2015). Assessment and teaching of 21st century skills: Methods and approach. New York: Springer.
Hmelo-Silver, C. E., Chinn, C. A., Chan, C. K., & O'Donnell, A. M. (2013). International handbook of collaborative learning. New York: Taylor and Francis.
McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice-Hall.
Organisation for Economic Co-operation and Development. (2013). PISA 2015 Draft Collaborative Problem Solving Framework. Retrieved from http://www.oecd.org/pisa/pisaproducts/DraftPISA2015CollaborativeProblemSolvingFramework.pdf
Webb, N. M. (1995). Group Collaboration in Assessment: Multiple Objectives, Processes, and Outcomes. Educational Evaluation and Policy Analysis, 17(2), 239-261.
78 / 89
Instructions
80 / 89
Jigsaw / information sharing items
84 / 89
Hints / information requesting items
87 / 89
Multiple answer / negotiation items
88 / 89