The Problem Solving Genome: Analyzing Sequential Patterns of Student Work with Parameterized Exercises
Julio Guerra, Shaghayegh Sahebi, Peter Brusilovsky, Yu-Ru Lin
Outline
• Motivation: repetitions of parameterized exercises
• Dataset
• Labeling and mining patterns
• The Problem Solving Genome
• Exploring the Genome: stability, effect of complexity, across groups of students
• Conclusions
Motivation: repetitions of parameterized exercises
Some numbers change each time the exercise is loaded
Hard to cheat
Motivation: repetitions of parameterized exercises
Students tend to repeat exercises
We call this a sequence: the ordered attempts of the same student on the same exercise in a session
FAIL -> FAIL -> CORRECT -> CORRECT = 0 0 1 1
Most sequences are of the types 1, 01, 11, 011
But there are some strange ones: 10000000, 101100, 00101111, 1101101110
Motivation: repetitions of parameterized exercises
What does the sequence tell us about the Learning Experience?
0011 1 111
Is the student learning, playing the system or having trouble?
– Are patterns of repetition due to internal (personal) or external factors?
– Which patterns are helpful or harmful for the Learning Experience?
Dataset
Exercises
• 101 parameterized exercises
• 19 topics
• Exercises labeled by complexity as easy (41), medium (41) or hard (19)
Students
• 3 terms, a total of 101 students
• 21,215 attempts: 14,726 correct and 6,489 incorrect
• We formed sequences of repetitions of the same student on the same exercise in the same session within the system
• We collected the time spent on each attempt
• Pretest and posttest (not for all students)
Dataset
• Time in the first attempt is always longer (the student has to understand the exercise)
[Figure: time per attempt, first attempts vs. next attempts]
Labeling attempts
Correctness: Success (S) or Failure (F)
Time: Short (lowercase) or Long (uppercase)
– Using the median of the distribution of time per exercise
– Using a different distribution for first attempts

label  correctness  time
s      success      short
S      success      long
f      failure      short
F      failure      long
Labeled sequences
• First and last attempts are labeled differently. Here we mark them with an underscore '_'
• Example sequences: _fS_  _fFs_  _ss_
This labeled representation makes sequences and patterns more readable. The actual labeling used to run the pattern mining algorithm uses only uppercase letters, with different sets of letters for first and last attempts within sequences (details: Labeling for PexSPAM).
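The labeling rules above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names, the (correct, seconds) attempt format, the toy times, and the choice to treat a time strictly above the median as "long" are all assumptions. The letter mapping in `encode_for_mining` follows the table on the "Labeling for PexSPAM" slide.

```python
from statistics import median

# Letters from the "Labeling for PexSPAM" slide: first and last attempts
# share one alphabet, middle attempts use another.
EDGE = {"s": "G", "S": "A", "f": "W", "F": "E"}
MIDDLE = {"s": "C", "S": "S", "f": "V", "F": "F"}

def label_sequence(attempts, first_median, next_median):
    """attempts: list of (correct, seconds) for one student, exercise and session."""
    labels = []
    for i, (correct, seconds) in enumerate(attempts):
        # First attempts are compared against their own time distribution.
        threshold = first_median if i == 0 else next_median
        letter = "s" if correct else "f"
        # Assumption: "long" means strictly above the median time.
        labels.append(letter.upper() if seconds > threshold else letter)
    return "_" + "".join(labels) + "_"

def encode_for_mining(readable):
    """Turn a readable sequence such as '_fSs_' into the uppercase-only form."""
    body = readable.strip("_")
    last = len(body) - 1
    return "".join(EDGE[c] if i in (0, last) else MIDDLE[c]
                   for i, c in enumerate(body))

# Hypothetical per-exercise times (seconds) for first and later attempts.
first_median = median([40, 60, 80])
next_median = median([10, 20, 30])
seq = label_sequence([(False, 30), (True, 25), (True, 8)], first_median, next_median)
print(seq, encode_for_mining(seq))
```

In practice the two medians would be computed separately for every exercise, as the slide states.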
Pattern mining
• Using the PexSPAM algorithm with gap = 0
• Each possible pattern of length 2 or higher is explored
• Support of a pattern: proportion of sequences containing the pattern (at least once)
– Does not count multiple occurrences of the pattern within a sequence
• Select all patterns with a minimum support of 1%
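The support definition above can be illustrated with a brute-force sketch. It is not PexSPAM: it simply enumerates every contiguous pattern (gap = 0) of length 2 or more and counts the sequences containing it. The function names and the token scheme (attaching '_' to the first and last attempt so they become distinct symbols) are assumptions made for illustration.

```python
from collections import Counter

def tokenize(text):
    """'_fSs_' -> ('_f', 'S', 's_'): first/last attempts become distinct symbols."""
    tokens = list(text.strip("_"))
    if text.startswith("_"):
        tokens[0] = "_" + tokens[0]
    if text.endswith("_"):
        tokens[-1] = tokens[-1] + "_"
    return tuple(tokens)

def frequent_patterns(sequences, min_support=0.01, min_len=2):
    """Return {pattern: support} for contiguous patterns meeting min_support."""
    counts = Counter()
    for seq in sequences:
        tokens = tokenize(seq)
        # A set counts each pattern at most once per sequence.
        found = {tokens[i:j]
                 for i in range(len(tokens))
                 for j in range(i + min_len, len(tokens) + 1)}
        counts.update(found)
    total = len(sequences)
    return {"".join(p): c / total
            for p, c in counts.items() if c / total >= min_support}

sequences = ["_fS_", "_fFs_", "_ss_", "_fSs_", "_fSS_"]
patterns = frequent_patterns(sequences)
print(patterns["_fS"])   # support of "short failure first, then long success"
```

Note that `_fS` (the success is a middle attempt) and `_fS_` (the success ends the sequence) are counted as different patterns.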
The Problem Solving Genome
• Frequencies of the 102 patterns (vector of size 102) by student
– Each common pattern is a gene
• The vector represents how frequently a student performs each of the patterns
• Normalize to compare students (patterns might occur multiple times in a sequence)
Problem Solving Genome
Sequences of one student: _fSss_  _fSS_  _FFss_  _FSss_  _fSs_
Frequencies of each of the 102 common patterns:

pattern    ss_  ss   Ss   SS_  _FS_  …
frequency  3/5  0/5  2/5  1/5  0/5   …
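The example frequencies can be reproduced with a short sketch. The names `tokenize`, `contains` and `genome` are hypothetical, and the sketch assumes that a gene's frequency is the fraction of the student's sequences containing the pattern, with first and last attempts treated as distinct symbols (which is why `ss` never matches here: no sequence has two consecutive middle short successes).

```python
def tokenize(text):
    """'_fSs_' -> ('_f', 'S', 's_'); works for sequences and for patterns."""
    tokens = list(text.strip("_"))
    if text.startswith("_"):
        tokens[0] = "_" + tokens[0]
    if text.endswith("_"):
        tokens[-1] = tokens[-1] + "_"
    return tuple(tokens)

def contains(tokens, pattern):
    """True if pattern occurs contiguously (gap = 0) in tokens."""
    k = len(pattern)
    return any(tokens[i:i + k] == pattern for i in range(len(tokens) - k + 1))

def genome(sequences, genes):
    """Fraction of sequences containing each gene, in the order of genes."""
    seqs = [tokenize(s) for s in sequences]
    return [sum(contains(t, tokenize(g)) for t in seqs) / len(seqs)
            for g in genes]

student = ["_fSss_", "_fSS_", "_FFss_", "_FSss_", "_fSs_"]
genes = ["ss_", "ss", "Ss", "SS_", "_FS_"]
print(genome(student, genes))
```

Any further normalization (for example rescaling the vector to sum to 1 before computing a divergence) would be a separate step on top of this vector.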
The Problem Solving Genome
• Flexible:
– Partial views: by complexity, by periods of time, by exercise…
– Consider only some genes: the first 30 common patterns
• Similarity between students can be computed using similarity, distance or divergence measures
• Consideration: enough sequences are needed!
Exploring the Genome
• Stability
– Are the patterns stable within a student?
• Effect of complexity
– Are the patterns different across complexity levels?
• Patterns of success
– Do successful students follow different patterns?
Exploring the Genome
• Dataset: for further analyses, we select data from students who:
– Have pretest and posttest (learning gain)
– Have at least 20 sequences and 2 sessions (limits frequency biases due to low usage)
• Total of 67 students
Genome Stability
• Is the student more similar to him/herself than to others?
– Select students with at least 60 sequences (32 students)
– For each student:
  • Split the student's sequences into two random sets (set 1, set 2)
  • Form the Genome of each set
– Compute the Jensen-Shannon (JS) divergence between:
  • The genomes of the 2 sets of each student (self-distance)
  • The student's set 1 genome and set 1 of other students (average) (other-distance)
• Are students changing patterns over time?
– Repeat the procedure, splitting sets into early (first half) and late (second half) sequences per student
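The procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: genomes are rescaled to sum to 1 before the divergence is computed, logarithms are base 2 (so the divergence lies in [0, 1]), and the helper names and toy genomes are hypothetical.

```python
import math
import random

def js_divergence(p, q):
    """Jensen-Shannon divergence between two non-negative, non-zero vectors."""
    ps, qs = sum(p), sum(q)
    p = [x / ps for x in p]          # assumption: rescale to a distribution
    q = [x / qs for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def random_halves(sequences, seed=0):
    """Split one student's sequences into two random sets."""
    shuffled = list(sequences)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def self_and_other(set1, set2):
    """set1[s], set2[s]: genome of student s on each half.
    Returns {student: (self_distance, mean other_distance)}."""
    out = {}
    for s in set1:
        self_d = js_divergence(set1[s], set2[s])
        others = [js_divergence(set1[s], set1[o]) for o in set1 if o != s]
        out[s] = (self_d, sum(others) / len(others))
    return out

# Toy genomes for three hypothetical students.
set1 = {"a": [3, 1, 0], "b": [0, 1, 3], "c": [1, 1, 1]}
set2 = {"a": [3, 1, 0], "b": [0, 2, 3], "c": [1, 2, 1]}
distances = self_and_other(set1, set2)
print(distances["a"])
```

For the early/late variant, `random_halves` would be replaced by a split at the midpoint of each student's chronologically ordered sequences.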
Results (1)
Paired-sample t-test

                            Self-distances   Other-distances
                            M      SE        M      SE        Sig.   Cohen's d
Randomly split Genome (a)   .2370  .0169     .4815  .0141     <.001  2.693
Early/Late Genome (b)       .3211  .0214     .4997  .0164     <.001  1.205
• Even when changing from early to late sequences, a student's self-distance is significantly smaller than the distance to others
Genome is stable within individuals
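A paired-sample t statistic of the kind reported above can be computed from per-student distance pairs as in the sketch below. The input values are made up for illustration, the function name is hypothetical, and the sketch stops at the t statistic and degrees of freedom: the p-value requires a t distribution from a statistics library, and the slides do not state which Cohen's d formula was used, so it is not reproduced.

```python
import math
from statistics import mean, stdev

def paired_t(self_distances, other_distances):
    """Paired-sample t statistic for other-distance minus self-distance."""
    diffs = [o - s for s, o in zip(self_distances, other_distances)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1   # statistic and degrees of freedom

# Made-up distances for three students, for illustration only.
t_stat, dof = paired_t([0.20, 0.30, 0.25], [0.50, 0.45, 0.55])
print(t_stat, dof)
```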
Effect of complexity
• Complexity may influence patterns
– Repeating the distance procedure using the genome per exercise
– Considering only easy and hard exercises to maximize the differences
[Figure: distances within and between easy and hard exercises]
Complexity influences patterns
Effect of complexity
• Repeat the distance procedure on students, controlling for complexity:
– Randomly split sets
– Only within easy exercises
– 39 students with at least 20 sequences in easy exercises
Results (2)

                                             Self-distances   Other-distances
                                             M      SE        M      SE        Sig.   Cohen's d
Randomly split genome in easy exercises (c)  .3736  .0214     .6065  .0128     <.001  1.657
A student changes patterns from easy to hard exercises, but is still consistent with herself
Performance Groups
• Group students by Pretest, Posttest and Learning Gain (low, medium, high)
• Contrast genome distances within and between low and high groups
[Table: number of students in each predefined performance group]
Results (3)
Are performance groups behaving differently?
• Pretest:
– Low students behave more similarly (within)
– High students behave more heterogeneously (within)
– Low students behave differently from high students (between)
• No other differences!
Performance Groups and Genome
• Overall, high students don't behave differently from low students (no differences when grouping by posttest or learning gain)
• But different students may use different strategies (genome) to achieve learning
– Group students by genome differences first
Clustering by Genome
• Cluster students by their genomes and analyze different patterns
– Between clusters
– Between low and high students within each cluster
• Spectral Clustering with k = 2
– Larger eigen-gap with k = 2
Results (4)
• Cluster 1: confirmers (repeat short successes)
• Cluster 2: non-confirmers
[Figure: patterns ordered by difference magnitude (cluster 2 - cluster 1)]
[Figure: confirmers vs. non-confirmers, same ordering as before]
Short failures (f) in low students
Struggle more
Move on without practicing more
Results (5)
• Successful patterns in each cluster are closer to the other cluster
– Successful confirmers tend to stop after long success
– Successful non-confirmers (cluster 2) tend to continue after hard success
• Extremely different patterns between clusters are "harmful"
_FS_
Conclusions
• The Problem Solving Genome is stable at the individual level
• Overall, different behavior patterns do not differentiate high and low students
• Successful/harmful patterns emerge after clustering students by their genome
• Successful patterns make clusters closer
• Generalization is not clear: the data is about repetitions in parameterized exercises
Labeling for PexSPAM

               start  middle  end
short success  G      C       G
long success   A      S       A
short failure  W      V       W
long failure   E      F       E