An Exploration of Data-Driven Hint Generation
in an Open-Ended Programming Problem
Thomas Price Tiffany Barnes
North Carolina State University
Workshop on Graph-based EDM 2015
Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 1 / 19
Introduction
Introduction
Data-driven hint generation
The process of extracting contextualized hints from previous students’ solutions to a problem
Avoids the need for an expensive, hand-authored expert model
Primary example: the Hint Factory, which has been applied in a variety of domains:
Logical proofs (Stamper et al. 2013)
Linked list problems (Fossati, Eugenio, and Ohlsson 2009)
A programming game (Hicks, Peddycord III, and Barnes 2014)
How can we generate hints in complex, open-ended programming problems?
Hint Generation
Our input is log data from previous students solving a problem, which can be represented as an interaction network (Eagle and Barnes 2015)
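As a rough illustration of how log data might be turned into such a network, the sketch below counts how many students reached each state and followed each transition. The input layout (a dict from student id to an ordered list of hashable states) and the function name are assumptions made for this example, not the format used in the study.

```python
from collections import Counter


def build_interaction_network(traces):
    """Count, per state and per transition, how many students passed through it.

    `traces` maps a student id to the ordered list of states that student
    visited (a hypothetical input layout chosen for this sketch).
    """
    node_visits = Counter()  # state -> number of students who reached it
    edge_visits = Counter()  # (state, next_state) -> number of students
    for states in traces.values():
        # Count each state and transition once per student.
        node_visits.update(set(states))
        transitions = {(a, b) for a, b in zip(states, states[1:]) if a != b}
        edge_visits.update(transitions)
    return node_visits, edge_visits


traces = {
    "s1": ["start", "A", "B", "goal"],
    "s2": ["start", "A", "C", "goal"],
}
nodes, edges = build_interaction_network(traces)
```

Here both students share the transition from "start" to "A", while the states "B" and "C" are each reached by only one student.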
Hints for Open-Ended Programming
Some approaches have successfully generated hints for programming problems (e.g. Jin, Barnes, and Stamper 2012; Rivers and Koedinger 2014)
These are most successful on smaller, well-constrained programming problems with a clear solution
We are interested in the opposite type of problem:
Large state space
Multiple, loosely ordered subgoals
Unstructured output
A creative, design task
Many novice programming activities have these traits, such as making games and apps
Current Approaches
Challenges
How do we represent a student’s state in a programming problem, and when should two states be connected?
Naive state representation: states correspond to snapshots of students’ code
Problem: It is unlikely any two students will have the exact same state
Simple connection rule: connect states that past students have traversed
Problem: May lead to a very sparse network
Current Approaches
This challenge has been addressed previously in three ways:
Canonicalization: Remove semantically unimportant information from student code to increase state overlap (e.g. Lazar and Bratko 2014; Rivers and Koedinger 2012)
Connecting States: Connect similar, existing states in the network with synthetic actions (Rivers and Koedinger 2013)
Or add whole paths between states, including synthetic states (Rivers and Koedinger 2014)
Alternate State Definitions: Choose a non-code state representation, such as the output of the student’s code (Hicks, Peddycord III, and Barnes 2014)
Data Collection
An Open-Ended Problem
We wanted to investigate the applicability of current approaches to an open-ended programming problem
Data Collection
Collected data at a middle school STEM outreach program called SPARCS (Catete, Wassell, and Barnes 2014)
17 6th grade students (12 male; 5 female)
Students had 45 minutes to work on the activity
Instructor help was provided only upon request
Analysis
Canonicalization
Students’ programs were represented as Abstract Syntax Trees(ASTs). We analyzed three levels of state canonicalization:
Raw: No canonicalization; states represent exact code
Basic: Removed variable names and literal values
Rivers and Koedinger suggest a number of other measures, but these were much less applicable to our data
Ordered: Also recursively sort all child nodes in the AST
Effectively serves as an upper bound for removing unimportant ordering information in code
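The Basic and Ordered levels can be sketched as two small tree transformations. The sketch assumes a hypothetical tuple-based tree of the form (kind, value, children) and hypothetical node kinds "var" and "literal"; the AST format used in the study is not given in these slides, so this is an illustration only.

```python
def basic(node):
    """Basic level: blank out variable names and literal values."""
    kind, value, children = node
    if kind in ("var", "literal"):  # hypothetical node kinds
        value = "_"
    return (kind, value, tuple(basic(child) for child in children))


def sort_children(node):
    """Recursively sort every node's children into a fixed order."""
    kind, value, children = node
    canonical = (sort_children(child) for child in children)
    # repr gives a total order over nested tuples of strings.
    return (kind, value, tuple(sorted(canonical, key=repr)))


def ordered(node):
    """Ordered level: Basic canonicalization plus sorted children."""
    return sort_children(basic(node))


# Two toy programs with the same blocks in a different order
# and with different names and values.
prog_a = ("script", "", (
    ("set", "", (("var", "x", ()), ("literal", "1", ()))),
    ("move", "", (("literal", "10", ()),)),
))
prog_b = ("script", "", (
    ("move", "", (("literal", "5", ()),)),
    ("set", "", (("var", "y", ()), ("literal", "2", ()))),
))
```

In this toy case the two programs stay distinct under Raw and Basic but collapse to the same state under Ordered. Sorting all children also discards order that does matter, such as statement order, which is why it can only serve as an upper bound.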
Canonicalization - Results
                          Raw      Basic    Ordered
Total States              2380     1781     1656
Percent Unique            97.5%    94.8%    92.8%
Mean Non-Unique Freq.     3.44     3.95     2.82
Median Non-Unique Freq.   2        2        2
Mean % Path Unique        89.9%    83.0%    78.9%
Standard Deviation        (6.67)   (10.5)   (13.3)
For comparison: In (Rivers and Koedinger 2012), 300 out of 500 final solutions to a basic problem were identical after canonicalization.
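One way statistics like those in the table could be computed is sketched below. The definitions are assumptions made for this example: a state is "unique" if exactly one student reached it, a state's frequency is the number of students who reached it, and "% Path Unique" is the share of a student's distinct states that are unique. The slides do not spell out the exact definitions, and the function name is hypothetical.

```python
from collections import Counter
from statistics import mean, median


def overlap_stats(traces):
    """Summarize state overlap across students (assumed definitions)."""
    reach = Counter()  # state -> number of students who reached it
    for states in traces.values():
        reach.update(set(states))
    unique = {state for state, n in reach.items() if n == 1}
    shared_freqs = [n for n in reach.values() if n > 1]
    # Per student: percent of their distinct states that nobody else reached.
    path_unique = [
        100.0 * len(set(states) & unique) / len(set(states))
        for states in traces.values()
        if states
    ]
    return {
        "total_states": len(reach),
        "percent_unique": 100.0 * len(unique) / len(reach),
        "mean_non_unique_freq": mean(shared_freqs) if shared_freqs else 0.0,
        "median_non_unique_freq": median(shared_freqs) if shared_freqs else 0.0,
        "mean_percent_path_unique": mean(path_unique),
    }


toy_traces = {
    "s1": ["a", "b", "c"],
    "s2": ["a", "d"],
    "s3": ["a", "b", "e"],
}
stats = overlap_stats(toy_traces)
```

In the toy data, three of the five states are reached by a single student, so 60% of states are unique.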
Visualizing Distances
Constructed and plotted distance matrices
Used Tree Edit Distance as a distance metric
Lighter shades represent smaller distances
Min-distance “path” through the matrix shown in green/yellow
Red crosses indicate where subgoals were completed
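A distance matrix of this kind can be sketched as follows, assuming each snapshot is a hypothetical (label, children) tuple tree. The distance here is a simple memoized version of the textbook ordered-tree edit distance recurrence with unit costs for insert, delete, and relabel; it is not necessarily the algorithm or cost model used in the study, and it is only practical for small trees.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest (a tuple of (label, children) trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert everything in f2
    if not f2:
        return size(f1)  # delete everything in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # Delete the root of the rightmost tree in f1.
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # Insert the root of the rightmost tree in f2.
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # Match the two rightmost roots, relabeling if they differ.
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def distance_matrix(path_a, path_b):
    """Rows follow student A's snapshots, columns follow student B's."""
    return [[tree_edit_distance(a, b) for b in path_b] for a in path_a]


# Toy snapshot sequences for two students.
t0 = ("script", ())
t1 = ("script", (("move", ()),))
t2 = ("script", (("move", ()), ("turn", ())))
u1 = ("script", (("turn", ()),))
matrix = distance_matrix([t0, t1, t2], [t0, u1])
```

For the toy sequences the matrix is [[0, 1], [1, 1], [2, 1]]: the last snapshot of the first student is one deletion away from the last snapshot of the second.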
Quantifying Distances
     Mean           Max             Farthest
1    0.25 (0.27)    0.76 (0.56)     2.23 (0.75)
2    4.88 (3.93)    9.18 (5.74)     12.73 (6.10)
4    4.92 (2.77)    10.11 (3.69)    14.67 (4.77)
5    7.79 (1.32)    13.17 (1.72)    18.17 (1.72)
6    7.49 (1.11)    13.17 (0.98)    18.67 (1.75)
How close are the closest students?
Defined the distance between two students as the mean or max distance they get from one another when solving an objective
For each student, for each objective solved, find the paired student with minimum distance
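The pairing step can be sketched as below. How snapshots of two students are aligned is not specified in the slides, so this example assumes each snapshot of one student is compared with the nearest snapshot of the other, and the mean or max of those values is taken. The function names are hypothetical, and the toy data uses integers with absolute difference as a stand-in for code snapshots and tree edit distance.

```python
from statistics import mean


def pair_distance(path_a, path_b, dist, agg=mean):
    """Aggregate, over A's snapshots, the distance to B's nearest snapshot.

    The nearest-snapshot alignment is an assumption made for this sketch.
    """
    nearest = [min(dist(a, b) for b in path_b) for a in path_a]
    return agg(nearest)


def closest_student(student, paths, dist, agg=mean):
    """Return (other_student, distance) for the closest paired student."""
    candidates = (
        (other, pair_distance(paths[student], path, dist, agg))
        for other, path in paths.items()
        if other != student
    )
    return min(candidates, key=lambda pair: pair[1])


def toy_dist(a, b):
    # Stand-in metric for the example; the study used tree edit distance.
    return abs(a - b)


# Snapshots of three students while solving one objective (toy values).
paths = {"s1": [0, 2, 4], "s2": [0, 3, 5], "s3": [0, 8, 9]}
by_mean = closest_student("s1", paths, toy_dist)
by_max = closest_student("s1", paths, toy_dist, agg=max)
```

Under both aggregations the student closest to "s1" in the toy data is "s2". Repeating this for every student and objective, then averaging, would give summary rows comparable in spirit to the Mean and Max columns above.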
Discussion
Conclusions
Canonicalization
The strongest canonicalization reduced the number of states by 30.4%
Over 90% of states in the network were only reached by one student
Important but insufficient by itself
Connecting States
Students maintain proximity when pursuing the same objective,on average within 8 tree edits, but slowly diverge
This may not be close enough to connect states
How do we take advantage of similar but distinct states?
Limitations
This was intentionally exploratory, using only 17 students
Many data-driven techniques use hundreds, though the Hint Factory has been historically successful with much less data
The open-ended programming assignment was very complex compared to those used in previous work
It is difficult to say how far these results generalize
Future Work
Can we break problems down into sub-problems, where more overlap is likely?
Are there more appropriate distance metrics we should be using?
How can we apply output-based state representations to apps or games with non-deterministic results?
Thank You!
Questions? Comments?
References
References I
Catete, V, K Wassell, and T Barnes (2014). “Use and development of entertainment technologies in after school STEM program”. In: Proceedings of the 45th ACM technical symposium on Computer science education, pp. 163–168.
Eagle, Michael and Tiffany Barnes (2015). “Exploring Networks of Problem-Solving Interactions”. In: Proceedings of the Fifth International Conference on Learning Analytics And Knowledge (LAK), pp. 21–30.
Fossati, D, B Di Eugenio, and S Ohlsson (2009). “I learn from you, you learn from me: How to make iList learn from students.” In: Artificial Intelligence in Education (AIED).
References II
Hicks, A, B Peddycord III, and T Barnes (2014). “Building Games to Learn from Their Players: Generating Hints in a Serious Game”. In: Intelligent Tutoring Systems (ITS), pp. 312–317.
Jin, W, T Barnes, and J Stamper (2012). “Program representation for automatic hint generation for a data-driven novice programming tutor”. In: Intelligent Tutoring Systems (ITS).
Lazar, T and I Bratko (2014). “Data-Driven Program Synthesis for Hint Generation in Programming Tutors”. In: Intelligent Tutoring Systems (ITS). Springer.
Rivers, K and KR Koedinger (2012). “A canonicalizing model for building programming tutors”. In: Intelligent Tutoring Systems (ITS).
References III
Rivers, K and KR Koedinger (2013). “Automatic generation of programming feedback: A data-driven approach”. In: The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013).
Rivers, K and KR Koedinger (2014). “Automating Hint Generation with Solution Space Path Construction”. In: Intelligent Tutoring Systems (ITS), pp. 329–339.
Stamper, JC et al. (2013). “Experimental evaluation of automatic hint generation for a logic tutor”. In: Artificial Intelligence in Education (AIED) 22.1, pp. 3–17.