20
An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price Tiffany Barnes North Carolina State University Workshop on Graph-based EDM 2015 Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 1 / 19

An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

An Exploration of Data-Driven Hint Generation

in an Open-Ended Programming Problem

Thomas Price Tiffany Barnes

North Carolina State University

Workshop on Graph-based EDM 2015

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 1 / 19

Page 2: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Introduction

Introduction

Data-driven hint generation

The process of extracting contextualized hints from previousstudents’ solutions to a problem

Avoids the need for an expensive, hand-authored expert model

Primary example: the Hint Factory, has been applied in a varietyof domains

Logical proofs (Stamper et al. 2013)

Linked list problems (Fossati, Eugenio, and Ohlsson 2009)

A programming game (Hicks, Peddycord III, and Barnes 2014)

How can we generate hints in complex, open-endedprogramming problems?

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 2 / 19

Page 3: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Introduction

Hint Generation

Our input is log data from previous students solving a problem, whichcan be represented as an interaction network (Eagle and Barnes 2015)

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 3 / 19

Page 4: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Introduction

Hints for Open-Ended Programming

Some approaches have successfully generated hints forprogramming problems (e.g. Jin, Barnes, and Stamper 2012; Rivers and

Koedinger 2014)

These are most successful on smaller, well-constrainedprogramming problems, with a clear solution

We are interested in the opposite type of problem:

Large state spaceMultiple, loosely ordered subgoalsUnstructured outputA creative, design task

Many novice programming activities have these traits, such asmaking games and apps

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 4 / 19

Page 5: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Current Approaches

Challenges

How do we represent a student’s state in a programming problem,and when should two states be connected?

Naive state representation: states correspond to snapshots ofstudents’ code

Problem: It is unlikely any two students will have the exactsame state

Simple connection rule: connect states that past students havetraversed

Problem: May lead to a very sparse network

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 5 / 19

Page 6: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Current Approaches

Current Approaches

This challenge has been addressed previously in three ways:

Canonicalization: Remove semantically unimportantinformation from student code to increase state overlap(e.g. Lazar and Bratko 2014; Rivers and Koedinger 2012)

Connecting States: Connect similar, existing states in thenetwork with synthetic actions (Rivers and Koedinger 2013)

Or add whole paths between states, including synthetic states(Rivers and Koedinger 2014)

Alternate State Definitions: Choose a non-code staterepresentation, such as the output of the student’s code(Hicks, Peddycord III, and Barnes 2014)

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 6 / 19

Page 7: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Data Collection

An Open-Ended Problem

We wanted to investigate the applicability of current approaches toan open-ended programming problem

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 7 / 19

Page 8: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Data Collection

An Open-Ended Problem

We wanted to investigate the applicability of current approaches toan open-ended programming problem

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 7 / 19

Page 9: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Data Collection

Data Collection

Collected data at a middle school STEM outreach programcalled SPARCS (Catete, Wassell, and Barnes 2014)

17 6th grade students (12 male; 5 female)

Students had 45 minutes to work on the activity

Instructor help was provided only upon request

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 8 / 19

Page 10: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Analysis

Canonicalization

Students’ programs were represented as Abstract Syntax Trees(ASTs). We analyzed three levels of state canonicalization:

Raw: No canonicalization; states represent exact code

Basic: Removed variable names and literal values

Rivers and Koedinger suggest a number of other measures, butthese were much less applicable to our data

Ordered: Also recursively sort all child nodes in the AST

Effective serves as an upper bound for removing unimportantordering information in code

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 9 / 19

Page 11: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Analysis

Canonicalization - Results

Raw Basic OrderedTotal States 2380 1781 1656Percent Unique 97.5% 94.8% 92.8%Mean Non-Unique Freq. 3.44 3.95 2.82Median Non-Unique Freq. 2 2 2Mean % Path Unique 89.9% 83.0% 78.9%Standard Deviation (6.67) (10.5) (13.3)

For comparison: In (Rivers and Koedinger 2012), 300 out of 500 finalsolutions to a basic problem were identical after canonicalization.

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 10 / 19

Page 12: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Analysis

Visualizing Distances

Constructed and plotteddistance matrices

Used Tree Edit Distanceas a distance metric

Lighter shades representsmaller distances

Min-distance “path”through the matrix shownin green/yellow

Red crosses indicate wheresubgoals were completed

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 11 / 19

Page 13: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Analysis

Quantifying Distances

Mean Max Farthest1 0.25 (0.27) 0.76 (0.56) 2.23 (0.75)2 4.88 (3.93) 9.18 (5.74) 12.73 (6.10)4 4.92 (2.77) 10.11 (3.69) 14.67 (4.77)5 7.79 (1.32) 13.17 (1.72) 18.17 (1.72)6 7.49 (1.11) 13.17 (0.98) 18.67 (1.75)

How close are the closest students?

Defined the distance between two students as the mean or maxdistance they get from one another when solving an objective

For each student, for each objective solved, find the pairedstudent with minimum distance

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 12 / 19

Page 14: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Discussion

Conclusions

Canonicalization

The strongest canonicalization reduced the number of states by30.4%

Over 90% of states in the network were only reached by onestudent

Important but insufficient by itself

Connecting States

Students maintain proximity when pursuing the same objective,on average within 8 tree edits, but slowly diverge

This may not be close enough to connect states

How do we take advantage of similar but distinct states?

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 13 / 19

Page 15: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Discussion

Limitations

This was intentionally exploratory, using only 17 students

Many data-driven techniques use hundreds, though the HintFactory has been historically successful with much less data

The open-ended programming assignment was very complexcompared to those used in previous work

It is difficulty to say where these results should be generalized

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 14 / 19

Page 16: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Discussion

Future Work

Can we break problems down into sub-problems, where moreoverlap is likely?

Are there more appropriate distance metrics we should be using

How can we use output-based state representations to apps orgames with non-deterministic results

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 15 / 19

Page 17: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

Discussion

Thank You!

Questions? Comments?

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 16 / 19

Page 18: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

References

References I

Catete, V, K Wassell, and T Barnes (2014). “Use and developmentof entertainment technologies in after school STEM program”. In:Proceedings of the 45th ACM technical symposium on Computerscience education, pp. 163–168.

Eagle, Michael and Tiffany Barnes (2015). “Exploring Networks ofProblem-Solving Interactions”. In: Proceedings of the FifthInternational Conference on Learning Analytics And Knowledge(LAK), pp. 21–30.

Fossati, D, B Di Eugenio, and S Ohlsson (2009). “I learn from you,you learn from me: How to make iList learn from students.” In:Artificial Intelligence in Education (AIED).

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 17 / 19

Page 19: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

References

References II

Hicks, A, B Peddycord III, and T Barnes (2014). “Building Games toLearn from Their Players: Generating Hints in a Serious Game”.In: Intelligent Tutoring Systems (ITS), pp. 312–317.

Jin, W, T Barnes, and J Stamper (2012). “Program representationfor automatic hint generation for a data-driven noviceprogramming tutor”. In: Intelligent Tutoring Systems (ITS).

Lazar, T and I Bratko (2014). “Data-Driven Program Synthesis forHint Generation in Programming Tutors”. In: Intelligent TutoringSystems (ITS). Springer.

Rivers, K and KR Koedinger (2012). “A canonicalizing model forbuilding programming tutors”. In: Intelligent Tutoring Systems(ITS).

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 18 / 19

Page 20: An Exploration of Data-Driven Hint Generation in an Open-Ended ... 2015a Slide… · An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas Price

References

References III

Rivers, K and KR Koedinger (2013). “Automatic generation ofprogramming feedback: A data-driven approach”. In: The FirstWorkshop on AI-supported Education for Computer Science(AIEDCS 2013).

– (2014). “Automating Hint Generation with Solution Space PathConstruction”. In: Intelligent Tutoring Systems (ITS),pp. 329–339.

Stamper, JC et al. (2013). “Experimental evaluation of automatichint generation for a logic tutor”. In: Artificial Intelligence inEducation (AIED) 22.1, pp. 3–17.

Thomas Price, Tiffany Barnes (NCSU) Hints in Open-Ended Problems G-EDM 2015 19 / 19