Challenges in Data-Driven Education
Beverly Park Woolf, Ivon Arroyo, Neil Heffernan, Ryan Baker
University of Massachusetts - Amherst
Worcester Polytechnic Institute
University of Pennsylvania - Philadelphia
Supported by National Science Foundation #1636847
The School of Athens, fresco by Raphael (1509-1511) in the Vatican. Plato (left) and Aristotle (right) hold bound copies of
their books.
One Goal: Provide millions of schoolchildren with access to the personal services of a tutor as well informed as
Plato or Aristotle.
Model the Student
Model the Domain
Personalize Tutoring
Assess Learning
Online tutors can:
• Change curricula in real time, based on student needs;
• Provide added material for low-achieving students.
Results: students learn better, learn more and learn faster with these systems.
Challenges in Data-Driven Education
• Predict future student events from existing large-scale longitudinal educational data sets involving thousands of students;
• Help teachers make sense of dense online data to influence their teaching;
• Provide personalized instruction based on using big data that represents student skills and behavior;
• Infer students’ cognitive, motivational, and metacognitive features in learning.
Example Big Data From PSLC
The percentage of errors made by students on their first attempt. Learning curves for individual topics. Knowledge components indicate student learning, categorized by little learning (e.g., square area; rectangle area); no learning (e.g. triangle area);
and still too many errors (e.g., circle-circumference).
Large Data Sets
EventLog Table of a Math Tutoring System. 571,776 rows, just in a year’s time.
Repository for Educational Big Data
Ken Koedinger, CMU, PI.
• The NSF-funded DataShop and LearnSphere host tens of millions of data points from hundreds of thousands of students using a variety of online learning systems.
• Includes log data of student interactions, test data, and field observation data, stored in fully de-identified form with all identifiers secured.
NSF-Funded DataShop/LearnSphere
• Central Repository
– Secure place to store & access research data
– Supports various kinds of research
• Analysis & Reporting Tools
– Focus on student-tutor interaction data
– Learning curves & error reports provide summary and low-level views of student performance
– Performance Profiler aggregates across various levels of granularity (problem, dataset levels, knowledge components, etc.)
– Data Export
– New tools created to meet highest demands
Repository
• Support full data management;
• Controlled access for collaboration;
• File attachments;
• Paper attachments;
• Great for secondary analyses.
DataShop Tools
• Learning Curve
• Error Report
• Performance Profiler
• Export
• Import
LEARNING CURVE
The Tools
How can I visualize student performance over time?
LearnLab DataShop, pslcdatashop.org
Web application
• Learning curve point decomposition
• Knowledge component model analysis with learning curves
Learning Curves
Visualizes changes in student performance over time
Time is represented on the x-axis as ‘opportunity’, the number of times a student (or students) had an opportunity to demonstrate a KC (knowledge component).
Hover over the y-axis label to change the type of Learning Curve.
Types include:
• Error Rate
• Assistance Score
• Number of Incorrects
• Number of Hints
• Step Duration
• Correct Step Duration
• Error Step Duration
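The error-rate curve described on this slide can be sketched in a few lines. This is a minimal illustration, not DataShop's implementation: the function name `error_rate_curve`, the tuple layout of the log, and the sample data are all assumptions made for the example. It counts each student's opportunities per KC and pools the first-attempt errors at each opportunity number.

```python
from collections import defaultdict

def error_rate_curve(first_attempts):
    """Return {opportunity_number: error_rate}, pooled over students and KCs.

    first_attempts: iterable of (student, kc, is_correct) tuples in time order,
    one per first attempt at a step (hypothetical layout for this sketch).
    """
    seen = defaultdict(int)               # (student, kc) -> opportunities so far
    totals = defaultdict(lambda: [0, 0])  # opportunity -> [errors, observations]
    for student, kc, is_correct in first_attempts:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity][0] += 0 if is_correct else 1
        totals[opportunity][1] += 1
    return {opp: errors / n for opp, (errors, n) in sorted(totals.items())}

# Made-up log: two students, three opportunities each on one KC.
log = [
    ("s1", "circle-area", False),
    ("s2", "circle-area", False),
    ("s1", "circle-area", False),
    ("s2", "circle-area", True),
    ("s1", "circle-area", True),
    ("s2", "circle-area", True),
]
curve = error_rate_curve(log)
print(curve)  # {1: 1.0, 2: 0.5, 3: 0.0}
```

A falling curve like this one is the pattern the tool is meant to reveal: error rate drops as students get more opportunities to practice the KC.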
Learning curves: Drill down
Click on a data point to view point information
Click on a number link to view the details behind that drill-down value.
Details include:
• Name
• Value
• Number of Observations
Four types of information for a data point:
• KCs
• Problems
• Steps
• Students
Students likely received too much practice for these KCs. Consider reducing the required number of tasks.
No apparent learning for these KCs. Consider splitting the KC.
Students continued to have difficulty with these KCs. Consider increasing opportunities for practice.
Students didn't practice these KCs enough for the data to be interpretable.
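The four interpretations above can be approximated with simple rules over a KC's learning curve. The sketch below is only an illustration: the function name, the category labels, and every threshold value are assumptions chosen for the example, not the values the DataShop tool uses.

```python
def categorize_curve(error_rates, n_students,
                     min_students=10, min_opportunities=3,
                     low_error=0.20, high_error=0.40, min_drop=0.05):
    """Label one KC's learning curve.

    error_rates: error rate at opportunity 1, 2, 3, ... for the KC.
    All thresholds are illustrative assumptions, not DataShop's values.
    """
    if n_students < min_students or len(error_rates) < min_opportunities:
        return "too little data"   # not enough practice to interpret
    if max(error_rates) <= low_error:
        return "low and flat"      # likely too much practice
    if error_rates[0] - error_rates[-1] < min_drop:
        return "no learning"       # consider splitting the KC
    if error_rates[-1] >= high_error:
        return "still high"        # consider more practice opportunities
    return "good"

print(categorize_curve([0.10, 0.08, 0.10], n_students=30))  # low and flat
print(categorize_curve([0.50, 0.52, 0.49], n_students=30))  # no learning
print(categorize_curve([0.80, 0.60, 0.45], n_students=30))  # still high
print(categorize_curve([0.50, 0.40], n_students=30))        # too little data
```

The order of the checks matters: a curve is tested for sparse data first, so a KC with only a couple of opportunities is never labeled as showing no learning.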
PERFORMANCE PROFILER
The Tools
What was the hardest problem for students? How many students worked in a particular unit?
Performance Profiler
Aggregate by:
• Step
• Problem
• Student
• KC
• Dataset Level
View measures of:
• Error Rate
• Assistance Score
• Avg # Hints
• Avg # Incorrect
• Residual Error Rate
Multipurpose tool to help identify areas that are too hard or easy
View multiple samples side by side
Mouse over a row to reveal uniqueness
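The kind of aggregation the Performance Profiler performs can be sketched as a group-by over log rows. This is a minimal sketch under stated assumptions: the row keys (`student`, `problem`, `kc`, `correct`, `hints`), the function name `profile`, and the sample rows are hypothetical, and error rate here counts only incorrect first attempts, which is a simplification.

```python
from collections import defaultdict

def profile(rows, by):
    """Aggregate error rate and average hints by the chosen column.

    rows: dicts with keys 'student', 'problem', 'kc', 'correct', 'hints'
    (a hypothetical layout for this sketch).
    by: the key to group on, e.g. 'problem', 'student', or 'kc'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    report = {}
    for key, members in groups.items():
        n = len(members)
        report[key] = {
            "error_rate": sum(not r["correct"] for r in members) / n,
            "avg_hints": sum(r["hints"] for r in members) / n,
        }
    return report

# Made-up rows: two students, two problems.
rows = [
    {"student": "s1", "problem": "p1", "kc": "circle-area", "correct": False, "hints": 2},
    {"student": "s2", "problem": "p1", "kc": "circle-area", "correct": True, "hints": 0},
    {"student": "s1", "problem": "p2", "kc": "rectangle-area", "correct": True, "hints": 0},
    {"student": "s2", "problem": "p2", "kc": "rectangle-area", "correct": True, "hints": 1},
]
by_problem = profile(rows, by="problem")
hardest = max(by_problem, key=lambda p: by_problem[p]["error_rate"])
print(hardest)  # p1
```

Changing `by` to `"student"` or `"kc"` regroups the same rows, which mirrors how the tool lets one measure be viewed at different levels of granularity.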
ERROR REPORT
The Tools
How can I explore the errors students made and drill down to see actual responses and feedback?
Web application
Performance Profiler tool for exploring the data
Change how the selected measure is aggregated by hovering over the title of the x-axis.
See more details by hovering over a bar in the graph.
Change the selected measure by hovering over the title of the y-axis.
The number of observations by type (correct, hint, or incorrect)
Details of what the student actually typed into the tutor
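A rough sketch of the two views an error report combines, observation counts by type and the actual incorrect responses, is shown below. The tuple layout, the function name `error_report`, and the sample attempts are assumptions made for illustration, not the DataShop export format.

```python
from collections import Counter, defaultdict

def error_report(attempts):
    """Summarize attempts per step.

    attempts: iterable of (step, outcome, typed) tuples, where outcome is
    'correct', 'hint', or 'incorrect' (hypothetical layout for this sketch).
    Returns (counts, wrong_answers): per-step outcome counts and per-step
    tallies of what students typed on incorrect attempts.
    """
    counts = defaultdict(Counter)
    wrong_answers = defaultdict(Counter)
    for step, outcome, typed in attempts:
        counts[step][outcome] += 1
        if outcome == "incorrect":
            wrong_answers[step][typed] += 1
    return counts, wrong_answers

# Made-up attempts on a single step.
attempts = [
    ("area-of-circle", "incorrect", "2*pi*r"),
    ("area-of-circle", "incorrect", "2*pi*r"),
    ("area-of-circle", "hint", ""),
    ("area-of-circle", "correct", "pi*r^2"),
]
counts, wrong_answers = error_report(attempts)
print(dict(counts["area-of-circle"]))
print(wrong_answers["area-of-circle"].most_common(1))  # [('2*pi*r', 2)]
```

Tallying the typed responses makes a shared misconception visible: here both wrong answers give the circumference formula instead of the area.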
NSF Big Data Spoke Award
Train researchers and educators in techniques and tools that personalize education and make predictions over large data sets.
Use competitions, hackathons, and workshops as part of this process.
Topics to be taught include:
• Data Mining
• Artificial Intelligence
• Machine Learning
• Learning Sciences
Research Questions
• What kinds of questions are worth asking/answering?
– What do teachers and students want to know?
– What do researchers in Learning Sciences want to know?
• What are techniques to answer big questions for big data in education?
Workshops: Topics to Teach
• How and when to use key methods.
• Strengths and weaknesses, for different applications, of methods under development as well as standard data mining methods.
• How to answer education research questions and drive intervention and improvement in education.
• Validity and generalizability; how trustworthy and applicable are the results.
Workshops
Philadelphia, Pa., Computer Supported Collaborative Learning; full day, June 18-19, 2017
Wuhan, China, Artificial Intelligence in Education; half day, June 28-29, 2017
Wuhan, China, Educational Data Mining; June 25-28, 2017
Worcester, MA. Fall 2017
New York City, Fall 2017
Boston, MA., Spring 2017
Topics to Teach
• Google Refine (http://code.google.com/p/google-refine/)
• Fathom (http://www.keycurriculum.com/products/fathom)
• Rapid Miner (rapidminer.com)
• IBM SPSS Statistics, Version 20
• Tinker Plots (http://www.keycurriculum.com/products/tinkerplots)
• Many Eyes and IBM Visualization Tools (www-958.ibm.com/)
• TETRAD Causal Modeling Software
Pre-processing techniques, visualization of single variables, decision trees, Bayesian networks, regression
Competitions
Use an existing big database to predict who goes to college and what students will study.
Longitudinal Data
NSF-supported longitudinal research at WPI
Middle-school students have been tracked for 10 years. Results of mathematics actions and college attendance.
Invite people to a Kaggle competition to predict student progress.
Datathons
• Weekend hackathons in which participants are encouraged to enhance existing educational software, including MathSpring and ASSISTments.
• Participants will:
– Design improved animated learning companions;
– Develop visualizations of hints and messages;
– Develop adaptively sequenced problems adjusted to students’ recent levels of ability and effort exerted.