Challenges in Data-Driven Education. Beverly Park Woolf, Ivon Arroyo, Neil Heffernan, Ryan Baker. University of Massachusetts - Amherst; Worcester Polytechnic Institute; University of Pennsylvania - Philadelphia. Supported by National Science Foundation #1636847.

Big Data National PI Meeting 2017

Page 1: Big Data National PI Meeting 2017

Challenges in Data-Driven Education

Beverly Park Woolf, Ivon Arroyo, Neil Heffernan, Ryan Baker

University of Massachusetts - Amherst

Worcester Polytechnic Institute

University of Pennsylvania - Philadelphia

Supported by National Science Foundation #1636847

Page 2: Big Data National PI Meeting 2017

The School of Athens, fresco by Raphael (1509-1511) in the Vatican. Plato (left) and Aristotle (right) hold bound copies of their books.

Page 3: Big Data National PI Meeting 2017

One Goal: Provide millions of schoolchildren with access to the personal services of a tutor as well informed as Plato or Aristotle.

Page 4: Big Data National PI Meeting 2017

Model the Student

Model the Domain

Personalize Tutoring

Assess Learning

Online tutors can:
• Change curricula in real time, based on student needs;
• Provide added material for low-achieving students.

Results: students learn better, learn more and learn faster with these systems.

Page 5: Big Data National PI Meeting 2017

Challenges in Data-Driven Education

• Predict future student events from existing large-scale longitudinal educational data sets involving thousands of students;

• Help teachers make sense of dense online data to influence their teaching;

• Provide personalized instruction based on big data that represents student skills and behavior;

• Infer students’ cognitive, motivational, and metacognitive features in learning.

Page 6: Big Data National PI Meeting 2017

Example Big Data From PSLC

The percentage of errors made by students on their first attempt, shown as learning curves for individual topics. Knowledge components indicate student learning and are categorized as little learning (e.g., square area, rectangle area); no learning (e.g., triangle area); and still too many errors (e.g., circle circumference).

Page 7: Big Data National PI Meeting 2017

Large Data Sets

EventLog Table of a Math Tutoring System. 571,776 rows, just in a year’s time.

Page 8: Big Data National PI Meeting 2017

Repository for Educational Big Data

Ken Koedinger, CMU, PI.

• NSF-funded DataShop/LearnSphere hosts tens of millions of data points from hundreds of thousands of students using a variety of online learning systems.

• Includes log data of student interactions, test data, and field observation data, stored in fully de-identified form with all identifiers secured.

Page 9: Big Data National PI Meeting 2017

NSF Funded DataShop/DataSphere

• Central Repository

– Secure place to store & access research data

– Supports various kinds of research

• Analysis & Reporting Tools

– Focus on student-tutor interaction data

– Learning curves & error reports provide summary and low-level views of student performance

– Performance Profiler aggregates across various levels of granularity (problem, dataset levels, knowledge components, etc.)

– Data Export

– New tools created to meet highest demands

Page 10: Big Data National PI Meeting 2017

Repository

• Support full data management;
• Controlled access for collaboration;
• File attachments;
• Paper attachments;
• Great for secondary analyses.

Page 11: Big Data National PI Meeting 2017

DataShop Tools

• Learning Curve
• Error Report
• Performance Profiler
• Export
• Import

Page 12: Big Data National PI Meeting 2017

The Tools

LEARNING CURVE

How can I visualize student performance over time?

pslcdatashop.org, LearnLab DataShop

Page 13: Big Data National PI Meeting 2017

Web application

• Learning curve point decomposition

• Knowledge component model analysis with learning curves

Page 14: Big Data National PI Meeting 2017

Learning Curves

Visualizes changes in student performance over time

Time is represented on the x-axis as ‘opportunity’, or the number of times a student (or students) had an opportunity to demonstrate a knowledge component (KC).

Hover over the y-axis to change the type of Learning Curve.

Types include:
• Error Rate
• Assistance Score
• Number of Incorrects
• Number of Hints
• Step Duration
• Correct Step Duration
• Error Step Duration
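As a rough illustration of the Error Rate curve described above, the sketch below counts each student's opportunities per KC and averages first-attempt errors at each opportunity. The row layout and the sample data are assumptions made for this example; they are not the DataShop export format or its implementation.

```python
from collections import defaultdict

# Hypothetical log rows in chronological order:
# (student, knowledge component, was the first attempt correct?)
LOG = [
    ("s1", "circle-area", False),
    ("s2", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", True),
]


def error_rate_curve(rows):
    """Return {kc: [error rate at opportunity 1, 2, ...]}."""
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    # kc -> opportunity number -> [errors, observations]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for student, kc, correct in rows:
        seen[(student, kc)] += 1
        cell = tally[kc][seen[(student, kc)]]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    return {
        kc: [by_opp[o][0] / by_opp[o][1] for o in sorted(by_opp)]
        for kc, by_opp in tally.items()
    }


curve = error_rate_curve(LOG)
print(curve["circle-area"])  # [1.0, 0.5, 0.0]
```

Plotting each list against its index plus one gives the opportunity on the x-axis and the error rate on the y-axis.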

Page 15: Big Data National PI Meeting 2017

Learning curves: Drill down

Click on a data point to view point information

Click on the number link to view the details of a particular drill-down item.

Details include:
• Name
• Value
• Number of Observations

Four types of information for a data point:
• KCs
• Problems
• Steps
• Students

Page 16: Big Data National PI Meeting 2017

Students likely received too much practice for these KCs. Consider reducing the required number of tasks.

No apparent learning for these KCs. Consider splitting KC.

Students continued to have difficulty with these KCs. Consider increasing opportunities for practice.

Students didn't practice these KCs enough for the data to be interpretable.
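The four categories above can be approximated with simple rules over a KC's error-rate curve. This is a minimal sketch: the function name, the thresholds, and the use of a first-minus-last drop in place of a fitted slope are all assumptions for illustration, not the rules the tool itself applies.

```python
def categorize_curve(error_rates, n_students, *, min_students=10,
                     min_opportunities=3, low_error=0.20,
                     high_error=0.40, min_drop=0.05):
    """Label one KC's learning curve (error rate per opportunity).

    All thresholds are hypothetical defaults chosen for illustration.
    """
    # Too few students or opportunities to interpret the curve.
    if n_students < min_students or len(error_rates) < min_opportunities:
        return "too little data"
    # Error rate never rises above the low threshold: likely over-practiced.
    if max(error_rates) <= low_error:
        return "too much practice"
    # Error rate barely changes from first to last opportunity.
    if error_rates[0] - error_rates[-1] < min_drop:
        return "no apparent learning"
    # Error rate drops but ends above the high threshold.
    if error_rates[-1] >= high_error:
        return "still difficult"
    return "learning"


print(categorize_curve([0.10, 0.08, 0.10, 0.05], 30))  # too much practice
print(categorize_curve([0.50, 0.52, 0.49, 0.50], 30))  # no apparent learning
print(categorize_curve([0.80, 0.70, 0.60, 0.50], 30))  # still difficult
print(categorize_curve([0.60, 0.20], 30))              # too little data
```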

Page 17: Big Data National PI Meeting 2017

The Tools

PERFORMANCE PROFILER

What was the hardest problem for students? How many students worked in a particular unit?

Page 18: Big Data National PI Meeting 2017

Performance Profiler

Aggregate by
• Step
• Problem
• Student
• KC
• Dataset Level

View measures of
• Error Rate
• Assistance Score
• Avg # Hints
• Avg # Incorrect
• Residual Error Rate

Multipurpose tool to help identify areas that are too hard or easy

View multiple samples side by side

Mouse over a row to reveal uniqueness
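The aggregation idea behind the Performance Profiler can be sketched as a group-by over step-level rows. The row fields and sample data below are hypothetical, and the assistance score is taken here as hints plus incorrect attempts averaged per step, which is an assumption for this example.

```python
from collections import defaultdict

# Hypothetical step-level rows; field names are assumptions for illustration.
STEPS = [
    {"student": "s1", "problem": "p1", "kc": "triangle-area", "hints": 2, "incorrects": 1},
    {"student": "s2", "problem": "p1", "kc": "triangle-area", "hints": 0, "incorrects": 0},
    {"student": "s1", "problem": "p2", "kc": "square-area", "hints": 0, "incorrects": 0},
    {"student": "s2", "problem": "p2", "kc": "square-area", "hints": 1, "incorrects": 0},
]


def profile(rows, by):
    """Aggregate per-step measures by the given field (e.g. 'problem', 'kc')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    report = {}
    for key, members in groups.items():
        n = len(members)
        hints = sum(r["hints"] for r in members)
        incorrects = sum(r["incorrects"] for r in members)
        report[key] = {
            "avg_hints": hints / n,
            "avg_incorrects": incorrects / n,
            "assistance_score": (hints + incorrects) / n,
        }
    return report


by_problem = profile(STEPS, "problem")
# The "hardest" problem is the one with the highest assistance score.
hardest = max(by_problem, key=lambda k: by_problem[k]["assistance_score"])
print(hardest, by_problem[hardest])
```

Calling `profile(STEPS, "student")` or `profile(STEPS, "kc")` gives the same measures at a different level of aggregation.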

Page 19: Big Data National PI Meeting 2017

The Tools

ERROR REPORT

How can I explore the errors students made and drill down to see actual responses and feedback?

Page 20: Big Data National PI Meeting 2017

Web application

◄ Performance Profiler tool for exploring the data

Page 21: Big Data National PI Meeting 2017

Change how the selected measure is aggregated by hovering over the title for the x-axis.

See more details by hovering over a bar in the graph.

Change the selected measure by hovering over the title for the y-axis.

Page 22: Big Data National PI Meeting 2017

The number of observations by type (correct, hint, or incorrect)

Details of what the student actually typed into the tutor
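An error report of this kind amounts to counting observations per step by type and tallying what students typed on incorrect attempts. The sketch below assumes a hypothetical attempt layout of (step, outcome, typed input); it is not the tool's own data model.

```python
from collections import Counter, defaultdict

# Hypothetical attempts: (step, outcome, what the student typed).
ATTEMPTS = [
    ("step-1", "incorrect", "12"),
    ("step-1", "hint", ""),
    ("step-1", "correct", "24"),
    ("step-1", "incorrect", "12"),
    ("step-2", "correct", "7"),
]


def error_report(attempts):
    """Return (observation counts by type, incorrect answers) per step."""
    counts = defaultdict(Counter)
    answers = defaultdict(Counter)
    for step, outcome, typed in attempts:
        counts[step][outcome] += 1
        if outcome == "incorrect":
            answers[step][typed] += 1
    return counts, answers


counts, answers = error_report(ATTEMPTS)
print(dict(counts["step-1"]))
print(answers["step-1"].most_common(1))  # most frequent wrong answer
```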

Page 23: Big Data National PI Meeting 2017

NSF Big Data Spoke Award

Train researchers and educators in techniques and tools that personalize education and make predictions over large data sets.

Use competitions, hackathons and workshops as part of this process.

Topics to be taught include:
• Data Mining
• Artificial Intelligence
• Machine Learning
• Learning Sciences

Page 24: Big Data National PI Meeting 2017

Research Questions

• What kinds of questions are worth asking/answering?

– What do teachers and students want to know?

– What do researchers in Learning Sciences want to know?

• What are techniques to answer big questions for big data in education?

Page 25: Big Data National PI Meeting 2017

Workshops: Topics to Teach

• How and when to use key methods.

• Strengths and weaknesses, for different applications, of methods being developed as well as standard data mining methods.

• How to answer education research questions and drive intervention and improvement in education.

• Validity and generalizability; how trustworthy and applicable are the results.

Page 26: Big Data National PI Meeting 2017

Workshops

Philadelphia, Pa., Computer Supported Collaborative Learning; Full day, June 18-19, 2017

Wuhan, China, Artificial Intelligence in Education; Half day, June 28-29, 2017

Wuhan, China, Educational Data Mining; June 25-28, 2017

Worcester, MA. Fall 2017

New York City, Fall 2017

Boston, MA., Spring 2017

Page 27: Big Data National PI Meeting 2017

Topics to Teach

• Google Refine (http://code.google.com/p/google-refine/)
• Fathom (http://www.keycurriculum.com/products/fathom)
• Rapid Miner (rapidminer.com)
• IBM SPSS Statistics, Version 20
• Tinker Plots (http://www.keycurriculum.com/products/tinkerplots)
• Many Eyes and IBM Visualization Tools (www-958.ibm.com/)
• TETRAD Causal Modeling Software

Pre-processing techniques, Visualization of single variables, Decision trees, Bayesian networks, Regression

Page 28: Big Data National PI Meeting 2017

Competitions

Use existing Big data base to predict who goes to college and what students will study.

Longitudinal Data

NSF supported longitudinal research at WPI. Middle-School students have been tracked for 10 years. Results of mathematics actions and college attendance.

Invite people to a Kaggle competition to predict student progress.

Page 29: Big Data National PI Meeting 2017

Datathons

• Weekend hackathons in which participants are encouraged to enhance existing educational software, including MathSpring and ASSISTments.

• Participants will
– Design improved animated learning companions;
– Develop visualizations of hints and messages;
– Develop adaptively sequenced problems adjusted to students’ recent levels of ability and effort exerted.
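One simple way to picture adaptive sequencing is to estimate recent ability from a student's last few outcomes and pick the problem whose difficulty is closest to it. This is only a sketch under stated assumptions: the function names, the 0-to-1 difficulty scale, and the window size are hypothetical, effort is not modeled, and nothing here describes how MathSpring or ASSISTments actually sequence problems.

```python
def recent_ability(outcomes, window=5):
    """Fraction correct over the last `window` attempts (0.5 if no history)."""
    recent = outcomes[-window:]
    if not recent:
        return 0.5
    return sum(recent) / len(recent)


def next_problem(problems, outcomes, window=5):
    """Pick the (name, difficulty) pair closest to the student's recent ability.

    `problems` is a list of (name, difficulty) with difficulty in [0, 1];
    this scale is an assumption made for the example.
    """
    target = recent_ability(outcomes, window)
    return min(problems, key=lambda p: abs(p[1] - target))


PROBLEMS = [("easy", 0.2), ("medium", 0.5), ("hard", 0.85)]

print(next_problem(PROBLEMS, [True, True, False, True, True]))  # ('hard', 0.85)
print(next_problem(PROBLEMS, [False, False, True, False]))      # ('easy', 0.2)
print(next_problem(PROBLEMS, []))                               # ('medium', 0.5)
```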

Page 30: Big Data National PI Meeting 2017