Big Data National PI Meeting 2017


Challenges in Data-Driven Education

Beverly Park Woolf, Ivon Arroyo, Neil Heffernan, Ryan Baker

University of Massachusetts - Amherst

Worcester Polytechnic Institute

University of Pennsylvania - Philadelphia

Supported by National Science Foundation #1636847

The School of Athens, fresco by Raphael (1509-1511) in the Vatican. Plato (left) and Aristotle (right) hold bound copies of their books.

One Goal: Provide millions of schoolchildren with access to the personal services of a tutor as well informed as Plato or Aristotle.

Model the Student

Model the Domain

Personalize Tutoring

Assess Learning

Online tutors can:

• Change curricula in real time, based on student needs;

• Provide added material for low-achieving students.

Results: students learn better, learn more and learn faster with these systems.

Challenges in Data-Driven Education

• Predict future student events from existing large-scale longitudinal educational data sets involving thousands of students;

• Help teachers make sense of dense online data to influence their teaching;

• Provide personalized instruction based on using big data that represents student skills and behavior;

• Infer students’ cognitive, motivational, and metacognitive features in learning.

Example Big Data From PSLC

The percentage of errors made by students on their first attempt. Learning curves for individual topics. Knowledge components indicate student learning, categorized by little learning (e.g., square area; rectangle area); no learning (e.g., triangle area); and still too many errors (e.g., circle circumference).

Large Data Sets

EventLog Table of a Math Tutoring System. 571,776 rows, just in a year’s time.

Repository for Educational Big Data

Ken Koedinger, CMU, PI.

• The NSF-funded DataShop and LearnSphere host tens of millions of data points from hundreds of thousands of students using a variety of online learning systems.

• Includes log data of student interactions, test data, and field observation data, stored in fully de-identified form, with all identifiers secured.

NSF Funded DataShop/DataSphere

• Central Repository

– Secure place to store & access research data

– Supports various kinds of research

• Analysis & Reporting Tools

– Focus on student-tutor interaction data

– Learning curves & error reports provide summary and low-level views of student performance

– Performance Profiler aggregates across various levels of granularity (problem, dataset levels, knowledge components, etc.)

– Data Export

– New tools created to meet highest demands

Repository

• Support full data management;

• Controlled access for collaboration;

• File attachments;

• Paper attachments;

• Great for secondary analyses.

DataShop Tools

• Learning Curve

• Error Report

• Performance Profiler

• Export

• Import

The Tools: LEARNING CURVE

How can I visualize student performance over time?

LearnLab DataShop: pslcdatashop.org, datashop-help@lists.andrew.cmu.edu

Web application

• Learning curve point decomposition

• Knowledge component model analysis with learning curves

Learning Curves

Visualizes changes in student performance over time

Time is represented on the x-axis as 'opportunity', the number of times a student (or students) had an opportunity to demonstrate a knowledge component (KC).

Hover over the y-axis label to change the type of Learning Curve.

Types include:

• Error Rate

• Assistance Score

• Number of Incorrects

• Number of Hints

• Step Duration

• Correct Step Duration

• Error Step Duration
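To make the 'opportunity' axis concrete, here is a minimal sketch of how an Error Rate learning curve could be computed from first-attempt data. It is not DataShop's implementation: the row layout, the names `rows` and `error_rate_curve`, and the sample values are all hypothetical, and rows are assumed to be in chronological order for each student.

```python
from collections import defaultdict

# Hypothetical first-attempt log rows: (student, KC, correct on first attempt?).
# Assumption: rows appear in chronological order for each student.
rows = [
    ("s1", "circle-area", False),
    ("s1", "circle-area", False),
    ("s1", "circle-area", True),
    ("s2", "circle-area", False),
    ("s2", "circle-area", True),
    ("s2", "circle-area", True),
]

def error_rate_curve(rows, kc):
    """Return {opportunity number: error rate} for one KC, pooled over students."""
    seen = defaultdict(int)    # opportunities so far, per student
    errors = defaultdict(int)  # first-attempt errors at each opportunity
    totals = defaultdict(int)  # observations at each opportunity
    for student, row_kc, correct in rows:
        if row_kc != kc:
            continue
        seen[student] += 1
        opportunity = seen[student]
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

curve = error_rate_curve(rows, "circle-area")
print(curve)  # {1: 1.0, 2: 0.5, 3: 0.0}
```

A falling curve like this one, plotted with opportunity on the x-axis, is the shape one hopes to see when students are learning the KC.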

Learning curves: Drill down

Click on a data point to view point information

Click on a number link to view the details of a particular drill-down item.

Details include:

• Name

• Value

• Number of Observations

Four types of information for a data point:

• KCs

• Problems

• Steps

• Students

Students likely received too much practice for these KCs. Consider reducing the required number of tasks.

No apparent learning for these KCs. Consider splitting KC.

Students continued to have difficulty with these KCs. Consider increasing opportunities for practice.

Students didn't practice these KCs enough for the data to be interpretable.
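The four diagnoses above can be read as rules over a KC's error-rate curve. The sketch below is one hypothetical way to encode them. The function name, the labels, and every threshold (`min_students`, `min_opps`, `low`, `high`, `min_drop`) are illustrative assumptions, not values taken from the slides or from DataShop.

```python
def classify_curve(error_rates, n_students, min_students=10, min_opps=3,
                   low=0.20, high=0.40, min_drop=0.05):
    """Label one KC's learning curve. All thresholds are illustrative assumptions.

    error_rates: error rate at opportunity 1, 2, 3, ... (values in [0, 1]).
    """
    # Not enough practice data for the curve to be interpretable.
    if n_students < min_students or len(error_rates) < min_opps:
        return "too little data"
    # Error rate is low from the start: students may be over-practicing.
    if max(error_rates) < low:
        return "low and flat"
    # Error rate barely drops from first to last opportunity: no apparent learning.
    if error_rates[0] - error_rates[-1] < min_drop:
        return "no learning"
    # Error rate drops but is still high at the end: more practice may help.
    if error_rates[-1] > high:
        return "still high"
    return "good"

print(classify_curve([0.10, 0.08, 0.05], n_students=30))  # low and flat
print(classify_curve([0.50, 0.50, 0.52], n_students=30))  # no learning
print(classify_curve([0.80, 0.60, 0.50], n_students=30))  # still high
print(classify_curve([0.60, 0.40], n_students=30))        # too little data
```

The rules are checked in a fixed order, so a curve that is both flat and low is reported as "low and flat" rather than "no learning"; a different ordering would be an equally defensible design choice.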

The Tools: PERFORMANCE PROFILER

What was the hardest problem for students? How many students worked in a particular unit?


Performance Profiler

Aggregate by

• Step

• Problem

• Student

• KC

• Dataset Level

View measures of

• Error Rate

• Assistance Score

• Avg # Hints

• Avg # Incorrect

• Residual Error Rate

Multipurpose tool to help identify areas that are too hard or easy

View multiple samples side by side

Mouse over a row to reveal uniqueness
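The aggregation the Performance Profiler performs can be sketched as a group-by over step-level rows. This is an illustration only: the row fields, the `profile` function, and the sample data are hypothetical, error rate is assumed to mean "first attempt was not correct", and assistance score is assumed to be hints plus incorrect attempts, averaged per observation.

```python
from collections import defaultdict

# Hypothetical step-level rows, one per student-step observation.
steps = [
    {"student": "s1", "problem": "p1", "kc": "triangle-area",
     "hints": 2, "incorrects": 1, "first_correct": False},
    {"student": "s2", "problem": "p1", "kc": "triangle-area",
     "hints": 0, "incorrects": 0, "first_correct": True},
    {"student": "s1", "problem": "p2", "kc": "circle-circumference",
     "hints": 1, "incorrects": 3, "first_correct": False},
    {"student": "s2", "problem": "p2", "kc": "circle-circumference",
     "hints": 1, "incorrects": 1, "first_correct": False},
]

def profile(steps, by):
    """Aggregate measures by one field, e.g. 'problem', 'student', or 'kc'."""
    groups = defaultdict(list)
    for row in steps:
        groups[row[by]].append(row)
    report = {}
    for key, rows in groups.items():
        n = len(rows)
        report[key] = {
            "observations": n,
            # Assumption: an error is any first attempt that was not correct.
            "error_rate": sum(not r["first_correct"] for r in rows) / n,
            "avg_hints": sum(r["hints"] for r in rows) / n,
            "avg_incorrects": sum(r["incorrects"] for r in rows) / n,
            # Assumption: assistance score = hints + incorrects per observation.
            "assistance_score": sum(r["hints"] + r["incorrects"] for r in rows) / n,
        }
    return report

by_problem = profile(steps, by="problem")
hardest = max(by_problem, key=lambda k: by_problem[k]["error_rate"])
print(hardest, by_problem[hardest])
```

Calling `profile(steps, by="student")` or `profile(steps, by="kc")` regroups the same rows, which mirrors switching the "Aggregate by" setting.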

The Tools: ERROR REPORT

How can I explore the errors students made and drill down to see actual responses and feedback?


Web application

Performance Profiler tool for exploring the data

Change how the selected measure is aggregated by hovering over the title for the x-axis.

See more details by hovering over a bar in the graph.

Change the selected measure by hovering over the title for the y-axis.


The number of observations by type (correct, hint, or incorrect)

Details what the student actually typed into the tutor
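As a rough illustration of what an error report summarizes, the sketch below counts observations by type for each step and tallies what students typed on incorrect attempts. The tuple layout, the `error_report` function, and the sample attempts are hypothetical and do not reflect DataShop's export format.

```python
from collections import Counter, defaultdict

# Hypothetical attempt rows: (step, outcome type, what the student typed).
attempts = [
    ("step-1", "incorrect", "12"),
    ("step-1", "correct", "24"),
    ("step-1", "hint", ""),
    ("step-1", "incorrect", "12"),
    ("step-2", "correct", "3.14"),
]

def error_report(attempts):
    """Return per-step counts by outcome type and per-step incorrect answers."""
    counts = defaultdict(Counter)   # step -> outcome type -> observations
    answers = defaultdict(Counter)  # step -> typed incorrect answer -> observations
    for step, outcome, typed in attempts:
        counts[step][outcome] += 1
        if outcome == "incorrect":
            answers[step][typed] += 1
    return counts, answers

counts, answers = error_report(attempts)
print(dict(counts["step-1"]))
print(answers["step-1"].most_common(1))  # [('12', 2)]
```

Listing the most common incorrect answers per step is what lets a reader drill down from an aggregate error count to a specific misconception.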

NSF Big Data Spoke Award

Train researchers and educators in techniques and tools that personalize education and make predictions over large data sets.

Use competitions, hackathons, and workshops as part of this process.

Topics to be taught include:

• Data Mining

• Artificial Intelligence

• Machine Learning

• Learning Sciences

Research Questions

• What kinds of questions are worth asking/answering?

– What do teachers and students want to know?

– What do researchers in Learning Sciences want to know?

• What are techniques to answer big questions for big data in education?

Workshops: Topics to Teach

• How and when to use key methods.

• Strengths and weaknesses, for different applications, of methods being developed as well as standard data mining methods.

• How to answer education research questions and drive intervention and improvement in education.

• Validity and generalizability; how trustworthy and applicable are the results.

Workshops

Philadelphia, PA, Computer Supported Collaborative Learning; Full day, June 18-19, 2017

Wuhan, China, Artificial Intelligence in Education; Half day, June 28-29, 2017

Wuhan, China, Educational Data Mining; June 25-28, 2017

Worcester, MA. Fall 2017

New York City, Fall 2017

Boston, MA., Spring 2017

Topics to Teach

• Google Refine (http://code.google.com/p/google-refine/)

• Fathom (http://www.keycurriculum.com/products/fathom)

• Rapid Miner (rapidminer.com)

• IBM SPSS Statistics, Version 20

• Tinker Plots (http://www.keycurriculum.com/products/tinkerplots)

• Many Eyes and IBM Visualization Tools (www-958.ibm.com/)

• TETRAD Causal Modeling Software

Pre-processing techniques, Visualization of single variables, Decision trees, Bayesian networks, Regression

Competitions

Use an existing big database to predict who goes to college and what students will study.

Longitudinal Data

NSF-supported longitudinal research at WPI: middle-school students have been tracked for 10 years. The data include results of mathematics actions and college attendance.

Invite people to a Kaggle competition to predict student progress.

Datathons

• Weekend hackathons in which participants are encouraged to enhance existing educational software, including MathSpring and ASSISTments.

• Participants will

– Design improved animated learning companions;

– Develop visualizations of hints and messages;

– Develop adaptively sequenced problems adjusted to students’ recent levels of ability and effort exerted.
