Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University

Page 1: Data mining with DataShop

Data mining with DataShop

Ken Koedinger CMU Director of PSLC

Professor of Human-Computer Interaction & Psychology

Carnegie Mellon University

Page 2: Data mining with DataShop

“Knowledge components are the germ of transfer”

Goal of the week:

What does Ken mean by this?

Page 3: Data mining with DataShop

Overview

Motivation for data mining
  Better understanding of students => better instructional design
Exploratory Data Analysis
  Data Shop demo, Excel
Learning curves & Learning Factors Analysis
Example project from last summer

Page 4: Data mining with DataShop

Data Mining Questions & Methods

What is going on with student learning & performance?
  Exploratory data analysis
    Summary & visualization tools in DataShop
    Tools in Excel: Auto filter, Pivot Tables, Solver
How to reliably model student achievement?
  Item Response Theory (IRT)
    Basis for standardized tests, SAT, GRE, TIMSS…
    Version of “logistic regression”
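To make the "version of logistic regression" remark concrete, here is a minimal sketch. The slide does not say which IRT variant is meant, so this assumes the simplest one-parameter (Rasch) form, where the chance of a correct answer depends only on student ability minus item difficulty. The function name and the numbers are illustrative assumptions.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the one-parameter (Rasch) IRT model (assumed variant)."""
    # Logistic function applied to the ability-difficulty gap.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values, for illustration only.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)
p_stronger = rasch_probability(ability=2.0, difficulty=0.0)
```

When ability equals difficulty the model gives an even chance of success; a stronger student on the same item gets a higher probability.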

Page 5: Data mining with DataShop

Data Mining Questions & Methods 2

What’s the nature of knowledge students are learning? How can we discover cognitive models of student learning that fit their learning curves?
  Learning Factors Analysis (LFA)
    Extends IRT to account for learning
    Search algorithm: Discover cognitive model(s) that capture how student learning transfers over tasks over time
What features of a tutor lead to the most learning?
  Learning Decomposition
    Extends LFA to explore different rates of learning due to different forms of instruction
How to extract reliable inferences about causal mechanisms from correlations in data?
  Causal modeling using Tetrad
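The slides do not give the equation behind "extends IRT to account for learning". A commonly used additive form for the statistical model in LFA is sketched below; treat the notation as an assumption rather than the talk's own. Here p_ij is the probability that student i gets step j right, θ_i is student proficiency, q_jk is 1 if step j uses knowledge component k (else 0), β_k is the easiness of KC k, γ_k is its learning rate, and T_ik counts student i's prior practice opportunities on KC k.

```latex
\ln\frac{p_{ij}}{1 - p_{ij}}
  = \theta_i + \sum_{k} q_{jk}\,\bigl(\beta_k + \gamma_k\,T_{ik}\bigr)
```

Dropping the γ_k T_ik term leaves an IRT-style logistic model with no learning, which is the sense in which LFA extends IRT.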

Page 6: Data mining with DataShop

Overview

Motivation for data mining
  Better understanding of students => better instructional design
Exploratory Data Analysis
  Demo: DataShop, Excel
Learning curves & Learning Factors Analysis
Example project from last summer

Next

Page 7: Data mining with DataShop

Data Shop Demo …

Page 8: Data mining with DataShop

Before going to DataShop, let’s look at a tutor (1997 version!) that generated the example data set we’ll look at

Page 9: Data mining with DataShop

TWO_CIRCLES_IN_SQUARE problem: Initial screen

Page 10: Data mining with DataShop

TWO_CIRCLES_IN_SQUARE problem: An error a few steps later

Page 11: Data mining with DataShop

TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes problem

Page 12: Data mining with DataShop

How to get to the DataShop: Go to http://learnlab.org & click …

Page 13: Data mining with DataShop

PSLC’s DataShop

Researchers get data access, visualizations, statistical tools

Learning curves track student learning over time

Discover what concepts & skills students need help with

Page 14: Data mining with DataShop

PSLC’s DataShop

Learning curves reveal over- and under-practiced knowledge components

Rectangle-area has an initial low error rate, but is practiced often
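A learning curve of this kind can be computed from step-level records. The sketch below assumes each record is a (student, KC, correct-on-first-attempt) triple in chronological order; the record layout, names, and data are made up for illustration and are not DataShop's actual format.

```python
from collections import defaultdict

def learning_curve(steps):
    """Return {opportunity: error_rate}, pooled over all records given.

    steps: chronological (student, kc, correct) triples (assumed layout).
    """
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    errors = defaultdict(int)  # opportunity number -> error count
    totals = defaultdict(int)  # opportunity number -> attempt count
    for student, kc, correct in steps:
        seen[(student, kc)] += 1
        opp = seen[(student, kc)]
        totals[opp] += 1
        if not correct:
            errors[opp] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

# Made-up records: two students practicing one KC three times each.
toy_steps = [
    ("s1", "circle-area", False),
    ("s2", "circle-area", False),
    ("s1", "circle-area", False),
    ("s2", "circle-area", True),
    ("s1", "circle-area", True),
    ("s2", "circle-area", True),
]
curve = learning_curve(toy_steps)
```

A KC whose curve starts low and stays flat over many opportunities is a candidate for "over-practiced"; one that is still high at the last opportunity is a candidate for "under-practiced".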

Page 15: Data mining with DataShop

Other DataShop Features

Error Reports
  Identify misconceptions by looking for common student errors
  When do students ask for hints?
  Are there alternative correct strategies?
Performance Profiler
Export Data
  Get all or part of the data in a tab-delimited file
  Use your favorite analysis tools …
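Excel is one option for the exported file; a script is another. The sketch below reads tab-delimited text with Python's standard csv module. The column names (student, kc, outcome) and the rows are hypothetical stand-ins, not the real export's headers.

```python
import csv
import io

# Stand-in for an exported file; column names here are hypothetical.
exported = (
    "student\tkc\toutcome\n"
    "s1\trectangle-area\tcorrect\n"
    "s1\tcircle-area\tincorrect\n"
    "s2\tcircle-area\thint\n"
)

# delimiter="\t" tells the csv module the file is tab-delimited.
rows = list(csv.DictReader(io.StringIO(exported), delimiter="\t"))

# Keep the steps that were not answered correctly, then list their KCs.
not_correct = [r for r in rows if r["outcome"] != "correct"]
kcs_needing_help = sorted({r["kc"] for r in not_correct})
```

For a file on disk, replace the `io.StringIO(...)` object with `open(path, newline="")`.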

Page 16: Data mining with DataShop

Exported File Loaded into Excel

Page 17: Data mining with DataShop

Overview

Motivation for data mining
  Better understanding of students => better instructional design
Exploratory Data Analysis
  Data Shop demo, Excel
Learning curves & Learning Factors Analysis
Example project from last summer

Next

Page 18: Data mining with DataShop

3(2x - 5) = 9

6x - 15 = 9 2x - 5 = 3 6x - 5 = 9

Cognitive Model drives behavior of intelligent tutor systems …

Cognitive Model: expert component of intelligent tutors that models how students solve problems

If goal is solve a(bx+c) = d
Then rewrite as abx + ac = d

If goal is solve a(bx+c) = d
Then rewrite as abx + c = d

If goal is solve a(bx+c) = d
Then rewrite as bx+c = d/a

Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction

Page 19: Data mining with DataShop

3(2x - 5) = 9

6x - 15 = 9 2x - 5 = 3 6x - 5 = 9

Cognitive Model drives behavior of intelligent tutor systems …

Cognitive Model: expert component of intelligent tutors that models how students solve problems

If goal is solve a(bx+c) = d
Then rewrite as abx + ac = d

If goal is solve a(bx+c) = d
Then rewrite as abx + c = d

Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction

Hint message: “Distribute a across the parentheses.”

Bug message: “You need to multiply c by a also.”
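The model-tracing idea can be sketched in a few lines: generate the step each rule would produce, including the buggy rule, and see which one matches what the student typed. This is a toy illustration, not the tutor's implementation. The rule names, the (x coefficient, constant, right side) encoding, and the "OK" feedback are assumptions; the bug message is the one from the slide.

```python
from fractions import Fraction

def candidate_steps(a, b, c, d):
    """Next steps for a(bx + c) = d as (x coefficient, constant, right side)."""
    return {
        "distribute": (a * b, a * c, d),                    # abx + ac = d
        "divide-both-sides": (b, c, Fraction(d, a)),        # bx + c = d/a
        "BUG-distribute-first-term-only": (a * b, c, d),    # abx + c = d
    }

BUG_MESSAGES = {
    "BUG-distribute-first-term-only": "You need to multiply c by a also.",
}

def trace(student_step, a, b, c, d):
    """Return (rule name, feedback) for the student's step, or (None, None)."""
    for rule, result in candidate_steps(a, b, c, d).items():
        if result == student_step:
            return rule, BUG_MESSAGES.get(rule, "OK")
    return None, None

# The slide's example, 3(2x - 5) = 9, with the three steps shown there.
correct_a = trace((6, -15, 9), 3, 2, -5, 9)
correct_b = trace((2, -5, 3), 3, 2, -5, 9)
buggy = trace((6, -5, 9), 3, 2, -5, 9)
```

Because the match identifies which rule the student applied, the feedback can be specific to that path, which is what makes the instruction context-sensitive.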

Knowledge Tracing: Assesses student's knowledge growth -> individualized activity selection and pacing

Known? = 85% chance Known? = 45%
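The slide does not give the update rule behind the "Known?" estimates. One common formulation is Bayesian Knowledge Tracing, sketched below under that assumption. The slip, guess, and learn parameter values are invented for illustration.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step; returns the updated P(known).

    Assumed formulation; parameter defaults are illustrative, not fitted.
    """
    if correct:
        # A correct answer comes from knowing (and not slipping) or guessing.
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        # An error comes from slipping or from not knowing (and not guessing).
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Allow for learning from the practice opportunity itself.
    return posterior + (1 - posterior) * p_learn

# Starting from the slide's 45% estimate for one knowledge component.
after_correct = bkt_update(0.45, correct=True)
after_error = bkt_update(0.45, correct=False)
```

A tutor can keep assigning problems on a knowledge component until its estimate crosses a mastery threshold, which is how the estimate feeds activity selection and pacing.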

Page 20: Data mining with DataShop

Cognitive Modeling Challenge

Problem: Intelligent Tutoring Systems depend on Cognitive Model, which is hard to get right
  Hard to program, but more importantly …
  A high quality cognitive model requires a deep understanding of student thinking
  Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)

Page 21: Data mining with DataShop

Significance of improving a cognitive model

A better cognitive model means:
  better feedback & hints (model tracing)
  better problem selection & pacing (knowledge tracing)
Making cognitive models better advances basic cognitive science

Page 22: Data mining with DataShop

How can we use student data to build better cognitive models?

Cognitive Task Analysis methods
  Think alouds, Difficulty Factors Assessment
    General lecture Tuesday
  Peer collaboration dialog analysis
    TagHelper track
Newer:
  Data mining of student interactions with on-line tutors

Page 23: Data mining with DataShop

Back to DataShop to illustrate

Page 24: Data mining with DataShop

Use log data to test alternative knowledge representations

Which “knowledge component” analysis is correct is an empirical question!

Log data from tutors provides data to compare different KC analyses
  Find which “germ” accounts for student learning behaviors

Page 25: Data mining with DataShop

Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

Page 26: Data mining with DataShop

This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.

Page 27: Data mining with DataShop

Ah! Now we are getting a smooth learning curve. This even more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another.
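The comparison across these three slides can be mimicked on toy data: relabel the same log under different KC models and recompute the curves. Everything below is a made-up illustration. The step names, KC labels, and log are hypothetical, and `is_smooth` is only a crude stand-in for inspecting the curve by eye.

```python
from collections import defaultdict

def error_curves(log, kc_of):
    """Map each KC to its list of error rates by opportunity.

    log: chronological (student, step, correct) records.
    kc_of: the KC model, a dict from step name to KC label.
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    # kc -> opportunity number -> [error count, attempt count]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for student, step, correct in log:
        kc = kc_of[step]
        seen[(student, kc)] += 1
        cell = tally[kc][seen[(student, kc)]]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    return {kc: [e / n for _, (e, n) in sorted(opps.items())]
            for kc, opps in tally.items()}

def is_smooth(curve):
    """Crude smoothness check: the error rate never rises."""
    return all(later <= earlier for earlier, later in zip(curve, curve[1:]))

# One student's made-up log, interleaving two kinds of step.
log = [
    ("s1", "area-step", False), ("s1", "area-step", True),
    ("s1", "radius-step", False), ("s1", "area-step", True),
    ("s1", "radius-step", False), ("s1", "radius-step", True),
]
one_kc = {"area-step": "geometry", "radius-step": "geometry"}
two_kcs = {"area-step": "area", "radius-step": "radius"}

coarse = error_curves(log, one_kc)
fine = error_curves(log, two_kcs)
```

Under the single-KC labeling the error rate bounces up and down, because errors on the harder step keep reappearing in a curve that assumes all steps exercise the same knowledge. Splitting the steps into two KCs gives each its own steadily falling curve.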

Page 28: Data mining with DataShop

Overview

Motivation for data mining
  Better understanding of students => better instructional design
Exploratory Data Analysis
  Demo: DataShop, Excel
Learning curves & Learning Factors Analysis
Example project from last summer

Next

Page 29: Data mining with DataShop

Example project from 2006

Rafferty (Stanford) & Yudelson (U Pitt)
Analyzed a data set from Geometry
Applied Learning Factors Analysis (LFA)
Driving questions:

Are students learning at the same rate as assumed in prior LFA models?

Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?

Page 30: Data mining with DataShop

Rafferty & Yudelson Results 1

Different student learning rates?

Yes

Page 31: Data mining with DataShop

Page 32: Data mining with DataShop

Rafferty & Yudelson Results 2

Is it “faster” learning or “different” learning?

Fit with a more compact model is better for low pre for high learn

Students with an apparent faster learning rate are learning a more “compact”, general and transferable domain model

(Became the basis of Anna Rafferty’s master’s thesis)

Page 33: Data mining with DataShop

Data Mining-Data Shop Offerings Tomorrow

Lectures in 3501 Newell-Simon Hall, activities here (Wean 5202)

1. Educational data mining overview & introduction to using the DataShop
   Follow-up activities:
     Exercise in using DataShop for exploratory data analysis
     Use tutor/course that generated target data set. Begin data export, data scrubbing, exploratory data analysis

2. Learning from learning curves: Item Response Theory, Learning Factors Analysis

3. Other data mining techniques: Learning decomposition, causal models with Tetrad

Define metrics to address driving question, begin analysis

Page 34: Data mining with DataShop

Questions?

Page 35: Data mining with DataShop

What’s next?

Tomorrow:
  Do you know which offerings you will go to tomorrow?
  Any conflicts -- two you want to go to that are at the same time?

Page 36: Data mining with DataShop

END