
Introduction

Background

Previous education research points to various “persistence factors” such as abilities, motivation, time constraints, and self-regulated learning skills.

Learner activity features correlated with these persistence factors can potentially allow us to predict and diagnose dropout.

Predicting and Diagnosing Attrition in MOOCs

Sherif Halawa, Daniel Greene, and Prof. John Mitchell

Attrition rates in MOOCs usually exceed 80%, presenting interesting opportunities to study persistence and experiment with educational interventions. How can we predict which learners are at risk of dropout and predict their reasons for dropout? These questions are a key part of intervention design in MOOCs.

Data from multiple MOOCs was used to build training and test sets. Dropout labels were assigned using learner activity data, and labels for reasons of dropout were derived from a diagnostic survey on persistence factors.

Features were extracted from learner data, and models were constructed for dropout and each of its modeled reasons: lack of ability, lack of motivation, and lack of time.

Method

Dropout Prediction Results Dropout Diagnosis Results

Given the outputs of the dropout prediction and diagnosis models, how can we design interventions for amenable learners? To what extent can this help increase persistence in MOOCs?

Implications and Future Work

References

S. Halawa, D. Greene, and J. Mitchell, “Dropout prediction in MOOCs using learner activity features”, Proceedings of EMOOCs 2014, Feb 10-12 2014, Switzerland.

Predicting Dropout Diagnosing Dropout

Our study involved 20 MOOCs from different fields (computer science, political science, agriculture, etc.).

Extracted features from learners' activity on videos, assessments, and forums.

We want to obtain a model that generalizes to many courses (sacrificing some prediction accuracy for generalizability).

Thus, we used the forward feature selection algorithm to choose the features with the best median prediction accuracy, measured by recall and false positive rate (FPR), over all courses in our dataset.
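The selection step can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the toy courses, the feature names, the "flag if any selected feature fires" predictor, and the single score (recall minus FPR) are all assumptions made here so that the greedy loop has something concrete to optimize.

```python
from statistics import median


def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary labels (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def course_score(course, features):
    """ASSUMED scoring rule: predict dropout if any selected binary feature
    fires, then summarize accuracy as recall minus FPR."""
    y_true = [row["dropout"] for row in course]
    y_pred = [int(any(row[f] for f in features)) for row in course]
    recall, fpr = recall_and_fpr(y_true, y_pred)
    return recall - fpr


def forward_select(courses, candidates):
    """Greedily add the feature that most improves the median per-course score;
    stop when no remaining feature improves it."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [
            (median(course_score(c, selected + [f]) for c in courses), f)
            for f in remaining
        ]
        top_score, top_feature = max(scored, key=lambda s: s[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy data (made up for illustration): two courses, four learners each.
course_a = [
    {"skipped_video": 1, "low_score": 0, "noise": 1, "dropout": 1},
    {"skipped_video": 0, "low_score": 1, "noise": 0, "dropout": 1},
    {"skipped_video": 0, "low_score": 0, "noise": 1, "dropout": 0},
    {"skipped_video": 0, "low_score": 0, "noise": 0, "dropout": 0},
]
course_b = [
    {"skipped_video": 1, "low_score": 1, "noise": 0, "dropout": 1},
    {"skipped_video": 1, "low_score": 0, "noise": 1, "dropout": 1},
    {"skipped_video": 0, "low_score": 0, "noise": 1, "dropout": 0},
    {"skipped_video": 0, "low_score": 0, "noise": 0, "dropout": 0},
]
selected, best = forward_select(
    [course_a, course_b], ["skipped_video", "low_score", "noise"]
)
print(selected, best)
```

Taking the median across courses, rather than pooling all learners, keeps one unusually large or unusually easy course from dominating the choice of features, which matches the stated goal of generalizing across courses.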

Resulting model:

Active mode features: yield a predictor that works while the student is still active

Absent mode feature: adds allowance after the student stops engaging with the course

Avg score (2 or more assns) < 50%?

Lagging by > 2 weeks during 1st month?

Total absence > 14 days?

Skipped any videos?

Skipped any assessments?

Active mode predictor

Integrated predictor (active mode + absent mode)

[Figure: dropout prediction results, using active mode only and using active mode + absent mode]
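As a rough illustration of how the threshold features above could be computed from a learner record, consider the sketch below. The `LearnerActivity` fields are hypothetical names, the split into four active mode features and one absent mode feature is a reading of the labels above, and the rule that a learner is flagged when any feature fires is an assumption; the poster does not state how the features are combined.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LearnerActivity:
    """Hypothetical per-learner summary; field names are illustrative only."""
    avg_score: Optional[float]        # mean assignment score in [0, 1], None if ungraded
    graded_assignments: int           # number of assignments with a score
    max_lag_weeks_first_month: float  # largest lag behind schedule in the 1st month
    total_absence_days: int           # total days without any course activity
    videos_skipped: int
    assessments_skipped: int


def active_mode_features(a: LearnerActivity) -> Dict[str, bool]:
    """Binary features that can be evaluated while the learner is still active."""
    return {
        # Avg score < 50%, only considered with 2 or more assignments
        "low_avg_score": (
            a.graded_assignments >= 2
            and a.avg_score is not None
            and a.avg_score < 0.5
        ),
        # Lagging by > 2 weeks during the 1st month
        "lagging": a.max_lag_weeks_first_month > 2,
        "skipped_videos": a.videos_skipped > 0,
        "skipped_assessments": a.assessments_skipped > 0,
    }


def absent_mode_feature(a: LearnerActivity) -> bool:
    """Total absence > 14 days."""
    return a.total_absence_days > 14


def active_mode_flag(a: LearnerActivity) -> bool:
    # ASSUMPTION: a single firing feature is enough to flag the learner.
    return any(active_mode_features(a).values())


def integrated_flag(a: LearnerActivity) -> bool:
    # ASSUMPTION: the integrated predictor is the OR of both modes.
    return active_mode_flag(a) or absent_mode_feature(a)


# A learner with good scores and no skips who has been away for 20 days.
quiet_learner = LearnerActivity(0.8, 3, 0.0, 20, 0, 0)
print(active_mode_flag(quiet_learner), integrated_flag(quiet_learner))
```

The example learner shows why the absent mode feature matters in this sketch: nothing in the active mode features fires for a learner who simply stops showing up.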

Surveyed ~9,000 students on their level of motivation, time allowance, and difficulties experienced (~800 responses). Used the survey responses to attach labels to learners:

Lack of motivation? Yes / No

Lack of time? Yes / No

Difficulty? Yes / No

Extracted features from learners' engagement with videos, assessments, and forums.

Built logistic regression models for predicting the labels assigned to each student.
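A minimal sketch of fitting one logistic regression model per survey label is shown below, using plain gradient descent so that it needs no external libraries. The three feature columns, the six toy learners, and their labels are invented for illustration, and the learning rate and epoch count are arbitrary choices; none of this reflects the study's actual features, data, or fitting procedure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return 1 if the predicted probability of the label is at least 0.5."""
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy engagement features (made up): fraction of videos viewed,
# fraction of assessments completed, normalized forum activity.
X = [
    [0.9, 0.8, 0.5],
    [0.8, 0.9, 0.0],
    [0.2, 0.1, 0.0],
    [0.1, 0.2, 0.5],
    [0.9, 0.2, 0.0],
    [0.3, 0.9, 0.5],
]
# Toy survey-derived labels (made up), one list per label.
labels = {
    "lack_of_motivation": [0, 0, 1, 1, 0, 1],
    "lack_of_time": [0, 0, 1, 1, 1, 0],
}

# One independent binary model per label.
models = {name: fit_logistic(X, y) for name, y in labels.items()}
train_predictions = {
    name: [predict(models[name], x) for x in X] for name in labels
}
print(train_predictions)
```

Fitting a separate binary model per label lets a learner be flagged for more than one reason at once, which fits the independent Yes / No labels described above.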

[Figure: learner activity over Week 1, Week 2, and Week 3, including forum and study group participation]

Examples of features

Videos viewed / skipped

Assessments started

Assessments completed

Assessment grades

Forming / joining study groups

Sticking to course schedule

Posting questions/answers to the forum

Prediction accuracy for lack of motivation

Prediction accuracy for lack of time
