Individual Level Predictive Analytics Improving Student Enrolment Outcomes Stephen Childs, Institutional Analyst @sechilds CIRPA/PNAIRP 2016, Kelowna, BC November 7, 2016 Office of Institutional Analysis

CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrolment

Page 1: CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrolment

Individual Level Predictive Analytics

Improving Student Enrolment Outcomes

Stephen Childs, Institutional Analyst @sechilds
CIRPA/PNAIRP 2016, Kelowna, BC
November 7, 2016

Office of Institutional Analysis

Why Predictive Analytics and IR

Higher Education Institutions collect more data
IR offices have experts in institutional data
IR offices are seeking ways to add more value
Machine learning, predictive models are in the news

Opportunity… or Crisis?

Predictive analytics is a different skill set
A different set of software tools is required
You may be the only analyst working on this in your office
Requesters expect you to be the expert
Resistance to implementing insights from predictive analytics

The way forward

Add these skills to your IR toolkit
Find tools that work with your existing ones
Develop your understanding and expertise
Community of Practice

Learning Outcomes

Have a high-level understanding of what predictive analytics does and how it works.
Have a concrete series of steps to follow.
Know the vocabulary of machine learning and statistical modeling.
Know what tools can be used for this, and how they work with existing tools.
Know how we select, train, and test models for prediction.
Learn some of the challenges in predictive modeling.

Outline

Introduction (already done??)
Introduction to Machine Learning
Model Building Steps
Tool Overview
Customer Education
Challenges
Building Community

About Me

Machine Learning

Contrast with statistics
Supervised and Unsupervised Learning
Classification and Regression
Different Algorithms

Predictive Data Analysis Steps

Goal
Data Access
Analysis File
Model
Delivery

STEP 1: Define Your Goal

Sets the scope of your analysis
Provides input into model selection
Identifies stakeholders
Discover what data is available
Revise as the project progresses

STEP 2: Get Access to your Data

Three different types of data:
  - Operational SIS
  - Data Warehouse – snapshots
  - Predictive Analytics Data
Talk to your DBA to find the relevant tables
Think of other data to add:
  - Residence, CRM
  - Socio-economic data

STEP 3: Build an Analysis File

Extract – Transform – Load
  - Use as much existing ETL as you can
  - Join tables together
  - Work with a programmer, but the analyst drives
Hard to capture the timeline of the application
  - When did they apply?
  - When were they accepted?
  - When did they register?
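One way to capture the application timeline is to join the milestone tables into a single row per applicant. The sketch below is only an illustration using Python's built-in sqlite3 module; the table names (applications, admissions, registrations) and their columns are hypothetical and will differ in any real SIS or data warehouse.

```python
import sqlite3

def build_timeline(conn):
    """Return one row per applicant with the date of each milestone."""
    query = """
        SELECT a.student_id,
               a.applied_on,
               ad.accepted_on,
               r.registered_on,
               CAST(julianday(ad.accepted_on) - julianday(a.applied_on) AS INTEGER)
                   AS days_to_accept
        FROM applications AS a
        LEFT JOIN admissions AS ad ON ad.student_id = a.student_id
        LEFT JOIN registrations AS r ON r.student_id = a.student_id
        ORDER BY a.student_id
    """
    return conn.execute(query).fetchall()

# Hypothetical tables and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (student_id INTEGER PRIMARY KEY, applied_on TEXT);
    CREATE TABLE admissions (student_id INTEGER, accepted_on TEXT);
    CREATE TABLE registrations (student_id INTEGER, registered_on TEXT);
    INSERT INTO applications VALUES (1, '2016-01-10'), (2, '2016-02-01');
    INSERT INTO admissions VALUES (1, '2016-03-01');
    INSERT INTO registrations VALUES (1, '2016-07-15');
""")
timeline = build_timeline(conn)
```

The LEFT JOINs keep applicants who were never accepted or never registered; their missing milestones come back as None rather than dropping the row.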

STEP 3: Build an Analysis File - Tools

STEP 3: Build a Data Analysis File – Best Practices

Test your ETL process (automated is better)
Save your data in a database (existing one, SQLite)
Append rows to the table with a timestamp, and use a test indicator
Keep track of program version
Keep a changelog
Capture more data, then filter that for analysis
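A minimal sketch of several of these practices together, assuming a SQLite database: every load appends rows stamped with a load time, the program version, and a test indicator, and the analysis filters afterwards. The table name applicant_snapshot, its columns, and the version string are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

PROGRAM_VERSION = "0.1.0"  # hypothetical version string for the extract script

def append_snapshot(conn, rows, is_test):
    """Append applicant rows, stamping each with load time, version, and test flag."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS applicant_snapshot (
            student_id INTEGER,
            status TEXT,
            loaded_at TEXT,
            program_version TEXT,
            is_test INTEGER
        )
    """)
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO applicant_snapshot VALUES (?, ?, ?, ?, ?)",
        [(sid, status, loaded_at, PROGRAM_VERSION, int(is_test))
         for sid, status in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
append_snapshot(conn, [(1, "applied"), (2, "applied")], is_test=True)
append_snapshot(conn, [(1, "accepted"), (2, "applied")], is_test=False)

# Capture everything, then filter for analysis: keep only non-test loads.
production_rows = conn.execute(
    "SELECT student_id, status FROM applicant_snapshot "
    "WHERE is_test = 0 ORDER BY student_id"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM applicant_snapshot").fetchone()[0]
```

Because rows are only ever appended, earlier snapshots stay available for rebuilding the state of an application at any past load.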

STEP 4: Develop a model

Student Characteristics (independent variables, features) → function / algorithm / formula → Outcomes (dependent variable)
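To make the mapping from student characteristics to an outcome concrete, here is a small logistic regression fitted by gradient descent in plain Python. It is a sketch only, not the model used in this project: the two features (a scaled high school average and an in-region flag) and all data values are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability of the outcome for one student's features."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Features: [scaled high school average, in-region flag]; outcome: enrolled (1) or not (0).
# All values are made up.
X = [[-1.0, 0], [-0.5, 0], [0.0, 1], [0.5, 0],
     [1.0, 1], [1.5, 1], [-1.0, 1], [0.5, 1]]
y = [0, 0, 1, 0, 1, 1, 0, 1]

weights = fit_logistic(X, y)
p_strong = predict_proba(weights, [1.5, 1])
p_weak = predict_proba(weights, [-1.0, 0])
```

Here the features are the independent variables, the fitted weights define the function, and the predicted probability of enrolment stands in for the dependent variable.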

STEP 4: Develop a Model – Things to Watch Out For

Missing data
Multiple models
Model testing
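Two of these points can be sketched in a few lines of plain Python. The example assumes mean imputation with a missing-value indicator as one simple way to handle missing data, and a random hold-out split as one simple way to test a model; both are common choices rather than the approach used in this project, and the grade values are made up.

```python
import random

def impute_with_indicator(values):
    """Replace None with the mean of observed values and flag which were missing."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return filled, missing_flag

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

grades = [82.0, None, 91.0, 76.0]  # made-up high school averages
filled, missing_flag = impute_with_indicator(grades)

train, test = train_test_split(range(8))
```

Keeping the indicator column lets a model learn whether missingness itself is informative, and the held-out rows are used only to evaluate each candidate model.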

STEP 4: Develop a Model - Accuracy

Refer back to your goal – no universal measure of accuracy
Model used for decision making/resource allocation
Assign loss based on incorrect predictions – minimize it
Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC)
Bias-Variance Trade Off and Overfitting
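The loss and AUC ideas can be illustrated with a short plain-Python sketch. AUC is computed here as the share of (positive, negative) pairs that the model ranks correctly, and the decision threshold is chosen to minimize a total loss built from assumed costs for each kind of error. The scores, labels, and cost values are invented for illustration.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def total_loss(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a threshold."""
    loss = 0.0
    for s, l in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and l == 0:
            loss += cost_fp
        elif predicted == 0 and l == 1:
            loss += cost_fn
    return loss

def best_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the observed score that minimizes total loss when used as the cut-off."""
    candidates = sorted(set(scores))
    return min(candidates,
               key=lambda t: total_loss(scores, labels, t, cost_fp, cost_fn))

# Made-up predicted probabilities and actual outcomes.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

model_auc = auc(scores, labels)
cutoff_when_misses_costly = best_threshold(scores, labels, cost_fp=1, cost_fn=5)
cutoff_when_false_alarms_costly = best_threshold(scores, labels, cost_fp=5, cost_fn=1)
```

The same scores give different cut-offs under the two cost settings, which is why the choice of accuracy measure has to come from the goal of the analysis.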

STEP 5: Deliver Your Results

Set up delivery early
Meet with your audience – set expectations
How will the data be used – refer back to goal
Dashboards
Data files

STEP 5: Delivery to Students

Have to carefully present information to students
  - Present a positive outlook
  - Don’t personalize it – talk about a group of similar students.
The factors in the model may be less deterministic than unobserved factors.
Difference between causality and correlation.
Beware the self-fulfilling prophecy

Cathy O’Neil

@mathbabe, mathbabe.org
Mathematician, former hedge-fund quant

Weapons of Math Destruction

Three factors make a model a WMD:
  - Is the participant aware of the model? Is the model opaque or invisible?
  - Does the model work against the participant’s interest? Is it unfair? Does it create feedback loops?
  - Can the model scale?

Experience So Far

Longer than anticipated to get the data
Working with the data was a great learning experience
Automated process for harvesting data
Starting to work on the delivery end

Challenges

Data quality
Not enough right-hand-side (RHS) variables
More categorical variables than in usual ML problems
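Categorical variables usually have to be turned into numbers before most algorithms can use them. One-hot encoding is a common way to do that; the plain-Python sketch below is an illustration only, and the faculty values are made up.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator row, one column per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

faculties = ["Science", "Arts", "Science", "Engineering"]  # made-up values
categories, encoded = one_hot_encode(faculties)
```

Each category adds a column, so a file with many categorical variables can become wide quickly relative to the number of students.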

Community of Practice

Predictive Analytics Roundtable
Mailing List – more discussion in future
http://mailman.ucalgary.ca/mailman/listinfo/predictive-l
[email protected]
@sechilds
#CIRPA2016
PyData, other user groups