Individual Level Predictive Analytics
Improving Student Enrolment Outcomes
Stephen Childs, Institutional Analyst @sechildsCIRPA/PNAIRP 2016, Kelowna, BCNovember 7, 2016
Office of Institutional Analysis
Why Predictive Analytics and IR
Higher education institutions collect more data
IR offices have experts in institutional data
IR offices are seeking ways to add more value
Machine learning and predictive models are in the news
Opportunity… or Crisis?
Predictive analytics is a different skill set
A different set of software tools is required
You may be the only analyst working on this in your office
Requesters expect you to be the expert
Expect resistance to implementing insights from predictive analytics
The way forward
Add these skills to your IR toolkit
Find tools that work with your existing ones
Develop your understanding and expertise
Join a community of practice
Learning Outcomes
Have a high-level understanding of what predictive analytics does and how it works.
Have a concrete series of steps to follow.
Know the vocabulary of machine learning and statistical modeling.
Know what tools can be used for this, and how they work with existing tools.
Know how we select, test, and train models for prediction.
Learn some of the challenges in predictive modeling.
Outline
Introduction
Introduction to Machine Learning
Model Building Steps
Tool Overview
Customer Education
Challenges
Building Community
About Me
Machine Learning
Contrast with statistics
Supervised and unsupervised learning
Classification and regression
Different algorithms
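The classification/regression distinction can be sketched in a few lines: the same feature can feed a classifier (a discrete outcome) or a regressor (a continuous outcome). The data values and 1-nearest-neighbour rule below are purely illustrative assumptions, not from the presentation.

```python
# (entrance_average, registered, first_year_gpa) -- invented training data
train = [
    (70.0, 0, 2.1),
    (78.0, 0, 2.5),
    (84.0, 1, 3.0),
    (91.0, 1, 3.6),
]

def nearest(x):
    """Return the training row whose feature value is closest to x (1-NN)."""
    return min(train, key=lambda row: abs(row[0] - x))

def classify(x):
    """Classification: predict a discrete label (did the student register?)."""
    return nearest(x)[1]

def regress(x):
    """Regression: predict a continuous outcome (first-year GPA)."""
    return nearest(x)[2]

print(classify(88.0))  # -> 1
print(regress(72.0))   # -> 2.1
```

Both predictions come from the same supervised-learning setup: labelled examples in, a fitted rule out.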
Predictive Data Analysis Steps
Goal → Data Access → Analysis File → Model → Delivery
STEP 1: Define Your Goal
Sets the scope of your analysis
Provides input into model selection
Identifies stakeholders
Discover what data is available
Revise as the project progresses
STEP 2: Get Access to your Data
Three different types of data:
— Operational SIS
— Data warehouse snapshots
— Predictive analytics data
Talk to your DBA to find out which tables to use
Think of other data to add:
— Residence, CRM
— Socio-economic data
STEP 3: Build an Analysis File
Extract – Transform – Load
— Use as much existing ETL as you can
— Join tables together
— Work with a programmer, but the analyst drives
Hard to capture the timeline of the application:
— When did they apply?
— When were they accepted?
— When did they register?
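One way to address the timeline problem above is to store each status change as its own dated row rather than overwriting a single application record, then pivot the events back into one row per applicant. A minimal sketch using SQLite; the table and column names are assumptions for illustration.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE application_events (
        applicant_id INTEGER,
        event        TEXT,    -- 'applied', 'accepted', 'registered'
        event_date   TEXT
    )
""")
events = [
    (1001, "applied",    str(date(2016, 1, 15))),
    (1001, "accepted",   str(date(2016, 3, 2))),
    (1001, "registered", str(date(2016, 6, 20))),
]
conn.executemany("INSERT INTO application_events VALUES (?, ?, ?)", events)
conn.commit()

# Pivot the event log into one row per applicant for the analysis file.
row = conn.execute("""
    SELECT applicant_id,
           MAX(CASE WHEN event = 'applied'    THEN event_date END) AS applied,
           MAX(CASE WHEN event = 'accepted'   THEN event_date END) AS accepted,
           MAX(CASE WHEN event = 'registered' THEN event_date END) AS registered
    FROM application_events
    GROUP BY applicant_id
""").fetchone()
print(row)  # (1001, '2016-01-15', '2016-03-02', '2016-06-20')
```

Keeping the raw events preserves the application timeline even when the operational SIS only shows the current status.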
STEP 3: Build an Analysis File – Tools
STEP 3: Build an Analysis File – Best Practices
Test your ETL process (automated is better)
Save your data in a database (an existing one, or SQLite)
Append rows to the table with a timestamp and a test indicator
Keep track of the program version
Keep a changelog
Capture more data, then filter it for analysis
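The append-with-timestamp-and-test-indicator practice might look like the following sketch: each ETL run adds new rows (never overwrites), stamped with a load time and a test flag, and an automated assertion checks the output. Table and column names are invented for illustration.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE analysis_file (
        student_id INTEGER,
        avg_grade  REAL,
        loaded_at  TEXT,
        is_test    INTEGER   -- 1 = test load, 0 = production load
    )
""")

def load_rows(rows, is_test):
    """Append one ETL run's rows, stamped with load time and test indicator."""
    stamp = datetime.now().isoformat()
    conn.executemany(
        "INSERT INTO analysis_file VALUES (?, ?, ?, ?)",
        [(sid, grade, stamp, int(is_test)) for sid, grade in rows],
    )
    conn.commit()

load_rows([(1, 82.5), (2, 74.0)], is_test=True)   # trial run
load_rows([(1, 82.5), (2, 74.0)], is_test=False)  # production run

# Automated ETL check: only production rows should feed the analysis.
prod = conn.execute(
    "SELECT COUNT(*) FROM analysis_file WHERE is_test = 0"
).fetchone()[0]
assert prod == 2, "expected 2 production rows"
```

Because test loads are flagged rather than deleted, bad runs can be inspected later and filtered out of any analysis query.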
STEP 4: Develop a model
A model maps student characteristics (the independent variables, or "features") to outcomes (the dependent variable) via a function, algorithm, or formula.
STEP 4: Develop a Model – Things to Watch Out For
Missing data
Multiple models
Model testing
STEP 4: Develop a Model - Accuracy
Refer back to your goal – there is no universal measure of accuracy
Model used for decision making/resource allocation
Assign loss based on incorrect predictions – minimize it
Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC)
Bias-variance trade-off and overfitting
STEP 5: Deliver Your Results
Set up delivery early
Meet with your audience – set expectations
How will the data be used – refer back to goal
Dashboards
Data files
STEP 5: Delivery to Students
Have to carefully present information to students:
— Present a positive outlook
— Don't personalize it – talk about a group of similar students
The factors in the model may be less deterministic than unobserved factors
Difference between causality and correlation
Beware the self-fulfilling prophecy
Weapons of Math Destruction
Three factors make a model a WMD:
— Is the participant aware of the model? Is the model opaque or invisible?
— Does the model work against the participant's interest? Is it unfair? Does it create feedback loops?
— Can the model scale?
Experience So Far
It took longer than anticipated to get the data
Working with the data was a great learning experience
Automated process for harvesting data
Starting to work on the delivery end
Challenges
Data quality
Not enough right-hand-side (RHS) variables
More categorical variables than in typical ML problems
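One common way to handle the many categorical variables in institutional data (program, faculty, residency status) is one-hot encoding: one 0/1 indicator column per category. A minimal sketch with made-up values:

```python
def one_hot(values):
    """Turn a list of category labels into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

programs = ["Arts", "Science", "Arts", "Engineering"]
columns, encoded = one_hot(programs)
print(columns)  # -> ['Arts', 'Engineering', 'Science']
print(encoded)  # -> [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

High-cardinality categoricals (hundreds of programs) can blow up the feature count this way, which is one reason these problems feel different from typical ML examples.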
Community of Practice
Predictive Analytics Roundtable
Mailing list – more discussion in future: http://mailman.ucalgary.ca/mailman/listinfo/predictive-l
[email protected]
@sechilds #CIRPA2016
PyData and other user groups