An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

Page 1: An Introduction to Data Mining in Institutional Research

An Introduction to Data Mining in Institutional Research

Dr. Thulasi Kumar
Director of Institutional Research
University of Northern Iowa

Page 2: An Introduction to Data Mining in Institutional Research

AIR/SPSS Professional Development Series

Background

Covering a variety of topics

Up-to-date information at www.airweb.org

Page 3: An Introduction to Data Mining in Institutional Research

Copyright 2003-4, SPSS Inc.

Common Questions

1. Will I be able to get copies of the slides after the event?

2. Is this web seminar being taped so I or others can view it after the fact?

3. Can I ask questions during this event?

Page 4: An Introduction to Data Mining in Institutional Research


Common Questions

1. Will I be able to get copies of the slides after the event? Yes

2. Is this web seminar being taped so I or others can view it after the fact? Yes

3. Can I ask questions during this event? Yes

Page 5: An Introduction to Data Mining in Institutional Research

Today's Agenda

Data Mining Overview
- History
- How it compares to other analytic techniques

Phases in the Data Mining Process

Applications of Data Mining in Institutional Research

Data Mining solutions

Question and Answer

Page 6: An Introduction to Data Mining in Institutional Research

The Evolution of Data Analysis

Data Collection (1960s)
- Business question: "What was my total revenue in the last five years?"
- Enabling technologies: Computers, tapes, disks
- Product providers: IBM, CDC
- Characteristics: Retrospective, static data delivery

Data Access (1980s)
- Business question: "What were unit sales in New England last March?"
- Enabling technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
- Product providers: Oracle, Sybase, Informix, IBM, Microsoft
- Characteristics: Retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)
- Business question: "What were unit sales in New England last March? Drill down to Boston."
- Enabling technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
- Product providers: SPSS, Comshare, Arbor, Cognos, MicroStrategy, NCR
- Characteristics: Retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
- Business question: "What’s likely to happen to Boston unit sales next month? Why?"
- Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
- Product providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
- Characteristics: Prospective, proactive information delivery

Source: SPSS BI

Page 7: An Introduction to Data Mining in Institutional Research

What is Data Mining?

The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques (The Gartner Group).

The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff).

The nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley, Piatetsky-Shapiro, and Matheus).

Page 8: An Introduction to Data Mining in Institutional Research

Differences between Statistics and Data Mining

STATISTICS                     DATA MINING
Confirmatory                   Exploratory
Small data sets/File-based     Large data sets/Databases
Small number of variables      Large number of variables
Deductive                      Inductive
Numeric data                   Numeric and non-numeric
Clean data                     Data cleaning

Page 9: An Introduction to Data Mining in Institutional Research

Paradigm Shift

Traditional IR Work:

Data file => Descriptive/Regression Analysis => Tabulations/Reports

Data Mining Driven IR Work:

Database => Data Mining (Visualization, Association, Clustering, Predictive Modeling) => Immediate Actions

Historical Predictive


Source: Jing Luan, Cabrillo College, CA

Page 10: An Introduction to Data Mining in Institutional Research

Data Mining is not…

OLAP

Data Warehousing

Data Visualization

SQL

Ad Hoc Queries

Reporting

Page 11: An Introduction to Data Mining in Institutional Research

Data Mining Roots and Algorithms

Statistics
- Distributions, mathematics, etc.

Machine Learning
- Computer science, heuristics and induction algorithms

Artificial Intelligence
- Emulating human intelligence

Neural Networks
- Biological models, psychology and engineering

Page 12: An Introduction to Data Mining in Institutional Research

Data Mining is…

Predictive Modeling
- Linear/Logistic Regression
- Neural Networks
- Decision Trees

Clustering
- Kohonen Neural Networks Clustering
- K-Means Clustering
- Nearest Neighbor Clustering
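To make K-Means Clustering concrete, here is a minimal sketch in plain Python. It is not how Clementine or any SPSS product implements clustering; the student records, the choice of two clusters, and the starting centroids are all hypothetical and chosen only for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # an empty cluster keeps its centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical students: (GPA, credit hours attempted per term).
students = [(3.8, 16), (3.6, 15), (3.9, 17), (2.1, 9), (2.4, 8), (1.9, 10)]

# Hypothetical starting centroids: one student from each apparent group.
centroids, clusters = kmeans(students, centroids=[students[0], students[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

In this toy data, credit hours dominate the distance because they are on a larger scale than GPA. Real work would normally standardize the variables first.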

Page 13: An Introduction to Data Mining in Institutional Research

Data Mining is… (cont’d)

Segmentation
- Decision Trees
- Neural Networks
- Predictive Modeling

Affinity Analysis
- Association Rule
- Sequence Generators

Credit ranking (1=default)

Root node: Bad 52.01% (n=168), Good 47.99% (n=155), Total 100.00% (n=323)

Split on Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
  Weekly pay: Bad 86.67% (n=143), Good 13.33% (n=22), Total 51.08% (n=165)
    Split on Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
      Young (< 25); Middle (25-35): Bad 90.51% (n=143), Good 9.49% (n=15), Total 48.92% (n=158)
      Old (> 35): Bad 0.00% (n=0), Good 100.00% (n=7), Total 2.17% (n=7)
  Monthly salary: Bad 15.82% (n=25), Good 84.18% (n=133), Total 48.92% (n=158)
    Split on Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
      Young (< 25): Bad 48.98% (n=24), Good 51.02% (n=25), Total 15.17% (n=49)
        Split on Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
          Management; Clerical: Bad 0.00% (n=0), Good 100.00% (n=8), Total 2.48% (n=8)
          Professional: Bad 58.54% (n=24), Good 41.46% (n=17), Total 12.69% (n=41)
      Middle (25-35); Old (> 35): Bad 0.92% (n=1), Good 99.08% (n=108), Total 33.75% (n=109)
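The chi-square values in the tree output can be checked from the node counts. The sketch below computes both the Pearson and the likelihood-ratio chi-square for the first split (Weekly pay: 143 Bad, 22 Good; Monthly salary: 25 Bad, 133 Good). The slides do not say which statistic the software reports. Worked by hand, the likelihood-ratio form comes out close to the reported 179.6665 for this split and the Pearson form does not, but that is an inference from the numbers, and it is only checked here for the first split. The function name is made up for this example.

```python
import math

def chi_square_stats(table):
    """Return (pearson, likelihood_ratio) chi-square statistics for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if pay type and credit ranking were independent.
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to the sum
                likelihood_ratio += 2 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio

# Counts from the first split: rows are pay types, columns are (Bad, Good).
first_split = [[143, 22],   # Weekly pay
               [25, 133]]   # Monthly salary
pearson, likelihood_ratio = chi_square_stats(first_split)
print(f"Pearson: {pearson:.4f}, likelihood ratio: {likelihood_ratio:.4f}")
```

With one degree of freedom, either value is far beyond conventional significance thresholds, which is consistent with the reported P-value of 0.0000.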

Page 14: An Introduction to Data Mining in Institutional Research

Phases in the DM Process: CRISP-DM

www.crisp-dm.org

• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment

Page 15: An Introduction to Data Mining in Institutional Research

CRISP-DM

Business Understanding
- Understand project objectives and identify the data mining problem

Data Understanding
- Capture, understand, and explore your data for quality issues

Data Preparation
- Clean data, merge data, derive attributes, etc.

Modeling
- Select the data mining techniques, build the model

Evaluation
- Evaluate the results and approve models

Deployment
- Put models into practice, with a monitoring and maintenance plan

Page 16: An Introduction to Data Mining in Institutional Research

Data at the heart of the Predictive Enterprise

Behavioral data
- Orders
- Transactions
- Payment history
- Usage history

Descriptive data
- Attributes
- Characteristics
- Self-declared info
- (Geo)demographics

Attitudinal data
- Opinions
- Preferences
- Needs
- Desires

Interaction data
- Offers
- Results
- Context
- Click streams
- Notes

Source: SPSS BI

Page 17: An Introduction to Data Mining in Institutional Research

Data Mining Applications

Institutional Effectiveness

Which students make greatest use of institutional services?

What courses provide high full-time equivalent students (FTES) and allow better use of space?

What are the patterns in course taking?

What courses tend to be taken as a group?

Page 18: An Introduction to Data Mining in Institutional Research

Data Mining Applications (cont’d)

Enrollment Management

Who are our best students?

Where do our students come from?

Who is most likely to return for another semester?

Who is most likely to fail or drop out?
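A question such as "Who is most likely to return for another semester?" is a predictive modeling task. As a rough illustration, the sketch below fits a logistic regression by plain gradient descent. The student records, the two predictors (first-term GPA and a financial-aid flag), the learning rate, and the function names are all hypothetical; a real project would use an institution's own data and a statistical package rather than hand-written fitting code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent.
    Returns the weights, with the intercept as the first element."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            x = [1.0] + list(features)  # leading 1.0 is the intercept term
            error = sigmoid(sum(w * v for w, v in zip(weights, x))) - label
            for k, v in enumerate(x):
                gradient[k] += error * v
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

def predict_return_probability(weights, features):
    """Estimated probability that a student re-enrolls."""
    x = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))

# Hypothetical records: (first-term GPA, 1 if receiving financial aid else 0).
students = [(3.6, 1), (3.2, 1), (2.9, 0), (3.8, 0),
            (1.8, 0), (2.2, 1), (1.5, 0), (2.0, 0)]
returned = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = re-enrolled the next semester

weights = fit_logistic(students, returned)
print(predict_return_probability(weights, (3.7, 1)))
print(predict_return_probability(weights, (1.6, 0)))
```

Scoring current students this way produces a ranked list, so the same model also speaks to "Who is most likely to fail or drop out?" by looking at the lowest estimated probabilities.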

Page 19: An Introduction to Data Mining in Institutional Research

Data Mining Applications (cont’d)

Marketing

Who is most likely to respond to our new campaign?

Which type of marketing/recruiting works best?

Where should we focus our advertising and recruiting?

Page 20: An Introduction to Data Mining in Institutional Research

Data Mining Applications (cont’d)

Alumni

What are the different types/groups of alumni?

Who is likely to pledge, for how much, and when?

Where and on whom should we focus our fundraising drives?

Page 21: An Introduction to Data Mining in Institutional Research

Data Mining Applications in Institutional Research

Categorize your students: Classification

Predict student retention/alumni donations: Neural Nets/Regression

Group similar students: Segmentation

Identify courses that are taken together: Association

Find patterns and trends over time: Sequence

• Cafeteria meal planning
• Student housing planning

• Identify high risk students
• Estimate/predict alumni contribution
• Predict new student application rate

• Course planning
• Academic scheduling
• Identify student preferences for clubs and social organizations

• Faculty teaching load estimation
• Course planning
• Academic scheduling

• Predict alumni donation
• Predict potential demand for library resources
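For the association task on this slide (identifying courses that are taken together), the usual measures are support and confidence. The sketch below counts course pairs across student schedules and reports both. The schedules, course codes, the 0.3 support threshold, and the function name are hypothetical, and the sketch is limited to pairs of courses, which is simpler than the rule generators in commercial tools.

```python
from collections import Counter
from itertools import combinations

def course_pair_rules(enrollments, min_support=0.3):
    """Return (antecedent, consequent, support, confidence) tuples for
    course pairs whose joint support is at least min_support."""
    n = len(enrollments)
    course_counts = Counter(c for courses in enrollments for c in set(courses))
    pair_counts = Counter(
        pair
        for courses in enrollments
        for pair in combinations(sorted(set(courses)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of students taking both courses
        if support >= min_support:
            # Confidence: of the students taking the antecedent,
            # the share who also take the consequent.
            rules.append((a, b, support, count / course_counts[a]))
            rules.append((b, a, support, count / course_counts[b]))
    return rules

# Hypothetical term schedules, one set of course codes per student.
enrollments = [
    {"MATH101", "PHYS101", "ENGL101"},
    {"MATH101", "PHYS101"},
    {"MATH101", "ENGL101"},
    {"PHYS101", "MATH101", "CHEM101"},
    {"ENGL101", "HIST101"},
]
rules = course_pair_rules(enrollments)
for antecedent, consequent, support, confidence in sorted(rules):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high confidence in both directions suggest courses that could be scheduled so they do not conflict.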

Page 22: An Introduction to Data Mining in Institutional Research

Data Mining with Clementine

Industry-leading workbench for data mining

Comprehensive range of tools for all stages of the data mining process

Pioneered visual approach for maximum productivity

Multiple modeling techniques to predict future events

Page 23: An Introduction to Data Mining in Institutional Research

Summary

A successful data mining strategy involves:
- Well-defined goals, project objectives, and questions
- Sufficient and relevant data
- Careful consideration and selection of software and analysts (tech and domain expert)
- Support from senior administrators (VPs and the President)

DM provides a set of tools, techniques, and a standardized process.

Domain expertise in institutional research is needed to build, test, validate, and deploy models.

DM does not build models automatically. Analysts do.

Page 24: An Introduction to Data Mining in Institutional Research

Next Steps: Data Mining Resources

http://www.kdnuggets.com/

http://www.dmhe.org/

http://www.uni.edu/instrsch/dm/index.html

http://www.spss.com/data_mining/

Page 25: An Introduction to Data Mining in Institutional Research

Questions?

Page 26: An Introduction to Data Mining in Institutional Research

Next Steps: Webcasts and White Papers

December 12th, 2pm Moving Beyond the Basics: Data Mining for Institutional Research

Information at www.spss.com/airseries3

Visit www.spss.com/airseries2 to download a copy of the SPSS Data Mining Tips Guide

Page 27: An Introduction to Data Mining in Institutional Research

For more information

www.spss.com

www.airweb.org

Complete the evaluation form and tell us what you thought of today’s webcast

Page 28: An Introduction to Data Mining in Institutional Research

THANK YOU!

Survey also at: http://www.airweb.org/page.asp?page=217&meetingid=0010