An Introduction to Data Mining in Institutional Research
Dr. Thulasi Kumar
Director of Institutional Research
University of Northern Iowa
AIR/SPSS Professional Development Series
Background
Covering a variety of topics
Up to date information on www.airweb.org
Copyright 2003-4, SPSS Inc.
Common Questions
1. Will I be able to get copies of the slides after the event? Yes
2. Is this web seminar being taped so I or others can view it after the fact? Yes
3. Can I ask questions during this event? Yes
Today’s Agenda
Data Mining Overview
- History
- How it compares to other analytic techniques
Phases in the Data Mining Process
Applications of Data Mining in Institutional Research
Data Mining solutions
Question and Answer
The Evolution of Data Analysis
Data Collection (1960s)
- Business Question: "What was my total revenue in the last five years?"
- Enabling Technologies: Computers, tapes, disks
- Product Providers: IBM, CDC
- Characteristics: Retrospective, static data delivery
Data Access (1980s)
- Business Question: "What were unit sales in New England last March?"
- Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
- Product Providers: Oracle, Sybase, Informix, IBM, Microsoft
- Characteristics: Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s)
- Business Question: "What were unit sales in New England last March? Drill down to Boston."
- Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
- Product Providers: SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR
- Characteristics: Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today)
- Business Question: "What’s likely to happen to Boston unit sales next month? Why?"
- Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
- Product Providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
- Characteristics: Prospective, proactive information delivery
Source: SPSS BI
What is Data Mining?
The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques (The Gartner Group).
The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff).
The nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley, Piatetsky-Shapiro, and Matheus).
Differences between Statistics and Data Mining
STATISTICS                      DATA MINING
Confirmative                    Explorative
Small data sets/File-based      Large data sets/Databases
Small number of variables       Large number of variables
Deductive                       Inductive
Numeric data                    Numeric and non-numeric
Clean data                      Data cleaning
Paradigm Shift
Traditional IR Work:
Data file => Descriptive/Regression Analysis => Tabulations/Reports
Data Mining Driven IR Work:
Database => Data Mining (Visualization, Association, Clustering, Predictive Modeling) => Immediate Actions
Historical Predictive
Source: Jing Luan, Cabrillo College, CA
Data Mining is not…
OLAP
Data Warehousing
Data Visualization
SQL
Ad Hoc Queries
Reporting
Data Mining Roots and Algorithms
Statistics: Distributions, mathematics, etc.
Machine Learning: Computer science, heuristics and induction algorithms
Artificial Intelligence: Emulating human intelligence
Neural Networks: Biological models, psychology and engineering
Data Mining is…
Predictive Modeling
- Linear/Logistic Regression
- Neural Networks
- Decision Trees
Clustering
- Kohonen Neural Networks Clustering
- K-Means Clustering
- Nearest Neighbor Clustering
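As a rough illustration of one technique named on this slide, K-Means clustering, here is a minimal pure-Python sketch. The student records (credit hours per term, GPA), the starting centroids, and the function name `k_means` are hypothetical and not part of the presentation; this is not how Clementine implements clustering.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical student records: (credit hours per term, GPA).
students = [(6, 2.1), (7, 2.4), (5, 2.0), (15, 3.6), (16, 3.8), (14, 3.5)]
start = [(6, 2.1), (15, 3.6)]  # assumed starting centroids

final_centroids, groups = k_means(students, start)
for centroid, members in zip(final_centroids, groups):
    print(centroid, members)
```

With unscaled features, credit hours dominate the distance here; in practice the variables would usually be standardized before clustering.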
Data Mining is… (cont’d)
Segmentation
- Decision Trees
- Neural Networks
- Predictive Modeling
Affinity Analysis
- Association Rule
- Sequence Generators
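To make the affinity analysis item more concrete, the sketch below computes the three usual measures for a single association rule: support, confidence, and lift. The course baskets and the function name `rule_metrics` are hypothetical, chosen only to echo the institutional research setting of this talk.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = share of baskets containing both sides of the rule
    confidence = share of antecedent baskets that also contain the consequent
    lift       = confidence divided by the consequent's overall share
    """
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for b in baskets if a <= b)
    n_c = sum(1 for b in baskets if c <= b)
    n_both = sum(1 for b in baskets if (a | c) <= b)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical baskets: the set of courses one student took in a term.
baskets = [
    {"MATH101", "STAT101"},
    {"MATH101", "STAT101", "CS101"},
    {"MATH101", "ENG101"},
    {"ENG101", "HIST101"},
]

support, confidence, lift = rule_metrics(baskets, {"MATH101"}, {"STAT101"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the two courses appear together more often than they would if taken independently.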
Decision tree: Credit ranking (1=default)
Root node: Bad 52.01% (n=168), Good 47.99% (n=155), Total 100.00% (n=323)
Split on Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
- Weekly pay: Bad 86.67% (n=143), Good 13.33% (n=22), Total 51.08% (n=165)
  Split on Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
  - Young (< 25); Middle (25-35): Bad 90.51% (n=143), Good 9.49% (n=15), Total 48.92% (n=158)
  - Old (> 35): Bad 0.00% (n=0), Good 100.00% (n=7), Total 2.17% (n=7)
- Monthly salary: Bad 15.82% (n=25), Good 84.18% (n=133), Total 48.92% (n=158)
  Split on Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
  - Young (< 25): Bad 48.98% (n=24), Good 51.02% (n=25), Total 15.17% (n=49)
    Split on Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
    - Management; Clerical: Bad 0.00% (n=0), Good 100.00% (n=8), Total 2.48% (n=8)
    - Professional: Bad 58.54% (n=24), Good 41.46% (n=17), Total 12.69% (n=41)
  - Middle (25-35); Old (> 35): Bad 0.92% (n=1), Good 99.08% (n=108), Total 33.75% (n=109)
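The first split in the tree above (Paid Weekly/Monthly) is reported with Chi-square=179.6665 and df=1. As a hedged check on how such a split statistic can be obtained, the sketch below computes both the likelihood-ratio chi-square and the Pearson chi-square from the node counts on the slide (Weekly pay: 143 bad, 22 good; Monthly salary: 25 bad, 133 good). The function names are hypothetical, and the slide does not say which statistic the tool used; only the first split is checked here.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_chi_square(table):
    """G statistic: 2 * sum(O * ln(O / E)); zero cells contribute nothing."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                total += o * math.log(o / e)
    return 2.0 * total

def pearson_chi_square(table):
    """Pearson statistic: sum((O - E)^2 / E)."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: Weekly pay, Monthly salary. Columns: Bad, Good (counts from the slide).
first_split = [[143, 22], [25, 133]]

g_stat = likelihood_ratio_chi_square(first_split)
x2_stat = pearson_chi_square(first_split)
print(f"likelihood-ratio chi-square: {g_stat:.4f}")
print(f"Pearson chi-square:          {x2_stat:.4f}")
```

On these counts the likelihood-ratio statistic is approximately 179.66, close to the value printed on the slide, while the Pearson statistic is approximately 162.30. That suggests, without confirming, that the tree reported the likelihood-ratio form.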
Phases in the DM Process: CRISP-DM
www.crisp-dm.org
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
CRISP-DM
Business Understanding: Understand project objectives and identify the data mining problem
Data Understanding: Capture, understand, and explore your data for quality issues
Data Preparation: Clean data, merge data, derive attributes, etc.
Modeling: Select the data mining techniques and build the model
Evaluation: Evaluate the results and approve models
Deployment: Put models into practice; plan monitoring and maintenance
Data at the heart of the Predictive Enterprise
Behavioral data
- Orders
- Transactions
- Payment history
- Usage history
Descriptive data
- Attributes
- Characteristics
- Self-declared info
- (Geo)demographics
Attitudinal data
- Opinions
- Preferences
- Needs
- Desires
Interaction data
- Offers
- Results
- Context
- Click streams
- Notes
Source: SPSS BI
Data Mining Applications
Institutional Effectiveness
Which students make greatest use of institutional services?
What courses provide high full-time equivalent students (FTES) and allow better use of space?
What are the patterns in course taking?
What courses tend to be taken as a group?
Data Mining Applications (cont’d)
Enrollment Management
Who are our best students?
Where do our students come from?
Who is most likely to return for another semester?
Who is most likely to fail or drop out?
Data Mining Applications (cont’d)
Marketing
Who is most likely to respond to our new campaign?
Which type of marketing/recruiting works best?
Where should we focus our advertising and recruiting?
Data Mining Applications (cont’d)
Alumni
What are the different types/groups of alumni?
Who is likely to pledge, for how much, and when?
Where and on whom should we focus our fundraising drives?
Data Mining Applications in Institutional Research
Categorize your students: Classification
Predict student retention/alumni donations: Neural Nets/Regression
Group similar students: Segmentation
Identify courses that are taken together: Association
Find patterns and trends over time: Sequence
• Cafeteria meal planning • Student housing planning
• Identify high risk students • Estimate/predict alumni contribution • Predict new student application rate
• Course planning • Academic scheduling • Identify student preferences for clubs and social organizations
• Faculty teaching load estimation • Course planning • Academic scheduling
• Predict alumni donation • Predict potential demand for library resources
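As one possible illustration of the "identify high risk students" application with a regression model, the sketch below fits a small logistic regression by batch gradient descent. Everything in it is an assumption made for illustration: the two features (first-term GPA and fraction of attempted credits completed), the eight student records, the learning rate, and the function names. It is not the presenter's model and uses no real student data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Estimated probability of the positive class (here: not returning)."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical records: (first-term GPA, fraction of credits completed).
rows = [(1.8, 0.5), (2.0, 0.6), (2.2, 0.7), (2.5, 0.6),
        (3.0, 0.9), (3.1, 0.8), (3.4, 1.0), (3.8, 1.0)]
# Hypothetical outcome: 1 = did not return the next semester, 0 = returned.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(rows, labels)
risk_low_gpa = predict(weights, (1.9, 0.5))
risk_high_gpa = predict(weights, (3.6, 1.0))
print(f"estimated risk, low GPA:  {risk_low_gpa:.2f}")
print(f"estimated risk, high GPA: {risk_high_gpa:.2f}")
```

A real retention model would need far more records, a held-out validation set, and the domain review that the Summary slide calls for.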
Data Mining with Clementine
Industry-leading workbench for data mining
Comprehensive range of tools for all stages of the data mining process
Pioneered visual approach for maximum productivity
Multiple modeling techniques to predict future events
Summary
Successful data mining strategy involves:
- Well-defined goals, project objectives, and questions
- Sufficient and relevant data
- Careful consideration and selection of software and analysts (tech and domain expert)
- Support from senior administrators (VPs and the President)
DM provides a set of tools, techniques and a standardized process.
Need domain expertise in institutional research to build, test, validate, and deploy models.
DM does not build models automatically. Analysts do.
Next Steps: Data Mining Resources
http://www.kdnuggets.com/
http://www.dmhe.org/
http://www.uni.edu/instrsch/dm/index.html
http://www.spss.com/data_mining/
Questions?
Next Steps: Webcasts and White Papers
December 12th, 2pm Moving Beyond the Basics: Data Mining for Institutional Research
Information at www.spss.com/airseries3
Visit www.spss.com/airseries2 to download a copy of the SPSS Data Mining Tips Guide
For more information
www.spss.com
www.airweb.org
Complete the evaluation form and tell us what you thought of today’s webcast
THANK YOU!
Survey also at: http://www.airweb.org/page.asp?page=217&meetingid=0010