brett-bell
Data Mining
What's important
Association/Binning
Clustering
Classification
Segmentation
What to expect
What-if
Estimation: curve fitting, fill in sparse matrix
Prediction: probability, quantitative
Methodology
[Diagram labels: Data Store (DBA), Warehouse, Marts, Collected Sample, Statistical Analyst – Business Modeling, Predictive Metrics & Segments, business interpretation, Optimize data marts]
Methodology - EDMDAPA
Extract: Integrate disparate data systems; build a holistic business view
Discretize/Classify: Grouping and segmentation; group and organize large sets of categories; simplify large flat dimensions
Model: Create predictive estimation functions
Deploy: Build/score data marts and cubes with predictive probability and quantitative metrics and simplified dimensional categories
Analyze, Visualize, Scorecard: Identify KPIs; identify business problems
Plan: Predict (forecast) / test (what-if); apply performance rules on KPIs
Act: Campaigns, personalization, optimization
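The Discretize/Classify step can be illustrated with a small binning routine. This is a minimal sketch, not part of the tooling named in these slides: the bin edges, the labels, and the `discretize` helper are all hypothetical and chosen only to show how a large numeric dimension collapses into a few categories.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, chosen only for illustration.
AGE_EDGES = [25, 35, 50]
AGE_LABELS = ["under 25", "25-34", "35-49", "50+"]


def discretize(value, edges, labels):
    """Map a numeric value to the label of the bin it falls in.

    Bin i covers edges[i-1] <= value < edges[i]; the first and last
    bins are open-ended.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


ages = [24, 25, 38, 61]
age_bands = [discretize(a, AGE_EDGES, AGE_LABELS) for a in ages]
print(age_bands)
```

Using `bisect_right` keeps the lookup logarithmic in the number of bins and makes each edge value belong to the bin above it.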
Extract
DecisionStream unites information from disparate data sources for sampling the enterprise
80% of the work involved in analytics is collecting, cleansing, and preparing data
Classification with Scenario
Segment and Classify combinations of stores, regions, divisions, customers or products
Benchmark against last month!
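Benchmarking segments against last month could look roughly like the sketch below. It does not use Scenario; the rows, the month labels, and the `benchmark` function are made up for illustration, and a segment is assumed to be a (region, product) combination.

```python
from collections import defaultdict

# Hypothetical rows: (region, product, month, revenue). Values are made up.
SALES = [
    ("southwest", "shoes", "2024-03", 40000.0),
    ("southwest", "shoes", "2024-04", 50000.0),
    ("northeast", "shoes", "2024-03", 40000.0),
    ("northeast", "shoes", "2024-04", 38000.0),
    ("northeast", "boots", "2024-04", 1000.0),
]


def benchmark(rows, current, previous):
    """Return the relative revenue change per (region, product) segment.

    Segments with no revenue in the previous month map to None, since
    there is no baseline to compare against.
    """
    totals = defaultdict(lambda: {current: 0.0, previous: 0.0})
    for region, product, month, revenue in rows:
        if month in (current, previous):
            totals[(region, product)][month] += revenue

    changes = {}
    for segment, by_month in totals.items():
        base = by_month[previous]
        changes[segment] = (by_month[current] - base) / base if base else None
    return changes


for segment, change in benchmark(SALES, "2024-04", "2024-03").items():
    print(segment, change)
```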
Path of success
Model with 4Thought
Avoids over-fitting
Works well with:
Noisy data
Co-linear data
Not much or sparse data
Factor Analysis
What-if
Filling in the sparse matrix – e.g. #1
Revenue estimation: Dimensional intersect:
Red shoes, southwest, women, springtime: $50,000
Black shoes, northeast, men, summer: $38,000
Black shoes, southwest, women, summer: $43,000
Black shoes, northeast, men, springtime: ????
Once a model is built against historical data, the resultant function can productively fill in the question marks
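To make the idea concrete, here is a minimal sketch of filling the missing cell. It is not 4Thought's method: it swaps in a naive additive main-effects estimate (overall mean plus the average deviation of each dimension level), and the function names `fit_additive` and `estimate` are hypothetical. With only the three known cells from the slide, the resulting number is purely illustrative.

```python
from collections import defaultdict

# The three known cells from the slide: (product, region, gender, season) -> revenue.
KNOWN = [
    (("red shoes", "southwest", "women", "springtime"), 50000.0),
    (("black shoes", "northeast", "men", "summer"), 38000.0),
    (("black shoes", "southwest", "women", "summer"), 43000.0),
]


def fit_additive(cells):
    """Fit overall mean plus one effect per (dimension index, level).

    Each effect is the mean revenue of the cells sharing that level
    minus the overall mean. This is a naive stand-in for a real model.
    """
    overall = sum(value for _, value in cells) / len(cells)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for levels, value in cells:
        for dim, level in enumerate(levels):
            sums[(dim, level)] += value
            counts[(dim, level)] += 1
    effects = {key: sums[key] / counts[key] - overall for key in sums}
    return overall, effects


def estimate(model, levels):
    """Estimate a cell; levels never seen in training contribute 0."""
    overall, effects = model
    return overall + sum(
        effects.get((dim, level), 0.0) for dim, level in enumerate(levels)
    )


model = fit_additive(KNOWN)
missing = ("black shoes", "northeast", "men", "springtime")
print(round(estimate(model, missing)))
```

Because region and gender move together in these three rows, the additive estimate counts their effect twice; a real model would need far more history to separate such co-linear dimensions.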
Filling in the sparse matrix – e.g. #2
Insurance cost estimation: Dimensional intersect:
Age 38, southwest, female, non-smoker, married: $1,800
Age 24, northeast, male, smoker, single: $2,300
Age 32, southwest, female, smoker, single: $3,000
Age 28, southwest, male, non-smoker, married: ????
Once a model is built against historical data, the resultant function can productively fill in the question marks
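When an intersect mixes a numeric dimension (age) with categorical ones, one simple way to estimate the missing value is a similarity-weighted average of the known rows. The sketch below assumes that approach; it is not how 4Thought works, and the distance definition (count of categorical mismatches plus age difference scaled by the observed age span) and the names `distance` and `estimate` are hypothetical.

```python
# The three known rows from the slide, as (profile, annual cost).
KNOWN = [
    ({"age": 38, "region": "southwest", "sex": "female",
      "smoker": False, "married": True}, 1800.0),
    ({"age": 24, "region": "northeast", "sex": "male",
      "smoker": True, "married": False}, 2300.0),
    ({"age": 32, "region": "southwest", "sex": "female",
      "smoker": True, "married": False}, 3000.0),
]

CATEGORICAL = ("region", "sex", "smoker", "married")


def distance(a, b, age_span):
    """Categorical mismatches plus age difference scaled to [0, 1]."""
    mismatches = sum(a[key] != b[key] for key in CATEGORICAL)
    return mismatches + abs(a["age"] - b["age"]) / age_span


def estimate(query, known):
    """Inverse-distance-weighted average of the known costs."""
    ages = [profile["age"] for profile, _ in known]
    age_span = (max(ages) - min(ages)) or 1
    weighted = 0.0
    total = 0.0
    for profile, cost in known:
        d = distance(query, profile, age_span)
        if d == 0:
            return cost  # exact match: reuse the known value
        weighted += cost / d
        total += 1.0 / d
    return weighted / total


query = {"age": 28, "region": "southwest", "sex": "male",
         "smoker": False, "married": True}
print(round(estimate(query, KNOWN), 2))
```

The estimate always lies between the lowest and highest known cost, so this approach interpolates but cannot extrapolate, which is one reason a fitted function is preferable once enough history exists.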
Deploy with DecisionStream
DecisionStream uses the predictive function from 4Thought as a UDF for derivation
Deploy data marts, cubes, and metadata
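The pattern of using a model function as a UDF for a derived column can be sketched generically. Nothing below is DecisionStream or 4Thought API: `predicted_revenue` is a hypothetical stand-in for an exported model function with made-up coefficients, and `derive` is a hypothetical helper that appends one derived field to each row before it is loaded into a mart.

```python
def predicted_revenue(row):
    """Hypothetical stand-in for an exported model function.

    The base value and seasonal adjustments are made up for illustration.
    """
    base = 40000.0
    season_adjustment = {"springtime": 5000.0, "summer": 0.0}
    return base + season_adjustment.get(row["season"], 0.0)


def derive(rows, name, udf):
    """Yield a copy of each row with one derived column added."""
    for row in rows:
        scored = dict(row)
        scored[name] = udf(row)
        yield scored


source_rows = [
    {"product": "black shoes", "region": "northeast", "season": "springtime"},
    {"product": "black shoes", "region": "northeast", "season": "summer"},
]

for scored_row in derive(source_rows, "predicted_revenue", predicted_revenue):
    print(scored_row)
```

Keeping the scoring function separate from the derivation step means the model can be refitted and swapped without touching the load logic.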
Plan
Determine business goals and apply:
NoticeCast Agents
KPI Business Pack
Exception highlighting with reports
Forecast with 4Thought
Access forecasted results with ETL
Keys to Mining
Usefulness: Can the information discovered be considered knowledge?
Certainty: How viable is the discovered knowledge?
Expressiveness: Can the discovered knowledge be represented in a meaningful way?
Problems for Mining
Missing data: inconsistent categories
Too much data: difficult to focus
Not enough data: nothing meaningful
Too many patterns: hard to discern knowledge from garbage
Complexity of discoveries: knowledge is too complex to be used
Unavailable data