Copyright © 2006, SAS Institute Inc. All rights reserved.
When Good Intentions Fail
Tips on avoiding common advanced analytics traps
Evan Stubbs
Solution Manager, ANZ – SAS
16th February, 2010
Today’s Agenda
1. Four (hopefully) thought-provoking statements
2. Some answers
My provocative statements for the day …
1. Seeing only part of the picture is worse than seeing nothing at all.
2. Rule-based detection systems will seduce, distract, and eventually trap you.
3. Focusing on tools is the fastest road to failure.
4. Insight generated in isolation is less than useless and will actually hurt you.
Seeing only part of the picture is worse than seeing nothing at all.
Anyone know what this is?
Formula courtesy of Wired: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
Consider this …
A process identifies non-state individuals conspiring against the government based on:
• The contents of their communications
• Their communication methods of choice
• The frequency of their interactions
If the individuals are conspiring, 99% of the time the test will be positive
If the individuals are not conspiring, 99% of the time the test will be negative
So we execute!
The test is put into production
A collection of individuals are identified as conspiring against the government
The test is known to be 99% accurate, so enforcement is mobilised and set into action
Pretty conclusive, right?
A positive result may be wrong more than 99.9% of the time, despite the test being 99% accurate (Huh?!?)
Here’s why …
Few people actually conspire against the government:
• Assume 1 / 500,000 people actually conspire
• Assume Australia’s population is 22 million
General formula:
• Population * (Incidence rate / Sample Population) * Test Efficiency
A positive result will be wrong in more than 99.9% of cases, despite the test being 99% accurate

                            Conspiring    Not Conspiring
Predicted Conspiring                44           220,000
Not Predicted Conspiring             0        21,779,956
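The arithmetic behind the table can be checked with a short sketch. It uses only the slide's stated assumptions (22 million people, 1 in 500,000 conspiring, a test that is right 99% of the time for both groups); the function and constant names are illustrative and not part of the original deck. On these inputs roughly 99.98% of positive results are false.

```python
# Base-rate sketch using the assumptions stated on the slide.
POPULATION = 22_000_000      # assumed population of Australia
INCIDENCE = 1 / 500_000      # assumed share of people who actually conspire
ACCURACY = 0.99              # test is correct 99% of the time for both groups


def confusion_counts(population, incidence, accuracy):
    """Return expected (true_pos, false_pos, false_neg, true_neg) counts."""
    conspiring = population * incidence
    not_conspiring = population - conspiring
    true_pos = conspiring * accuracy          # conspirators correctly flagged
    false_neg = conspiring - true_pos         # conspirators missed
    true_neg = not_conspiring * accuracy      # innocents correctly cleared
    false_pos = not_conspiring - true_neg     # innocents wrongly flagged
    return true_pos, false_pos, false_neg, true_neg


def share_of_positives_that_are_wrong(population, incidence, accuracy):
    """Fraction of all positive results that point at an innocent person."""
    true_pos, false_pos, _, _ = confusion_counts(population, incidence, accuracy)
    return false_pos / (true_pos + false_pos)


if __name__ == "__main__":
    tp, fp, fn, tn = confusion_counts(POPULATION, INCIDENCE, ACCURACY)
    print(f"Predicted conspiring, conspiring:         {tp:,.0f}")
    print(f"Predicted conspiring, not conspiring:     {fp:,.0f}")
    print(f"Not predicted conspiring, conspiring:     {fn:,.0f}")
    print(f"Not predicted conspiring, not conspiring: {tn:,.0f}")
    wrong = share_of_positives_that_are_wrong(POPULATION, INCIDENCE, ACCURACY)
    print(f"Share of positives that are wrong: {wrong:.2%}")
```

Changing INCIDENCE shows how strongly the result depends on the base rate: the rarer the behaviour, the more the false positives swamp the true ones.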
The Lessons
If you look through a keyhole, you’ll only ever see a tiny part of the room.
If you rely too heavily on a single detection method, you will be wrong, catastrophically so at times.
It’s only a matter of time.
Anyone know what this is?
David X. Li’s Gaussian Copula function, the formula that almost brought down the financial world
Rule-based detection systems will seduce, distract, and eventually trap you.
Another one …
Identification of the communication point of a seditious cell could involve
• Their relationships
• The directionality and frequency of ‘interesting’ communication
Analysis of the information shows that two individuals are equally possible information dissemination points
There is one standout who, over three months, leads the number of ‘interesting’ messages sent
Pretty conclusive, right?
Nope, yet again …
            Suspect 1                               Suspect 2
            'Interesting'  Total                    'Interesting'  Total
            messages       messages    Proportion   messages       messages    Proportion
Period 1:   290            411         70.6%        36             48          75.0%
Period 2:   85             140         60.7%        390            582         67.0%
Period 3:   98             495         19.8%        140            654         21.4%
Overall:    473            1046        45.2%        566            1284        44.1%
Bad rules lead to bad results.
Even worse, you may not know until well after the fact!
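The reversal in the table is an instance of Simpson's paradox, and it can be reproduced directly from the figures above. The sketch below copies the message counts from the table; the function names are illustrative. Suspect 2 has the higher proportion in every single period, yet Suspect 1 has the higher proportion once the three periods are pooled, so a rule keyed on the pooled figure picks the opposite person from a rule keyed on the per-period figures.

```python
# (interesting, total) message counts per period, copied from the table.
SUSPECT_1 = [(290, 411), (85, 140), (98, 495)]
SUSPECT_2 = [(36, 48), (390, 582), (140, 654)]


def period_proportions(periods):
    """Proportion of 'interesting' messages within each period."""
    return [interesting / total for interesting, total in periods]


def pooled_proportion(periods):
    """Proportion of 'interesting' messages across all periods combined."""
    interesting = sum(i for i, _ in periods)
    total = sum(t for _, t in periods)
    return interesting / total


if __name__ == "__main__":
    rows = zip(period_proportions(SUSPECT_1), period_proportions(SUSPECT_2))
    for number, (p1, p2) in enumerate(rows, start=1):
        leader = "Suspect 2" if p2 > p1 else "Suspect 1"
        print(f"Period {number}: {p1:.1%} vs {p2:.1%} -> {leader} leads")

    pooled_1 = pooled_proportion(SUSPECT_1)
    pooled_2 = pooled_proportion(SUSPECT_2)
    leader = "Suspect 2" if pooled_2 > pooled_1 else "Suspect 1"
    print(f"Pooled:   {pooled_1:.1%} vs {pooled_2:.1%} -> {leader} leads")
```

The flip comes from the uneven message volumes: most of Suspect 1's traffic falls in the high-proportion first period, while most of Suspect 2's falls in the later, lower-proportion periods.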
The Lessons
Rules don’t work well with ‘context’, but they do provide a false sense of security.
Maintaining a rules list can be a fun job in its own right!
Rule-based detection works great when your subjects maintain their behaviour and are happy to be observed. How often does that happen?
Focusing on tools is the fastest road to failure.
There are many methodologies …
[Figure: Methodology Tree for Forecasting, forecastingprinciples.com, JSA-KCG, September 2005. The tree branches from knowledge source into judgmental and statistical methods, including unaided judgment, role playing (simulated interaction), intentions/expectations, conjoint analysis, prediction markets, Delphi, decomposition, structured analogies, game theory, judgmental bootstrapping, expert systems, extrapolation models, quantitative analogies, rule-based forecasting, neural nets, data mining, causal models, and segmentation.]
And picking an approach can be complicated …
[Figure: Selection Tree for Forecasting Methods, forecastingprinciples.com, JSA-KCG, January 2006. A yes/no decision tree that starts from "Sufficient objective data", splits into judgmental and quantitative methods, and ends with a choice between a single method and combined forecasts.]
Six months later …
Here’s a simpler approach …
Which one gives me the answers?
Which one lets me automate the manual stuff?
Which one plays with everything else I have?
The Lessons
The tools aren’t as important as answering the question quickly, accurately, and in a way that can be executed.
Focus on solving the intelligence problem, not on the colour of widget X.
Insight generated in isolation is less than useless and will actually hurt you.
Evan’s Generalised Formula for Analysis Paralysis
Every isolated information source, s, will create p new ‘possibilities’
Comparing and validating each of these possibilities will take t time
The total time to compare and validate these possibilities:
• (((s*p) * ((s*p) - 1)) / 2) * t
Evan’s Generalised Formula for Analysis Paralysis
Let’s say you have:
• Five people
• Each coming up with their own set of ten calculations
• On their standalone desktops with their own extract of data
• And it takes two hours to validate and compare who has the ‘best’ answer
• Total time elapsed: 306 work days, or two months of wasted team effort
And this is just for one small case!
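The worked example can be reproduced with a small sketch of the formula. The function names are illustrative, and the eight-hour work day is an assumption that does not appear on the slide; it is the value that recovers the 306 work days quoted above (50 possibilities give 1,225 pairwise comparisons, or 2,450 hours at two hours each).

```python
HOURS_PER_WORK_DAY = 8  # assumption: not stated on the slide


def pairwise_comparisons(possibilities):
    """Number of distinct pairs among n possibilities: n * (n - 1) / 2."""
    return possibilities * (possibilities - 1) // 2


def validation_hours(sources, possibilities_per_source, hours_per_comparison):
    """Total hours to compare every possibility against every other one.

    sources                  -> s in the slide's formula
    possibilities_per_source -> p
    hours_per_comparison     -> t
    """
    total_possibilities = sources * possibilities_per_source
    return pairwise_comparisons(total_possibilities) * hours_per_comparison


if __name__ == "__main__":
    hours = validation_hours(sources=5, possibilities_per_source=10,
                             hours_per_comparison=2)
    print(f"Comparisons: {pairwise_comparisons(5 * 10):,}")
    print(f"Hours:       {hours:,}")
    print(f"Work days:   {hours / HOURS_PER_WORK_DAY:,.2f}")
```

Doubling the number of people to ten roughly quadruples the effort, because the comparison count grows with the square of the number of possibilities.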
The Lessons
Every time you create a new standalone data source, you increase your pointless workload quadratically.
Every time you use another non-integrated tool, you waste time and money.
Make sure your tools operationalise on a common platform, even if you find you must use multiple tools.
The Answers …
The Core Answers
Focus on solving the problem
Build a process that uses a wide range of validating / confirming techniques
Integrate, re-use, automate, and operationalise everything
Measure success by business outcomes, not models developed
Keep things as simple as possible, but no simpler
Integrated Business Analytics
[Figure: Integrated Business Analytics architecture diagram. Labels shown: Operational Data Sources (Individuals, Transactions, Accounts); Exploratory Data Analysis & Transformation; Intelligent Data Repository; Analytics Data Staging; Alert Generation Process; Business Rules; Analytics; Predictive Modeling; Text Analytics; Social Network Analysis; Network Rules; Network Analytics; Alert Administration; Alert Management & Reporting; Interaction Management; Learn and Improve Cycle.]
Thanks for the time!