When Good Intentions Fail
Tips on avoiding common advanced analytics traps

Evan Stubbs
Solution Manager, ANZ – SAS
16th February, 2010

Copyright © 2006, SAS Institute Inc. All rights reserved.

Sunz 2010 Evan Stubbs When Good Intentions Fail



Today’s Agenda

1. Four (hopefully) thought-provoking statements

2. Some answers


My provocative statements for the day …

1. Seeing only part of the picture is worse than seeing nothing at all.

2. Rule-based detection systems will seduce, distract, and eventually trap you.

3. Focusing on tools is the fastest road to failure.

4. Insight generated in isolation is less than useless and will actually hurt you.


Seeing only part of the picture is worse than seeing nothing at all.


Anyone know what this is?

Formula courtesy of Wired: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all


Consider this …

A process identifies non-state individuals conspiring against the government based on:

• The contents of their communications
• Their communication methods of choice
• The frequency of their interactions

If the individuals are conspiring, 99% of the time the test will be positive

If the individuals are not conspiring, 99% of the time the test will be negative


So we execute!

The test is put into production

A collection of individuals are identified as conspiring against the government

The test is known to be 99% accurate, so enforcement is mobilised and set into action

Pretty conclusive, right?

A positive result may be wrong as often as 99.98% of the time, despite the test being 99% accurate (Huh?!?)


Here’s why …

Few people actually conspire against the government:
• Assume 1 / 500,000 people actually conspire
• Assume Australia’s population is 22 million

General formula:
• Population * (Incidence rate / Sample Population) * Test Efficiency
• For example, true positives: 22,000,000 * (1 / 500,000) * 0.99 ≈ 44

A positive result will be wrong in about 99.98% of cases (220,000 of roughly 220,044 positives), despite the test being 99% accurate

                            Conspiring    Not Conspiring
Predicted Conspiring                44           220,000
Not Predicted Conspiring             0        21,779,956
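The arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `screening_outcomes` and its parameter names are assumptions, and "99% accurate" is read as a 99% hit rate for both conspirators and non-conspirators, as the earlier slide states.

```python
def screening_outcomes(population, incidence, sensitivity, specificity):
    """Expected confusion-matrix counts for a screening test (hypothetical helper)."""
    actual_pos = population * incidence      # people who really conspire
    actual_neg = population - actual_pos     # everyone else
    true_pos = actual_pos * sensitivity      # conspirators correctly flagged
    false_neg = actual_pos - true_pos        # conspirators missed
    true_neg = actual_neg * specificity      # innocents correctly cleared
    false_pos = actual_neg - true_neg        # innocents wrongly flagged
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        # Share of all positive results that point at an innocent person
        "share_of_positives_wrong": false_pos / (true_pos + false_pos),
    }


# Figures from the slide: 22 million people, 1 in 500,000 conspiring, 99% / 99%
result = screening_outcomes(22_000_000, 1 / 500_000, 0.99, 0.99)
for name, value in result.items():
    print(f"{name}: {value:,.4f}")
```

Rounding the counts reproduces the table, and the last entry shows that almost every flagged individual is a false alarm because true conspirators are so rare.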


The Lessons

If you look through a keyhole, you’ll only ever see a tiny part of the room.

If you rely too heavily on a single detection method, you will be wrong, catastrophically so at times.

It’s only a matter of time.


Anyone know what this is?

David X. Li’s Gaussian Copula function, the formula that almost brought down the financial world
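The slide image is not reproduced in this transcript. As the formula is commonly quoted from the Wired article cited on the earlier slide, it reads roughly as follows. The symbol meanings given here are an assumption based on that common presentation: T_A and T_B are the survival times of two entities, F_A and F_B their marginal distribution functions, Φ_2 the bivariate normal distribution function, Φ^{-1} the inverse standard normal distribution function, and γ the correlation parameter.

```latex
\Pr[T_A < 1,\; T_B < 1] = \Phi_2\!\left(\Phi^{-1}\big(F_A(1)\big),\; \Phi^{-1}\big(F_B(1)\big),\; \gamma\right)
```

The point for this talk is that a single number, γ, stands in for the whole dependence between two risks, which is another way of looking through a keyhole.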


Rule-based detection systems will seduce, distract, and eventually trap you.


Another one …

Identification of the communication point of a seditious cell could involve

• Their relationships
• The directionality and frequency of ‘interesting’ communication

Analysis of the information shows that two individuals are equally possible information dissemination points

There is one standout who, over three months, leads the number of ‘interesting’ messages sent

Pretty conclusive, right?


Nope, yet again …

                      Suspect 1                                Suspect 2
            'Interesting'   Total                    'Interesting'   Total
              messages     messages   Proportion       messages     messages   Proportion
Period 1:        290          411        70.6%             36           48        75.0%
Period 2:         85          140        60.7%            390          582        67.0%
Period 3:         98          495        19.8%            140          654        21.4%

Overall:         473         1046        45.2%            566         1284        44.1%

Bad rules lead to bad results.

Even worse, you may not know until well after the fact!
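The reversal in the table can be checked directly. The sketch below is illustrative only (the variable and function names are assumptions, not from the deck); it uses the message counts from the table and compares the per-period proportions with the overall ones. This pattern, where every subgroup points one way and the aggregate points the other, is usually called Simpson's paradox.

```python
# (interesting, total) message counts per period, taken from the table above
suspect_1 = [(290, 411), (85, 140), (98, 495)]
suspect_2 = [(36, 48), (390, 582), (140, 654)]


def proportion(interesting, total):
    """Share of a suspect's messages flagged as 'interesting'."""
    return interesting / total


def overall(periods):
    """Proportion over all periods combined (sum of counts, not mean of rates)."""
    return sum(i for i, _ in periods) / sum(t for _, t in periods)


per_period = [
    (proportion(*a), proportion(*b)) for a, b in zip(suspect_1, suspect_2)
]

for n, (p1, p2) in enumerate(per_period, start=1):
    print(f"Period {n}: suspect 1 {p1:.1%}, suspect 2 {p2:.1%}")
print(f"Overall:  suspect 1 {overall(suspect_1):.1%}, suspect 2 {overall(suspect_2):.1%}")
```

Suspect 2 has the higher proportion in every single period, yet Suspect 1 has the higher proportion overall, so a rule keyed to either view alone would pick a different "standout".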


The Lessons

Rules don’t work well with ‘context’, but they do provide a false sense of security.

Maintaining a rules list can be a fun job in its own right!

Rule-based detection works great when your subjects maintain their behaviour and are happy to be observed. How often does that happen?


Focusing on tools is the fastest road to failure.


There are many methodologies …

[Figure: Methodology Tree for Forecasting (forecastingprinciples.com, JSA-KCG, September 2005). The tree splits forecasting methods by knowledge source into judgmental and statistical branches, ending in methods such as unaided judgment, role playing, Delphi, prediction markets, structured analogies, game theory, conjoint analysis, judgmental bootstrapping, expert systems, rule-based forecasting, extrapolation models, neural nets, data mining, quantitative analogies, segmentation, and causal models.]


And picking an approach can be complicated …

[Figure: Selection Tree for Forecasting Methods (forecastingprinciples.com, JSA-KCG, January 2006). A yes/no decision tree that starts from whether sufficient objective data exist, branches into judgmental and quantitative methods through questions such as whether large changes are expected, whether similar cases exist, and whether relationships are well known, and ends with a choice between a single method and combined forecasts.]


Six months later …


Here’s a simpler approach …

Which one gives me the answers?

Which one lets me automate the manual stuff?

Which one plays with everything else I have?


The Lessons

The tools aren’t as important as answering the question quickly, accurately, and in a way that can be executed.

Focus on solving the intelligence problem, not on the colour of widget X.


Insight generated in isolation is less than useless and will actually hurt you.


Evan’s Generalised Formula for Analysis Paralysis

Every isolated information source, s, will create p new ‘possibilities’

Comparing and validating each of these possibilities will take t time

The total time to compare and validate these possibilities:

• (((s*p) * ((s*p) - 1)) / 2) * t


Evan’s Generalised Formula for Analysis Paralysis

Let’s say you have:
• Five people
• Each coming up with their own set of ten calculations
• On their standalone desktops with their own extract of data
• And it takes two hours to validate and compare who has the ‘best’ answer
• Total time elapsed: about 306 work days (2,450 hours at eight hours a day) of wasted team effort, roughly 61 work days per person

And this is just for one small case!
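The worked example can be reproduced with a short Python sketch. The function name `validation_effort` and the eight-hour work day are assumptions made for illustration; the formula itself is the pairwise-comparison count from the previous slide.

```python
def validation_effort(sources, possibilities_per_source,
                      hours_per_comparison, hours_per_day=8):
    """Pairwise validation cost: (((s*p) * ((s*p) - 1)) / 2) * t."""
    n = sources * possibilities_per_source   # total possibilities, s * p
    comparisons = n * (n - 1) // 2           # every possibility against every other
    hours = comparisons * hours_per_comparison
    return comparisons, hours, hours / hours_per_day


# Five people, ten calculations each, two hours per comparison
comparisons, hours, days = validation_effort(5, 10, 2)
print(f"{comparisons} comparisons, {hours} hours, {days} work days")

# Adding one more person with their own ten calculations
print(validation_effort(6, 10, 2))
```

Going from five people to six raises the comparison count from 1,225 to 1,770, which is why each extra standalone source costs more than the last.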


The Lessons

Every time you create a new standalone data source, your pointless workload grows quadratically.

Every time you use another non-integrated tool, you waste time and money.

Make sure your tools operationalise on a common platform, even if you find you must use multiple tools.


The Answers …


The Core Answers

Focus on solving the problem

Build a process that uses a wide range of validating / confirming techniques

Integrate, re-use, automate, and operationalise everything

Measure success by business outcomes, not models developed

Keep things as simple as possible, but no simpler


Integrated Business Analytics

[Figure: Alert Generation Process. Components shown: Operational Data Sources (Individuals, Transactions, Accounts); Data Staging; Exploratory Data Analysis & Transformation; Intelligent Data Repository; Analytics (Business Rules, Predictive Modeling, Text Analytics); Social Network Analysis (Network Rules, Network Analytics); Alert Administration; Alert Management & Reporting; Interaction Management; Learn and Improve Cycle.]


Thanks for the time!
