Numbersense: Clearing the Fog
of Big Data
Kaiser Fung
INFORMS NYC Luncheon
9/18/2013
Monday, September 23, 2013
Big Data studies
• Observational data
• Co-opted
• Seemingly exhaustive N
• Fused data
• No controls
Flight Delay: the data
• U.S. domestic commercial flights
• 1987 to 2008
• 123 million records
• 29 variables
Wicklin (2009)
Alaska’s On-time Performance
10.9%
Alaska is the industry leader in on-time flights
Flight Delay: the data
• U.S. domestic commercial flights
• 1987 to 2008
• 123 million records
• 29 variables
Wicklin (2009)
In which variable(s) does Simpson’s paradox lurk?
A priori?
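To make the question concrete, here is a minimal sketch of how Simpson's paradox can appear in delay data. The airline names, airport names, and counts are invented for illustration and are not taken from the flight data set; the point is only that a carrier can have the lower delay rate at every airport and still have the higher delay rate overall when its flights are concentrated at the airport with more delays.

```python
# Hypothetical (delayed, total) flight counts, invented for illustration only.
# They are NOT taken from the 123-million-record flight data set.
flights = {
    "Airline A": {"Airport X": (10, 100), "Airport Y": (300, 1000)},
    "Airline B": {"Airport X": (150, 1000), "Airport Y": (35, 100)},
}


def delay_rate(delayed, total):
    """Share of flights that were delayed."""
    return delayed / total


def overall_rate(by_airport):
    """Delay rate after pooling all airports together."""
    delayed = sum(d for d, _ in by_airport.values())
    total = sum(t for _, t in by_airport.values())
    return delay_rate(delayed, total)


# Delay rate within each airport (the stratified view).
per_airport = {
    airline: {airport: delay_rate(*counts) for airport, counts in by_airport.items()}
    for airline, by_airport in flights.items()
}
# Delay rate ignoring the airport (the aggregated view).
overall = {airline: overall_rate(by_airport) for airline, by_airport in flights.items()}

for airline in flights:
    for airport, rate in per_airport[airline].items():
        print(f"{airline} at {airport}: {rate:.1%} delayed")
    print(f"{airline} overall: {overall[airline]:.1%} delayed")
```

In this made-up example the airport is the lurking variable: comparing the carriers within each airport and comparing them on the pooled totals give opposite rankings.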
When more people are performing more analyses more quickly, there are more theories, more points of view, more complexity, more conflicts and more confusion. There is less clarity, less consensus and less confidence.
Big Data: Producers
user interactions
web logs: distributed servers
data warehouse
data marts
displays
Excel
Dashboards
Data Cubes
Models
Forecasts
Big Data: Consumers
user interactions
displays
Excel
Dashboards
Data Cubes
Models
Forecasts
web logs ... data marts
Strategies
Tactics
Plans
Statistics ≠ Math
David S. Moore, The Basic Practice of Statistics, ~2007
Taking eyes off the ball
“Although DXA is a direct measurement of fat and a better measure of adiposity than BMI, it is not a disease correlate.”
The more things change
All Patients
                 DXA Not Obese   DXA Obese   BMI Totals
BMI Not Obese         35%           39%          74%
BMI Obese              1%           25%          26%
DXA Totals            36%           64%         100%

Female Patients
                 DXA Not Obese   DXA Obese   BMI Totals
BMI Not Obese         26%           48%          74%
BMI Obese              0%           26%          26%
DXA Totals            26%           74%         100%
n-U-isance
[Figure: Mortality Ratio (y-axis, 0 to 350) against Body Mass Index (x-axis, 15 to 40), with BMI ranges labeled over-weight, obese, and extremely obese]
Embarrassment of Riches
A team of psychologists performed personality tests on 100 professionals, of whom 30 were engineers and 70 were lawyers. Here is a brief description of one of the subjects:
Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political or social issues and spends most of his free time on his many hobbies, which include home carpentry, sailing, and mathematics.
Kahneman and Tversky (1974)
What is the probability that Jack is one of the 30 engineers?
A. 10 - 40 %
B. 41 - 60 %
C. 61 - 80 %
D. 81 - 100 %
Kahneman and Tversky (1974); Vanity Fair
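One way to reason about the question is to start from the base rate (30 engineers out of 100) and update it with Bayes' rule in odds form. The sketch below does that; the likelihood ratios are hypothetical values chosen for illustration, since the slide gives no figure for how much more typical Jack's description is of an engineer than of a lawyer.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Base rate from the slide: 30 of the 100 professionals are engineers.
base_rate = 30 / 100

# Hypothetical likelihood ratios (assumptions, not from the source): how many
# times more likely the description is for an engineer than for a lawyer.
for likelihood_ratio in (1.0, 2.0, 4.0, 10.0):
    p = posterior_probability(base_rate, likelihood_ratio)
    print(f"likelihood ratio {likelihood_ratio:>4}: P(engineer) = {p:.0%}")
```

With a likelihood ratio of 1 (a description that fits both professions equally well) the answer stays at the 30% base rate; only strongly diagnostic evidence moves it into the higher answer bands.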
Customer Acquisition
“Right around the birth of a child... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.”
Customer Acquisition
“We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.”
Brochure Design
“We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.”
Brochure Design
“We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.”
Brochure Design
“As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.”
The Model
Buy baby products soon
[Diagram labels: 3, 4, 21]
Ward made 4 of 25 related purchases
Cocoa Butter Lotion
XXL Purse
Zinc/Magnesium Supplement
Bright Blue Rug
Pregnancy Score = 87%
Mad Dad
                         Reality: Pregnant   Reality: Not   Total
Model Says: Pregnant            6%               14%          20
Model Says: Not                 4%               76%          80
Total                          10                90          100

False positive rate: 14/90 = 16%
False negative rate: 4/10 = 40%
Positive predictive value: 6/20 = 30% (3x the incidence)
Incidence: 10/100 = 10%
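The four rates on this slide follow directly from the cells of the 2x2 table. As a rough check, the sketch below recomputes them from the slide's counts per 100 customers; the variable and function names are illustrative choices, not part of the original talk.

```python
# Counts per 100 customers, read off the slide's 2x2 table.
true_pos = 6    # model says pregnant, customer is pregnant
false_pos = 14  # model says pregnant, customer is not pregnant
false_neg = 4   # model says not pregnant, customer is pregnant
true_neg = 76   # model says not pregnant, customer is not pregnant


def rates(tp, fp, fn, tn):
    """Return the four summary rates shown on the slide."""
    total = tp + fp + fn + tn
    return {
        "false_positive_rate": fp / (fp + tn),        # 14 / 90
        "false_negative_rate": fn / (fn + tp),        # 4 / 10
        "positive_predictive_value": tp / (tp + fp),  # 6 / 20
        "incidence": (tp + fn) / total,               # 10 / 100
    }


metrics = rates(true_pos, false_pos, false_neg, true_neg)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")

# How much better a flagged customer is than a randomly chosen one.
lift = metrics["positive_predictive_value"] / metrics["incidence"]
print(f"lift over incidence: {lift:.1f}x")
```

The lift is the ratio of the positive predictive value to the incidence, which is where the slide's "3x" comes from: a flagged customer is three times as likely to be pregnant as a random one, yet 70% of flagged customers are still not pregnant.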
Sending Mixed Messages
                         Reality: Pregnant   Reality: Not   Total
Model Says: Pregnant            6%               14%          20
Model Says: Not                 4%               76%          80
Total                          10                90          100

False positive rate: 14/90 = 16%
False negative rate: 4/10 = 40%
Positive predictive value: 6/20 = 30% (3x the incidence)
Incidence: 10/100 = 10%
Intuition + Data
Data + Theory
Trust, ! Truth
Humans + Data
Law of small numbers