Are you better than a coin toss? - Richard Warburton & John Oliver (jClarity)

DESCRIPTION

Presented at JAX London 2013. So you’re a big data and distributed systems “expert”: you’ve collected 500 billion data points, thrown them into sci-lib-of-the-week, you’re using Hadoop backed by those cool AWS GPU instances, you let it grind away for days, and it has spat out the answer to life, the universe and everything. But is it really better than a coin toss? How do you validate whether your data analysis algorithm works? Are you learning a solution to your problems, or just the data you already have? What problems can you encounter when analysing your data?

ARE YOU BETTER THAN A COIN TOSS?

BY JOHN OLIVER AND RICHARD WARBURTON

WHO ARE WE?

Why you should care

The Fundamentals

Practical Problems

Applying the Theory

"EXPERTS" AREN'T VERY GOOD

BIG DATA SOLVES ALL KNOWN PROBLEMS

... HELPS

VALIDATION = TESTS FOR DATA

FUNDAMENTALS

NULL HYPOTHESIS

Until proven otherwise, there is no relationship between phenomena

WHEN YOU HEAR "WOLF!" THERE IS A WOLF NEARBY

|                        | Cry "Wolf!"    | Stay Quiet     |
| Wolf Nearby            | Ok             | False Negative |
| It's really a chicken! | False Positive | Ok             |
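As a rough illustration, the four cells of the wolf table can be counted in Python. This is a minimal sketch assuming predictions and ground truth arrive as parallel lists of booleans, where True means "wolf" (a positive); the function name `confusion_counts` is hypothetical.

```python
from typing import Sequence


def confusion_counts(predicted: Sequence[bool], actual: Sequence[bool]) -> dict:
    """Count the four cells of the wolf/chicken table for binary predictions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for cried_wolf, wolf_nearby in zip(predicted, actual):
        if cried_wolf and wolf_nearby:
            counts["tp"] += 1  # cried wolf and a wolf was there: Ok
        elif cried_wolf:
            counts["fp"] += 1  # cried wolf at a chicken: false positive
        elif wolf_nearby:
            counts["fn"] += 1  # stayed quiet while a wolf was there: false negative
        else:
            counts["tn"] += 1  # stayed quiet and there was no wolf: Ok
    return counts


print(confusion_counts([True, True, False, False], [True, False, True, False]))
# {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1}
```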

WHY IS THIS IMPORTANT?

It is better that ten guilty persons escape than that one innocent suffer

- William Blackstone

STATIC ANALYSIS

COST BENEFIT ANALYSIS

It costs a lot to jail an innocent man
It costs very little to show someone an inappropriate house
Credibility, liberty and morality are also costs

CHOOSE THE RIGHT MEASUREMENT

There's more than one concept of accuracy

RECALL

$$\text{Recall} = \frac{\text{number of true positives}}{\text{number of actually true values}} = \frac{tp}{tp + fn}$$

Also called True Positive Rate or Sensitivity
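A minimal Python version of the recall formula, working from raw counts. The function name and the example counts are hypothetical, and returning 0.0 when there are no actual positives is a convention chosen for this sketch, not part of the definition.

```python
def recall(tp: int, fn: int) -> float:
    """True positive rate: the share of actual positives that were caught."""
    actual_positives = tp + fn
    # Recall is undefined with no actual positives; 0.0 is this sketch's convention.
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical counts: 9 leaking programs flagged, 1 leaking program missed.
print(recall(tp=9, fn=1))  # 0.9
```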

PRECISION

$$\text{Precision} = \frac{\text{number of true positives}}{\text{number of predicted true values}} = \frac{tp}{tp + fp}$$

Also called Positive Predictive Value
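The matching sketch for precision, again from raw counts; the name, the example counts and the 0.0 fallback are assumptions of this sketch.

```python
def precision(tp: int, fp: int) -> float:
    """Positive predictive value: the share of positive predictions that were right."""
    predicted_positives = tp + fp
    # Precision is undefined with no positive predictions; 0.0 is this sketch's convention.
    return tp / predicted_positives if predicted_positives else 0.0


# Hypothetical counts: 9 correct alarms and 3 false alarms.
print(precision(tp=9, fp=3))  # 0.75
```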

F MEASURE

$$F_\beta = \frac{(1 + \beta^2) \cdot tp}{(1 + \beta^2) \cdot tp + \beta^2 \cdot fn + fp}$$

Don't worry about the formula!
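For anyone who does want the formula, here is a sketch of F-beta computed directly from counts (function name hypothetical). With beta = 1 it reduces to F1, the harmonic mean of precision and recall; beta > 1 weights recall more heavily and beta < 1 weights precision.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from raw counts: (1 + b^2) tp / ((1 + b^2) tp + b^2 fn + fp)."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    # With no positives at all, F is undefined; this sketch returns 0.0.
    return (1 + b2) * tp / denominator if denominator else 0.0


# F1 agrees with the harmonic mean of precision and recall (up to rounding).
tp, fp, fn = 9, 3, 1
p, r = tp / (tp + fp), tp / (tp + fn)
print(f_beta(tp, fp, fn), 2 * p * r / (p + r))
```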

CASE STUDY: MEMORY LEAKS

About 10% of our dataset had memory leaks

Always predicting "never leaks memory" gives ~0.9 accuracy, but F1 = 0

Our algorithm: ~0.9 accuracy and F1 ~0.9
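The accuracy trap in this case study can be reproduced with synthetic labels (not the real dataset): 10 leaking programs out of 100, and a "predictor" that always says "never leaks". The helper names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the ground truth."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def f1(predicted, actual):
    """F1 from boolean predictions: 2 tp / (2 tp + fn + fp)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator else 0.0


# Synthetic labels mirroring the slide: 10 leaking programs out of 100.
actual = [True] * 10 + [False] * 90
never_leaks = [False] * len(actual)
print(accuracy(never_leaks, actual))  # 0.9
print(f1(never_leaks, actual))  # 0.0
```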

PROBLEM: RELIABILITY OF MEASUREMENT

RULE OF THUMB

If the graph looks like random noise, it probably is random noise.

SOLUTION: CHECK YOUR DATA

Low Standard Deviation

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

$$\text{Coefficient of Variation} = \frac{\sigma}{\text{Mean}}$$
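A quick way to apply this check in Python, assuming repeated measurements of the same quantity (the timing values below are made up and the function name is hypothetical). `statistics.stdev` uses the N - 1 denominator, matching the formula above.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation (N - 1 denominator) divided by the mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(samples) / mean


stable_ms = [100, 101, 99, 100, 100]  # hypothetical benchmark timings
noisy_ms = [100, 150, 60, 130, 60]
print(round(coefficient_of_variation(stable_ms), 3))  # 0.007
print(round(coefficient_of_variation(noisy_ms), 3))  # 0.406
```

A low coefficient of variation suggests the measurements are repeatable enough to compare; a high one is the "random noise" warning sign from the rule of thumb.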

CAVEAT: NON-NORMAL DISTRIBUTIONS

GO MAD (MEDIAN ABSOLUTE DEVIATION)

$$\text{MAD} = \operatorname{median}_i\left(\left|X_i - \operatorname{median}_j(X_j)\right|\right)$$
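A sketch of MAD using only the standard library (function name hypothetical). The outlier example shows why it suits skewed data: one huge value pushes the standard deviation to roughly 2000 while the MAD stays at 0.5. Some libraries also multiply the MAD by about 1.4826 so that it estimates σ for normally distributed data; this sketch returns the raw value.

```python
import statistics


def median_absolute_deviation(samples):
    """MAD = median_i(|X_i - median_j(X_j)|), robust to outliers and skew."""
    centre = statistics.median(samples)
    return statistics.median([abs(x - centre) for x in samples])


print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))  # 1

timings_ms = [100, 101, 99, 100, 100, 5000]  # hypothetical, with one outlier
print(statistics.stdev(timings_ms), median_absolute_deviation(timings_ms))
```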

PROBLEM: EXPERIMENTAL FLUKES

IS YOUR A/B TEST A HEISEN TEST?

SOLUTION: P-VALUE

There are many tests, e.g. chi-squared or Student's t-test

How many heads in a row do you need to toss before you know your coin is biased?
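One way to answer the coin question is an exact binomial test, used here in place of the chi-squared or t-tests named above because a coin toss is a binomial experiment. This is a minimal sketch assuming a fair coin as the null hypothesis and a 5% significance level; `binomial_p_value` is a hypothetical name.

```python
from math import comb


def binomial_p_value(heads: int, tosses: int, p_heads: float = 0.5) -> float:
    """Two-sided exact binomial test against a coin with P(heads) = p_heads.

    Sums the probabilities of every outcome that is no more likely than the
    observed one under the null hypothesis.
    """
    probs = [comb(tosses, k) * p_heads ** k * (1 - p_heads) ** (tosses - k)
             for k in range(tosses + 1)]
    observed = probs[heads]
    # A small relative tolerance keeps outcomes tied in probability despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Smallest run of straight heads that is significant at the 5% level.
n = 1
while binomial_p_value(n, n) >= 0.05:
    n += 1
print(n, binomial_p_value(n, n))  # 6 0.03125
```

With this two-sided test (bias towards heads or tails) six straight heads are needed; a one-sided test (bias towards heads only) would already be significant at five, since 0.5^5 = 0.03125.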

SCIENCE WORKS - B****ES!

PRACTICAL PROBLEMS

PROBLEM: FALSE PROPHETS

I'M AN EXPERT, LISTEN TO ME!

SOLUTION: ESTABLISH GOALS AND HYPOTHESES, THEN TEST SOLUTIONS

PROBLEM: CODE QUALITY

"The math works :-) the code does not :-(" - @headinthebox

GROWTH IN A TIME OF DEBT

SOLUTION: SOFTWARE ENGINEERING PRACTICES

Everyone Lies - House

SOLUTION: UNDERSTAND BIASES AND DESIGN AROUND THEM

"Gay couples should have an equal right to get married, not just to have civil partnerships"

Populus: 65% vs 27%

"Marriage should continue to be defined as a life-long exclusive commitment between a man and a woman"

ComRes: 22% vs 70%

ACQUIESCENCE BIAS

People tend to answer yes

REMOVAL OF PARTICULAR ADVERTISING AND SPONSORSHIP BANS

FOR: 1045 AGAINST: 731 ABSTAIN: 121 Motion Carried

MAINTAINING AN ETHICAL UNION BY REAFFIRMING ADVERTISING AND SPONSORSHIP BANS

FOR: 858 AGAINST: 755 ABSTAIN: 166 Motion Carried

SOLUTION: PHRASE QUESTIONS NEUTRALLY

And ask only one question at a time

SOCIAL DESIRABILITY

Poor people overestimate their income; rich people underestimate it.

SOLUTIONS

Anonymisation
Confidentiality
Randomized Response
Bogus Pipeline
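Randomized response lets people answer honestly without revealing their individual answer. This sketch assumes the common "forced response" coin design: toss a coin; on heads, answer truthfully; on tails, toss again and say "yes" on heads and "no" on tails. Because P(yes) = 0.5π + 0.25, the true rate π can still be estimated for the group. The 30% true rate is invented for the simulation and the function names are hypothetical.

```python
import random


def randomized_answer(truth: bool, rng: random.Random) -> bool:
    """Forced response: heads -> answer truthfully; tails -> a second coin answers."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5  # second toss: heads means "yes", tails means "no"


def estimate_true_rate(answers) -> float:
    """Invert P(yes) = 0.5 * pi + 0.25 to get pi = 2 * (P(yes) - 0.25)."""
    p_yes = sum(answers) / len(answers)
    return 2 * (p_yes - 0.25)


# Simulation with a hypothetical true "yes" rate of 30%.
rng = random.Random(42)
population = [rng.random() < 0.3 for _ in range(100_000)]
answers = [randomized_answer(t, rng) for t in population]
print(round(estimate_true_rate(answers), 2))
```

The price of privacy is variance: the estimate needs noticeably more respondents than a direct question for the same precision.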

BIAS TOWARDS THE FIRST ANSWER OF A QUESTION

Make sure to randomise the order of answers
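A sketch of per-respondent ordering (the function name and answer options are hypothetical): seeding a separate generator with the respondent's id keeps each respondent's order reproducible, so answers can be mapped back to the original options during analysis.

```python
import random


def answer_order(options, respondent_id: int, seed: int = 2013):
    """Return a shuffled copy of the answer options for one respondent.

    The generator is seeded from the respondent id, so the same respondent
    always sees the same order and the original list is never modified.
    """
    rng = random.Random(f"{seed}-{respondent_id}")
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled


options = ["Strongly agree", "Agree", "Disagree", "Strongly disagree"]
print(answer_order(options, respondent_id=1))
```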

PROBLEM: CORRELATION DOESN’T IMPLY CAUSALITY

DATABASE AND NETWORK ACTIVITY CORRELATING

Performance diagnosis: it was actually a garbage collection problem.

SOLUTION: DOMAIN KNOWLEDGE

SOLUTIONS

Use domain knowledge - ask pilots
Stratified sample sets
Measure outcomes - are planes surviving more?

BE RIGOROUS

APPLYING THE THEORY

CORRELATION

A MEASURE OF THE STRENGTH OF DEPENDENCE BETWEEN TWO VARIABLES

PEARSON CORRELATION

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

Err... just look it up

(Assumes a linear relationship)

| Range | Strength            |
| < 0.4 | Weak/No Correlation |
| < 0.7 | Some Correlation    |
| > 0.7 | Strong Correlation  |
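A plain-Python sketch of the Pearson formula together with the slide's rule-of-thumb bands, assuming the bands apply to the absolute value of ρ (the slide does not say) and that exactly 0.7 counts as strong. The function names are hypothetical. The last example shows the linearity caveat: y = x² depends completely on x, yet ρ is 0.

```python
import math


def pearson(xs, ys) -> float:
    """rho = cov(X, Y) / (sigma_X * sigma_Y), computed from paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # The 1/N normalisations cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


def strength(rho: float) -> str:
    """Rule-of-thumb bands from the slide, applied to |rho| (an assumption)."""
    r = abs(rho)
    if r < 0.4:
        return "Weak/No Correlation"
    if r < 0.7:
        return "Some Correlation"
    return "Strong Correlation"


xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2, 4, 6, 8, 10]))  # 1.0
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, although y = x**2
print(strength(0.78453))  # Strong Correlation
```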

CASE STUDY: PERFORMANCE PROBLEM WITH HIGH SYSTEM TIME

Hypothesis: caused by Disk I/O

Correlation Strength: 0.78453

MACHINE LEARNING

Application of statistics to learn a relationship

HOW MANY CLUSTERS?

SOLUTION: ELBOW ESTIMATORS
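The elbow method plots the within-cluster sum of squares (WCSS) against the number of clusters k and looks for the bend. This sketch makes two assumptions of its own: the data is one-dimensional, so the optimal k-means clustering can be found exactly by dynamic programming over the sorted values, and the elbow is taken as the k furthest below the straight line joining the first and last points of the curve. Other elbow estimators exist; the data and function names are hypothetical.

```python
def wcss_1d(values, k):
    """Minimum within-cluster sum of squares for k clusters of 1-D data.

    In one dimension the optimal k-means clusters are contiguous runs of the
    sorted values, so dynamic programming finds the exact optimum.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):  # sum of squared deviations of xs[i:j] from its own mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [0.0] + [inf] * n  # best[j]: cheapest split of xs[:j] into c clusters
    for _ in range(k):
        best = [inf] + [min(best[i] + cost(i, j) for i in range(j))
                        for j in range(1, n + 1)]
    return best[n]


def elbow(curve):
    """Pick the k whose WCSS lies furthest below the chord from k=1 to k=K."""
    first, last, K = curve[0], curve[-1], len(curve)

    def gap(k):
        chord = first + (last - first) * (k - 1) / (K - 1)
        return chord - curve[k - 1]

    return max(range(1, K + 1), key=gap)


# Three hypothetical, well-separated groups of values.
data = [c + d for c in (0, 10, 20) for d in (-1, -0.5, 0, 0.5, 1)]
curve = [wcss_1d(data, k) for k in range(1, 7)]
print(elbow(curve))  # 3
```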

FITTING

SOLUTION: CROSS VALIDATION
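A minimal k-fold cross-validation sketch (all function names hypothetical). To connect it back to the title, the model is a 1-nearest-neighbour lookup and the labels are literal coin tosses: the model scores perfectly on its own training data, while the cross-validated accuracy should land near a coin toss.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Shuffle the indices once, then let each of k folds take a turn as the held-out set."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validate(fit, score, xs, ys, k=5, seed=0):
    """Mean held-out score; fit(train_xs, train_ys) must return a predict(x) function."""
    scores = []
    for train, test in k_fold_splits(len(xs), k, seed):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        predictions = [predict(xs[i]) for i in test]
        scores.append(score(predictions, [ys[i] for i in test]))
    return sum(scores) / len(scores)


def fit_1nn(train_xs, train_ys):
    """1-nearest-neighbour 'model': it simply memorises the training data."""
    pairs = list(zip(train_xs, train_ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(predictions, actual):
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


rng = random.Random(1)
xs = [rng.random() for _ in range(400)]
ys = [rng.random() < 0.5 for _ in xs]  # labels are pure coin tosses

model = fit_1nn(xs, ys)
training_accuracy = accuracy([model(x) for x in xs], ys)
cv_accuracy = cross_validate(fit_1nn, accuracy, xs, ys)
print(training_accuracy)  # 1.0
print(cv_accuracy)
```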

CHOOSE CROSS VALIDATION DATA WISELY

SELF VALIDATING

Ensemble methods: train lots of weak classifiers and merge them

RANDOM FOREST AND BAGGING

Divide the data into bootstrap sets

Use the rest (the out-of-bag data) for calculating error
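A sketch of one bootstrap draw (function name hypothetical): sample n indices with replacement to form the bag used for training; the points never drawn are the out-of-bag set that can be used to estimate error. In a random forest each tree gets its own draw. Each point is left out with probability (1 - 1/n)^n, which approaches 1/e, so roughly 37% of the data ends up out-of-bag.

```python
import random


def bootstrap_split(n: int, rng: random.Random):
    """Draw n indices with replacement (the bag); indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


n = 10_000
rng = random.Random(7)
in_bag, oob = bootstrap_split(n, rng)
print(len(oob) / n)  # close to 1/e, about 0.368
```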

LEARNING CURVES

UNDER-FITTING (BIAS)

OVER-FITTING (VARIANCE)

HOW MUCH IS TOO MUCH?

ACCURACY FOR DIFFERENT TREE SIZES

F1 FOR DIFFERENT TREE SIZES

MONITOR PRODUCTION DATA... IT CHANGES

Does it look like the same data that you learnt with?
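One possible way to ask "does it look like the same data?" is to compare a feature's training and production distributions with the two-sample Kolmogorov-Smirnov statistic. This is a technique chosen for this sketch, not one named on the slide; the function name, the latency values and whatever alert threshold you pick are assumptions. If SciPy is available, `scipy.stats.ks_2samp` also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(training, production) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two ECDFs."""
    a, b = sorted(training), sorted(production)
    gap = 0.0
    for x in a + b:  # the ECDF gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


training_ms = [12, 15, 11, 14, 13, 12, 16]  # hypothetical feature values
production_ms = [19, 22, 18, 21, 20, 23, 19]
print(ks_statistic(training_ms, production_ms))  # 1.0: no overlap at all
```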

A/B TEST NEW SYSTEMS

Satisfaction/Profit/Traffic...
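For a conversion-style metric, a pooled two-proportion z-test is one common way to check whether the new system's rate differs from the old one by more than chance. A sketch with invented counts and a hypothetical function name; it relies on a normal approximation, so for small samples an exact test is a safer choice.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups are all-success or all-failure")
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical: 200/1000 conversions on the old system, 250/1000 on the new one.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(round(z, 2), round(p, 4))
```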

COMMON THREADS

Training set errors are misleading
Cross-validation and production-monitored values are the ones that really matter
Visualise and compare these errors

CONCLUSION

Analytics are increasingly important
There is a wide variety of statistical and practical tips to get them right
Have fun and best of luck!

@johno_oliver @RichardWarburto

QUESTIONS?

http://insightfullogic.com