PayPal's Fraud Detection with Deep Learning at H2O World 2014

Fraud Prevention Using Deep Learning

Venkatesh Ramanathan

H2O World 2014, November 19, 2014

Outline

• About PayPal
• Fraud Prevention @ PayPal
• Fraud Prevention Dilemma & Solution (Deep Learning)
• Experimental Setup
• Results
• Conclusions

About PayPal: Unmatched Competitive Advantage

• 150M+ active digital wallets
• Deep relationships
• Core competency in risk
• Global platform with huge momentum

[Chart: 123M (2012) and 143M (2013)]

PAYMENT CODE
• QR scanning that generates a payment code for easy checkout
• Fully able to integrate with existing POS systems; no rip & replace
• Available in select markets today

WEARABLE TECH
• Payments on any type of mobile device
• Available in select markets today

About PayPal: Innovative leader in payment…

Fraud Prevention @ PayPal

• State-of-the-art feature engineering, machine learning and statistical models
• Highly scalable and multi-layered infrastructure software
• Superior team of data scientists, researchers, and financial and intelligence analysts

Fraud Prevention @ PayPal

Transaction Level
• Employs state-of-the-art machine learning and statistical models to flag fraudulent behavior up front
• More sophisticated algorithms after the transaction is complete

Account Level
• Monitor account-level activity to identify abusive behavior
• Abusive patterns include frequent payments and suspicious profile changes

Network Level
• Monitor account-to-account interaction
• Frequent transfer of money from several accounts to one central account
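The network-level pattern above involves money flowing from several accounts into one central account. It can be illustrated with a sliding-window fan-in check. This is a hypothetical sketch, not PayPal's detector. The `(sender, receiver, timestamp)` record layout, the function name `find_fan_in_accounts`, and the `min_senders` and `window` thresholds are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_fan_in_accounts(transfers, min_senders=3, window=timedelta(days=1)):
    """Return receivers that got money from at least `min_senders` distinct
    senders within some time window no longer than `window`.

    `transfers` is an iterable of (sender, receiver, timestamp) tuples
    (a hypothetical record layout).
    """
    by_receiver = defaultdict(list)
    for sender, receiver, ts in transfers:
        by_receiver[receiver].append((ts, sender))

    flagged = set()
    for receiver, events in by_receiver.items():
        events.sort()              # chronological order per receiver
        counts = defaultdict(int)  # sender -> transfers inside the current window
        start = 0
        for ts, sender in events:
            counts[sender] += 1
            # Drop the oldest events until the window spans at most `window`.
            while ts - events[start][0] > window:
                old = events[start][1]
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            if len(counts) >= min_senders:
                flagged.add(receiver)
                break
    return flagged


if __name__ == "__main__":
    t0 = datetime(2014, 11, 19, 9, 0)
    transfers = [
        ("acct1", "hub", t0),
        ("acct2", "hub", t0 + timedelta(hours=2)),
        ("acct3", "hub", t0 + timedelta(hours=5)),
        ("acct1", "bob", t0),
        ("acct2", "bob", t0 + timedelta(days=3)),
    ]
    print(find_fan_in_accounts(transfers))  # {'hub'}
```

The thresholds are illustrative only; the slides do not say how PayPal defines or tunes network-level rules.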

Fraud Prevention Dilemma

• Fraudsters are becoming increasingly smart and adaptive
• Need cost-effective solutions that can model complex attack patterns not previously observed
• Need scalable and computationally efficient prediction models

Fraud Prevention Dilemma Solution: Deep Learning

Why Deep Learning?
• Helps to unearth low-level complex abstractions
• Helps to learn complex, highly varying functions not present in the training examples
• Widely employed for image and video processing and object recognition

Why H2O?
• Highly scalable
• Superior performance
• Flexible deployment
• Works seamlessly with other big data frameworks
• Simple interface

Experiment

• Dataset
  – 160 million records
  – 1500 features (150 categorical)
  – 0.6 TB compressed in HDFS
• Infrastructure
  – 800-node Hadoop (CDH3) cluster
• Decision
  – fraud / not-fraud
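A quick calculation puts the dataset's size in perspective. It also shows why loading it into memory was a concern in the attempts that follow. The sketch below assumes decimal terabytes and uses a naive dense layout with every feature value stored as an 8-byte double. H2O compresses columns in memory, so its real footprint would differ. These figures are a reference point, not a measurement.

```python
# Back-of-the-envelope scale of the dataset: 160M records x 1500 features.
RECORDS = 160_000_000
FEATURES = 1_500
COMPRESSED_BYTES = 0.6e12  # 0.6 TB, assuming decimal terabytes

cells = RECORDS * FEATURES                     # total feature values
bytes_per_record = COMPRESSED_BYTES / RECORDS  # compressed bytes per record
dense_float64_bytes = cells * 8                # naive: every value as an 8-byte double

print(f"feature values: {cells:,}")
print(f"compressed bytes per record: {bytes_per_record:,.0f}")
print(f"naive dense float64 size: {dense_float64_bytes / 1e12:.2f} TB")
```

The result is 240 billion feature values and about 3,750 compressed bytes per record. A naive dense in-memory copy would take roughly 1.92 TB.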

Experiment

[Diagram: R client, H2O mappers on Hadoop, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – R as a client
• H2O cloud formation failed
  – H2O mapper needs memory up front
  – Cluster capacity limitations

Experiment

[Diagram: R client, H2O cloud, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – 5-node H2O cloud (24 CPUs; 144 GB RAM)
  – R as a client
• Import failed
  – Data was Snappy-compressed

Experiment

[Diagram: R client, H2O cloud, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – 5-node H2O cloud (24 CPUs; 144 GB RAM)
  – R as a client
  – GZIP'ed data
• Import too slow
  – 1 GB/hour
  – Not parallelized
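To see why 1 GB/hour was a blocker, estimate how long the full dataset would take at that rate. The next slide reports 10 minutes per GB after Cliff's fix, so that rate is included too. The sketch assumes the rate applies to the 0.6 TB compressed size and stays constant. It ignores the later reduction to 10 million rows. The helper `import_hours` is hypothetical.

```python
# Estimated wall-clock time to import the full dataset at a fixed rate.
DATASET_GB = 600  # 0.6 TB, assuming the import rate applies to the compressed size


def import_hours(dataset_gb, minutes_per_gb):
    """Hours needed to import `dataset_gb` at a constant `minutes_per_gb`."""
    return dataset_gb * minutes_per_gb / 60


before = import_hours(DATASET_GB, 60)  # 1 GB/hour
after = import_hours(DATASET_GB, 10)   # 1 GB per 10 minutes

print(f"at 1 GB/hour:      {before:.0f} h (~{before / 24:.0f} days)")
print(f"at 10 minutes/GB:  {after:.0f} h (~{after / 24:.1f} days)")
```

At 1 GB/hour the full import would take about 600 hours, or 25 days. At 10 minutes per GB it would take about 100 hours, or roughly 4 days.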

Experiment

[Diagram: R client, H2O cloud, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – 5-node H2O cloud (24 CPUs; 144 GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (importing 1 GB went from 1 hour to 10 minutes)
• Deep Learning failed
  – Rows with missing values were skipped
  – 99% of rows had missing values
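The failure above follows from simple probability: if each feature is missing independently, complete rows become rare as the feature count grows. For example, with a 20% missing rate per value over 20 features, only about 0.8^20 ≈ 1.2% of rows are complete. The sketch compares a skip policy with per-column mean imputation on simulated data. The data, the missing rate, and the helper names are assumptions.

The slides do not say what Arno's fix was. Current H2O deep learning exposes a `missing_values_handling` option with `Skip` and `MeanImputation` values.

```python
import random


def complete_rows(rows):
    """Rows with no missing values (None): all that survives a 'skip rows' policy."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column
    (0.0 if a column has no observed values at all)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


# Simulated data: 1000 rows x 20 features, each value missing with probability 0.2.
rng = random.Random(0)
rows = [[None if rng.random() < 0.2 else rng.gauss(0, 1) for _ in range(20)]
        for _ in range(1000)]

kept = complete_rows(rows)
imputed = mean_impute(rows)
print(f"skip:   kept {len(kept)} of {len(rows)} rows")
print(f"impute: kept {len(imputed)} of {len(rows)} rows")
```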

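Experiment

[Diagram: R client, H2O cloud, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – 5-node H2O cloud (24 CPUs; 144 GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (importing 1 GB went from 1 hour to 10 minutes)
  – Arno's fixes
• Deep Learning slow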

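Experiment

[Diagram: R client, H2O cloud, HDFS]

• Setup
  – 800-node Hadoop (CDH3) cluster
  – 5-node H2O cloud (24 CPUs; 144 GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (importing 1 GB went from 1 hour to 10 minutes)
  – Arno's fixes & suggestions
  – Reduced data: 10 million rows (60% training; 20% validation; 20% test)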
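Here is a 60% / 20% / 20% train/validation/test split like the one above. The slides do not say whether the split was random or time-ordered; the later week-based test sets hint at a temporal element. This sketch assumes a seeded random shuffle, and `split_60_20_20` is a hypothetical helper.

```python
import random


def split_60_20_20(rows, seed=42):
    """Shuffle rows and split them 60% / 20% / 20% into train, validation, test.

    Integer arithmetic keeps the sizes exact; any remainder goes to the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100
    n_valid = n * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_60_20_20(range(1000))
print(len(train), len(valid), len(test))  # 600 200 200
```

A time-ordered split would instead sort by transaction date and cut at fixed boundaries, so the test set always lies in the future relative to training.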

Experimental Design

Parameter                Range
# of hidden layers       2, 4, 6, 8
# of neurons             200, 300, 400, 500, 600, 700
Activation function      Rectifier; Tanh; Maxout; RectifierWithDropout
Feature subset           All, subset1–subset7
Test data set            All, week4–week8
L1/L2 regularization     0–1
Epochs                   500

10 million rows / 1500 features (60% training; 20% validation; 20% test)
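The Results slides appear to vary one factor at a time, for example depth with 200 neurons and then width with 6 layers. The sketch contrasts that with a full factorial grid over the discrete parameters in the table. L1/L2 and the test set are left out because the table gives them as ranges. The dict keys `hidden`, `activation` and `epochs` mirror H2O deep learning parameter names, which is an assumption about how runs would be configured. `features` is a hypothetical label, and nothing here calls H2O.

```python
from itertools import product

# Discrete values from the Experimental Design table.
HIDDEN_LAYERS = [2, 4, 6, 8]
NEURONS = [200, 300, 400, 500, 600, 700]
ACTIVATIONS = ["Rectifier", "Tanh", "Maxout", "RectifierWithDropout"]
FEATURE_SUBSETS = ["All"] + [f"subset{i}" for i in range(1, 8)]


def full_grid():
    """Yield one config dict per combination (a full factorial grid)."""
    for layers, width, act, subset in product(
        HIDDEN_LAYERS, NEURONS, ACTIVATIONS, FEATURE_SUBSETS
    ):
        yield {
            "hidden": [width] * layers,  # same width in every hidden layer
            "activation": act,
            "features": subset,          # hypothetical key
            "epochs": 500,
        }


grid = list(full_grid())
# Varying one factor at a time needs at most one run per level of each factor.
one_at_a_time = len(HIDDEN_LAYERS) + len(NEURONS) + len(ACTIVATIONS) + len(FEATURE_SUBSETS)
print(len(grid), one_at_a_time)  # 768 22
```

A full grid needs 768 runs before L1/L2 or test-set variation is considered. One-factor-at-a-time needs at most 22, at the cost of missing interactions between factors.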

Results: How much depth is required?

# of hidden layers (Rectifier, 200 neurons, 500 epochs, L1/L2 = 0; 2 layers as the baseline)

Hidden layers   Area Under ROC Curve (AUC)
2               0.762
4               0.821
6               0.839
8               0.839

Best performance with 6 layers
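Every result in this deck is reported as area under the ROC curve. AUC equals the probability that the model scores a randomly chosen fraud case above a randomly chosen non-fraud case, with ties counting as half. This pairwise sketch is the Mann-Whitney formulation. It is quadratic in the number of examples, so on millions of rows a sort-based method would be used instead. The function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as half (Mann-Whitney U formulation). O(P * N) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = fraud, 0 = not fraud; 5 of the 6 fraud/non-fraud pairs are ranked correctly.
print(f"{roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1]):.3f}")  # 0.833
```

An AUC of 0.5 is no better than random ranking and 1.0 is perfect separation. On this scale, the jump from 0.762 to 0.839 between 2 and 6 layers is substantial.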

Results: How much depth is required?

[Chart: AUC by # of neurons per layer, 6 hidden layers]

Best performance with 600 neurons

Results: Which activation function produces the best result?

Activation function (6 layers; 600 neurons)

Activation              AUC
Tanh                    0.801
Rectifier               0.856
Maxout                  0.826
RectifierWithDropout    0.865

Best performance with RectifierWithDropout
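The four activation functions compared above are sketched below using their textbook definitions. This is not H2O's code. In the dropout variant, the 0.5 drop probability and the "inverted" scaling of surviving units during training are assumptions.

```python
import math
import random


def tanh(x):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)


def rectifier(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def maxout(x, weights, biases):
    """Maxout unit: the largest of k linear pieces w_i . x + b_i."""
    return max(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
               for w, b in zip(weights, biases))


def rectifier_with_dropout(xs, drop_prob=0.5, rng=None, training=True):
    """ReLU over a layer of pre-activations, then inverted dropout: in training,
    each unit is zeroed with probability `drop_prob` and survivors are scaled by
    1 / (1 - drop_prob), so no rescaling is needed at prediction time."""
    rng = rng or random.Random(0)
    activated = [rectifier(x) for x in xs]
    if not training or drop_prob == 0.0:
        return activated
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activated]


print(rectifier(-2.0), rectifier(3.0))                                  # 0.0 3.0
print(maxout([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.5]))        # 2.5
print(rectifier_with_dropout([1.0, -1.0, 2.0], training=False))        # [1.0, 0.0, 2.0]
```

Dropout randomly removes hidden units during training. This acts as a regularizer, which may explain why RectifierWithDropout edges out plain Rectifier in the table.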

Results: Which subset of features produces the best result?

Feature subset   AUC
subset1          0.836
subset2          0.847
subset3          0.849
subset4          0.844
subset5          0.834
subset6          0.786
subset7          0.751

Best performance with subset3; worst with subset7 (two-thirds fewer features)

Results: Can a deep network improve subset7?

Configuration (subset7, 500 epochs)     AUC
2 hidden layers, 200 neurons each       0.751
6 hidden layers, 600 neurons each       0.86

AUC improves by about 11 points (0.751 to 0.86) using one third of the feature set
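The improvement can be stated as an absolute difference in AUC or as a percentage of the baseline, and the two give different numbers. The arithmetic below uses only the two AUC values from the table.

```python
baseline_auc = 0.751  # subset7: 2 hidden layers x 200 neurons
deep_auc = 0.86       # subset7: 6 hidden layers x 600 neurons

absolute_gain = deep_auc - baseline_auc       # difference in AUC units
relative_gain = absolute_gain / baseline_auc  # fraction of the baseline AUC

print(f"absolute: {absolute_gain:.3f} AUC ({absolute_gain * 100:.1f} points)")
print(f"relative: {relative_gain:.1%}")
```

The absolute gain is 0.109 AUC, about 11 points, which matches the "11%" figure used in the deck. Measured relative to the 0.751 baseline, the gain is about 14.5%.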

Results: Is deep learning temporally robust?

Test set   AUC
Week 4     0.856
Week 8     0.861
Week 12    0.852
Week 16    0.858
Week 20    0.853

AUC stays within a 0.01 range (0.852 to 0.861) for up to 20 weeks
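The spread of the weekly AUC values can be checked directly from the table. The sketch computes it both in absolute AUC units and relative to the best week.

```python
weekly_auc = {4: 0.856, 8: 0.861, 12: 0.852, 16: 0.858, 20: 0.853}

lo, hi = min(weekly_auc.values()), max(weekly_auc.values())
spread = hi - lo               # absolute AUC range across test weeks
relative_spread = spread / hi  # range as a fraction of the best week

print(f"AUC {lo:.3f} to {hi:.3f}: spread {spread:.3f} ({relative_spread:.2%} of best)")
```

The absolute spread is 0.009 AUC, just under 0.01. As a fraction of the best week it is about 1.05%, so the "within 1%" reading holds in absolute AUC terms.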

Conclusions

• Deep learning using H2O is beneficial for payment fraud prevention
  – Network architecture: 6 layers with 600 neurons each performed the best
  – Activation function: RectifierWithDropout performed the best
  – Improved performance with a limited feature set and a deep network (AUC up about 11 points, from 0.751 to 0.86, with a third of the original feature set, 6 hidden layers, 600 neurons each)
  – Robust to temporal variations

Conclusions

• Lessons learned in using H2O
  – Slow import process
  – Issues with compressed data, missing values, sparse data
  – Requires knowledge of performance knobs
  – Fantastic support from the H2O team
• Next steps
  – Multi-class classification
  – Productionize

Thank You!

Data & Analytics
