Fraud Prevention Using Deep Learning Venkatesh Ramanathan H2O World 2014 November 19, 2014

PayPal's Fraud Detection with Deep Learning in H2O World 2014



Outline

• About PayPal
• Fraud Prevention @ PayPal
• Fraud Prevention Dilemma & Solution (Deep Learning)
• Experimental Setup
• Results
• Conclusions

About PayPal: Unmatched Competitive Advantage

• +150M Active Digital Wallets
• Deep Relationships
• Core Competency In Risk
• Global Platform with Huge Momentum

[Chart: 123M (2012), 143M (2013)]

About PayPal: Innovative leader in payment…

PAYMENT CODE

• QR scanning that generates a payment code for easy checkout
• Fully able to integrate with existing POS systems; no rip & replace
• Available in select markets today

WEARABLE TECH

• Payments on any type of mobile device
• Available in select markets today

Fraud Prevention @ PayPal

• State-of-the-art feature engineering, machine learning and statistical models
• Highly scalable and multi-layered infrastructure software
• Superior team of data scientists, researchers, financial and intelligence analysts

Fraud Prevention @ PayPal

Transaction Level

• Employs state-of-the-art machine learning and statistical models to flag fraudulent behavior up-front
• More sophisticated algorithms after transaction is complete

Account Level

• Monitor account-level activity to identify abusive behavior
• Abusive patterns include frequent payments, suspicious profile changes

Network Level

• Monitor account-to-account interaction
• Frequent transfer of money from several accounts to one central account
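The network-level pattern above (several accounts sending money to one central account) can be illustrated with a small fan-in check. This is only a sketch, not PayPal's method: the function name `find_fan_in_accounts`, the `min_senders` threshold, and the sample transfers are all hypothetical.

```python
from collections import defaultdict


def find_fan_in_accounts(transfers, min_senders=3):
    """Return receivers paid by at least `min_senders` distinct senders.

    `transfers` is an iterable of (sender, receiver, amount) tuples.
    The threshold is an illustrative assumption, not a value from the talk.
    """
    senders_by_receiver = defaultdict(set)
    for sender, receiver, _amount in transfers:
        senders_by_receiver[receiver].add(sender)
    return {
        receiver: sorted(senders)
        for receiver, senders in senders_by_receiver.items()
        if len(senders) >= min_senders
    }


# Hypothetical transfers: (sender, receiver, amount)
transfers = [
    ("a1", "hub", 50.0),
    ("a2", "hub", 75.0),
    ("a3", "hub", 20.0),
    ("a4", "b1", 10.0),
    ("a1", "b1", 5.0),
]
flagged = find_fan_in_accounts(transfers, min_senders=3)
print(flagged)
```

Counting distinct senders rather than raw transfers keeps one chatty sender from triggering the flag on its own.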

Fraud Prevention Dilemma

• Fraudsters are becoming increasingly smart and adaptive
• Need cost-effective solutions that can model complex attack patterns not previously observed
• Need scalable and computationally efficient prediction models

Fraud Prevention Dilemma Solution: Deep Learning

Why Deep Learning?

• Helps to unearth low-level complex abstractions
• Helps to learn complex highly varying functions not present in the training examples
• Widely employed for image, video processing and object recognition

Why H2O?

• Highly scalable
• Superior performance
• Flexible deployment
• Works seamlessly with other big data frameworks
• Simple interface

Experiment

• Dataset
  – 160 million records
  – 1500 features (150 categorical)
  – 0.6TB compressed in HDFS
• Infrastructure
  – 800 node Hadoop (CDH3) cluster
• Decision
  – fraud/not-fraud

Experiment

[Diagram: R, H2O Mapper, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – R as a client
• H2O cloud formation failed
  – H2O mapper needs memory upfront
  – Cluster capacity limitations

Experiment

[Diagram: R, H2O Cloud, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – 5 node H2O cloud (24 CPUs; 144GB RAM)
  – R as a client
• Import failed
  – Data was Snappy-compressed

Experiment

[Diagram: R, H2O Cloud, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – 5 node H2O cloud (24 CPUs; 144GB RAM)
  – R as a client
  – GZIP'ed data
• Import too slow
  – 1GB/hour
  – Not parallelized
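To see why 1GB/hour was a blocker, here is a back-of-the-envelope estimate. It assumes the GZIP'ed data is roughly the same size as the 0.6TB compressed figure quoted for the dataset, and that 1TB = 1000GB; the talk gives neither, so treat the result as an order of magnitude only.

```python
def import_hours(dataset_gb, gb_per_hour):
    """Hours needed to import `dataset_gb` at a steady `gb_per_hour`."""
    return dataset_gb / gb_per_hour


# Assumption: GZIP'ed size ~ the quoted 0.6TB compressed size, 1TB = 1000GB.
DATASET_GB = 600
hours = import_hours(DATASET_GB, gb_per_hour=1.0)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.0f} days")
```

At that rate a single pass over the full dataset would take weeks, which is consistent with the import being reported as too slow.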

Experiment

[Diagram: R, H2O Cloud, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – 5 node H2O cloud (24 CPUs; 144GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (1 GB from 1 hour to 10 minutes)
• Deep Learning failed
  – Rows with missing values were skipped
  – 99% of rows had missing values
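The failure mode above is easy to reproduce in miniature: when nearly every row has at least one missing value, skipping incomplete rows leaves almost nothing to train on. The sketch below contrasts row skipping with mean imputation. Mean imputation is shown only as one common alternative; the talk does not say how the issue was actually resolved, and the function names and toy data are hypothetical.

```python
def drop_incomplete_rows(rows):
    """Keep only rows with no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        # Fall back to 0.0 for a column with no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Toy data: three of four rows have one missing value.
rows = [
    [1.0, None, 3.0],
    [2.0, 4.0, None],
    [None, 6.0, 5.0],
    [3.0, 8.0, 7.0],
]
kept = drop_incomplete_rows(rows)
imputed = mean_impute(rows)
print(len(kept), "of", len(rows), "rows survive skipping")
print(len(imputed), "rows available after imputation")
```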

Experiment

[Diagram: R, H2O Cloud, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – 5 node H2O cloud (24 CPUs; 144GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (1 GB from 1 hour to 10 minutes)
  – Arno's fixes
• Deep Learning slow

Experiment

[Diagram: R, H2O Cloud, HDFS]

• Setup
  – 800 node Hadoop (CDH3) cluster
  – 5 node H2O cloud (24 CPUs; 144GB RAM)
  – R as a client
  – GZIP'ed data
  – Cliff's fix (1 GB from 1 hour to 10 minutes)
  – Arno's fixes & suggestions
  – Reduced data
    • 10 million rows (60% training; 20% validation; 20% test)
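A 60/20/20 split like the one above can be sketched as follows. The talk does not say whether rows were split randomly or by time, so the seeded random shuffle here is an assumption, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle row indices and cut them into train/validation/test parts.

    The test part receives whatever remains after train and validation.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_frac)
    n_valid = round(n_rows * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_indices(1000)
print(len(train), len(valid), len(test))
```

Fixing the seed makes the split reproducible across runs, which matters when comparing many network configurations on the same data.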

Experimental Design

Parameter               Range
# of hidden layers      2, 4, 6, 8
# of neurons            200, 300, 400, 500, 600, 700
activation function     Rectifier; Tanh; Maxout; RectifierWithDropout
feature subset          All, subset1 – subset7
test data set           All, week4 – week8
L1/L2 regularization    0 - 1
epoch                   500

10 million rows/1500 features (60% training; 20% validation; 20% test)
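The architecture part of the design above can be enumerated as a parameter grid. This sketch assumes a full cross of layers, neurons and activation functions, which the talk does not confirm (the results read more like one-factor-at-a-time sweeps). The dictionary keys `hidden`, `activation` and `epochs` are chosen to resemble H2O's deep learning parameter names, but no H2O call is made here.

```python
from itertools import product

HIDDEN_LAYERS = [2, 4, 6, 8]
NEURONS = [200, 300, 400, 500, 600, 700]
ACTIVATIONS = ["Rectifier", "Tanh", "Maxout", "RectifierWithDropout"]


def build_grid(epochs=500):
    """Return one parameter dict per (layers, neurons, activation) combination."""
    grid = []
    for n_layers, n_neurons, activation in product(HIDDEN_LAYERS, NEURONS, ACTIVATIONS):
        grid.append({
            # e.g. 6 layers of 600 neurons -> [600, 600, 600, 600, 600, 600]
            "hidden": [n_neurons] * n_layers,
            "activation": activation,
            "epochs": epochs,
        })
    return grid


grid = build_grid()
print(len(grid), "configurations")
```

Adding feature subsets, test weeks and regularization values multiplies this count further, which is one reason to sweep factors separately.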

Results

How much depth is required?

# of hidden layers (Rectifier, 2 layer, 200 neurons, 500 epoch, L1/L2 = 0)

# of hidden layers    Area Under ROC Curve (AUC)
2                     0.762
4                     0.821
6                     0.839
8                     0.839

Best performance with 6 layers
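For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen fraud case is scored higher than a randomly chosen non-fraud case. The pairwise sketch below computes exactly that on toy data; it is quadratic in the number of examples and meant for illustration only, not as the evaluation code used in the talk.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = fraud, 0 = not fraud (made-up scores).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
toy_auc = auc(labels, scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the move from 0.762 to 0.839 in the table is a substantial ranking improvement.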

Results

How much depth is required?

• # of hidden layers: 6
• Best performance with 600 neurons

Results

Which activation function produces best result?

Activation function (6 layers; 600 neurons)    AUC
Tanh                                           0.801
Rectifier                                      0.856
Maxout                                         0.826
RectifierWithDropout                           0.865

Best performance with RectifierWithDropout
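As a reminder of what the compared activation functions compute, here is a minimal scalar sketch. It assumes the usual textbook definitions: the rectifier is max(0, x), maxout takes the largest of several linear pieces, and dropout randomly zeroes units during training. Real implementations, H2O's included, differ in details such as how outputs are rescaled, which this sketch leaves out.

```python
import math
import random


def tanh(x):
    """Hyperbolic tangent: squashes x into (-1, 1)."""
    return math.tanh(x)


def rectifier(x):
    """Rectified linear unit: passes positive values, clamps negatives to 0."""
    return max(0.0, x)


def maxout(pieces):
    """Maxout: the largest of several pre-computed linear pieces."""
    return max(pieces)


def dropout(activations, rate, rng):
    """Training-time dropout: zero each unit with probability `rate` (no rescaling)."""
    return [0.0 if rng.random() < rate else a for a in activations]


rng = random.Random(0)
layer = [rectifier(x) for x in [-1.5, 0.3, 2.0]]
print(layer)
print(dropout(layer, rate=0.5, rng=rng))
```

Chaining `rectifier` and `dropout` as above is the idea behind a rectifier-with-dropout layer: dropout acts as a regularizer on top of the rectifier's output.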

Results

Which subset of features produces best result?

Feature subset    AUC
subset1           0.836
subset2           0.847
subset3           0.849
subset4           0.844
subset5           0.834
subset6           0.786
subset7           0.751

Best performance with subset3; worst for subset7 (two-thirds fewer features)

Results

Can deep network improve subset7?

Configuration (Subset7)                                   AUC
Epoch: 500; Hidden: 2 layers; Neurons: 200 each layer     0.751
Epoch: 500; Hidden: 6 layers; Neurons: 600 each layer     0.86

11% improvement in performance with one-third of the feature set
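The arithmetic behind the improvement figure can be checked directly from the two AUC values. The quoted 11% corresponds to the absolute difference in AUC (about 0.11); expressed relative to the shallow network's AUC, the gain is somewhat larger. Variable names below are illustrative.

```python
shallow_auc = 0.751  # 2 layers, 200 neurons each, subset7
deep_auc = 0.86      # 6 layers, 600 neurons each, subset7

absolute_gain = deep_auc - shallow_auc        # difference in AUC
relative_gain = absolute_gain / shallow_auc   # gain as a fraction of the shallow AUC

print(f"absolute: {absolute_gain:.3f}")
print(f"relative: {relative_gain:.1%}")
```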

Results

Is deep learning temporally robust?

Test Set    AUC
Week 4      0.856
Week 8      0.861
Week 12     0.852
Week 16     0.858
Week 20     0.853

Performance within 1% difference up to 20 weeks
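The robustness claim can be verified from the table: the gap between the best and worst weekly AUC stays under 0.01. A quick check, using the values exactly as reported:

```python
weekly_auc = {4: 0.856, 8: 0.861, 12: 0.852, 16: 0.858, 20: 0.853}

best_week = max(weekly_auc, key=weekly_auc.get)
worst_week = min(weekly_auc, key=weekly_auc.get)
spread = weekly_auc[best_week] - weekly_auc[worst_week]

print(f"best: week {best_week}, worst: week {worst_week}, spread: {spread:.3f}")
```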

Conclusions

• Deep Learning using H2O is beneficial for payment fraud prevention
  – Network architecture: 6 layers with 600 neurons each performed the best
  – Activation function: RectifierWithDropout performed the best
  – Improved performance with limited feature set & a deep network (11% improvement with a third of the original feature set, 6 hidden layers, 600 neurons each)
  – Robust to temporal variations

Conclusions

• Lessons learned in using H2O
  – Slow import process
  – Issues with compressed data, missing values, sparse data
  – Requires knowledge of performance knobs
  – Fantastic support from H2O team
• Next Steps
  – Multi-class classification
  – Productionalize

Thank You!