Mapping Love with Hadoop

DESCRIPTION

Did you know that eHarmony is responsible for 5% of all new US marriages, and that more than 600,000 people have already married partners they met on eHarmony? eHarmony was founded to give people a better chance at finding happy, passionate, and fulfilling relationships. In this talk I describe how we create compatible matches and how we use Big Data technologies to do it. Specifically, I discuss how we take the billion-plus potential matches we find through MongoDB, store them in a Voldemort NoSQL datastore, and then run multiple Hadoop jobs that produce a filtered list based on machine-learned models. Our Hadoop clusters are in-house, high-density, low-power SeaMicro installations, and we use Spring Batch and Spring Data Hadoop to orchestrate the Hadoop jobs.

Page 1: Mapping Love with Hadoop
Page 2

DAVID GEVORKYAN

@davidgev

davidgevorkyan

Page 3

WHO ARE WE?

Page 4

EHARMONY CREATES THE HAPPIEST, MOST PASSIONATE AND MOST FULFILLING RELATIONSHIPS*

* ACCORDING TO A RECENT STUDY

Page 5

438 MARRIAGES PER DAY

Page 6

THE DIFFERENCE?

Page 7

THE DIFFERENCE?

Compatibility Matching System®

COMPATIBILITY MATCHING

AFFINITY MATCHING

MATCH DISTRIBUTION

Page 9

UNIDIRECTIONAL USER DEFINED CRITERIA

Nicolette

Page 10

UNIDIRECTIONAL USER DEFINED CRITERIA

BIDIRECTIONAL

Leo

Ian

Steve

Nicolette
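To make the unidirectional/bidirectional distinction concrete, here is a minimal sketch. It assumes each user's criteria can be reduced to an age range plus a maximum distance; the `User` fields, ages, and positions are hypothetical illustrations, not eHarmony's data model. Unidirectionally, only Nicolette's criteria are checked; bidirectionally, each candidate's criteria must also accept Nicolette.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    """Hypothetical profile: the user's attributes plus their own criteria."""
    name: str
    age: int
    position_miles: float  # 1-D stand-in for location, a simplification
    min_age: int
    max_age: int
    max_distance: float


def accepts(seeker: User, candidate: User) -> bool:
    """True if `candidate` satisfies `seeker`'s user-defined criteria."""
    distance = abs(seeker.position_miles - candidate.position_miles)
    return (seeker.min_age <= candidate.age <= seeker.max_age
            and distance <= seeker.max_distance)


def unidirectional(seeker: User, pool: List[User]) -> List[str]:
    # Only the seeker's criteria are applied.
    return [c.name for c in pool if accepts(seeker, c)]


def bidirectional(seeker: User, pool: List[User]) -> List[str]:
    # Both people must satisfy each other's criteria.
    return [c.name for c in pool if accepts(seeker, c) and accepts(c, seeker)]


nicolette = User("Nicolette", 31, 0, 28, 40, 50)
pool = [
    User("Leo", 34, 10, 25, 35, 100),   # fits her criteria, and she fits his
    User("Ian", 38, 20, 22, 29, 100),   # fits her criteria, but she is outside his age range
    User("Steve", 45, 5, 30, 50, 100),  # outside her age range
]

print(unidirectional(nicolette, pool))  # ['Leo', 'Ian']
print(bidirectional(nicolette, pool))   # ['Leo']
```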

Page 15

150 questions

Personality, Values, Attributes, Beliefs

Page 16

Intellect, Energy, Sociability, Ambition, Kindness, Curiosity, Humor, Spirituality

Page 17

COMPATIBILITY MATCHING

USER DEFINED CRITERIA

COMPATIBILITY MODELS

MONGODB

VOLDEMORT

Page 18

MONGODB DATA STORE NEEDS

POWERFUL INDEXING MODELS

FAST MULTI-ATTRIBUTE SEARCHES

EASY TO MAINTAIN

60M+ QUERIES PER DAY
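A multi-attribute candidate search maps naturally onto a MongoDB filter document. The sketch below only builds the filter as a Python dict using standard query operators (`$gte`, `$lte`, `$in`, `$ne`); it does not talk to a database. The field names (`age`, `region`, `pref_age_min`, `pref_age_max`, `pref_regions`) are hypothetical, and the second half of the filter assumes each stored profile also carries its owner's criteria, so the bidirectional check can run inside a single query.

```python
def build_candidate_filter(user: dict) -> dict:
    """Build a MongoDB filter document for candidates who satisfy `user`'s
    criteria and whose own stored criteria (hypothetical fields) accept `user`."""
    prefs = user["prefs"]
    return {
        # Direction 1: the candidate satisfies the user's criteria.
        "age": {"$gte": prefs["age_min"], "$lte": prefs["age_max"]},
        "region": {"$in": prefs["regions"]},
        # Direction 2: the user satisfies the candidate's stored criteria.
        "pref_age_min": {"$lte": user["age"]},
        "pref_age_max": {"$gte": user["age"]},
        # Equality against an array field matches if the array contains the value.
        "pref_regions": user["region"],
        # Exclude the user's own profile.
        "_id": {"$ne": user["_id"]},
    }


user = {
    "_id": 42,
    "age": 31,
    "region": "CA-LA",
    "prefs": {"age_min": 28, "age_max": 40, "regions": ["CA-LA", "CA-OC"]},
}
query = build_candidate_filter(user)
print(query["age"])  # {'$gte': 28, '$lte': 40}
```

With PyMongo, a dict like this could be passed to `collection.find(query)`. To keep such searches fast at tens of millions of queries per day, the queried fields would typically be covered by a compound index, for example one created with `create_index([("region", 1), ("age", 1)])`; the best field order depends on the real query mix.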

Page 19

MONGODB WINS

AUTO SCALING

BUILT-IN SHARDING

AUTO BALANCING

MMS

Page 20

VOLDEMORT?

THAT NAME SOUNDS FAMILIAR

Page 21

VOLDEMORT DATA STORE NEEDS

CRUD OPERATIONS

VARIED TRANSACTION SIZES

BILLION+ POTENTIAL MATCHES PER DAY

Page 22

VOLDEMORT WINS

AUTO REPLICATION

AUTO PARTITIONING

PLUGGABLE SERIALIZATION
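Project Voldemort was modeled on Amazon's Dynamo: keys are spread over a consistent-hash ring of partitions, and each key is replicated to the next few distinct nodes on that ring. The sketch below is a generic illustration of that idea, not Voldemort's routing code; the node names, the partition count, the round-robin partition layout, and the MD5-based hash are all assumptions.

```python
import hashlib
from typing import List


class PartitionRing:
    """Generic fixed-partition consistent-hash ring (illustrative, not Voldemort's code)."""

    def __init__(self, nodes: List[str], num_partitions: int, replication_factor: int):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        if num_partitions < len(nodes):
            raise ValueError("need at least one partition per node")
        self.num_partitions = num_partitions
        self.replication_factor = replication_factor
        # Hypothetical layout: partition i is owned by nodes[i % len(nodes)].
        self.owner = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def preference_list(self, key: str) -> List[str]:
        """Walk the ring from the key's partition, collecting distinct replica nodes."""
        start = self.partition_for(key)
        replicas: List[str] = []
        for step in range(self.num_partitions):
            node = self.owner[(start + step) % self.num_partitions]
            if node not in replicas:
                replicas.append(node)
                if len(replicas) == self.replication_factor:
                    break
        return replicas


ring = PartitionRing(["node-a", "node-b", "node-c", "node-d"],
                     num_partitions=16, replication_factor=2)
print(ring.preference_list("user:42:potential-matches"))
```

Because whole partitions, rather than individual keys, are assigned to nodes in a scheme like this, adding a node means reassigning some partitions, which keeps automatic rebalancing tractable.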

Page 23

AFFINITY MATCHING

Compatibility Matching System®

COMPATIBILITY MATCHING

AFFINITY MATCHING

MATCH DISTRIBUTION

Page 24

65 30

3000 miles

Page 25

[Figure: Comm probability vs. distance in miles; x-axis ticks 0, 1, 3, 7, 15, 63, 255, 1023, 4095]
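The x-axis ticks on this chart (0, 1, 3, 7, 15, ..., 4095) all have the form 2^k - 1, which suggests distances grouped into logarithmic bins. A minimal sketch of estimating such a curve from historical pairings follows; the `(distance_miles, communicated)` input format is an assumption, and the toy pairs exist only to exercise the code, not to reproduce the chart.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def distance_bucket(miles: float) -> int:
    """Bucket k holds whole-mile distances from 2**k - 1 up to 2**(k+1) - 2,
    so bucket boundaries fall on 0, 1, 3, 7, 15, ..."""
    if miles < 0:
        raise ValueError("distance must be non-negative")
    return (int(miles) + 1).bit_length() - 1


def comm_probability_by_bucket(pairs: Iterable[Tuple[float, bool]]) -> Dict[int, float]:
    """Fraction of pairs in each distance bucket that went on to communicate."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for miles, communicated in pairs:
        b = distance_bucket(miles)
        totals[b] += 1
        hits[b] += int(communicated)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Toy (distance, communicated?) pairs, purely illustrative.
pairs = [(0.5, True), (2, True), (2, False), (10, False), (1500, False), (1800, True)]
print(comm_probability_by_bucket(pairs))  # {0: 1.0, 1: 0.5, 3: 0.0, 10: 0.5}
```

Logarithmic bins keep the many short-distance pairs finely resolved while still leaving enough pairs in the sparse long-distance bins for a stable estimate.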

Page 27

[Figure: Comm probability vs. height difference in cm (axis from -29 to 56), annotated "4-8 in"]

Page 28

WORDS TO USE

Page 30

SOME INSIGHT

Page 31

DATA NEEDS FOR AFFINITY

50M+ REGISTERED USERS

10³ ATTRIBUTES

10⁷ DAILY MATCHES

250M+ PHOTOS

4B+ QUESTIONNAIRES ANSWERED

Page 32

COMMUNICATION AGGREGATES

EVENT LISTENER SERVICE

USER ACTIVITY SERVICE

~5 MS RESPONSE TIMES

10K EVENTS PER SECOND

USER SERVICE

HOURLY, DAILY TOTAL
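The slide shows communication events rolling up into per-user hourly and daily totals. Here is a minimal in-memory sketch of that roll-up, assuming each event arrives as a `(user_id, unix_timestamp)` pair and that buckets are cut on UTC boundaries; both are assumptions, and a service handling 10K events per second would persist these counters rather than hold them in process memory.

```python
from collections import Counter
from datetime import datetime, timezone


class CommunicationAggregates:
    """Hypothetical per-user hourly and daily event counters (UTC buckets)."""

    def __init__(self) -> None:
        self.hourly: Counter = Counter()  # (user_id, "YYYY-MM-DDTHH") -> count
        self.daily: Counter = Counter()   # (user_id, "YYYY-MM-DD") -> count

    def record(self, user_id: int, ts: float) -> None:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        self.hourly[(user_id, t.strftime("%Y-%m-%dT%H"))] += 1
        self.daily[(user_id, t.strftime("%Y-%m-%d"))] += 1


agg = CommunicationAggregates()
# 00:00, 00:30 and 01:00 on 1970-01-01, then 01:00 on 1970-01-02 (UTC).
for ts in (0, 1800, 3600, 90000):
    agg.record(7, ts)

print(agg.hourly[(7, "1970-01-01T00")])  # 2
print(agg.daily[(7, "1970-01-01")])      # 3
```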

Page 33

OFFLINE BATCH JOBS

USER SERVICE

MAP-SIDE JOINS (TB)

SCORING

1+GB Compressed Protocol Buffers

PAIRINGS SERVICE

~750M Compressed Protocol Buffers

BILLION+ POTENTIAL MATCHES
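In a map-side join, one dataset is loaded into every mapper's memory while the other streams through, so the two sides meet without a shuffle. The sketch below imitates that pattern in plain Python: a user lookup table plays the in-memory side, pairings stream past it, and each joined pair is scored. The record fields, the trait values, and the `score` function are hypothetical placeholders standing in for the machine-learned models the talk describes.

```python
from typing import Dict, Iterable, Iterator, Tuple

UserFeatures = Dict[str, float]


def score(a: UserFeatures, b: UserFeatures) -> float:
    """Hypothetical stand-in for a learned model: negative L1 distance on shared traits."""
    shared = a.keys() & b.keys()
    return -sum(abs(a[k] - b[k]) for k in shared)


def map_side_join(users: Dict[int, UserFeatures],
                  pairings: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int, float]]:
    """Emit (user_a, user_b, score) per pairing; `users` is the in-memory side."""
    for a, b in pairings:
        fa, fb = users.get(a), users.get(b)
        if fa is None or fb is None:
            continue  # inner-join semantics: drop pairs with an unknown user
        yield a, b, score(fa, fb)


users = {
    1: {"energy": 0.8, "curiosity": 0.6},
    2: {"energy": 0.7, "curiosity": 0.9},
    3: {"energy": 0.1, "curiosity": 0.2},
}
pairings = [(1, 2), (1, 3), (2, 99)]  # user 99 is missing from the table
for row in map_side_join(users, pairings):
    print(row)
```

This only pays off when the in-memory side fits comfortably in each mapper's heap; otherwise a reduce-side join, which shuffles both inputs by key, is the usual fallback.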

Page 34

AMAZON EMR

AWS DIRECT CONNECT

256 NODES, 50 TB STORAGE

IN-HOUSE SEAMICRO

DATA RETRIEVAL LATENCY

LOW OPERATIONAL COST

LOW POWER CONSUMPTION

PREDICTABLE COMPLETION TIMES

Page 35

MODEL RETRAINING

distcp

Protocol Buffers from Offline Jobs

Page 36

MATCH DISTRIBUTION

Compatibility Matching System®

COMPATIBILITY MATCHING

AFFINITY MATCHING

MATCH DISTRIBUTION

Page 37

Delivering the right matches at the right time to as many people as possible across the entire network
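One simple way to illustrate the tension in this goal is a greedy, capacity-constrained assignment: take candidate pairs in descending score order and accept a pair only if neither person has reached a daily cap, which spreads matches across the network instead of piling them onto the most popular profiles. This is a generic heuristic offered as a sketch, not eHarmony's distribution algorithm; the scores, the extra name "Maya", and the cap of 2 are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def distribute(pairs: List[Tuple[str, str, float]], cap: int) -> List[Tuple[str, str]]:
    """Greedy capacity-constrained matching: best-scoring pairs first,
    and no user receives more than `cap` matches."""
    load: Counter = Counter()
    chosen: List[Tuple[str, str]] = []
    for a, b, _score in sorted(pairs, key=lambda p: p[2], reverse=True):
        if load[a] < cap and load[b] < cap:
            chosen.append((a, b))
            load[a] += 1
            load[b] += 1
    return chosen


pairs = [
    ("Nicolette", "Leo", 0.95),
    ("Nicolette", "Ian", 0.90),
    ("Nicolette", "Steve", 0.85),  # skipped: Nicolette already has 2 matches
    ("Maya", "Steve", 0.40),       # accepted, so Steve still receives a match
]
print(distribute(pairs, cap=2))
# [('Nicolette', 'Leo'), ('Nicolette', 'Ian'), ('Maya', 'Steve')]
```

A greedy pass like this is fast and easy to reason about, but it is not guaranteed to maximize total score or coverage; optimal capacity-constrained matching needs heavier machinery such as min-cost flow.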

Page 43

MONITORING

Page 44

metrics.codahale.com

Page 45

We Are Hiring! jobs.eharmony.com

Page 46

THANK YOU

QUESTIONS?

@davidgev

Page 47

CREDITS:

Visual elements from The Noun Project

http://thenounproject.com