david-gevorkyan
DESCRIPTION
Did you know that eHarmony is responsible for 5% of all new US marriages, and that more than 600,000 people have married partners they met on eHarmony? eHarmony was founded to give people a better chance at finding happy, passionate, and fulfilling relationships. In this talk I describe how we create compatible matches and how we use Big Data technologies to do it. Specifically, I discuss how we take the billion-plus potential matches that we find through MongoDB, store them in a Voldemort NoSQL datastore, and then run multiple Hadoop jobs to produce a filtered list based on machine-learned models. Our Hadoop clusters are in-house, high-density, low-power SeaMicro installations, and we use Spring Batch and Spring Data Hadoop to orchestrate the Hadoop jobs.
DAVID GEVORKYAN
@davidgev
davidgevorkyan
WHO ARE WE?
EHARMONY CREATES THE HAPPIEST, MOST PASSIONATE AND MOST FULFILLING RELATIONSHIPS*
* ACCORDING TO A RECENT STUDY
438 MARRIAGES PER DAY
THE DIFFERENCE?
Compatibility Matching System®
COMPATIBILITY MATCHING
AFFINITY MATCHING
MATCH DISTRIBUTION
UNIDIRECTIONAL USER DEFINED CRITERIA
Leo
Ian
Steve
Nicolette
BIDIRECTIONAL
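The slide contrasts unidirectional with bidirectional user-defined criteria. One reading, sketched below as an assumption rather than as eHarmony's actual logic, is that a unidirectional check keeps candidates who satisfy the searcher's criteria, while a bidirectional check also requires the searcher to satisfy each candidate's criteria. The names come from the slide; the ages, the age ranges, and the `User` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    age: int
    min_age: int  # youngest partner this user will accept (hypothetical criterion)
    max_age: int  # oldest partner this user will accept (hypothetical criterion)


def accepts(seeker: User, candidate: User) -> bool:
    """True if the candidate falls inside the seeker's own criteria."""
    return seeker.min_age <= candidate.age <= seeker.max_age


def unidirectional(seeker, candidates):
    """Keep candidates that satisfy the seeker's criteria only."""
    return [c.name for c in candidates if accepts(seeker, c)]


def bidirectional(seeker, candidates):
    """Keep candidates only when both sides satisfy each other's criteria."""
    return [c.name for c in candidates if accepts(seeker, c) and accepts(c, seeker)]


nicolette = User("Nicolette", age=30, min_age=28, max_age=40)
candidates = [
    User("Leo", age=32, min_age=25, max_age=35),
    User("Ian", age=38, min_age=35, max_age=45),    # Nicolette is outside Ian's range
    User("Steve", age=45, min_age=25, max_age=40),  # Steve is outside Nicolette's range
]

one_way = unidirectional(nicolette, candidates)
two_way = bidirectional(nicolette, candidates)
```

With this made-up data the one-way filter keeps Leo and Ian, and the two-way filter keeps only Leo, because Ian's own range excludes Nicolette.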
150 questions
Personality Values Attributes Beliefs
Intellect Energy
Sociability Ambition
Kindness Curiosity
Humor Spirituality
COMPATIBILITY MATCHING
USER DEFINED CRITERIA
COMPATIBILITY MODELS
MONGODB
VOLDEMORT
MONGODB DATA STORE NEEDS
POWERFUL INDEXING MODELS
FAST MULTI-ATTRIBUTE SEARCHES
EASY TO MAINTAIN
60M+ QUERIES per day
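To illustrate what a multi-attribute search might look like, here is a sketch that builds a MongoDB-style filter document from user-defined criteria. The field names (`age`, `gender`, `religion`, `smokes`) and the `build_criteria_query` helper are hypothetical; `$gte`, `$lte`, and `$in` are standard MongoDB query operators. With a driver such as PyMongo, the resulting dictionary could be passed to `Collection.find`.

```python
def build_criteria_query(criteria):
    """Translate a user's criteria into a MongoDB-style filter document.

    `criteria` is a hypothetical dict; a missing key means "no preference".
    """
    query = {}
    if "min_age" in criteria or "max_age" in criteria:
        age_range = {}
        if "min_age" in criteria:
            age_range["$gte"] = criteria["min_age"]
        if "max_age" in criteria:
            age_range["$lte"] = criteria["max_age"]
        query["age"] = age_range
    if "gender" in criteria:
        query["gender"] = criteria["gender"]
    if "religions" in criteria:
        query["religion"] = {"$in": list(criteria["religions"])}
    if "smokes" in criteria:
        query["smokes"] = criteria["smokes"]
    return query


query = build_criteria_query(
    {"min_age": 28, "max_age": 40, "gender": "M", "religions": ["A", "B"]}
)
# With PyMongo this could be used as: db.users.find(query)
```

A search over several attributes like this is typically backed by a compound index on the queried fields, which is presumably where the "powerful indexing models" requirement comes in.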
MONGODB WINS
AUTO SCALING
BUILT-IN SHARDING
AUTO BALANCING
MMS
VOLDEMORT?
THAT NAME SOUNDS FAMILIAR
VOLDEMORT DATA STORE NEEDS
CRUD OPERATIONS
VARIED TRANSACTION SIZES
BILLION+ POTENTIAL MATCHES per day
VOLDEMORT WINS
AUTO REPLICATION
AUTO PARTITIONING
PLUGGABLE SERIALIZATION
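Auto partitioning and auto replication can be pictured with a small hash-ring sketch. This illustrates the general technique only and is not Voldemort's implementation: the partition count, node names, replication factor, and the `preference_list` helper are all assumptions.

```python
import hashlib

NUM_PARTITIONS = 12                                # assumed ring size
NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
REPLICATION_FACTOR = 2                             # assumed copies per key

# Assign partitions round-robin so neighbouring partitions live on different nodes.
PARTITION_OWNER = {p: NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)}


def partition_for(key: str) -> int:
    """Hash the key onto the ring of partitions."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def preference_list(key: str) -> list:
    """Walk the ring from the key's partition and collect distinct replica nodes."""
    start = partition_for(key)
    replicas = []
    for step in range(NUM_PARTITIONS):
        owner = PARTITION_OWNER[(start + step) % NUM_PARTITIONS]
        if owner not in replicas:
            replicas.append(owner)
        if len(replicas) == REPLICATION_FACTOR:
            break
    return replicas


replicas = preference_list("user:12345")
```

Because the key alone determines the partition and the replica nodes, any client can locate a record without a central lookup, and adding copies only means walking further around the ring.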
AFFINITY MATCHING
65 30
3000 miles
[Chart: communication probability vs. distance in miles (axis ticks 0 to 4095)]
[Chart: communication probability vs. height difference in cm (axis ticks -29 to 56); annotation: 4-8 in]
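The two charts plot communication probability against distance and against height difference. A minimal sketch of how such a curve could be estimated from logged match outcomes follows. The bucket edges reuse the tick values from the distance axis; the sample observations and the `comm_probability_by_bucket` helper are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bucket edges taken from the distance-axis ticks on the slide.
DISTANCE_EDGES = [0, 1, 3, 7, 15, 63, 255, 1023, 4095]


def bucket_for(distance_miles: float) -> int:
    """Return the smallest edge >= the distance, clamped to the last edge."""
    idx = bisect_left(DISTANCE_EDGES, distance_miles)
    return DISTANCE_EDGES[min(idx, len(DISTANCE_EDGES) - 1)]


def comm_probability_by_bucket(matches):
    """matches: iterable of (distance_miles, communicated) pairs.

    Returns {bucket_edge: fraction of matches in that bucket that communicated}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for distance, communicated in matches:
        edge = bucket_for(distance)
        totals[edge] += 1
        hits[edge] += int(communicated)
    return {edge: hits[edge] / totals[edge] for edge in sorted(totals)}


# Hypothetical observations: (distance in miles, did the pair communicate?)
sample = [(2, True), (3, True), (3, False), (40, True), (50, False),
          (60, False), (2500, False), (3000, False)]
curve = comm_probability_by_bucket(sample)
```

The same bucketing approach would apply to height difference, with edges taken from that chart's axis instead.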
WORDS TO USE
SOME INSIGHT
DATA NEEDS FOR AFFINITY
50M+ REGISTERED USERS
103 ATTRIBUTES
107 DAILY MATCHES
250M+ PHOTOS
4B+ QUESTIONNAIRES ANSWERED
COMMUNICATION AGGREGATES
EVENT LISTENER SERVICE
USER ACTIVITY SERVICE
~5 MS RESPONSE TIMES
10K EVENTS PER SECOND
USER SERVICE
HOURLY, DAILY TOTAL
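These slides describe an event listener feeding a user activity service that keeps hourly and daily totals. A rough in-memory sketch of that kind of rollup is below. The event shape, the `ActivityAggregator` class, and its method names are assumptions; a service handling 10K events per second would keep these counters in a datastore rather than in a Python dictionary.

```python
from collections import Counter
from datetime import datetime, timezone


class ActivityAggregator:
    """Keeps per-user event counts rolled up by hour and by day (hypothetical)."""

    def __init__(self):
        self.hourly = Counter()
        self.daily = Counter()

    def on_event(self, user_id: str, event_type: str, timestamp: datetime) -> None:
        """Called by the event listener for every communication event."""
        hour_key = timestamp.strftime("%Y-%m-%dT%H")
        day_key = timestamp.strftime("%Y-%m-%d")
        self.hourly[(user_id, event_type, hour_key)] += 1
        self.daily[(user_id, event_type, day_key)] += 1

    def daily_total(self, user_id: str, event_type: str, day: str) -> int:
        """Return the number of events of this type for the user on the given day."""
        return self.daily[(user_id, event_type, day)]


agg = ActivityAggregator()
agg.on_event("u1", "message_sent", datetime(2014, 1, 1, 9, 15, tzinfo=timezone.utc))
agg.on_event("u1", "message_sent", datetime(2014, 1, 1, 9, 45, tzinfo=timezone.utc))
agg.on_event("u1", "message_sent", datetime(2014, 1, 1, 17, 5, tzinfo=timezone.utc))
```

Updating the hourly and daily counters at write time is what lets a read return a precomputed total instead of scanning raw events.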
OFFLINE BATCH JOBS
USER SERVICE
MAP-SIDE JOINS (TB) SCORING
1+GB Compressed Protocol Buffers
PAIRINGS SERVICE
~750M Compressed Protocol Buffers
BILLION+ POTENTIAL MATCHES
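A map-side join avoids the shuffle phase by loading the smaller dataset into memory on every mapper and streaming the larger one past it. The sketch below shows the idea in plain Python rather than Hadoop. It assumes the user table is the side that fits in memory; the record fields, the `score` placeholder, and the function names are hypothetical, and the real jobs read compressed Protocol Buffers rather than dictionaries.

```python
def load_users(user_records):
    """Build the in-memory lookup table each mapper would hold (the small side)."""
    return {record["user_id"]: record for record in user_records}


def score(user_a, user_b):
    """Placeholder scoring function; stands in for the learned models."""
    return 1.0 / (1 + abs(user_a["age"] - user_b["age"]))


def map_side_join(pairings, users_by_id):
    """Stream over pairings (the large side) and join each against the user table."""
    for user_id, candidate_id in pairings:
        user = users_by_id.get(user_id)
        candidate = users_by_id.get(candidate_id)
        if user is None or candidate is None:
            continue  # drop pairings that reference unknown users
        yield user_id, candidate_id, score(user, candidate)


users = load_users([
    {"user_id": 1, "age": 30},
    {"user_id": 2, "age": 32},
    {"user_id": 3, "age": 30},
])
scored = list(map_side_join([(1, 2), (1, 3), (1, 99)], users))
```

The pairing that references user 99 is dropped because no such user exists in the lookup table; the other two pairings are scored without any data being shuffled between workers.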
AMAZON EMR
AWS DIRECT CONNECT
256 NODES 50 TB STORAGE
IN-HOUSE SEAMICRO
DATA RETRIEVAL LATENCY
LOW OPERATIONAL COST
LOW POWER CONSUMPTION
PREDICTABLE COMPLETION TIMES
MODEL RETRAINING
distcp
Protocol Buffers from Offline Jobs
MATCH DISTRIBUTION
Delivering the right matches at the right time to as many people as possible across the entire network
MONITORING
metrics.codahale.com
We Are Hiring! jobs.eharmony.com
THANK YOU
QUESTIONS?
@davidgev
CREDITS:
Visual Elements From The Noun Project
http://thenounproject.com