Copyright © 2015 Criteo
Large-scale real-time recommendation
Simon Dollé
Data Enthusiasts London, July 13th, 2015
Offline
• Similarities computed on browsing data
• Based on coevents (collaborative filtering)
• Computed on Hadoop cluster
• MapReduce jobs, Pig
• Takes around 12 hours
• Pushed to memcache servers
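
A minimal single-machine sketch of co-event similarity, to make the idea on this slide concrete. The input format (`histories`), the cosine normalization, and the `top_k` cut-off are assumptions for illustration; the talk only states that similarities are based on co-events and computed with MapReduce/Pig jobs on Hadoop.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def coevent_similarities(histories, top_k=2):
    """histories: one list of viewed product ids per user (hypothetical input)."""
    views = Counter()    # number of users who viewed each product
    coviews = Counter()  # number of users who viewed both products of a pair
    for history in histories:
        products = sorted(set(history))  # dedupe, fix pair orientation
        views.update(products)
        coviews.update(combinations(products, 2))

    neighbours = defaultdict(list)
    for (a, b), n_ab in coviews.items():
        # Cosine similarity between binary "viewed by user" vectors (assumed metric).
        score = n_ab / sqrt(views[a] * views[b])
        neighbours[a].append((b, score))
        neighbours[b].append((a, score))

    # Keep only the top_k most similar products; ties are broken by product id.
    return {
        product: sorted(cands, key=lambda c: (-c[1], c[0]))[:top_k]
        for product, cands in neighbours.items()
    }


histories = [
    ["shoes", "socks", "laces"],
    ["shoes", "socks"],
    ["shoes", "hat"],
]
similar = coevent_similarities(histories)
```

In a batch setting the two counters correspond to two aggregation passes over the browsing logs, and the resulting per-product lists are what would be pushed to the cache servers.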
Online
• Merge candidate products
• Rank candidates with an ML model trained on ad display data
• Features
  • Product-specific
  • User-specific
  • User-product interactions
  • Display-specific
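
The online step can be sketched as merge-then-rank. Everything named here is hypothetical: the feature names, the weights, and the choice of a logistic model are assumptions, since the talk does not specify the model or its features beyond the four families listed.

```python
from math import exp


def merge_candidates(recent_products, similar, seen=()):
    """Union the similar-product lists of recently viewed products."""
    merged = {}
    for source in recent_products:
        for candidate, sim in similar.get(source, []):
            if candidate in seen:
                continue  # do not recommend what the user already viewed
            merged[candidate] = max(sim, merged.get(candidate, 0.0))
    return merged


def score(features, weights, bias=0.0):
    """Logistic model: assumed stand-in for the ML model trained on display data."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def rank(merged, product_features, user_features, display_features, weights, n=2):
    scored = []
    for candidate, sim in merged.items():
        features = dict(product_features.get(candidate, {}))  # product-specific
        features.update(user_features)                         # user-specific
        features.update(display_features)                      # display-specific
        features["similarity"] = sim                           # user-product interaction
        scored.append((score(features, weights), candidate))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [candidate for _, candidate in scored[:n]]


similar = {"shoes": [("socks", 0.8), ("hat", 0.5)], "bag": [("hat", 0.6), ("belt", 0.4)]}
merged = merge_candidates(["shoes", "bag"], similar, seen={"shoes", "bag"})
weights = {"similarity": 2.0, "popularity": 1.0, "banner_is_large": 0.1, "recent_buyer": -0.5}
product_features = {"socks": {"popularity": 0.1}, "hat": {"popularity": 0.9}, "belt": {"popularity": 0.2}}
top = rank(merged, product_features, {"recent_buyer": 1.0}, {"banner_is_large": 1.0}, weights)
```

Here `hat` overtakes `socks` despite a lower similarity, because the (made-up) popularity weight outweighs the similarity gap.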
Online optimizations
• Algorithmic
  • Use simpler ML model
  • Quickly discard candidates
• Technical
  • Fight against garbage collector
  • Memcache + local cache
  • Async I/O
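
A rough sketch of two of the technical points, a local cache in front of a remote cache and asynchronous lookups. `FakeRemoteStore` is a hypothetical in-process stand-in for a memcache cluster, and the eviction policy is an assumption; this is not the production design described in the talk.

```python
import asyncio


class FakeRemoteStore:
    """Hypothetical stand-in for a memcache cluster."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    async def get(self, key):
        self.calls += 1
        await asyncio.sleep(0.01)  # simulated network round trip
        return self.data.get(key)


class TwoLevelCache:
    """In-process dict checked first, remote store on a miss."""

    def __init__(self, remote, max_local=1000):
        self.remote = remote
        self.local = {}
        self.max_local = max_local

    async def get(self, key):
        if key in self.local:
            return self.local[key]
        value = await self.remote.get(key)
        if value is not None:
            if len(self.local) >= self.max_local:
                # Evict the oldest inserted entry (dicts keep insertion order).
                self.local.pop(next(iter(self.local)))
            self.local[key] = value
        return value

    async def get_many(self, keys):
        # Issue all lookups concurrently instead of one round trip after another.
        values = await asyncio.gather(*(self.get(k) for k in keys))
        return dict(zip(keys, values))


remote = FakeRemoteStore({"shoes": ["socks", "hat"], "bag": ["belt"]})
cache = TwoLevelCache(remote)
first = asyncio.run(cache.get_many(["shoes", "bag"]))
second = asyncio.run(cache.get_many(["shoes", "bag"]))  # served from the local cache
```

The second call never reaches the remote store, which is the point of the local layer: hot keys cost a dictionary lookup rather than a network round trip.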
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, NLP)
• Instant update of similarities
Fancy a try?
On your own:
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
  • 4GB of display and click data, Kaggle challenge in 2014
• NEW: 1TB dataset released a few weeks ago: http://bit.ly/1PyH4Vq
  • Hosted on Microsoft Azure, just waiting for you
With us!
http://labs.criteo.com/jobs/