Robson Motta
7 Machine Learning techniques in practice in a Startup
312,000,000,000 (312 billion)
recommendations in 2014
Get to know our solutions
How to present the best recommendation for each client/context?
data → preprocessing → processing → postprocessing → recommendations
● products
● pageviews
● clicks
● buy orders
● etc.
Machine Learning
“All models are wrong, but some are useful”
(George E. P. Box)
… first e-commerce
We have a client!
1. Content-based Filtering
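$$w_{n,d} = \mathrm{tf}_{n,d} \cdot \mathrm{idf}_n$$
where $\mathrm{tf}_{n,d}$ is the frequency of term n in document d, $\mathrm{idf}_n$ is the IDF factor of term n, and $w_{n,d}$ is the weight of term n within document d.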
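The TF-IDF weighting on this slide can drive a simple content-based recommender: turn each product description into a weighted term vector and rank the other products by cosine similarity to a reference product. This is a minimal sketch; the IDF variant log(N / df), whitespace tokenization, cosine similarity, and the names `tf_idf_vectors` and `similar_products` are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import Counter


def tf_idf_vectors(descriptions):
    """Turn each product description into a sparse {term: weight} vector."""
    tokenized = [text.lower().split() for text in descriptions]
    n_docs = len(tokenized)
    # document frequency: number of descriptions that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # assumed IDF variant: idf_n = log(N / df_n)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of term n in document d
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_products(descriptions, reference, top_k=3):
    """Indices of the top_k descriptions most similar to the reference one."""
    vectors = tf_idf_vectors(descriptions)
    scored = [(cosine(vectors[reference], vec), i)
              for i, vec in enumerate(vectors) if i != reference]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]


products = ["red cotton shirt", "blue cotton shirt",
            "steel kitchen knife", "kitchen knife set"]
print(similar_products(products, reference=0, top_k=1))  # [1]
```

Terms that appear in every description get an IDF of zero, so they contribute nothing to the similarity; that is the intended effect of the IDF factor.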
[Figure: reference product]
2. Clustering
[Figure: clustering at iterations 1, 2, 3, 4, 10 and 12]
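The iteration snapshots suggest an iterative partitional clustering algorithm, but the slides do not name it. The sketch below assumes k-means (Lloyd's algorithm) over 2-D product feature vectors, a hypothetical representation: each iteration assigns every product to its nearest centroid and then moves each centroid to the mean of its members, until assignments stop changing. Note that the number of clusters `k` has to be chosen up front.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (assignment, centroids, iterations used)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    assignment = [None] * len(points)
    for iteration in range(1, max_iter + 1):
        # assignment step: each point joins its nearest centroid
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids, iteration


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, n_iter = kmeans(points, k=2)
print(labels, n_iter)
```

Cluster ids depend on the random initialization, so only the grouping (which products share a cluster) is meaningful, not the label values.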
… main issues
the number of clusters
[Figure: clustering at iterations 12 and 13]
… main issues
false positives (pairs of products wrongly assigned to the same cluster)
false negatives (pairs of products wrongly assigned to different clusters)
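False positives and false negatives as defined on this slide are pair-counting errors: every pair of products is checked against a reference grouping. The sketch below counts both, assuming a reference labeling (for example, product categories) is available; the function name `pair_errors` is hypothetical.

```python
from itertools import combinations


def pair_errors(predicted, truth):
    """Count pairwise clustering errors against a reference labeling.

    false positive: pair in the same predicted cluster but in different true groups
    false negative: pair in different predicted clusters but in the same true group
    """
    fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and not same_true:
            fp += 1
        elif same_true and not same_pred:
            fn += 1
    return fp, fn


predicted = [0, 0, 0, 1, 1]
truth = ["shirt", "shirt", "knife", "knife", "knife"]
print(pair_errors(predicted, truth))  # (2, 2)
```

Pair counting scales quadratically with the number of products, so on a large catalog it is usually computed from a contingency table rather than by enumerating pairs.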
[Figure: clustering at iteration 12]
… what did we learn?
clustering algorithms + evaluation metrics
… second e-commerce
We have another client!
And this one has categories!
3. Classification
… what did we learn?
committee (modified SVM + kNN)
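The slides name a committee of a modified SVM and kNN but do not say how the SVM was modified or how votes are combined. The sketch below assumes a unanimity rule (return a label only when both members agree, otherwise abstain) and, to stay dependency-free, swaps the SVM for a nearest-centroid classifier. That swap is a stand-in, not the method from the talk; all function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def centroid_predict(train_x, train_y, x):
    """Stand-in for the modified SVM: label of the nearest class centroid."""
    centroids = {}
    for label in set(train_y):
        members = [p for p, y in zip(train_x, train_y) if y == label]
        centroids[label] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def committee_predict(train_x, train_y, x, k=3):
    """Return a label only when both members agree; otherwise abstain with None."""
    votes = {knn_predict(train_x, train_y, x, k), centroid_predict(train_x, train_y, x)}
    return votes.pop() if len(votes) == 1 else None


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["shirt", "shirt", "shirt", "knife", "knife", "knife"]
print(committee_predict(train_x, train_y, (0.5, 0.5)))  # shirt
```

Abstaining on disagreement trades coverage for precision: products the committee cannot agree on can be left out or sent to a human for labeling.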
… main issues
unbalanced classes
unlabeled areas
4. Active Learning
… what did we learn?
distance: explore new areas
confidence: fix incorrectly classified items
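The two query criteria on this slide can be sketched as selection functions that choose which product to send to a human labeler next. This is a minimal sketch assuming products are feature vectors and the classifier outputs class probabilities; the labeling loop itself and the names `farthest_from_labeled` and `least_confident` are hypothetical.

```python
import math


def farthest_from_labeled(unlabeled, labeled):
    """Distance criterion: index of the unlabeled item farthest from all labeled items."""
    return max(range(len(unlabeled)),
               key=lambda i: min(math.dist(unlabeled[i], p) for p in labeled))


def least_confident(probabilities):
    """Confidence criterion: index of the item whose top class probability is lowest."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))


labeled = [(0.0, 0.0), (1.0, 1.0)]
unlabeled = [(0.5, 0.5), (9.0, 9.0), (2.0, 2.0)]
print(farthest_from_labeled(unlabeled, labeled))  # 1: the point in an unexplored area
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]  # hypothetical classifier outputs
print(least_confident(probs))  # 1: the least certain prediction
```

The distance criterion finds regions the labeled set has never covered, while the confidence criterion targets the decision boundary where misclassifications are most likely; alternating between them is one way to use both.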
5. Community extraction (networks)
… what did we learn?
committee (clustering + community extraction)
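The slides do not name the community-extraction algorithm. As one well-known option, the sketch below uses greedy modularity maximization: treat products as nodes, connect products that are, say, bought together (an assumed edge definition), start with every product in its own community, and repeatedly merge the pair of communities whose merge raises modularity the most. It recomputes modularity from scratch at every step, so it is only suitable for small graphs; the function names are hypothetical.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        internal = sum(1 for u, v in edges if u in community and v in community)
        total_degree = sum(degree[node] for node in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def greedy_communities(edges):
    """Start from singletons and merge the pair of communities that raises Q the most."""
    nodes = sorted({node for edge in edges for node in edge})
    communities = [frozenset([node]) for node in nodes]
    current_q = modularity(edges, communities)
    while len(communities) > 1:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(communities)), 2):
            candidate = [c for i, c in enumerate(communities) if i not in (a, b)]
            candidate.append(communities[a] | communities[b])
            gain = modularity(edges, candidate) - current_q
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases modularity any further
        a, b = best_pair
        merged = communities[a] | communities[b]
        communities = [c for i, c in enumerate(communities) if i not in (a, b)]
        communities.append(merged)
        current_q += best_gain
    return communities


# two groups of co-bought products joined by a single edge (3-4)
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
print(sorted(sorted(c) for c in greedy_communities(edges)))  # [[1, 2, 3], [4, 5, 6]]
```

Unlike k-means, this procedure stops on its own when no merge improves modularity, so the number of communities does not have to be fixed in advance.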
6. Collaborative Filtering
Customers Who Bought This Item Also Bought, PaulsHealthBlog.com, 11.04.2014
user-based
item-based
[Figure: example vector 10 5 7 0 2 3 4 1 ...]
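User-based collaborative filtering compares rows of a user × item interaction matrix (similar users), while item-based filtering compares its columns (similar items), which is the idea behind "customers who bought this item also bought". The sketch below shows the item-based variant, assuming a small dense matrix where 1 means "bought" and cosine similarity between item columns; the matrix and the names `item_similarities` and `also_bought` are hypothetical.

```python
import math


def item_similarities(ratings):
    """Cosine similarity between item columns of a user x item matrix (0 = no interaction)."""
    n_items = len(ratings[0])
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for a in range(n_items):
        for b in range(n_items):
            if a != b and norms[a] and norms[b]:
                dot = sum(x * y for x, y in zip(columns[a], columns[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


def also_bought(ratings, item, top_k=2):
    """Items most similar to `item` by co-purchase pattern."""
    sim = item_similarities(ratings)
    others = [j for j in range(len(sim)) if j != item]
    return sorted(others, key=lambda j: sim[item][j], reverse=True)[:top_k]


ratings = [  # rows: users, columns: items (1 = bought)
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
print(also_bought(ratings, item=0))  # [1, 3]
```

Transposing the matrix and running the same similarity over rows gives the user-based variant; item-based similarities are often preferred in e-commerce because item relationships change more slowly than individual users' behavior.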
7. Multi-armed Bandit
Exploration-Exploitation trade-off
… case 1
arms: algorithm 1, algorithm 2, …, algorithm N
… case 2
arms: order 1, order 2, …
[Figure: each arm's chance to be picked, updated with user feedback (click)]
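The slides do not say which bandit policy sets each arm's chance to be picked. Epsilon-greedy is one of the simplest: with probability ε pick a random arm (exploration), otherwise pick the arm with the best observed click-through rate (exploitation), so the current best arm is chosen with probability 1 − ε + ε/N and every other arm with ε/N. In the sketch below each arm stands for one recommendation algorithm, and the click rates are made-up simulation inputs, not results from the talk.

```python
import random


def epsilon_greedy(click_rates, rounds=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit; each arm is one recommendation algorithm.

    click_rates are hypothetical true click probabilities used only to simulate users.
    """
    rng = random.Random(seed)
    n_arms = len(click_rates)
    pulls = [0] * n_arms
    clicks = [0] * n_arms
    for _ in range(rounds):
        untried = [arm for arm in range(n_arms) if pulls[arm] == 0]
        if untried:
            arm = untried[0]  # play every arm once before estimating
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: any arm, uniformly
        else:
            # exploit: the arm with the best observed click-through rate
            arm = max(range(n_arms), key=lambda a: clicks[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < click_rates[arm]:  # simulated user feedback: click or no click
            clicks[arm] += 1
    return pulls, clicks


pulls, clicks = epsilon_greedy([0.1, 0.6, 0.2])
print(pulls)  # most pulls should go to arm 1, the arm with the highest click rate
```

Compared with a fixed A/B split, the bandit shifts traffic toward the better arm while the test is still running, at the cost of less precise estimates for the weaker arms.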
data → preprocessing → mining → postprocessing → recommendations
● products
● pageviews
● clicks
● buy orders
● etc.
Challenges
● popular items
● outliers
● incompatible items
● principal-accessory pairs
How do we guarantee quality for our clients?
● subjective evaluation: visualization
● objective evaluation: quality measures
● online evaluation: A/B test and Bandit
Multidimensional Projection (t-SNE technique)
Stability, purity and coverage measures
Circular connected chart
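The slides list purity and coverage among the objective quality measures without defining them. The sketch below uses common definitions, which are assumptions here: purity is the share of products that belong to their cluster's majority category, and coverage is the share of catalog items for which at least one recommendation exists. Stability is not sketched because its definition is not given. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def purity(clusters, categories):
    """Share of items that belong to their cluster's majority category."""
    members = defaultdict(list)
    for cluster, category in zip(clusters, categories):
        members[cluster].append(category)
    majority = sum(Counter(cats).most_common(1)[0][1] for cats in members.values())
    return majority / len(clusters)


def coverage(recommendations, catalog):
    """Share of catalog items with at least one recommendation (assumed definition)."""
    covered = sum(1 for item in catalog if recommendations.get(item))
    return covered / len(catalog)


clusters = [0, 0, 0, 1, 1, 1]
categories = ["shirt", "shirt", "knife", "knife", "knife", "knife"]
print(purity(clusters, categories))  # 0.8333333333333334 (5 of 6)
print(coverage({"a": ["b"], "b": [], "c": ["a"]}, ["a", "b", "c", "d"]))  # 0.5
```

Purity alone rewards many tiny clusters (one product per cluster is always perfectly pure), which is why it is usually reported next to other measures rather than on its own.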