Robson Motta | robson@chaordic.com.br

7 Machine Learning techniques in practice in a Startup (WIAMS UFMS 2015)


312,000,000,000 (312 billion) recommendations in 2014

Get to know our solutions

How do we present the best recommendation for each client/context?

data → preprocessing → processing → postprocessing → recommendations

data:
● products
● pageviews
● clicks
● buy orders
● etc.

Machine Learning

“All models are wrong, but some are useful”

(George E. P. Box)

… first e-commerce

We have a client!

1. Content-based Filtering

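$w_{n,d} = \mathrm{tf}_{n,d} \times \mathrm{idf}_n$

where $\mathrm{tf}_{n,d}$ is the frequency of term n in document d, $\mathrm{idf}_n$ is the IDF factor of term n, and $w_{n,d}$ is the weight of term n within document d.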
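A minimal sketch of this weighting used for content-based recommendations. The slides don't give the IDF formula, so `log(N / df_n)` is an assumption (one common choice), as are the whitespace tokenizer, ranking by cosine similarity, and the toy product titles; `tfidf_vectors`, `cosine` and `most_similar` are hypothetical names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with w[n, d] = tf[n, d] * idf[n]."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokenizer
    n_docs = len(tokenized)
    # df[n]: number of documents that contain term n at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: idf[n] = log(N / df[n]); other IDF variants exist
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw frequency of term n in document d
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, ref_index, k=2):
    """Indices of the k documents whose TF-IDF vectors are closest to the reference one."""
    vectors = tfidf_vectors(docs)
    ref = vectors[ref_index]
    scored = [(cosine(ref, vec), i) for i, vec in enumerate(vectors) if i != ref_index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


products = [
    "red cotton shirt",
    "blue cotton shirt",
    "red leather shoes",
    "stainless steel kettle",
]
print(most_similar(products, ref_index=0))  # → [1, 2]
```

Terms shared by every product get an IDF of zero, so they never drive the similarity; rarer shared terms ("cotton", "shirt") weigh more.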


2. Clustering

[Figure: clustering at iterations 1, 2, 3, 4, 10 and 12]

… main issues

the number of clusters
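The iteration slides suggest an iterative partitioning algorithm, but the slides don't name it. Assuming k-means (Lloyd's algorithm), a minimal sketch follows; the toy 2-D points, the fixed seed, and the names `kmeans` and `sse` are hypothetical. It also shows why the number of clusters is an issue: k is an input, and the optimal within-cluster squared error never increases as k grows, so that error alone cannot choose k.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on tuples of coordinates; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points

    def nearest(p):
        # Index of the centroid closest to p (ties go to the lowest index)
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]  # assignment step
        new_centroids = []
        for c in range(k):  # update step: move each centroid to the mean of its members
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


def sse(points, centroids, labels):
    """Within-cluster sum of squared distances."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
for k in (1, 2, 3):
    centroids, labels = kmeans(points, k)
    print(k, labels, round(sse(points, centroids, labels), 2))
```

Comparing the error across several k (the "elbow" heuristic) is one common way to pick k; it is a heuristic, not a guarantee.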

[Figure: clustering at iterations 12 and 13]

… main issues

false positives (pairs of products wrongly assigned to the same cluster)

false negatives (pairs of products wrongly assigned to different clusters)
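Both error types are pair-counting measures. A minimal sketch, assuming a reference grouping of the products is available to compare against (for example, a manually labeled sample); `pair_errors`, the product ids, and the groups are hypothetical. Pair precision and recall follow directly from the counts.

```python
from itertools import combinations


def pair_errors(predicted, reference):
    """Count pair-level agreement between a clustering and a reference grouping.

    predicted, reference: dicts mapping product id -> cluster id / group id.
    Returns (true positives, false positives, false negatives) over all product pairs.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(reference), 2):
        same_cluster = predicted[a] == predicted[b]
        same_group = reference[a] == reference[b]
        if same_cluster and same_group:
            tp += 1
        elif same_cluster:
            fp += 1  # wrongly put in the same cluster
        elif same_group:
            fn += 1  # wrongly split across clusters
    return tp, fp, fn


reference = {"shirt1": "shirts", "shirt2": "shirts", "shoe1": "shoes", "shoe2": "shoes"}
predicted = {"shirt1": 0, "shirt2": 0, "shoe1": 0, "shoe2": 1}
tp, fp, fn = pair_errors(predicted, reference)
print(tp, fp, fn)                      # → 1 2 1
print(tp / (tp + fp), tp / (tp + fn))  # pair precision and pair recall
```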

[Figure: clustering at iteration 12]

… what did we learn?

clustering algorithms + evaluation metrics


… second e-commerce

We have another client!

And this one has categories!

3. Classification

… what did we learn?

committee (modified SVM + kNN)
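A minimal sketch of a two-member committee. The slides don't describe the SVM modification or how the members' votes are combined, so both are assumptions: a plain perceptron (a different linear classifier, not an SVM) stands in for the SVM to keep the example dependency-free, and the committee returns a label only when both members agree, otherwise `None` (abstain). Function names and data are hypothetical.

```python
import math
from collections import Counter


def train_perceptron(X, y, max_epochs=1000):
    """Linear classifier for labels +1/-1 (stand-in for the SVM); the bias is the last weight."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            if label * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            break  # every training point is on the correct side
    return w


def linear_predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def knn_predict(X, y, x, k=3):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def committee_predict(w, X, y, x, k=3):
    """Return the shared label when both members agree, otherwise None (abstain)."""
    a, b = linear_predict(w, x), knn_predict(X, y, x, k)
    return a if a == b else None


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [-1, -1, -1, 1, 1, 1]
w = train_perceptron(X, y)
print(committee_predict(w, X, y, (0.2, 0.2)), committee_predict(w, X, y, (5.5, 5.5)))  # → -1 1
```

Abstentions mark the items neither member is sure about, which is where extra labeling effort pays off most.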


… main issues

unbalanced classes

unlabeled areas


4. Active Learning

… what did we learn?

distance: new areas

confidence: fix incorrectly classified items
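The two criteria can be sketched as query strategies that choose which unlabeled products to send for labeling. How they were applied is an assumption here: distance picks the item farthest from every labeled example (a new area), and confidence picks the item whose k nearest labeled neighbors agree least. A k-nearest-neighbor vote serves as the confidence source only to keep the sketch short; names and data are hypothetical.

```python
import math
from collections import Counter


def farthest_first(unlabeled, labeled, n=1):
    """Distance criterion: unlabeled points farthest from any labeled point (new areas)."""
    def gap(p):
        return min(math.dist(p, q) for q, _ in labeled)
    return sorted(unlabeled, key=gap, reverse=True)[:n]


def knn_confidence(p, labeled, k=3):
    """Share of the k nearest labeled neighbors that vote for the majority label."""
    nearest = sorted(labeled, key=lambda item: math.dist(p, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][1] / len(nearest)


def least_confident(unlabeled, labeled, n=1, k=3):
    """Confidence criterion: unlabeled points the classifier is least sure about."""
    return sorted(unlabeled, key=lambda p: knn_confidence(p, labeled, k))[:n]


labeled = [((0, 0), "shirt"), ((1, 0), "shirt"), ((0, 1), "shirt"),
           ((5, 5), "shoe"), ((6, 5), "shoe"), ((5, 6), "shoe")]
unlabeled = [(0.5, 0.5), (3, 2.5), (10, 10)]
print(farthest_first(unlabeled, labeled))   # → [(10, 10)]
print(least_confident(unlabeled, labeled))  # → [(3, 2.5)]
```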


5. Community extraction (networks)

… what did we learn?

committee (clustering + community extraction)
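The slides combine community extraction with clustering but don't say which extraction algorithm was used. As one illustrative option (an assumption), the sketch below greedily merges communities of a product graph while Newman modularity increases, in the spirit of agglomerative modularity methods; the graph (an edge might mean two products are often viewed together) and the function names are hypothetical. Modularity is recomputed from scratch for clarity, so this only suits small graphs.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for comm in communities:
        inside = sum(1 for u, v in edges if u in comm and v in comm)
        total_degree = sum(degree[node] for node in comm)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


def greedy_communities(edges):
    """Start from singletons; repeatedly apply the merge that raises Q the most."""
    nodes = sorted({node for edge in edges for node in edge})
    communities = [frozenset([node]) for node in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_pair = None
        for a, b in combinations(range(len(communities)), 2):
            rest = [c for i, c in enumerate(communities) if i not in (a, b)]
            q = modularity(edges, rest + [communities[a] | communities[b]])
            if q > best_q + 1e-12:
                best_q, best_pair = q, (a, b)
        if best_pair is None:
            break  # no merge increases modularity
        a, b = best_pair
        merged = communities[a] | communities[b]
        communities = [c for i, c in enumerate(communities) if i not in (a, b)] + [merged]
    return communities


# Two tightly linked groups of products joined by a single edge (C-D)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
print(sorted(sorted(c) for c in greedy_communities(edges)))  # → [['A', 'B', 'C'], ['D', 'E', 'F']]
```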


6. Collaborative Filtering

Customers Who Bought This Item Also Bought, PaulsHealthBlog.com, 11.04.2014

user-based

[Figure: example vector 10 5 7 0 2 3 4 1 ...]

item-based
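A minimal sketch of both variants on a toy user × item matrix of interaction counts (hypothetical data; cosine similarity is an assumed choice). User-based filtering scores an unseen item through users similar to the target user; item-based filtering scores it through its similarity to items the user already interacted with, the idea behind "customers who bought this item also bought". Function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_based(matrix, user, n=1):
    """Score items the user hasn't seen by the interactions of similar users."""
    scores = {}
    for other, row in matrix.items():
        if other == user:
            continue
        sim = cosine(matrix[user], row)
        for item, value in row.items():
            if value and not matrix[user].get(item):
                scores[item] = scores.get(item, 0.0) + sim * value
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


def item_based(matrix, user, n=1):
    """Score items the user hasn't seen by their similarity to items the user has seen."""
    columns = {}  # item -> {user: count}
    for u, row in matrix.items():
        for item, value in row.items():
            columns.setdefault(item, {})[u] = value
    seen = {item: value for item, value in matrix[user].items() if value}
    scores = {
        candidate: sum(cosine(columns[item], col) * value for item, value in seen.items())
        for candidate, col in columns.items()
        if candidate not in seen
    }
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical purchase counts per user
matrix = {
    "ana": {"shirt": 2, "shoes": 1},
    "bia": {"shirt": 2, "shoes": 1, "belt": 1},
    "caio": {"kettle": 3, "mug": 2},
    "davi": {"kettle": 1, "mug": 1, "tea": 2},
}
print(user_based(matrix, "ana"), item_based(matrix, "ana"))  # → ['belt'] ['belt']
```

Item-item similarities change slowly and can be precomputed offline, which is one reason the item-based variant is popular for large catalogs.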

7. Multi-armed Bandit

Exploration-Exploitation trade-off


… case 1

algorithm 1, algorithm 2, …, algorithm N

… case 2

order 1, order 2

[Figure: each option's chance to be picked, updated with user feedback (click)]
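The slides show each option's chance to be picked changing as clicks arrive, but don't name the policy. One common policy, assumed here, is Thompson sampling with a Beta-Bernoulli model per arm, where an arm can be a recommendation algorithm (case 1) or an ordering (case 2). `BetaBernoulliBandit`, `simulate`, and the click-through rates are hypothetical.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling for arms whose reward is a click (1) or no click (0)."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.clicks = {arm: 0 for arm in self.arms}
        self.views = {arm: 0 for arm in self.arms}
        self.rng = random.Random(seed)

    def pick(self):
        # Draw a plausible click rate for each arm from its Beta posterior
        # (uniform prior) and show the arm with the highest draw.
        def draw(arm):
            misses = self.views[arm] - self.clicks[arm]
            return self.rng.betavariate(self.clicks[arm] + 1, misses + 1)
        return max(self.arms, key=draw)

    def update(self, arm, clicked):
        """Record user feedback for the arm that was shown."""
        self.views[arm] += 1
        self.clicks[arm] += int(clicked)


def simulate(true_ctr, rounds=5000, seed=42):
    """Play against simulated users with known click-through rates; return views per arm."""
    bandit = BetaBernoulliBandit(true_ctr, seed=seed)
    users = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.pick()
        bandit.update(arm, users.random() < true_ctr[arm])
    return bandit.views


print(simulate({"algorithm 1": 0.02, "algorithm 2": 0.10}))
```

Random draws keep giving the weaker arm an occasional chance (exploration) while most traffic shifts to the arm with more clicks (exploitation).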

data → preprocessing → mining → postprocessing → recommendations

data:
● products
● pageviews
● clicks
● buy orders
● etc.

Challenges

● popular items
● outliers
● incompatible items
● principal-accessory
● ...

How do we guarantee quality to our clients?

● subjective evaluation: visualization
● objective evaluation: quality measures
● online evaluation: A/B test and Bandit

Multidimensional projection (t-SNE technique)

Stability, purity and coverage measures
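Purity and coverage can be computed as below; these are common definitions and an assumption about the ones used here. The slides don't define stability, so it is left out. Purity compares clusters with reference categories (for example, the client's catalog categories); coverage is taken as the share of catalog products that receive at least one recommendation. Function names and data are hypothetical.

```python
from collections import Counter


def purity(clusters, labels):
    """Share of items whose label matches the majority label of their cluster.

    clusters: item -> cluster id; labels: item -> reference category.
    """
    members = {}
    for item, cluster in clusters.items():
        members.setdefault(cluster, []).append(labels[item])
    majority_total = sum(Counter(ls).most_common(1)[0][1] for ls in members.values())
    return majority_total / len(clusters)


def coverage(recommendations, catalog):
    """Share of catalog items with at least one recommendation."""
    covered = sum(1 for item in catalog if recommendations.get(item))
    return covered / len(catalog)


clusters = {"a": 0, "b": 0, "c": 0, "d": 1}
labels = {"a": "shirt", "b": "shirt", "c": "shoe", "d": "shoe"}
print(purity(clusters, labels))                                       # → 0.75
print(coverage({"a": ["b"], "b": [], "c": ["a"]}, ["a", "b", "c", "d"]))  # → 0.5
```

Purity rises trivially with more clusters (singletons are perfectly pure), so it is usually read together with a measure that penalizes fragmentation.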

Circular connected chart

A/B tests
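For the A/B tests, one standard way to compare the click-through rates of two variants is a two-proportion z-test; this is an assumption about the analysis, and the counts below are made up. The p-value is two-sided and relies on the normal approximation, so it needs reasonably large samples.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between variants A and B."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # rate assuming no difference
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(clicks_a=200, views_a=10_000, clicks_b=260, views_b=10_000)
print(round(z, 2), round(p, 4))
```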
