ted-dunning

Recommendation is really valuable and much easier to implement than most people think. Here's how.
© MapR Technologies, confidential
Introduction to Mahout
Topic For This Section
• What is recommendation?
• What makes it different?
• What is multi-model recommendation?
• How can I build it using common household items?
Oh … Also This
• Detailed break-down of a recommendation system running with Mahout on MapR
• With code examples
I may have to summarize
just a bit
Part 1: 5 minutes of background
Part 2: 5 minutes: I want a pony
Part 1: 5 minutes of background
What Does Machine Learning Look Like?
O(κ k d + k³ d) = O(k² d log n + k³ d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
Recommendations as Machine Learning
• Recommendation:
– Involves observing interactions between people taking action (users) and items, used as input data for the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music, or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
Part 2: How recommenders work
(I still want a pony)
Recommendations
Recap: Behavior of a crowd helps us understand what individuals will do
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
Recommendations
What else would Bob like?
Recommendations
A puppy, of course!
You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
Recommendations
What if everybody gets a pony?
What else would you recommend for Amelia?
Recommendations
If everybody gets a pony, it's not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything (it doesn't help that everybody wants a pony…)
– Examples: Welcome document; Elevator music
• Widespread occurrence is not interesting
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendations
Get Useful Indicators from Behaviors
• Use log files to build a history matrix of users × items
– Remember: this history of interactions will be sparse compared to all potential combinations
• Transform to a co-occurrence matrix of items × items
• Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– The Log Likelihood Ratio (LLR) test can help judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
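The first two steps above can be sketched in a few lines of Python. This is a toy illustration using the Alice/Bob/Charles data, not Mahout code; all names are ours:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(log_events):
    """Build a sparse history (users x items) and item-item
    co-occurrence counts from (user, item) log events."""
    # History matrix, stored sparsely as one set of items per user
    history = defaultdict(set)
    for user, item in log_events:
        history[user].add(item)
    # Co-occurrence matrix: count each item pair once per user
    cooc = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return history, cooc

# Alice got an apple and a puppy; Bob got an apple; Charles a bicycle
events = [("alice", "apple"), ("alice", "puppy"),
          ("bob", "apple"), ("charles", "bicycle")]
history, cooc = cooccurrence(events)
```

Apple and puppy co-occur (through Alice), while apple and bicycle never do; the next step is deciding which of those counts are meaningful.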
Log Files
[figure: raw log entries for Alice, Bob, and Charles, one (user, item) event per line, shown first with names and then anonymized as users u1–u3 and items t1–t4]
History Matrix: Users by Items
[matrix figure: rows Alice, Bob, Charles; a ✔ marks each item the user interacted with]
Co-occurrence Matrix: Items by Items
[figure: items × items matrix of co-occurrence counts]
How do you tell which co-occurrences are useful?
Co-occurrence Matrix: Items by Items
[figure: items × items matrix of co-occurrence counts]
Use the LLR test to turn co-occurrence into indicators…
Co-occurrence Binary Matrix
[figure: 2×2 table of counts, item A present/absent vs. item B present/absent]
Spot the Anomaly
        A       not A
B       13      1000
not B   1000    100,000

        A       not A
B       1       0
not B   0       2

        A       not A
B       1       0
not B   0       10,000

        A       not A
B       10      0
not B   0       100,000
What conclusion do you draw from each situation?
Spot the Anomaly
• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR
root LLR = 0.90:
        A       not A
B       13      1000
not B   1000    100,000

root LLR = 1.95:
        A       not A
B       1       0
not B   0       2

root LLR = 4.52:
        A       not A
B       1       0
not B   0       10,000

root LLR = 14.3:
        A       not A
B       10      0
not B   0       100,000
What conclusion do you draw from each situation?
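The root-LLR scores shown (0.90, 1.95, 4.52, 14.3) can be reproduced with a direct implementation of the G² statistic behind Dunning's LLR test; root LLR is just its square root. The function name here is ours:

```python
import math

def root_llr(k11, k12, k21, k22):
    """Square root of the G^2 log-likelihood ratio for a 2x2
    contingency table: k11 = A and B together, k12 = B without A,
    k21 = A without B, k22 = neither."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for k, r, c in [(k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
                    (k21, rows[1], cols[0]), (k22, rows[1], cols[1])]:
        if k > 0:  # zero cells contribute nothing
            g2 += 2.0 * k * math.log(k * n / (r * c))
    return math.sqrt(max(g2, 0.0))

# The four tables above, in reading order
scores = [root_llr(13, 1000, 1000, 100000),
          root_llr(1, 0, 0, 2),
          root_llr(1, 0, 0, 10000),
          root_llr(10, 0, 0, 100000)]
```

Note how the big but proportionate table scores lowest while the small, sharply skewed tables score highest: anomaly, not raw volume, is what the test rewards.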
Co-occurrence Matrix
[figure: items × items matrix of co-occurrence counts]
Recap: use the LLR test to turn co-occurrence into indicators
Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: the indicator field is added directly to meta-data for a document in the Solr index. No need to create a separate index for indicators.
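As a sketch, attaching an indicator-matrix row to an item's existing document might look like this. The field and function names are illustrative, not Solr API calls:

```python
def add_indicator_field(doc, indicator_row):
    """Attach this item's indicator-matrix row to its existing
    search document; no separate index is needed."""
    doc = dict(doc)  # copy the item's meta-data document
    doc["indicators"] = " ".join(sorted(indicator_row))
    return doc

# The puppy document from the slide, plus its one indicator (t1)
item = {"id": "t4", "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": "puppy, dog, pet"}
solr_doc = add_indicator_field(item, {"t1"})
```

The indicator field then behaves like any other searchable text field in the same index.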
Internals of the Recommender Engine
Looking Inside LucidWorks
What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”
Real-time recommendation query and results: Evaluation
Search-based Recommendations
• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
(original data and meta-data)
– Indicator merchant ids
– Indicator industry (SIC) ids
– Indicator offers
– Indicator text
– Local top40
(derived from co-occurrence and cross-occurrence analysis)
• Sample query (this is the recommendation query)
– Current location
– Recent merchant descriptions
– Recent merchant ids
– Recent SIC codes
– Recent accepted offers
– Local top40
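Assembling such a recommendation query from recent behavior can be sketched as below. Field names are illustrative; a real deployment would issue this through the Solr query API:

```python
def reco_query(recent_merchants, recent_sics, location=None):
    """Build an OR-query over the indicator fields from a user's
    recent behavior (field names are hypothetical)."""
    clauses = []
    clauses += [f"indicator_merchants:{m}" for m in recent_merchants]
    clauses += [f"indicator_sics:{s}" for s in recent_sics]
    if location:
        clauses.append(f"location:{location}")
    return " OR ".join(clauses)

# Recent behavior: two merchants, one SIC code, current location
q = reco_query(["m102", "m87"], ["5812"], location="austin")
```

The search engine's ranking over the indicator fields then does the actual recommendation work.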
For example
• Users enter queries (A)
– (actor = user, item = query)
• Users view videos (B)
– (actor = user, item = video)
• AᵀA gives query recommendation
– "did you mean to ask for"
• BᵀB gives video recommendation
– "you might like these videos"
The punch-line
• BᵀA recommends videos in response to a query
– (isn't that a search engine?)
– (not quite, it doesn't look at content or meta-data)
Real-life example
• Query: "Paco de Lucia"
• Conventional meta-data search results:
– "hombres de paco" times 400
– not much else
• Recommendation-based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users × label clicks
• Remember viewing history
– This gives B = users × items
• Cross recommend
– BᵀA = label-to-item mapping
• After several users click, results are whatever users think they should be
Nice. But can we do better?
A Quick Simplification
• Users who do h (a vector of things a user has done) also do r
• r = Aᵀ(A h) gives user-centric recommendations (A translates things into users; the transpose translates back to things)
• r = (AᵀA) h gives item-centric recommendations (change the order of operations)
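Using the earlier Alice/Bob/Charles data, the two orders of operations give the same answer. A plain-Python sketch (helper names are ours):

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

# A: users x things (rows: Alice, Bob, Charles;
#    columns: apple, puppy, bicycle)
A = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]

h = [1, 0, 0]  # a user who has the apple

# User-centric: A h translates things into users,
# then A^T translates back into things
r_user = matvec(transpose(A), matvec(A, h))

# Item-centric: A^T A is the item-item co-occurrence matrix
At = transpose(A)
AtA = [[sum(a * b for a, b in zip(ri, rj)) for rj in At] for ri in At]
r_item = matvec(AtA, h)
```

With apple already in h, puppy outscores bicycle, matching the earlier slides: r = [2, 1, 0] over (apple, puppy, bicycle).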
Symmetry Gives Cross Recommendations
Conventional recommendations with off-line learning
Cross recommendations
[figure: matrix A, users × things]
[figure: matrix B, users × things of type 1 and type 2]
Bonus Round:
When worse is better
The Real Issues After First Production
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual performance much better
“Made more difference than any other change”
Why Dithering Works
[diagram: log files feed overnight training, which feeds the real-time recommender]
Exploring The Second Page
Simple Dithering Algorithm
• Synthetic score from log rank plus Gaussian
• Pick noise scale to provide desired level of mixing
• Typically
• Also… use floor(t/T) as seed
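The algorithm above can be sketched as follows. Parameter names and the default period are illustrative; the noise scale ε = 2 matches the slide's example:

```python
import math
import random
import time

def dither(ranked_items, epsilon=2.0, period=600, now=None):
    """Re-rank results by a synthetic score: log(rank) plus Gaussian
    noise of scale epsilon. Seeding the RNG with floor(t / T) keeps
    the order stable within a time window but reshuffles it between
    windows, so page refreshes don't flicker."""
    t = time.time() if now is None else now
    rng = random.Random(math.floor(t / period))
    scored = [(math.log(rank) + rng.gauss(0.0, epsilon), item)
              for rank, item in enumerate(ranked_items, start=1)]
    return [item for _, item in sorted(scored)]

# 20 results ranked 1..20, mixed with noise scale epsilon = 2
page = dither(list(range(1, 21)), epsilon=2.0, now=0)
```

Because the score is log(rank), top results stay near the top while deep results occasionally surface, which is exactly the exploration effect described above.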
Example … ε = 2
Lesson: Exploration is good
Part 3: What about that worked example?
Analyze with Map-Reduce
[diagram: complete history is analyzed with co-occurrence (Mahout); the results plus item meta-data go through the Solr indexer into index shards]
Deploy with Conventional Search System
[diagram: the web tier combines user history with Solr search over the index shards; item meta-data is loaded via the Solr indexer]