pferrel
Optimizing discovery and engagement
A FINDERBOTS.COM PRODUCTION
The Guide to Predictive Analytics
DISCOVERY
FINDERBOTS.COM
• Independent Consulting Service
• Specialize in Big-data Predictive Analytics
  • Recommenders
  • Personalized discovery
  • Search optimization and personalization
• Committer to open source machine learning projects (Apache Mahout, Finderbots Solr-recommender)
Pat Ferrel
DISCOVERY:
• Browse
  • editorial categories
  • user generated content: tags, hashtags, comments, likes, shares
  • realtime predictive analytics driven “concepts”
• Search
  • keywords are not enough
  • inferred keywords (from usage data)
  • personalized search (from collaborative filtering data, just like Google)
• Recommendations
  • profile based, content based, usage based
  • entire catalog can be skewed by predictive analytics
  • required
  • why?
Netflix: 80% of views
Amazon: 60% of sales
Yahoo News: 40% increase in time on site (TOS)
Better Discovery = Better Engagement
NOT JUST RECOMMENDATIONS
Pervasive Content Personalization
RECOMMENDATIONS CAN DO WHAT SEARCH CANNOT
• Search for “leather laptop bag”
• Hmm, some are ok but not quite right
• Put some in “wishlist”
• Look at recommendations
• Add and remove as you like… things improve!
• Never knew I wanted a “Messenger bag with a leather strap”
• Didn’t know what one was so would never have searched for it
SEARCH THAT KNOWS WHAT THE USER MEANS
• Search for “leather laptop bag”
• Buy “leather messenger bag with leather strap”
• With the right usage data we can infer “messenger bag” = “laptop bag”
• Now the words I know will get me the object I want, even though I didn’t know how to ask for it
THE CUTTING EDGE IN PREDICTIVE ANALYTICS
• Uses any number of user actions: the entire user clickstream
• Uses metadata—from user profile or item
• Uses context—on-site, time, location
• Uses content—unstructured text or semi-structured
• Personalizes recommendations even when content-based
• Mixes any number of “indicators” to increase quality or tune to specific context
• Solves the “cold-start” problem—items with too short a lifespan
• Can recommend to new users in realtime
• Improves Search
• Personalizes Search
THE GOOD NEWS
• 90% of these features come from 3 technologies
  • Search engine (Solr, Elasticsearch)
  • Mahout
  • Spark
• 90% of the flexibility comes at runtime via query, not from new analytical models.
Technical Overview
THE UNIVERSAL RECOMMENDER
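ARCHITECTURE
[Figure: architecture diagram. Background path: action logging writes action logs to HDFS; Mahout 1.0 spark-itemsimilarity (on Spark) turns the action logs into cooccurrence indicators; content or metadata from a scalable store (HDFS or DB) is turned into content indicators by Mahout 1.0 spark-rowsimilarity, or used as intrinsic indicators; the search engine indexes the indicators. Realtime path: the application, which also handles catalog creation and editing, sends a recs request as a query to the search engine and receives recommendations.]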
ANATOMY OF A RECOMMENDATION
r = recommendations
hp = a user’s history of some primary action (purchase for instance)
P = the history of all users’ primary action; rows are users, columns are items
[PtP] = compares column to column using log-likelihood based cooccurrence
r = hp[PtP]
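A minimal Python sketch of r = hp[PtP] on made-up data. It assumes raw cooccurrence counts stand in for the log-likelihood based comparison the slide describes, and the item names, the matrix values and the helper names `cooccurrence` and `recommend` are all hypothetical.

```python
# Toy data (made up): rows of P are users, columns are items, values are boolean.
ITEMS = ["laptop bag", "messenger bag", "laptop", "phone case"]
P = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 1, 1, 0],  # user 2
    [0, 0, 0, 1],  # user 3
]


def cooccurrence(A, B):
    """Return AtB: entry [i][j] counts users with A-action on i and B-action on j."""
    users = range(len(A))
    return [[sum(A[u][i] * B[u][j] for u in users) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def recommend(h, indicator):
    """Return r = h[indicator], treating h as a row vector over items."""
    return [sum(h[i] * indicator[i][j] for i in range(len(h)))
            for j in range(len(indicator[0]))]


PtP = cooccurrence(P, P)
for i in range(len(PtP)):
    PtP[i][i] = 0  # an item always cooccurs with itself, so ignore the diagonal

hp = [1, 0, 0, 0]  # this user has bought only "laptop bag"
r = recommend(hp, PtP)
ranked = sorted(zip(ITEMS, r), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

In this toy data the buyer of a laptop bag is pointed first at the messenger bag, because two other laptop-bag buyers also bought one.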
THE UNIVERSAL RECOMMENDER
• Virtually all collaborative filtering type recommenders can use only one indicator of preference: one action
• But the theory doesn’t stop there
• Virtually all user actions can be used to improve recommendations: purchase, view, category view…
r = hp[PtP]
r = hp[PtP] + hv[VtP] + hc[CtP] + …
A COOCCURRENCE INDICATOR
• [PtP] is an indicator matrix for some primary action like purchase
  • In P, rows = users, columns = items, boolean data
• Compares cooccurring interactions using the log-likelihood ratio (LLR): column-wise similarity
• LLR finds important cooccurrences and filters out the rest
• Comparing the history of the primary action to other actions finds the secondary actions that lead to the primary; the effect is to scrub the secondary actions of non-meaningful ones
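One way to compute the log-likelihood ratio for a pair of items, sketched in Python. It uses the entropy form of Dunning's G² statistic on a 2x2 contingency table. The function names and the toy matrix are hypothetical, and the cutoff used to "filter out the rest" is left to the reader because the slides do not give one.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of cooccurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding errors


def llr_for_items(P, i, j):
    """Build the 2x2 table for items i and j from a boolean user-by-item matrix."""
    k11 = sum(1 for row in P if row[i] and row[j])      # users with both items
    k12 = sum(1 for row in P if row[i] and not row[j])  # item i only
    k21 = sum(1 for row in P if not row[i] and row[j])  # item j only
    k22 = len(P) - k11 - k12 - k21                      # neither item
    return llr(k11, k12, k21, k22)


# Toy purchase matrix (made up): rows are users, columns are items.
P = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
print(llr_for_items(P, 0, 1))
```

A score of zero means the two items cooccur no more often than chance would predict; larger scores mark the cooccurrences worth keeping as indicators.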
CROSS-COOCCURRENCE INDICATORS
hi = a user’s history of an action
P, V, C = the history of all users for some action (purchase, view, category view)
[PtX] = the pairwise comparison of column to column; the comparison may be across two actions but is always anchored by the primary action
r = hp[PtP] + hv[VtP] + hc[CtP] + …
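A small Python sketch of the two-term case r = hp[PtP] + hv[VtP], on made-up data. It assumes raw cooccurrence counts in place of LLR-filtered indicators, and the matrices and helper names are hypothetical. The user in the example has viewed one item and bought nothing, so the purchase term alone would return no signal.

```python
def cooccurrence(A, B):
    """Return AtB: entry [i][j] counts users with A-action on i and B-action on j."""
    users = range(len(A))
    return [[sum(A[u][i] * B[u][j] for u in users) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def apply_history(h, indicator):
    """Return h[indicator], treating h as a row vector."""
    return [sum(h[i] * indicator[i][j] for i in range(len(h)))
            for j in range(len(indicator[0]))]


# Toy data (made up): 4 users, 3 items. P = purchases, V = views.
P = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
V = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]]

PtP = cooccurrence(P, P)  # purchase vs purchase
VtP = cooccurrence(V, P)  # view vs purchase, anchored by the primary action

hp = [0, 0, 0]  # this user has bought nothing yet
hv = [1, 0, 0]  # but has viewed item 0

r = [a + b for a, b in zip(apply_history(hp, PtP), apply_history(hv, VtP))]
print(r)
```

Here the view history is enough to rank item 1 highest, because most users who viewed item 0 went on to buy item 1.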
CROSS-COOCCURRENCE SO WHAT?
• The user’s entire clickstream can be used
  • Items clicked
  • Terms searched
  • Categories viewed
  • Items shared
  • People followed
  • Items liked or disliked
  • Video watched
• Virtually any action the user can take makes it easier to predict what they will like in the future.
FROM INDICATOR TO RECOMMENDATION
r = hp[PtP]
• This actually means to take the user’s history hp and compare it to rows of the indicator matrix [PtP]
• TF-IDF weighting of indicators would be nice, to reduce the weight of very popular items
• Query the indicator with user history
• Sort the matching rows by similarity strength and keep only the highest: you have recommendations
• Sound familiar?
• That is exactly what a search engine does, except for calculating indicators
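The search-engine view can be sketched in a few lines of Python. This is an illustration only: the indicator "documents", the item IDs and the simplified IDF formula are assumptions, and a real engine such as Solr or Elasticsearch applies its own scoring.

```python
import math

# Hypothetical index: one "document" per item, whose field lists the item IDs
# kept as indicators for it (for example after LLR filtering).
INDICATORS = {
    "messenger-bag": ["laptop-bag", "laptop"],
    "laptop-sleeve": ["laptop"],
    "phone-case": ["phone", "laptop"],
    "charger": ["phone"],
}


def idf(term):
    """Simplified inverse document frequency: rarer indicator terms weigh more."""
    df = sum(term in terms for terms in INDICATORS.values())
    return math.log(1 + len(INDICATORS) / df) if df else 0.0


def query(history, top_n=2):
    """Score each item by the IDF-weighted overlap between its indicators and the history."""
    scores = {}
    for item, terms in INDICATORS.items():
        if item in history:
            continue  # do not recommend what the user already has
        score = sum(idf(term) for term in history if term in terms)
        if score > 0:
            scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(query(["laptop-bag", "laptop"]))
```

The popular term "laptop" matches three documents and so contributes less than the rarer "laptop-bag", which is the mitigation of popular items the TF-IDF bullet asks for.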
INDICATOR TYPES
• Cooccurrence and cross-cooccurrence
  • Calculated from user actions as discussed
  • Create with Mahout 1.0 spark-itemsimilarity
• Content or metadata
  • Tags, categories, description text, anything describing an item
  • Create with Mahout 1.0 spark-rowsimilarity
• Intrinsic
  • Tags, genres, categories, popularity rank, geo-location, anything describing an item
  • Some may be derived from usage data like popularity rank, or hotness
  • Is a known or specially calculated property of the item
CONTENT INDICATORS
• Finds similar items based on their content, not which users preferred them
• Examples: text descriptions, tags, categories, genres
r = recommended items, based on tags
ht = a user’s history of an action on items with tags
[TTt] = item similarity based on similar tags—a content indicator
• This personalizes even content based recommendations
r = ht[TTt]
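A toy Python version of r = ht[TTt]. It assumes T is a boolean item-by-tag matrix and uses the raw count of shared tags as the similarity, which is a stand-in for whatever measure the real rowsimilarity job applies. The items, tags and names below are made up.

```python
ITEMS = ["messenger bag", "laptop bag", "leather wallet", "laptop stand"]
TAGS = ["leather", "bag", "laptop"]

# T: rows are items, columns are tags (boolean).
T = [
    [1, 1, 0],  # messenger bag: leather, bag
    [0, 1, 1],  # laptop bag: bag, laptop
    [1, 0, 0],  # leather wallet: leather
    [0, 0, 1],  # laptop stand: laptop
]


def tag_similarity(T):
    """Return TTt: entry [i][j] is the number of tags items i and j share."""
    n = len(T)
    return [[sum(a * b for a, b in zip(T[i], T[j])) for j in range(n)]
            for i in range(n)]


TTt = tag_similarity(T)
ht = [1, 0, 0, 0]  # the user acted on "messenger bag" only

r = [sum(ht[i] * TTt[i][j] for i in range(len(ht))) for j in range(len(TTt))]

# Drop items already in the user's history and items with no shared tags.
recs = [(ITEMS[j], r[j]) for j in range(len(r)) if not ht[j] and r[j] > 0]
print(recs)
```

The result depends on this user's own history, which is how a content-based indicator still yields personalized recommendations.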
INTRINSIC INDICATORS
• Attributes of items
  • Genre, subject, category, tags
• Specially calculated based on business rules
  • Popularity, hotness
• Based on demographics
  • Preferred by people using mobile access
  • Preferred by city dwellers
  • Preferred by people in warmer climes
• Query by value, not user history
r = v*I
THE UNIVERSAL RECOMMENDER
“Unified” means one query on all indicators at once
Unified query:
  query: users-history-of-purchases; field: purchase
  query: users-history-of-views; field: view
  query: users-history-of-categories-viewed; field: category
  query: users-history-of-purchases; field: tags
  query: users-location; field: geo-location-preferred
  …
r = hp[PtP] + hv[VtP] + hc[CtP] + ht[TTt] + l*L …
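One possible shape for the unified query, sketched in Python as an Elasticsearch-style request body with one `should` clause per indicator field. The field names follow the slide. The `unified_query` helper, the keys of the `user` dict and the choice of `terms` clauses are assumptions, and a real deployment would pick clause types and boosts to suit its engine and schema.

```python
import json


def unified_query(user, size=10):
    """Build one request body that queries every indicator field at once."""
    fields = {
        "purchase": user.get("purchases", []),
        "view": user.get("views", []),
        "category": user.get("categories_viewed", []),
        "tags": user.get("purchases", []),  # purchase history against the tags indicator
        "geo-location-preferred": user.get("location", []),
    }
    # Skip indicators for which this user has no history.
    should = [{"terms": {field: values}} for field, values in fields.items() if values]
    return {"size": size, "query": {"bool": {"should": should}}}


# Hypothetical user record.
user = {
    "purchases": ["laptop-bag"],
    "views": ["messenger-bag", "laptop-sleeve"],
    "location": ["seattle"],
}
body = unified_query(user)
print(json.dumps(body, indent=2))
```

Each clause corresponds to one term of the sum in the equation, and leaving a clause out at query time drops that indicator without retraining anything.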
ONE OR MANY
• One query: one trip to one scalable search engine
• Many flavors: customize in the query
  • Customize for content context
  • Customize for user context
    • Profile, location, time, …
  • Customize for special indicators
    • Trending, hot, new, popular
• All personalized
POLISH THE APPLE
• Auto-optimize via explore-exploit (important): randomize some returned recs; if they are acted upon they become part of the new training data and are more likely to be recommended in the future
• Visibility control:
  • Don’t show duplicates, or show duplicates at some rate
  • Filter items the user has already seen
• Generate some intrinsic indicators like hotness, popularity; this helps solve the “cold-start” problem
• Asymmetric train vs query management: for instance, query with the most recent actions, train on all ingested actions
• On-demand cross-validation scoring for tuning purposes
• A/B testing integration with explore-exploit
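The explore-exploit bullet could be implemented in many ways. The Python sketch below assumes a simple epsilon-greedy scheme, which the slides do not name: each slot in the ranked list is swapped, with probability `epsilon`, for a random item from the rest of the catalog. The function name and parameters are hypothetical.

```python
import random


def explore_exploit(ranked, catalog, epsilon=0.2, rng=None):
    """Replace each recommendation slot with a random unseen item with probability epsilon."""
    rng = rng or random.Random()
    result = list(ranked)
    # Candidates for exploration: catalog items not already recommended.
    pool = [item for item in catalog if item not in ranked]
    for slot in range(len(result)):
        if pool and rng.random() < epsilon:
            result[slot] = pool.pop(rng.randrange(len(pool)))
    return result


ranked = ["messenger-bag", "laptop-sleeve", "charger"]
catalog = ranked + ["desk-lamp", "notebook", "backpack", "usb-hub"]
print(explore_exploit(ranked, catalog, epsilon=0.3, rng=random.Random(7)))
```

Any explored item the user acts on shows up in the next round of action logs, so it can earn a place in the indicators on its own merits.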