
Optimizing Online Yield via Predictive Modeling

of Individual Site Visitors

Magnify360 Liaisons:

Olivier Chaine, Jim Healy, Nate Pool,

Gilles ?????

David Lapayowker, Marissa Quitt

Elaine Shaver (PM), Devin Smith

HMC Advisor:

Zachary Dodds

Magnify360

Designs multiple versions of a website for each client, with each version customized to meet the needs of a different type of user.

Analyzes clickstream data from site visitors in order to serve each one the version that will best suit them.

The result: a larger set of users convert than with any single page.

(Screenshots: old Facebook vs. new Facebook)

System Overview

(Dataflow diagram: a user navigates to a site and sends clickstream data to our system. An online classifier assigns the user to a group (musician, pachyphile, bioengineer, Pasadena resident, insomniac, visitor@gmail.com, ...) and chooses a page to serve, producing tailored interactions and, ideally, a "conversion". The results (user data, pages served, conversion data) feed an offline analysis stage that clusters users.)

Problem Statement

(The System Overview dataflow diagram is repeated here.)

Detailed problem statement here

Clickstream Data

(example columns…)

Database

80 tables 110,000,000 rows 13 GB

ethics ~ anonymous ~ no purchased data!

User Profiles

A profile is a binary attribute that captures a specific combination of data values.

Currently 42 of them, hand-specified

insomniac something something

Tradeoffs:
+ captures experienced intuition about what is important
+ takes advantage of Magnify360's site-design expertise
- binary attributes
- may miss patterns not captured by the user profiles

from Mag360's site
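As a sketch of the idea (the attribute names here are illustrative, not Magnify360's actual 42 hand-specified profiles), a visitor can be encoded as a binary vector over the profile attributes:

```python
# Illustrative profile attributes; the real system has 42 hand-specified ones.
PROFILES = ["musician", "pachyphile", "bioengineer",
            "pasadena_resident", "insomniac"]

def encode(visitor_profiles):
    """Encode a visitor as a 0/1 vector over the profile attributes."""
    return [1 if p in visitor_profiles else 0 for p in PROFILES]

encode({"musician", "insomniac"})  # -> [1, 0, 0, 0, 1]
```

Vectors like these are what the clustering algorithms operate on ("Visitors ~ vectors of profile attributes").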

Conversion Data

The site yield, or conversion, is client-specified:

Amount of transaction(s) (e.g. 3% conversion)

Time spent on (a part of) the site

Contact information (presence and/or time of an email address)

Goal: to determine the clusters of visitors that will be best served by (will convert via) a particular version of a client site.

Offline analysis ~ user clustering

Visitors ~ vectors of profile attributes

hand-tuned clusters
one big cluster ~ "best page"
decision-tree clustering
fuzzy k-means clustering
hierarchical clustering
growing neural gas
support vector machines


Support vector machine example

Can we get one of the real data pages?

This cluster of six people responds better to site B:

Page: A, Yield: 7   Page: A, Yield: 1   Page: A, Yield: 1
Page: B, Yield: 3   Page: B, Yield: 8   Page: B, Yield: 7

page A score = (7 + 1 + 1) / 3 visits ~ 3.0
page B score = (3 + 8 + 7) / 3 visits ~ 6.0

From clusters to sites

Training data from each cluster determines the best site (average yield per page).

Magnify360 wants to adapt quickly to new preferences: site B has the better overall average, but site A has had better recent performance.

Page: A, Yield: 7, t: 0   Page: A, Yield: 1, t: 3   Page: A, Yield: 1, t: 4
Page: B, Yield: 3, t: 1   Page: B, Yield: 8, t: 5   Page: B, Yield: 7, t: 4

page A score = (2^0·7 + 2^-3·1 + 2^-4·1) / (2^0 + 2^-3 + 2^-4) ~ 6.05
page B score = (2^-1·3 + 2^-5·8 + 2^-4·7) / (2^-1 + 2^-5 + 2^-4) ~ 3.68

t ~ age of data

Time-based site choice

Time-weighted average yields:

procedure
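The time-weighted scoring can be sketched as follows (a minimal sketch: the decay base 2 matches the worked example, but the real procedure may use a different weighting):

```python
def page_score(observations, decay=2.0):
    """Time-weighted average yield for one page.

    observations: (yield, t) pairs, where t is the age of the data
    point (t = 0 is the most recent visit).  Each observation is
    weighted by decay**-t, so newer data dominates the average.
    """
    num = sum(y * decay ** -t for y, t in observations)
    den = sum(decay ** -t for _, t in observations)
    return num / den

# The slide's example:
page_score([(7, 0), (1, 3), (1, 4)])  # page A, ~ 6.05
page_score([(3, 1), (8, 5), (7, 4)])  # page B, ~ 3.68
```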

Online classification

Possible results…

all on one graph

Results ~ Packet 8

comments

what about hand-tuned system results?

talk about SVM parameters here?

A closer look…

comments

Sensitivity to scoring parameters?

comments

David's charts

Software structure

comments

Diagram

What's done and not done…


Perspective

Concluding comments

Questions?

Clickstream Data

The Good: We have DATA!
The Bad: Too much?
The Ugly: What is this data!?

~ 80 tables

~ 13 GB

One of our tables…

ID, anyone?

Fun Statistics

Data: To do

Understand the purpose of each table / column

Understand relationships between tables

Create a single table (or file) of relevant information in order to test and evaluate our clustering algorithms.

(table demodularization, against all design principles)

Clustering Algorithms

k-Means: Choose centroids at random, and assign points to clusters such that distances inside clusters are minimized. Recalculate centroids and repeat until a steady state is reached.

Fuzzy k-Means: Similar, but every data point belongs to each cluster to some degree, not just in or out.

Hierarchical Clustering: Uses a bottom-up approach to bring together points and clusters that are close together.

Bottom line: These clustering algorithms are simple and effective techniques for categorizing data, but they cannot exist in a vacuum; we are investigating other techniques that may be used in parallel or instead.
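For concreteness, the k-means procedure described above can be sketched in a few lines (a toy implementation over plain lists; in practice a library implementation would be used):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means over lists of equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # Recompute centroids as cluster means (keep old centroid if empty).
        new = [[sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # steady state reached
            break
        centroids = new
    return centroids, clusters

pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, clusters = kmeans(pts, 2)  # two well-separated clusters
```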

FuzME's best 10-cluster results ~ synthetic data


Growing Neural Gas

A clustering algorithm masquerading as a neural network. Given a data distribution, it dynamically determines nodes or “centroids” to represent the data.

“Dynamic” because it adds or deletes nodes as necessary, as well as adapting nodes toward changes in the data.

User Profiles

Representative Nodes

How it works…

Given some input x:

Find the closest node, s, and the next closest, t.
Update the error of s by εw|s – x|.
Shift s and its neighbors toward x, and increment the age of all of s's edges.
If s and t are adjacent, set the age of that edge to 0; otherwise, create that edge.
Remove edges that are too old; decrease the error of all nodes by a small amount.
Every λ generations, add a node between the node with the largest error and its largest-error neighbor.
Repeat!

A Few Parameters…

λ: Controls how frequently new nodes are inserted
Max Edge Age: Dictates how often old edges are deleted
εw: Factor to scale the movement of the “winning” node
εn: Factor to scale the movement of the winner's neighbors
α: Scale factor for decreasing the error of parent nodes
β: Scale factor for decreasing the error of all nodes
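The procedure and parameters above can be sketched as follows (a simplified toy version: it omits the node-removal step, accumulates plain distance as error, works on plain lists of vectors, and all parameter defaults are illustrative, not tuned):

```python
import math
import random

def gng(data, max_nodes=20, lam=100, eps_w=0.2, eps_n=0.006,
        alpha=0.5, beta=0.995, max_age=50, steps=5000, seed=0):
    """Simplified Growing Neural Gas: returns representative nodes."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)), list(rng.choice(data))]
    error = [0.0, 0.0]
    age = {}  # (i, j) with i < j  ->  edge age

    def edge(i, j):
        return (i, j) if i < j else (j, i)

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Closest node s and next closest t.
        order = sorted(range(len(nodes)), key=lambda i: dist2(nodes[i], x))
        s, t = order[0], order[1]
        # Accumulate the error of s.
        error[s] += math.sqrt(dist2(nodes[s], x))
        # Shift s (and, below, its neighbors) toward x; age s's edges.
        for d in range(len(x)):
            nodes[s][d] += eps_w * (x[d] - nodes[s][d])
        for e in list(age):
            if s in e:
                age[e] += 1
                n = e[0] if e[1] == s else e[1]
                for d in range(len(x)):
                    nodes[n][d] += eps_n * (x[d] - nodes[n][d])
        # Refresh (or create) the s-t edge; drop edges that are too old.
        age[edge(s, t)] = 0
        for e in [e for e, a in age.items() if a > max_age]:
            del age[e]
        # Every lam steps, insert a node between the largest-error node
        # and its largest-error neighbor.
        if step % lam == 0 and len(nodes) < max_nodes:
            q = max(range(len(nodes)), key=lambda i: error[i])
            nbrs = [e[0] if e[1] == q else e[1] for e in age if q in e]
            if nbrs:
                f = max(nbrs, key=lambda i: error[i])
                r = len(nodes)
                nodes.append([(a + b) / 2 for a, b in zip(nodes[q], nodes[f])])
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
                del age[edge(q, f)]
                age[edge(q, r)] = 0
                age[edge(f, r)] = 0
        # Decay all node errors.
        error = [e * beta for e in error]
    return nodes
```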

(Making sense of the GUI)

… and the difference they make.

λ = 1000 vs. λ = 100

• Larger λ: nodes inserted less often; takes longer, but yields more accurate placement of nodes
• Smaller λ: nodes inserted more often; leaves straggler nodes that don’t accurately match the data

Support Vector Machines

Clearly planar

Planar in feature space

Support Vector Regression (SVR)

Goal: minimize the error between the hyperplane and the data points.

SVM: maximize cluster separation
SVR: minimize plane-to-data distance

Getting the correct page…

What do we want from a technique?

CLASSIFICATION: Input: user data. Output: page to serve.

REGRESSION: Input: user data and a possible page. Output: predicted success.

Both require multiple SVMs.

Using Classification via SVMs

(Diagram: user data feeds several SVM classifiers, whose votes (C, B, C) give the predicted page: C.)
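One plausible reading of the classification diagram is a majority vote: several SVMs each name a page, and the most common answer wins. A sketch with stand-in classifiers (not trained SVMs):

```python
from collections import Counter

def classify_page(user, classifiers):
    """Majority vote over several page classifiers.

    classifiers: callables mapping user data to a page label; in the
    real system each would be a trained SVM.
    """
    votes = [clf(user) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Stand-ins echoing the diagram's votes C, B, C:
clfs = [lambda u: "C", lambda u: "B", lambda u: "C"]
classify_page({"musician": 1}, clfs)  # -> "C"
```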

Using Regression via SVRs

(Diagram: user data feeds a per-page predictor: Page A scores 0.42, Page B scores 0.24, Page C scores 0.78. The highest score gives the predicted page: C.)
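The regression approach reduces page choice to an argmax over per-page predictions. A sketch with hypothetical predictors standing in for trained SVRs:

```python
def choose_page(user, predictors):
    """Serve the page whose predictor forecasts the highest success.

    predictors: dict mapping page name -> callable(user) -> predicted
    success; in the real system each callable would be a trained SVR.
    """
    return max(predictors, key=lambda page: predictors[page](user))

# Hypothetical predictors echoing the diagram's scores:
predictors = {"A": lambda u: 0.42, "B": lambda u: 0.24, "C": lambda u: 0.78}
choose_page({"musician": 1}, predictors)  # -> "C"
```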


Goal Breakdown

Short-term Plan

Plan for Algorithm Comparison


Schedule and Conclusion

Friday, November 14: Prototype algorithm comparison method

Friday, November 21: Initial testing on real data; meeting with Magnify360

Friday, December 5: Initial composition of classification algorithms

Friday, December 12: Midyear Report

Questions?


SVM vs SVR

SVM: maximize distance
SVR: minimize distance

Data

The Bad, or, The Challenges:

Lots of SQL data

Some Data Tables

80 tables total…

Data Size

Problem Statement

Officially: Develop an innovative predictive modeling system to predict shopping cart abandonment based on profiles, clusters, and shopping cart contents. (GRAB exact wording from email!)

Most importantly: Research and implement various AI techniques to optimize the process of matching users with websites.

Individualized Online Experiences

Classifying Users

Unsupervised clustering: points are clustered without knowledge of the results

Supervised clustering: clusters are built using prior knowledge of the results

Ethical concerns?

Recap: What Magnify360 Does

Individualize a website for different types of users

Collect data on users from their clickstream, and give them the site that will appeal to them best

Appeal to a larger base of users by making the site more interesting to a larger group

serving both! (old Facebook, new Facebook)