Using machine learning to determine drivers of bounce and conversion (Velocity 2016, Santa Clara)


Page 1: Using machine learning to determine drivers of bounce and conversion

Using machine learning to determine drivers of bounce and conversion

Velocity 2016 Santa Clara

Page 2: Using machine learning to determine drivers of bounce and conversion

Pat Meenan (@patmeenan)

Tammy Everts (@tameverts)

Page 3: Using machine learning to determine drivers of bounce and conversion

What we did (and why we did it)

Page 4: Using machine learning to determine drivers of bounce and conversion

Get the code

https://github.com/WPO-Foundation/beacon-ml

Page 5: Using machine learning to determine drivers of bounce and conversion

Deep learning

weights

Page 6: Using machine learning to determine drivers of bounce and conversion

Random forest

Lots of random decision trees

Page 7: Using machine learning to determine drivers of bounce and conversion

Vectorizing the data

• Everything needs to be numeric
• Strings converted to several inputs as yes/no (1/0)
• e.g. Device manufacturer
• “Apple” would be a discrete input
• Watch out for input explosion (UA String)
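
A minimal sketch of the yes/no conversion described above, using only the standard library. The rows and field names (`manufacturer`, `dom_ready_ms`) are hypothetical and are not taken from the beacon-ml code.

```python
def one_hot_columns(rows, field):
    """Return the sorted distinct values of `field`, one per output column."""
    return sorted({row[field] for row in rows})


def vectorize(rows, field, numeric_fields):
    """Turn each row into a flat list of numbers.

    Numeric fields are copied as floats; the string field becomes one
    1/0 input per distinct value seen in the data.
    """
    categories = one_hot_columns(rows, field)
    header = list(numeric_fields) + [f"{field}={c}" for c in categories]
    vectors = []
    for row in rows:
        numeric = [float(row[name]) for name in numeric_fields]
        flags = [1.0 if row[field] == c else 0.0 for c in categories]
        vectors.append(numeric + flags)
    return header, vectors


# Hypothetical beacon rows; the field names are illustrative only.
rows = [
    {"manufacturer": "Apple", "dom_ready_ms": 1200},
    {"manufacturer": "Samsung", "dom_ready_ms": 2100},
    {"manufacturer": "Apple", "dom_ready_ms": 900},
]
header, vectors = vectorize(rows, "manufacturer", ["dom_ready_ms"])
print(header)
print(vectors[0])
```

Every distinct string adds a column, which is why a high-cardinality field such as the full UA string makes the input count explode.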

Page 8: Using machine learning to determine drivers of bounce and conversion

Balancing the data

• 3% conversion rate
• 97% accurate by always guessing no
• Subsample the data for 50/50 mix
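
One way the 50/50 subsampling could look, as a sketch on synthetic data. The `balance` helper is an assumption for illustration and may differ from how the talk's code does it.

```python
import random


def balance(samples, labels, seed=0):
    """Subsample the larger class so both classes have equal counts."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(keep)  # avoid all positives coming first
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Synthetic data with a 3% positive rate, matching the slide's example rate.
labels = [1] * 3 + [0] * 97
samples = [[float(i)] for i in range(100)]
x_bal, y_bal = balance(samples, labels)
print(len(y_bal), sum(y_bal))
```

The cost is that most of the majority-class rows are thrown away, so this works best when there is plenty of data to begin with.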

Page 9: Using machine learning to determine drivers of bounce and conversion

Validation data

• Train on 80% of the data
• Validate on 20% to prevent overfitting
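
A sketch of an 80/20 split with the standard library. The function name `train_val_split` and its defaults are assumptions for illustration; the variable names mirror the `x_train` / `x_val` names used on the later slides.

```python
import random


def train_val_split(samples, labels, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out `val_fraction` of the rows for validation."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    n_val = round(len(order) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_val = [samples[i] for i in val_idx]
    y_val = [labels[i] for i in val_idx]
    return x_train, y_train, x_val, y_val


# Synthetic rows: the label is the parity of the single feature.
samples = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, y_train, x_val, y_val = train_val_split(samples, labels)
print(len(x_train), len(x_val))
```

A model that scores well on the training rows but poorly on the held-out rows has memorized the data instead of learning from it.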

Page 10: Using machine learning to determine drivers of bounce and conversion

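Smoothing the data

ML works best on normally distributed data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)  # learn mean and std on the training set only
x_val = scaler.transform(x_val)  # reuse the training statistics on the validation set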
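
To make the scaling step concrete, here is a hand-rolled sketch of the per-column arithmetic behind standardization: subtract the training mean and divide by the training standard deviation. The helper names are hypothetical. Note that this shifts and rescales a column but does not change the shape of its distribution.

```python
import math


def fit_scaler(column):
    """Return (mean, std) of a column, using the population std."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance) or 1.0  # constant columns would divide by zero
    return mean, std


def transform(column, mean, std):
    """Apply z = (value - mean) / std with previously fitted statistics."""
    return [(v - mean) / std for v in column]


# Toy columns: statistics come from the training data only.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
val = [3.0, 6.0]
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
val_scaled = transform(val, mean, std)
print(mean, std)
```

Fitting on the training rows and only transforming the validation rows keeps information from the held-out set out of the model.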

Page 11: Using machine learning to determine drivers of bounce and conversion

Input/output relationships

• SSL highly correlated with conversions
• Long sessions highly correlated with not bouncing
• Remove correlated features from training
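
A possible way to flag such features, sketched with a plain Pearson correlation between each input column and the label. The 0.9 threshold, the helper names, and the toy numbers are all assumptions, not values from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def leaky_features(columns, labels, threshold=0.9):
    """Names of columns whose |correlation| with the label meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Toy data: session length almost determines the bounce label by itself.
columns = {
    "session_length": [1, 5, 1, 7, 1, 6],
    "dom_ready_ms": [900, 1500, 1400, 1000, 1200, 1100],
}
bounced = [1, 0, 1, 0, 1, 0]
to_drop = leaky_features(columns, bounced)
print(to_drop)
```

Leaving such a feature in lets the model lean on it alone, which hides the inputs that a site owner can actually change.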

Page 12: Using machine learning to determine drivers of bounce and conversion

Training deep learning

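from keras.models import Sequential

model = Sequential()
model.add(...)  # layers elided on the slide
model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"])
# nb_epoch is the Keras 1.x argument name; Keras 2 and later call it epochs
model.fit(x_train, y_train,
          nb_epoch=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)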

Page 13: Using machine learning to determine drivers of bounce and conversion

Training random forest

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             criterion='gini',
                             max_depth=None,
                             min_samples_split=2,
                             min_samples_leaf=1,
                             min_weight_fraction_leaf=0.0,
                             max_features='sqrt',  # 'auto' on the slide; same behavior for classifiers, removed in newer scikit-learn
                             max_leaf_nodes=None,
                             bootstrap=True,
                             oob_score=False,
                             n_jobs=12,  # number of parallel workers
                             random_state=None,
                             verbose=2,
                             warm_start=False,
                             class_weight=None)
clf.fit(x_train, y_train)

Page 14: Using machine learning to determine drivers of bounce and conversion

Feature importances

clf.feature_importances_
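
`clf.feature_importances_` is an array with one score per input column, in column order, so it has to be paired with the column names to be readable. A small sketch of that pairing follows; the `rank_features` helper is hypothetical, and the names and scores are placeholders, not results from the talk.

```python
def rank_features(names, importances):
    """Pair each feature name with its importance, highest first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


# Placeholder values standing in for clf.feature_importances_.
names = ["feature_a", "feature_b", "feature_c"]
importances = [0.5, 0.1, 0.4]
ranked = rank_features(names, importances)
for rank, (name, score) in enumerate(ranked, start=1):
    print(rank, name, score)
```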

Page 15: Using machine learning to determine drivers of bounce and conversion

What we learned

Page 16: Using machine learning to determine drivers of bounce and conversion

What’s in our beacon?

• Top-level – domain, timestamp, SSL
• Session – start time, length (in pages), total load time
• User agent – browser, OS, mobile ISP
• Geo – country, city, organization, ISP, network speed
• Bandwidth
• Timers – base, custom, user-defined
• Custom metrics
• HTTP headers
• Etc.

Page 17: Using machine learning to determine drivers of bounce and conversion

Conversion rate

Page 18: Using machine learning to determine drivers of bounce and conversion

Conversion rate

Page 19: Using machine learning to determine drivers of bounce and conversion

Bounce rate

Page 20: Using machine learning to determine drivers of bounce and conversion

Bounce rate

Page 21: Using machine learning to determine drivers of bounce and conversion

Finding 1

Number of scripts was a predictor… but not in the way we expected

Page 22: Using machine learning to determine drivers of bounce and conversion

Number of scripts per page (median)

Page 23: Using machine learning to determine drivers of bounce and conversion

Finding 2

When entire sessions were more complex, they converted less

Page 24: Using machine learning to determine drivers of bounce and conversion

Finding 3

Sessions that converted had 38% fewer images than sessions that didn’t

Page 25: Using machine learning to determine drivers of bounce and conversion

Number of images per page (median)

Page 26: Using machine learning to determine drivers of bounce and conversion

Finding 4

DOM ready was the greatest indicator of bounce rate

Page 27: Using machine learning to determine drivers of bounce and conversion

DOM ready (median)

Page 28: Using machine learning to determine drivers of bounce and conversion

Finding 5

Full load time was the second greatest indicator of bounce rate

Page 29: Using machine learning to determine drivers of bounce and conversion

timers_loaded (median)

Page 30: Using machine learning to determine drivers of bounce and conversion

Finding 6

Mobile-related measurements weren’t meaningful predictors of conversions

Page 31: Using machine learning to determine drivers of bounce and conversion

Conversions

Page 32: Using machine learning to determine drivers of bounce and conversion

Finding 7

Some conventional metrics were (almost) meaningless, too

Page 33: Using machine learning to determine drivers of bounce and conversion

Feature Importance (out of 93)

DNS lookup 79
Start render 69

Page 34: Using machine learning to determine drivers of bounce and conversion

Takeaways

Page 35: Using machine learning to determine drivers of bounce and conversion

1. YMMV
2. Do this with your own data
3. Gather your RUM data
4. Run the machine learning against it

Page 36: Using machine learning to determine drivers of bounce and conversion

Thanks!