Using machine learning to determine drivers of bounce and conversion (part 2) Velocity 2016 New York


Page 1: Using machine learning to determine drivers of bounce and conversion (part 2)

Using machine learning to determine drivers of bounce and conversion (part 2)

Velocity 2016 New York

Page 2: Using machine learning to determine drivers of bounce and conversion (part 2)

Pat Meenan (@patmeenan)

Tammy Everts (@tameverts)

Page 3: Using machine learning to determine drivers of bounce and conversion (part 2)

What we did (and why we did it)

Page 4: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 5: Using machine learning to determine drivers of bounce and conversion (part 2)

Get the code
https://github.com/WPO-Foundation/beacon-ml

Page 6: Using machine learning to determine drivers of bounce and conversion (part 2)

Deep learning

weights

Page 7: Using machine learning to determine drivers of bounce and conversion (part 2)

Random forest
Lots of random decision trees

Page 8: Using machine learning to determine drivers of bounce and conversion (part 2)

Vectorizing the data
• Everything needs to be numeric
• Strings converted to several inputs as yes/no (1/0)
  – e.g. Device manufacturer
  – “Apple” would be a discrete input
• Watch out for input explosion (UA String)
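A minimal sketch of the yes/no encoding described above, assuming each beacon is a dict of string fields. The `one_hot` helper, the `manufacturer` field name, and the sample values are illustrative and are not taken from the beacon-ml code.

```python
def one_hot(rows, field):
    """Expand one string field into a 1/0 column per distinct value."""
    # Sorted so that the column order is stable between runs.
    values = sorted({row[field] for row in rows})
    columns = [f"{field}={value}" for value in values]
    vectors = [[1 if row[field] == value else 0 for value in values]
               for row in rows]
    return columns, vectors


# Hypothetical beacons; real data would carry many more fields.
beacons = [
    {"manufacturer": "Apple"},
    {"manufacturer": "Samsung"},
    {"manufacturer": "Apple"},
]
columns, vectors = one_hot(beacons, "manufacturer")
print(columns)
print(vectors)
```

Every distinct value becomes its own input, so a field with thousands of distinct values (a raw UA string, for example) would add thousands of columns. That is the input explosion the slide warns about.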

Page 9: Using machine learning to determine drivers of bounce and conversion (part 2)

Balancing the data
• 3% conversion rate
• 97% accurate by always guessing no
• Subsample the data for 50/50 mix
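The subsampling step could look roughly like this. It is a sketch on synthetic data, assuming labels are 1 (converted) and 0 (did not convert); the `balance` helper and the fixed seed are illustrative choices, not the authors' implementation.

```python
import random


def balance(rows, labels, seed=0):
    """Subsample the larger class so both classes have equal counts."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    size = min(len(positives), len(negatives))
    keep = rng.sample(positives, size) + rng.sample(negatives, size)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic sessions with a 3% conversion rate.
labels = [1] * 30 + [0] * 970
rows = [[i] for i in range(1000)]

# Always guessing "no" is right 97% of the time on the raw data.
baseline = sum(1 for y in labels if y == 0) / len(labels)
print(baseline)

x_bal, y_bal = balance(rows, labels)
print(len(y_bal), sum(y_bal))
```

After balancing, a model that always guesses "no" scores only 50%, so any accuracy above that reflects something learned from the inputs.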

Page 10: Using machine learning to determine drivers of bounce and conversion (part 2)

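Smoothing the data
ML works best on normally distributed data

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then reuse it for validation.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)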

Page 11: Using machine learning to determine drivers of bounce and conversion (part 2)

Validation data
• Train on 80% of the data
• Validate on 20% to prevent overfitting
  – Training accuracy from validation set
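A minimal sketch of the 80/20 split, written in plain Python so it stands alone. The `split_train_val` helper and the toy data are assumptions for illustration; scikit-learn's `train_test_split` does the same job.

```python
import random


def split_train_val(rows, labels, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out val_fraction of the rows for validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    x_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_val = [rows[i] for i in val_idx]
    y_val = [labels[i] for i in val_idx]
    return x_train, y_train, x_val, y_val


# Toy data: 100 rows with alternating labels.
rows = [[i] for i in range(100)]
labels = [i % 2 for i in range(100)]
x_train, y_train, x_val, y_val = split_train_val(rows, labels)
print(len(x_train), len(x_val))
```

The validation rows are never shown to the model during training, so accuracy measured on them indicates how well it generalizes rather than how well it memorized.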

Page 12: Using machine learning to determine drivers of bounce and conversion (part 2)

Input/output relationships

• SSL highly correlated with conversions
• Long sessions highly correlated with not bouncing
• Remove correlated features from training
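One way to read the last bullet is that inputs which nearly restate the output should be dropped before training. The sketch below flags such inputs with a Pearson correlation check. This reading, the `leaky_features` helper, the 0.9 threshold, and the toy numbers are all assumptions rather than the authors' procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # A constant column carries no linear signal.
    return cov / math.sqrt(var_x * var_y)


def leaky_features(columns, labels, threshold=0.9):
    """Names of columns whose |correlation| with the label exceeds threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) > threshold]


# Toy data: session length (in pages) tracks "did not bounce" almost exactly.
not_bounced = [0, 0, 0, 1, 1, 1]
columns = {
    "session_length": [1, 1, 1, 4, 6, 5],
    "dom_ready_ms": [900, 2500, 1200, 1100, 2300, 1000],
}
flagged = leaky_features(columns, not_bounced)
print(flagged)
```

A bounce is a one-page session, so session length gives the answer away; leaving it in would hide the performance metrics the model is meant to surface.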

Page 13: Using machine learning to determine drivers of bounce and conversion (part 2)

Training random forest

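from sklearn.ensemble import RandomForestClassifier

# The slide passed max_features='auto', which meant sqrt for classifiers
# and was removed in later scikit-learn releases.
clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             criterion='gini',
                             max_depth=None,
                             min_samples_split=2,
                             min_samples_leaf=1,
                             min_weight_fraction_leaf=0.0,
                             max_features='sqrt',
                             max_leaf_nodes=None,
                             bootstrap=True,
                             oob_score=False,
                             n_jobs=12,
                             random_state=None,
                             verbose=2,
                             warm_start=False,
                             class_weight=None)
clf.fit(x_train, y_train)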

Page 14: Using machine learning to determine drivers of bounce and conversion (part 2)

Feature importances
clf.feature_importances_
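`clf.feature_importances_` is an array with one score per input column, in column order, so it has to be paired with the column names to be readable. A small sketch, assuming a hypothetical `feature_names` list in the same order as the training columns; the names and scores below are made up and stand in for real model output.

```python
def rank_features(names, importances, top=3):
    """Pair each column name with its importance and return the top entries."""
    ranked = sorted(zip(names, importances),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


# Placeholder values; with a trained forest these would come from
# clf.feature_importances_.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
importances = [0.40, 0.35, 0.05, 0.20]

top_features = rank_features(feature_names, importances)
for name, score in top_features:
    print(name, score)
```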

Page 15: Using machine learning to determine drivers of bounce and conversion (part 2)

Training deep learning

from keras.models import Sequential

model = Sequential()
model.add(...)
model.compile(optimizer='adagrad',
              loss='binary_crossentropy',
              metrics=["accuracy"])
# nb_epoch is the Keras 1 spelling; Keras 2 and later call it epochs.
model.fit(x_train, y_train,
          nb_epoch=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)

Page 16: Using machine learning to determine drivers of bounce and conversion (part 2)

Understanding deep learning

Page 17: Using machine learning to determine drivers of bounce and conversion (part 2)

Brute force FTW
• 93 input “features”
• Train 93 models with 1 input
  – Measuring the prediction accuracy of each
• Train 92 models with 2 inputs
  – Top feature from first round
  – Measure combined prediction accuracy
• Lather, rinse, repeat…
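The rounds above amount to greedy forward selection. The sketch below shows the loop only; `score` is a hypothetical callable that would train a model on the given inputs and return its validation accuracy, and `toy_score` with its made-up gains stands in for that training step.

```python
def greedy_forward_selection(features, score, rounds):
    """Add one feature per round, keeping whichever addition scores best."""
    selected = []
    history = []
    remaining = list(features)
    for _ in range(min(rounds, len(remaining))):
        # One model per candidate: the features kept so far plus one more.
        scored = [(score(selected + [name]), name) for name in remaining]
        best_score, best_name = max(scored)
        selected.append(best_name)
        remaining.remove(best_name)
        history.append((list(selected), best_score))
    return history


# Made-up accuracy gains, used only to exercise the loop.
GAINS = {"a": 0.20, "b": 0.10, "c": 0.02}


def toy_score(subset):
    return 0.5 + sum(GAINS[name] for name in subset)


history = greedy_forward_selection(["a", "b", "c"], toy_score, rounds=3)
for chosen, accuracy in history:
    print(chosen, round(accuracy, 2))
```

With 93 inputs the first round trains 93 models and the second trains 92, which matches the counts on the slide. The cost grows with every round, hence "brute force".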

Page 18: Using machine learning to determine drivers of bounce and conversion (part 2)

Visualizing the model
• Take trained model (X inputs)
• Vary inputs
  – 100ms to 20 seconds in 100ms intervals
• Apply the data smoothing from training set
• model.predict_proba
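A sketch of that sweep, assuming the smoothing is the per-column standardization `(x - mean) / std` that `StandardScaler` applies. `sweep_input`, the baseline row, the means and standard deviations, and `toy_predict` (a stand-in for `model.predict_proba`) are all hypothetical.

```python
import math


def sweep_input(predict, baseline, index, means, stds,
                start_ms=100, stop_ms=20000, step_ms=100):
    """Vary one raw input across a range and record the model's output."""
    curve = []
    for value in range(start_ms, stop_ms + step_ms, step_ms):
        row = list(baseline)
        row[index] = value
        # Same standardization as training: (x - mean) / std per column.
        scaled = [(x - m) / s for x, m, s in zip(row, means, stds)]
        curve.append((value, predict(scaled)))
    return curve


def toy_predict(scaled_row):
    # Stand-in for a trained model: probability falls as input 0 grows.
    return 1.0 / (1.0 + math.exp(scaled_row[0]))


curve = sweep_input(toy_predict, baseline=[1000, 50], index=0,
                    means=[3000, 40], stds=[2000, 10])
print(len(curve), curve[0], curve[-1])
```

Plotting the second element of each pair against the first gives the predicted probability as a function of the timer. The other inputs stay fixed at the baseline row.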

Page 19: Using machine learning to determine drivers of bounce and conversion (part 2)

What we learned

Page 20: Using machine learning to determine drivers of bounce and conversion (part 2)

What’s in our beacon?

• Top-level – domain, timestamp, SSL
• Session – start time, length (in pages), total load time
• User agent – browser, OS, mobile ISP
• Geo – country, city, organization, ISP, network speed
• Bandwidth
• Timers – base, custom, user-defined
• Custom metrics
• HTTP headers

https://docs.soasta.com/whatsinbeacon/

Page 21: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 1
Maybe everything doesn’t matter after all

Page 22: Using machine learning to determine drivers of bounce and conversion (part 2)

Bounce rate

Page 23: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 2
DOM ready (aka DOM content loaded) and average session load time were the best indicators of bounce rate

Page 24: Using machine learning to determine drivers of bounce and conversion (part 2)

Up to 89.5% accuracy

Page 25: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 26: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 3
When it came to getting high predictability, conversion data was tougher than bounce data

Page 27: Using machine learning to determine drivers of bounce and conversion (part 2)

81% prediction accuracy was as high as we got

Page 28: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 4
Pages with more scripts were less likely to convert

Page 29: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 30: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 5
The number of DOM elements matters… a lot

Page 31: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 32: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 6
Mobile-related measurements weren’t meaningful predictors of conversions

Page 33: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 34: Using machine learning to determine drivers of bounce and conversion (part 2)

Finding 7
Some conventional metrics were not as important as we thought

Page 35: Using machine learning to determine drivers of bounce and conversion (part 2)

Feature Importance (bounce)

Start render 69 ~top 3

Page 36: Using machine learning to determine drivers of bounce and conversion (part 2)

Things to watch out for

(other than dangling prepositions)

Page 37: Using machine learning to determine drivers of bounce and conversion (part 2)

Yep, checkout pages are SLOW

Page 38: Using machine learning to determine drivers of bounce and conversion (part 2)
Page 39: Using machine learning to determine drivers of bounce and conversion (part 2)

Takeaways

Page 40: Using machine learning to determine drivers of bounce and conversion (part 2)

1. YMMV
2. Do try this at home
3. Gather your RUM data (lots of it)
4. Run the machine learning against it
5. If you get unexpected results, keep digging

Page 41: Using machine learning to determine drivers of bounce and conversion (part 2)

Thanks!
@patmeenan @tameverts