Snowplow is at the core of everything we do

Page 1: Snowplow is at the core of everything we do

Snowplow drives everything we do

What and why?

Page 2: Snowplow is at the core of everything we do

Digital and print publisher

Family-owned German company

116 sites across Australia and New Zealand

Tag management across all sites

Bauer Media

Page 3: Snowplow is at the core of everything we do

Just start collecting

Snowplow data collection in 2014

We didn’t really have a use case

Page 4: Snowplow is at the core of everything we do
Page 5: Snowplow is at the core of everything we do

Stuff we record

Page views

Metadata around content

User logins

Email click-throughs

Ad impressions

Page 6: Snowplow is at the core of everything we do

Use cases started showing up

Cross-site integrated reporting

Tricky ad hoc analysis

Sanity checking industry audience reporting

Stalking individual users

Audience overlaps
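As an illustration of the audience-overlap use case, here is a minimal Python sketch that counts users seen on more than one site. The input shape (pairs of site name and user ID), the site names, and both function names are assumptions made for this example; the deck does not show how the overlaps were actually computed.

```python
from collections import defaultdict
from itertools import combinations


def users_by_site(events):
    """Group user IDs into one set per site.

    `events` is an iterable of (site, user_id) pairs (hypothetical shape).
    """
    audiences = defaultdict(set)
    for site, user_id in events:
        audiences[site].add(user_id)
    return audiences


def audience_overlaps(audiences):
    """Return {(site_a, site_b): number of users seen on both sites}."""
    overlaps = {}
    for site_a, site_b in combinations(sorted(audiences), 2):
        overlaps[(site_a, site_b)] = len(audiences[site_a] & audiences[site_b])
    return overlaps


# Hypothetical sample data: repeat visits by the same user count once per site.
sample_events = [
    ("site-a", "u1"),
    ("site-a", "u2"),
    ("site-a", "u1"),
    ("site-b", "u2"),
    ("site-b", "u3"),
    ("site-c", "u1"),
]
sample_overlaps = audience_overlaps(users_by_site(sample_events))
```

Using sets keeps repeat page views from inflating the counts, which is the main reason to work from user-level event data rather than per-site totals.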

Page 7: Snowplow is at the core of everything we do
Page 8: Snowplow is at the core of everything we do
Page 9: Snowplow is at the core of everything we do
Page 10: Snowplow is at the core of everything we do

User behaviour

Ad impressions

Content metadata

Trending service

Recommendations

Dashboards

Ad hoc analysis

Page 11: Snowplow is at the core of everything we do

Some things you can’t do in GA

Tag-based reporting

Accurate reporting of Facebook in-app browser traffic, using user-agent contains FBAN
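The "user-agent contains FBAN" rule can be sketched as a simple substring check. This is a minimal Python illustration, not the deck's actual query; the function names and the sample user-agent strings are assumptions for the example.

```python
from typing import Iterable, Optional


def is_facebook_in_app(user_agent: Optional[str]) -> bool:
    """Return True when the user-agent string contains the token FBAN."""
    return "FBAN" in (user_agent or "")


def facebook_in_app_share(user_agents: Iterable[Optional[str]]) -> float:
    """Fraction of the given user agents that match the FBAN rule."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_facebook_in_app(ua) for ua in agents) / len(agents)


# Illustrative user-agent strings (hypothetical, shortened).
FB_UA = "Mozilla/5.0 (iPhone) AppleWebKit Mobile [FBAN/FBIOS;FBAV/1.0]"
SAFARI_UA = "Mozilla/5.0 (iPhone) AppleWebKit Version/9.0 Mobile Safari"
```

Because the raw user-agent string is stored with every event, a rule like this can be applied retroactively to historical data.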

Page 12: Snowplow is at the core of everything we do
Page 13: Snowplow is at the core of everything we do
Page 14: Snowplow is at the core of everything we do
Page 15: Snowplow is at the core of everything we do

We’re using Snowplow 0.9.2 from 2014-04-29!

It just works

We’ve been busy building other stuff

Page 16: Snowplow is at the core of everything we do

But...

Page pings are b0rken: no time spent or scroll depth

(Out-of-the-box) browser categorisation is terrible

Hourly batches mean a bit more latency than we’d like

No context shredding, but JSON queries are performant enough
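To illustrate what querying unshredded JSON involves, here is a minimal Python sketch that pulls one field out of a context stored as a raw JSON string. The deck does not show its queries or the context layout, so the flat-object shape, the field names, and the `context_field` helper are all hypothetical.

```python
import json


def context_field(raw_context, field, default=None):
    """Read one top-level field from a context stored as a JSON string.

    Returns `default` when the context is missing, is not valid JSON,
    is not a JSON object, or lacks the field.
    """
    if not raw_context:
        return default
    try:
        parsed = json.loads(raw_context)
    except ValueError:
        return default
    if not isinstance(parsed, dict):
        return default
    return parsed.get(field, default)


# Hypothetical context attached to a page view.
sample_context = '{"tags": ["food", "recipes"], "section": "lifestyle"}'
sample_tags = context_field(sample_context, "tags", default=[])
```

Parsing at query time like this costs more per row than reading a shredded column, which is the trade-off the slide describes as performant enough.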

Page 17: Snowplow is at the core of everything we do

Pipeline diagram (components as labelled on the slide)

runSnowPlow.sh

Web page (JavaScript in page creates image beacon)

CloudFront

SnowCannon (Node app in Elastic Beanstalk)

S3

ETL (Elastic MapReduce)

S3

events (Redshift)

events_temp (Redshift)

x_events (Redshift)

Arrow labels: "Redirects to", "Writes logs to"

Page 18: Snowplow is at the core of everything we do

Tips

Redshift can get very expensive very quickly

Decent dashboarding platforms are rare

And plenty of crap ones are overpriced

Just tip everything in and worry about what you’ll do later

Page 19: Snowplow is at the core of everything we do

What’s next?

Page 20: Snowplow is at the core of everything we do

Future plans

Upgrade ETL to real-time: probably our own solution

Time spent and scroll depth

Shredding?