yalisassoon
Snowplow drives everything we do
What and why?
Digital and print publisher
Family-owned German company
116 sites across Australia and New Zealand
Tag management across all sites
Bauer Media
Just start collecting
Snowplow data collection in 2014
We didn’t really have a use case
Stuff we record
Page views
Metadata around content
User logins
Email click-throughs
Ad impressions
Use cases started showing up
Cross-site integrated reporting
Ad hoc tricky analysis
Sanity checking industry audience reporting
Stalking individual users
Audience overlaps
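A minimal sketch of how an audience-overlap figure could be computed from page-view rows, assuming each row carries a user identifier and a site name. The function name and the input shape are hypothetical, not taken from the deck.

```python
from collections import defaultdict
from itertools import combinations


def audience_overlaps(page_views):
    """Count the users shared by each pair of sites.

    page_views: iterable of (user_id, site) tuples (assumed shape).
    Returns a dict mapping (site_a, site_b) to the number of shared users.
    """
    users_by_site = defaultdict(set)
    for user_id, site in page_views:
        users_by_site[site].add(user_id)

    overlaps = {}
    # Sort the site names so each pair appears once, in a stable order.
    for site_a, site_b in combinations(sorted(users_by_site), 2):
        shared = users_by_site[site_a] & users_by_site[site_b]
        overlaps[(site_a, site_b)] = len(shared)
    return overlaps
```

Repeat visits by the same user to the same site collapse into one set entry, so the result counts distinct users rather than page views.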
User behaviour
Ad impressions
Content metadata
Trending service
Recommendations
Dashboards
Ad hoc analysis
Some things you can’t do in GA
Tag-based reporting
Accurate reporting of in-app Facebook using user-agent contains FBAN
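The FBAN check above can be sketched as a simple substring test on the raw user-agent string. The function names and the share calculation are illustrative assumptions; only the "contains FBAN" rule comes from the slide.

```python
def is_facebook_in_app(useragent):
    """Return True when the user-agent string contains the FBAN token."""
    return bool(useragent) and "FBAN" in useragent


def facebook_in_app_share(useragents):
    """Fraction of the given user agents that match the FBAN rule.

    Returns 0.0 for an empty input rather than dividing by zero.
    """
    useragents = list(useragents)
    if not useragents:
        return 0.0
    matches = sum(1 for ua in useragents if is_facebook_in_app(ua))
    return matches / len(useragents)
```

Missing or empty user agents are treated as non-matches, so they count towards the total but not towards the Facebook in-app share.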
We’re using Snowplow 0.9.2 from 2014-04-29!
It just works
We’ve been busy building other stuff
But...
Page pings are b0rken: no time spent or scroll depth
(Out-of-the-box) browser categorisation is terrible
Hourly batches are a bit higher latency than we’d like
No context shredding, but JSON queries are performant enough
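Without shredding, a context stays in the events table as a raw JSON string and has to be picked apart at query time. The sketch below shows that path-style lookup in Python, as an illustration of the idea only. The function name and the `content` and `section` keys are hypothetical, not the deck's actual schema.

```python
import json


def extract_path(raw_json, *path):
    """Walk a JSON string along the given keys.

    Returns the value found at the end of the path, or None when the
    string is empty, is not valid JSON, or the path does not exist.
    """
    if not raw_json:
        return None
    try:
        node = json.loads(raw_json)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return None
    return node
```

Returning None for malformed rows keeps one bad record from failing a whole report, at the cost of silently dropping it from the result.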
Pipeline diagram (components as labelled on the slide)
Web page (JavaScript in page creates image beacon)
Cloudfront
SnowCannon (Node app in Elastic Beanstalk)
S3
ETL (Elastic MapReduce)
S3
events_temp (Redshift)
events (Redshift)
x_events (Redshift)
runSnowPlow.sh
Arrow labels: "Redirects to", "Writes logs to"
Tips
Redshift can get very expensive very quickly
Decent dashboarding platforms are rare
And plenty of crap ones are overpriced
Just tip everything in and worry about what you’ll do later
What’s next?
Future plans
Upgrade ETL to real-time: probably our own solution
Time spent and scroll depth
Shredding?