How to Build a Data-Driven Company: From Infrastructure to Insights

#datastack

What you’re going to learn

1. How top engineering organizations are building their data infrastructure
2. The 7 core challenges of data integration
3. Why companies like Asana, Buffer, and SeatGeek choose Redshift for their analytics warehouse

...and much more!

Data Infrastructure: Then and Now

Dillon

The traditional approach: ETL Dillon

[Diagram. Teams: ETL team, EDW team, BI team, end user. Labels: transactional data, heavy transformation (ETL), summary, OLAP / silos, restricted Q&A.]

How companies are doing it today: ELT

Dillon

[Diagram. Labels: 3rd Party Data; Extract, Load; Database; Modeling Layer (Transform at Query); Analytics, Viz & Exploration; Transform (and Explore!).]

Modeling layer example shown on the slide:

- name: first_purchasers
  type: single_value
  base_view: orders
  measures: [orders.customer.all]

Benefits of this approach

1. Redshift is performant enough to handle most transformations
2. Users prefer performing transformations in a language they already use (SQL) or with a UI
3. Transformations are much simpler and more transparent
4. Performing transformations alongside raw data is great for auditability

Dillon
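
To make the ELT idea concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for a warehouse such as Redshift. The table, view, and column names (raw_orders, customer_revenue, amount_cents) and the sample rows are hypothetical and do not come from the talk. The point is only that raw rows are loaded untouched and the transformation lives in SQL right beside them.

```python
import sqlite3


def build_warehouse(rows):
    """Load raw rows as-is, then define the transformation as a SQL view."""
    conn = sqlite3.connect(":memory:")
    # Extract + Load: raw data lands unmodified (hypothetical schema).
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer_id TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: a view evaluated at query time, next to the raw data.
    conn.execute(
        """
        CREATE VIEW customer_revenue AS
        SELECT customer_id,
               COUNT(*) AS order_count,
               SUM(amount_cents) / 100.0 AS revenue_dollars
        FROM raw_orders
        GROUP BY customer_id
        """
    )
    return conn


def customer_revenue(conn):
    """Query the transformed view; raw rows stay available for audits."""
    cur = conn.execute(
        "SELECT customer_id, order_count, revenue_dollars "
        "FROM customer_revenue ORDER BY customer_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    sample = [(1, "alice", 1250), (2, "bob", 400), (3, "alice", 750)]
    warehouse = build_warehouse(sample)
    print(customer_revenue(warehouse))
```

Because the view is plain SQL over raw_orders, anyone who doubts a number can query the raw table directly, which is the auditability benefit listed above.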

Data infrastructure has geek cred Shaun

What the stack looks like Shaun

Data Integration

Data Warehouse

BI/Analytics

Data Integration

Why consolidation matters

Common data sources for internal analytics Shaun

Quick poll Shaun

Which five data sources are your top priority to integrate or keep integrated?

● Production databases
● Events
● Error logs
● Billing
● Email marketing
● CRM
● Advertising
● ERP
● A/B testing
● Support

“A year ago, we were facing a lot of stability problems with our data processing. When there was a major shift in a graph, people immediately questioned the data integrity. It was hard to distinguish interesting insights from bugs. Data science is already an art so you need the infrastructure to give you trustworthy answers to the questions you ask. 99% correctness is not good enough. And on the data infrastructure team, we were spending a lot of time churning on fighting urgent fires, and that prevented us from making much long-term progress. It was painful.”

- Marco Gallotta, Asana, How to Build Stable, Accessible Data Infrastructure at a Startup

“Our story would end here if real-time processing were perfect. But it’s not: some events can come in days late, some time ranges need to be re-processed after initial ingestion due to code changes or data revisions, various components of the real-time pipeline can fail, and so on.”

- Gian Merlino, MetaMarkets, Building a Data Pipeline That Handles Billions of Events in Real-Time
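
One common way to cope with late and re-processed events is to make loading idempotent. The sketch below is an illustration under stated assumptions, not the pipeline described in the quoted article: the Event fields, the revision counter, and the in-memory dict standing in for a warehouse table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    occurred_at: str  # ISO-8601 date the event happened, e.g. "2015-06-01"
    value: int
    revision: int = 0  # hypothetical counter bumped when data is revised


def apply_batch(store, batch):
    """Upsert events by id, keeping the highest revision seen so far."""
    for event in batch:
        current = store.get(event.event_id)
        if current is None or event.revision >= current.revision:
            store[event.event_id] = event
    return store


def daily_totals(store):
    """Aggregate by the day the event occurred, not the day it arrived."""
    totals = {}
    for event in store.values():
        totals[event.occurred_at] = totals.get(event.occurred_at, 0) + event.value
    return totals


if __name__ == "__main__":
    store = {}
    first = [Event("a", "2015-06-01", 10), Event("b", "2015-06-02", 5)]
    apply_batch(store, first)
    apply_batch(store, first)  # replaying a batch changes nothing
    late = [Event("c", "2015-06-01", 7), Event("b", "2015-06-02", 6, revision=1)]
    apply_batch(store, late)   # a late arrival plus a revised event
    print(daily_totals(store))
```

Keying on event_id means a failed batch can simply be re-run, and aggregating on occurred_at means an event that arrives days late still lands in the correct day.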

7 core challenges of data integration

Connections: Every API is a unique and special snowflake

Accuracy: Ordering data on a distributed system

Latency: Large object data stores (Amazon S3, Redshift) are optimized for batches, not streams

Scale: Data will grow exponentially as your company grows

Flexibility: You’re interacting with systems you don’t control

Monitoring: Notifications for expired credentials, errors, and disruptions

Maintenance: Justifying investment in ongoing maintenance/improvement
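
The latency challenge above (batch-oriented stores fed by a stream of records) is often handled by micro-batching. The following is a minimal sketch, not any vendor's implementation: the MicroBatcher class, its thresholds, and the flush callback are hypothetical names. In a real pipeline the callback might write a file to S3 and issue a Redshift COPY, but that part is left out here.

```python
import time


class MicroBatcher:
    """Buffer incoming records and flush them in batches by size or age."""

    def __init__(self, flush, max_records=500, max_age_seconds=60.0,
                 clock=time.monotonic):
        self._flush = flush              # callable that receives a list of records
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so tests can fake time
        self._buffer = []
        self._oldest = None

    def add(self, record):
        if not self._buffer:
            # Remember when the first record of this batch arrived.
            self._oldest = self._clock()
        self._buffer.append(record)
        too_big = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Hand the buffered records to the callback and start a new batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


if __name__ == "__main__":
    batches = []
    batcher = MicroBatcher(batches.append, max_records=3)
    for i in range(7):
        batcher.add(i)
    batcher.flush()  # drain whatever is left at shutdown
    print(batches)
```

Note that the age check only runs when a record arrives; a production version would also flush on a timer so a quiet stream does not hold data back indefinitely.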

Or...try Pipeline Shaun

[Diagram of source categories: Ad Platforms, Customer Support, Web Data, Marketing Automation, CRM, Payments, Ecommerce]

Warehousing Infrastructure

Analytics warehouse Shaun

Redshift is the most common analytics warehouse.

Chosen by: Asana, Braintree, Looker, SeatGeek, VigLink, Buffer

Why Redshift is awesome Shaun

Airbnb experiment

                                           Hive               Redshift
Test 1: 3 billion rows of data             28 minutes         <6 minutes
Test 2: two joins with millions of rows    182 seconds        8 seconds
Cost                                       $1.29/hour/node    $0.85/hour/node
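
As a quick worked calculation, the figures on this slide imply the following ratios. This sketch only does arithmetic on the numbers quoted above; the function names are hypothetical. Because the Redshift time for Test 1 is given as "<6 minutes", that speedup is a lower bound, and the per-node price says nothing about how many nodes each system used.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def summarize():
    """Ratios derived from the Hive vs. Redshift figures on the slide."""
    return {
        # Test 1: 28 minutes vs. under 6 minutes, so this is a lower bound.
        "test1_speedup_at_least": speedup(28 * 60, 6 * 60),
        # Test 2: 182 seconds vs. 8 seconds.
        "test2_speedup": speedup(182, 8),
        # Per-node hourly price: Redshift relative to Hive.
        "cost_ratio_per_node_hour": 0.85 / 1.29,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Rounded, that is at least about 4.7x faster on Test 1, about 22.8x faster on Test 2, and roughly two thirds of the per-node hourly price.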

Periscope research Shaun

DiamondStream’s dashboard query performance Shaun

Business Intelligence & Analytics

Dillon

A broken model Dillon

● Feedback loop is broken

● Disparate reporting

● Non-unified decision making

● Versioning

● Reusability is lost

Marketing

Finance

Constraints of SQL Dillon

SQL is versatile, but it shares the flavor of write-only languages such as Perl:

● Can write but not read

● Promotes one-off, piecemeal analysis

● Disparate interpretation

The critical multiplier: modeling Dillon

Any SQL Data Warehouse

Modeling Layer

What’s our most successful marketing campaign?

How does our Q4 pipeline look?

Who are our healthiest / happiest customers?
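
A modeling layer, in the sense used on this slide, defines each dimension and measure once and generates SQL from those definitions, so every question reuses the same logic. The sketch below illustrates that idea only; it is not LookML and not how Looker is implemented. The MODEL dictionary, its table and column names, and build_query are all hypothetical.

```python
# Hypothetical model: each business term is defined exactly once.
MODEL = {
    "table": "orders",
    "dimensions": {
        "campaign": "campaign_name",
        "quarter": "order_quarter",
    },
    "measures": {
        "revenue": "SUM(amount)",
        "customers": "COUNT(DISTINCT customer_id)",
    },
}


def build_query(model, dimensions, measures):
    """Generate a SQL string from shared dimension and measure definitions."""
    unknown = [d for d in dimensions if d not in model["dimensions"]]
    unknown += [m for m in measures if m not in model["measures"]]
    if unknown:
        raise KeyError(f"not defined in the model: {unknown}")

    select = [f'{model["dimensions"][d]} AS {d}' for d in dimensions]
    select += [f'{model["measures"][m]} AS {m}' for m in measures]
    sql = f'SELECT {", ".join(select)} FROM {model["table"]}'
    if dimensions:
        group_by = ", ".join(model["dimensions"][d] for d in dimensions)
        sql += f" GROUP BY {group_by}"
    return sql


if __name__ == "__main__":
    # "What's our most successful marketing campaign?" becomes a lookup,
    # not a hand-written query.
    print(build_query(MODEL, ["campaign"], ["revenue"]))
```

Since "revenue" is defined in one place, marketing and finance cannot end up with two slightly different versions of it, which addresses the disparate-interpretation problem described earlier in the talk.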

Interactive, collaborative analytics Dillon

● Data access

● Uniform definitions

● A Shared View

● Collaboration

● Analytical Speed

What You Can Do

Dillon

Integrated data + analytics tools Dillon

[Timeline diagram. Labels: Week 1, Week 2-3, RJMetrics Pipeline, Blocks]

● Looker Blocks: sales & marketing

● Looker Blocks: event analytics

Thank you!
