BACKING UP OUR TABLEAU - STEPS TOWARDS A RELIABLE REPORTING SOLUTION

Sergii Khomenko, Data Scientist

Dr. Konstantin Wemhöner, Head of Business Intelligence

Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift / Tableau Conference on Tour - Berlin - Jun 9, 2015




WHAT IS STYLIGHT?

A SHORT INTRODUCTION

STYLIGHT.de Seite 2 / 42

CONTENT meets COMMERCE

NEW

The best place to discover & shop fashion. STYLIGHT.DE


GLOBAL INSPIRATION – LOCAL COMMERCE

AVAILABLE IN 14 COUNTRIES

Germany, Austria, Switzerland, Netherlands, France, Italy, Sweden, UK, Spain, Australia, Brazil, US, Norway, Belgium


STYLIGHT ON THE GO

WHENEVER. WHEREVER.


PROUD TO BLEED PURPLE

• Founded: 2008 in Munich

• Offices: Munich, London, New York

• Investors: Holtzbrinck Ventures, Tengelmann Ventures, Seven Ventures

• Business partners: 350+ partner shops worldwide with 6000+ brands

• Total employees: 160+ (over 19 nationalities from 4 continents)

FACTS AND FIGURES

[Chart: total number of employees per year, 2010 to 2015]


GROSS MERCHANDISE VALUE

• 2011: $50 million

• 2012: $175 million

• 2013: $270 million

• 2014: $360 million


BI SETUP BEFORE THE OUTAGE

SETUP OF OUR REPORTING WORKFLOW

[Diagram: reporting workflow of the Business Intelligence department]


OUR FIRST “DWH”: TABLEAU ONLINE


WHY DID WE CHOOSE TABLEAU ONLINE?

• Easy to start using

• Works for free

• All data sources in one place

• Unified routine


• Combination of local and online/cloud sources (Google Analytics, JDBC…)

• Instant sharing across continents

• Easy distribution of reports with Tabcmd
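The slide only names Tabcmd as the distribution tool. As an illustration, a small Python wrapper could build `tabcmd export` calls for a list of views. This is a sketch under stated assumptions. The server URL, site, user, view names and output files below are hypothetical. The view argument is assumed to be the "workbook/view" part of the view URL, so check that format against the tabcmd documentation for your version.

```python
from __future__ import annotations

import shlex
import subprocess
from dataclasses import dataclass

# Placeholder server URL (assumption); replace with your own Tableau Online settings.
SERVER = "https://online.tableau.com"


@dataclass
class Report:
    view: str       # "workbook/view" part of the view URL (hypothetical names)
    fmt: str        # one of "pdf", "png", "csv"
    filename: str   # local output file


def login_command(server: str, site: str, user: str) -> list[str]:
    # No password on the command line: tabcmd can prompt for it
    # or read it from a file passed via --password-file.
    return ["tabcmd", "login", "-s", server, "-t", site, "-u", user]


def export_command(report: Report) -> list[str]:
    if report.fmt not in {"pdf", "png", "csv"}:
        raise ValueError(f"unsupported export format: {report.fmt}")
    return ["tabcmd", "export", report.view, f"--{report.fmt}", "-f", report.filename]


def distribute(reports: list[Report], dry_run: bool = True) -> list[list[str]]:
    """Build one export command per report; print them (dry run) or execute them."""
    commands = [export_command(r) for r in reports]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Run `tabcmd login` first, then call `distribute(reports, dry_run=False)` to write the files, which a separate mail job could send. The dry-run default makes it easy to review the generated commands before anything touches the server.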


SAMPLE REPORTS FROM OUR ENVIRONMENT


3 TYPES OF DATA SOURCES

• Static

• Snapshot

• Incremental
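The slide only names the three kinds of data source. One plausible reading, which is our assumption:

- A static source is loaded once and never refreshed.
- A snapshot source is fully replaced on every refresh.
- An incremental source appends only rows newer than the last key it already holds. This is the idea behind Tableau's incremental extract refresh.

A minimal Python sketch of those semantics follows. The fetch functions and the `id` key are hypothetical stand-ins for the real queries.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    kind: str                                  # "static", "snapshot" or "incremental"
    rows: list = field(default_factory=list)   # rows currently held in the extract


def refresh(ds, fetch_all, fetch_since, key=lambda row: row["id"]):
    """Refresh a data source according to its kind (assumed semantics)."""
    if ds.kind == "static":
        return ds                              # loaded once, never refreshed
    if ds.kind == "snapshot":
        ds.rows = list(fetch_all())            # full replace on every refresh
        return ds
    if ds.kind == "incremental":
        last = max((key(r) for r in ds.rows), default=None)
        ds.rows.extend(fetch_since(last))      # append only rows newer than `last`
        return ds
    raise ValueError(f"unknown data source kind: {ds.kind}")
```

In this sketch an incremental source never revisits rows it already holds. Corrections to old rows therefore only appear after a full (snapshot-style) reload.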


HOW DATA SOURCES WERE UPDATED


LOADING AND MONITORING BEFORE THE OUTAGE

• 25 workbooks online with 119 views, built on 80 data sources

• Scheduled e-mail reports

• All refreshes scheduled manually


AND THEN IT CRASHED ...


BACK FROM CHRISTMAS … AND EVERYTHING CRASHED


SERVER OUTAGE JANUARY 2015

• Started with empty scheduled e-mail reports (9 Jan)

• By Monday, more than 80% of views were not working

• No clear communication from Tableau

• The server outage occurred during our scheduled refreshes


SERIOUS DOWNTIME OF REPORTING INFRASTRUCTURE


3 TYPES OF DATA SOURCES AND HOW THEY WERE AFFECTED


FIRST THINGS FIRST: FIREFIGHTING

Replacement of all data sources in workbooks:

Open → Local copy → New extract → Replace


HOW TO REBUILD A BROKEN DATA SOURCE?

Biggest issue: workbooks could not be opened due to broken data sources

Understand how a Tableau data extract (TDE) is built

Find a way to extract and recreate the essential parts of a TDE


THE INSIDE OF A TABLEAU DATA SOURCE FILE
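Packaged Tableau files (.tdsx, .twbx) are ZIP archives. They hold the XML definition (.tds/.twb) next to the extract file. A minimal sketch for recovering the essential parts with Python's standard library follows.

It assumes that `<column>` elements are direct children of the `<datasource>` root and carry `name`, `datatype` and `role` attributes. Such files are commonly laid out this way, but Tableau does not publish the schema, so treat the layout as an assumption. The `.hyper` suffix covers newer Tableau versions, which replaced .tde.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_tdsx(source):
    """Summarise a packaged data source; `source` is a file path or the raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        tds_name = next((n for n in names if n.endswith(".tds")), None)
        if tds_name is None:
            raise ValueError("no .tds definition found in archive")
        extracts = [n for n in names if n.endswith((".tde", ".hyper"))]
        root = ET.fromstring(zf.read(tds_name))
    connection = root.find(".//connection")
    columns = [
        {
            "name": col.get("name", "").strip("[]"),   # "[Region]" -> "Region"
            "datatype": col.get("datatype"),
            "role": col.get("role"),
        }
        for col in root.findall("column")             # direct children only (assumed layout)
    ]
    return {
        "tds": tds_name,
        "extracts": extracts,
        "connection_class": connection.get("class") if connection is not None else None,
        "columns": columns,
    }
```

Even when the extract itself is unusable, the XML alone still tells you which columns, types and connection a workbook expects. That is enough to rebuild the source elsewhere.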


RECREATING THE FILE'S CONTENT


ISSUES AND PLANS

• All data sources (DS) are accessible

• We know where the data comes from

• Re-creation of the structure

• Migration without any manual input
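"Structure re-creation" without manual input can be illustrated by turning recovered column metadata into DDL for the new warehouse. The metadata is a column name plus its Tableau datatype. The type mapping and the VARCHAR length below are our assumptions, not the mapping used at Stylight, and would need checking against the real data.

```python
# Assumed mapping from Tableau column datatypes to Redshift column types.
TYPE_MAP = {
    "string": "VARCHAR(256)",      # length is a guess; size it from the real data
    "integer": "BIGINT",
    "real": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def quote_ident(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, columns):
    """Build CREATE TABLE DDL from [{'name': ..., 'datatype': ...}, ...]."""
    if not columns:
        raise ValueError("a table needs at least one column")
    defs = []
    for col in columns:
        sql_type = TYPE_MAP.get(col["datatype"])
        if sql_type is None:
            raise ValueError(f"no type mapping for {col['datatype']!r}")
        defs.append(f"    {quote_ident(col['name'])} {sql_type}")
    # `table` is inserted verbatim (e.g. "schema.table"); pass trusted names only.
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(defs) + "\n);"
```

Failing loudly on an unmapped datatype is deliberate. A silent fallback to VARCHAR would hide exactly the kind of mismatch that breaks workbooks later.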


DWH WITH AMAZON REDSHIFT

IMPROVING OUR TECHNICAL SETUP


GENERATION OF DATA INPUT FOR REDSHIFT
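The slide does not show how input for Redshift was generated. A common pattern, sketched here as an assumption, has three steps:

1. Write gzipped CSV files.
2. Upload them to S3 (not shown).
3. Load them with Redshift's COPY command.

The table name, bucket and IAM role ARN are placeholders. Setups that do not use IAM roles pass credentials via the CREDENTIALS clause instead of IAM_ROLE.

```python
import csv
import gzip
import io


def to_gzip_csv(header, rows):
    """Serialise rows as UTF-8 CSV with a header line and gzip the result."""
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")  # LF line endings
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def copy_statement(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY that loads every gzipped CSV file under s3_prefix."""
    # Values are interpolated verbatim; only pass trusted configuration here.
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP IGNOREHEADER 1;"
    )
```

Splitting a large export into several files under one prefix lets COPY load them in parallel across the cluster's slices.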


SERVER-SIDE MONITORING OF DATA REFRESHES
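At its core, monitoring refreshes means comparing each data source's last successful refresh with how fresh it is supposed to be. Here is a minimal sketch, assuming the timestamps are already collected somewhere; how they are collected is not shown on the slide. The per-source limits and the one-day default are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_MAX_AGE = timedelta(days=1)   # hypothetical default: daily refresh expected


def stale_sources(last_refresh, max_age, now):
    """Return the names of data sources whose last successful refresh is too old.

    last_refresh: name -> datetime of last successful refresh (None if never)
    max_age:      name -> timedelta allowed between refreshes (optional per source)
    """
    stale = []
    for name, refreshed_at in sorted(last_refresh.items()):
        allowed = max_age.get(name, DEFAULT_MAX_AGE)
        if refreshed_at is None or now - refreshed_at > allowed:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime.now()
    example = {"sales": now - timedelta(hours=2), "traffic": now - timedelta(days=3)}
    print(stale_sources(example, {}, now))  # traffic is older than the 1-day default
```

Treating "never refreshed" as stale means a newly added source raises an alert until its first successful load. That is usually the safer default.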


TRACKING DWH PERFORMANCE
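The slide does not say how DWH performance was tracked. One simple option, swapped in here as an assumption, is client-side timing of named queries. Redshift also records server-side query statistics in system tables such as STL_QUERY. The query labels and the cursor in the usage line are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class QueryTimer:
    """Collect wall-clock durations of named queries and summarise them."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                      # injectable for testing
        self.durations = defaultdict(list)      # label -> list of seconds

    @contextmanager
    def track(self, label):
        start = self.clock()
        try:
            yield
        finally:
            # Recorded even if the query raises, so failures show up in timings too.
            self.durations[label].append(self.clock() - start)

    def summary(self):
        return {
            label: {"runs": len(d), "median_s": median(d), "max_s": max(d)}
            for label, d in sorted(self.durations.items())
        }
```

Usage: `with timer.track("daily_sales"): cursor.execute(sql)`. Periodically storing `timer.summary()` gives a simple trend of query times as data volume grows.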


BENEFITS

• Control over backups

• Control over refreshes

• DWH scales up to petabytes

• Easy to add new ETL stages (EMR)

• Better prepared for new challenges


MONITORING INSTALLED FOR ALL REFRESHING DATA SOURCES


ADDITIONAL ERROR LOGGING WITH LOGGLY
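Loggly accepts log events over HTTP(S). Below is a minimal sketch of a Python logging handler that ships ERROR records as JSON. The endpoint URL pattern and token are placeholders to check against Loggly's documentation. The `send` parameter exists only so the handler can be tested without network access.

```python
import json
import logging
import urllib.request

# Placeholder endpoint (assumption); check the exact URL format and token in Loggly's docs.
LOGGLY_URL = "https://logs-01.loggly.com/inputs/YOUR-CUSTOMER-TOKEN/tag/etl/"

_FORMATTER = logging.Formatter()  # used only to render tracebacks


def _post_json(url, body):
    """POST a JSON body; raises on HTTP or network errors."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


class HttpJsonHandler(logging.Handler):
    """Ship log records at or above `level` as JSON events over HTTP."""

    def __init__(self, url, level=logging.ERROR, send=_post_json):
        super().__init__(level)
        self.url = url
        self.send = send  # injectable so the handler can be tested offline

    def emit(self, record):
        try:
            event = {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "created": record.created,  # Unix timestamp
            }
            if record.exc_info:
                event["exception"] = _FORMATTER.formatException(record.exc_info)
            self.send(self.url, json.dumps(event).encode("utf-8"))
        except Exception:
            self.handleError(record)  # never let logging crash the ETL job
```

Attach it with `logging.getLogger("etl").addHandler(HttpJsonHandler(LOGGLY_URL))`. Because `emit` blocks on the HTTP call, a long-running job may prefer to put it behind `logging.handlers.QueueHandler` and a `QueueListener`.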


OUTCOME & FUTURE PLANS

POSITIVE OUTCOMES

• Number of data sources reduced by 30%

• Speed increase by a factor of more than 100 using Redshift

• Scalable infrastructure for a growing company

• More flexible joining of tables in Redshift


TAKING IT TO THE NEXT LEVEL!

• Open-source our Python tools

• Internal DWH mapping server

• Flexibility to integrate new components

• Google Spreadsheet integration


HOW TO REACH US

TOOLS, TUG MUNICH

Sergii Khomenko, sergii.khomenko@stylight.com

@lc0d3r

GENERAL INFO, BI JOBS

Dr. Konstantin Wemhöner, konstantin.wemhoener@stylight.com

@kwarks85

STYLIGHT Engineering: @CodeTailors


STYLIGHT, Nymphenburger Straße 86

80636 Munich, Germany

Join us on Facebook: facebook.com/stylight

Follow us on Twitter: twitter.com/stylight

Follow us on Instagram: instagram.com/stylight
