US Office: 1355 Market Street, #488 San Francisco, CA 94103 German Office: Katharinenstr. 15 04109 Leipzig, Germany Beyond the Data Lake Simplifying data integration for the modern age Matthias Korn | Head of Presales [email protected]

Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Page 1: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

Beyond the Data Lake: Simplifying data integration for the modern age

Matthias Korn | Head of Presales

Page 2: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Variety is The Challenge

Gartner 2014: “VARIETY is the biggest challenge.”

“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”

Page 3: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

1996 – Variety was already a major challenge…

Page 4: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Integration using the Data Warehouse

Data is integrated by copying it into a central repository

Approach: ETL process (Extract/Transform/Load)

Structure is applied on the way into the repository

BI users query Data Marts
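
The ETL flow on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample records, the `orders` table, and the `mart_revenue_by_customer` view are hypothetical, and an in-memory SQLite database stands in for the warehouse. The point it shows is that typing and cleaning happen in `transform`, before anything reaches the repository.

```python
import sqlite3

# Hypothetical raw records standing in for an operational source system.
RAW_ORDERS = [
    {"id": "1", "customer": " alice ", "amount": "19.90"},
    {"id": "2", "customer": "Bob", "amount": "5.00"},
    {"id": "3", "customer": "alice", "amount": "10.10"},
]


def extract():
    """Extract: read the records from the source as they are."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: apply structure (types, cleaned names) before loading."""
    return [
        (int(r["id"]), r["customer"].strip().lower(), round(float(r["amount"]) * 100))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the already-structured rows into the central repository."""
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # A data mart, modelled here as a view that BI users would query.
    conn.execute(
        "CREATE VIEW mart_revenue_by_customer AS "
        "SELECT customer, SUM(amount_cents) AS revenue_cents "
        "FROM orders GROUP BY customer"
    )


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    query = "SELECT customer, revenue_cents FROM mart_revenue_by_customer ORDER BY customer"
    print(warehouse.execute(query).fetchall())
```

Because the schema is fixed at load time, a new source field means changing `transform`, the table definition, and the view, which is the kind of modification cost the next slide refers to.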

Page 5: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Why do so many DWH projects fail? ETL

Inflexible; costly modifications

Labour-intensive setup and maintenance

Over 50% failure rate*

Slow data-to-actionable-insights (6 to 9+ months)

Page 6: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

2016 – Variety is Getting Dramatic

Page 7: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Where does the complexity come from?

Big Data
• Machine data, unstructured data, social data, streaming data, IoT, etc.

Cloud data
• APIs, cloud data platforms, etc.

Page 8: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Data Lake – getting some data in is pretty easy…

Clickstream Data

Sensor Data

Server logs

Unique identifier provided

Metadata tags provided

Original data structure

Databases

Web APIs

…still challenges with other data

Page 9: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Integration using the Data Lake

Data is integrated by copying it into a central repository

Approach: ELT process (Extract/Load/Transform)

Data loaded in the original structure

For Data Scientists rather than for BI users

BI users query Data Marts: wait, didn't they do this before already?
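
For contrast with ETL, the ELT approach listed above can be sketched as follows. This is an illustration under assumptions, not a description of any particular data lake product: the sample events, the `lake` table, and the function names are hypothetical, and SQLite stands in for lake storage. Payloads are stored in their original structure, and structure is applied only when a question is asked.

```python
import json
import sqlite3

# Hypothetical raw events with differing shapes, stored exactly as received.
RAW_EVENTS = [
    '{"type": "click", "user": "alice", "page": "/home"}',
    '{"type": "sensor", "device": "t-17", "celsius": 21.5}',
    '{"type": "click", "user": "bob", "page": "/pricing"}',
]


def load_raw(conn, payloads):
    """Extract and Load: copy payloads into the lake without reshaping them."""
    conn.execute("CREATE TABLE lake (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO lake (payload) VALUES (?)", [(p,) for p in payloads])


def clicks_per_page(conn):
    """Transform: interpret the payloads at read time for one specific question."""
    counts = {}
    for (payload,) in conn.execute("SELECT payload FROM lake ORDER BY id"):
        event = json.loads(payload)
        if event.get("type") == "click":
            counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts


if __name__ == "__main__":
    lake = sqlite3.connect(":memory:")
    load_raw(lake, RAW_EVENTS)
    print(clicks_per_page(lake))
```

Loading is trivial, but every consumer has to know how to interpret the payloads, which fits the slide's remark that the lake serves data scientists rather than BI users.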

Page 10: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Data Lake and DWH

Both are physical data integration

Both require significant upfront effort to create and fill with data

Both lack agility from the BI user's point of view

Page 11: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Reasons for physical data integration

Query all data with same language

Model data with same language

High performance

Page 12: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

The Logical Data Warehouse

Introduced by Gartner in 2012

New data management architecture for analytics

Uses repositories just like the EDW

Adds distributed processes like Data Lake

Adds virtualization of data sources for business agility

Removes the obstacle of physical data integration

Page 13: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Logical Data Warehouse (LDW)

Page 14: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

What does the Logical Data Warehouse do?

LDW knows where the data is stored instead of copying it

Combines different technologies for different use cases

• Big data processing
• Classical BI
• Agile business analytics
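
The idea that the LDW knows where data is stored instead of copying it can be sketched with a small catalog. This is a toy model under assumptions: the `LogicalWarehouse` class, its methods, and the two sample sources are hypothetical and do not represent the Data Virtuality API. Two separate SQLite databases stand in for independent source systems, and the join happens at query time.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


class LogicalWarehouse:
    """Hypothetical catalog mapping logical table names to where the data lives."""

    def __init__(self):
        self._catalog = {}

    def register(self, logical_name, conn, physical_table):
        # Only the location is recorded; no rows are copied.
        self._catalog[logical_name] = (conn, physical_table)

    def scan(self, logical_name):
        conn, table = self._catalog[logical_name]
        # Table names come from the trusted catalog, not from user input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]

    def join(self, left, right, key):
        """Join two logical tables at query time with a simple hash join."""
        index = {}
        for row in self.scan(right):
            index.setdefault(row[key], []).append(row)
        return [{**l, **r} for l in self.scan(left) for r in index.get(l[key], [])]


if __name__ == "__main__":
    crm = make_source(
        "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    shop = make_source(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)",
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 19.9), (11, 1, 5.0)],
    )
    ldw = LogicalWarehouse()
    ldw.register("customers", crm, "customers")
    ldw.register("orders", shop, "orders")
    for row in ldw.join("orders", "customers", "customer_id"):
        print(row)
```

A real virtualization layer would push filters and joins down to the sources where possible; this sketch pulls full tables only to keep the example short.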

Page 15: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Advantages of the Logical Data Warehouse

Real-time data available and ready for analysis

Immediately productive

Flexible Logical Data Model

Permissions, governance

APIs, Webservices

Decoupling business layer and tech layer

Page 16: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Technology Map

Page 17: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Conclusion

Logical Data Warehouse holds enormous promise

Unified data architecture for both Big Data and classical BI use cases

Flexibility and real-time access give an advantage

Explore->Use->Optimize instead of Build->Test->Use

Provides quicker time to solution

We dataconomy

Page 18: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

Thanks for your attention

Page 19: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Backup 1: Example data flow in an LDW

Distributed query

BI frontend aware of all data sources - creates SQL statement

Performance optimization engine replicates data only if needed
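
The "replicates data only if needed" behaviour can be illustrated with a small sketch. The slide does not say how the engine decides when replication is needed, so the access-count threshold used here is purely an assumption, and the `ReplicatingSource` class is hypothetical. The sketch also ignores staleness: once a replica exists it is never refreshed.

```python
class ReplicatingSource:
    """Reads from a remote source and keeps a local replica once it is read often.

    The threshold heuristic is an assumption for illustration only.
    """

    def __init__(self, fetch, replicate_after=3):
        self._fetch = fetch  # callable that queries the remote source
        self._replicate_after = replicate_after
        self._remote_reads = 0
        self._replica = None
        self.remote_calls = 0  # exposed so the effect of replication is visible

    def read(self):
        # Serve from the local replica when one exists.
        if self._replica is not None:
            return list(self._replica)
        # Otherwise query the source directly (virtual access, no copy kept).
        self._remote_reads += 1
        self.remote_calls += 1
        rows = list(self._fetch())
        # Replicate only once the source has proven to be accessed frequently.
        if self._remote_reads >= self._replicate_after:
            self._replica = list(rows)
        return rows


if __name__ == "__main__":
    source = ReplicatingSource(lambda: [("sensor-1", 21.5)], replicate_after=2)
    for _ in range(5):
        source.read()
    print(source.remote_calls)
```

Rarely used sources stay purely virtual, while frequently queried ones stop hitting the remote system after the threshold is reached.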

Page 20: Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

Backup 2: Competitive Landscape

Acquired