Data Warehousing - in the real world - Dr. Thomas Zurek @tfxz November 2015 Big Data und Analytische Applikationen

Page 1: Data Warehousing - in the real world

Data Warehousing - in the real world

Dr. Thomas Zurek, @tfxz

November 2015

Big Data und Analytische Applikationen

Page 2: Data Warehousing - in the real world

Real-World Data Warehouses / Thomas Zurek

Who am I ?

• Vice President of Development @ SAP for
  – HANA Data Warehousing (DW)
  – Enterprise Performance Management (EPM)
  – HANA Analytics

• 18 years at SAP
• PhD in Computer Science
• Universities of Karlsruhe and Edinburgh

Page 3: Data Warehousing - in the real world

Agenda

1. Examples
2. Business Intelligence (BI) + Data Warehouses (DW)
3. Data Warehouses
4. Layered Scalable Architecture (LSA)
5. Big Data + Data Warehousing
6. Summary

Page 4: Data Warehousing - in the real world

EXAMPLES

Page 5: Data Warehousing - in the real world

Examples of Business Intelligence Scenarios

• fraud detection
  - retail company
  - point-of-sales data & given discounts
  - huge amounts of data
  - a prototypical BI question

• long tail analysis
  - e-commerce companies like Amazon, Ebay, iTunes, Netflix, …
  - translate sales of popular products into (additional) sales in the long tail
  - BI integrated into operational processes

Page 6: Data Warehousing - in the real world

Long Tail Analysis (1) – An Amazon Example

Page 7: Data Warehousing - in the real world

Long Tail Analysis (2)

Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html

Page 8: Data Warehousing - in the real world

Long Tail Analysis (3)

Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html

Page 9: Data Warehousing - in the real world

BUSINESS INTELLIGENCE + DATA WAREHOUSES

Page 10: Data Warehousing - in the real world

Business Intelligence and Data Warehouses

• Business Intelligence (BI)
  An environment in which business users conduct analyses that yield an overall understanding of where the business has been, where it is now, and where it will be in the near future (i.e. planning).

• Data Warehouse (DW)
  - An implementation of an informational database used to collect, integrate and provide sharable data sourced from multiple operational databases for analyses.
  - Provides data that is reliable, consistent, understandable.
  - It typically serves as the foundation for a business intelligence system.

Page 11: Data Warehousing - in the real world

Business Intelligence and Data Warehouses

• Business Intelligence
  OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …

• Data Warehouse
  connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …

• Operational System
  ERP, CRM, SCM, HR, …

• Meta Data
  security, models, …

Page 12: Data Warehousing - in the real world

Business Intelligence and Data Warehouses

• Business Intelligence
  OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …

• Data Warehouse
  connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …
  Focus today!

• Operational System
  ERP, CRM, SCM, HR, …

• Meta Data
  security, models, …

simply remember:
(1) BI and DW
(2) BI ≠ DW

Page 13: Data Warehousing - in the real world

DATA WAREHOUSES

Page 14: Data Warehousing - in the real world

Multiple Data Sources

Why are there so many DBs at an enterprise?
• business processes → data captured in some DB
• organisation reflected in system landscape
• geography reflected in system landscape
• smaller systems easier to manage than big systems
• mergers and acquisitions
• external data: market data, supplier data, …
• …

Page 15: Data Warehousing - in the real world

A Typical Example for Business Processes in an Enterprise


source: http://thebankwatch.com/2006/09/13/simplifying-the-business-model/

Page 16: Data Warehousing - in the real world

A Typical Data Warehouse Architecture

[Figure: layered architecture diagram. Elements: Source 1, Source 2, Source 3, Source 4, Source 5; Data Warehouse; Corp. Memory; ODS; BI Layer; Project Governance; IT Governance.]

Labels in the diagram: Data Acquisition; Harmonization; Data Propagation; Business transform; Reporting / Analyses / Planning; Provide data; End-user access / Presentation.

Layer annotations:

• Main Service: Decouple, fast load and distribute
  Transform: 1:1
  Content: 1 data source, all fields
  History: 4 weeks
  Store: PSA, DSO-WO

• Main Service: Integrated, harmonized
  Transform: Harmonize, quality assure (in flow || lookup)
  Content: Defined fields
  History: Short or not at all || long term
  Store: Info source || IO/DSO/Z-table

• Main Service: Spot for apps / delta to app / app recovery
  Transform: Enriched || general business logic
  Content: Data source || business domain specific
  History: Determined by rebuild requirements of apps
  Store: DSO (can be logically partitioned)

• Main Service: Make data available for reporting & planning tools
  Transform: Application specific / (dis-)aggregate / lookup
  Content: Application specific
  History: Application specific
  Store: IC, DSO, Info Set, Virtual Provider, Multi Provider

Page 17: Data Warehousing - in the real world

Challenge 1: RELIABLE

• typical: data from 50-100 data sources
• availability of data sources not given
  – system downtimes
  – network failures
  – example (assuming independent failures):
    • availability per data source = 98%
    • all 100 data sources available = 0.98**100 ≈ 13%
    • at least 1 out of 100 data sources not available = 1 – 0.13 = 87%

all data in one place ensures reliable data access
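
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the data sources fail independently of each other, which the slide implies but does not state; the function name `prob_all_available` is made up for illustration.

```python
def prob_all_available(per_source_availability: float, n_sources: int) -> float:
    """Probability that every one of n independent sources is reachable."""
    return per_source_availability ** n_sources


p_all = prob_all_available(0.98, 100)   # all 100 sources up at the same time
p_any_down = 1 - p_all                  # at least one source is down

print(f"all 100 sources available: {p_all:.0%}")
print(f"at least one source down:  {p_any_down:.0%}")
```

The two printed percentages correspond to the 13% and 87% on the slide. Raising the per-source availability or lowering the number of sources shows how quickly the combined figure changes.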

Page 18: Data Warehousing - in the real world

Challenge 2: CONSISTENT

• Assume: each data source is consistent!
• Is the union of all data sources consistent?

NO!

In a DW, data gets synchronised and harmonized to provide a consistent view spanning multiple data sources.

Page 19: Data Warehousing - in the real world

Examples Challenge 2: Transformation, Cleansing

• Jun 1, 2011 = 1.6.2011 = 06/01/11 = …

• VW Touareg = VW TOUAREG = [product] 87654 = …

• currency and unit conversions:
  – box → kg
  – €, $, £, ¥, … → €

• resolve ID clashes:
  product 123 [in subsidiary A] ≠ product 123 [in subsidiary B]

• enrich data:
  add attributes from source A to data from source B
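
Two of the transformations listed above, date harmonisation and ID clash resolution, might look roughly like this in Python. It is only a sketch: the list of accepted formats, the reading of 06/01/11 as US-style month/day/year, and the helper names `harmonise_date` and `global_product_key` are assumptions made for illustration, not part of the talk.

```python
from datetime import date, datetime

# Assumed input formats, one per spelling on the slide.
# "%b" relies on English month names (the default C locale).
DATE_FORMATS = ("%b %d, %Y", "%d.%m.%Y", "%m/%d/%y")


def harmonise_date(text: str) -> date:
    """Try each known format in turn and return a proper date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unknown date format: {text!r}")


def global_product_key(subsidiary: str, local_id: int) -> str:
    """Resolve ID clashes by qualifying the local ID with its origin."""
    return f"{subsidiary}:{local_id}"


dates = [harmonise_date(s) for s in ("Jun 1, 2011", "1.6.2011", "06/01/11")]
print(dates)  # three equal date objects
print(global_product_key("A", 123), global_product_key("B", 123))
```

Qualifying the local ID with its source keeps both products distinguishable without renumbering either subsidiary's data.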

Page 20: Data Warehousing - in the real world

Examples Challenge 2: History / Time-Dependency

• data is time-dependent, e.g.
  – employee A worked in department X in 2012
  – employee A worked in department Y in 2013
  – currency exchange rates
  – current view vs historic view analysis

• versioning of meta data
  – models change
  – development → test → production
  – auditing
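
Time-dependent attributes such as the department example above are commonly stored with a validity interval. The following sketch assumes whole-year validity periods and uses hypothetical names (`Assignment`, `department_on`, `current_department`); it only illustrates the difference between a historic and a current view.

```python
from datetime import date
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    employee: str
    department: str
    valid_from: date
    valid_to: date  # inclusive


# The two facts from the slide, with assumed whole-year validity.
HISTORY = [
    Assignment("A", "X", date(2012, 1, 1), date(2012, 12, 31)),
    Assignment("A", "Y", date(2013, 1, 1), date(2013, 12, 31)),
]


def department_on(employee: str, day: date) -> Optional[str]:
    """Historic view: the department that was valid on the given day."""
    for a in HISTORY:
        if a.employee == employee and a.valid_from <= day <= a.valid_to:
            return a.department
    return None


def current_department(employee: str) -> Optional[str]:
    """Current view: the most recent assignment, whatever the report date."""
    rows = [a for a in HISTORY if a.employee == employee]
    return max(rows, key=lambda a: a.valid_from).department if rows else None


print(department_on("A", date(2012, 6, 30)), current_department("A"))
```

A report on 2012 figures attributes employee A to department X in the historic view and to department Y in the current view.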


Page 21: Data Warehousing - in the real world

Automated data quality checks in the form of a plausibility gate

Single Point of Truth

Source 1, Source 2, Source ..., Source n

Business-level validation of the data reduces the administration effort and the subsequent "trouble"

Harmonised analyses

Plausibility Gate
• UNSPSC code present?
• RVO with BVO reference?
• DUNS number present?
• Order of magnitude of BVO/RVO?

real customer example
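
A plausibility gate of this kind can be thought of as a list of named checks that every incoming record has to pass. The sketch below models only the two presence checks (UNSPSC code, DUNS number); the field names `unspsc_code` and `duns_number` and the sample values are invented for illustration, and the RVO/BVO checks are left out because the slide does not define them.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Check = Tuple[str, Callable[[Record], bool]]

# Hypothetical field names; only the two presence checks are modelled.
CHECKS: List[Check] = [
    ("UNSPSC code present", lambda r: bool(r.get("unspsc_code"))),
    ("DUNS number present", lambda r: bool(r.get("duns_number"))),
]


def plausibility_gate(records: List[Record]):
    """Split records into those passing all checks and those rejected."""
    passed, rejected = [], []
    for rec in records:
        failures = [name for name, ok in CHECKS if not ok(rec)]
        if failures:
            rejected.append((rec, failures))  # keep the reasons for follow-up
        else:
            passed.append(rec)
    return passed, rejected


# Made-up sample records.
records = [
    {"id": 1, "unspsc_code": "43211500", "duns_number": "123456789"},
    {"id": 2, "unspsc_code": "", "duns_number": "987654321"},
]
passed, rejected = plausibility_gate(records)
print(len(passed), "passed;", [(r["id"], why) for r, why in rejected])
```

Returning the failed check names alongside each rejected record gives the business side something concrete to correct at the source.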

Page 22: Data Warehousing - in the real world


Challenge 3: UNDERSTANDABLE

• texts for cryptic numbers
• multi-language support
• data provenance: know where the data originated
• auditing: track changes
• relevance: show the user data from his "realm of command"
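
The first two points, texts for cryptic codes and multi-language support, amount to a lookup keyed by code and language. This is a small sketch with invented table contents and a hypothetical helper `text_for`; the fallback to English is an assumption, not something the slide prescribes.

```python
# Invented text table: (code, language) -> human-readable text.
TEXTS = {
    ("KG", "en"): "kilogram",
    ("KG", "de"): "Kilogramm",
    ("87654", "en"): "VW Touareg",
}


def text_for(code: str, language: str, fallback: str = "en") -> str:
    """Return the text in the requested language, else the fallback
    language, else the raw code so the user still sees something."""
    return TEXTS.get((code, language)) or TEXTS.get((code, fallback)) or code


print(text_for("KG", "de"), text_for("KG", "fr"), text_for("XX", "en"))
```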

Page 23: Data Warehousing - in the real world

LAYERED SCALABLE ARCHITECTURE (LSA)

Page 24: Data Warehousing - in the real world

A Typical Data Warehouse Architecture

[Figure: the layered architecture diagram from Page 16, repeated.]

Page 25: Data Warehousing - in the real world

Yet Another, Arbitrary Example …

Source: http://www.zentut.com/wp-content/uploads/2012/10/stand-alone-data-mart.jpg

Page 26: Data Warehousing - in the real world

The Layered Scalable Architecture (LSA)

• reference architecture for DW
• term introduced by SAP, but not SAP-specific
• layers:
  – each layer has a certain task
  – each layer has an associated service-level
  – layers describe the step-wise refinement of data
• not every DW needs all LSA-layers
• modern technology allows layers to be removed or merged, as fewer or no performance-motivated services are required
• more: http://tinyurl.com/sap-lsa
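
The idea of layers as step-wise refinement, with the option to drop a layer, can be sketched as a chain of functions. The three layer functions below (`acquire`, `harmonise`, `aggregate`) and the sample rows are hypothetical and far simpler than real LSA layers; the sketch only shows that a layer can be removed when later layers do not depend on its service.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Layer = Tuple[str, Callable[[List[Row]], List[Row]]]


def acquire(rows: List[Row]) -> List[Row]:
    """Acquisition: 1:1 copy of the source rows (decouples from the source)."""
    return [dict(r) for r in rows]


def harmonise(rows: List[Row]) -> List[Row]:
    """Harmonisation: unify spellings, e.g. VW Touareg = VW TOUAREG."""
    return [{**r, "product": str(r["product"]).upper()} for r in rows]


def aggregate(rows: List[Row]) -> List[Row]:
    """Reporting: application-specific aggregation per product."""
    totals: Dict[str, float] = {}
    for r in rows:
        key = str(r["product"])
        totals[key] = totals.get(key, 0.0) + float(r["amount"])
    return [{"product": p, "amount": a} for p, a in totals.items()]


def run(layers: List[Layer], rows: List[Row]) -> List[Row]:
    """Pass the data through each layer in order."""
    for _name, step in layers:
        rows = step(rows)
    return rows


FULL = [("acquisition", acquire), ("harmonisation", harmonise), ("reporting", aggregate)]
LEAN = [layer for layer in FULL if layer[0] != "acquisition"]  # one layer removed

source = [{"product": "VW Touareg", "amount": 2}, {"product": "VW TOUAREG", "amount": 3}]
print(run(FULL, source))
print(run(LEAN, source))
```

Both variants produce the same report here; what the removed layer would have provided is a service (a decoupled copy of the source data), not a different result.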

Page 27: Data Warehousing - in the real world

BIG DATA + DATA WAREHOUSING

Page 28: Data Warehousing - in the real world

Big Data – The 3 "V"s

• Velocity: speed, parallelism

• Volume: scale

• Variety: many formats, file system

Page 29: Data Warehousing - in the real world

Big Data Example: Connected Cows

• estrus detection by counting steps
  – motion sensors for 40000 cows

• benefits for artificial insemination:
  – less labour intensive
  – higher success rates: 45% → 63%
  – sex determination

• references:
  – Strata Feb 2015: http://tinyurl.com/connected-cows
  – http://www.fujitsu.com/global/about/resources/news/press-releases/2013/1015-01.html

Page 30: Data Warehousing - in the real world


RDBMS vs Big Data

Data Warehouses / RDBMS

• INSERT + UPDATE + DELETE
• prescriptive schema
  – DDL
• strict notion of consistency

Big Data / HDFS

• INSERT (mostly)
• descriptive schema
  – XML
  – JSON
• sloppier notion of consistency, due to the scale of data and the number of systems involved
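
The contrast between a prescriptive and a descriptive schema can be illustrated with Python's standard library. This is a sketch under assumptions: SQLite stands in for a warehouse RDBMS, a plain list of parsed JSON documents stands in for files on HDFS, and the `sales` table and its fields are invented.

```python
import json
import sqlite3

# Prescriptive schema: the DDL is declared first and the database enforces it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.execute("INSERT INTO sales VALUES (?, ?)", (123, 9.99))
try:
    # The amount is missing, which violates the NOT NULL constraint.
    conn.execute("INSERT INTO sales (product_id) VALUES (?)", (456,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Descriptive schema: each JSON record describes itself and is appended as-is.
event_log = []
for raw in ('{"product_id": 123, "amount": 9.99}',
            '{"product_id": 456, "note": "amount missing"}'):
    event_log.append(json.loads(raw))
fields_seen = sorted({key for record in event_log for key in record})

print(rejected, row_count, len(event_log), fields_seen)
```

The relational side refuses the incomplete row at write time, while the append-only side accepts both records and leaves it to the reader to cope with differing fields.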

Page 31: Data Warehousing - in the real world

Big Data + Data Warehouses


Source: Strata Feb 2015, San Jose, CA, US – Keynote by Amr Awadallah (Cloudera)

Page 32: Data Warehousing - in the real world


SUMMARY

Page 33: Data Warehousing - in the real world

What You Should Take Away

1. Difference: BI vs DW vs Big Data

2. What are the problems that a DW handles?

3. How are those problems tackled?
