Data Warehousing - in the real world -
Dr. Thomas Zurek (@tfxz)
November 2015
Big Data and Analytical Applications
Real-World Data Warehouses / Thomas Zurek
Who am I?
• Vice President of Development @ SAP for
– HANA Data Warehousing (DW)
– Enterprise Performance Management (EPM)
– HANA Analytics
• 18 years at SAP
• PhD in Computer Science
• Universities of Karlsruhe and Edinburgh
Agenda
1. Examples
2. Business Intelligence (BI) + Data Warehouses (DW)
3. Data Warehouses
4. Layered Scalable Architecture (LSA)
5. Big Data + Data Warehousing
6. Summary
EXAMPLES
Examples of Business Intelligence Scenarios
• fraud detection
- retail company
- point-of-sales data & given discounts
- huge amounts of data
- a prototypical BI question
• long tail analysis
- e-commerce companies like Amazon, Ebay, iTunes, Netflix, …
- translate sales of popular products into (additional) sales in the long tail
- BI integrated into operational processes
Long Tail Analysis (1) – An Amazon Example
Long Tail Analysis (2)
Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html
Long Tail Analysis (3)
Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html
BUSINESS INTELLIGENCE + DATA WAREHOUSES
Business Intelligence and Data Warehouses
• Business Intelligence (BI)
An environment in which business users conduct analyses that yield an overall understanding of where the business has been, where it is now, and where it will be in the near future (i.e. planning).
• Data Warehouse (DW)
- An implementation of an informational database used to collect, integrate and provide sharable data sourced from multiple operational databases for analyses.
- Provides data that is reliable, consistent and understandable.
- It typically serves as the foundation for a business intelligence system.
Business Intelligence and Data Warehouses
• Business Intelligence: OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …
• Data Warehouse: connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …
• Operational System: ERP, CRM, SCM, HR, …
• Meta Data: security, models, …
Business Intelligence and Data Warehouses
simply remember:
(1) BI and DW
(2) BI ≠ DW
• Business Intelligence: OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …
• Data Warehouse (focus today!): connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …
• Operational System: ERP, CRM, SCM, HR, …
• Meta Data: security, models, …
DATA WAREHOUSES
Multiple Data Sources
Why are there so many DBs at an enterprise?
• business processes → data captured in some DB
• organisation reflected in system landscape
• geography reflected in system landscape
• smaller systems easier to manage than big systems
• mergers and acquisitions
• external data: market data, supplier data, …
• …
A Typical Example for Business Processes in an Enterprise
source: http://thebankwatch.com/2006/09/13/simplifying-the-business-model/
A Typical Data Warehouse Architecture
Diagram labels: Source 1, Source 2, Source 3, Source 4, Source 5; Data Warehouse; Corp. Memory; ODS; BI Layer; Project Governance; IT Governance
Layer labels: Business transform; End-user access / Presentation; Provide data; Data Acquisition; Harmonization; Data Propagation; Reporting / Analyses / Planning
Layer annotations:
• Main Service: Spot for apps / Delta to app / App recovery
– Transform: Enriched || General business logic
– Content: Data source || Business domain specific
– History: Determined by rebuild requirements of apps
– Store: DSO (can be logically partitioned)
• Main Service: Decouple, fast load and distribute
– Transform: 1:1
– Content: 1 data source, all fields
– History: 4 weeks
– Store: PSA, DSO-WO
• Main Service: Integrated, harmonized
– Transform: Harmonize, quality-assure (in flow || lookup)
– Content: Defined fields
– History: Short or not at all || Long term
– Store: Info source || IO / DSO / Z-table
• Main Service: Make data available for reporting & planning tools
– Transform: Application specific / (dis-)aggregate / lookup
– Content: Application specific
– History: Application specific
– Store: IC, DSO, Info Set, Virtual Provider, Multi Provider
Challenge 1: RELIABLE
• typical: data from 50-100 data sources
• availability of data sources is not guaranteed
– system downtimes
– network failures
– example:
• availability per data source = 98%
• all 100 data sources available = 0.98**100 ≈ 13%
• at least 1 out of 100 data sources not available = 1 - 0.13 = 87%
Keeping all data in one place ensures reliable data access.
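The availability arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the data sources fail independently of each other (the slide does not state this, but the 0.98**100 formula implies it); the function names are made up for illustration.

```python
def all_available(per_source: float, n_sources: int) -> float:
    """Probability that all n sources are up at the same time.

    Assumes independent failures, as implied by the 0.98**100 formula.
    """
    return per_source ** n_sources


def at_least_one_down(per_source: float, n_sources: int) -> float:
    """Probability that one or more of the n sources is unavailable."""
    return 1.0 - all_available(per_source, n_sources)


print(f"all 100 sources available: {all_available(0.98, 100):.0%}")
print(f"at least one unavailable:  {at_least_one_down(0.98, 100):.0%}")
```

Changing the number of sources to 50 shows that even the lower end of the typical range leaves a direct query across all sources unreliable.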
Challenge 2: CONSISTENT
• Assume: each data source is consistent!
• Is the union of all data sources consistent?
NO!
In a DW, data gets synchronised and harmonised to provide a consistent view spanning multiple data sources.
Examples for Challenge 2: Transformation, Cleansing
• Jun 1, 2011 = 1.6.2011 = 06/01/11 = …
• VW Touareg = VW TOUAREG = [product] 87654 = …
• currency and unit conversions:
– box → kg
– €, $, £, ¥, … → €
• resolve ID clashes:
product 123 [in subsidiary A] ≠ product 123 [in subsidiary B]
• enrich data:
add attributes from source A to data from source B
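Some of these cleansing steps can be sketched in Python. The following is an illustration only, not the method of any particular DW product: the list of accepted date formats, the US-style reading of "06/01/11" (month/day/year), and all function names are assumptions made for this example.

```python
from datetime import date, datetime

# Assumed input formats. "06/01/11" is read as month/day/year here;
# a real source system would have to tell us which convention it uses.
# "%b" relies on English month abbreviations (the default C locale).
DATE_FORMATS = ("%b %d, %Y", "%d.%m.%Y", "%m/%d/%y")


def normalise_date(text: str) -> date:
    """Try each known format in turn and return one canonical date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def normalise_product_name(name: str) -> str:
    """Collapse whitespace and case so spelling variants compare equal."""
    return " ".join(name.split()).upper()


def qualified_key(source: str, local_id: int) -> tuple[str, int]:
    """Resolve ID clashes by qualifying a local ID with its source system."""
    return (source, local_id)
```

With these helpers, the three date spellings from the slide map to the same value, and product 123 from subsidiary A no longer collides with product 123 from subsidiary B because the keys ("A", 123) and ("B", 123) differ.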
Examples for Challenge 2: History / Time-Dependency
• data is time-dependent, e.g.
– employee A worked in department X in 2012
– employee A worked in department Y in 2013
– currency exchange rates
– current view vs historic view analysis
• versioning of meta data
– models change
– development → test → production
– auditing
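Time-dependent data is commonly modelled with validity intervals, so that a query can ask for the state "as of" a key date. The sketch below illustrates that idea with the employee example; the exact interval boundaries (full calendar years), the Assignment class and the function name are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    employee: str
    department: str
    valid_from: date  # inclusive
    valid_to: date    # inclusive


# Assumed boundaries: the slide only says "in 2012" and "in 2013".
HISTORY = [
    Assignment("A", "X", date(2012, 1, 1), date(2012, 12, 31)),
    Assignment("A", "Y", date(2013, 1, 1), date(2013, 12, 31)),
]


def department_as_of(employee: str, key_date: date,
                     history: list[Assignment] = HISTORY) -> Optional[str]:
    """Return the department that was valid for the employee on key_date."""
    for row in history:
        if row.employee == employee and row.valid_from <= key_date <= row.valid_to:
            return row.department
    return None  # no assignment recorded for that date
```

A historic-view report for 2012 would call department_as_of("A", date(2012, 6, 30)) and attribute the figures to department X, whereas a current-view report would look up the latest assignment and attribute all of employee A's history to it.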
Automated Data Quality Checks in the Form of a Plausibility Gate
Single Point of Truth
Source 1, Source 2, Source ..., Source n
Business-level validation of the data reduces the administration effort and the subsequent "trouble"
Harmonised reports
Plausibility Gate:
• UNSPSC code present?
• RVO with reference to BVO?
• DUNS number present?
• Order of magnitude of BVO/RVO?
real customer example
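A plausibility gate of this kind can be sketched as a list of named checks that every incoming record must pass before it is loaded. This is a hypothetical illustration, not the customer's implementation: the field names unspsc_code and duns_number are invented, and only the two presence checks are shown because the slide does not define RVO and BVO.

```python
from typing import Callable

Record = dict[str, object]
Check = tuple[str, Callable[[Record], bool]]


def has_value(field: str) -> Callable[[Record], bool]:
    """Build a check that passes when the field is present and non-empty."""
    def check(record: Record) -> bool:
        value = record.get(field)
        return value is not None and str(value).strip() != ""
    return check


# Hypothetical field names; the RVO/BVO checks are omitted (not defined here).
CHECKS: list[Check] = [
    ("UNSPSC code present", has_value("unspsc_code")),
    ("DUNS number present", has_value("duns_number")),
]


def plausibility_gate(records: list[Record], checks: list[Check] = CHECKS):
    """Split records into accepted ones and rejected ones with failed checks."""
    accepted, rejected = [], []
    for record in records:
        failed = [name for name, check in checks if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected
```

Returning the names of the failed checks alongside each rejected record lets the data owner see why a record was held back, which is where the reduction in administration effort would come from.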
Challenge 3: UNDERSTANDABLE
• texts for cryptic numbers
• multi-language support
• data provenance: know where the data originated
• auditing: track changes
• relevance: show users data from their "realm of command"
LAYERED SCALABLE ARCHITECTURE (LSA)
A Typical Data Warehouse Architecture
Layers in the diagram: Data Acquisition, Harmonization, Data Propagation, Business transform, Reporting / Analyses / Planning; alongside Corp. Memory, ODS and the BI Layer, fed by Source 1 to Source 5, under Project Governance and IT Governance.
Yet Another, Arbitrary Example …
Source: http://www.zentut.com/wp-content/uploads/2012/10/stand-alone-data-mart.jpg
The Layered Scalable Architecture (LSA)
• reference architecture for DW
• term introduced by SAP, but not SAP-specific
• layers:
– each layer has a certain task
– each layer has an associated service-level
– layers describe the step-wise refinement of data
• not every DW needs all LSA-layers
• modern technology allows layers to be removed or merged, as fewer or no performance-motivated services are required
• more: http://tinyurl.com/sap-lsa
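The idea of step-wise refinement through layers can be illustrated as a chain of functions, each with one task. This is a toy sketch, not SAP's LSA implementation: the three layer functions, their behaviour and the row format are assumptions chosen for illustration.

```python
from functools import reduce
from typing import Callable

Rows = list[dict]
Layer = Callable[[Rows], Rows]


def acquisition(rows: Rows) -> Rows:
    """Acquisition: 1:1 copy of the source rows, no transformation."""
    return [dict(row) for row in rows]


def harmonization(rows: Rows) -> Rows:
    """Harmonization: here, just unify the spelling of product names."""
    return [{**row, "product": row["product"].upper()} for row in rows]


def reporting(rows: Rows) -> Rows:
    """Reporting: aggregate revenue per product."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["product"]] = totals.get(row["product"], 0.0) + row["revenue"]
    return [{"product": p, "revenue": r} for p, r in sorted(totals.items())]


def run_layers(rows: Rows, layers: list[Layer]) -> Rows:
    """Pass the data through the given layers in order."""
    return reduce(lambda data, layer: layer(data), layers, rows)
```

Because each layer only depends on the output of the previous one, a layer can be dropped from the list when its service is not needed; for example, run_layers(rows, [harmonization, reporting]) skips the separate acquisition copy and yields the same report.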
BIG DATA + DATA WAREHOUSING
Big Data – The 3 "V"s
• Velocity: speed, parallelism
• Volume: scale
• Variety: many formats, file system
Big Data Example: Connected Cows
• estrus detection by counting steps
– motion sensors for 40000 cows
• benefits for artificial insemination:
– less labour intensive
– higher success rates: 45% → 63%
– sex determination
• references:
– Strata Feb 2015: http://tinyurl.com/connected-cows
– http://www.fujitsu.com/global/about/resources/news/press-releases/2013/1015-01.html
RDBMS vs Big Data
Data Warehouses / RDBMS
• INSERT + UPDATE + DELETE
• prescriptive schema
– DDL
• strict notion of consistency
Big Data / HDFS
• INSERT (mostly)
• descriptive schema
– XML
– JSON
• sloppier notion of consistency, due to the scale of data and the number of systems involved
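The difference between a prescriptive and a descriptive schema can be shown with Python's standard library. This is a small sketch under stated assumptions: SQLite stands in for an RDBMS, a list of JSON lines stands in for files on HDFS, and the table and field names (steps, cow_id, battery) are invented for the example.

```python
import json
import sqlite3


def load_prescriptive(rows: list[dict]) -> int:
    """Schema-on-write: the DDL is declared first and enforced on INSERT."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE steps (cow_id INTEGER NOT NULL, steps INTEGER NOT NULL)"
    )
    accepted = 0
    for row in rows:
        try:
            con.execute(
                "INSERT INTO steps (cow_id, steps) VALUES (?, ?)",
                (row.get("cow_id"), row.get("steps")),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the row violates the declared schema and is rejected
    con.close()
    return accepted


def load_descriptive(lines: list[str]) -> list[dict]:
    """Schema-on-read: each JSON record describes its own structure."""
    return [json.loads(line) for line in lines]
```

Given one complete row and one row without a steps value, load_prescriptive accepts only the first, while load_descriptive keeps every well-formed JSON record and leaves the interpretation of missing or extra fields to whoever reads the data later.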
Big Data + Data Warehouses
Source: Strata Feb 2015, San Jose, CA, US – Keynote by Amr Awadallah (Cloudera)
SUMMARY
What You Should Take Away
1. Difference: BI vs DW vs Big Data
2. What are the problems that a DW handles?
3. How are those problems tackled?