Data Warehousing on Hadoop
The Future of Data Warehousing
Big Data in the Olden Days
p: +353 1 254 2897
t: @sonra_io
e: [email protected]
w: www.sonra.io
Big Data Today
Machine Translation
Big Data Today
Voice of Patient
The Rise of Big Data
The Perfect Data Storm
With digitisation we now have an abundance of data (exponential growth).
Globalisation & Machine Data
Distributed Computing
Moore’s Law
New breakthroughs in Artificial Intelligence (neural networks)
Big Data vs EDW
EDW Ralph (Kimball)
EDW Bill (Inmon)
ETL vs ELT
SMP vs MPP
RDBMS - Swiss Data Knife
Limitations – Persistent Storage
Limitations – Unstructured Data
Limitations – ETL
Limitations – BI
$$$$ Cloud
Server-less
Open Source
Limitations – Graph
- Verbose
- Performance (self-joins)
Limitations – Graph
Cypher Query Language:
MATCH (keanu:Person {name: "Keanu Reeves"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(coStar)
RETURN coStar.name;
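For comparison, the same co-star question can be asked of a relational database. The sketch below is illustrative only and is not from the original slides: the table names (person, movie, acted_in), the helper functions and the sample rows are all assumptions, and SQLite is used simply because it ships with Python. It shows why the relational form tends to be more verbose: the link table has to be joined to itself to walk from an actor to a movie and back out to the other actors.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up sample schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE acted_in (
            person_id INTEGER REFERENCES person(id),
            movie_id  INTEGER REFERENCES movie(id)
        );
        INSERT INTO person VALUES
            (1, 'Keanu Reeves'), (2, 'Carrie-Anne Moss'),
            (3, 'Laurence Fishburne'), (4, 'Tom Hanks');
        INSERT INTO movie VALUES (1, 'The Matrix'), (2, 'Cast Away');
        INSERT INTO acted_in VALUES (1, 1), (2, 1), (3, 1), (4, 2);
        """
    )
    return conn


def co_stars(conn, actor_name):
    """Return the sorted names of everyone who shared a movie with actor_name."""
    sql = """
        SELECT DISTINCT co.name
        FROM person   AS p
        JOIN acted_in AS a1 ON a1.person_id = p.id
        JOIN acted_in AS a2 ON a2.movie_id = a1.movie_id  -- self-join on the link table
        JOIN person   AS co ON co.id = a2.person_id
        WHERE p.name = ?
          AND co.id <> p.id                               -- exclude the actor themselves
        ORDER BY co.name
    """
    return [row[0] for row in conn.execute(sql, (actor_name,))]


if __name__ == "__main__":
    demo = build_demo_db()
    print(co_stars(demo, "Keanu Reeves"))
```

Each extra hop in the graph (for example, co-stars of co-stars) would add another pair of joins to the SQL, whereas the Cypher pattern simply grows by one more arrow.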
Limitations – Graph
Data Orchestration
Limitations – Graph
Data Catalog
Data Lineage
Master Data Management
Other Limitations
Limitations – Agility
Business Requirements
Data Analysis
Source to target map
Data Model
Development (ETL, BI, Reports, Dashboards)
Testing
Deployment
Self-Service Sandboxes
Use Cases
Data Profile/Analysis
Ad-hoc Analytics
Data Science
Data Exhaust for EDW
Analytics Sandboxes
                     | EDW/BI                          | Data Discovery/Science
Data Format/Quality  | Cleansed, Processed, Integrated | Raw, Unknown
Data Types           | Structured                      | Any
Method               | Known Unknowns                  | Unknown Unknowns, EDA
Data Scope           | EDW                             | All: new data sources, EDW, long tail of data
Time to Insight      | Long (EDW lifecycle)            | Shorter
Data Transformations | Formal: ETL, Code               | Ad-hoc: Iterative, Self-Service, GUI
Tool                 | ETL & BI Tool                   | Data Discovery, Science, Preparation Platform
Self-Service         | Ad-hoc queries                  | All data
Testing              | Unit, Integration, UAT          | Less Formal
Audience             | Business User, Power User       | Data Scientist, Data Analyst, Data Developer
Is EDW Obsolete?
In Summary
Use the appropriate technology for the problem at hand… and yes, there is a fancy term for this:
Polyglot Persistence
The Law of the Instrument
Sonra Offerings
Data Warehouse in the Cloud Quick Start Packages
Training: Big Data for Data Warehouse Professionals
WWW.SONRA.IO