Session presented at the Big Data Spain 2012 Conference, 16th Nov 2012, ETSI Telecomunicacion UPM, Madrid. www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa
BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK
Pablo Álvarez Doval
SQL/BI Team Lead - Plain Concepts
WHO…
… AM I?
• SQL/BI Team Lead at Plain Concepts
• e-mail: [email protected]
• Blog: http://geek.ms/blogs/palvarez
• Twitter: @PabloDoval
… ARE YOU?
• Quick Poll in the Room
WHAT…
… ARE WE GOING TO SEE?
… I’M NOT GOING TO SHOW?
SOME PICS…
SHARP: Overview
SCADA Historical Analysis and Reporting Platform
Demonstrate the feasibility of a custom end-to-end global architecture:
• SCADA: Local, Mobile and Central
• Historical Data: High speed and High volume
• Reporting
• Analysis
SHARP: MAGUS
[Diagram: Production Centers | Central]
• Production Center A, Production Center B: MAGUS (Local Operation, Mobile Operation) with MongoDB capped collections holding 2 months of 1s data and 1 year of 10m data
• Central: MAGUS (Remote Operation) with MongoDB capped collections holding, for each Production Center, 2 months of 1s data and 1 year of 10m data
• Outputs: DAT Files, Mongo Export
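As a rough sizing aid for the retention figures on the MAGUS slide, the arithmetic below estimates how many samples a single signal produces over each retention window. It is only a sketch under stated assumptions: "1s" and "10m" are read as one-second and ten-minute sampling periods, "2 months" is taken as 60 days, and the function name is illustrative rather than anything from the SHARP code base.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_signal(retention_days: int, period_seconds: int) -> int:
    """Number of samples one signal produces over the retention window."""
    return retention_days * SECONDS_PER_DAY // period_seconds


# Assumption: "2 months" is about 60 days, "1s" is one sample per second.
high_res = samples_per_signal(retention_days=60, period_seconds=1)

# Assumption: "1 year" is 365 days, "10m" is one sample every 10 minutes.
low_res = samples_per_signal(retention_days=365, period_seconds=600)

print(high_res)  # 5184000
print(low_res)   # 52560
```

MongoDB capped collections are sized in bytes (optionally with a maximum document count), so a count like this would still need to be multiplied by an average document size and by the number of signals.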
SHARP: Historical Data
[Diagram: Production Centers | Central. DAT files and Mongo Export output from the MAGUS installations form seven sources (Source 1 to Source 7, including Central); each source has its own Loader in front of the Hadoop DWH.]
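The slides do not show what a Loader does internally. As one possible illustration, the sketch below turns JSON-per-line records (the default output layout of mongoexport) into tab-separated rows, a layout that a Hive table declared with tab-delimited fields could read. The field names (center, signal, ts, value) and the function name are hypothetical; the real export schema and the DAT file format are not given in the deck.

```python
import json
from typing import Iterable, Iterator

# Hypothetical field names; the real export schema is not given in the slides.
FIELDS = ("center", "signal", "ts", "value")


def export_lines_to_tsv(lines: Iterable[str]) -> Iterator[str]:
    """Turn JSON-per-line export records into tab-separated rows."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Missing fields become empty strings so every row has the same width.
        yield "\t".join(str(doc.get(name, "")) for name in FIELDS)


sample = [
    '{"center": "A", "signal": "power", "ts": 1352980800, "value": 1.5}',
    "",
    '{"center": "B", "signal": "power", "ts": 1352980801}',
]
rows = list(export_lines_to_tsv(sample))
for row in rows:
    print(row)
```

Working on an iterable of lines keeps the conversion independent of where the data comes from, so the same function could be fed from a local file or from a stream.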
SHARP: Analysis and Reporting
[Diagram: Production Centers | Central. Components shown: DWH, OLAP Tabular, Reporting Services, Microsoft Office, Power View, Power Pivot, StreamInsight (Events). Future: Cloud?]
Reporting Services
• Dynamic reports
• Scheduled reports
• Automatic Distribution
• Multiformat (PDF, XLS, etc.)
INITIAL ASSESSMENT
Proof of Concept
• Minimum Risk
• Think and Plan future iterations
Microsoft Ecosystem
• Integration with current platform
• Suits the team’s know-how
On-Premise Infrastructure
• More controlled environment
PowerPivot
Power View
TOOLS OF THE TRADE
Non-Relational
• Enterprise class security, HA & management
• Seamlessly integrated with Microsoft BI tools
• Provisioned in minutes on Windows Azure
Microsoft HDInsight Server (on premise)
Windows Azure HDInsight Service (cloud)
HDINSIGHT BUILT ON HORTONWORKS DATA PLATFORM (HDP)
SO… WHAT DOES IT LOOK LIKE?
CURRENT SHARP IMPLEMENTATION
[Diagram components: Load Service, Azure Storage, Hadoop (HDFS, HIVE, Map Reduce), SSIS, DWH, SSRS, Power View]
LET’S TAKE A DEEPER LOOK…
FUTURE IMPROVEMENTS
New Analytical Processes
CEP Integration with Stream Insight
Improvements on the Higher Resolution data
COMPLEX EVENT PROCESSING: StreamInsight
[Diagram: Production Centers | Central. The Analysis and Reporting components (DWH, OLAP Tabular, Reporting Services, Microsoft Office, Power View, Power Pivot, Future: Cloud?), followed by a close-up of StreamInsight and its Events.]
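To make the idea of complex event processing concrete, here is a minimal sketch of a tumbling-window average, one of the basic windowed computations a CEP engine offers. It is plain Python, not the StreamInsight API (which is .NET and LINQ based), and the event layout of (timestamp in seconds, value) pairs is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_mean(
    events: Iterable[Tuple[int, float]], window_seconds: int
) -> Dict[int, float]:
    """Average event values per fixed, non-overlapping time window.

    events: (timestamp_in_seconds, value) pairs, in any order.
    Returns a mapping from window start time to the mean value in that window.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in events:
        start = ts - ts % window_seconds  # start of the window holding ts
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Five hypothetical events reduced to 10-minute (600 s) windows.
events = [(0, 10.0), (1, 20.0), (599, 30.0), (600, 40.0), (1199, 60.0)]
means = tumbling_window_mean(events, window_seconds=600)
print(means)  # {0: 20.0, 600: 50.0}
```

A real CEP engine would evaluate such windows incrementally as events arrive instead of over a finished list.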
IMPROV. TO HIGHER RESOLUTION DATA: The Goal
Ability to work with data in DW and Hive seamlessly and in a performant way.
[Diagram labels: Export, HDFS, MR, Import, RDBMS]
IMPROV. TO HIGHER RESOLUTION DATA: Sqoop Refresher
[Diagram labels: HDFS, HIVE, Sqoop, MR]
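As a reminder of what the Import and Export paths look like in practice, the sketch below builds the command lines for a Sqoop import (RDBMS to HDFS) and a Sqoop export (HDFS to RDBMS), both of which Sqoop runs as MapReduce jobs. The flags used (--connect, --table, --target-dir, --num-mappers, --export-dir) are standard Sqoop 1 arguments; the JDBC URL, table names and HDFS paths are hypothetical placeholders, not values from the SHARP project.

```python
from typing import List


def sqoop_import_args(
    jdbc_url: str, table: str, target_dir: str, mappers: int = 4
) -> List[str]:
    """argv for copying an RDBMS table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str) -> List[str]:
    """argv for copying files in an HDFS directory into an RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
    ]


# Hypothetical connection string, table names and paths.
URL = "jdbc:sqlserver://dwh.example.local:1433;databaseName=SHARP"
imp = sqoop_import_args(URL, "Signals", "/sharp/signals")
exp = sqoop_export_args(URL, "SignalAggregates", "/sharp/aggregates")
print(" ".join(imp))
print(" ".join(exp))
```

The lists can be passed to subprocess.run on a machine where Sqoop is installed; credentials are deliberately left out of the sketch.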
IMPROV. TO HIGHER RESOLUTION DATA: Sqoop with PDW…
[Diagram: a Sqoop Map/Reduce Job exchanges Query and Query Results with a PDW Cluster made up of a Control Node and several Compute Nodes, each running SQL Server.]
Source: Dr. DeWitt's presentation at SQL PASS Summit 2012
IMPROV. TO HIGHER RESOLUTION DATA: Sqoop refresher…
[Diagram: Sqoop sits between a PDW Cluster (a Control Node and several Compute Nodes, each running SQL Server) and a Hadoop Cluster (an HDFS Namenode and twelve DataNodes).]
Source: Dr. DeWitt's presentation at SQL PASS Summit 2012
IMPROV. TO HIGHER RESOLUTION DATA: The Goal – Polybase!
Ability to work with data in DW and Hive seamlessly and in a performant way.
[Diagram: T-SQL Queries against SQL Server (PDW), spanning both SQL and HDFS data]
IMPROV. TO HIGHER RESOLUTION DATA: Polybase parallelism via DMS
[Diagram: a PDW Cluster (a Control Node and several Compute Nodes, each running SQL Server) alongside a Hadoop Cluster (an HDFS Namenode and twelve DataNodes).]
IMPROV. TO HIGHER RESOLUTION DATA: Parallelism
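The parallelism idea can be pictured as spreading HDFS input over the PDW compute nodes so that each node reads its own share directly, instead of funnelling everything through a single node. The sketch below shows only that general idea with a round-robin assignment; it is not Polybase's or DMS's actual scheduling logic, and the split and node names are hypothetical.

```python
from typing import Dict, List


def assign_splits(splits: List[str], compute_nodes: List[str]) -> Dict[str, List[str]]:
    """Round-robin input splits over compute nodes for parallel reads."""
    if not compute_nodes:
        raise ValueError("at least one compute node is required")
    plan: Dict[str, List[str]] = {node: [] for node in compute_nodes}
    for i, split in enumerate(splits):
        # Split i goes to node i modulo the number of nodes.
        plan[compute_nodes[i % len(compute_nodes)]].append(split)
    return plan


# Hypothetical split and node names.
splits = [f"block-{n}" for n in range(6)]
plan = assign_splits(splits, ["compute-1", "compute-2", "compute-3"])
for node, assigned in plan.items():
    print(node, assigned)
```

A real scheduler would also weigh data locality and split sizes; round-robin is used here only because it is the simplest even distribution.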
IMPROV. TO HIGHER RESOLUTION DATA: That’s just the beginning…
• Uses the same T-SQL syntax to query both worlds at the same time
• The query optimizer (QO) can decide which data to push to which environment so that the query is processed optimally
STORIES WE COULD TELL: What went right…
• Cloud Environment
• Tabular Model for OLAP
• SSIS for ETL via ODBC Hive Driver
STORIES WE COULD TELL: What was not so good…
• Mappers and Reducers in C# via Hadoop Streaming
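For readers unfamiliar with Hadoop Streaming, its contract is simple: the mapper reads input lines from stdin and writes key<TAB>value lines to stdout, the framework sorts those lines by key, and the reducer reads the sorted lines from stdin. The talk's mappers and reducers were written in C#; the sketch below shows the same contract in Python, with the shuffle simulated locally by sorted(). The input layout (center,signal,ts,value) and the "maximum per signal" job are hypothetical examples.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'signal<TAB>value' for each 'center,signal,ts,value' line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # ignore malformed records
        _center, signal, _ts, value = parts
        yield f"{signal}\t{value}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Emit 'signal<TAB>max'; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{max(float(v) for _, v in group)}"


# Local simulation of map -> sort (shuffle) -> reduce.
raw = ["A,power,1,1.5", "A,temp,1,20.0", "B,power,1,2.5", "broken line"]
result = list(reducer(sorted(mapper(raw))))
print(result)
```

In an actual streaming job the two functions would live in separate scripts that iterate over sys.stdin and print each emitted line.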
CALL TO ACTION
LEARN MORE
1. Microsoft Big Data Solution: www.microsoft.com/bigdata
2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data
TRY NOW
3. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com
4. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata