BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK
Pablo Álvarez Doval, SQL/BI Team Lead - Plain Concepts

Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

DESCRIPTION

Session presented at the Big Data Spain 2012 Conference, 16th Nov 2012, ETSI Telecomunicacion UPM, Madrid. www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa

Page 1

BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK

Pablo Álvarez Doval
SQL/BI Team Lead - Plain Concepts

Page 2

WHO…

… AM I?
• SQL/BI Team Lead at Plain Concepts
• e-mail: [email protected]
• Blog: http://geek.ms/blogs/palvarez
• Twitter: @PabloDoval

… ARE YOU?
• Quick Poll in the Room

Page 3

WHAT…

… ARE WE GOING TO SEE?

… I’M NOT GOING TO SHOW?

Page 5

SOME PICS…

Page 6

SHARP
Overview

SCADA Historical Analysis and Reporting Platform

Demonstrate the feasibility of a custom end-to-end global architecture:
• SCADA: Local, Mobile and Central
• Historical Data: High speed and High volume
• Reporting
• Analysis

Page 7

SHARP
MAGUS

[Architecture diagram, split into Production Centers and Central]
• Production Center A and Production Center B: MAGUS (Local Operation, Mobile Operation) with MongoDB capped collections holding 2 months of 1s data and 1 year of 10m data
• Central: MAGUS (Remote Operation) with MongoDB capped collections holding, for each Production Center, 2 months of 1s data and 1 year of 10m data
• Other labels: DAT Files, Mongo Export
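The retention windows on the MAGUS slide can be turned into rough sample counts per signal. The sketch below is only an illustration under stated assumptions: it reads "1s" and "10m" as one-second and ten-minute sampling, and takes 2 months as 60 days and 1 year as 365 days, none of which the slides spell out. It also uses a `deque` with `maxlen` as an in-memory analogy for the way a capped collection discards its oldest entries once full. Real MongoDB capped collections are sized in bytes (with an optional document limit), so this shows the behaviour, not the platform's actual code.

```python
from collections import deque

SECONDS_PER_DAY = 24 * 60 * 60


def samples(days: int, period_seconds: int) -> int:
    """Samples one signal produces over `days` at a fixed sampling period."""
    return days * SECONDS_PER_DAY // period_seconds


# Assumptions (not from the slides): 2 months = 60 days, 1 year = 365 days,
# "1s" = one-second sampling, "10m" = ten-minute sampling.
HIGH_RES_SAMPLES = samples(days=60, period_seconds=1)
LOW_RES_SAMPLES = samples(days=365, period_seconds=10 * 60)


def make_capped_buffer(max_samples: int) -> deque:
    """In-memory analogy of a capped collection: when full, an append drops the oldest entry."""
    return deque(maxlen=max_samples)


if __name__ == "__main__":
    print(HIGH_RES_SAMPLES)  # 5184000 samples per signal
    print(LOW_RES_SAMPLES)   # 52560 samples per signal

    buffer = make_capped_buffer(3)
    for reading in [10, 11, 12, 13]:
        buffer.append(reading)
    print(list(buffer))      # [11, 12, 13]: the oldest reading was discarded
```

Under these assumptions the high-resolution window holds roughly a hundred times more samples per signal than the low-resolution one, which is consistent with the deck treating the higher resolution data as a separate area for improvement.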

Page 8

SHARP
Historical Data

[Data-flow diagram, split into Production Centers and Central]
• Sources 1 to 7 (MAGUS instances and DAT files), each with its own Loader
• Mongo Export
• Hadoop
• DWH

Page 9

SHARP
Analysis and Reporting

[Architecture diagram, split into Production Centers and Central]
• DWH
• OLAP Tabular
• Microsoft Office
• Power View
• Power Pivot
• Reporting Services
  • Dynamic reports
  • Scheduled reports
  • Automatic Distribution
  • Multiformat (PDF, XLS, etc.)
• StreamInsight: Events
• Future: Cloud?

Page 10

INITIAL ASSESSMENT

Proof of Concept
• Minimum Risk
• Think and Plan future iterations

Microsoft Ecosystem
• Integration with current platform
• Suits the team’s Know-how

On-Premise Infrastructure
• More controlled environment

Page 11

TOOLS OF THE TRADE

PowerPivot

Power View

Page 12

HDINSIGHT BUILT ON HORTONWORKS DATA PLATFORM (HDP)

Non-Relational

• Enterprise class security, HA & management
• Seamlessly integrated with Microsoft BI tools
• Provisioned in minutes on Windows Azure

Microsoft HDInsight Server (on premise)
Windows Azure HDInsight Service (cloud)

Page 13

SO… WHAT DOES IT LOOK LIKE?

Page 14

CURRENT SHARP IMPLEMENTATION

[Architecture diagram; components shown]
• Load Service
• Azure Storage
• Hadoop: HDFS, HIVE, Map Reduce
• SSIS
• DWH
• SSRS
• PowerView

Page 15

LET’S TAKE A DEEPER LOOK…

Page 16

FUTURE IMPROVEMENTS

New Analytical Processes

CEP Integration with Stream Insight

Improvements on the Higher Resolution data

Page 17

COMPLEX EVENT PROCESSING
StreamInsight

[Diagram: the SHARP Analysis and Reporting architecture (DWH, OLAP Tabular, Reporting Services, Microsoft Office, Power View, Power Pivot, Future: Cloud?) with the StreamInsight / Events path between Production Centers and Central]

Page 18

COMPLEX EVENT PROCESSING
StreamInsight

[Diagram: StreamInsight and Events, between Production Centers and Central]

Page 19

IMPROV. TO HIGHER RESOLUTION DATA
The Goal

Ability to work with data in DW and Hive seamlessly and in a performant way.

[Diagram: HDFS and MR on one side, RDBMS on the other, linked by Export and Import]

Page 20

IMPROV. TO HIGHER RESOLUTION DATA
Sqoop Refresher

[Diagram: HDFS, HIVE, Sqoop, MR]

Page 21

IMPROV. TO HIGHER RESOLUTION DATA
Sqoop with PDW…

[Diagram: Sqoop Map/Reduce Job; Query and Query Results exchanged with the PDW Cluster (Control Node and Compute Nodes, each running SQL Server)]

Source: Dr. DeWitt's presentation at SQL PASS Summit 2012

Page 22

IMPROV. TO HIGHER RESOLUTION DATA
Sqoop refresher…

[Diagram: Sqoop between the PDW Cluster (Control Node and Compute Nodes, each running SQL Server) and the Hadoop Cluster (Namenode (HDFS) and twelve DataNodes)]

Source: Dr. DeWitt's presentation at SQL PASS Summit 2012

Page 23

IMPROV. TO HIGHER RESOLUTION DATA
The Goal – Polybase!

Ability to work with data in DW and Hive seamlessly and in a performant way.

[Diagram: T-SQL Queries, SQL Server (PDW), SQL, HDFS]

Page 24

IMPROV. TO HIGHER RESOLUTION DATA
Polybase parallelism via DMS

[Diagram: the PDW Cluster (Control Node and Compute Nodes, each running SQL Server) alongside the Hadoop Cluster (Namenode (HDFS) and twelve DataNodes)]

Page 25

IMPROV. TO HIGHER RESOLUTION DATA
Parallelism

Page 26

IMPROV. TO HIGHER RESOLUTION DATA
That’s just the beginning…

Uses the same T-SQL syntax to query both worlds at the same time.

The query optimizer (QO) can decide which data to push to which environment so that it is processed optimally.

Page 27

STORIES WE COULD TELL
What went right…

Cloud Environment

Tabular Model for OLAP

SSIS for ETL via ODBC Hive Driver

Page 28

STORIES WE COULD TELL
What was not so good…

Mappers and Reducers in C# via Hadoop Streaming
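For readers unfamiliar with Hadoop Streaming, the contract is simple: the mapper and the reducer are ordinary executables that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it. The talk's mappers and reducers were written in C#. The sketch below is in Python and is not the SHARP code: the `signal,timestamp,value` record layout, the signal names and the per-signal mean are hypothetical, chosen only to show the shape of such a job. The `simulate` helper stands in for the framework so the example can run locally.

```python
import io
from itertools import groupby


def run_mapper(stdin, stdout):
    """Mapper: read 'signal,timestamp,value' lines, emit 'signal<TAB>value'."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        signal, _timestamp, value = line.split(",")
        stdout.write(f"{signal}\t{float(value)}\n")


def run_reducer(stdin, stdout):
    """Reducer: input arrives sorted by key; emit 'signal<TAB>mean value'."""
    pairs = (line.rstrip("\n").split("\t") for line in stdin if line.strip())
    for signal, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        stdout.write(f"{signal}\t{sum(values) / len(values)}\n")


def simulate(raw_lines):
    """Local stand-in for the framework: map, sort by key, reduce."""
    mapped = io.StringIO()
    run_mapper(io.StringIO("\n".join(raw_lines)), mapped)
    shuffled = sorted(mapped.getvalue().splitlines(keepends=True))
    reduced = io.StringIO()
    run_reducer(iter(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    # Hypothetical sample records; the real DAT layout is not given in the slides.
    sample = [
        "signal_a,2012-11-16T10:00:00,1.0",
        "signal_b,2012-11-16T10:00:00,4.0",
        "signal_a,2012-11-16T10:00:01,3.0",
    ]
    print(simulate(sample), end="")
```

In an actual streaming job each function would be called with `sys.stdin` and `sys.stdout` from its own script, and the sort in `simulate` would be done by Hadoop between the two phases.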

Page 29

CALL TO ACTION

LEARN MORE
1. Microsoft Big Data Solution: www.microsoft.com/bigdata
2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data

TRY NOW
3. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com
4. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata
