Source: download.microsoft.com/download/5/9/E/59E3DFD2-ABD0-49B4-AE7E... · 13/1/2016


Big data is hard…

Top 3 Challenges To Adopting Big Data

Traditionally, analytics has been performed over predefined structures

Data characteristics:

Questions answered with BI and visualizations:

Customer

Sales

Product

To innovate, new types of data and analytics are needed

Data characteristics:

Questions from exploratory analytics:

Data complexity: variety and velocity

Petabytes

Customer

Sales

Product

Two Approaches to Analytics

[Diagram labels: Observation, Pattern, Theory, Hypothesis]

Predictive Analytics: What will happen?

Prescriptive Analytics: How can we make it happen?

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Top-Down

[Diagram labels: Confirmation, Theory, Hypothesis, Observation]

[Diagram labels: Relational, LOB Applications, ETL pipeline, Dedicated ETL tools (e.g. SSIS), Defined schema, Queries, Results]

Traditional business analytics process

1. Start with end-user requirements to identify desired reports and analysis

2. Define corresponding database schema and queries

3. Identify the required data sources

4. Create an Extract-Transform-Load (ETL) pipeline to extract required data (curation) and transform it to the target schema ("schema-on-write")

5. Create reports and analyze data

All data not immediately required is discarded or archived
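The schema-on-write step in the process above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the SalesRow schema, the field names, and the sample records are hypothetical and do not come from the deck or from any ETL tool. It shows how a fixed target schema forces type coercion at load time and how anything outside the schema (here, a clickstream field) is dropped before storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesRow:
    """Target schema, fixed before any data is loaded (hypothetical)."""
    customer_id: int
    product: str
    amount: float
    sold_on: date


def transform(raw: dict) -> SalesRow:
    """Coerce one source record into the target schema (schema-on-write).

    Fields that are not part of the schema are dropped here, at load time.
    """
    return SalesRow(
        customer_id=int(raw["customer_id"]),
        product=raw["product"].strip(),
        amount=float(raw["amount"]),
        sold_on=date.fromisoformat(raw["sold_on"]),
    )


def run_etl(source_records):
    """Extract, transform, load; set aside rows that do not fit the schema."""
    loaded, rejected = [], []
    for raw in source_records:
        try:
            loaded.append(transform(raw))
        except (KeyError, ValueError):
            rejected.append(raw)
    return loaded, rejected


# Hypothetical source records: the first carries an extra field,
# the second has a customer_id that cannot be coerced to int.
records = [
    {"customer_id": "17", "product": " Widget ", "amount": "9.50",
     "sold_on": "2016-01-13", "clickstream": "a>b>c"},
    {"customer_id": "n/a", "product": "Gadget", "amount": "3.00",
     "sold_on": "2016-01-14"},
]
loaded, rejected = run_etl(records)
```

The extra clickstream value never reaches the loaded rows, which mirrors the point that data not immediately required is discarded or archived.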


[Diagram: Gather data from all sources, Store indefinitely, Analyze, See results, Iterate]

New big data thinking: All data has value

All data has potential value

Data hoarding

No defined schema—stored in native format

Schema is imposed and transformations are done at query time (schema-on-read).

Apps and users interpret the data as they see fit
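A minimal sketch of schema-on-read, assuming newline-delimited JSON as the native format. The RAW_STORE contents, the read_with_schema helper, and the field names are hypothetical, chosen only to illustrate the idea. The raw records are stored untouched, and each consumer supplies its own schema when it queries.

```python
import json

# Raw events kept exactly as they arrived (native format, no upfront schema).
RAW_STORE = [
    '{"customer_id": 17, "product": "Widget", "amount": 9.5, "clickstream": "a>b>c"}',
    '{"device": "sensor-4", "temp_c": 21.7}',
    '{"customer_id": 23, "product": "Gadget", "amount": 3.0}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time.

    Parses each line, keeps records that contain every requested field,
    and coerces the values with the converters given in `schema`.
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: convert(record[field])
                   for field, convert in schema.items()}


# Two consumers interpret the same stored data in different ways.
sales_view = list(read_with_schema(RAW_STORE, {"product": str, "amount": float}))
device_view = list(read_with_schema(RAW_STORE, {"device": str, "temp_c": float}))
total_sales = sum(row["amount"] for row in sales_view)
```

Nothing is discarded at write time, so a view that nobody anticipated (such as the device readings) can still be defined later over the same store.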


The Microsoft Data Platform Capabilities

Capture + manage

Transform + analyze

Visualize + decide

Azure Data Platform

[Diagram labels: VPN Gateway, Cloud Gateway, Event Hub, ExpressRoute, SQL Data Sync, Data Management Service, Data Factory, Logic Apps, Cloud Services Worker Role, Stream Analytics, Azure Data Catalog, Azure Batch, Machine Learning, Power BI, Cortana Analytics Suite]

On-Premises

[Diagram labels: VPN Device, File Data, IoT, Transactional Data, Hadoop, SQL, Device Data, Log Data, Apps, Stream Data, iOS/Android, MPLS, Enterprise Data, MPP/APS, Data Management Gateway]

DocDB

storage blob

storage table

storage queue

MySQL Database

Azure SQL Data Warehouse

HDInsight (Hadoop)

Azure Data Lake

Azure SQL Database

Azure Data Lake

Introducing Microsoft Azure Data Lake

[Diagram: Microsoft Azure Data Lake. Labels: Analytics Service (U-SQL), HDInsight, YARN, Store (HDFS)]

Product Details

Azure Data Lake store

Azure Data Lake analytics service

Azure HDInsight


Introducing Azure Data Lake Store

No fixed limits on file size (petabyte-scale files)

Designed for a diversity of analytic workloads

Accessible to all HDFS-compliant analytic applications (Hortonworks, Cloudera, MapR)

Managed, monitored, and supported by Microsoft

Enterprise grade features around security, compliance & management


Azure Data Lake Analytics Service

Distributed analytics service

Dynamically scales to meet your business needs

Productive from day one with industry-leading development tools (for novices and experts)

Analytics over all data (unstructured, semi-structured, structured)

U-SQL: simple and familiar, easily extensible

Hive coming soon

Built on open standards (YARN)


Azure HDInsight becomes a key part of Data Lake

Microsoft’s cloud Hadoop offering

100% open source Apache Hadoop

Fully managed and supported by Microsoft

Spark, Hive, Pig, Storm, HBase

Up and running in minutes with no hardware

Build with existing .NET and Java skills

Deep integration with Visual Studio

99.9% Enterprise Service Level Agreement

Use Windows or Linux


Azure HDInsight Includes Spark

Single execution model for multiple tasks (SQL queries, streaming, machine learning, and graph)

Up to 100x faster processing than Hadoop MapReduce for in-memory workloads

Developer friendly (Java, Python, Scala)

BI tool of choice (Power BI, Tableau, Qlik, SAP)

Notebook experience (Jupyter/IPython, Zeppelin)


Azure HDInsight Includes Storm

Consumes millions of real-time events from a scalable event broker (e.g. Apache Kafka, Azure Event Hub)

Performs time-sensitive computation

Output to persistent stores, dashboards or devices

Customizable with Java + .NET

Deeply integrated with Visual Studio
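As an illustration of the kind of time-sensitive computation listed above, the sketch below counts events per key in fixed, non-overlapping (tumbling) time windows. It is plain Python, not the Storm API. The function name, the event tuples, and the 10-second window are assumptions made for the example, and a real topology would process an unbounded stream rather than a list.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start_seconds: Counter({key: count})}.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


# Hypothetical events: (seconds since start, event type).
events = [
    (0, "login"), (3, "click"), (9, "click"),
    (10, "login"), (14, "click"), (21, "click"),
]
counts = tumbling_window_counts(events, window_seconds=10)
```

Each window's counter could then be written to a persistent store or pushed to a dashboard as soon as the window closes.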


Azure HDInsight Includes HBase

Columnar, NoSQL database

Runs on top of the Hadoop Distributed File System (HDFS)

Provides flexibility in that new columns can be added to column families at any time
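The column-family flexibility described above can be pictured with a toy in-memory model. This is not the HBase client API: the ColumnFamilyTable class, its methods, and the sample rows are hypothetical, written only to show that column families are declared when the table is created while individual columns inside a family can appear on any row at any time.

```python
class ColumnFamilyTable:
    """Toy model of a column-family store.

    Column families are fixed when the table is created; column
    qualifiers inside a family are free-form and can differ per row.
    """

    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}  # row_key -> {family -> {qualifier -> value}}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier, default=None):
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier, default)


table = ColumnFamilyTable(["profile", "orders"])
table.put("cust-17", "profile", "name", "Ada")
# A new column is added later with no schema change to the table.
table.put("cust-17", "profile", "loyalty_tier", "gold")
table.put("cust-23", "profile", "name", "Lin")
```

Rows that never received the new column simply have no cell for it, so reading it back returns the default rather than a stored null.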

ADL Store: Ingress

Data can be ingested into Azure Data Lake Store from a variety of sources

Server logs

Azure Event Hub

Apache Flume

Azure Storage Blobs

Custom programs

.NET SDK

JavaScript CLI

Azure Portal

Azure PowerShell

Azure Data Factory

Apache Sqoop

Azure SQL DB

Azure SQL DW

Azure Tables (Table Storage)

On-premises databases (SQL)

Built-in copy service

ADL Store

ADL Store: Egress

Data can be exported from Azure Data Lake Store into numerous targets/sinks

Azure SQL DB

Azure SQL DW

Azure Tables (Table Storage)

On-premises databases (SQL)

Azure Data Factory

Apache Sqoop

Azure Storage Blobs

Custom programs

.NET SDK

JavaScript CLI

Azure Portal

Azure PowerShell

Built-in copy service

ADL Store

Get Started

Sign up: http://azure.com/datalake

Learn More

http://azure.microsoft.com/en-us/documentation/services/hdinsight/

http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/

http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data

http://channel9.msdn.com/Shows/Data-Exposed

http://azure.microsoft.com/en-us/pricing/free-trial/