Microsoft's Hadoop Story

Transcript
Page 1: Microsoft's Hadoop Story

Hadoop and Microsoft.

Michael Rys | Principal Program Manager @SQLServerMike

Page 2: Microsoft's Hadoop Story

Session Objectives

• What is Big Data?
• How it fits into the Windows and Windows Azure environments
• How do I program against it in the Microsoft environment

Page 3: Microsoft's Hadoop Story

What is Big Data?
• Traditionally:
  • Physics experiments, sensor data, satellite data, …
• Now:
  • Operational logs
  • Customer behavior
  • Social interactions online
  • …
• From terabytes in the 1990s, through petabytes today, to zettabytes in the future

Page 4: Microsoft's Hadoop Story

Big Data.

Page 5: Microsoft's Hadoop Story

Big Data.

VOLUME (Size)

VARIETY (Structure)

VELOCITY (Speed)

Page 6: Microsoft's Hadoop Story

Advanced Analytics

Live Data Feed

Social Analytics

How do I optimize my services based on patterns of weather, traffic, etc.?

What’s the social sentiment of my product?

How do I better predict future outcomes?

New Questions.

Page 7: Microsoft's Hadoop Story

Hadoop is for Big Data.

Page 8: Microsoft's Hadoop Story

What is Hadoop (v1)?

• Processing platform for Big Data
  • Using the “Map-Reduce” processing paradigm
• Characteristics:
  • Highly scalable (scaled out)
  • Commodity HW-based
  • Open Source

=> Very low acquisition and storage costs
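
The “Map-Reduce” paradigm named above can be illustrated with a small in-memory sketch. This is not Hadoop code: the function `mapReduce`, its parameters, and the sample log lines are all hypothetical and only show the three phases (map, group by key, reduce) on a single machine.

```javascript
// Minimal single-process illustration of the map-reduce paradigm.
// mapReduce, mapFn and reduceFn are hypothetical names, not Hadoop APIs.
function mapReduce(records, mapFn, reduceFn) {
  const groups = new Map();
  for (const record of records) {
    // Map phase: each record yields zero or more [key, value] pairs.
    for (const [key, value] of mapFn(record)) {
      // Shuffle phase: collect all values that share a key.
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  // Reduce phase: collapse each key's values into one result.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

// Made-up operational log lines: "<method> <path> <status>".
const logLines = [
  "GET /index.html 200",
  "GET /missing 404",
  "POST /orders 200",
];

// Count requests per HTTP status code.
const statusCounts = mapReduce(
  logLines,
  (line) => [[line.split(" ")[2], 1]],
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(statusCounts);
```

On a real cluster the map and reduce calls run on many nodes in parallel and the grouping step moves data between them; the sketch only mirrors the programming model.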

Page 9: Microsoft's Hadoop Story

Hadoop Data Flow

Hadoop Data Analytics

Page 10: Microsoft's Hadoop Story
Page 11: Microsoft's Hadoop Story

Hadoop Capabilities

Machine Learning

Graph Processing

Distributed Compute

Extract Load Transform

Predictive Analysis

Page 12: Microsoft's Hadoop Story

HDInsight Ecosystem

• Distributed Storage (HDFS)
• Distributed Processing (Map Reduce)
• Query (Hive)
• Scripting (Pig)
• NoSQL Database (HBase)
• Metadata (HCatalog)
• Data Integration (ODBC / SQOOP / REST)
• Business Intelligence (Excel, PowerView, …)
• Machine Learning (Mahout)
• Graph (Pegasus)
• Stats processing (RHadoop)
• Pipeline / workflow (Oozie)
• Log file aggregation (Flume)
• PDW
• World’s Data (Azure Data Marketplace)
• AD, System Center
• Windows Azure Storage

Page 13: Microsoft's Hadoop Story

Data Knowledge Action

HDInsight

Page 14: Microsoft's Hadoop Story

HDFS on Azure: Tale of two File Systems

• DFS (1 Data Node per Worker Role) and Compute Cluster: NameNode and Data Nodes, accessed through the HDFS API
• Azure Storage Vault (ASV): Containers on Azure Blob Storage (Front end, Partition Layer, Stream Layer)

Page 15: Microsoft's Hadoop Story

.Net Map/Reduce Support
• Install NuGet
• Use NuGet to install the Microsoft .Net MapReduce API for Hadoop
• Provide an implementation of a HadoopJob
• Execute the job via either
  • MRLib\MRRunner.exe -dll ConsoleAppHadoopJob.exe
  or
  • HadoopJobExecutor.ExecuteJob<HadoopJobClass>();

• Collect your result on HDFS

Page 16: Microsoft's Hadoop Story

JavaScript Map/Reduce Support
• Provide a map and reduce function variable in a JS file
• Use the JavaScript console with
  • runJS('/user/myself/MRjob.js', '/path/to/inputfile', '/path/to/output/dir')

• Collect your result on HDFS
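
A job file with a map and a reduce function variable might look like the word-count sketch below. The calling convention is an assumption, not something the slide states: it assumes the console calls `map(key, value, context)` and `reduce(key, values, context)`, that `context.write` emits a key/value pair, and that `values` exposes `hasNext()` and `next()`. The `runLocally` harness is hypothetical and exists only so the two functions can be tried outside Hadoop.

```javascript
// Sketch of a job file such as /user/myself/MRjob.js.
// Assumed convention: map(key, value, context) and reduce(key, values, context),
// context.write(k, v) emits output, values has hasNext()/next().
var map = function (key, value, context) {
  // Split the input line into words and emit (word, 1) for each.
  var words = value.split(/[^a-zA-Z]+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  // Sum all counts emitted for one word.
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local harness: stands in for the cluster, not part of a real job.
function runLocally(lines) {
  var grouped = {};
  lines.forEach(function (line, offset) {
    map(offset, line, {
      write: function (k, v) { (grouped[k] = grouped[k] || []).push(v); }
    });
  });
  var output = {};
  Object.keys(grouped).forEach(function (k) {
    var i = 0;
    var vals = grouped[k];
    var iterator = {
      hasNext: function () { return i < vals.length; },
      next: function () { return vals[i++]; }
    };
    reduce(k, iterator, { write: function (key, v) { output[key] = v; } });
  });
  return output;
}

var wordCounts = runLocally(["Big Data on Hadoop", "Hadoop on Azure"]);
console.log(wordCounts);
```

Only the `map` and `reduce` variables would belong in the job file passed to the console; the harness and the sample lines are for local experimentation.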

Page 17: Microsoft's Hadoop Story

Invoking HiveQL Queries
• Run queries in the Hadoop Command Shell after invoking hive
• Through the web console
• Programmatically through ODBC
• Coming soon: LINQ to Hive!

Page 18: Microsoft's Hadoop Story

Polybase – Enhancing PDW query engine

[Diagram] Data sources: Social Apps, Sensor & RFID, Mobile Apps, Web Apps (unstructured data, Hadoop); Traditional schema-based DW applications (structured data, PDW V2). Data Scientists, BI Users, and DB Admins send Regular T-SQL to the Enhanced PDW query engine and receive Results.

Page 19: Microsoft's Hadoop Story

Microsoft Hadoop Vision

Microsoft Business Intelligence (BI)
• Hive ODBC Connectivity
• BI Tools for Big Data

Better on Windows and Azure
• Active Directory
• System Center
• .Net Programmability

Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market

Collaborate with and Contribute to OSS
• Collaborate with HortonWorks
• Provide improvements and Windows support back to OSS

Page 20: Microsoft's Hadoop Story

Getting started
• On prem: http://www.microsoft.com/bigdata/
  • Single node cluster (onebox) install
  • C:\hadoop
  • Starts local services
  • Can start/stop them with start-onebox.cmd/stop-onebox.cmd
  • Comes with:
    • Hadoop command line (shell)
    • Hadoop Status for name node and map-reduce cluster
    • HDInsight Dashboard
• On Windows Azure: http://HadoopOnAzure.com/
  • 3 node cluster running as a service in Azure
  • Can be used for 5 days
  • Provides samples and HDInsight Dashboard
• TAP Program

Page 21: Microsoft's Hadoop Story

Related Content and Links

http://www.microsoft.com/bigdata
http://www.hadooponazure.com
Nuget: http://nuget.codeplex.com/
LinqPad: http://www.linqpad.net/
Linq to Hive (see http://hadoopsdk.codeplex.com)

Find Me Later At…
Twitter: @SQLServerMike

ACM SIGOPS Paper: Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency (Calder et al)
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Developing Big Data Analytics Applications with JavaScript and .NET for Windows Azure and Windows: http://channel9.msdn.com/Events/Build/2012/3-038

Page 22: Microsoft's Hadoop Story

@SQLServerMike

Michael Rys

Thank you