Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
The Business Imperative
• Human Fault Tolerance
• Minimize CapEx
• Low Learning Curve
• Hyper Scale on Demand
CAP Theorem
• Consistency (C)
• Availability (A)
• Partition Tolerance (P)
Big Data Lambda Architecture
• Batch layer
  • Stores master dataset
  • Computes arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
Diagram: Incoming data streams → Master dataset → Batch views
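The batch-layer bullets above can be illustrated with a minimal Python sketch. It is not taken from the deck: the in-memory list standing in for the master dataset, the page-view events, and the function names are all hypothetical. It only shows the two ideas named on the slide, an append-only master dataset and a view recomputed from scratch over all of it.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the master dataset.
# Records are only ever appended, never updated or deleted.
master_dataset = []


def append_events(events):
    """Append new raw events to the master dataset (append mode)."""
    master_dataset.extend(events)


def compute_batch_view(dataset):
    """Recompute the view from scratch over the entire dataset.

    This is the high-latency, unrestrained computation: it scans every
    record each time instead of updating a previous result.
    """
    return Counter(event["page"] for event in dataset)


append_events([{"page": "/home"}, {"page": "/pricing"}])
append_events([{"page": "/home"}])

batch_view = compute_batch_view(master_dataset)
print(dict(batch_view))  # {'/home': 2, '/pricing': 1}
```

In a real deployment the full recomputation would run as a distributed job (for example MapReduce or Hive on HDInsight) rather than in a single process; the sketch keeps only the shape of the computation.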
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
Diagram: Incoming data streams → Process stream → Increment views → Real-time views (real-time increments)
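As a rough illustration of the speed layer described above, the following Python sketch keeps a limited time window of events and updates a real-time view incrementally as each event arrives. The class name, the 60-second window, and the page-view events are assumptions made for the example, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SpeedLayer:
    """Hypothetical speed layer: incremental counts over a sliding window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.window = deque()          # (timestamp, key) pairs, oldest first
        self.realtime_view = Counter() # the incrementally maintained view

    def process(self, timestamp, key):
        """Apply one event as an increment, then drop expired events."""
        self.window.append((timestamp, key))
        self.realtime_view[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        """Remove events that have fallen out of the limited window."""
        while self.window and self.window[0][0] <= now - self.window_seconds:
            _, old_key = self.window.popleft()
            self.realtime_view[old_key] -= 1
            if self.realtime_view[old_key] == 0:
                del self.realtime_view[old_key]


speed = SpeedLayer(window_seconds=60)
speed.process(0, "/home")
speed.process(30, "/pricing")
speed.process(70, "/home")   # the event at t=0 is now outside the window

print(dict(speed.realtime_view))  # {'/home': 1, '/pricing': 1}
```

Unlike the batch layer, no step here rescans old data: each event costs a constant amount of work, which is what keeps the latency low.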
The Serving Layer
• Queries the batch and real-time views
• Merges the results
Diagram: Real-time views + Batch views → Querying and merging → Output
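The query-and-merge step can be sketched in Python as follows. This is an illustration under stated assumptions, not the deck's implementation: the views are plain dictionaries of counts, the function names are hypothetical, and the real-time view is assumed to cover only data that the batch view has not yet absorbed, so the two can simply be added.

```python
from collections import Counter


def query(batch_view, realtime_view, key):
    """Answer a single-key query by combining both views."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merged_view(batch_view, realtime_view):
    """Merge the two views into one result (valid for additive counts)."""
    return dict(Counter(batch_view) + Counter(realtime_view))


# Hypothetical views: the batch view is large but stale,
# the real-time view is small but current.
batch_view = {"/home": 120, "/pricing": 45}
realtime_view = {"/home": 3, "/signup": 1}

print(query(batch_view, realtime_view, "/home"))   # 123
print(merged_view(batch_view, realtime_view))
```

Simple addition works only for views that are additive, such as counts or sums; other aggregates (distinct counts, averages) would need a merge function suited to them.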
Microsoft Lambda Architecture Support
• Batch layer: Windows Azure HDInsight; Azure Blob storage; MapReduce, Hive, Pig, Oozie, SSIS
• Speed layer: Federations in Windows Azure SQL Database; Azure tables; Memcached/MongoDB; SQL Server database engine; SQL Server VM (Columnstore indexes, Analysis Services, StreamInsight)
• Serving layer: Azure Storage Explorer; Microsoft Excel; Power Query; PowerPivot; Power View; Power Map; Reporting Services; LINQ to Hive; Analysis Services
Yahoo!
Diagram (serving, speed, and batch layers), components shown:
• Apache Hadoop (Hadoop data)
• Third-party database
• SQL Server Connector (Hadoop Hive ODBC)
• Staging database
• SQL Server Analysis Services (SSAS cube)
• Microsoft Excel and PowerPivot for Excel
• Other BI tools and custom applications
Ferranti Computer Systems
Diagram (serving, speed, and batch layers), components shown:
• Data feed from smart meters
• Reactive Extensions (Rx)
• SQL Server database (In-Memory OLTP)
• Windows Azure Storage
• Windows Azure HDInsight
• SQL Server Analysis Services
• SQL Server Reporting Services
• Microsoft Dynamics AX
Demo 1: Setting up the Windows Azure storage account
Components shown: Windows Azure Blob storage, Azure Storage Explorer
Blob Storage Concepts
• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
Diagram: Account → Container → Blob → Blocks/Pages
• Account: Contoso
• Containers: Images (PIC01.JPG, PIC02.JPG), Video (VID1.AVI)
• Blob URL: http://<account>.blob.core.windows.net/<container>/<blobname>
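The URL pattern above maps directly onto the account/container/blob hierarchy. The Python sketch below builds and parses such URLs using only the standard library; the helper names are hypothetical, and the sample values (`contoso`, `images`, `PIC01.JPG`) are borrowed from the slide's diagram purely for illustration.

```python
from urllib.parse import quote, unquote, urlsplit


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob URL following the pattern shown on the slide."""
    # quote() escapes characters that are not URL-safe but keeps "/"
    # so blob names that look like paths stay readable.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


def parse_blob_url(url):
    """Split a blob URL back into (account, container, blob_name)."""
    parts = urlsplit(url)
    account = parts.hostname.split(".")[0]
    container, _, blob_name = parts.path.lstrip("/").partition("/")
    return account, container, unquote(blob_name)


url = blob_url("contoso", "images", "PIC01.JPG")
print(url)                  # http://contoso.blob.core.windows.net/images/PIC01.JPG
print(parse_blob_url(url))  # ('contoso', 'images', 'PIC01.JPG')
```

Whether such a URL is readable without credentials depends on the container's access permissions, as the bullets above note.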
Getting started with HDInsight Service
Demo 2: Setting up the Windows Azure HDInsight cluster
Components shown: Windows Azure HDInsight, Windows Azure Blob storage, HDInsight Console
HDInsight Console: https://<ClusterName>.azurehdinsight.net/
Demo 3: Loading data into Windows Azure storage for use with HDInsight
Components shown: CSV files from local disk, Windows Azure Blob storage, Windows Azure HDInsight, HDInsight Console
HDInsight Console: https://<ClusterName>.azurehdinsight.net/
Easy Access to Data, Big & Small
• Simplify access to public & corporate data
• Easily preview, shape, & format your data
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with a single T-SQL query
Products: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with PolyBase
Learn more
• Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
• Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx
Questions?