Analyzing StackExchange data with Azure Data Lake
Tom Kerkhove
http://www.twitter.com/TomKerkhove
https://be.linkedin.com/in/tomkerkhove
Nice to meet you
Tom KERKHOVE
➔ Integration Professional
➔ IoT Competency Lead
➔ Windows Development & Microsoft Azure MVP
be.linkedin.com/in/tomkerkhove
github.com/tomkerkhove
Agenda
• Why should we care about Big Data?
• Big Data in Azure
• Azure Data Lake
• Demo
• Q & A
Integration of Things
Internet of Things
Connect and scale with efficiency
Analyze and act on new data
Integrate and transform business processes
Business Systems
Event producers & gateways
Ingestion & transformation
Report, Act, Predict
Microsoft Patterns & Practices – IoT Journey
Cluster Management
Languages
Platform Services
• Web and Mobile: Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs
• Media & CDN: Content Delivery Network (CDN), Media Services
• Integration: BizTalk Services, Hybrid Connections, Service Bus, Storage Queues
• Hybrid Operations: Backup, StorSimple, Azure Site Recovery, Import/Export
• Data: SQL Database, DocumentDB, Redis Cache, Azure Search, Storage Tables, Data Warehouse
• Compute Services: Cloud Services, Batch, RemoteApp, Service Fabric
• Developer Services: Visual Studio, App Insights, Azure SDK, VS Online
• Analytics & IoT: HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Data Lake, IoT Hub, Data Catalog
• Security & Management: Azure Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store/Marketplace, VM Image Gallery & VM Depot, Azure AD B2C, Scheduler, Azure AD Health Monitoring, AD Privileged Identity Management, Operational Analytics, Domain Services
Infrastructure Services
• OS/Server Compute: Virtual Machines, Container Service
• Storage: BLOB Storage, Azure Files, Premium Storage
• Networking: Virtual Network, ExpressRoute, Traffic Manager, App Gateway, DNS, VPN Gateway, Load Balancer
Datacenter Infrastructure (24 Regions, 22 Online)
Overview in Azure
Services shown: DocumentDB, Data Factory, Stream Analytics, Data Lake (Store & Analytics), HDInsight, Virtual Machine, IoT Hub, SQL Data Warehouse, SQL Database, Storage, Event Hubs
Categories shown: Data Ingestion, Data Storage, Data Pipelines, Machine Learning, Data Analytics
Personal Digital Assistant – Cortana
Perceptual Intelligence
Preconfigured Solutions
Dashboards and Visualizations
Machine Learning and Analytics
Big Data Store
Information Management
Cortana Analytics Suite
Analysing Big Data in Azure
Azure Data Lake Family
HDInsight
• Managed cluster service
• Open-source technology
• Runs on Windows or Linux
Data Lake Store
• Unlimited storage
• WebHDFS Store
Data Lake Analytics
• Managed job service
• U-SQL batch-processing
Azure Data Lake Store
➔ WebHDFS compatible
➔ Any size
➔ Any format as-is
➔ Write-once-read-many
➔ Enterprise-grade security
➔ The big data store in Azure
Characteristics
➔ Data Warehousing
  ➔ Structured data
  ➔ Defined set of schemas
  ➔ Requires Extract-Transform-Load (ETL) before storing
  ➔ Familiar to some of us
  ➔ Exploratory analysis is hard because the data has already been transformed
Data Lake vs Data Warehousing
➔ Data Lake
  ➔ Raw data (unstructured / semi-structured / structured)
  ➔ “Dump” all your data in the lake
  ➔ Data scientists will interpret data from the lake
  ➔ Without metadata, it turns into a data swamp pretty fast
Azure Data Lake Analytics
➔ Run analytics jobs on managed clusters
  ➔ Don’t worry about scale
➔ Written in U-SQL
  ➔ SQL syntax
  ➔ Extensibility in C#
➔ Easily scaled with Analytics Units
➔ Pay for processing time only
Writing U-SQL scripts
Extract from a data source by using built-in or custom extractors.
Transform / analyse the data using SQL syntax, in-line C# or C# method calls.
Output the result to a data source by using built-in or custom outputters.
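The extract, transform, output flow can be illustrated with a small analogue. This is a sketch in plain Python using only the standard library, not U-SQL; the column names (`UserId`, `Score`) and the sample rows are hypothetical and were chosen only to resemble StackExchange-style data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input; a real job would read files from the store.
RAW_POSTS = "UserId,Score\n1,10\n2,3\n1,5\n"


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: total score per user, sorted by user id."""
    totals = defaultdict(int)
    for row in rows:
        totals[int(row["UserId"])] += int(row["Score"])
    return sorted(totals.items())


def output(pairs):
    """Output: serialize the aggregated rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["UserId", "TotalScore"])
    writer.writerows(pairs)
    return buffer.getvalue()


result = output(transform(extract(RAW_POSTS)))
print(result)
```

In a U-SQL script the same three roles are played by an extractor, a SQL-style rowset expression and an outputter; the Python functions above only stand in for them.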
Data Lake Analytics - Data Sources
A U-SQL query running in Azure Data Lake Analytics can reach these data sources (the diagram labels the connections Query or Write):
➔ Azure Storage Blobs
➔ Azure Data Lake Store
➔ Azure SQL Database
➔ Azure SQL Data Warehouse
➔ Azure SQL in VMs
Meet StackExchange
➔ Over 280 sub-sites
➔ 150+ GB of open-source data
➔ Different kinds of data
  ➔ Posts
  ➔ Users
  ➔ Votes
  ➔ ...
➔ A big data sample data set
What Are We Going To Do?
Acquiring The Data
• Downloading the original data set
Moving The Data
• Upload data set to Azure
• Determine what service to use
Visualizing The Data
• Visualize what we’ve learned
Azure Data Lake tools for Visual Studio
➔ Projects / Solutions / Source control
➔ Store Explorer
  ➔ Browse store
  ➔ Download complete file or a subset
  ➔ Preview
➔ Job Visualizer
  ➔ Determine bottlenecks by using heatmaps
  ➔ Playback jobs based on telemetry
➔ Query optimization
➔ Job Profiler
➔ Off-line execution
Integration with Azure Services
➔ Integrate in your data pipelines in Azure Data Factory
  ➔ Move data from Azure Data Lake Store to another store
  ➔ Move data to Azure Data Lake Store
  ➔ Run U-SQL query within pipeline
➔ Integration with Azure Data Catalog
  ➔ Register your Azure Data Lake Store assets
Pricing
➔ Data Lake Store
  ➔ $0.08 per GB stored per month
  ➔ $0.14 per 1M transactions
    • 1 transaction is a block of up to 128 kB
  ➔ Egress will be billed, but the price is not known yet
➔ Data Lake Analytics
  ➔ $0.05 per job
  ➔ $0.05 per minute per Analytics Unit of processing time
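To make these rates concrete, here is a rough cost estimate in Python. It is a sketch built only on the prices quoted on this slide, which may have changed since; egress is left out because its price is not given. The function names are made up for illustration, and two assumptions are labeled in the code: "128 kB" is read as 128,000 bytes and 1 GB as 10^9 bytes.

```python
# Prices as quoted on the slide (USD); treat them as a snapshot, not current rates.
STORE_PER_GB_MONTH = 0.08
STORE_PER_MILLION_TX = 0.14
ANALYTICS_PER_JOB = 0.05
ANALYTICS_PER_AU_MINUTE = 0.05

# Assumptions (not stated on the slide): decimal units.
TX_BLOCK_BYTES = 128 * 1000
BYTES_PER_GB = 10**9


def transactions_for(size_bytes):
    """Transactions needed to move size_bytes, one per block of up to 128 kB."""
    return -(-size_bytes // TX_BLOCK_BYTES)  # integer ceiling division


def store_cost(stored_gb, transactions):
    """Monthly storage cost plus transaction cost; egress is excluded."""
    return (stored_gb * STORE_PER_GB_MONTH
            + transactions / 1_000_000 * STORE_PER_MILLION_TX)


def job_cost(analytics_units, minutes):
    """Cost of one analytics job: flat job fee plus Analytics Unit minutes."""
    return ANALYTICS_PER_JOB + analytics_units * minutes * ANALYTICS_PER_AU_MINUTE


# Example: store a 150 GB data set for a month, written once in full blocks,
# then run a single job on 10 Analytics Units for 30 minutes.
upload_tx = transactions_for(150 * BYTES_PER_GB)
print(upload_tx)
print(round(store_cost(150, upload_tx), 2))
print(round(job_cost(10, 30), 2))
```

Under these rates, the Analytics Unit minutes dominate the bill for a data set of this size, which is why the number of units and the job's runtime are the main levers to watch.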
Azure Data Lake Store vs Blob Storage
➔ No Limitations: Store whatever you want in any format
➔ Security: Built-in Azure Active Directory support
➔ Pricing: More expensive than Storage RA-GRS
➔ Redundancy: It’s there but no control over it
➔ Built for Scale: Optimized for high-scale reads
➔ Integration: With Data Factory, Data Catalog & HDInsight
Summary
➔ Big Data is not just hype, so get ready
➔ Azure Data Lake Store
  ➔ Analyse today & explore tomorrow
  ➔ Watch out for data swamps
➔ Data Lake Analytics
  ➔ No cluster management
  ➔ Re-use existing skills
  ➔ Pay for what we use
➔ Big Data in Azure? The Azure Data Lake family, and it’s easy!