
Introduction to Azure Data Lake


Athens, May 26, 2017


Presenter Info

Antonios Chatzipavlis
SQL Server Expert and Evangelist, Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F

1982: I started working with computers.
1988: I started my professional career in the computer industry.
1996: I started working with SQL Server 6.0.
1998: I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece).
1999: I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training to date.
2010: I became a Microsoft MVP on Data Platform for the first time. I created SQL School Greece (www.sqlschool.gr).
2012: I was named MCT Regional Lead by the Microsoft Learning program.
2013: I was certified as MCSE: Data Platform and as MCSE: Business Intelligence.
2016: I was certified as MCSE: Data Management & Analytics.


SQLschool.gr

A source of information about Microsoft SQL Server for Greek IT professionals, DBAs, developers, and information workers, as well as hobbyists who simply like SQL Server.

Help line: [email protected]

What we are doing here
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources

Follow us on social media
• fb/sqlschoolgr
• fb/groups/sqlschool
• @antoniosch
• @sqlschool
• yt/c/SqlschoolGr
• SQL School Greece group

SELECT KNOWLEDGE FROM SQL SERVER


▪ Sign up for a free membership today at sqlpass.org.

▪ LinkedIn: http://www.sqlpass.org/linkedin

▪ Facebook: http://www.sqlpass.org/facebook

▪ Twitter: @SQLPASS

▪ PASS: http://www.sqlpass.org


PASS Virtual Chapters


Data Lake Overview


What is Azure Data Lake?

“A single store of all data… ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various forms including reporting, visualization, analytics, and machine learning”


Built on Open-Source


Azure Ecosystem Integration

Azure Data Lake


What Does Azure Data Lake Offer?

• Data Lake Analytics
• HDInsight
• Data Lake Store

• Develop, debug, and optimize big data programs with ease
• Integrates seamlessly with your existing IT investments
• Store and analyze petabyte-size files and trillions of objects
• Affordable and cost effective
• Enterprise-grade security, auditing, and support


Data Lakes vs Data Warehouses

• DATA
- Data warehouse: structured, processed
- Data lake: structured, semi-structured, unstructured, raw
• PROCESSING
- Data warehouse: schema-on-write
- Data lake: schema-on-read
• STORAGE
- Data warehouse: expensive for large data volumes
- Data lake: designed for low-cost storage
• AGILITY
- Data warehouse: less agile, fixed configuration
- Data lake: highly agile, configure and reconfigure as needed
• SECURITY
- Data warehouse: mature
- Data lake: maturing
• USERS
- Data warehouse: business professionals
- Data lake: data scientists et al.
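The schema-on-write versus schema-on-read distinction in the comparison above can be sketched in a few lines. The Python below is only an illustration, not Azure code: the record layout and the function names (`write_with_schema`, `write_raw`, `read_with_schema`) are hypothetical. Schema-on-write converts and validates each record before storing it; schema-on-read stores the raw line untouched and applies a schema only when the data is queried.

```python
from datetime import datetime

# Hypothetical schema: column name -> converter applied to the raw text field.
SCHEMA = {"user_id": int, "start": datetime.fromisoformat, "region": str}


def parse(line, schema):
    """Split a tab-separated line and convert each field with the schema."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: conv(value) for (name, conv), value in zip(schema.items(), fields)}


def write_with_schema(store, line):
    """Schema-on-write: convert (or reject) the record before it is stored."""
    store.append(parse(line, SCHEMA))


def write_raw(store, line):
    """Schema-on-read, write side: keep the exact source text."""
    store.append(line)


def read_with_schema(store, schema):
    """Schema-on-read, read side: apply a schema at query time."""
    return [parse(line, schema) for line in store]


warehouse, lake = [], []
good = "42\t2012-02-16T08:30:00\ten-gb"
bad = "not-a-number\t2012-02-16T08:30:00\ten-gb"

write_with_schema(warehouse, good)
try:
    write_with_schema(warehouse, bad)  # rejected at load time
except ValueError:
    pass

write_raw(lake, good)
write_raw(lake, bad)  # accepted: nothing is checked on write

# A different reader can choose a different schema for the same raw data.
as_text = {"user_id": str, "start": str, "region": str}
rows = read_with_schema(lake, as_text)
```

In this sketch the warehouse list ends up holding only the valid record, while the lake list holds both lines and defers any failure to whichever reader applies a strict schema.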


Data Lake Store


What is Azure Data Lake Store?

• Enterprise-wide hyper-scale repository for big data analytic workloads.
- Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.
• Can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.
• Specifically designed to enable analytics on the stored data and tuned for performance in data analytics scenarios.
• Includes, out of the box, all the enterprise-grade capabilities essential for real-world enterprise use cases:
- security, manageability, scalability, reliability, and availability
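To make the "WebHDFS-compatible REST APIs" point concrete, here is a minimal sketch that only builds request URLs and sends nothing. It assumes the account is exposed at `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>`, which is how Data Lake Store endpoints were documented; the account name `myadlsaccount` and the helper `webhdfs_url` are made up for illustration. Authentication (an Azure Active Directory bearer token in the request header) is left out.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style URL for a Data Lake Store account.

    Assumption: the account is reachable at
    https://<account>.azuredatalakestore.net/webhdfs/v1/<path>.
    """
    clean_path = quote(path.lstrip("/"))  # keeps "/" separators, escapes the rest
    query = urlencode({"op": op, **params})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


# "myadlsaccount" is a made-up account name.
list_url = webhdfs_url("myadlsaccount", "/Samples/Data", "LISTSTATUS")
open_url = webhdfs_url("myadlsaccount", "/Samples/Data/SearchLog.tsv", "OPEN", read="true")
```

`LISTSTATUS` and `OPEN` are standard WebHDFS operation names, which is why existing Hadoop clients can talk to the store.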


Azure Data Lake Store vs Azure Blob Storage

• PURPOSE
- Azure Data Lake Store: optimized storage for big data analytics workloads
- Azure Blob Storage: general-purpose object store for a wide variety of storage scenarios
• USE CASES
- Azure Data Lake Store: batch, interactive, and streaming analytics and machine learning data, such as log files, IoT data, click streams, and large datasets
- Azure Blob Storage: any type of text or binary data, such as application back ends, backup data, media storage for streaming, and general-purpose data
• KEY CONCEPTS
- Azure Data Lake Store: an account contains folders, which in turn contain data stored as files
- Azure Blob Storage: a storage account has containers, which in turn hold data in the form of blobs
• STRUCTURE
- Azure Data Lake Store: hierarchical file system
- Azure Blob Storage: object store with flat namespace
• SECURITY
- Azure Data Lake Store: based on Azure Active Directory identities
- Azure Blob Storage: based on shared secrets (account access keys and shared access signature keys)
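The STRUCTURE difference (hierarchical file system versus flat namespace) can be pictured with a toy model. This is not how either service is implemented; the two dictionaries and both function names are hypothetical. The point is only that a folder is a real node in a hierarchy, whereas in a flat namespace a "folder" is just a shared key prefix.

```python
# Flat namespace: every object is one key; "folders" are only a naming habit.
flat_store = {
    "logs/2017/05/a.tsv": b"...",
    "logs/2017/05/b.tsv": b"...",
    "images/logo.png": b"...",
}

# Hierarchical namespace: folders are real nodes that contain files or folders.
tree_store = {
    "logs": {"2017": {"05": {"a.tsv": b"...", "b.tsv": b"..."}}},
    "images": {"logo.png": b"..."},
}


def rename_prefix_flat(store, old, new):
    """Rename a 'folder' in a flat store: one rewrite per matching key."""
    touched = 0
    for key in list(store):  # copy the keys, since the dict is modified below
        if key.startswith(old + "/"):
            store[new + key[len(old):]] = store.pop(key)
            touched += 1
    return touched


def rename_folder_tree(store, old, new):
    """Rename a top-level folder in a tree: one entry changes, whatever its size."""
    store[new] = store.pop(old)
    return 1


flat_ops = rename_prefix_flat(flat_store, "logs", "archive")
tree_ops = rename_folder_tree(tree_store, "logs", "archive")
```

In the flat model the work grows with the number of objects under the prefix; in the tree model it stays at one operation, which is one reason a hierarchical layout suits analytics jobs that move or rename whole directories of output.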


Data Lake Analytics


What is Azure Data Lake Analytics?

• An on-demand analytics job service that simplifies big data analytics.
• Lets you focus on writing, running, and managing jobs rather than on operating distributed infrastructure.
• Can handle jobs of any scale instantly by setting the dial for how much power you need.
• You pay for a job only while it is running, which makes it cost-effective.
• Supports Azure Active Directory, letting you manage access and roles, integrated with your on-premises identity system.


Azure Data Lake Analytics Key Capabilities

• Dynamic scaling
• Develop faster, debug, and optimize smarter using familiar tools
• U-SQL: simple and familiar, powerful, and extensible
• Integrates seamlessly with your IT investments
• Affordable and cost effective
• Works with all your Azure data


HDInsight


What is Azure HDInsight?

- The only fully managed cloud Apache Hadoop offering
- Provides optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server
- Provides a 99.9% SLA
- Deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring.


U-SQL


What is U-SQL?

U-SQL is the new big data query language of the Azure Data Lake Analytics service.

It evolved out of Microsoft's internal big data language called SCOPE.

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets, by Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou
http://www.vldb.org/pvldb/1/1454166.pdf


U-SQL combines

– a familiar SQL-like declarative language
– with the extensibility and programmability provided by C# types and the C# expression language
– and big data processing concepts such as “schema on read”, custom processors, and reducers.


U-SQL provides the ability to query and combine data from a variety of data sources:

– Azure Data Lake Storage,
– Azure Blob Storage,
– Azure SQL DB, Azure SQL Data Warehouse,
– SQL Server instances running in Azure VMs.


It’s NOT ANSI SQL

– Its keywords, such as SELECT, have to be in UPPERCASE.
– Its expression language inside SELECT clauses, WHERE predicates, etc. is C#.
– This means, for example, that comparison operations inside a predicate follow C# syntax (e.g., a == "foo"),
– and that the language uses C# null semantics, which are two-valued, not three-valued as in ANSI SQL.
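The practical effect of two-valued versus three-valued null semantics shows up when a predicate is negated. The following Python sketch models both behaviours; it is an illustration, not U-SQL or SQL itself, and the helper names (`sql_eq`, `sql_not`, `csharp_eq`) are hypothetical. Python's `None` stands in for a null column value.

```python
UNKNOWN = object()  # the third truth value used by ANSI SQL in this sketch


def sql_eq(a, b):
    """ANSI SQL style equality: any comparison with NULL is UNKNOWN."""
    if a is None or b is None:
        return UNKNOWN
    return a == b


def sql_not(v):
    """NOT under three-valued logic: NOT UNKNOWN is still UNKNOWN."""
    return UNKNOWN if v is UNKNOWN else not v


def csharp_eq(a, b):
    """C# style equality: null == null is true, null == "foo" is false."""
    return a == b


regions = ["foo", "bar", None]

# WHERE NOT (region = 'foo'): SQL keeps a row only if the predicate is TRUE.
sql_rows = [r for r in regions if sql_not(sql_eq(r, "foo")) is True]

# WHERE !(region == "foo"): the null row passes, because the comparison is false.
csharp_rows = [r for r in regions if not csharp_eq(r, "foo")]
```

Under the three-valued rules the null row is silently dropped; under the two-valued rules it is kept. Anyone porting ANSI SQL filters to U-SQL has to check for this.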


How does a U-SQL Script process Data?

• Azure Data Lake Analytics provides U-SQL for batch processing.
• U-SQL is written and executed in the form of a batch script.
• U-SQL also supports data definition statements such as CREATE TABLE to create metadata artifacts, either in separate scripts or sometimes even in combination with the transformation scripts.
• U-SQL scripts can be submitted in a variety of ways:
- Directly from within the Azure Data Lake Tools for Visual Studio
- From the Azure Portal
- Programmatically via the Azure Data Lake SDK job submission API
- Via the Azure PowerShell extension's job submission command


How does a U-SQL Script process Data? (continued)

A U-SQL script follows this general processing pattern:

• Retrieve data from stored locations in rowset format
- Stored locations can be files that will be schematized on read with EXTRACT expressions
- Stored locations can be U-SQL tables that are stored in a schematized format
- Or can be tables provided by other data sources such as an Azure SQL database.
• Transform the rowset(s)
- Several transformations over the rowsets can be composed in a data flow format
• Store the transformed rowset data
- Store it in a file with an OUTPUT statement, or
- Store it in a U-SQL table with an INSERT statement


U-SQL Scripts

DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/result.tsv";

// Schema on read: columns are named and typed while the file is extracted.
@searchlog =
    EXTRACT UserId int, Start DateTime, Region string, Query string,
            Duration int?, Urls string, ClickedUrls string
    FROM @in
    USING Extractors.Tsv();

// Predicates are C# expressions, so equality is written ==.
@rs1 = SELECT Start, Region, Duration FROM @searchlog WHERE Region == "en-gb";

// Filter the same rowset again, this time by date.
@rs1 = SELECT Start, Region, Duration FROM @rs1
       WHERE Start >= DateTime.Parse("2012/02/16");

OUTPUT @rs1
TO @out
USING Outputters.Tsv();
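For readers who do not have an Azure Data Lake Analytics account at hand, the same extract, transform, and output flow can be traced locally. The Python below is a rough analogue of the script on this slide, not a replacement for it: the sample rows are made up, the function names are hypothetical, and the text formatting of dates and nulls is this sketch's own choice and may differ from what `Outputters.Tsv()` writes.

```python
import csv
import io
from datetime import datetime

# Made-up rows in the column order the EXTRACT statement declares:
# UserId, Start, Region, Query, Duration, Urls, ClickedUrls
SAMPLE_TSV = (
    "1\t2012-02-15 09:00:00\ten-gb\tazure\t30\ta.com\ta.com\n"
    "2\t2012-02-16 10:00:00\ten-gb\tu-sql\t\tb.com\t\n"
    "3\t2012-02-17 11:00:00\ten-us\thadoop\t12\tc.com\tc.com\n"
)


def extract(text):
    """Step 1: read raw TSV and apply the schema while reading."""
    for f in csv.reader(io.StringIO(text), delimiter="\t"):
        yield {
            "UserId": int(f[0]),
            "Start": datetime.strptime(f[1], "%Y-%m-%d %H:%M:%S"),
            "Region": f[2],
            "Query": f[3],
            "Duration": int(f[4]) if f[4] else None,  # mirrors the nullable int?
            "Urls": f[5],
            "ClickedUrls": f[6],
        }


def transform(rows):
    """Step 2: the two chained filters, then the three-column projection."""
    cutoff = datetime(2012, 2, 16)
    for r in rows:
        if r["Region"] == "en-gb" and r["Start"] >= cutoff:
            yield (r["Start"], r["Region"], r["Duration"])


def output(rows):
    """Step 3: write the result as TSV (to a string instead of a file)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for start, region, duration in rows:
        writer.writerow([start.isoformat(sep=" "), region, "" if duration is None else duration])
    return buf.getvalue()


result = output(transform(extract(SAMPLE_TSV)))
```

Only the second sample row survives both filters: the first is too early and the third is in a different region.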


DEMO

– Create Data Lake Stores
– Create Data Lake Analytics accounts and connect them to Data Lake Stores
– Import data into Azure Data Lake Stores
– Run U-SQL jobs in Azure Data Lake Analytics


Ask your Questions


Thank you


Copyright © 2017 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.