Survey of the Microsoft Azure Data Landscape

Preview:

Citation preview

Survey of the Microsoft Azure

Data Landscape

Ike Ellis, Partner, Crafting Bytes

2

Please silence cell phones

Agenda• Azure Blob Storage• Azure Table Storage• Azure DocumentDB• Azure SQL Database• Azure SQL in a VM• Azure SQL Data Warehouse• Azure Data Lake• Lots of other things supported: • Postgres, MySQL, MongoDB, Redis

4

Topic Agenda• What is it?• How is it used?• What are the competitors?• DEMO!

5

6

Azure Blob Storage

7

Azure Blob Storage• Blobs are files (PDFs, JPGs, DOCs, etc)• Highly durable, massively scalable• More than 40 trillion stored objects• 3.5+ Million requests/second• Exposed via REST APIs• Use them in .NET, C++, Java, Node.JS, Android…• AzCopy, PowerShell

8

Blob Storage Fault Tolerance & Scalability

9

What kind of blobs can I have?• Share files with clients• off-load static content from web servers (invoices, contracts,

resumes)• Azure Websites – Platform as a Service – no files on a webserver• SQL BAK Files• VM Hard Drives

10

Competitors?• On premise SANS and arrays• Amazon S3 Blob Storage

Azure Blob StorageDemo

12

Azure Table Storage• Much of it similar to Azure Blob Storage• Same scalability & redundancy• Affordable price• Very, very fast• NoSQL key value pair solution• Quick data retrieval, little configuration

13

14

15

Competitors• Amazon DynamoDB Table Storage

Azure Table StorageDemo

JSON DocumentStandard for passing data between a server and a web applicationReplacement for XMLHierarchicalTerseSimple data types

Modeling in DocumentDB{ "id": "1", "firstName": "Thomas", "lastName": "Andersen", "addresses": [ { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 } ], "contactDetails": [ {"email: "thomas@andersen.com"}, {"phone": "+1 555 555-5555", "extension": 5555} ] }

Reading is one operation

Writing is one operation

No assembly de-assembly

Query Playground

http://www.documentdb.com/sql/demo

20

• MongoDB• Amazon DynamoDB

Competitors

22

Azure SQL Database

• Platform as a Service• All data is backed up for you• Point in time restore• Can be geo-redundant• Scalable both in performance and in data size• Up to 1TB• Not feature complete with SQL Server in a VM

24

Database Replicas

Replica 1

Replica 2

Replica 3

DB

Single Logical Database

Multiple Physical Replicas

Single Primary

26

You can also make it scale up!

27

Amazon RDS

Competitors

Azure SQL DatabaseDemo

29

• You manage backups• You create fault tolerant options• You manage disk space• You manage patching• You don’t manage hardware failure• You don’t manage purchasing hardware• You don’t manage networking infrastructure

Azure SQL Server in a VM

30

• Use Premium Storage.• Use a VM size of DS3 or higher for SQL Enterprise edition

and DS2 or higher for SQL Standard edition.• Use a minimum of 2 P30 disks (1 for log files; 1 for data files

and TempDB).• Keep the storage accountand SQL Server VM in the same

region.• Disable Azure geo-redundant storage (geo-replication) on

the storage account.• Avoid using operating system or temporary disks for

database storage or logging.

Performance Considerations

31

• Back up to Azure Blob Storage• Use Always on Availability Groups and Windows

Failover Clustering Services (WFCS) for fault tolerance

• Can use mirroring or log shipping, too• Can also mix in on-premise

Backups & Fault Tolerance

32

Amazon EC2 – VMs in the cloud

Competitors

33

• Elastic Massively Parallel Processing System• Use T-SQL to query across relational and non-

relational data• Up to petabyte volumes of data• Scale compute separately from data• When paused, you only pay for storage• Deploys in seconds

Azure SQL Data Warehouse

34

• Supports 32 concurrent queries• Used for fanning out queries over multiple

machines for processing/aggregation/analytics• Performance becomes far more predictable than

with just straight SQL Server• Not used in OLTP environments

Azure SQL Data Warehouse

35

• A unit of scale that determines how much hardware will give great performance

• Done in increments of 100 (mostly)• How many DTUs?• Start Small• Monitor • Change as needed, it’s instant

What is a DTU (Data Warehouse Unit)?

ALTER DATABASE MySQLDW MODIFY (SERVICE_OBJECTIVE = 'DW1000') ;

36

Two choices:• Distribute data based on hashing values from a

single column• Good if clusters of tables will be joined and are related

• Distribute data evenly but randomly• Fail-safe method

Partitioning Data

37

38

Non-supported data types•geometry, use a varbinary type•geography, use a varbinary type•hierarchyid, CLR type not native•image, text, ntext when text based use varchar/nvarchar (smaller the better)•nvarchar(max), use varchar(4000) or smaller for better performance•numeric, use decimal•sql_variant, split column into several strongly typed columns•sysname, use nvarchar(128)•table, convert to temporary tables•timestamp, re-work code to use datetime2 and CURRENT_TIMESTAMP function. •varchar(max), use varchar(8000) or smaller for better performance•uniqueidentifier, use varbinary(8)•user defined types, convert back to their native types where possible•xml, use a varchar(8000) or smaller for better performance - split across columns if needed

39

• primary keys• foreign keys• check constraints• unique constraints• unique indexes• computed columns• sparse columns• user-defined types• indexed views• identities• sequences• triggers• synonyms

Unsupported Features

40

Amazon RedShift

Competitors

Azure SQL Data WarehouseDemo

42

HDFS for the cloudCan use tools like Spark, Storm, Flume, Sqoop, Kafka, etc.No fixed limits on account size or file size

Azure Data Lake

43

• An enterprise wide repository of every type of data collected in a single place

• Prior to any formal definition of requirements or schema. Allows every type of data to be kept without discrimination Organizations can then use Hadoop or advanced analytics to find patterns of the data.

• Serve as a repository for lower cost data preparation prior to moving curated data into a data warehouse.

What is a generic data lake?

44

• Azure Data Lake Store – Built on HDFS• Azure Data Lake Analytics – Built on Yarn.

Introduces U-SQL

Products

45

A lot of Hadoop implementations, but nothing really quite like it

Competitors

46

• MongoDB• PostGres• Redis• MySQL• Oracle

More data options….

47

Session Evaluations

ways to access

Go to passSummit.com

Download the GuideBook App and search: PASS Summit 2015

Follow the QR code link displayed on session signage throughout the conference venue and in the program guide

Submit by 5pmFriday November 6th toWIN prizes

Your feedback is important and valuable.

Ike EllisCrafting Bytes• Small San Diego Software Studio• Modern web, mobile, Azure, SQL Server• Looking for future teammates!Book: Developing Azure SolutionsPodcast Guest: Talk Python to Me – Dec 2015

.NET Rocks – Sept 2015• www.craftingbytes.com• blog.ikeellis.com• www.ikeellis.com• SDTIG – www.sdtig.com

Ike Ellis, MVP@ike_ellis619.922.9801ike@craftingbytes.com

Recommended