Michael Rys Principal Program Manager, Big Data @ Microsoft @MikeDoesBigData, {mrys, usql}@microsoft.com U-SQL Federated Distributed Queries

U-SQL Federated Distributed Queries (SQLBits 2016)


DEMO: Working with data where it lives

Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store

Benefits
• Avoid moving large amounts of data across the network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
  • Filters
  • Joins

[Diagram: a U-SQL query in Azure Data Lake Analytics queries Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage, and writes to Azure Storage Blobs and Azure Data Lake Storage.]
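To make the "push filters to the remote source" benefit concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not how Azure Data Lake Analytics measures data movement: the 1,000-row table, its column values, and the `transfer_cost` helper are all hypothetical, and "cost" simply counts rows and cells shipped from the remote store to the query engine.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def transfer_cost(
    table: List[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
    push_down: bool,
) -> Tuple[int, int]:
    """Return (rows shipped, cells shipped) from the remote store to the engine.

    Hypothetical model: with push_down the remote store applies the filter and
    the projection before sending data; without it, every row and every column
    crosses the network and is filtered locally.
    """
    if push_down:
        shipped = [r for r in table if predicate(r)]
        width = len(columns)
    else:
        shipped = list(table)
        width = len(table[0]) if table else 0
    return len(shipped), len(shipped) * width


# Hypothetical remote table shaped like dbo.patients: 1,000 rows, 5 columns.
patients: List[Row] = [
    {"custkey": k, "name": f"p{k}", "address": f"a{k}", "nationkey": k % 25, "phone": f"555-{k:04d}"}
    for k in range(1000)
]


def is_nation_7(r: Row) -> bool:
    return r["nationkey"] == 7


wanted = ["name", "address"]
print(transfer_cost(patients, is_nation_7, wanted, push_down=True))   # (40, 80)
print(transfer_cost(patients, is_nation_7, wanted, push_down=False))  # (1000, 5000)
```

With the filter and projection pushed down, only 40 rows of 2 columns leave the store instead of the whole 1,000 x 5 table.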

Federated queries

• Minimize data proliferation through data consolidation
• Same U-SQL over all Azure data (WASB, SQL Azure)
• Efficient and reliable execution strategies
• Striving to maintain semantic equivalence
• Design choices based on requirements:
  • Schema-less design
    • Fast time-to-query and exploratory analysis
  • Schematized design
    • Protect applications from data source changes
• Advanced federated query capabilities:
  • Built-in decisions to optimize for performance
    • Pushdown of joins, predicates, and projections
  • Control over when and what to push down
    • Prevent data source overload
    • Provide control over semantics
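Why "control over semantics" matters: the same filter can return different rows depending on where it is evaluated. The Python sketch below compares two comparison rules on hypothetical data. Its assumptions are that local evaluation follows C# rules (U-SQL expressions are C#: ordinal, case-sensitive string equality, and `null == null` is true) and that the remote SQL Server column uses a case-insensitive collation such as SQL_Latin1_General_CP1_CI_AS together with SQL's three-valued NULL logic. Collation is configurable per database and column, so this is one possible setup, not a given; the function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical rows; "nation" may be NULL (None).
rows: List[Row] = [
    {"name": "Alice", "nation": "US"},
    {"name": "bob", "nation": "us"},
    {"name": "Carol", "nation": None},
]


def csharp_equals(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """C# string ==: ordinal and case-sensitive; null == null is true."""
    return a == b


def sql_ci_equals(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """SQL = under a case-insensitive collation; any comparison with NULL is UNKNOWN (None).

    casefold() approximates a CI collation, which is adequate for these ASCII values.
    """
    if a is None or b is None:
        return None
    return a.casefold() == b.casefold()


def where_nation(
    equals: Callable[[Optional[str], Optional[str]], Optional[bool]],
    value: Optional[str],
) -> List[Optional[str]]:
    """Keep a row only when the predicate is TRUE; FALSE and UNKNOWN both drop it."""
    return [r["name"] for r in rows if equals(r["nation"], value) is True]


print(where_nation(csharp_equals, "US"))  # ['Alice']
print(where_nation(sql_ci_equals, "US"))  # ['Alice', 'bob']
print(where_nation(csharp_equals, None))  # ['Carol']
print(where_nation(sql_ci_equals, None))  # []
```

If an engine pushed `nation == "US"` to such a source unchanged, the result would silently gain `bob`; keeping some expressions local is one way to preserve the U-SQL (C#) meaning.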

Data sources and external tables

• Secure credential management

• Data sources to manage connections and remoting of queries

• Schematized design: external tables provide early-bound tables for federated queries

Create secret in PowerShell: New-AzureRMDataLakeAnalyticsCatalogSecret

Create credential:
CREATE CREDENTIAL Secret WITH USER_NAME = "user@server", IDENTITY = "Secret";

Create external data source on:
• Azure SQL DB
• Azure SQL DW
• SQL Server in Azure VM

CREATE DATA SOURCE SQL_PATIENTS
FROM SQLSERVER
WITH
(
    PROVIDER_STRING = "Database=DB;Trusted_Connection=False;Encrypt=False",
    CREDENTIAL = Secret,
    REMOTABLE_TYPES = (bool, byte, short, string, DateTime)
);
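The PROVIDER_STRING above follows the usual SQL Server connection-string shape of semicolon-separated `Key=Value` pairs. The small Python parser below is a hypothetical helper, simplified to ignore quoting and escaping rules, that only makes this structure visible. Note that the string names the database and connection options (`Trusted_Connection=False` selects SQL Server authentication rather than Windows authentication) but no server, user, or secret; in this example those reach the data source through `CREDENTIAL = Secret`.

```python
from typing import Dict


def parse_provider_string(provider: str) -> Dict[str, str]:
    """Split a 'Key=Value;Key=Value' connection string into a dict.

    Simplified sketch: does not handle quoted values or escaped semicolons.
    """
    settings: Dict[str, str] = {}
    for part in provider.split(";"):
        if not part.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        settings[key.strip()] = value.strip()
    return settings


opts = parse_provider_string("Database=DB;Trusted_Connection=False;Encrypt=False")
print(opts)  # {'Database': 'DB', 'Trusted_Connection': 'False', 'Encrypt': 'False'}
```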

External tables (optional)
CREATE EXTERNAL TABLE sql_patients
(
    [custkey] int,
    [name] string,
    [address] string
)
FROM SQL_PATIENTS LOCATION "dbo.patients";
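The "early-bound" idea behind external tables can be sketched in Python: the external table pins down the column list that scripts compile against, so changes to other columns of dbo.patients do not leak into query results. This is a hypothetical model, not how Azure Data Lake Analytics validates remote schemas. The function name and sample rows are invented for illustration (the extra columns `is_alive` and `phone` are taken from the pass-through query in this deck), and raising `KeyError` when a declared column disappears is an assumption, since the deck does not describe that case.

```python
from typing import Dict, List

Row = Dict[str, object]

# Column list declared by CREATE EXTERNAL TABLE sql_patients (types omitted here).
SQL_PATIENTS_COLUMNS: List[str] = ["custkey", "name", "address"]


def read_external_table(remote_rows: List[Row]) -> List[Row]:
    """Project remote rows onto the declared columns, in declared order.

    Extra remote columns are ignored; a declared column that disappears from
    the remote table raises KeyError, giving one explicit failure point
    instead of silently different results in every script.
    """
    return [{col: row[col] for col in SQL_PATIENTS_COLUMNS} for row in remote_rows]


# Hypothetical rows from dbo.patients before and after the remote table gains a column.
before: List[Row] = [{"custkey": 1, "name": "Ann", "address": "1 Main St", "is_alive": 1}]
after: List[Row] = [{**r, "phone": "555-0100"} for r in before]

print(read_external_table(before) == read_external_table(after))  # True
```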

Federated queries
• Queries must be in a different script from the one that creates the data source
• Pass-through queries execute in the remote source's own query language
• Schema-less design: query a data source location
• Schematized design: query external tables
• Semantics of federated queries are close to U-SQL and C#

Pass-Through Query
@alive_patients =
    SELECT *
    FROM EXTERNAL SQL_PATIENTS EXECUTE
        @"SELECT name,
                 CASE WHEN is_alive = 1 THEN 'Alive' ELSE 'Deceased' END AS status,
                 address, nationkey, phone
          FROM dbo.patients";

Query Data Source Location
@patients =
    SELECT *
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";

Query External Tables
@patients =
    SELECT *
    FROM EXTERNAL master.dbo.sql_patients;

Execution
• U-SQL semantics
• Pushes predicates, and even joins, down to the remote source based on remotable types
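How a planner might use REMOTABLE_TYPES can be sketched in Python. This is a hypothetical model, not the actual U-SQL optimizer: `Predicate`, `split_predicates`, `remote_query`, and `can_push_join` are invented names, and the join rule (both inputs in the same data source, all join-key types remotable) is an assumption. The sketch splits AND-ed predicates into those that only touch remotable types (sent to the remote source) and the rest (evaluated locally). Because the example data source declares `(bool, byte, short, string, DateTime)` without `int`, a filter on the `int` column `custkey` stays local.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# Remotable types declared on the SQL_PATIENTS data source.
REMOTABLE_TYPES: Set[str] = {"bool", "byte", "short", "string", "DateTime"}


@dataclass(frozen=True)
class Predicate:
    sql: str                       # the predicate as remote SQL text
    column_types: Tuple[str, ...]  # U-SQL types of the columns it references


def split_predicates(
    conjuncts: Sequence[Predicate], remotable: Set[str]
) -> Tuple[List[Predicate], List[Predicate]]:
    """Partition AND-ed predicates into (push to remote, evaluate locally)."""
    pushed: List[Predicate] = []
    local: List[Predicate] = []
    for p in conjuncts:
        target = pushed if all(t in remotable for t in p.column_types) else local
        target.append(p)
    return pushed, local


def remote_query(table: str, columns: Sequence[str], pushed: Sequence[Predicate]) -> str:
    """Build the SQL text sent to the remote source (projection plus pushed filters)."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if pushed:
        sql += " WHERE " + " AND ".join(p.sql for p in pushed)
    return sql


def can_push_join(
    left_source: str, right_source: str, key_types: Sequence[str], remotable: Set[str]
) -> bool:
    """Assumed rule: push a join only if both inputs live in the same data source
    and every join-key type is remotable."""
    return left_source == right_source and all(t in remotable for t in key_types)


where = [
    Predicate("name = 'Alice'", ("string",)),
    Predicate("custkey > 100", ("int",)),  # int is not in REMOTABLE_TYPES above
]
pushed, local = split_predicates(where, REMOTABLE_TYPES)
print(remote_query("dbo.patients", ["custkey", "name", "address"], pushed))
# SELECT custkey, name, address FROM dbo.patients WHERE name = 'Alice'
print([p.sql for p in local])  # ['custkey > 100']
```

`custkey` stays in the projection so the local filter still has the column it needs.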

http://aka.ms/AzureDataLake