Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Federated Distributed Queries
Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
  • Filters
  • Joins
[Figure: a U-SQL query running in Azure Data Lake Analytics, with arrows labeled Query and Write connecting it to Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage.]
Federated queries
• Minimize data proliferation through data consolidation
• Same U-SQL over all Azure data (WASB, SQL Azure)
• Efficient and reliable execution strategies
• Striving to maintain semantic equivalence
• Design choices based on requirements:
  • Schema-less design
    • fast time-to-query and exploratory analysis
  • Schematized design
    • protect applications from data source changes
• Advanced federated query capabilities:
  • Built-in decisions to optimize for performance
    • push downs of joins, predicates, projection
  • Control when and what to push down
    • Prevent data source overload
    • Provide control over semantics
Data sources and external tables
• Secure credential management
• Data sources to manage connections and remoting of queries
• Schematized design: external tables to provide early bound tables for federated queries
Create secret in PowerShell
    New-AzureRMDataLakeAnalyticsCatalogSecret
Create credential
    CREATE CREDENTIAL Secret WITH USER_NAME = "user@server", IDENTITY = "Secret";
Create external data source on
• Azure SQL DB
• Azure SQL DW
• SQL Server in Azure VM
    CREATE DATA SOURCE SQL_PATIENTS FROM SQLSERVER WITH
    ( PROVIDER_STRING = "Database=DB;Trusted_Connection=False;Encrypt=False",
      CREDENTIAL = Secret,
      REMOTABLE_TYPES = (bool, byte, short, string, DateTime) );
External tables (optional)
    CREATE EXTERNAL TABLE sql_patients (
        [custkey] int,
        [name] string,
        [address] string
    ) FROM SQL_PATIENTS LOCATION "dbo.patients";
Federated queries
• Queries have to be in a different script from the one that creates the data source
• Pass-through queries to execute the remote language
• Schema-less design: query data source location
• Schematized design: query external tables
• Semantics of federated queries are close to U-SQL and C#
Pass-Through Query
    @alive_patients =
        SELECT * FROM EXTERNAL SQL_PATIENTS EXECUTE
        @" SELECT name
                , CASE WHEN is_alive = 1 THEN 'Alive' ELSE 'Deceased' END AS status
                , address, nationkey, phone
           FROM dbo.patients";
Query Data Source Location
    @patients =
        SELECT * FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";
Query External Tables
    @patients =
        SELECT * FROM EXTERNAL master.dbo.sql_patients;
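A rowset variable such as @patients produces no result until it is written somewhere. The sketch below shows one way to land a federated query result in Azure Data Lake Storage. It assumes the SQL_PATIENTS data source from the earlier slide already exists in the master database and that dbo.patients has name and address columns; the output path is hypothetical.

```
// Assumption: data source master.SQL_PATIENTS was created in a separate script.
// Assumption: dbo.patients exposes the columns name and address.
@patients =
    SELECT name,
           address
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";

// Hypothetical output path in the default Data Lake Store account.
OUTPUT @patients
TO "/output/patients.csv"
USING Outputters.Csv(outputHeader : true);
```

Only the two projected columns are requested here, which keeps the amount of data crossing the network small compared with SELECT *.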
Execution
• U-SQL Semantics
• Pushes predicates and even joins based on remotable types
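To make the role of remotable types concrete, here is a minimal sketch that combines a local file with the remote table. The filter compares a string column, and string appears in the REMOTABLE_TYPES list of the data source defined earlier, so the predicate is a candidate for push down; whether it is actually pushed is the optimizer's decision. The input file, its schema, the literal "Jane Doe", and the output path are all hypothetical, and the remote custkey column is assumed to map to int as in the external table declaration.

```
// Hypothetical local input in Azure Data Lake Storage.
@visits =
    EXTRACT custkey int,
            visit_date DateTime
    FROM "/input/visits.csv"
    USING Extractors.Csv();

// Remote rowset; the string comparison is a candidate for push down
// because string is listed in REMOTABLE_TYPES for SQL_PATIENTS.
@patients =
    SELECT custkey,
           name
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients"
    WHERE name == "Jane Doe";

// Join between the local file and the remote rows.
// U-SQL uses C# comparison syntax (==) in the ON clause.
@patient_visits =
    SELECT p.name,
           v.visit_date
    FROM @visits AS v
         INNER JOIN @patients AS p
         ON v.custkey == p.custkey;

// Hypothetical output path.
OUTPUT @patient_visits
TO "/output/patient_visits.csv"
USING Outputters.Csv();
```

Note that int is not in the REMOTABLE_TYPES list shown earlier, so a filter on custkey would not be expected to run at the source under that definition.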