Michael Rys, Principal Program Manager, Big Data @ Microsoft, @MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Meta Data Catalog
2016/04/04
Meta Data Object Model
[Figure: ADLA Catalog contains [1,n] databases; each database contains [1,n] schemas; schemas contain [0,n] objects: tables (with clustered index, partitions and statistics), views, TVFs, procedures, external tables and table types. C# assemblies implement and name the C# functions, UDAggs, UDTs, extractors, outputters, processors, reducers, combiners and appliers. Data sources and credentials also appear. Legend: abstract objects vs. user objects; "contains", "refers to", "implemented and named by" (MD name vs. C# name).]
U-SQL Catalog
• Naming
• Discovery
• Sharing
• Securing
Naming
• Default database and schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in the ADL Storage /catalog folder
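A minimal sketch of the naming rules, assuming a tab-separated file with an integer and a string column at `/input/people.tsv`. The database `SampleDb`, schema `Staging`, view `[my view]` and the output path are hypothetical names chosen for illustration.

```usql
// Hypothetical database, schema, view and file names.
// Until a USE statement runs, unqualified names resolve in master.dbo.
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;          // default context is now SampleDb.dbo
CREATE SCHEMA IF NOT EXISTS Staging;

// Square brackets quote an identifier that contains a space.
DROP VIEW IF EXISTS Staging.[my view];
CREATE VIEW Staging.[my view] AS
    EXTRACT PersonId   int,
            PersonName string
    FROM "/input/people.tsv"
    USING Extractors.Tsv();

// A three-part name resolves the same way regardless of the current context.
@rows =
    SELECT *
    FROM SampleDb.Staging.[my view];

OUTPUT @rows
TO "/output/people.csv"
USING Outputters.Csv();
```

Because the final SELECT uses the fully qualified name, it would resolve identically in a script that never issues `USE DATABASE`.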
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands
Sharing
• Within an Azure Data Lake Analytics account
Securing
• Secured with AAD principals at catalog level (inherited from ADL Storage)
Create shareable data and code
Views and TVFs
• Views for simple cases
• TVFs for parameterization and most cases
Views
CREATE VIEW V AS EXTRACT …;
CREATE VIEW V AS SELECT …;
• Cannot contain user-defined objects (such as UDFs or UDOs)
• Will be inlined
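A sketch of both view forms, assuming a tab-separated search log at `/Samples/Data/SearchLog.tsv` with the seven columns declared below; the view names and output path are hypothetical. Since views are inlined, the final query is expanded as though the EXTRACT and both SELECTs had been written directly in the script.

```usql
// Hypothetical view names; the input path and column layout are assumptions.
DROP VIEW IF EXISTS dbo.LongQueriesView;
DROP VIEW IF EXISTS dbo.SearchlogView;

// EXTRACT form: exposes a file as a named, schematized rowset.
CREATE VIEW dbo.SearchlogView AS
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int?,
            Urls        string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// SELECT form over the first view; built-in expressions only, no UDFs or UDOs.
CREATE VIEW dbo.LongQueriesView AS
    SELECT UserId, Region, Query, Duration
    FROM dbo.SearchlogView
    WHERE Duration > 1000;

@res =
    SELECT Region, COUNT(*) AS QueryCount
    FROM dbo.LongQueriesView
    GROUP BY Region;

OUTPUT @res
TO "/output/long-queries-by-region.csv"
USING Outputters.Csv();
```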
Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default")
RETURNS @res [TABLE ( … )]
AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
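A sketch of a parameterized TVF with an explicit return schema, assuming a tab-separated search log at `/Samples/Data/SearchLog.tsv`. The function name `dbo.SearchlogByRegion`, its default value `"en-us"` and the output paths are hypothetical. The two calls show the declared default being used via `DEFAULT` and an explicit argument.

```usql
// Hypothetical function name, input path, columns and output paths.
DROP FUNCTION IF EXISTS dbo.SearchlogByRegion;
CREATE FUNCTION dbo.SearchlogByRegion(@region string = "en-us")
RETURNS @res TABLE(UserId int, Region string, Query string)
AS
BEGIN
    @searchlog =
        EXTRACT UserId      int,
                Start       DateTime,
                Region      string,
                Query       string,
                Duration    int?,
                Urls        string,
                ClickedUrls string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    // The assigned rowset is checked against the declared return schema.
    @res =
        SELECT UserId, Region, Query
        FROM @searchlog
        WHERE Region == @region;
END;

// DEFAULT passes the declared default ("en-us"); the second call overrides it.
@us = SELECT * FROM dbo.SearchlogByRegion(DEFAULT) AS s;
@gb = SELECT * FROM dbo.SearchlogByRegion("en-gb") AS s;

OUTPUT @us TO "/output/us.csv" USING Outputters.Csv();
OUTPUT @gb TO "/output/gb.csv" USING Outputters.Csv();
```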
Procedures
Allows encapsulation of non-DDL scripts
CREATE PROCEDURE P (@arg string = "default") AS
BEGIN …; OUTPUT @res TO …; INSERT INTO T …; END;
• Provides parameterization
• Returns no result, but writes into a file or table
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Cannot contain DDL (no CREATE, DROP)
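A sketch of a procedure that wraps an extract, a filter and a file write; the procedure name, input layout and paths are assumptions. The body contains no DDL, and its only effect is the OUTPUT statement, since a procedure returns no rowset.

```usql
// Hypothetical procedure name, input path, columns and output path.
DROP PROCEDURE IF EXISTS dbo.ExportRegion;
CREATE PROCEDURE dbo.ExportRegion(@region string = "en-us")
AS
BEGIN
    @searchlog =
        EXTRACT UserId      int,
                Start       DateTime,
                Region      string,
                Query       string,
                Duration    int?,
                Urls        string,
                ClickedUrls string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    @res =
        SELECT UserId, Query
        FROM @searchlog
        WHERE Region == @region;

    // No result is returned; the procedure's effect is this file write.
    OUTPUT @res
    TO "/output/region-export.csv"
    USING Outputters.Csv();
END;

// Invoke as a statement; the body is inlined into the calling script.
dbo.ExportRegion("en-gb");
```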
Table types
Enable you to name a table schema
Provide reuse for function/procedure definitions
CREATE TYPE T AS TABLE(c1 string, c2 int);
CREATE FUNCTION F (@table_arg T) RETURNS @res T AS BEGIN … @res = … END;
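A sketch of one named table type reused as both a TVF parameter type and its return type. `dbo.RegionQuery`, `dbo.FilterRegion` and the file paths are hypothetical, and passing the rowset variable `@log` as the table-typed argument assumes its columns match the type exactly.

```usql
// Hypothetical type, function and file names.
// Drop the dependent function before the type it uses.
DROP FUNCTION IF EXISTS dbo.FilterRegion;
DROP TYPE IF EXISTS dbo.RegionQuery;

CREATE TYPE dbo.RegionQuery AS TABLE(UserId int, Region string, Query string);

// The same named schema types the input parameter and the result.
CREATE FUNCTION dbo.FilterRegion(@input dbo.RegionQuery, @region string)
RETURNS @res dbo.RegionQuery
AS
BEGIN
    @res =
        SELECT UserId, Region, Query
        FROM @input
        WHERE Region == @region;
END;

// Rowset whose columns match dbo.RegionQuery.
@log =
    EXTRACT UserId int,
            Region string,
            Query  string
    FROM "/input/region-queries.tsv"
    USING Extractors.Tsv();

@gb = SELECT * FROM dbo.FilterRegion(@log, "en-gb") AS f;

OUTPUT @gb TO "/output/gb-queries.csv" USING Outputters.Csv();
```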
Tables
• CREATE TABLE
• CREATE TABLE AS SELECT
CREATE TABLE T (
    col1 int,
    col2 string,
    col3 SQL.MAP<string,string>,
    INDEX idx CLUSTERED (col1 ASC) PARTITIONED BY HASH (col1)
);
• Structured data
• Built-in data types only (no UDTs)
• Clustered index (must be specified): row-oriented
• Fine-grained partitioning (must be specified): HASH, DIRECT HASH, RANGE, ROUND ROBIN
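A sketch of a managed table plus a load, assuming a tab-separated search log at `/Samples/Data/SearchLog.tsv`; the database, table and index names are hypothetical. Only built-in types appear in the column list, and both the clustered index and the partitioning clause are present. It keeps the slide's `PARTITIONED BY HASH` spelling; later U-SQL releases spell fine-grained distribution `DISTRIBUTED BY HASH`, so check the DDL reference for the version you run.

```usql
// Hypothetical database, table, index and input names.
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;

// Clustered index and fine-grained partitioning are both required.
DROP TABLE IF EXISTS dbo.Searchlog;
CREATE TABLE dbo.Searchlog
(
    UserId int,
    Start  DateTime,
    Region string,
    Query  string,
    INDEX idx_searchlog CLUSTERED (UserId ASC)
    PARTITIONED BY HASH (UserId)
);

@rows =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int?,
            Urls        string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Load only the columns the table declares.
INSERT INTO dbo.Searchlog
SELECT UserId, Start, Region, Query
FROM @rows;
```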
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infers the schema from the query
• Still requires index and partitioning
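A CTAS sketch in the EXTRACT form: the column list is inferred from the EXTRACT schema, while the index and partitioning still have to be written out. The database, table and path names are hypothetical. The `PARTITIONED BY HASH` spelling follows the slide; later releases use `DISTRIBUTED BY HASH`.

```usql
// Hypothetical database, table, index and input names.
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;

DROP TABLE IF EXISTS dbo.RegionQueries;

// No column list: UserId, Region and Query come from the EXTRACT below.
CREATE TABLE dbo.RegionQueries
(
    INDEX idx_rq CLUSTERED (UserId ASC)
    PARTITIONED BY HASH (UserId)
)
AS
EXTRACT UserId int,
        Region string,
        Query  string
FROM "/input/region-queries.tsv"
USING Extractors.Tsv();
```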
Additional Resources
Documentation
U-SQL DDL: https://msdn.microsoft.com/en-us/library/azure/mt621299.aspx
Sample Projects
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake