Michael Rys Principal Program Manager, Big Data @ Microsoft @MikeDoesBigData, {mrys, usql}@microsoft.com U-SQL Meta Data Catalog 2016/04/04

U-SQL Meta Data Catalog (SQLBits 2016)


Page 1: U-SQL Meta Data Catalog (SQLBits 2016)

Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com

U-SQL Meta Data Catalog

2016/04/04

Page 2: U-SQL Meta Data Catalog (SQLBits 2016)

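Meta Data Object Model (diagram)

ADLA Catalog
• Contains databases [1,n]
• Database contains schemas [1,n]
• Schema contains objects [0,n]: tables, views, TVFs, procedures, table types, external tables
• Table-related objects: clustered index, partitions, statistics
• Other catalog objects: C# assemblies, data sources, credentials
• C# objects: C# Fns, C# UDAgg, C# UDTs, C# extractors, C# reducers, C# processors, C# combiners, C# outputters, C# appliers

Legend
• Object kinds: abstract objects, user objects
• Relationships: Contains, Refers to, Implemented and named by
• Names: MD name, C# name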

Page 3: U-SQL Meta Data Catalog (SQLBits 2016)

U-SQL Catalog
• Naming
• Discovery
• Sharing
• Securing

Naming
• Default database and schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder

Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands

Sharing
• Within an Azure Data Lake Analytics account

Securing
• Secured with AAD principals at catalog level (inherited from ADL Storage)
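The naming rules above can be sketched in a short script. This is only an illustration: the database SampleDb, the schema Staging, the view [my view] and the output path are hypothetical names that do not appear in the slides.

```usql
// Without a USE statement, unqualified names resolve against master.dbo.
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;

CREATE SCHEMA IF NOT EXISTS Staging;
USE SCHEMA Staging;

// Unqualified names now resolve against SampleDb.Staging.
// Identifiers that contain spaces are quoted with [].
CREATE VIEW IF NOT EXISTS [my view] AS
    SELECT id, name
    FROM (VALUES (1, "one"), (2, "two")) AS t(id, name);

// A three-part name works regardless of the current context.
@rows =
    SELECT id, name
    FROM SampleDb.Staging.[my view];

OUTPUT @rows
TO "/output/my_view.csv"
USING Outputters.Csv();
```

Spelling out the three-part name in shared scripts avoids depending on whichever USE statements happened to run earlier.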

Page 4: U-SQL Meta Data Catalog (SQLBits 2016)

Create shareable data and code

Page 5: U-SQL Meta Data Catalog (SQLBits 2016)

Views and TVFs
• Views for simple cases
• TVFs for parameterization and most cases

Views
CREATE VIEW V AS EXTRACT…
CREATE VIEW V AS SELECT …

• Cannot contain user-defined objects (such as UDFs or UDOs)
• Will be inlined

Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default") RETURNS @res [TABLE ( … )] AS BEGIN … @res = … END;

• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
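To make the contrast concrete, here is a minimal sketch of a view over a file and a TVF that parameterizes it. The file path, the column names and the object names LogView and LogByRegion are assumptions made for illustration; the file is assumed to be a tab-separated file with exactly these three columns.

```usql
// View: a fixed EXTRACT with no parameters and no user code.
CREATE VIEW IF NOT EXISTS master.dbo.LogView AS
    EXTRACT UserId int,
            Region string,
            Query string
    FROM "/data/sample_log.tsv"          // hypothetical input file
    USING Extractors.Tsv();

// TVF: the same data, parameterized by region, with an explicit return schema.
CREATE FUNCTION IF NOT EXISTS master.dbo.LogByRegion(@region string = "en-gb")
RETURNS @res TABLE(UserId int, Region string, Query string)
AS
BEGIN
    @res =
        SELECT UserId, Region, Query
        FROM master.dbo.LogView
        WHERE Region == @region;
END;

// DEFAULT picks up the declared default value of @region.
@rows = master.dbo.LogByRegion(DEFAULT);

OUTPUT @rows
TO "/output/log_en_gb.csv"
USING Outputters.Csv();
```

Because both objects are inlined, the query above should be optimized as if the EXTRACT and the filter had been written directly in the calling script.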

Page 6: U-SQL Meta Data Catalog (SQLBits 2016)

Procedures
Allows encapsulation of non-DDL scripts

CREATE PROCEDURE P (@arg string = "default") AS BEGIN …; OUTPUT @res TO …; INSERT INTO T …; END;

• Provides parameterization
• No result but writes into file or table
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Cannot contain DDL (no CREATE, DROP)
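A procedure following the pattern above might look like the sketch below. The procedure name, the inline sample rows and the output path are hypothetical; a real procedure would more likely read from a catalog table or a file.

```usql
CREATE PROCEDURE IF NOT EXISTS master.dbo.ExportFrequentWords(@minCount int = 2)
AS
BEGIN
    // Inline sample data keeps the sketch self-contained.
    @words =
        SELECT Word, Cnt
        FROM (VALUES ("alpha", 3), ("beta", 1), ("gamma", 5)) AS t(Word, Cnt);

    @frequent =
        SELECT Word, Cnt
        FROM @words
        WHERE Cnt >= @minCount;

    // A procedure returns nothing; its effect is the file (or table) it writes.
    OUTPUT @frequent
    TO "/output/frequent_words.csv"
    USING Outputters.Csv();
END;

// Invocation, typically from a later script; DEFAULT uses @minCount = 2.
master.dbo.ExportFrequentWords(DEFAULT);
```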

Page 7: U-SQL Meta Data Catalog (SQLBits 2016)

Table types
Enables you to name a table schema

Provides reuse for function/procedure definitions

CREATE TYPE T AS TABLE(c1 string, c2 int );

CREATE FUNCTION F (@table_arg T) RETURNS @res T AS BEGIN … @res = … END;
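As a sketch of the reuse this enables, the script below declares one table type and uses it for both the parameter and the result of a TVF. The names WordCount and KeepPositive and the sample rows are illustrative assumptions, not part of the slides.

```usql
// One named schema, reused for input and output.
CREATE TYPE IF NOT EXISTS master.dbo.WordCount AS TABLE(Word string, Cnt int);

CREATE FUNCTION IF NOT EXISTS master.dbo.KeepPositive(@input master.dbo.WordCount)
RETURNS @res master.dbo.WordCount
AS
BEGIN
    @res =
        SELECT Word, Cnt
        FROM @input
        WHERE Cnt > 0;
END;

// The rowset passed in must match the column names and types of the table type.
@in =
    SELECT Word, Cnt
    FROM (VALUES ("alpha", 3), ("beta", 0)) AS t(Word, Cnt);

@out = master.dbo.KeepPositive(@in);

OUTPUT @out
TO "/output/positive_words.csv"
USING Outputters.Csv();
```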

Page 8: U-SQL Meta Data Catalog (SQLBits 2016)

Tables
• CREATE TABLE
• CREATE TABLE AS SELECT

CREATE TABLE T (col1 int , col2 string , col3 SQL.MAP<string,string> , INDEX idx CLUSTERED (col1 ASC) PARTITIONED BY HASH (col1) );

• Structured Data
• Built-in Data types only (no UDTs)
• Clustered index (must be specified): row-oriented
• Fine-grained partitioning (must be specified):
  • HASH, DIRECT HASH, RANGE, ROUND ROBIN

CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);

• Infer the schema from the query
• Still requires index and partitioning
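Both forms can be sketched as follows. The table names, columns and sample rows are hypothetical. One assumption to flag: the slides, dated April 2016, write the fine-grained scheme as PARTITIONED BY HASH, while later U-SQL releases spell it DISTRIBUTED BY HASH; the sketch uses the later keyword, so swap it back when targeting the syntax shown on the slide.

```usql
// Explicit schema, with a clustered index and a hash distribution.
CREATE TABLE IF NOT EXISTS master.dbo.Trips
(
    TripId int,
    City string,
    INDEX idx_trips CLUSTERED (TripId ASC)
    DISTRIBUTED BY HASH (TripId)       // PARTITIONED BY HASH in the 2016 syntax
);

INSERT INTO master.dbo.Trips
SELECT TripId, City
FROM (VALUES (1, "London"), (2, "Leeds")) AS t(TripId, City);

// CREATE TABLE AS SELECT: column names and types come from the query,
// but the index and distribution are still declared.
@london =
    SELECT TripId, City
    FROM (VALUES (1, "London"), (3, "London")) AS t(TripId, City);

CREATE TABLE IF NOT EXISTS master.dbo.LondonTrips
(
    INDEX idx_london CLUSTERED (TripId ASC)
    DISTRIBUTED BY HASH (TripId)
) AS
SELECT TripId, City
FROM @london;
```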

Page 10: U-SQL Meta Data Catalog (SQLBits 2016)

http://aka.ms/AzureDataLake