
U-SQL Query Execution and Performance Basics (SQLBits 2016)

Page 1: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com

U-SQL Query Execution and Performance Basics

Page 2: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Simplified U-SQL Job Workflow

Job Front End

Job Scheduler

Compiler Service

Job Queue

Job Manager

U-SQL Catalog

YARN

Job submission

Job execution

U-SQL Runtime Vertex execution

Page 3: U-SQL Query Execution and Performance Basics (SQLBits 2016)

U-SQL Compilation Process

C#

C++

Algebra

Other files (system files, deployed resources)

Managed DLL

Unmanaged DLL

Compilation output (in job folder)

Compiler & Optimizer

U-SQL Metadata Service

Deployed to Vertices

Page 4: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Job Status in Visual Studio

Page 5: U-SQL Query Execution and Performance Basics (SQLBits 2016)

UX job states, with the underlying job states in parentheses:

Preparing (New, Compiling): The script is being compiled by the Compiler Service.

Queued (Queued, Scheduling): All jobs enter the queue. Are there enough ADLAUs to start the job? If yes, then allocate those ADLAUs for the job.

Running and Finalizing (Starting, Running): The U-SQL runtime is now executing the code on 1 or more ADLAUs or finalizing the outputs.

Ended (Ended; the outcome is Succeeded, Failed, or Cancelled): The job has concluded.

Page 6: U-SQL Query Execution and Performance Basics (SQLBits 2016)

The Job Queue

The queue is ordered by job priority.

Lower numbers -> higher priority.

1 = highest.

Running jobs

When a job is at the top of the queue, it will start running.

Defaults:
Max Running Jobs = 3
Max Tokens per job = 20
Max Queue Size = 200

Page 7: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Priority Doesn’t Preempt Running Jobs

Jobs A, B, and C are all running and have very low priority (pri=1000).

X has Pri=1.

X will NOT preempt running jobs. X will have to wait.
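
The queue rules on the last two slides can be sketched as a small model. This is only an illustration, not the service's scheduler: the `JobQueue` class and its method names are hypothetical, breaking ties by submission order is an assumption the slides do not state, and the limits are the defaults quoted on the Job Queue slide.

```python
import heapq
import itertools

MAX_RUNNING_JOBS = 3   # default quoted on the Job Queue slide
MAX_QUEUE_SIZE = 200   # default quoted on the Job Queue slide


class JobQueue:
    """Toy model (hypothetical): lower priority number = higher priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # assumed tie-break: submission order
        self.running = []

    def submit(self, name, priority):
        if len(self._heap) >= MAX_QUEUE_SIZE:
            raise RuntimeError("queue is full")
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def start_ready_jobs(self):
        """Start jobs from the top of the queue while running slots are free.

        Running jobs are never removed here, so a high-priority arrival
        cannot preempt them; it only moves to the front of the queue.
        """
        while self._heap and len(self.running) < MAX_RUNNING_JOBS:
            _, _, name = heapq.heappop(self._heap)
            self.running.append(name)
        return list(self.running)

    def finish(self, name):
        self.running.remove(name)


queue = JobQueue()
for job in ("A", "B", "C"):
    queue.submit(job, priority=1000)
queue.start_ready_jobs()          # A, B, C fill the three running slots
queue.submit("X", priority=1)
queue.start_ready_jobs()          # X waits: nothing is preempted
queue.finish("B")
queue.start_ready_jobs()          # X takes the freed slot
```

In this model, priority only decides which queued job starts next once a slot frees up.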

Page 8: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Resources

Page 9: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Blue items: the output of the compiler

Grey items: U-SQL runtime bits

Download all the resources

Download a specific resource

Page 10: U-SQL Query Execution and Performance Basics (SQLBits 2016)

The Job Folder

Inside the Default ADL Store:

/system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID

/system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb-49af9a784f64
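
A small helper shows how the folder pattern above expands. The function name `job_folder` is hypothetical, and the slide does not say which timestamp (or time zone) the date components come from, so the `submitted` argument is an assumption.

```python
from datetime import datetime

JOB_ROOT = "/system/jobservice/jobs/Usql"  # prefix taken from the slide


def job_folder(submitted: datetime, job_id: str) -> str:
    """Expand YYYY/MM/DD/hh/mm/JOBID under the job root.

    Assumption: the path components come from a single job timestamp.
    """
    return f"{JOB_ROOT}/{submitted.strftime('%Y/%m/%d/%H/%M')}/{job_id}"


path = job_folder(datetime(2016, 1, 20, 0, 0),
                  "17972fc2-4737-48f7-81fb-49af9a784f64")
print(path)
```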

Page 11: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Query Execution

Plans, Vertices, Stages, Parallelism, ADLAUs

Page 12: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Query Life

Visual Studio

Portal / API

Front-End Service

Compiler

Optimizer

Job Scheduler & Queue

Vertex Scheduling

Runtime

Page 13: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Job Execution

Page 14: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Parallelism: 100 (ADLAUs)

Work composed of 12K vertices

Page 15: U-SQL Query Execution and Performance Basics (SQLBits 2016)

U-SQL Script -> Job Graph

Logical -> Physical Plan

Each square is a “vertex” and represents a fraction of the total work.

Vertexes in each SuperVertex (aka “Stage”) are doing the same operation on a different part of the same data.

Visualized as a “Job Graph”.

Page 16: U-SQL Query Execution and Performance Basics (SQLBits 2016)

ADLAU = Azure Data Lake Analytics Unit

Parallelism N = N ADLAUs

1 ADLAU ~= A VM with 2 cores and 6 GB of memory

Page 17: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Execution with Requested Parallelism

Requested Parallelism = 1 (reserve enough to do 1 vertex at a time)

Requested Parallelism = 4 (reserve enough to do 4 vertices at a time)
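
The effect of requested parallelism on a single stage can be sketched with a greedy slot model. This is a simplification under stated assumptions: `stage_makespan` is a hypothetical name, each vertex occupies one ADLAU for its whole duration, and queuing or startup overheads are ignored.

```python
import heapq


def stage_makespan(vertex_minutes, parallelism):
    """Minutes to finish a stage when at most `parallelism` vertices run at once.

    Greedy model: each vertex is placed on the slot that frees up first.
    """
    slots = [0.0] * parallelism
    heapq.heapify(slots)
    for minutes in vertex_minutes:
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + minutes)
    return max(slots)


stage = [1, 1, 1, 1]              # four one-minute vertices (made-up numbers)
print(stage_makespan(stage, 1))   # one vertex at a time
print(stage_makespan(stage, 4))   # all four at once
print(stage_makespan(stage, 8))   # extra slots stay idle for this stage
```

With only four vertices in the stage, asking for eight slots is no faster than asking for four.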

Page 18: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Notes

The next stage can start before the previous one has finished.

It may not be possible to use all the reserved parallelism during a stage.

Page 19: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Notes

The job resources are copied to each vertex.

Page 20: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Stage Details

252 pieces of work

AVG Vertex execution time

4.3 Billion rows

Data Read & Written

Super Vertex = Stage

Page 21: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Automatic Vertex retry

ORANGE: A vertex failed … but was retried automatically.

Overall, the stage completed successfully.

Page 22: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Vertex Execution View

Page 23: U-SQL Query Execution and Performance Basics (SQLBits 2016)

All the vertexes

Filter which vertexes to see

Page 24: U-SQL Query Execution and Performance Basics (SQLBits 2016)

The Critical Path

Page 25: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Vertex Relationships

The vertex on the bottom depends on the output of the vertex on the top.

Page 26: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Critical Path

The dependency chain of vertexes that kept the job running to the very end.
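
One way to picture the critical path is as the longest dependency chain through the job graph. The sketch below assumes every vertex starts as soon as its inputs are ready (no waiting for free ADLAUs), which the real runtime does not guarantee; the vertex names and durations are invented for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, depends_on):
    """Return (path, total_seconds) for the chain that finishes last.

    durations:  {vertex: seconds}
    depends_on: {vertex: [vertices whose output it reads]}
    """
    graph = {v: depends_on.get(v, []) for v in durations}
    finish, blocker = {}, {}
    for v in TopologicalSorter(graph).static_order():
        # The input that finishes last decides when this vertex can start.
        last = max(graph[v], key=lambda d: finish[d], default=None)
        finish[v] = (finish[last] if last is not None else 0) + durations[v]
        blocker[v] = last
    v = max(finish, key=finish.get)
    total = finish[v]
    path = []
    while v is not None:
        path.append(v)
        v = blocker[v]
    return path[::-1], total


durations = {"extract_1": 30, "extract_2": 50, "aggregate": 20, "output": 5}
depends_on = {"aggregate": ["extract_1", "extract_2"], "output": ["aggregate"]}
print(critical_path(durations, depends_on))
```

Shortening a vertex that is not on this chain (here `extract_1`) would not make the job finish earlier.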

Page 27: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Efficiency

Cost vs Latency

Page 28: U-SQL Query Execution and Performance Basics (SQLBits 2016)

JobCost = 5c + (minutes × ADLAUs × ADLAU cost per minute)

Page 29: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Allocation

Allocating 10 ADLAUs for a 10 minute job.

Cost = 10 min * 10 ADLAUs = 100 ADLAU minutes

Time

Blue line: Allocated
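
The cost formula and the allocation example can be combined in a few lines. This is a sketch under assumptions: "5c" is read as a fixed 5-cent charge per job, and the per-minute ADLAU rate used below is a made-up placeholder, since the slides do not quote a price.

```python
def adlau_minutes(minutes, adlaus):
    """ADLAU-minutes billed for an allocation held for the whole job."""
    return minutes * adlaus


def job_cost(minutes, adlaus, adlau_cost_per_min, fixed_charge=0.05):
    """JobCost = fixed charge + (minutes x ADLAUs x ADLAU cost per minute).

    fixed_charge=0.05 assumes "5c" on the slide means 5 cents per job.
    """
    return fixed_charge + adlau_minutes(minutes, adlaus) * adlau_cost_per_min


print(adlau_minutes(10, 10))                      # the slide's 10 x 10 example
print(job_cost(10, 10, adlau_cost_per_min=0.01))  # 0.01 is a placeholder rate
```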

Page 30: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Over Allocation

Consider using fewer ADLAUs.

You are paying for the area under the blue line

You are only using the area under the red line

Time
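
Comparing the two areas gives a rough utilization figure. The sketch below is illustrative only: `utilization` is a hypothetical helper, and the per-minute counts of running vertices are invented sample data, not numbers from the talk.

```python
def utilization(allocated_adlaus, running_per_minute):
    """Share of the paid-for ADLAU-minutes that actually ran vertices.

    allocated_adlaus:   height of the blue line (constant allocation)
    running_per_minute: red line, sampled once per minute
    """
    paid = allocated_adlaus * len(running_per_minute)
    used = sum(min(r, allocated_adlaus) for r in running_per_minute)
    return used / paid


# Invented 10-minute profile for a job that was given 10 ADLAUs.
running = [10, 10, 6, 1, 1, 1, 4, 10, 3, 1]
print(f"{utilization(10, running):.0%} of the allocation was used")
```

A low figure suggests the job could be resubmitted with fewer ADLAUs, at the price of some extra latency during the busy minutes.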

Page 31: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Profile isn’t loaded

Page 32: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Profile is loaded now

Click Resource usage

Page 33: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Blue: Allocation

Red: Actual running

Page 34: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Dips down to 1 active vertex at these times

Page 35: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Smallest estimated time when given 2425 ADLAUs

1410 seconds = 23.5 minutes

Page 36: U-SQL Query Execution and Performance Basics (SQLBits 2016)

Model with 100 ADLAUs

8709 seconds ≈ 145.2 minutes
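
The two modeled runs make the cost versus latency trade-off concrete. The sketch assumes the job is billed for its full allocation over the whole estimated duration; the helper name `adlau_minutes_for` is hypothetical, and only the ADLAU counts and seconds come from the slides.

```python
def adlau_minutes_for(adlaus, seconds):
    """ADLAU-minutes consumed if the allocation is held for the whole run."""
    return adlaus * seconds / 60


fast = adlau_minutes_for(2425, 1410)   # smallest estimated time
lean = adlau_minutes_for(100, 8709)    # model with 100 ADLAUs

speedup = 8709 / 1410                  # how much sooner the fast run ends
cost_ratio = fast / lean               # how many more ADLAU-minutes it burns

print(f"fast: {fast:.1f} ADLAU-minutes, lean: {lean:.1f} ADLAU-minutes")
print(f"about {speedup:.1f}x faster for about {cost_ratio:.1f}x the ADLAU-minutes")
```

Under that billing assumption, the extra ADLAUs buy a shorter run at a noticeably higher total spend.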

Page 37: U-SQL Query Execution and Performance Basics (SQLBits 2016)

http://aka.ms/AzureDataLake