An In-Depth Look at SAP SQL Anywhere Performance Features

(c) 2015 Independent SAP Technical User Group Annual Conference, 2015

SQL Anywhere Performance Features

Jason Hinsperger

Product Manager

SAP


Agenda

Review

SQL Anywhere design goals

Why self-management is important

Query processing in SQL Anywhere

SQL Anywhere performance

Sequential scans vs index scans

Multiprogramming level

Cache management

Adaptive query execution

Statistics management


Design Goals of SQL Anywhere

Ease of administration

Comprehensive yet comprehensible tools

Good out-of-the-box performance

“Embeddability” features => self-tuning

Many environments have no DBAs

Cross-platform support

Interoperability

A Holistic Approach to Autonomic Database Management


Autonomic Database Management

Self-Managing/Self-Configuring

Self-Tuning/Self-Adapting

Self-Healing

Monitoring and correcting/advising on problems

Self-protecting

Ease of administration

Goal: zero (manual) administration

Design Automation

Management Tools

Index consultant, application profiler, …


Why is Self-management Important?

In a word: complexity

Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on

Databases are now ubiquitous in IT because they solve a variety of difficult problems

Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs

Ubiquity brings scale, in several ways

To keep TCO constant, one must improve the productivity of each developer



Query Processing in SQL Anywhere

[Figure: query processing pipeline. Stage labels: Scan SQL, Parse, Prepare Parse Tree, Semantic Transformations, Pre-Optimization, Join Enumeration, Post-Optimization, QOG Build, DFO Build, Cursor, Open, Execute, Close.]


SQL Anywhere Performance

SQL Anywhere is designed to get good performance with very little tuning

• Many auto-tuning and self-management capabilities designed to adapt:

Self-managing buffer pool: size and contents

Dynamic tuning of multi-programming level

Automatic statistics gathering, monitoring and healing

Self-tuning query optimization

Query optimization bypass for simple statements

Intra-query parallelism


IO intelligence for certain operations

Cache warming on startup and to steady state


SQL Anywhere Performance BUT…

Auto-tuning and self-management capabilities are designed to adapt to:

Hardware – CPU, I/O, memory of the machine

Queries being requested

Application logic and concurrency attributes

SQL Anywhere will adapt to different deployment environments

BUT: some adaptations may produce unacceptable performance

E.g., low-memory execution strategies

Many things can be done at development time to improve performance of application and database interactions

Capacity planning, Performance analysis and improvements, Scalability testing




How Fast is SQL Anywhere?

Typical conversation with regard to performance:

C: “So, how fast is SQL Anywhere?”

S: “Well, it depends on a variety of factors. Your database design, number of concurrent users, hardware, server cache contents, etc…”

C: “I understand those things are important, but can’t you just tell me how many rows the server can fetch per second?”

S: “Well, in this test, we can fetch between 80 and 30 million rows per second”

C: “What? That makes no sense. My application can’t get anywhere near 30 million rows per second. Something must be broken. Can you fix it?”

S: “Well, it depends …”

…


How Many Rows Can We Read per Second?

Rows Read Per Second on Z820 Server (256GB, 32 threads, SSD)

Access method             Rows/s (thousands)
Seq Scan cold             8,161
Seq Scan hot              31,272
Non Clust IDX 1% cold     125
Non Clust IDX 1% hot      5,083
Clustered IDX 10% cold    1,111
Clustered IDX 10% hot     5,616
One row statement         0.08


How Many Rows Can We Read per Second? (cont)

Rows Read Per Second on T520 Laptop (8GB, 4 thread, HDD)

Access method             Rows/s (thousands)
Seq Scan cold             733
Seq Scan hot              704
Non Clust IDX 1% cold     1
Non Clust IDX 1% hot      3,933
Clustered IDX 10% cold    108
Clustered IDX 10% hot     3,163
One row statement         5.10


Cold Cache Performance on Two Hosts

I/O dominates cold cache performance

On HDD, sequential is much faster

Clustering of indexes is very important

Buffer size affects how many pages are re-read

SSD has much better performance

Excellent throughput and random seeks

CPU speed/number is important when I/O is fast

Cold cache time (s) by access method

Access method    T520 (laptop, 8GB, 4-thread, HDD)    Z820 (server, 256GB, 32-thread, SSD)
Seq Scan         81.9                                 7.4
NonCluIX .1%     0.1                                  0.0
NonCluIX 1%      492.0                                4.8
CluIX 1%         3.5                                  0.4
CluIX 10%        55.4                                 5.4


Warm Cache Performance On Two Hosts

When data is in cache, CPU is the major factor

Parallelism is available but has overheads

Clock speed is an important factor

Buffer pool contents have a huge impact

Warm cache time (s) by access method

Access method    T520 (laptop, 8GB, 4-thread, HDD)    Z820 (server, 256GB, 32-thread, SSD)
Seq Scan         85.3                                 1.9
NonCluIX .1%     0.0                                  0.0
NonCluIX 1%      0.2                                  0.1
CluIX 1%         0.2                                  0.2
CluIX 10%        1.9                                  1.1


Access Methods

Full Table Scan

Index Scan

Index


Deciding About Access Method

Full Table Scan

Reads all pages in a table => unnecessary I/O

Processes all rows => more CPU

But it benefits from sequential I/O

Index Scan

Reads only required pages

Processes only required rows => Less CPU

Suffers from Random I/O

Needs to read index pages in addition to table pages

Might need to re-fetch the same table pages

When selectivity is large enough, it might need to read every page of the table

[Figure: runtime versus selectivity (%) for Index Scan and Full Table Scan, with the selectivity break-even point marked.]
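The trade-off above can be sketched with a toy cost model. This is only an illustration, not the SQL Anywhere cost model: the per-page costs `seq_cost` and `rand_cost` are made-up values, and the index scan is assumed to be the non-clustered worst case where every matching row costs one random page read.

```python
def full_scan_cost(pages, seq_cost=1.0):
    """Full table scan: every page is read once, sequentially."""
    return pages * seq_cost


def index_scan_cost(pages, rows_per_page, selectivity, rand_cost=20.0):
    """Index scan (assumed worst case): one random page read per matching row."""
    matching_rows = pages * rows_per_page * selectivity
    return matching_rows * rand_cost


def break_even_selectivity(rows_per_page, seq_cost=1.0, rand_cost=20.0):
    """Selectivity at which both access methods cost the same in this model."""
    return seq_cost / (rows_per_page * rand_cost)


for rpp in (1, 33, 500):
    print(rpp, f"{break_even_selectivity(rpp):.4%}")
```

In this model fewer rows per page (larger rows) and cheaper random reads both push the break-even point to the right, which matches the direction of the factors discussed in the slides.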


Factors For Choosing Between Index and Table Scan

Selectivity

Larger selectivity => Table Scan

Small selectivity => Index Scan

Row size (the number of rows per page)

Larger row size => shifts the break-even point to the right (index scan performs better)

Cache contents

With more of the table in cache, more reads are satisfied from the cache

Available memory

Larger available memory => shifts the break-even point to the right (index scan performs better)

What about I/O Parallelism?


Parallel Index Scan in SAP SQL Anywhere

[Figure: an index and its leaf nodes.]


HDD – Parallel Index Scan Moves Break-Even Point

[Figure: time (seconds) versus selectivity (%) on HDD for IS, PIS32, FTS, and PFTS32, with the non-parallel and parallel break-even points marked.]


SSD – Break-Even Point Moves Further to Right

[Figure: time (seconds) versus selectivity (%) on SSD for IS, PIS32, FTS, and PFTS32, with the non-parallel and parallel break-even points marked.]


Shift in Break Even Point

Break-even selectivity (NP = non-parallel, P = parallel, RPP = rows per page)

           NP-HDD     P-HDD     NP-SSD    P-SSD
RPP=1      0.55%      1.4%      8%        48%
RPP=33     0.02%      0.05%     0.4%      2.1%
RPP=500    0.0045%    0.005%    0.15%     0.5%

On SSD the magnitude of shift in selectivity break-even point is significantly higher

So the query optimizer needs to be aware of the impact of parallel I/O

Otherwise, we will end up with non-optimal execution plans up to ~20 times worse than optimal

Takeaway: consider ALTER DATABASE CALIBRATE SERVER

If you know the database will run on one configuration, consider calibrating

If you have little control over disk configuration, the default calibration does the best it can
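To make the size of the shift concrete, the break-even figures from the table can be turned into ratios. The numbers below are copied from the slide; the helper name `shift_factor` is only illustrative.

```python
# Break-even selectivity in percent, keyed by rows per page (RPP).
break_even = {
    1:   {"NP-HDD": 0.55,   "P-HDD": 1.4,   "NP-SSD": 8.0,  "P-SSD": 48.0},
    33:  {"NP-HDD": 0.02,   "P-HDD": 0.05,  "NP-SSD": 0.4,  "P-SSD": 2.1},
    500: {"NP-HDD": 0.0045, "P-HDD": 0.005, "NP-SSD": 0.15, "P-SSD": 0.5},
}


def shift_factor(rpp, device):
    """How far parallel I/O moves the break-even point on one device type."""
    row = break_even[rpp]
    return row[f"P-{device}"] / row[f"NP-{device}"]


for rpp in break_even:
    print(f"RPP={rpp}: HDD x{shift_factor(rpp, 'HDD'):.2f}, "
          f"SSD x{shift_factor(rpp, 'SSD'):.2f}")
```

For every row size in the table the parallel shift on SSD is larger than on HDD, which is why a cost model calibrated for one device can pick poor plans on the other.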






Elements of a Database System

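[Figure: a client process and a server process connected by the network. Statements and parameters flow to the server; status and results flow back. The server process holds the buffer pool and accesses the DB.]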


Anatomy of a single statement

[Figure: timeline of a single statement across client, network, and server. Steps labeled: form SQL input, prepare, describe, open, execute, read results, format output, close.]
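The client-side steps can be illustrated with any cursor-based database API. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not the SQL Anywhere client library, it runs in-process with no network hop, and the table `t` is invented for the example.

```python
import sqlite3

# Set up a throwaway in-memory database (stand-in for a real server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")

sql = "SELECT pk, v FROM t WHERE pk = ?"      # form SQL input
cur = conn.cursor()                            # open a cursor
cur.execute(sql, (1,))                         # prepare and execute with a parameter
columns = [d[0] for d in cur.description]      # describe the result set
rows = cur.fetchall()                          # read results
for row in rows:                               # format output
    print(dict(zip(columns, row)))
cur.close()                                    # close
conn.close()
```

Against a networked server each of these steps can cost a round trip, which is why the number of statements an application issues matters as much as how fast each one runs.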


Concurrent requests

[Figure: clients C1, C2, …, CN sending requests to the server.]


SQL Anywhere Scheduler


Task Scheduling

Database server has a worker-per-request architecture

A worker pool and a request queue

Each worker picks up and completes one request at a time.

No guarantee that the same worker will service the same connection

A small pool of workers executes requests

Scheduler dynamically assigns work across workers - Cooperative multitasking

Unassigned requests wait for an available thread

Server supports dynamic intra-query parallelism

Degree of parallelism varies based on available resources

Pool size establishes the multiprogramming level

SA Default: 20

Priorities can be set on connections

Adjusts the number of time slices that any given request will get
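A worker pool draining a shared request queue can be sketched as follows. This is a simplified illustration, not the SQL Anywhere scheduler: it uses preemptive OS threads rather than cooperative multitasking, has no priorities, and `run_pool` is a hypothetical name. Only the default pool size of 20 is taken from the slide.

```python
import queue
import threading


def run_pool(requests, pool_size=20):
    """Run each callable in `requests` on a fixed pool of worker threads."""
    pending = queue.Queue()
    for item in enumerate(requests):
        pending.put(item)                 # unassigned requests wait in the queue

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                req_id, fn = pending.get_nowait()   # pick up one request
            except queue.Empty:
                return                               # nothing left to do
            value = fn()                             # complete it
            with lock:
                results[req_id] = value

    workers = [threading.Thread(target=worker) for _ in range(pool_size)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


results = run_pool([lambda i=i: i * i for i in range(100)], pool_size=4)
print(len(results), results[9])
```

Any worker may serve any request, so nothing ties a connection to a particular thread, and the pool size caps how many requests run at once.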


Worker-per-Request Architecture

How to choose the size of the worker pool?

A large worker pool:

Increases the concurrency level of the server

Increases contention on server resources

Increases working set size of server

A smaller worker pool:

Underutilization of hardware resources

Limits the concurrency level of the workload

Possibility of a server hang because no workers are available to handle outstanding requests


Dynamic Worker Pool Management

Dynamically adjust the size of the worker pool

Based on workload throughput monitoring and number of requests pending

Benefits of dynamic MPL:

One less parameter for DBAs to worry about

Improve server throughput for different workloads

Better handling of changes in workload transaction mix
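One simple way to adjust a pool from throughput and queue length is hill climbing: keep moving in the same direction while throughput improves, and reverse when it drops. The sketch below is an assumption made for illustration; the slides do not describe the actual control algorithm, and the function name, step size, and bounds are hypothetical.

```python
def adjust_pool_size(size, throughput, prev_throughput, pending, last_step,
                     min_size=2, max_size=200):
    """Return (new_size, step) for the next monitoring interval."""
    if pending == 0 and throughput >= prev_throughput:
        step = 0                          # nothing is waiting: leave the pool alone
    elif throughput >= prev_throughput:
        step = last_step or 1             # last move helped (or first move): continue
    else:
        step = -(last_step or 1)          # throughput fell: reverse direction
    new_size = max(min_size, min(max_size, size + step))
    return new_size, step


size, step = 20, 1
for throughput, prev, pending in [(110, 100, 5), (120, 110, 4), (105, 120, 6)]:
    size, step = adjust_pool_size(size, throughput, prev, pending, step)
    print(size, step)
```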






Dynamic Memory Management

A SQL Anywhere server will grow and shrink the buffer pool as necessary to accommodate both

Database server load

Physical memory requirements of other applications

Enabled by default on all supported platforms

User can set lower, upper bounds, initial size


Dynamic Memory Management – Adjust Buffer Pool Size

Basic idea: match buffer pool size to SA's working set as determined by the operating system plus the OS free pool

• Feedback control loop

[Figure: feedback control loop linking the Operating System, the Buffer Pool Governor, and the Buffer Pool Manager. Labeled quantities: OS working set size, amount of free physical memory, database file sizes, buffer miss rate, adjusted memory target, grow/shrink amount, new buffer pool size.]
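One step of such a feedback loop might look like the sketch below. Only the basic idea (target = working set plus OS free memory, within user-set bounds) comes from the slides. Capping the target at the database file size and refusing to grow when there are no buffer misses are assumptions made for illustration, and all names are hypothetical.

```python
def grow_shrink_amount(current, working_set, free_memory, db_file_size,
                       lower, upper, miss_rate):
    """Return how much to grow (+) or shrink (-) the buffer pool, in the input units."""
    target = working_set + free_memory         # basic idea from the slide
    target = min(target, db_file_size)         # assumption: no point caching more than the files hold
    target = max(lower, min(upper, target))    # respect user-set lower and upper bounds
    delta = target - current
    if delta > 0 and miss_rate == 0:
        return 0                               # assumption: no misses, so growing would not help
    return delta


# Sizes in MB for readability.
print(grow_shrink_amount(512, 600, 400, 4096, 64, 2048, miss_rate=0.1))
print(grow_shrink_amount(512, 300, 0, 4096, 64, 2048, miss_rate=0.1))
```

The second call models another application taking memory: free memory drops to zero, the target falls below the current size, and the pool shrinks.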


SQL Anywhere Memory Management

Single heterogeneous buffer pool with few predefined limits

Buffer pool comprises

• Table and index data pages

• Checkpoint log pages

• Bitmap pages

• Heap pages (data structures for query execution plans, optimization graphs, connection structures, stored procedures, triggers)

• Free (available) pages

All page frames are the same size

Fully contained memory manager

• Self-managed memory footprint


Cache Warming

Startup cache warming

Record the pages referenced during the “startup period”

Read these pages in on future startups

Meant to quickly load data needed for the first few requests

Steady state cache warming

Record an approximation of the steady state of the cache

After startup warming is done, in the background load up pages expected to be needed

Should be included in V17


Cache contents estimation

Every table and index maintains a count of pages currently in cache

• This is incremented/decremented when pages are read/evicted

The cost model estimates how many disk reads are needed

• Estimates the number of distinct pages referenced by the plan

• Estimates how many are likely already in the buffer pool

• Estimates how many of those read multiple times will remain in the buffer pool

Takeaway: Consider buffer pool contents when evaluating performance

• Consider flushing or warming cache before experiments to stabilize state
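The per-object page counter and its use in a disk-read estimate can be sketched as below. The class and method names are hypothetical, and the estimate assumes the pages a plan touches are spread uniformly over the table, which is a simplification rather than the actual cost model.

```python
class CachedPageCounter:
    """Tracks how many pages of one table or index are currently in cache."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.cached = 0

    def on_read(self):
        # A page of this object was brought into the buffer pool.
        self.cached = min(self.total_pages, self.cached + 1)

    def on_evict(self):
        # A page of this object was evicted from the buffer pool.
        self.cached = max(0, self.cached - 1)

    def estimated_disk_reads(self, distinct_pages):
        # Assumption: referenced pages are uniformly distributed, so the
        # cached fraction of the object is also the expected hit fraction.
        hit_fraction = self.cached / self.total_pages
        return distinct_pages * (1 - hit_fraction)


counter = CachedPageCounter(total_pages=1000)
for _ in range(250):
    counter.on_read()
print(counter.estimated_disk_reads(400))
```

The same plan gets a different cost estimate depending on what is already cached, which is why experiments need a stable cache state.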






SQL Anywhere Query Optimizer

SA optimizes requests each time they are executed

Takes into account server context

Optimization process includes both heuristic and cost-based rewrites

No hard limits – tested with 500 quantifiers in a single block

Advantages

Plans are responsive to server environment, buffer pool contents/size, data skew

No need to administer ‘packages’ (pre-optimized SQL)

Optimization effort adapts to expected query cost and benefit of optimization

Simple statements bypass optimizer

Cheap but complex statements use plan cache

Optimizer considers multiple join enumeration approaches depending on expected benefit


Bypassing the Query Optimizer

Single-table queries without “complications” bypass the optimizer:

If they have a specific form (select * from T where pk = value), use a single “bypass cache” plan

If there is only one reasonable plan (WHERE clause specifies a unique row), bypass heuristic

Otherwise, “bypass costed” compares alternative indexes and sequential scan

A subset are “bypass costed simple” where we can skip trying predicate optimizations or semantic transforms

If the bypass optimizer finds a plan estimated to take more than 5 seconds, it re-optimizes with the full optimizer


Plan Caching and Auto-Parameterization

Access plans for queries in stored procedures/triggers/events are cached and reused for future executions

Plans undergo a ‘training period’ where plan variance is determined

If no variance (even without variable values), plan is cached and reused

Query is periodically re-optimized on a logarithmic scale to ensure plan does not become sub-optimal

Improvements in V16 and V17 avoid plan caching when it degrades performance

Takeaway: Do not set max_plans_cached=0
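Re-optimizing "on a logarithmic scale" can be pictured as checking a cached plan at exponentially growing intervals. The sketch below is an assumption for illustration only: the length of the training period (`training_runs`) and the power-of-two schedule are invented, since the slides do not give the actual values.

```python
def should_reoptimize(execution_count, training_runs=5):
    """Decide whether this execution (1-based) should run the optimizer."""
    if execution_count <= training_runs:
        return True                       # training period: always optimize
    n = execution_count - training_runs   # executions since training ended
    return (n & (n - 1)) == 0             # re-check when n is a power of two


schedule = [i for i in range(1, 40) if should_reoptimize(i)]
print(schedule)
```

With this schedule the optimizer runs on every execution during training and then only a handful of times over the next dozens of executions, so a stable plan quickly becomes nearly free while a plan that has gone stale is still caught eventually.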


Adaptive Query Processing

Alternative access plans can be executed if actual intermediate result sizes are poorly estimated

Server switches to alternative plan automatically at run time

Low-memory strategies used when buffer pool utilization is high

Parallelize access plan when doing so is advantageous

The degree of parallelism is determined based on cost during enumeration process

Work is partitioned independently of worker pool size

Plans are largely self-tuning with respect to degree of parallelism

Prevents starvation of query fragments when the number of available workers is less than optimal for some period






Automatic Statistics Management

Self-tuning column histograms

On both base and temporary tables

Statistics are updated on-the-fly automatically

Join histograms built for intermediate result analysis during an optimization process

Not persisted

Server maintains persistent index statistics in real-time

Index sampling during optimization

If there is no histogram or it reports “no confidence”

If there is an index with two or more predicates covered (better than combining single-column estimates)


Column Histograms

Updated in real-time with the results of predicate evaluation and update DML statements

By default, statistics are computed during the execution of every DML request

Histograms computed automatically on LOAD TABLE or CREATE INDEX statements

Can be created/dropped explicitly if necessary

Retained by default across unload/reload


Motivation for Self Healing Statistics

Quality of self tuned statistics can degrade arbitrarily

Can get out of sync in the face of rollbacks

Statistics generation looks at data once, out of order

Goal is not to be perfect with self tuning

Can get out of sync in the face of severe data skew

Self-tuning may not be able to “keep up” on busy servers

The system needs to monitor and correct itself


Self Healing Statistics

An internal system of background server processes

Low overhead to the engine and query execution

Statistics Governor

Categorize and record estimation errors during QP

Self-monitors “quality” of statistics as they are used

Self-heals “poor” statistics

Removes “bad” statistics


SQL Anywhere Solution

Statistics Flusher

Unloads unused statistics from memory

Advises on the health of column statistics

Advises on column statistics usage

Advises on whether to create or drop statistics

Runs every 30 minutes

Statistics Cleaner

Triggered by the flusher process to fix statistics that cannot be fixed otherwise

Keeps track of the table IDs where bad statistics are found

Runs with background priority


Fixing Statistics

Several methods used for automatically improving quality of statistics

Piggyback off user queries

Exploit access plans that see a large portion of the table

Perform in-line statistics collection during query execution

Replace or fix in-situ

Recreate from indexes

Fallback mechanism for piggybacking

Use a shallow index scan to recreate histogram

Perform a sampled table scan

If the table column does not have an index, then we must scan the table to get the statistics

Read a random sample of a small number of table pages

Detect pathological situations and prevent self-healing or, even, drop histograms




Conclusions

How fast is SQL Anywhere?

“It depends” is the right answer!

The optimizer is co-ordinating changing data from multiple sources in real-time in order to provide/maintain the best performance it can at that point in time

But it is not perfect!

Specific Takeaways

Consider ALTER DATABASE CALIBRATE SERVER

Consider buffer pool contents when evaluating performance

Do not set max_plans_cached=0


Questions?

Jason Hinsperger

[email protected]
