Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty Humphrey
Department of Computer Science, University of Virginia
Science Cloud Workshop, June 21st, 2010
The question is not "Can I build my application for the cloud?" but "How do I build it well?"
How will it perform? (our focus)
How do Azure services perform?
Experiments were run between November 2009 and February 2010
Deploying and Scaling Compute Resources
Storage Services
Direct TCP Communication
Azure SQL Services
Methodology
Application deployed from Azure Blob Storage
Deployment package <5MB
Measure time to start a deployment (e.g., 4 small instances)
Measure time to double instance count
Between Dec 17, 2009 and Jan 9, 2010 we ran the experiment 431 times; failure rate: 2.6%
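The timing methodology above can be sketched as a simple polling loop. This is an illustrative harness, not the Azure management API: `get_instance_states` is a hypothetical stand-in for whatever call reports per-instance status.

```python
import time

def wait_until_ready(get_instance_states, expected_count, poll_interval=1.0):
    """Poll until `expected_count` instances report 'Ready'.

    Returns a dict mapping instance name -> seconds elapsed when that
    instance first became Ready, so per-instance startup times can be
    compared (e.g., time to start vs. time to double the count).
    """
    start = time.monotonic()
    ready_at = {}
    while len(ready_at) < expected_count:
        for name, state in get_instance_states().items():
            if state == "Ready" and name not in ready_at:
                ready_at[name] = time.monotonic() - start
        time.sleep(poll_interval)
    return ready_at
```

Recording the first-Ready time per instance, rather than a single end-to-end number, is what exposes the staggered arrivals discussed below.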
[Figure: Average instance adding time, in minutes, by VM instance size]
            Small  Medium  Large  X-Large
  Web        9.9    10.6    11.3    13.8
  Worker     8.9     9.9    11.0    13.2
[Figure: Scaling versus startup time for 4 Small instances: stacked per-VM (1st-4th) times in minutes (0-25 scale) for Scale Web, Scale Worker, Start Web, and Start Worker roles]
Deploying a VM takes around 10 minutes. Is that too long?
Adding instances takes even longer than the initial deployment
Larger instance types take longer to start, and web roles take longer than worker roles
Not all instances come online at the same time
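Because instances come online at different times, an application can start dispatching work as each instance becomes ready instead of waiting for the slowest one. A minimal sketch, assuming each `boot_fn` blocks until its (simulated) instance is ready:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def dispatch_as_ready(boot_fns, work_fn):
    """Start work on each instance as soon as its boot completes,
    rather than barrier-waiting on the slowest instance.

    `boot_fns` is a list of zero-arg callables that block until an
    instance is ready and return its name; `work_fn` receives the name.
    Results are returned in readiness order, not submission order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=len(boot_fns)) as pool:
        futures = [pool.submit(fn) for fn in boot_fns]
        for fut in as_completed(futures):
            results.append(work_fn(fut.result()))
    return results
```

With startup times spread over several minutes per VM, this pattern recovers much of the otherwise-wasted waiting time.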
Blobs – Large, unstructured storage
Tables – Semi-structured data, queries, updates, inserts, deletes
Queues – FIFO, asynchronous messaging
Large object storage: 50GB or 1TB limit depending on blob type
Get/Put semantics
Performance isolated between blob containers
Methodology:
Get a 1GB blob concurrently with 1-192 clients operating on the same blob
Put 1GB blobs concurrently into the same container
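The per-client bandwidth measurement can be sketched as a thread-pool harness. `transfer` here is a placeholder for the actual blob get or put; the harness only handles the concurrency and the MB/s arithmetic:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_per_client_bandwidth(transfer, n_clients, payload_bytes):
    """Run `transfer` (a stand-in for a blob get or put, taking a byte
    count) from `n_clients` concurrent workers and return the average
    per-client bandwidth in MB/s."""
    def one_client(_):
        start = time.monotonic()
        transfer(payload_bytes)
        elapsed = time.monotonic() - start
        return (payload_bytes / (1024 * 1024)) / elapsed

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        rates = list(pool.map(one_client, range(n_clients)))
    return sum(rates) / len(rates)
```

Sweeping `n_clients` over 1-192 against a real 1GB blob produces the per-client curves shown below.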
[Figure: Per-client bandwidth (MB/s, 0-14 scale) for blob Upload and Download vs. 1-192 concurrent instances]
Download bandwidth is more than 2x upload bandwidth
A single, small client sees ~100 Mb/s
[Figure: Cumulative blob bandwidth (MB/s, 0-450 scale) for Download and Upload vs. 1-192 concurrent clients]
Entity, Attribute, Value model; semi-structured, no schema
Methodology:
Perform the 4 primary operations: insert, query, update, delete
Each client operates on unique entities (rows) within the same shared partition
Insert, Query & Delete: 500 ops/client; Update: 100 ops/client
~220K entities in the table for Query, Update & Delete
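The "unique entities within a shared partition" setup can be sketched as a key-generation scheme. The naming convention here is illustrative, not taken from the original benchmark code:

```python
PARTITION = "bench"  # single shared partition, per the methodology

def client_row_keys(client_id, ops_per_client):
    """Row keys for one client, disjoint across clients, so concurrent
    clients never touch the same entity within the shared partition."""
    return [f"client{client_id:03d}-row{i:05d}" for i in range(ops_per_client)]

def build_workload(n_clients, ops_per_client):
    """Map each client id to its list of (PartitionKey, RowKey) targets."""
    return {
        cid: [(PARTITION, rk) for rk in client_row_keys(cid, ops_per_client)]
        for cid in range(n_clients)
    }
```

Keeping all entities in one partition stresses the partition's throughput limits; disjoint row keys ensure clients measure service contention rather than entity-level conflicts.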
[Figure: Average Table Update operation time (seconds, 0-6 scale) vs. 1-192 concurrent instances, for 1K, 4K, 16K, and 64K entity sizes]
[Figure: Table performance using 4KB entities: average time (ms, 0-200 scale) for Query, Insert, and Delete vs. 1-192 concurrent instances]
Passing small (≤8KB) messages in FIFO style
Get, Put, and Peek operations
Methodology: a single queue; concurrent clients get/put messages
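The three queue operations measured here can be modeled with a minimal in-memory FIFO. This is a toy stand-in for the semantics, not the Azure Queue API:

```python
from collections import deque

class ToyQueue:
    """In-memory sketch of the queue semantics under test: FIFO order,
    an 8KB message-size limit, and Get/Put/Peek operations."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        if len(message) > 8 * 1024:
            raise ValueError("messages are limited to 8KB")
        self._messages.append(message)

    def peek(self):
        # Read the head message without removing it.
        return self._messages[0] if self._messages else None

    def get(self):
        # Remove and return the head message (FIFO order).
        return self._messages.popleft() if self._messages else None
```

Peek being non-destructive is why its throughput can be measured separately from Get in the charts below.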
[Figure: Queue throughput (msg/sec) vs. 1-192 concurrent instances, for 512B, 1K, 4K, and 8K message sizes. Three panels: Put Message (0-30 scale), Peek Message (0-50 scale), Get Message (0-30 scale)]
TCP endpoints allow direct worker-role-to-worker-role communication
They offer a lower-latency communication mechanism than message queues
No intermediary is required
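The kind of endpoint-to-endpoint latency measurement behind the CDFs below can be sketched with a plain TCP echo round trip. This is a generic harness, not the code used for the original experiments:

```python
import socket
import threading
import time

def echo_server(listening_sock):
    """Accept one connection and echo everything back until it closes."""
    conn, _ = listening_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

def measure_rtt(host, port, rounds=50):
    """Average round-trip time (ms) of a 1-byte echo over a direct
    TCP connection between two endpoints."""
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle so small probes are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.monotonic()
        for _ in range(rounds):
            sock.sendall(b"x")
            sock.recv(1)
        return (time.monotonic() - start) / rounds * 1000.0
```

Run between two worker-role instances (with the echo server on one instance's internal endpoint), repeated samples of `measure_rtt` yield the latency distribution.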
[Figure: Two CDFs of Azure-to-Azure TCP performance: cumulative percentage (0-100%) vs. bandwidth (MB/sec), and cumulative percentage vs. round-trip latency (RTT, 0-100 ms)]
TCP performance can change dramatically, why?
[Figure: Azure-to-Azure TCP bandwidth (MB/s, 0-100 scale) over time]
Normal SQL Server capabilities (RDBMS)
Size limited to <50GB per database
Tested with the TPC-E benchmark for an OLTP workload:
Our .NET implementation of the benchmark
Simulates a brokerage house
The test DB is 3GB in size
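The core measurement of an OLTP benchmark, average time per committed transaction, can be sketched with a tiny harness. sqlite3 stands in for the SQL service here, and the one-statement transaction is a placeholder for the far richer TPC-E transactions:

```python
import sqlite3
import time

def run_transactions(conn, n_txns):
    """Time `n_txns` small committed transactions and return the
    average transaction time in seconds."""
    conn.execute("CREATE TABLE IF NOT EXISTS trades (id INTEGER, qty INTEGER)")
    times = []
    for i in range(n_txns):
        start = time.monotonic()
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO trades VALUES (?, ?)", (i, 100))
        times.append(time.monotonic() - start)
    return sum(times) / len(times)
```

Running the same loop from a LAN client and from an Azure client against the cloud database isolates the service-side cost from the network cost, which is the comparison in the next two charts.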
[Figure: Average transaction time in seconds (0-0.45 scale) for a single-thread TPC-E client: Local-to-LAN vs. Azure-to-Azure]
[Figure: Normalized average TPC-E transaction time (committed transactions only), 75%-225% scale, vs. 1-64 concurrent threads, for Azure Cloud and LAN Server]
[Figure: Normalized percent of transactions committed per client thread (0-100%) vs. 1-50 concurrent client threads, for Azure Cloud and LAN server]
Size deployments for the expected client slowdown and required service throughput
Scaling a deployment is slower than the initial deployment; web roles start more slowly than worker roles, and large VMs more slowly than small VMs
VM deployment can take a long time depending on how many are requested
Distribute blob accesses across as many containers as possible to achieve performance at scale
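Since performance is isolated between blob containers, spreading blobs across containers is a simple hash-partitioning problem. A minimal sketch, with a hypothetical naming scheme:

```python
import hashlib

def container_for(blob_name, n_containers):
    """Deterministically spread blob names across `n_containers`
    containers, so concurrent accesses don't all hit one container.
    Uses a stable hash (md5) so the mapping survives process restarts,
    unlike Python's built-in salted hash()."""
    digest = hashlib.md5(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % n_containers
    return f"container-{index:02d}"
```

Any client that knows the blob name and the container count can recompute the container, so no lookup table is needed.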
Access tables by partition key and row key; property filters are slow
Tables scale well for query and insert, but watch out for delete and update (this is expected)
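The first recommendation can be made concrete with a toy in-memory model (not the real Azure Table service): addressing by (PartitionKey, RowKey) is a direct key lookup, while a property filter degenerates into a full scan:

```python
class ToyTable:
    """In-memory sketch of the two table access paths, assuming a
    dict keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._rows = {}

    def insert(self, pk, rk, **props):
        self._rows[(pk, rk)] = dict(props, PartitionKey=pk, RowKey=rk)

    def get(self, pk, rk):
        # Key access: O(1) lookup -- the fast path recommended above.
        return self._rows[(pk, rk)]

    def filter(self, prop, value):
        # Property filter: O(n) scan over every entity -- the slow path.
        return [r for r in self._rows.values() if r.get(prop) == value]
```

With ~220K entities, as in the experiments above, the difference between the key path and the scan path dominates query latency.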
Expect a roughly 2x slowdown with SQL Azure relative to a LAN server
SQL Azure scales reasonably well, especially with 30 or fewer concurrent clients
SQL Azure performance over time shows low variability