Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty Humphrey
Department of Computer Science, University of Virginia
Science Cloud Workshop, June 21st, 2010
The question is not "Can I build my application for the cloud?" but "How do I build it well?"
How will it perform? (our focus)
How do Azure services perform?
Experiments were run between November 2009 and February 2010
Deploying and Scaling Compute Resources
Storage Services
Direct TCP Communication
Azure SQL Services
Methodology
Application deployed from Azure Blob Storage
Deployment package <5MB
Measure time to start a deployment (e.g., 4 small instances)
Measure time to double instance count
Between Dec 17, 2009 and Jan 9, 2010 we ran the experiment 431 times; failure rate: 2.6%
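The timing methodology above can be sketched as a simple polling loop. This is an illustrative harness, not the Azure management API: `get_instance_states` is a hypothetical stand-in for whatever call reports per-instance status.

```python
import time

def wait_until_ready(get_instance_states, expected_count, poll_interval=1.0):
    """Poll until `expected_count` instances report 'Ready'.

    Returns a dict mapping instance name -> seconds elapsed when that
    instance first became Ready, so per-instance startup times can be
    compared (e.g., time to start vs. time to double the count).
    """
    start = time.monotonic()
    ready_at = {}
    while len(ready_at) < expected_count:
        for name, state in get_instance_states().items():
            if state == "Ready" and name not in ready_at:
                ready_at[name] = time.monotonic() - start
        time.sleep(poll_interval)
    return ready_at
```

Recording the first-Ready time per instance, rather than a single end-to-end number, is what exposes the staggered arrivals discussed below.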
[Figure: Average instance adding time, in minutes, by VM instance size]
            Small  Medium  Large  X-Large
  Web        9.9    10.6    11.3    13.8
  Worker     8.9     9.9    11.0    13.2
[Figure: Scaling versus startup time for 4 Small instances: stacked per-VM (1st-4th) times in minutes (0-25 scale) for Scale Web, Scale Worker, Start Web, and Start Worker roles]
Deploying a VM takes around 10 minutes. Is that too long?
Adding instances takes even longer than the initial deployment
Larger instance types take longer to start, and web roles take longer than worker roles
Not all instances come online at the same time
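Because instances come online at different times, an application can start dispatching work as each instance becomes ready instead of waiting for the slowest one. A minimal sketch, assuming each `boot_fn` blocks until its (simulated) instance is ready:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def dispatch_as_ready(boot_fns, work_fn):
    """Start work on each instance as soon as its boot completes,
    rather than barrier-waiting on the slowest instance.

    `boot_fns` is a list of zero-arg callables that block until an
    instance is ready and return its name; `work_fn` receives the name.
    Results are returned in readiness order, not submission order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=len(boot_fns)) as pool:
        futures = [pool.submit(fn) for fn in boot_fns]
        for fut in as_completed(futures):
            results.append(work_fn(fut.result()))
    return results
```

With startup times spread over several minutes per VM, this pattern recovers much of the otherwise-wasted waiting time.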
Blobs – Large, unstructured storage
Tables – Semi-structured data, queries, updates, inserts, deletes
Queues – FIFO, asynchronous messaging
Large object storage: 50GB or 1TB limit depending on blob type
Get/Put semantics
Performance isolated between blob containers
Methodology:
Get a 1GB blob concurrently with 1-192 clients operating on the same blob
Put 1GB blobs concurrently into the same container
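The per-client bandwidth measurement can be sketched as a thread-pool harness. `transfer` here is a placeholder for the actual blob get or put; the harness only handles the concurrency and the MB/s arithmetic:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_per_client_bandwidth(transfer, n_clients, payload_bytes):
    """Run `transfer` (a stand-in for a blob get or put, taking a byte
    count) from `n_clients` concurrent workers and return the average
    per-client bandwidth in MB/s."""
    def one_client(_):
        start = time.monotonic()
        transfer(payload_bytes)
        elapsed = time.monotonic() - start
        return (payload_bytes / (1024 * 1024)) / elapsed

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        rates = list(pool.map(one_client, range(n_clients)))
    return sum(rates) / len(rates)
```

Sweeping `n_clients` over 1-192 against a real 1GB blob produces the per-client curves shown below.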
[Figure: Per-client bandwidth (MB/s, 0-14 scale) for blob Upload and Download vs. 1-192 concurrent instances]
Download bandwidth is more than 2x upload bandwidth
A single, small client sees ~100 Mb/s
[Figure: Cumulative blob bandwidth (MB/s, 0-450 scale) for Download and Upload vs. 1-192 concurrent clients]
Entity, Attribute, Value model; semi-structured, no schema
Methodology:
Perform the 4 primary operations: insert, query, update, delete
Each client operates on unique entities (rows) within the same shared partition
Insert, Query & Delete: 500 ops/client; Update: 100 ops/client
~220K entities in the table for Query, Update & Delete
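The "unique entities within a shared partition" setup can be sketched as a key-generation scheme. The naming convention here is illustrative, not taken from the original benchmark code:

```python
PARTITION = "bench"  # single shared partition, per the methodology

def client_row_keys(client_id, ops_per_client):
    """Row keys for one client, disjoint across clients, so concurrent
    clients never touch the same entity within the shared partition."""
    return [f"client{client_id:03d}-row{i:05d}" for i in range(ops_per_client)]

def build_workload(n_clients, ops_per_client):
    """Map each client id to its list of (PartitionKey, RowKey) targets."""
    return {
        cid: [(PARTITION, rk) for rk in client_row_keys(cid, ops_per_client)]
        for cid in range(n_clients)
    }
```

Keeping all entities in one partition stresses the partition's throughput limits; disjoint row keys ensure clients measure service contention rather than entity-level conflicts.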
[Figure: Average Table Update operation time (seconds, 0-6 scale) vs. 1-192 concurrent instances, for 1K, 4K, 16K, and 64K entity sizes]
[Figure: Table performance using 4KB entities: average time (ms, 0-200 scale) for Query, Insert, and Delete vs. 1-192 concurrent instances]
Passing small (≤8KB) messages in FIFO style
Get, Put, and Peek operations
Methodology: a single queue; concurrent clients get/put messages
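The three queue operations measured here can be modeled with a minimal in-memory FIFO. This is a toy stand-in for the semantics, not the Azure Queue API:

```python
from collections import deque

class ToyQueue:
    """In-memory sketch of the queue semantics under test: FIFO order,
    an 8KB message-size limit, and Get/Put/Peek operations."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        if len(message) > 8 * 1024:
            raise ValueError("messages are limited to 8KB")
        self._messages.append(message)

    def peek(self):
        # Read the head message without removing it.
        return self._messages[0] if self._messages else None

    def get(self):
        # Remove and return the head message (FIFO order).
        return self._messages.popleft() if self._messages else None
```

Peek being non-destructive is why its throughput can be measured separately from Get in the charts below.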
[Figure: Queue throughput (msg/sec) vs. 1-192 concurrent instances, for 512B, 1K, 4K, and 8K message sizes. Three panels: Put Message (0-30 scale), Peek Message (0-50 scale), Get Message (0-30 scale)]
TCP endpoints allow direct worker-role-to-worker-role communication
They offer a lower-latency communication mechanism than message queues
No intermediary is required
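The kind of endpoint-to-endpoint latency measurement behind the CDFs below can be sketched with a plain TCP echo round trip. This is a generic harness, not the code used for the original experiments:

```python
import socket
import threading
import time

def echo_server(listening_sock):
    """Accept one connection and echo everything back until it closes."""
    conn, _ = listening_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)

def measure_rtt(host, port, rounds=50):
    """Average round-trip time (ms) of a 1-byte echo over a direct
    TCP connection between two endpoints."""
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle so small probes are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.monotonic()
        for _ in range(rounds):
            sock.sendall(b"x")
            sock.recv(1)
        return (time.monotonic() - start) / rounds * 1000.0
```

Run between two worker-role instances (with the echo server on one instance's internal endpoint), repeated samples of `measure_rtt` yield the latency distribution.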
[Figure: Two CDFs of Azure-to-Azure TCP performance: cumulative percentage (0-100%) vs. bandwidth (MB/sec), and cumulative percentage vs. round-trip latency (RTT, 0-100 ms)]
TCP performance can change dramatically, why?
[Figure: Azure-to-Azure TCP bandwidth (MB/s, 0-100 scale) over time]
Normal SQL Server capabilities (RDBMS)
Size limited to <50GB per database
Tested with the TPC-E benchmark for an OLTP workload:
Our .NET implementation of the benchmark
Simulates a brokerage house
The test DB is 3GB in size
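The core measurement of an OLTP benchmark, average time per committed transaction, can be sketched with a tiny harness. sqlite3 stands in for the SQL service here, and the one-statement transaction is a placeholder for the far richer TPC-E transactions:

```python
import sqlite3
import time

def run_transactions(conn, n_txns):
    """Time `n_txns` small committed transactions and return the
    average transaction time in seconds."""
    conn.execute("CREATE TABLE IF NOT EXISTS trades (id INTEGER, qty INTEGER)")
    times = []
    for i in range(n_txns):
        start = time.monotonic()
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO trades VALUES (?, ?)", (i, 100))
        times.append(time.monotonic() - start)
    return sum(times) / len(times)
```

Running the same loop from a LAN client and from an Azure client against the cloud database isolates the service-side cost from the network cost, which is the comparison in the next two charts.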
[Figure: Average transaction time in seconds (0-0.45 scale) for a single-thread TPC-E client: Local-to-LAN vs. Azure-to-Azure]
[Figure: Normalized average TPC-E transaction time (committed transactions only), 75%-225% scale, vs. 1-64 concurrent threads, for Azure Cloud and LAN Server]
[Figure: Normalized percent of transactions committed per client thread (0-100%) vs. 1-50 concurrent client threads, for Azure Cloud and LAN server]
Size deployments for the expected client slowdown and required service throughput
Scaling a deployment is slower than the initial deployment; web roles start more slowly than worker roles, and large VMs more slowly than small VMs
VM deployment can take a long time depending on how many are requested
Distribute blob accesses across as many containers as possible to achieve performance at scale
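Since performance is isolated between blob containers, spreading blobs across containers is a simple hash-partitioning problem. A minimal sketch, with a hypothetical naming scheme:

```python
import hashlib

def container_for(blob_name, n_containers):
    """Deterministically spread blob names across `n_containers`
    containers, so concurrent accesses don't all hit one container.
    Uses a stable hash (md5) so the mapping survives process restarts,
    unlike Python's built-in salted hash()."""
    digest = hashlib.md5(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % n_containers
    return f"container-{index:02d}"
```

Any client that knows the blob name and the container count can recompute the container, so no lookup table is needed.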
Access tables by partition key and row key; property filters are slow
Tables scale well for query and insert, but watch out for delete and update (this is expected)
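The first recommendation can be made concrete with a toy in-memory model (not the real Azure Table service): addressing by (PartitionKey, RowKey) is a direct key lookup, while a property filter degenerates into a full scan:

```python
class ToyTable:
    """In-memory sketch of the two table access paths, assuming a
    dict keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._rows = {}

    def insert(self, pk, rk, **props):
        self._rows[(pk, rk)] = dict(props, PartitionKey=pk, RowKey=rk)

    def get(self, pk, rk):
        # Key access: O(1) lookup -- the fast path recommended above.
        return self._rows[(pk, rk)]

    def filter(self, prop, value):
        # Property filter: O(n) scan over every entity -- the slow path.
        return [r for r in self._rows.values() if r.get(prop) == value]
```

With ~220K entities, as in the experiments above, the difference between the key path and the scan path dominates query latency.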
Expect a roughly 2x slowdown with SQL Azure relative to a LAN server
SQL Azure scales reasonably well, especially with 30 or fewer concurrent clients
SQL Azure performance over time shows low variability