Experimental Methods on Performance in Clouds, …
Calton Pu
CERCS and School of Computer Science
Georgia Institute of Technology
Ancestors of Clouds (Hardware)
Data processing centers (~1960s)
Supercomputers, Grids (~1970s)
P2P, SETI@Home (~1999), botnets
Utility computing and data centers (~2000s)
Modern Clouds
Amazon's data centers in the early 2000s ran at only about 10% capacity, motivating the introduction of Amazon Web Services (AWS) in 2006
2007: Google & IBM joined NSF cloud computing research; Microsoft joined in 2010; NSFCloud followed in 2014
Cloud & Big Data (company)
Google Inc
Third-largest market capitalization in the world ($390B in 2014)
Probably more data than anyone else
13 declared data centers around the world, drawing 260 MW in 2011 (2,259,998 MWh total)
Cloud & Big Data (government)
NSA (maybe more than Google)
Utah Data Center, drawing 65 MW (about half of Salt Lake City's power draw)
Cloud Service Models
Software as a Service (SaaS) [not covered]: use the provider's applications over a network. Example: Salesforce.com
Platform as a Service (PaaS) [not covered]: use system-level services (e.g., database) to develop and deploy customer applications. Examples: Google App Engine, MS Azure
Infrastructure as a Service (IaaS): rent processing, storage, and network. Examples: Amazon EC2, Emulab
Amazon EC2 (circa 2010)
Elastic Block Store, CloudWatch, Automated Scaling

Instance Type        Memory   Compute (1 GHz virt)  Local Storage  Platform  Price (hour)
Std Small            1.7 GB   1 x 1                 160 GB         32-bit    $0.085
Std Large            7.5 GB   2 x 2                 850 GB         64-bit    $0.34
Std X-Large          15.0 GB  4 x 2                 1.7 TB         64-bit    $0.68
High Memory X-L      17.1 GB  2 x 3.25              420 GB         64-bit    $0.50
High Memory DB X-L   34.2 GB  4 x 3.25              850 GB         64-bit    $1.00
High Memory QD X-L   68.4 GB  8 x 3.25              1.7 TB         64-bit    $2.00
High CPU Medium      1.7 GB   2 x 2.5               350 GB         32-bit    $0.17
High CPU X-L         7.0 GB   8 x 2.5               1.7 TB         64-bit    $0.68
Cluster Compute X-L  23 GB    33.5                  1.7 TB         64-bit    $1.60
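As a quick sanity check on these circa-2010 prices, here is a minimal sketch (a hypothetical helper, not an AWS tool) that computes the hourly cost of a cluster from the table above:

# Minimal sketch using the circa-2010 prices from the table above
# (hypothetical helper, not an AWS API).
PRICE_PER_HOUR = {
    "std_small": 0.085,        # Std Small, USD/hour
    "std_large": 0.34,         # Std Large
    "high_cpu_medium": 0.17,   # High CPU Medium
}

def hourly_cost(cluster: dict) -> float:
    """cluster maps an instance type to a node count."""
    return sum(PRICE_PER_HOUR[kind] * count for kind, count in cluster.items())

# Example: a 13-node deployment on Std Small instances
# (e.g., the 1-2-1-9 configuration used later in this deck).
print(hourly_cost({"std_small": 13}))  # 13 * $0.085 = $1.105/hour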
AWS Free Tier
New AWS customers can get started with Amazon EC2 for free. Each month for 1 year:
750 hours of EC2 running Linux, RHEL, or SLES on t2.micro
750 hours of EC2 running MS Windows Server on t2.micro
750 hours of Elastic Load Balancing plus 15 GB data
30 GB of Amazon Elastic Block Storage in any combination of General Purpose (SSD) or Magnetic, plus 2 million I/Os and 1 GB of snapshot storage
15 GB of bandwidth out
1 GB of Regional Data Transfer
Resources Available for Experiments
Our own cluster (about 50 nodes)
GT/CERCS cluster (about 800 nodes)
Emulab (Utah), PROBE (CMU): a few hundred nodes, a few dozen available
CloudLab replaces Emulab (May 2015)
Other partner clusters in companies and universities
Challenges in Cloud Adoption
From the user's point of view:
Data security/privacy (in a public cloud)
Performance concerns
From the provider's point of view:
High up-front hardware costs, rapid aging
Hardware capacity generally under-utilized
Low scalability of most enterprise applications
Negotiating SLA contracts and price structures
Cloud Management Challenge
High utilization brings higher ROI
Achievable with predictable/stationary workloads
Mission-critical applications need SLAs
Resource Utilization Paradox:
Good ROI requires high utilization (many papers on consolidation claim >90% utilization)
Yet there are consistent reports of 18% average utilization
Cloud management is more challenging than we initially hoped
Representative Cloud Workloads
Cloud workload: the amount of processing that a cloud has to do at a given time
Use workloads to test a particular type of application
Types of workloads: e-commerce, OLTP, forum/message board, Web 2.0 applications, MapReduce
Example 1: RUBiS Benchmark
E-commerce application (eBay-style auctions)
N-tier (3 or more tiers): web servers, application servers, database servers
26 web interactions, requiring sophisticated models, e.g., Layered Queuing Network Models
Typical Execution Environment
[Figure: client browsers connect over HTTP to Apache web servers, over AJP13 to Tomcat servlet engines, and over JDBC to MySQL DB servers; each tier runs in VMs (Dom0, VM1-VM3) on hardware under a Xen hypervisor, or on a host OS with a hypervisor, controlled through a virtual management interface.]
Meta-Model of RUBiS
Layered Queuing Network Model of RUBiS (3-tier): one model for each of the 26 interactions; 78 sub-models in total
Web Server Sub-Model
3-tier: simplest implementation of RUBiS
AboutMe (1 of the 26 interactions), customized for 3-tier
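For intuition only (a textbook building block, not the deck's actual sub-model): each entry in such a sub-model is essentially a queueing station. For an M/M/1 station with arrival rate λ and service rate μ, the mean response time is

\[ R \,=\, \frac{1}{\mu - \lambda} \,=\, \frac{S}{1 - U}, \qquad S = \frac{1}{\mu}, \quad U = \lambda S \]

Response time R explodes as utilization U approaches 1, and layering many such stations (web, app, DB) across 26 interactions is what makes the full LQN model so complex.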
Challenges in Modeling
Layered Queuing Network Models become very complex even for "simple" n-tier applications
Experiments are needed anyway: setting the values for the various sub-models requires detailed experiments over a variety of configurations
Let's try "pure" experiments
Example 2: RUBBoS Benchmark
An n-tier bulletin-board workload (modeled on Slashdot)
DB server bottleneck; C-JDBC as load balancer
24 web interactions
Configuration notation 1-2-1-9: 1 web server, 2 app servers, 1 C-JDBC server, 9 DB servers (parsed mechanically in the sketch below)
Run on Emulab (a relatively modest testbed)
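The notation is regular enough to parse automatically; a minimal sketch (hypothetical helper, not part of the RUBBoS distribution):

# Minimal sketch (hypothetical helper): expand "1-2-1-9" into
# per-tier node counts.
TIERS = ("web", "app", "cjdbc", "db")

def parse_config(notation: str) -> dict:
    counts = [int(part) for part in notation.split("-")]
    if len(counts) != len(TIERS):
        raise ValueError(f"expected {len(TIERS)} tiers, got {len(counts)}")
    return dict(zip(TIERS, counts))

print(parse_config("1-2-1-9"))  # {'web': 1, 'app': 2, 'cjdbc': 1, 'db': 9}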
Experiment Design
[Figure: design matrix. Topology: web server 1, app servers 1-3, C-JDBC 1, DB servers 1-9. Each configuration is run with two DB engines (MySQL, PostgreSQL), two workload mixes (Browse-only, Read-Write), two waiting policies (Wait-1st, Wait-All), and two DB server classes (low-end, normal).]
MySQL Throughput (Low-Cost)
Better scalability (different query processing strategies)
What’s Different about Clouds (1)
Traditional benchmarks:
Static configuration: HW, SW, workload range
Find the "best tuning" to achieve the highest throughput
Cloud benchmarks:
Dynamic and many configurations
Find representative throughput and response time for each configuration (reproducible results by other users)
MySQL Throughput for R/W Mix (read one, write all)
[Figure: throughput (ops/s, 0-600) vs. workload for configurations 1-1-1-8ML, 1-2-1-4ML, 1-2-1-5ML, 1-2-1-6ML, 1-2-1-7ML, 1-2-1-9ML, and 1-3-1-9ML.]
1-2-1-9ML Configuration Data
Clear bottleneck indicated by leveled performance (previous slide)
At all high workloads (more than 4,000)
Same leveling for other configurations
Average resource consumption on the DB servers is quite low (CPU and disk I/O)
[Figure: Web Server CPU Utilization (1-2-1-9ML)]
[Figure: Application Server CPU Utilization (one of 1-2-1-9ML)]
[Figure: C-JDBC Server CPU Utilization (1-2-1-9ML)]
[Figure: DB Server CPU Utilization (one of 1-2-1-9ML)]
[Figure: DB Server Disk I/O Bandwidth Utilization (one of 1-2-1-9ML)]
Observations on 1-2-1-9ML
No CPU bottlenecks anywhere
Disk I/O bandwidth on the DB servers peaks slightly at the high end of the workload spectrum
An infrequent disk I/O bottleneck, which cannot explain the observed lack of overall system performance
[Figure: Maximum Disk I/O Bandwidth Utilization (all of 1-2-1-9ML)]
What’s Different about Clouds (2)
Traditional benchmarks:
Balanced configuration: near-full utilization of all resources for a stable workload
Single bottleneck, high average utilization
Cloud benchmarks:
Almost always start with all resources at low utilization
No stable bottlenecks
Often all average utilizations stay low, yet performance remains low
Found a new phenomenon: multi-bottlenecks (see the sketch below)
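A minimal sketch of the idea behind multi-bottleneck detection (hypothetical data layout; the deck's actual analysis tooling is not shown): resources that saturate only in short bursts, and alternate with each other, look idle in long-run averages.

# Minimal sketch: flag resources that saturate briefly (in fine-grained
# windows) even though their long-run average utilization stays low.
from collections import defaultdict

def transient_saturation(samples: dict, threshold: float = 0.95) -> dict:
    """samples: resource name -> fine-grained utilization series (0..1)."""
    flagged = defaultdict(list)
    for resource, series in samples.items():
        average = sum(series) / len(series)
        bursts = [i for i, u in enumerate(series) if u >= threshold]
        if bursts and average < 0.5:  # saturates in bursts, low on average
            flagged[resource] = bursts
    return dict(flagged)

# Example: app CPU and DB disk alternate short saturation bursts.
samples = {
    "app_cpu": [0.2, 0.98, 0.1, 0.2, 0.97, 0.1],
    "db_disk": [0.97, 0.2, 0.1, 0.96, 0.2, 0.1],
}
print(transient_saturation(samples))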
Why Automation?
Traditional benchmarks such as TPC and SPEC answer the question: for a given hardware/software configuration and workload, what is the highest achievable throughput?
In the cloud this becomes very difficult due to several dimensions:
Horizontal scalability
Vertical scalability
Variety of software components
Solution: Expertus
A framework for large-scale benchmark measurements through flexible automation of experiments
Creates the scripts through a multi-stage code generation process
Makes it easy to plug in new benchmarks and clouds
Enables cloud measurements at a scale beyond the manual management of benchmarks
Experiment Summary
Over 500 different hardware configurations (i.e., varying node number and type)
Over 10,000 different software configurations (i.e., varying software and software settings)
Over 100,000 computing nodes in various cloud environments: Emulab, Amazon EC2, Open Cirrus
Experimental Challenges
Many configuration variables
Many applications, with clear differences among them
Many cloud offerings, with non-obvious differences
Different software/hardware configurations may produce the same or different results
Experimental setup challenges:
Dependencies among components
Systematic search through the potentially large configuration space (see the sketch below)
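A minimal sketch of the systematic-search step (illustrative tier ranges and dimensions, not the exact experiment plan):

# Minimal sketch: enumerate an n-tier configuration space, one
# experiment per point in the cross-product of the dimensions.
import itertools

web, app, cjdbc, db = [1], [1, 2, 3], [1], range(4, 10)
engines = ["MySQL", "PostgreSQL"]
mixes = ["browse-only", "read-write"]

configs = [
    (f"{w}-{a}-{c}-{d}", engine, mix)
    for w, a, c, d, engine, mix in itertools.product(web, app, cjdbc, db, engines, mixes)
]
print(len(configs), "experiments; first:", configs[0])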
Elba: Automating Measurements
Analyzed result (resource utilization, %):

Config  DB Server CPU  DB Server Memory  App Server CPU  App Server Memory
L/L     99.8           97.9              11.2            78.3
H/L     66.8           98.7              22.3            98.3
H/H     65.3           98.2              5.3             87.2
2H/L    36.4/46.6      97.3/98.2         21.3            98.2
2H/H    46.2/36.2      98.2/97.2         15.2            79.5
Automated, Evolutionary Staging Cycle
[Figure: the staging cycle. (0) Configuration design from benchmark specs written in an experiment specification language; (1) code generation (Mulini/TBL) and deployment of deployment scripts, monitors, the application, and the workload driver; (2) execution of the workload drivers against the system under test, with monitors attached; (3) evaluation/analysis of the monitoring data by the analyzer; (4) reconfiguration through automated adaptation, weighing adaptation cost, closing the loop.]
Automated experiment management through extensible, flexible, and modular code generation
Extensibility: extending the framework to support specification changes, new benchmarks, computing clouds, and software packages
Flexibility: modifying the input or output configuration without changing the source code of the framework
Modularity: a number of components that may be mixed and matched in a variety of configurations
Benefits of Automation
Abstraction mapping: external forces often drive changes (standards formulation/adoption, industry evolution); internal forces drive changes too (goals, functionality refinement)
Interoperable heterogeneity: heterogeneous clouds and applications
Flexible customization: experiment goals, API changes
Code Generation – Key Challenges (the Expertus Approach)
The code generator adopts a compiler approach of multiple serial transformation stages:
One type of transformation at any given stage (e.g., cloud, operating system, application)
The number of stages is determined by the experiment, application, software stack, operating system, and cloud
At each stage, Expertus uses the intermediate XML document created by the previous stage as the input to the current stage (see the sketch below)
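A minimal sketch of the multi-stage idea (hypothetical XML schema and stage names; not Expertus source code): each stage reads the previous stage's intermediate XML and resolves only its own kind of placeholder.

# Minimal sketch: serial transformation stages over an intermediate
# XML document, one kind of transformation per stage.
import xml.etree.ElementTree as ET

def run_stage(doc: ET.Element, kind: str, values: dict) -> ET.Element:
    """Fill in this stage's placeholders, leave later stages' untouched."""
    for node in doc.iter(kind):
        node.text = values[node.get("name")]
    return doc

spec = ET.fromstring(
    "<experiment>"
    "<cloud name='provider'/><os name='image'/><app name='benchmark'/>"
    "</experiment>")

# The number and order of stages depend on the experiment and the cloud.
pipeline = [("cloud", {"provider": "emulab"}),
            ("os", {"image": "fedora"}),
            ("app", {"benchmark": "rubbos"})]

doc = spec
for kind, values in pipeline:
    doc = run_stage(doc, kind, values)
print(ET.tostring(doc, encoding="unicode"))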
Experiment Automation Process
1. Create the experiment specification with the application, software packages, cloud, and experiments.
2. Use Expertus to generate the scripts.
3. Platform configuration sets up the target cloud.
4. Application deployment deploys the target application on the configured cloud.
5. Configure the application correctly.
6. The main script runs the test plan, which in fact consists of multiple iterations.
7. Upload the resource monitoring and performance data to the data warehouse (see the driver sketch below).
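A minimal sketch of the overall driver (hypothetical names and stub actions; in practice Expertus generates these scripts rather than a user hand-writing them):

# Minimal sketch of the automation loop above (hypothetical stubs).
from dataclasses import dataclass, field

@dataclass
class ExperimentSpec:
    cloud: str
    app: str
    workloads: list = field(default_factory=list)

def run_experiment(spec: ExperimentSpec) -> list:
    print(f"generating scripts for {spec.app} on {spec.cloud}")  # step 2
    print(f"setting up platform: {spec.cloud}")                  # step 3
    print(f"deploying and configuring: {spec.app}")              # steps 4-5
    results = [{"workload": w, "status": "done"}                 # step 6
               for w in spec.workloads]
    print(f"uploading {len(results)} result sets to the data warehouse")  # step 7
    return results

run_experiment(ExperimentSpec("emulab", "rubbos", workloads=[1000, 4000, 7000]))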
Evaluation Metrics
Usability of the tool: how quickly a user can change an existing specification to run the same experiment with different settings
Generated script types and magnitude: depends on the application, software packages, deployment platform, and number of experiments
Richness of the tool: magnitude of completed experiments; number of different software packages, clouds, and applications it supports
Extensibility and flexibility: supporting new clouds; supporting new applications
Significant strides towards realizing flexible and scalable application testing for today's complex cloud environments:
Over 500 different hardware configurations
Over 10,000 software configurations
Five clouds (Emulab, EC2, Open Cirrus, the Georgia Tech cluster, and Wipro)
Three representative applications (RUBBoS, RUBiS, and CloudStone)
Usability, Flexibility, and Extensibility
New clouds, applications, and software packages are supported with only a few template line changes:
An 8.21% change in template lines added Amazon EC2 support once we had support for the Emulab cloud, and caused a 25.35% change in the generated code for an application scenario with 18 nodes
Switching from RUBBoS to RUBiS required only a 5.66% template change
Example: RUBBoS Benchmark
Bulletin-board application (modeled on Slashdot)
N-tier (3 or more tiers): web servers, application servers, database servers
24 web interactions, requiring sophisticated models, e.g., Layered Queuing Network Models
Cloud Evaluation – Overview
Main idea: how and where to deploy your enterprise system, and in what scenario?
Automated empirical measurement and evaluation of alternative platforms, configurations, and architectures for n-tier apps in the cloud
Hardware platforms (IaaS): Amazon EC2, Open Cirrus (HP), and Emulab
System software configurations: LAMP, MySQL Cluster (off-the-shelf RDBMS)
Application software: the RUBBoS application benchmark
Fast-Forward A Few Years
Using the automated experiment generation infrastructure, we ran many thousands of experiments
We found several interesting phenomena
The best: Very Short Bottlenecks (VSB) that cause Very Long Response Time (VLRT) requests
Latency Long Tail Problem
At moderate CPU utilization levels (about 60% at 9,000 users), 4% of requests take several seconds, instead of milliseconds
Latency Long Tail: A Serious Research Challenge
No system resource is near saturation
Very Long Response Time (VLRT) requests start to appear at moderate utilization levels (often at 50% or lower)
VLRT requests themselves are not bugs: they take only milliseconds when run by themselves
Each run presents different VLRT requests
VLRT requests appear and disappear too quickly for most monitoring tools (see the sketch below)
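A minimal sketch (hypothetical request-log format) of why monitoring granularity matters: counting requests above a latency threshold in roughly 50 ms windows can expose very short bottlenecks that 1-minute averages hide.

# Minimal sketch: bucket VLRT requests into fine-grained time windows.
from collections import Counter

VLRT_THRESHOLD = 1.0   # seconds; "several seconds, instead of milliseconds"
WINDOW = 0.05          # 50 ms analysis windows

def vlrt_windows(requests):
    """requests: iterable of (start_time_sec, response_time_sec)."""
    hits = Counter()
    for start, response_time in requests:
        if response_time >= VLRT_THRESHOLD:
            hits[int(start / WINDOW)] += 1
    return hits

requests = [(0.01, 0.004), (0.02, 2.1), (0.03, 1.8), (1.50, 0.003)]
for w, n in sorted(vlrt_windows(requests).items()):
    print(f"window at {w * WINDOW:.2f}s: {n} VLRT request(s)")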
Big Data & Clouds Need Automation
Experimental approaches: often the only choice (modeling too complex)
Abundant resource availability
Many configurations mean many experiments and measurements
Automated experiment generation, execution, monitoring, and analysis
Very interesting phenomena found (VSB)