Analyzing Performance in the Cloud: solving an elastic problem with a scientific approach

Nicholas Wakou (Dell EMC), Alex Krzos (Red Hat)
Thursday, October 27, 2016
Barcelona OpenStack Summit 2016

Agenda

➢ CLOUD DEFINITION & CHARACTERISTICS

➢ PERFORMANCE MEASURING TOOLS

➢ SPEC CLOUD IaaS 2016 BENCHMARK

➢ PERFORMANCE MONITORING TOOLS

➢ PERFORMANCE CHARACTERIZATION

➢ TUNING TIPS

CLOUD DEFINITION & CHARACTERISTICS

DEFINING A CLOUD

NIST SPECIAL PUBLICATION 800-145

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

CLOUD CHARACTERISTICS

PERFORMANCE MEASURING TOOLS

RALLY

OpenStack Benchmarking Tool

➢ as-an-App and as-a-Service
➢ Verification
➢ Benchmarking
➢ Profiling
➢ Reports
➢ SLAs for Benchmarks
➢ Many plugins

Source: What is Rally?, https://rally.readthedocs.io/en/latest/

PERFKIT BENCHMARKER

Source: Introduction to Perfkit Benchmark and How to Extend it, https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/wiki/Tech-Talks

Open Source Living Benchmarking Framework containing a set of Benchmarks used to compare cloud offerings/environments

➢ 10+ Cloud Providers/Environments
➢ 34+ Benchmarks
➢ Large Community Involvement
➢ Capture Cloud Elasticity with Benchmark Results
➢ Use Cloud/Environment CLI Tooling
➢ Publish Results to BigQuery for Comparison

PERFKIT EXPLORER

Source: https://github.com/GoogleCloudPlatform/PerfKitExplorer

Dashboarding and Performance Analysis Tool for PerfKitBenchmarker Results

➢ Multiple Chart Options
➢ Uses BigQuery as backend data-store
➢ Hosted in Google App Engine

CLOUDBENCH

➢ Framework that automates cloud-scale evaluation and benchmarking
➢ Benchmark Harness
  ▪ Requests the Cloud Manager to create one or more instances
  ▪ Submits the configuration plan and the steps for how the test will be performed to the Cloud Manager
  ▪ At the end of the test, collects and logs applicable performance data and logs
  ▪ Destroys instances no longer needed for the test

HARNESS AND WORKLOAD CONTROL

Benchmark Harness Cloud SUT

Group of boxes represents an application instance

Benchmark Harness: comprises CloudBench (CBTOOL), the baseline/elasticity drivers, and report generators.

For white-box clouds the benchmark harness is outside the SUT. For black-box clouds, it can be in the same location or campus.

BROWBEAT

Orchestration tool for existing OpenStack Workloads

➢ Combines Workloads, Metrics, and Results into a single tool
➢ Runs Performance Workloads:
  ➢ Rally - Control Plane
  ➢ Rally Plugins & Rally+pBench Plugins - Control+Data Plane
  ➢ Shaker - Network Data Plane
  ➢ PerfKitBenchmarker - Data Plane + Cloud Elasticity
➢ Provides Performance Infrastructure Installation and Configuration for
  ➢ Carbon/Graphite/Grafana
  ➢ Collectd
  ➢ ELK
  ➢ FluentD
➢ Provides dashboards for Visualizing and Comparing Results and System Performance Metrics

BROWBEAT - RESULTS

BROWBEAT - Metrics

SPEC CLOUD IAAS 2016 BENCHMARK

SPEC CLOUD IAAS 2016 BENCHMARK

➢ Measures performance of Infrastructure-as-a-Service (IaaS) Clouds
➢ Measures both control and data plane
  ▪ Control: management operations, e.g., Instance provisioning time
  ▪ Data: virtualization, network performance, runtime performance
➢ Uses workloads that
  ➢ resemble “real” customer applications
  ➢ benchmark the cloud, not the application
➢ Produces metrics (“elasticity”, “scalability”, “provisioning time”) which allow comparison

SPEC Cloud IaaS Benchmarking: Dell Leads the Way http://en.community.dell.com/techcenter/cloud/b/dell-cloud-blog/archive/2016/06/24/spec-cloud-iaas-benchmarking-dell-leads-the-way

Scalability and Elasticity Analogy: Climbing a mountain

[Figure: mountain diagram. Scalability: conquering an infinitely high mountain. Elasticity: time for each step.]

IDEAL

Scalability
• Mountain: Keep on climbing
• Cloud: keep on adding load without errors

Elasticity
• Mountain: Each step takes identical time
• Cloud: performance within limits as load increases

WHAT IS MEASURED?

➢ Measures the number of application instances (AIs) that can be loaded onto a Cluster before SLA violations occur
➢ Measures the scalability and elasticity of the Cloud under Test (CuT)
➢ Not a measure of Instance density
➢ SPEC Cloud workloads can individually be used to stress the CuT:
  ▪ KMeans: CPU/Memory
  ▪ YCSB: IO

SPEC CLOUD BENCHMARK PHASES

Baseline Phase

▪ Determine the results for a single application instance of a workload
▪ AI = stream of 5 runs

KMeans baseline AI

YCSB baseline AI

Elasticity Phase

Determine cloud elasticity and scalability results when multiple workloads are run

BENCHMARK STOPPING CONDITIONS

➢ 20% AIs fail to provision
➢ 10% AIs have errors in any run
➢ Max number of AIs set by Cloud Provider
➢ 50% AIs have QoS violations
  ▪ KMeans completion time ≤ 3.33 x Baseline phase
  ▪ YCSB Throughput ≥ Baseline throughput / 3
  ▪ YCSB Read Response Time ≤ 20 x Baseline Read Response Time
  ▪ YCSB Insert Response Time ≤ 20 x Baseline Insert Response Time
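
The stopping conditions and QoS limits above can be sketched as a small check. This is a minimal illustration, not the benchmark's own code: the `Baseline` and `AIResult` types, all field names, and the returned reason strings are hypothetical, and treating each percentage as an inclusive threshold is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Baseline:
    """Baseline-phase results (hypothetical field names)."""
    kmeans_time_s: float
    ycsb_throughput: float
    ycsb_read_rt_ms: float
    ycsb_insert_rt_ms: float


@dataclass
class AIResult:
    """Outcome of one application instance (AI) in the elasticity phase."""
    workload: str  # "kmeans" or "ycsb"
    provisioned: bool
    had_run_error: bool = False
    kmeans_time_s: float = 0.0
    ycsb_throughput: float = 0.0
    ycsb_read_rt_ms: float = 0.0
    ycsb_insert_rt_ms: float = 0.0


def violates_qos(ai: AIResult, base: Baseline) -> bool:
    """Apply the per-workload QoS limits listed on the slide."""
    if ai.workload == "kmeans":
        return ai.kmeans_time_s > 3.33 * base.kmeans_time_s
    return (
        ai.ycsb_throughput < base.ycsb_throughput / 3
        or ai.ycsb_read_rt_ms > 20 * base.ycsb_read_rt_ms
        or ai.ycsb_insert_rt_ms > 20 * base.ycsb_insert_rt_ms
    )


def stopping_reason(ais: List[AIResult], base: Baseline, max_ais: int) -> Optional[str]:
    """Return the first stopping condition that is met, or None to keep going."""
    total = len(ais)
    if total == 0:
        return None
    failed = sum(1 for ai in ais if not ai.provisioned)
    errors = sum(1 for ai in ais if ai.provisioned and ai.had_run_error)
    qos = sum(1 for ai in ais if ai.provisioned and violates_qos(ai, base))
    # Inclusive thresholds (>=) are an assumption; the slide only gives percentages.
    if failed / total >= 0.20:
        return "provisioning failures"
    if errors / total >= 0.10:
        return "run errors"
    if qos / total >= 0.50:
        return "qos violations"
    if total >= max_ais:
        return "max AIs reached"
    return None
```

A harness loop would call `stopping_reason` after each new AI is added and stop as soon as it returns a reason.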

HIGH LEVEL REPORT SUMMARY

PUBLISHED RESULTS WEBSITE

https://www.spec.org/cloud_iaas2016/results/cloudiaas2016.html

PERFORMANCE MONITORING TOOLS

CEILOMETER

Source: http://docs.openstack.org/developer/ceilometer/architecture.html

Another familiar OpenStack project

➢ https://wiki.openstack.org/wiki/Telemetry

➢ Goal is to efficiently collect, normalize and transform data produced by OpenStack services

➢ Interacts directly with the OpenStack services through defined interfaces

➢ Applications can leverage Ceilometer to gather OpenStack performance data

COLLECTD/GRAPHITE/GRAFANA

➢ Collectd
  ➢ Daemon to collect System Performance Statistics
  ➢ Plugins for CPU, Memory, Disk, Network, Process, …
➢ Graphite/Carbon
  ➢ Carbon receives metrics, and flushes them to whisper database files
  ➢ Graphite is the webapp frontend to Carbon
➢ Grafana
  ➢ Visualize metrics from multiple backends.

GANGLIA

Ganglia is a scalable, distributed monitoring system for high-performance computing systems such as Server Nodes, Clusters and Grids.

- Relatively easy to set up
- Tracks a lot of hardware-centric metrics
- Low operational burden

PERFORMANCE CHARACTERIZATION

PROVISIONING TIME: SPEC CLOUD

➢ The time needed to bring up a new instance, or add more resources (like CPU or storage) to an existing instance
  ➢ Instance: Time FROM request to create a new instance TO time when the instance responds to a netcat probe on port 22.
  ➢ Application instance: Time FROM request to create a new instance TO time when the AI reports readiness to accept client requests.
➢ Provisioning Time Characterization using Baseline phase
  ➢ Increase number of VMs (vary YCSB seeds and/or KMeans Hadoop slaves) and note impact on provisioning time.
  ➢ Vary instance configuration (flavor)
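
The instance provisioning time defined above (create request until port 22 answers) can be approximated with a short polling loop. This is a sketch under stated assumptions: a plain TCP connect stands in for the netcat probe, the function name and its parameters are hypothetical, and the caller is expected to record `started_at` with `time.monotonic()` at the moment the create request is issued.

```python
import socket
import time


def seconds_until_port_open(host, port=22, started_at=None,
                            timeout_s=600.0, interval_s=1.0):
    """Poll host:port until a TCP connect succeeds and return the elapsed seconds.

    started_at is a time.monotonic() timestamp taken when the instance
    create request was sent; if omitted, timing starts at this call.
    """
    start = time.monotonic() if started_at is None else started_at
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable within {timeout_s}s")
            time.sleep(interval_s)
```

Repeating the measurement while varying the number of VMs or the flavor gives the characterization data points described above.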

IO LIMITS

PCI-E Limits
For PCI-E Gen-3 capable slots. http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/

SAS Limit
An LSI whitepaper, Switched SAS: Sharable, Scalable SAS Infrastructure http://www.abacus.cz/prilohy/_5025/5025548/SAS_Switch_White%20Paper_US-EN_092210.pdf

NETWORK/IO CHARACTERIZATION

➢ Understand network utilization under load
  ➢ Management networks
  ➢ Data networks (Neutron tenant)
➢ Monitor with Ganglia, collectd, Linux tools (vmstat, iostat etc)
➢ SPEC Cloud YCSB Baseline tests: Throughput (ops/s)
  ➢ Vary number of Seeds
  ➢ Increase number of YCSB records and operations
  ➢ Increase number of YCSB threads
➢ CloudBench fio
➢ CloudBench Netperf

CPU CHARACTERIZATION

➢ Use SPEC Cloud Baseline tests for CPU Characterization
  ➢ Vary number of Hadoop slaves
  ➢ Increase sample size, number of dimensions, number of clusters
➢ Understand CPU utilization under load
➢ Monitor with Ganglia, collectd, Grafana
➢ Linux tools (top, vmstat), SPEC Cloud, KMeans

Note:
✓ CPU user time
✓ CPU system time
✓ CPU iowait time
✓ CPU irq time
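
The four CPU times noted above can be read directly from the aggregate `cpu` line of `/proc/stat` on Linux. The following is a minimal sketch, not part of any tool named in this deck; the function names are hypothetical, and it assumes the usual field order of that line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Map the leading counters of a 'cpu ...' line to their field names."""
    values = [int(v) for v in line.split()[1:1 + len(CPU_FIELDS)]]
    return dict(zip(CPU_FIELDS, values))


def cpu_percentages(before, after):
    """Share of each CPU state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

Sampling `parse_cpu_line(read_cpu_line())` before and after a benchmark run and passing both samples to `cpu_percentages` shows how the load splits across user, system, iowait and irq time.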

SCALABILITY/ELASTICITY

➢ Understand Scalability/Elasticity of the CuT
➢ SPEC Cloud Elasticity phase
➢ Vary number of AIs
➢ Monitor with FDR html report

TUNING TIPS

HARDWARE/OS TUNING

➢ Latest BIOS and Firmware revs
➢ Appropriate BIOS settings
➢ RAID/JBOD
➢ Disk controller
➢ NIC driver: Interrupt coalescing and affinitization
➢ NIC bonding
➢ NIC jumbo frames
➢ OS configuration settings

CLOUD TUNING

▪ HW/OS Tuning
▪ Cloud Configs/Settings
▪ Workload tuning

INSTANCE CONFIGURATION

Performance is impacted by
▪ Instance type (flavor)
▪ Number of Instances

OVER-SUBSCRIPTION

Beware of over-subscription !!!

LOCAL STORAGE

Use of local storage instead of shared storage (like Ceph) could improve performance by over 50% ... depending on Ceph replication.

Source: OpenStack: Install and configure a storage node - OpenStack Kilo. http://docs.OpenStack.org/kilo/install-guide/install/yum/content/cinder-install-storage-node.html (2015)

NUMA NODES

Pinning instance CPU to physical CPUs (NUMA nodes) on local storage further improves performance.

Source: Red Hat: CPU pinning and NUMA topology awareness in OpenStack Compute. http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-OpenStack-compute/ (2015)

DISK PINNING

Disk Pinning shows a 15% performance improvement

Source: OpenStack: OpenStack cinder multibackend. https://wiki.OpenStack.org/wiki/Cinder-multi-backend (2015)

WORKER COUNT CONFIGURATION

Allow Services to use available resources with higher concurrency

Examples:
▪ Keystone Process Count
▪ Neutron Workers
▪ Glance Workers
▪ Gnocchi API Workers

UNEVEN CONTROLLER USAGE

One controller had more cores available than the other two and ended up with all the jobs. This scenario was identified easily because the correct dashboarding was in place.

HEAT MEMORY USAGE

About 1GB of memory used by Heat for every 10 compute nodes deployed. Size your controller memory appropriately.
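
As a rough sizing aid, the rule of thumb above can be written as simple arithmetic. The function name is hypothetical, and scaling linearly between multiples of ten nodes is an assumption; the slide only states the approximate ratio.

```python
def heat_memory_gb(compute_nodes, gb_per_10_nodes=1.0):
    """Estimate Heat memory use from the ~1 GB per 10 compute nodes rule of thumb."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    return compute_nodes / 10 * gb_per_10_nodes
```

For example, `heat_memory_gb(50)` estimates about 5 GB for Heat alone, to be added to whatever the other controller services need.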

DEPLOYMENT TIMINGS

OSPD 9 Overcloud Deployment

CONCLUSION

CONCLUSION

➢ Define what you are trying to measure
  ▪ Define a cloud
  ▪ Define what metrics are important
➢ Use the correct tools
  ▪ Rally
  ▪ PerfKitBenchmarker
  ▪ CloudBench
  ▪ SPEC Cloud IaaS 2016 Benchmark
  ▪ Ceilometer
  ▪ Collectd/Graphite/Grafana
  ▪ Ganglia
  ▪ Browbeat
➢ Gather and analyze data
  ▪ Apply tuning tips based on the data

PARTICIPATE!

ADDITIONAL INFORMATION

➢ Guidelines and Considerations for Performance and Scaling your Red Hat Enterprise Linux OpenStack Platform 6 Cloud
  ▪ https://access.redhat.com/articles/1507893
➢ Guidelines and Considerations for Performance and Scaling your Red Hat Enterprise Linux OpenStack Platform 7 Cloud
  ▪ https://access.redhat.com/articles/2165131
➢ Red Hat OpenStack Blog
  ▪ http://redhatstackblog.redhat.com/
➢ Red Hat Developer Blog
  ▪ http://developerblog.redhat.com/
➢ Red Hat Enterprise Linux Blog
  ▪ http://rhelblog.redhat.com/

Rally

Source: https://github.com/OpenStack/rally/blob/master/doc/source/images/Rally-Actions.png

Rally

Rally is a familiar OpenStack project

▪ https://github.com/OpenStack/rally

▪ An automated benchmark tool for OpenStack

Benchmarking

▪ Multiple use cases

• Development and QA
• DevOps
• CI/CD

PERFKIT BENCHMARKER

Source: Introduction to Perfkit Benchmark and How to Extend it, https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/wiki/Tech-Talks

BROWBEAT

➢ Scale and Performance automation
  ➢ Ansible playbooks for automation
➢ Provides automation wrapper around existing tooling
  ➢ Rally - Control plane tests
  ➢ Shaker - Data plane network tests
  ➢ Perfkit - Data plane tests
➢ Leverages existing upstream test frameworks rather than replacing them
➢ Performance Monitoring
  ➢ Collectd/Graphite/Grafana
➢ Results Capture/Storage/Analytics
  ➢ ELK stack
➢ Allows for results comparison

BROWBEAT

BROWBEAT - RESULTS

COLLECTD/GRAPHITE/GRAFANA

Example Grafana dashboards

UNEVEN CONTROLLER USAGE

One controller had more cores available than the other two and ended up with all the jobs. This scenario was identified easily because the correct dashboarding was in place.

HEAT MEMORY USAGE

About 1GB of memory used by Heat for every 10 compute nodes deployed. Size your controller memory appropriately.

DEPLOYMENT TIMINGS

Saw many instance reschedules with default scheduler. Deployment time dropped dramatically by setting up assignments via ironic.

DEFINING A CLOUD

Cloud = Private
Cloud = OpenStack
Cloud = Rain
Cloud = Cumulus
Cloud = Public
Cloud = Funnel
Cloud = OpenShift
Cloud = Community
Cloud = Cirrus

Ten different people will probably give you ten different answers

DEFINING A CLOUD

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

NIST SPECIAL PUBLICATION 800-145

Private cloud

The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Public cloud

The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Hybrid cloud

The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

DEFINING A CLOUD

SPEC OSG Cloud Subcommittee Glossary

Blackbox Cloud

A cloud-provider provides a general specification of the SUT, usually in terms of how the cloud consumer may be billed. The exact hardware details corresponding to these compute units may not be known. This will typically be the case if the entity benchmarking the cloud is different from a cloud provider.

Whitebox Cloud

The SUT’s exact engineering specifications including all hardware and software are known and under the control of the tester. This will typically be the case for private clouds.

Source: https://www.spec.org/cloud_iaas2016/docs/faq/html/glossary.html

DEFINING A CLOUD

➢ The focus of this presentation will be predominantly on white box private cloud environments

➢ Primary example is OpenStack

➢ Many of the tools and methodologies are usable in the other cloud environments as well

CLOUD CHARACTERISTICS

CLOUD CHARACTERISTICS

SPEC RESEARCH GROUP - CLOUD WORKING GROUP
https://research.spec.org/working-groups/rg-cloud-working-group.html

READY FOR RAIN? A VIEW FROM SPEC RESEARCH ON THE FUTURE OF CLOUD METRICS
https://research.spec.org/fileadmin/user_upload/documents/rg_cloud/endorsed_publications/SPEC-RG-2016-01_CloudMetrics.pdf

ELASTICITY

THE DEGREE TO WHICH A SYSTEM IS ABLE TO ADAPT TO WORKLOAD CHANGES BY PROVISIONING AND DE-PROVISIONING RESOURCES IN AN AUTONOMIC MANNER, SUCH THAT AT EACH POINT IN TIME THE AVAILABLE RESOURCES MATCH THE CURRENT DEMAND AS CLOSELY AS POSSIBLE

Source: READY FOR RAIN? A VIEW FROM SPEC RESEARCH ON THE FUTURE OF CLOUD METRICS, SPEC RG Cloud Working Group

ELASTICITY

ELASTICITY

Source: http://www.today.com/news/remember-stretch-armstrong-how-buy-your-favorite-retro-toys-your-1D80377927

HOW FAR WILL HE STRETCH?

WILL HE BREAK WHEN STRETCHED?

AS YOU STRETCH HIM DOES IT GET HARDER TO STRETCH HIM MORE?

WHEN I LET GO DOES HE RETURN TO HIS ORIGINAL SHAPE?

HOW LONG DOES HE TAKE TO RETURN TO HIS NORMAL SHAPE?

RESULTS COMPARED

# Submissions: 2

Submission ID            Dell_12g                     Dell_13g                     Comment
Cloud Type               Private / White box          Private / White box
Hardware Platform        12g, 7xR720 Compute nodes    13g, 9xR630 Compute nodes
Job Date                 03/05/2016                   06/08/2016

Metrics
Scalability@AIs          10.3@10                      29.5@20                      Higher is better
Scalability per AI       1.03                         1.45                         Higher is better
Elasticity               63.0%                        71.9%                        Higher is better
Inst. Prov. Time (s)     163                          135                          Lower is better
AI Prov. Success         100%                         86.96%                       Higher is better
AI Run Success           100%                         100%                         Higher is better
Total Instances          65                           131                          Higher is better

Baseline Phase
YCSB Throughput          13,082.6                     17,742.0                     Higher is better
KMeans Job time (s)      115.7                        109.7                        Lower is better

Elasticity Phase
YCSB Throughput          9,480.9                      14,890.8                     Higher is better
KMeans Job time (s)      211.5                        186.2                        Lower is better

PERFKIT BENCHMARKER

Source: Introduction to Perfkit Benchmark and How to Extend it, https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/wiki/Tech-Talks

BENCHMARK HARNESS

SUPPORTED WORKLOADS

BROWBEAT

BROWBEAT

REPEATABLE AUTOMATED TESTING

PROVISIONING TIME: RALLY

Automated VM provisioning
Nova Success rate

Source: Measuring the Cloud Using Rally & CloudBench, Douglas Shakshober, Red Hat Inc.

IO LIMITS

PCI-E Limits
For PCI-E Gen-3 capable slots. (http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/)

▪ Gen-3 is defined at 8 GT/s per lane; with scrambling and 128b/130b encoding (instead of 8b/10b encoding) this gives a bandwidth of close to 8.0 Gb/s per lane, so for example a PCI-E Gen-3 x8 link delivers an aggregate bandwidth of about 8 GB/s

SAS Limit
An LSI whitepaper, Switched SAS: Sharable, Scalable SAS Infrastructure (http://www.abacus.cz/prilohy/_5025/5025548/SAS_Switch_White%20Paper_US-EN_092210.pdf) shows how to calculate the SAS limit of an 8 lane controller port with a SAS bandwidth of 6Gbps:

▪ 6Gb/s x 8 lanes = 48Gb/s per x8 port
▪ 48Gb/s (8b/10b encoding) = 4.8GB/sec per port (per node)
▪ 4.8GB/s per port x 88.33% (arbitration delays and additional framing) = 4320MB/s per port
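
The PCI-E and SAS limit arithmetic can be reproduced in a few lines. This is a sketch for checking the figures, with hypothetical function names; it assumes 1 GB = 1000 MB and counts one direction of the link only. With the stated 88.33% efficiency the SAS product comes to roughly 4240 MB/s per port, while the quoted 4320 MB/s corresponds to an efficiency of 90%.

```python
def pcie_gen3_gbytes_per_s(lanes):
    """Usable one-way PCI-E Gen-3 bandwidth in GB/s for a link of the given width."""
    transfers_per_s = 8.0          # 8 GT/s per lane
    encoding = 128 / 130           # 128b/130b line encoding overhead
    return transfers_per_s * encoding * lanes / 8  # 8 bits per byte


def sas_port_mbytes_per_s(lane_gbps=6.0, lanes=8, efficiency=0.8833):
    """Usable SAS bandwidth in MB/s for one wide port."""
    raw_gbps = lane_gbps * lanes   # 6 Gb/s x 8 lanes = 48 Gb/s
    gbytes_per_s = raw_gbps / 10   # 8b/10b: 10 line bits carry one byte
    return gbytes_per_s * 1000 * efficiency
```

Calling `pcie_gen3_gbytes_per_s(8)` gives just under 8 GB/s for an x8 slot, which is the ceiling to compare against the SAS controller figure when looking for the IO bottleneck.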

CEILOMETER: High-level Architecture

Source: http://docs.OpenStack.org/developer/ceilometer/architecture.html

CEILOMETER

Another familiar OpenStack project

➢ https://wiki.openstack.org/wiki/Telemetry

➢ Goal is to efficiently collect, normalize and transform data produced by OpenStack services

➢ Interacts directly with the OpenStack services through defined interfaces

➢ Many tools utilize Ceilometer to gather OpenStack performance data

COLLECTD/GRAPHITE/GRAFANA

➢ Collectd
  ➢ Daemon to collect System Performance Statistics
  ➢ CPU, Memory, Disk, Network, Process, MariaDB, Load, Logged errors and more
➢ Graphite/Carbon
  ➢ Carbon receives metrics, and flushes them to whisper database files
  ➢ Graphite is the webapp frontend to Carbon
➢ Grafana
  ➢ Visualize metrics from multiple backends.

SPEC CLOUD WORKLOADS

YCSB

Framework used by a common set of workloads for evaluating performance of different key-value and cloud serving stores.

KMeans

-Hadoop-based CPU intensive workload

-Chose Intel HiBench implementation