Technical Report

OnCommand Performance Manager Best Practices
OnCommand Performance Manager Version 2.0
Bob Allegretti, NetApp Technical Marketing Performance Management
August 2015 | TR-4448

Abstract

This document describes some best practices when using NetApp® OnCommand® Performance Manager for managing NetApp clustered Data ONTAP® systems.

2 TR-4448: OnCommand Performance Manager Best Practices © 2015 NetApp, Inc. All rights reserved

Version History

Version Date Document Version History

Version 1.0 August 2015 Initial version, Bob Allegretti

TABLE OF CONTENTS

Version History
1 NetApp OnCommand Performance Manager Overview
1.1 What Is OnCommand Performance Manager?
1.2 Benefits
1.3 What Performance Manager Does
2 Installation and Setup
3 Architectural Elements
3.1 Dashboard
3.2 Performance Visualization
3.3 Policies and Events
4 Common Use Cases
4.1 Common Workflows
4.2 Managing Clusters
4.3 Managing Objects
4.4 Correlating Events and Metrics
4.5 Discovering Most Active Volumes Globally
4.6 View the Effects of Caching
5 OnCommand Performance Manager Key Takeaways
6 Additional Resources
7 Contact Us

LIST OF FIGURES

Figure 1) NetApp OnCommand portfolio.
Figure 2) OnCommand Unified Manager dashboard with integrated performance.
Figure 3) Performance Manager data sources administration.
Figure 4) Performance Manager maintenance console.
Figure 5) Performance Manager dashboard.
Figure 6) Actively alerting cluster on dashboard
Figure 7) Performance Manager node object landing page summary.
Figure 8) Performance Explorer comparing volume and aggregate metrics.
Figure 9) Threshold policy creation.
Figure 10) Event correlation with metrics chart.
Figure 11) User-defined event details.
Figure 12) Cluster landing page (part 1).
Figure 13) Cluster landing page "hot objects" (part 2).
Figure 14) Volume landing page.
Figure 15) Metrics chart with multiple volumes.
Figure 16) Node landing page.
Figure 17) Metrics chart comparing node utilization within a cluster.
Figure 18) Performance Explorer chart (metrics) selection.
Figure 19) Metrics chart selection options.
Figure 20) Performance Explorer summary chart correlating event with metric.
Figure 21) "Cherry point" access to event details.
Figure 22) Accessing the storage object inventory.
Figure 23) Volume inventory page.
Figure 24) Workload cache efficiency visualization.

LIST OF TABLES

Table 1) Performance Manager managed objects and reported metrics.

1 NetApp OnCommand Performance Manager Overview

When running a data center and operating storage systems, various questions come to mind:

What is the performance status of my storage systems?

What needs my attention now?

How is my storage system performing in detail?

Are there trends that might cause future issues?

Am I meeting all critical service-level objectives and agreements?

What is abnormal and why?

Storage performance management is a critical part of data center infrastructure management. The

technical challenges associated with this management can be significant, often leading to additional and

unwelcome operational costs. Thus, employing automated tools to reduce or eliminate those costs is vital.

OnCommand Performance Manager, an integrated component of OnCommand® Unified Manager,

reduces costs by continuously collecting, analyzing, and retaining performance data from the entire

storage environment and facilitating performance visualization.

1.1 What Is OnCommand Performance Manager?

OnCommand® Performance Manager does much more than simply monitor and plot data graphs. It is a highly sophisticated performance management tool purpose-built for the clustered Data ONTAP® operating system. It requires minimal configuration and setup to get started. It collects performance data at five-minute intervals with minimal operational interference and retains data for 13 months. Performance Manager eliminates the need for retaining in-house clustered Data ONTAP performance subject-matter experts. It encapsulates the subject-matter performance expertise required to run a clustered Data ONTAP storage environment in a single tool included with the storage system. It expertly analyzes clustered Data ONTAP performance data and generates attention-grabbing events based on user-defined and system-defined thresholds.

OnCommand Performance Manager is a component of the NetApp OnCommand management portfolio

(see Figure 1) and is integrated with OnCommand Unified Manager (see Figure 2).

Figure 1) NetApp OnCommand portfolio.

1.2 Benefits

Performance Manager provides the following benefits:

Enables you to visualize storage performance

Alerts on performance events automatically

Built-in system-defined alerts

Automatically established dynamic alerts

User-defined threshold alerts

Quickly eliminates storage as the source of a performance issue

These benefits are realized through implementation of the following features:

A comprehensive performance dashboard for the entire storage environment

A landing page for the storage objects of interest (such as cluster, volume, and LUN)

A performance explorer to view charts comparing related objects and metrics

Inventory views of all managed storage objects and events

Customized threshold alert template creation (including combination thresholds)

1.3 What Performance Manager Does

Performance Manager manages performance through collection and derivation of storage objects and

performance metrics. Table 1 shows the complete list of objects and metrics analyzed and displayed.

The definition and discussion of the various storage objects and performance metrics are beyond the scope of this document. For more information about the NetApp clustered Data ONTAP® operating system, see TR-3982: NetApp Clustered Data ONTAP 8.3 and 8.2.x: an Introduction. More information on performance fundamentals is in TR-4211: NetApp Storage Performance Primer for Clustered Data ONTAP.

Figure 2) OnCommand Unified Manager dashboard with integrated performance.

Table 1) Performance Manager managed objects and reported metrics.

Object | IOPS | MBps | Latency | Utilization | Cache Miss % | Advanced (Read/Write/Other): IOPS, MBps, Latency

Cluster ✔ ✔

Node ✔ ✔ ✔ ✔ ✔ ✔ ✔

Aggregate ✔ ✔ ✔ ✔ ✔ ✔ ✔

SVM ✔ ✔ ✔ ✔ ✔ ✔

Volume ✔ ✔ ✔ ✔ ✔ ✔ ✔

LUN ✔ ✔ ✔ ✔ ✔ ✔

Port ✔ ✔

LIF ✔ ✔ ✔

2 Installation and Setup

NetApp distributes OnCommand® Performance Manager as part of VMware Virtual Appliance and Red

Hat Enterprise Linux packages. Install the Red Hat Enterprise Linux distribution on a bare-metal system

or as a VMware ESXi or Microsoft Hyper-V virtual machine. For the most up-to-date installation

information, see the NetApp® Product Library for OnCommand Performance Manager.

The installation process sets up the Performance Manager network settings and maintenance user

credentials and starts the server process. When the process is complete, simply point a compatible

browser to the Performance Manager server address to use the initial setup wizard to set up maintenance

user e-mail addresses. You can also use the wizard to set up the Performance Manager AutoSupport™

(ASUP®) tool, and to add clusters. Adding a cluster simply requires the cluster management logical

interface (LIF) and the cluster administrator’s credentials. Performance Manager automatically collects

cluster topology and configuration and then begins the automated processes to collect and retain

performance data, automatically establish dynamic thresholds, and analyze system-defined performance

policies. Users can elect to receive alerts directly from Performance Manager by e-mail and from

OnCommand Unified Manager for centralized alerting. After the initial setup, add additional clusters

through the Administration page under the Data Sources tab (see Figure 3).

Best Practice: Always Use the Performance Manager Unified Manager Connection

NetApp recommends using the Unified Manager Connection feature to centralize event management from a single tool.

As discussed previously, Performance Manager is an integral component of OnCommand Unified

Manager. Although Performance Manager is perfectly capable as a stand-alone server, NetApp

recommends enabling the Unified Manager Connection. Enable the connection through the Performance

Manager maintenance console (see Figure 4). Access the maintenance console from the virtual

machine console or from a Secure Shell session using the Performance Manager server address and

maintenance user credentials.

Figure 3) Performance Manager data sources administration.

Once configured, Performance Manager forwards performance events to Unified Manager. Unified

Manager then displays performance events on the dashboard, providing links to Performance Manager

event details, and it retains events in the performance event inventory.

3 Architectural Elements

Performance Manager manages storage performance through three basic architectural elements:

1. Dashboard

2. Performance visualization

3. Events (and alerts)

Performance Manager analyzes the entire storage environment. It lists the clusters and, in some cases, the objects that need immediate attention on the dashboard. View more details by navigating between storage objects and displaying performance metrics collected over time. Performance Manager can also generate customized alerts and events based on user-defined policies specific to application environments.

3.1 Dashboard

Ideally, storage environments manage themselves without any human intervention. The next best option

is a tool indicating what demands attention now. Performance Manager does this by composing a cluster

dashboard that first lists clusters that are of the most interest. The order of precedence is first, clusters

that can’t be reached; second, actively alerting clusters; and, last, most active clusters. The dashboard

also presents other high-level information such as key performance metrics, use of the most active

resources, and simple color-coded alert status indicators.

The example dashboard in Figure 5 shows three clusters. In this figure you see that there are no active

alerts because all color-coded status indicators are green and that the cluster called ontaptme-fc-cluster is most active at 7,274 IOPS. You also see that the most active disk aggregate runs at 21% busy and the most active node operates at 45% utilization.
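
The order of precedence described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cluster` record, its field names, and the `dashboard_order` function are hypothetical and are not part of any Performance Manager interface, and the two clusters named `example-cluster-b` and `example-cluster-c` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    reachable: bool      # False if the cluster cannot be contacted
    active_alerts: int   # number of events currently in the new state
    iops: float          # current cluster IOPS


def dashboard_order(clusters):
    """Sort clusters: unreachable first, then alerting, then most active."""
    def key(c):
        if not c.reachable:
            rank = 0
        elif c.active_alerts > 0:
            rank = 1
        else:
            rank = 2
        # Within the same rank, the busiest cluster comes first.
        return (rank, -c.iops)
    return sorted(clusters, key=key)


clusters = [
    Cluster("ontaptme-fc-cluster", True, 0, 7274.0),
    Cluster("example-cluster-b", True, 2, 1200.0),
    Cluster("example-cluster-c", False, 0, 0.0),
]
ordered = [c.name for c in dashboard_order(clusters)]
```

In this sketch the unreachable cluster sorts ahead of the alerting cluster, which in turn sorts ahead of the healthy cluster even though the healthy cluster has the highest IOPS.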

Figure 4) Performance Manager maintenance console.

When new alerts are present, the color-coded status indicators change to red. For example, the cluster status shown in Figure 6 indicates that a volume alert and a node alert are active because of a latency threshold breach.

3.2 Performance Visualization

The visualization of storage performance makes it possible for you to see how storage metrics and

objects relate to each other, observe how relationships change over time, and identify trends to avoid

potential performance issues. Performance Manager accomplishes this task through the storage object

landing page and the performance explorer showing various customizable views.

Figure 5) Performance Manager dashboard.

Figure 6) Actively alerting cluster on dashboard

Storage Object Landing Page

The Performance Manager object landing page presents summary and detailed performance information for a specific storage object. Every storage object has a landing page, and all landing pages share a similar look and feel. Because not all objects are the same, a cluster landing page has slightly different metrics and views than a volume landing page. However, all landing pages summarize key high-level metrics and categorize events for a given storage object over the prior three days. For example, Figure 7 shows a cluster node object landing page in which summary metrics charts display latency, IOPS, MBps, and utilization. Below each of the metrics charts are links to any new and obsolete events for the given metric and object pair. In addition to object-specific information, the landing page provides access to Performance Explorer, through which you can observe interactions with other objects.

Performance Explorer

Performance Explorer is a modular component of Performance Manager common to all object landing pages. Its design provides guided exploration, encouraging navigation of related performance objects and building charts selected from a menu of metrics.

Figure 7) Performance Manager node object landing page summary.

Figure 8) Performance Explorer comparing volume and aggregate metrics.

Performance Explorer permits the selection of time ranges over the entire 13 months of retained data: 30

days at 5-minute granularity and 12 months at 1-hour granularity. Several predefined time ranges are

available, such as last 24 hours and last 30 days. However, you can select any time range over all the

retained data at one-hour granularity.
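
As a rough worked example, the retention scheme above implies the following number of data points per object and metric. The calculation is a sketch only; it assumes a 365-day year for the 12-month tier and says nothing about how Performance Manager actually stores the data.

```python
MINUTES_PER_DAY = 24 * 60

# Short-term tier: 30 days at 5-minute granularity.
samples_5min = 30 * MINUTES_PER_DAY // 5

# Long-term tier: 12 months at 1-hour granularity.
# Assumption: a 365-day year.
samples_hourly = 365 * 24

# Data points per object and metric across both tiers.
total_samples = samples_5min + samples_hourly
```

Under these assumptions, the two tiers hold a similar number of points (8,640 and 8,760), which is why a chart spanning many months is drawn at one-hour rather than five-minute resolution.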

Performance Explorer lets users navigate object relationships and select objects for display using metrics

charts. You can navigate from the current object to the parent container, to a sibling, and to child objects.

For example, Figure 8 shows Performance Explorer building charts that compare latency and IOPS over the last 24 hours for both volume and aggregate objects. From the left side of the page, you can easily add more objects to the same charts.

Performance Manager supports nine metrics in total, including IOPS, MBps, latency, cache miss ratio,

and utilization. Metrics charts with multiple objects are stacked, effectively overlaying them in time to

visually correlate alert events and performance data.

When more detail or higher resolution is needed, you can launch a full-page detailed view of any single

chart from Performance Explorer (see the upper-left corner labeled “Details” in Figure 8).

3.3 Policies and Events

Data centers contain capital equipment shared by numerous business entities. To make sure that these

resources are consumed in alignment with business priorities, service guarantees are put in place. This

fact ultimately leads to the establishment of business policies that are translated to service-level

agreements. To assure service-level objectives and codify critical business policies, Performance

Manager provides threshold policies.

Threshold Policy

A threshold policy template reflects business priorities, storage provisioning, and service-level objectives. For example, a storage system provisioned for a business-critical workload has a service-level objective in which latency must not exceed 20 ms. A storage volume is provisioned for this workload, and Performance Manager is set up to continuously analyze volume performance and send an alert when the service-level objective is violated.

Figure 9) Threshold policy creation.

To accomplish this goal, threshold policy templates are created and applied to one or more objects. The following attributes are defined when a threshold policy is created:

Object type

Metrics

Warning threshold

Critical threshold

Duration

The object type and metrics define the storage object and metrics pair that is the target of the policy. When creating a volume storage object policy, compound metrics are permitted when used in combination with latency metrics. For example, during off hours when load significantly decreases, the latency calculation can skew artificially high and cause false alerts to be sent. With a compound policy, an event is generated only when both latency and IOPS exceed their given thresholds (see Figure 9).

The policy definition has both warning and critical threshold levels. When either of these thresholds is

breached for the given duration, event alerts are created. The warning threshold level is useful because it

can trigger an early investigation before an issue becomes critical.
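
The following Python sketch illustrates how a compound threshold policy with warning and critical levels and a duration might be evaluated. It is not Performance Manager code: the `ThresholdPolicy` class, the `evaluate` function, and the choice to express duration as a count of consecutive samples are assumptions made for illustration. The 20 ms critical level comes from the example above; the 15 ms warning level and the 100 IOPS floor are invented values.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    # Hypothetical policy record; names and units are assumptions.
    warning_latency_ms: float
    critical_latency_ms: float
    min_iops: float         # compound condition: IOPS must also exceed this
    duration_samples: int   # consecutive samples the breach must persist


def evaluate(policy, latency_ms, iops):
    """Return 'critical', 'warning', or None for the most recent samples."""
    n = policy.duration_samples
    if len(latency_ms) < n or len(iops) < n:
        return None  # not enough data to cover the duration
    recent = list(zip(latency_ms[-n:], iops[-n:]))

    def breached(limit):
        # Both latency and IOPS must exceed their thresholds in every sample.
        return all(lat > limit and io > policy.min_iops for lat, io in recent)

    if breached(policy.critical_latency_ms):
        return "critical"
    if breached(policy.warning_latency_ms):
        return "warning"
    return None


policy = ThresholdPolicy(15.0, 20.0, 100.0, 3)
busy = evaluate(policy, [22, 25, 21], [500, 450, 600])
early = evaluate(policy, [16, 18, 17], [500, 450, 600])
off_hours = evaluate(policy, [30, 28, 35], [5, 3, 8])
```

The `off_hours` case shows the purpose of the compound condition: latency is high, but because IOPS is below the floor, no event is raised.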

A threshold policy is a template and is not of use until applied to one or more storage objects. Threshold

policy templates are applied to objects from the object inventory views (not shown here).

Events

When threshold policies are violated, events are generated. Events remain in the new state until the alerting condition subsides; the events then transition to an obsolete state. Performance Manager always posts events to the dashboard and retains a record in the event inventory page. In addition, events can appear on the OnCommand® Unified Manager dashboard and cause an e-mail alert to be sent to an administrator.

Figure 10) Event correlation with metrics chart.

There are two types of events: system and user-defined. System events are built in, with thresholds established by NetApp engineering. These include thresholds associated with the internal operations of the system, such as node busy conditions, file system layout factors, and disk utilization. Take system events seriously and act on them. User-defined events result from the violation of a user-created threshold policy.

Events appear in chart timelines such that they can be visually correlated with all other metrics. This

correlation is helpful in confirming expected correlations or in discovering unexpected correlations between object resource consumption and the triggering of an event. In Figure 10 you can see that a

critical system event is correlated with aggregate utilization (red line) crossing 50% utilization (note that

the chart key is not shown to conserve space). In the on-screen display, the user can access additional

event details by hovering a mouse over the red dot.

When an event is generated, the following attributes are recorded:

Status: warning or critical

Type: system or user-defined

State: new or obsolete

Duration: how long the alerting condition lasted

Associated storage object

Description: why the event occurred

Figure 11 shows details from a user-defined CRITICAL event in which volume latency exceeds a 5.00

ms/op threshold setting for 3 hours and 10 minutes.
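
The event attributes listed above map naturally onto a small record type. The sketch below is illustrative only: the `Event` class and its field names are hypothetical, and the volume name `example_volume` is invented. The values mirror the user-defined critical event described for Figure 11.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Event:
    # Hypothetical record mirroring the attributes listed in this section.
    status: str            # "warning" or "critical"
    type: str              # "system" or "user-defined"
    state: str             # "new" or "obsolete"
    duration: timedelta    # how long the alerting condition lasted
    storage_object: str    # associated storage object
    description: str       # why the event occurred

    def mark_obsolete(self):
        """Transition the event once the alerting condition subsides."""
        self.state = "obsolete"


event = Event(
    status="critical",
    type="user-defined",
    state="new",
    duration=timedelta(hours=3, minutes=10),
    storage_object="example_volume",
    description="Volume latency exceeded the 5.00 ms/op threshold",
)
duration_minutes = int(event.duration.total_seconds() // 60)
event.mark_obsolete()
```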

4 Common Use Cases

As with many management tools, the number of potential use cases is nearly infinite. This section highlights some of the common procedures for using Performance Manager.

4.1 Common Workflows

The main purpose of storage performance management is to confirm that storage systems are operating as expected. Performance Manager is an excellent tool to aid in establishing expectations after deployment and to help meet expectations going forward. Thus, the use cases discussed here center on these objectives.

Figure 11) User-defined event details.

Inverse Performance Troubleshooting

Modern data centers house many applications critical to business operations. End users of these applications experience performance incidents and frequently report them. Often, the goal of the resulting investigation is not to fix a storage performance issue but to show that storage performance is not the root cause of the application’s reported issues. Performance Manager easily achieves this common objective. Eliminating storage quickly as the root cause saves many hours of time that might be better spent on productivity improvement projects.

The most direct method of ruling out storage performance issues is by analyzing latency on the volume or

set of volumes where the application data resides. Performance Manager proficiently produces charts

graphing volume latency over any desired period. With those charts it’s simple to see that no latency

increase equals no storage performance issue.
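
The same check can be expressed numerically once latency samples have been read off the charts. The sketch below is an assumption-laden illustration, not a Performance Manager feature: the `latency_increased` function is hypothetical, the sample values are invented, and the tolerance factor of 1.25 is an arbitrary choice.

```python
from statistics import mean


def latency_increased(baseline_ms, incident_ms, tolerance=1.25):
    """True if mean latency during the incident exceeds the baseline mean
    by more than the tolerance factor (an arbitrary illustrative value)."""
    return mean(incident_ms) > tolerance * mean(baseline_ms)


# Invented volume latency samples (ms/op) before and during a reported incident.
baseline = [1.1, 1.0, 1.2, 1.1]
incident = [1.2, 1.1, 1.0, 1.2]
storage_suspect = latency_increased(baseline, incident)
```

Here volume latency during the incident is essentially unchanged from the baseline, so storage is not flagged as a suspect.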

See the detailed use cases that follow for instructions on creating volume latency charts.

Proactively Manage Performance

Another best practice workflow involves taking advantage of threshold alerts. Recall that threshold alerts

trigger events and e-mails when user-defined thresholds are breached. The most useful metric to analyze

using threshold policies is latency on the volume or volumes where mission-critical applications reside.

When creating threshold policy templates, the critical parameters to define are time duration and the

warning and critical-level thresholds.

The methods to establish acceptable thresholds vary. In some cases, acceptable latency levels are

established through service-level agreements (by design objective); in other cases these levels might be

discovered over time (from historical data). In any case, the recommendation is to apply the policy

template warning-level thresholds to receive early indication that storage performance is not as expected

or as designed. This effectively sets up a scenario in which storage administrators are aware of issues

before users.
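
Where thresholds are discovered from historical data, one possible approach is to derive them from percentiles of observed latency. The sketch below is only one such approach and is not prescribed by this document: the `percentile` helper uses the nearest-rank method, the 90th and 99th percentile choices are assumptions, and the latency history is invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented historical volume latency samples in ms/op.
history_ms = [2.0, 2.5, 3.0, 2.2, 2.8, 3.5, 2.1, 2.4, 4.0, 2.6]

# Assumption: warn at the 90th percentile, go critical at the 99th.
warning_ms = percentile(history_ms, 90)
critical_ms = percentile(history_ms, 99)
```

With real data, a much longer history would be used, and the resulting values would be entered as the warning and critical levels of a threshold policy template.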

See the detailed use cases that follow for instructions on creating and applying threshold latency templates.

4.2 Managing Clusters

The clustered Data ONTAP® operating system is NetApp’s flagship software product and the engine that drives the Data Fabric technology vision. Thus, it is paramount to know how the clusters in the environment are performing at any time. The Performance Manager dashboard sorts clusters and prioritizes those needing attention. Investigating those clusters entails performing more advanced management activities that discover additional details about cluster operations. The good news is that Performance Manager provides a robust set of tools that simplify and greatly aid in this process.

Cluster Landing Page

The cluster landing page (see Figure 12) will most likely become familiar to Performance Manager users. Access it from the dashboard using the “View Cluster Details” link in the upper-right corner of the cluster panel. The first portion of the cluster landing page contains summary information from the prior three days; it shows events and key performance metrics followed by an inventory of cluster managed objects. There is also a view referred to as “hot objects” that sorts managed objects by activity based on a user-selected metric.

Figure 12) Cluster landing page (part 1).

Cluster Hot Objects

The lower portion of the cluster landing page is where the “hot objects” functionality resides (see Figure 13). The number of objects, for example, volumes, can easily reach hundreds and beyond. The bulk of

the volumes might contain data either at rest or simply casually accessed and thus uninteresting. This

feature shows the most active objects sorted by a user-selected metric over a user-selected time range.

The descending bars of the histograms in Figure 13, for example, “Top 10 Volumes,” show that

Performance Manager lists the volumes with the highest measured latency first.
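The ranking behind a "Top 10" view can be approximated outside the product. The following Python sketch is illustrative only: the sample data, the field layout, and the `top_n` helper are hypothetical and do not represent a Performance Manager API. It averages a chosen metric per object over a selected time range and sorts the result in descending order.

```python
from statistics import mean

# Hypothetical samples: (object name, timestamp in hours, metric name, value).
SAMPLES = [
    ("vol_a", 1, "latency_ms", 2.0),
    ("vol_a", 2, "latency_ms", 4.0),
    ("vol_b", 1, "latency_ms", 9.0),
    ("vol_b", 2, "latency_ms", 11.0),
    ("vol_c", 1, "latency_ms", 0.5),
    ("vol_c", 5, "latency_ms", 50.0),  # falls outside the 1-2 hour range used below
]


def top_n(samples, metric, start, end, n=10):
    """Rank objects by their mean value of `metric` within [start, end]."""
    per_object = {}
    for name, ts, sample_metric, value in samples:
        if sample_metric == metric and start <= ts <= end:
            per_object.setdefault(name, []).append(value)
    ranked = sorted(
        ((name, mean(values)) for name, values in per_object.items()),
        key=lambda item: item[1],
        reverse=True,  # highest value first, matching the descending histogram
    )
    return ranked[:n]


hot_volumes = top_n(SAMPLES, "latency_ms", start=1, end=2, n=2)
print(hot_volumes)
```

Changing the metric name or the time range changes which objects count as "hot", which is why both are left as parameters in this sketch.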

4.3 Managing Objects

Performance Manager object management functions have a common look and feel through the modular

object landing pages and Performance Explorer (see section 3.2). The following use cases utilize the

features and functionality of these components and are applicable with simple modifications to other

object types.

Figure 13) Cluster landing page "hot objects" (part 2).


Comparing Volume Latency

Volumes can be accessed from the dashboard and cluster landing page. Volumes are managed from the

volume landing page (see Figure 14).

The top three charts in Figure 14 summarize metrics from the prior three days for latency, IOPS, and

MBps throughput for this volume. The Performance Explorer tab is below the summaries. The left side of

the Performance Explorer screen lists other sibling volumes on that aggregate. Add any of these volume

objects to the metrics charts on the right side of the screen using the Add to Charts button (annotated).

Adding volume objects to the current set of metrics charts (in this example, latency) adds

a series for each object (Figure 15).

Figure 14) Volume landing page.

Figure 15) Metrics chart with multiple volumes.


Compare Node Utilization Within a Cluster

A critical aspect of performance management is knowing how resources are used and understanding

performance resource consumption. This knowledge can result in rebalancing workloads by relocating

volumes or LUNs to newly added nodes or underused nodes. It can also inform the choice of a node for

a new workload. In any case, Performance Manager provides node utilization metrics over time and

provides ways to easily compare them.

To compare node utilization within a cluster, access any node object landing page in the cluster (Figure 16). You can do this from the dashboard directly or from the cluster landing page. On the left side of the

screen is a list of the sibling nodes in the cluster. Add all of the node objects to the metrics charts on the

right side of the screen using the Add to Charts buttons (annotated) in Performance Explorer.

After you add all node objects to the metrics charts, the graph shows the utilization metric plotted for

all the nodes (Figure 17).
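The same comparison can be reasoned about numerically. The sketch below is a minimal illustration, assuming utilization samples (in percent) have already been collected for each node over the same time range; the node names, values, and the `summarize` helper are hypothetical. It reports the mean and peak utilization per node and picks the least-used node as a candidate for a new or relocated workload.

```python
from statistics import mean

# Hypothetical utilization samples (percent) per node over the same time range.
NODE_UTILIZATION = {
    "node-01": [72.0, 80.0, 76.0],
    "node-02": [35.0, 41.0, 38.0],
    "node-03": [55.0, 50.0, 60.0],
}


def summarize(utilization):
    """Return (node, mean, peak) tuples sorted from least to most utilized."""
    rows = [(node, mean(vals), max(vals)) for node, vals in utilization.items()]
    return sorted(rows, key=lambda row: row[1])


summary = summarize(NODE_UTILIZATION)
least_used = summary[0][0]
for node, avg, peak in summary:
    print(f"{node}: mean {avg:.1f}%, peak {peak:.1f}%")
print("Candidate for a new workload:", least_used)
```

Mean utilization alone can hide short bursts, so the peak is reported alongside it; a node with a low mean but a high peak might still be a poor placement choice.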

Figure 16) Node landing page.

Figure 17) Metrics chart comparing node utilization within a cluster.


Customize Displayed Metrics

One of the most powerful features of the Performance Explorer is the capability to select metrics in the

object metrics charts. Here we show how simple that is to do. Starting from the object landing page

(Figure 18), select the pick list (annotated) in the top-right corner.

After you open the pick list, the metrics selection list appears with the available options (Figure 19).

The precise definition of each metric is beyond the scope of this document (see TR-4211: NetApp

Storage Performance Primer for more information). However, the label “(Advanced)” indicates that the

designated metric is further decomposed by read, write, and other operations.

Figure 18) Performance Explorer chart (metrics) selection.


4.4 Correlating Events and Metrics

Taking advantage of threshold alerts is a recommended best practice to receive early warning that

storage might be performing below expectations. When you receive an alert, it is often necessary to

explore activity related to that object at the time the event triggered. The Performance Explorer is the

appropriate tool for this purpose. For example, when investigating a volume latency threshold event that

shows that the volume operation rate decreases as latency increases, see if this event is the result of

interference from other workloads on the same controller. If the operation rate increases as latency

increases, the increase might be caused by an unanticipated additional load and resource consumption.
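The two rules of thumb above can be written down as a simple decision function. This is a sketch under stated assumptions, not product logic: the function name, its inputs (average operation rate and latency before and during the event), and the returned labels are all hypothetical, and a real investigation would also examine the other workloads on the controller.

```python
def classify_latency_event(ops_before, ops_during, latency_before, latency_during):
    """Apply the two rules of thumb from the text to before/during averages."""
    if latency_during <= latency_before:
        return "latency did not increase"
    if ops_during < ops_before:
        # Fewer operations completed while latency rose: look at neighbors.
        return "possible interference from other workloads"
    if ops_during > ops_before:
        # More operations arrived while latency rose: the workload itself grew.
        return "possible additional load on this volume"
    return "inconclusive"


print(classify_latency_event(5000, 3200, 1.5, 6.0))
print(classify_latency_event(5000, 7800, 1.5, 6.0))
```

The labels are deliberately phrased as possibilities; the comparison only narrows down where to look next in Performance Explorer.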

Figure 19) Metrics chart selection options.


Performance Manager offers a simple and effective method to correlate a performance event with

metrics charts (Figure 20).

Hover the mouse pointer over the event “cherry point” to see additional event information and a link to

more event details (Figure 21).

4.5 Discovering Most Active Volumes Globally

Application workloads can change unexpectedly over time for a variety of reasons. These reasons include

the addition of more users, new application deployments, new projects, and shifting business priorities.

As a result, the unsuspecting storage administrator is often unaware of changing workload patterns. The

Cluster Hot Objects feature displays the most active volumes (and other objects) in a cluster. Here we

show how to take advantage of the Performance Manager inventory views to identify the most active

volumes in the entire managed enterprise.

Figure 20) Performance Explorer summary chart correlating event with metric.

Figure 21) "Cherry point" access to event details.


At the top of every page in Performance Manager is a menu bar. There you can access the storage

object inventory by selecting the Storage pull-down menu (Figure 22).

After you select Volumes from the Storage pull-down menu, the volume inventory page is displayed (Figure 23).

Click on the IOPS column heading (annotated); the volumes are sorted with the most active first.

Figure 22) Accessing the storage object inventory.

Figure 23) Volume inventory page.


4.6 View the Effects of Caching

A data cache is a good example of a storage resource shared by many workloads. Several types of

caches exist that vary in cost. Caches are sized according to the expected benefit of faster access to data

to reduce workload latency. Thus caches are typically oversubscribed resources that the Data ONTAP®

operating system manages as fairly and as efficiently as possible. However, as demand for data changes

during the workday and as business priorities shift over time, the benefits realized from caches change.

The initial assumptions made when sizing caches might no longer hold. Performance Manager tracks

read cache efficiency on a workload basis, which most accurately reflects application impact.

Figure 24 shows how cache efficiency correlates with IOPS and latency improvement. As caches are

populated with read data, it becomes more likely that repeated access to the same data is served from

cache, so cache hits increase. The Cache Miss Ratio (%) chart indicates that, as the cache hit %

increases (the miss % decreases), the storage system is able to service more IOPS. That is because data

serviced from caches reduces the average latency for the workload (effectively avoiding access to the

hard disk, which is the performance bottleneck).

Figure 24) Workload cache efficiency visualization.
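The relationship between miss ratio, latency, and IOPS can be illustrated with a simple model. The sketch below is an approximation and is not how Performance Manager computes its charts: it assumes that average latency is a weighted average of a cache service time and a disk service time, and that the workload keeps a constant number of I/Os outstanding so that Little's law (throughput = concurrency / latency) applies. All numeric values and function names are hypothetical.

```python
def average_latency_ms(miss_ratio, cache_latency_ms, disk_latency_ms):
    """Weighted average of cache-hit and cache-miss service times."""
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    hit_ratio = 1.0 - miss_ratio
    return hit_ratio * cache_latency_ms + miss_ratio * disk_latency_ms


def achievable_iops(outstanding_ios, latency_ms):
    """Little's law: throughput = concurrency / latency (latency in seconds)."""
    return outstanding_ios / (latency_ms / 1000.0)


# Assumed values for illustration only: 0.5 ms from cache, 8 ms from disk,
# and 16 I/Os kept outstanding by the workload.
for miss in (0.8, 0.5, 0.2):
    latency = average_latency_ms(miss, cache_latency_ms=0.5, disk_latency_ms=8.0)
    print(f"miss {miss:.0%}: {latency:.2f} ms, {achievable_iops(16, latency):.0f} IOPS")
```

Under these assumptions, each drop in the miss ratio lowers the average latency and raises the achievable IOPS, which is the trend the chart in Figure 24 is described as showing.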

5 OnCommand Performance Manager Key Takeaways

OnCommand® Performance Manager is more than a monitoring and plotting tool. It is useful in eliminating

storage as the source of performance issues and useful in proactively avoiding performance problems

using threshold alerts. The dashboard presents clusters of interest first, making all the object exploration

tools available. Using Performance Manager, you can easily correlate events and metrics and compare

objects of interest over extended time periods. Finally, Performance Manager shows how storage

resources are used and sends alerts when consumption is not in line with the priorities of the business.

OnCommand Performance Manager is a valuable component of the NetApp® OnCommand product

portfolio and NetApp will heavily invest in it for years to come.

6 Additional Resources

TR-3982: NetApp Clustered Data ONTAP 8.3 and 8.2.x: an Introduction http://www.netapp.com/us/media/tr-3982.pdf

OnCommand Performance Manager product page http://www.netapp.com/us/products/management-software/performance-manager.aspx

OnCommand Performance Manager documentation http://mysupport.netapp.com/documentation/productlibrary/index.html?productID=61809

NetApp Storage Performance Primer http://www.netapp.com/us/media/tr-4211.pdf

7 Contact Us

Let us know how we can improve this technical report.

Contact us at [email protected].

Include TECHNICAL REPORT 4448 in the subject line.


Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.

Copyright Information

Copyright © 1994–2015 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written permission of the copyright owner.

Software derived from copyrighted NetApp material is subject to the following license and disclaimer:

THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.

The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer

Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark Information

NetApp, the NetApp logo, Go Further, Faster, AltaVault, ASUP, AutoSupport, Campaign Express,

Cloud ONTAP, Clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness,

Flash Accel, Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod,

FlexScale, FlexShare, FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars,

MetroCluster, MultiStore, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, RAID-TEC,

SANtricity, SecureShare, Simplicity, Simulate ONTAP, SnapCenter, Snap Creator, SnapCopy,

SnapDrive, SnapIntegrator, SnapLock, SnapManager, SnapMirror, SnapMover, SnapProtect,

SnapRestore, Snapshot, SnapValidator, SnapVault, StorageGRID, Tech OnTap, Unbound Cloud,

WAFL, and other names are trademarks or registered trademarks of NetApp Inc., in the United

States and/or other countries. All other brands or products are trademarks or registered

trademarks of their respective holders and should be treated as such. A current list of NetApp

trademarks is available on the web at http://www.netapp.com/us/legal/netapptmlist.aspx.

TR-4448-0815