SmartCloud Monitoring - Application Insight · 2013-12-21

SmartCloud Monitoring - Application Insight

Contents

IBM SmartCloud Monitoring - Application Insight
  Monitoring
    Group Dashboard
    Group Details: Group Name dashboard
    Data source tab and dashboard
    Page layout and controls
  Customizing
    Group Configuration
    Threshold Manager
    Historical Data Configuration
  Configuring
    Configuring the fabric node for Amazon EC2
    Configuring the fabric node for SmartCloud Provisioning
    Configuring the fabric node for VMware
    Advanced Configuration
  Data sources
    Linux OS data source
    Response Time data source
    Unsupported data sources
Index

IBM SmartCloud Monitoring - Application Insight

Enterprises are increasingly moving to the cloud to accelerate time to market, improve serviceability, and reduce costs.

As a cloud consumer, you can use IBM SmartCloud® Monitoring - Application Insight to monitor your cloud virtual machine applications to ensure optimal performance and efficient use of resources.

Application Insight can scale up or down elastically in keeping with the cloud's ability to grow and shrink, providing multi-tenant monitoring for most workload deployments. The monitoring technology can be embedded in virtual machine base images and initiated automatically when new workloads are deployed based on those images.

Monitoring
Use IBM SmartCloud Monitoring - Application Insight dashboards to monitor the health and efficiency of your cloud VMs.

The Application Insight infrastructure is installed on a management VM called the fabric node. After you log on to the fabric node, the Application Insight home dashboard is displayed.

Group Dashboard
The Group Dashboard offers the highest-level overview of your monitored cloud applications.

The Group Dashboard has a table that shows an aggregation of “All Resources” and a row for each group that you have created.

Group
A group is a named collection of data sources, such as “Credit Services” and “Mobile Streaming”. You always have an “All Resources” group and you can create your own logical groups of data sources. (See “Group Configuration” on page 5.) Expand a group to see the types of data being collected, such as Linux OS; expand a data type to see the data source instances of that type, such as “MyComputer:LZ”. You can open detailed dashboards from any of these levels:

Group name
  Data type
    Data source

A VM instance can have multiple monitored data types, in which case you see multiple data sources for the same VM. For example, if you deploy Linux OS monitoring as well as Response Time monitoring on a Linux VM, the VM instance shows under both the Linux OS and Response Time areas of your group.

Notice that each level in the hierarchy has a hypertext link to a more detailed dashboard, filtered for the level you are at:


• At the group and data type level, the link opens the Group Details: Group Name dashboard to the Overview tab or the tab for the selected data type. For more information, see “Group Details: Group Name dashboard.”

• At the data source level, the link opens the Data type Details: Data source name dashboard with a Details tab of granular metrics for the image and an Events tab to show the open events for the image and their status. For more information, see “Data source tab and dashboard” on page 3.

Events
The threshold definition for a data type includes a severity classification. The Fatal Events, Critical Events, Major Events, and Minor Events columns show a count of the threshold events that have been opened for the group, data type, or individual data source and that can indicate a problem: Fatal, Critical, Major, and Minor. For example, a count of 1 in the Fatal Events column means that one fatal event is open. For more information about thresholds, see “Threshold Manager” on page 6.

Time of Last Event
This is the time and date of the most recently opened event for the group, data type, or data source.
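
The event columns and the Time of Last Event value can be pictured as a simple aggregation over the open events of a group. The following Python sketch is illustrative only: the `(severity, opened_at)` tuple shape and the function name are hypothetical, and it assumes that every open event, including those of uncounted severities, is considered when finding the most recent one.

```python
from collections import Counter
from datetime import datetime

# Severities that have their own count column in the Group Dashboard.
COUNTED_SEVERITIES = ("Fatal", "Critical", "Major", "Minor")


def summarize_events(events):
    """Return (count per counted severity, time of the most recently opened event).

    `events` is a list of (severity, opened_at) tuples; this shape is an
    assumption made for the sketch, not a product data format.
    """
    counts = Counter(sev for sev, _ in events if sev in COUNTED_SEVERITIES)
    row = {sev: counts.get(sev, 0) for sev in COUNTED_SEVERITIES}
    last_opened = max((opened for _, opened in events), default=None)
    return row, last_opened


# Hypothetical open events for one group.
open_events = [
    ("Fatal", datetime(2013, 12, 21, 9, 15)),
    ("Minor", datetime(2013, 12, 21, 9, 40)),
    ("Minor", datetime(2013, 12, 21, 8, 5)),
    ("Harmless", datetime(2013, 12, 21, 7, 0)),
]
row, last_opened = summarize_events(open_events)
print(row, last_opened)
```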

For a description of the page elements and controls, such as Last Updated, see “Page layout and controls” on page 4.

Group Details: Group Name dashboard
After you click a hypertext link from the Group Dashboard, this detailed dashboard is displayed at the Overview tab or the Data Type tab, depending on the launch point.

Use the IBM SmartCloud Monitoring - Application Insight Group Details dashboard to gain insight into the events and health of the application group or VM instance.

The dashboard has an Overview tab and a tab for each type of monitoring data within the selected group level. If there are more tabs for data types than can fit in the window, click the scroll arrows to scroll through the tabs, or click the list control to select from a list.

Overview

The Overview tab has an Application Summary table and two event tables of the VM instances in the group. For example, a “Credit Service” group would have all the VMs used by that department.

Application Summary
The Application Summary table is present if your group contains Response Time monitoring data. The Application Summary table lists high-level metrics for each application detected by the Response Time monitor. The Status indicator reflects the highest-severity threshold event on that data source, such as the Minor indicator for Minor events.

• Average Response Time (Seconds) is the average response time for a single request that was observed during the monitoring interval.


• Requests Per Second is the rate at which requests are coming into the VM. The measurement is a calculation based on where in the five-minute period the data sampling occurs: the time when the interval began is subtracted from the sampling time to get the number of elapsed seconds, and the total number of requests since the interval began is divided by that number. The result is the requests per second rate.

• Percent Failed and Percent Slow are linear gauges showing the percentage of requests that failed or were slow.
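
The Requests Per Second calculation can be written out as a short Python sketch. The function and parameter names are hypothetical, and returning 0.0 when no time has elapsed is an assumption of the sketch rather than documented product behavior.

```python
from datetime import datetime


def requests_per_second(interval_start, sample_time, total_requests):
    """Divide the requests seen since the interval began by the elapsed seconds."""
    elapsed_seconds = (sample_time - interval_start).total_seconds()
    if elapsed_seconds <= 0:
        # Assumption: report a zero rate when the sample falls exactly on the interval start.
        return 0.0
    return total_requests / elapsed_seconds


# A sample taken 2 minutes 30 seconds into a five-minute interval.
interval_start = datetime(2013, 12, 21, 10, 0, 0)
sample_time = datetime(2013, 12, 21, 10, 2, 30)
rate = requests_per_second(interval_start, sample_time, 450)
print(rate)  # 450 requests / 150 seconds = 3.0
```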

OS Event Summary
The OS Event Summary table shows the event count consolidated for the threshold category, such as Memory or Disk. You assign the category when defining a threshold.

Click a threshold category, such as CPU, to open the tab for the data type that has the highest severity event for that category. The same threshold category and event severity can apply to multiple data types, in which case the tab for the first data type in the tabbed list is displayed.

Table 1. Group Details dashboard's OS Event Summary table. This representation of the OS Event Summary table shows a count of events for every threshold category, sorted by the category with the highest severity events to the lowest.

Category

CPU 1 2 1

Network 1 1 1 1

Other 1
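
The ordering described in the Table 1 caption can be sketched in Python as a sort on per-severity counts. This is one plausible reading, not the product's documented algorithm: it assumes that categories are compared first on their Fatal count, then Critical, Major, and Minor. The counts in the example are made up for illustration.

```python
# Severity columns from highest to lowest.
SEVERITIES = ("Fatal", "Critical", "Major", "Minor")


def sort_categories(counts_by_category):
    """Order category names from the highest-severity events to the lowest.

    `counts_by_category` maps a category name to a {severity: count} dict.
    """
    def sort_key(item):
        _, by_severity = item
        return tuple(by_severity.get(sev, 0) for sev in SEVERITIES)

    ordered = sorted(counts_by_category.items(), key=sort_key, reverse=True)
    return [name for name, _ in ordered]


# Hypothetical event counts.
example_counts = {
    "Other": {"Minor": 1},
    "CPU": {"Fatal": 1, "Critical": 2, "Major": 1},
    "Network": {"Critical": 1, "Major": 3},
}
print(sort_categories(example_counts))
```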

Events By Component
The Events By Component table shows the event count for each severity consolidated for the data type, such as Linux OS and Response Time.

Data type

For every type of monitoring data in your group, you get a tab with a table of system metrics from the data sources. For each VM image name, you see the IP address, overall event status, how long the VM has been running, and other metrics to help you determine the cause of any events.

The first column in the table has a hypertext link to the Data type Details dashboard for the selected data source. For more information about the Data type tab and the detailed dashboard for each data type, see one of the following topics:

“Linux OS Details tab and dashboard” on page 17
“Response Time tab and dashboard” on page 19
“Unsupported data sources” on page 21
See also the Help Table of Contents for other supported products.

Related reference:
“Page layout and controls” on page 4

Data source tab and dashboard
Every data source in your monitored environment has a tab in the Group Details: Group Name dashboard.

Open the dashboard for a data source, such as Linux OS, from the Group Dashboard or Group Details dashboard to see metrics for all the VMs of that type in the group.

From the data source tab, you can link to more granular metrics for an individual data source.

For a description of the key performance indicators displayed in the data source tab of the Group Details dashboard and the data source dashboard, see the following topics:

“Linux OS Details tab and dashboard” on page 17
“Response Time tab and dashboard” on page 19
“Unsupported data sources” on page 21

Page layout and controls
Use the tools that are on every IBM SmartCloud Monitoring - Application Insight dashboard and configuration page for changing the display and navigating.

Previous page
The path to this dashboard is shown along the top of the page. Click a hypertext link to return to a previous dashboard. For example, if you are viewing the Linux OS details on the MySystem VM, the path is Home > All Resources > MySystem:LZ. To return to the Group Dashboard, you click the Home hypertext link. The browser's Back button does not generally work for navigating back to the previous page.

Last Updated
The Last Updated bar shows the date and time of the most recent page refresh, which is set to every 30 seconds by default. For an immediate sampling of the data shown in the dashboard, click the bar and select Refresh Now. The other choices set the permanent behavior of the dashboard: to not automatically refresh, to refresh every 30 seconds, or to refresh every minute.

Multiple pages
If you want to see two dashboards at the same time, you can use the built-in browser capability to open the link target in a new tab or window. However, the resulting page does not include the navigation bar. To see multiple pages, open a new browser tab or window and log in to Application Insight again from the new location so that you get the complete user interface.

Locked
The Locked indicator at the group and data type level of the Group Dashboard and Group Details dashboard means that the fabric node has no valid operating system user credentials for the data source. Drill down the path to find the data source and click the indicator to enter the correct login credentials for the operating system of that VM. If the credentials are correct, the lock icon is removed after the next polling of the resource, which might take one or two minutes.

Overdue
When the fabric node has valid user credentials for a data source but cannot make contact with the data source within the timeout period, the Overdue indicator is displayed. Drill down the path to find the data source that is having the timeout problem.

If you see the Overdue indicator displayed and then not displayed frequently, consider changing the HTTP Interface timeout settings in the Advanced Configuration page. (See Advanced Configuration - HTTP Interface.)

If the data refresh is overdue for some time, it could indicate that the monitoring agent is not running and that it must be restarted on the VM instance.

Table filter
Click inside the filter text box and type the beginning of the value to filter the table by. As you type, the table rows that do not fit the criteria are filtered out and the Total is updated for the number of rows found. Click the “x” in the filter box or press the Backspace key to clear the filter.

Column sort
Click inside a column heading to sort by that column. Click the same column heading again to switch between ascending and descending sort order.

Column resize
Drag a column heading border to the right or left to adjust the column width.

Select time range
While viewing Linux OS details, you can adjust the time span for the plotted metrics. Drag the Start and End selectors to change the start and end date and time. Click the arrow controls to move another time range into focus.

Log out
For security, you are logged out automatically after a period of inactivity. Clicking a hypertext link or your browser's Refresh button causes the login prompt to display for you to log back in.

To log out, click User > Logout or close the browser window.

Customizing
Use the customization tools to set up resource groups, thresholds, and historical data collection for effectively monitoring your environment.

Group Configuration
A group is a named collection of data sources. You can create any number of groups and include the same data sources in multiple groups.

Click Settings > Group Configuration to open the Group Configuration page for creating logical groups of your cloud applications.

Groups
The Groups list shows the names of the groups that you have created.

Data Source
Select a name in the Groups list to see the members of the group in the Group Data Sources list. Any data sources that were not added to the group are displayed in the Available Data Sources list.

New
Click New to create a new group. The Add New Group dialog box opens for you to enter a new group name. Enter a name for the group of up to 64 letters, numbers, underscores, hyphens, and spaces. After you click OK, the new group name is displayed.

To add members to the group, select one or more data sources from the Available Data Sources list and click > to move them to the Group Data Sources list. You can click and drag to select multiple data sources or use Ctrl+click.

The next time you select the Group Dashboard, the new group is listed.
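
The group naming rule (up to 64 letters, numbers, underscores, hyphens, and spaces) can be checked before a name is typed into the Add New Group dialog box. The following Python sketch is a convenience check, not part of the product. It assumes that "letters" means ASCII letters and that an empty name is not allowed; neither point is stated in this guide.

```python
import re

# Assumption: ASCII letters only, and at least one character.
GROUP_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\- ]{1,64}")


def is_valid_group_name(name):
    """Return True if the name uses only allowed characters and is 1 to 64 long."""
    return GROUP_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("Credit Services", "Mobile_Streaming-2", "Bad/Name", "x" * 65):
    print(repr(candidate[:20]), is_valid_group_name(candidate))
```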

Delete
If you no longer want a particular group, select it from the Groups list and click Delete.

Threshold Manager
Thresholds are used to compare the sampled value of an attribute with the value set in the threshold. If the sampled value satisfies the comparison, an event is opened.

The event closes automatically when the threshold comparison is no longer true. The Threshold Manager displays a table of the thresholds that were defined for the selected data type. Here you can add, edit, and delete thresholds.

Select data source type
The data types that display when you click the list box are those that are included in your managed environment. Select the data type for which you want to create or view thresholds.

Existing Thresholds
This table lists all the thresholds that were created for the selected data type.

New opens the Threshold Editor for defining a threshold for the selected data type.

Select a threshold and click Edit to open the Threshold Editor for editing the definition.

Select a threshold that you no longer want and click Delete.

After you click New or select a threshold and click Edit, the Threshold Editor is displayed with the following fields:

Name
Enter a name for the threshold that users can see in the dashboards. The name must begin with a letter and can be up to 31 letters, numbers, and underscores, such as “Average_Processor_Speed_Harmless”. All thresholds must have unique names.

Category
Select the category for grouping events in the OS Event Summary table in the Group Details dashboard: CPU, Memory, Disk, Network, or Other. The category affects the event aggregation for the OS status.

Description
Optional. A description is useful for recording the purpose of the threshold that users can see in the Threshold Manager.

Severity
Select the appropriate event severity from the list: Fatal, Critical, Major, Minor, Harmless, Informational, or Unknown.


Interval
Enter or select the time to wait between taking data samples in HHMMSS format, such as 00 15 00 for fifteen minutes. For sampled-event thresholds, the minimum interval is 000030 (30 seconds) and the maximum is 235959 (23 hours, 59 minutes, and 59 seconds).

A value of 000000 (six zeroes) indicates a pure event threshold. Pure events are unsolicited notifications. Thresholds for pure events have no sampling interval, thus there is no constant metric that can be monitored for current values.
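
As an illustration of the HHMMSS rules, the following Python sketch converts an interval string to seconds and applies the stated limits. The function name is hypothetical, and accepting the value with or without spaces between the fields is an assumption based on the two notations used in this section.

```python
def parse_threshold_interval(hhmmss):
    """Convert an HHMMSS interval to seconds; 0 means a pure event threshold.

    Raises ValueError for malformed values or for sampled intervals that are
    shorter than 30 seconds.
    """
    digits = hhmmss.replace(" ", "")  # accept "00 15 00" as well as "001500"
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError("interval must be six digits in HHMMSS format")
    hours, minutes, seconds = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError("interval cannot exceed 235959")
    total_seconds = hours * 3600 + minutes * 60 + seconds
    if 0 < total_seconds < 30:
        raise ValueError("sampled-event intervals must be at least 000030")
    return total_seconds


print(parse_threshold_interval("00 15 00"))  # fifteen minutes, in seconds
print(parse_threshold_interval("000000"))    # pure event threshold
```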

Required consecutive samples
Specify how many consecutive threshold samples must evaluate to true before an event is generated. This means that, for any threshold with a setting of 1 and a sample that evaluates to true, an event is generated immediately; a setting of 2 means that 2 consecutive threshold samples must evaluate to true before an event is opened.
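
The effect of the Required consecutive samples setting can be sketched as a streak counter. The Python function below is an illustration with a hypothetical name; it takes a sequence of per-sample results (True when the threshold comparison is satisfied) and reports the position of the sample at which an event would first be opened.

```python
def first_event_index(sample_results, required_consecutive):
    """Return the index of the sample that opens the event, or None if none does."""
    streak = 0
    for index, is_true in enumerate(sample_results):
        # A false sample resets the streak; a true sample extends it.
        streak = streak + 1 if is_true else 0
        if streak >= required_consecutive:
            return index
    return None


print(first_event_index([True], 1))
print(first_event_index([True, False, True, True], 2))
```

With a setting of 1, the first true sample opens the event. With a setting of 2, an isolated true sample is ignored and the event opens only on the second of two true samples in a row.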

Data set
Select the data set for the type of data to be sampled. The metrics available for inclusion in the condition are from the chosen data set. If the threshold has multiple conditions, they must all be from the same data set.

Do not select a data set with a name beyond 32 characters. Although you can use a data set name over 32 characters, the threshold cannot be applied to your data sources for monitoring. Of the supported data sources in the current release, only the Linux data source has data set names of more than 32 characters: KCA_LZ_Agent_Active_Runtime_Status and KCA_LZ_Agent_Availability_Management_Status.

Logical Operator
If you are measuring multiple conditions, select And (&) if the previous condition and the next condition must be met or select Or (|) if either of them can be met for the threshold to be breached. The threshold can have up to nine conditions when the Or operator is used; up to 10 conditions when the And operator is used.

Conditions
The threshold definition can logically include multiple simultaneous thresholds or conditions.

Click New to add a condition. Select a condition and click Edit to modify the expression, or click Delete to remove the expression.

After you click New or Edit, complete the fields in the Add Condition dialog box that opens:

    Metric
    Select the metric that you want to compare in this condition.

    Operator
    Select the relational operator for the type of comparison: Equal, Not Equal, Greater than, Greater than or Equal, Less than, or Less than or equal.

    Value
    Enter the value to compare using the format that is allowed for the metric, such as 20 for 20% or 120 for 2 minutes.
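
To show how conditions and the logical operator combine, here is a minimal Python sketch of a threshold evaluation. It is not the product's evaluation engine. It assumes a single logical operator for the whole threshold, and the metric names and sample values are made up for illustration.

```python
import operator

# Relational operators offered in the Add Condition dialog box.
OPERATORS = {
    "Equal": operator.eq,
    "Not Equal": operator.ne,
    "Greater than": operator.gt,
    "Greater than or Equal": operator.ge,
    "Less than": operator.lt,
    "Less than or equal": operator.le,
}


def threshold_breached(sample, conditions, logical_operator="And"):
    """Evaluate (metric, operator, value) conditions against one data sample."""
    if not conditions:
        raise ValueError("a threshold needs at least one condition")
    results = [OPERATORS[op](sample[metric], value) for metric, op, value in conditions]
    if logical_operator == "And":
        return all(results)
    if logical_operator == "Or":
        return any(results)
    raise ValueError("logical operator must be 'And' or 'Or'")


# Hypothetical metrics: CPU busy percentage and a duration in seconds.
conditions = [("cpu_busy_pct", "Greater than", 80), ("wait_seconds", "Less than", 120)]
sample = {"cpu_busy_pct": 50, "wait_seconds": 90}
print(threshold_breached(sample, conditions, "And"))
print(threshold_breached(sample, conditions, "Or"))
```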

After you click Save, the threshold is applied to all data sources of the same data type.

Note: The data sources for IBM SmartCloud Monitoring - Application Insight include the IBM® Tivoli® Monitoring Linux OS monitoring agent V6.3 and ITCAM for Transactions Response Time monitoring agent V7.3. If you have an earlier version of the monitoring agent installed and you have updated the thresholds for that data source, you must stop and restart the agent before you can see the effect of the changes.

Historical Data Configuration
Your environment already has historical data configurations for key data sets shown in the dashboards. You can add more historical data configurations for other data sets.

After you click Settings > Historical Data Configuration, the page is displayed for you to see and work with historical data configurations.

Select data source type
The data types that display when you click the list box are those that are included in your managed environment. Select the data type for which you want to view or configure historical data collection.

Existing Historical Data Configurations
This table lists all the historical data configurations that were created for the selected data type. The data sets are prefixed with the data type code, such as KLZ for Linux OS and WRT for Response Time.

New opens the Add Historical Record dialog box for the selected data type.

Select a historical data configuration and click Edit to open the Edit Historical Record dialog box for editing the data collection definition.

Select a historical data configuration that you no longer want and click Delete.

After you click New or select a historical data configuration and click Edit, the Add Historical Record or Edit Historical Record dialog box is displayed with the following fields:

Data Set
Select a data set for which you want to collect historical data. The data sets available are for the chosen data type.

Interval
Enter the data sampling frequency, from 5 to 60 minutes. Historical data samples are saved at the monitored resource for retrieval into the dashboards.

Retain
Enter the number of hours to keep the historical data samples for, from 24 to 72 hours.
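
The Interval and Retain values together determine roughly how many samples are kept per data source. The following Python sketch validates the documented ranges and does the arithmetic. The function name is hypothetical, and the result is an estimate that assumes exactly one sample is stored per interval.

```python
def retained_sample_count(interval_minutes, retain_hours):
    """Estimate how many historical samples are kept for one data set."""
    if not 5 <= interval_minutes <= 60:
        raise ValueError("Interval must be from 5 to 60 minutes")
    if not 24 <= retain_hours <= 72:
        raise ValueError("Retain must be from 24 to 72 hours")
    return (retain_hours * 60) // interval_minutes


print(retained_sample_count(15, 24))  # 1440 minutes / 15 = 96 samples
print(retained_sample_count(5, 72))   # 4320 minutes / 5 = 864 samples
```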

After you click OK, the historical data configuration is saved to the data set list and historical data begins to be collected from all data sources of the same data type.

Note: The data sources for IBM SmartCloud Monitoring - Application Insight include the IBM Tivoli Monitoring Linux OS monitoring agent V6.3 and ITCAM for Transactions Response Time monitoring agent V7.3. If you have an earlier version of the monitoring agent installed and you have updated the thresholds for that data source, you must stop and restart the agent before you can see the effect of the changes.

Configuring
Use the Configuration Wizard to edit the connection settings between the fabric node and your host provider. You can further configure your environment settings through the Advanced Configuration page.

Configuring the fabric node for Amazon EC2
After installing the fabric node on your deployed VM, you must configure the fabric node for communication with the Amazon EC2 service delivery platform.

The configuration wizard starts automatically the first time you log on to the IBM SmartCloud Monitoring - Application Insight console after installation. Thereafter, you can start the wizard whenever you have configuration changes.

Before you begin

Log on to your Amazon Web Service Management Account (http://aws.amazon.com/console) and gather the following information for entry in step 3:

• Your Access Key ID and Secret Access Key as described in Getting Your AWS Access Keys (docs.aws.amazon.com/ses/latest/DeveloperGuide/get-aws-keys.html).

• Your entry point URL, which is based on your Region (shown in the Navigation area) and listed in the table at Regions and Endpoints - Amazon EC2 (docs.amazonwebservices.com/general/latest/gr/rande.html#ec2_region).

Review the Prerequisites (http://pic.dhe.ibm.com/infocenter/tivihelp/v63r1/topic/com.ibm.scmai.doc_1.1/install/scmai_prerequisites.htm) and Dependencies and limitations (http://pic.dhe.ibm.com/infocenter/tivihelp/v63r1/topic/com.ibm.scmai.doc_1.1/install/scmai_install_dependencies.htm) for the supported platforms and requirements for fabric nodes.

Procedure

1. If the Configuration Wizard is not open, click Settings > Configuration Wizard.

2. For the service delivery platform, select the Amazon Elastic Compute Cloud option and click Next.

3. For the Amazon EC2 platform configuration, complete the following fields and click Next:
   a. Access Key is the AWS Access Key ID for accessing AWS SES.
   b. Secret Key is the Secret Access Key for accessing AWS SES.
   c. Confirm Secret Key is the Secret Access Key, which you enter a second time to ensure that you typed the key correctly.
   d. Region Endpoint URL is the URL that represents the entry point for AWS and is based on your Region, such as http://ec2.us-west-1.amazonaws.com for the US West (Northern California) Region.

4. Optional. Enter the default user ID and password for working with the monitoring agents that are deployed to the consumer VMs:
   a. User ID is root by default.



   b. Password is passw0rd by default.
   c. Confirm Password is passw0rd by default.

Results

After you click Done, the configuration parameters are updated on the server and the Group Dashboard is displayed. Unless you installed consumer VMs before the fabric node, initially, you have one monitored Linux resource: the fabric node.

What to do next

• Click Settings > Group Configuration and create logical groupings of your monitored data sources, as described in the Learn more... link topic.

• Click Settings > Threshold Manager and create thresholds for testing key performance indicators, as described in the Learn more... link topic.

• Click the Learn more... link in any dashboard to learn more about the dashboard and what you can do.

Configuring the fabric node for SmartCloud Provisioning
After installing the fabric node on your deployed VM, you must configure the fabric node file for communication with the SmartCloud Provisioning service delivery platform.

Before you begin

• Initial configuration of the fabric node involves establishing key credentials with the web service host, starting the database, and initializing the configuration database. Most of the configuration is automatic after the private key has been established.

• Have at hand the information that is required from SmartCloud Provisioning for configuring the fabric node: the SmartCloud Provisioning web host IP address, your access ID, and the private key that is associated with your ID.

Procedure

1. If the Configuration Wizard is not open, click Settings > Configuration Wizard.


2. For the service delivery platform, select the SmartCloud Provisioning option and click Next.

3. For the SmartCloud Provisioning platform configuration, complete the following fields and click Next:
   a. Access ID is shown on the SmartCloud Provisioning Home page after you sign in.
   b. Web Service Host is the IP address of the SmartCloud Provisioning host.
   c. Web Service Port is set to the default 5678.
   d. Service Region is the region hosted by SmartCloud Provisioning, and is set to “query” by default.
   e. Private Key is pasted from the SmartCloud Provisioning Home page by clicking Show Access Key, copying the entire Private Key text including the -BEGIN- and -END- lines, and pasting here.

4. Optional. Enter the default user ID and password for working with the monitoring agents that are deployed to the consumer VMs:
   a. User ID is root by default.
   b. Password is passw0rd by default.
   c. Confirm Password is passw0rd by default.

Results


What to do next




Configuring the fabric node for VMware
After installing the fabric node on your deployed VM, you must configure the fabric node for communication with the VMware service delivery platform.


Procedure

1. If the Configuration Wizard is not open, click Settings > Configuration Wizard.


2. For the service delivery platform, select the VMware Virtual Center option and click Next.

3. For the VMware platform configuration, complete the following fields and click Next:
   a. User ID is the VMware login ID.
   b. Password is the VMware user password.
   c. Confirm Password is the VMware user password, which you enter a second time to ensure that it was typed correctly.
   d. Host Name is the fully qualified host name or IP address of the VMware service host.
   e. Port Number is set to the default 80.
   f. Use SSL is set to false by default. Select true if Secure Socket Layer will be used for communications with the VMware service host.
   g. Keystore File is set to /opt/ibm/ccm/CCM-Certs/ccm.truststore by default. Enter the full path to the keystore file that contains the set of certificates that are trusted by IBM SmartCloud Monitoring - Application Insight.
   h. Keystore Password and Keystore Confirm Password are disabled because the password has already been established.
   i. Validate Certificates is set to false by default. Set to true if you want the SSL connection certificates to be validated against the certificates imported into the fabric node truststore. For details about importing certificates, see Enabling certificate validation for communication with VMware (http://pic.dhe.ibm.com/infocenter/tivihelp/v63r1/topic/com.ibm.scmai.doc_1.1/install/scmai_installvmware_certificate.htm).

4. Optional. Enter the default user ID and password for working with the monitoring agents that are deployed to the consumer VMs:
   a. User ID is root by default.
   b. Password is passw0rd by default.
   c. Confirm Password is passw0rd by default.

Results


What to do next




Advanced Configuration
Initial fabric node configuration is done in the Configuration Wizard.

Use the Advanced Configuration page to control communications settings and advanced features such as event forwarding.

Agent Service Interface
The settings here are required information used by the fabric node to configure communications with the monitoring agents (data sources).

    Refresh Interval (minute)
    The frequency, in minutes, that the data sources use to query configuration details from the fabric node. Default: 5 minutes.

    Port
    The HTTP port that is used to communicate with data sources. Default: 51920.

    Secure Port
    The HTTPS port that is used to communicate with data sources. Default: 53661.

    Polling Interval (minutes)
    The frequency, in minutes, for checking that a data source is active. Default: 1 minute.

Trace
The trace log component that is used to gather data about the performance of the IBM SmartCloud Monitoring - Application Insight system.

    Maximum File Size
    The maximum size of each log file in bytes. Default: 5000000 bytes (5 MB).



    Maximum Log Files
    The maximum number of log files that will be used before wrapping the log entries. After the maximum number of log files is reached, the oldest log file is replaced by the newest. Default: 5 files.
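
Taken together, the Maximum File Size and Maximum Log Files settings bound the disk space used for trace logs. The Python sketch below works through the default values and mimics the wrap-around behavior with a fixed-length queue. The function names and file names are hypothetical; the actual log file naming scheme is not described in this guide.

```python
from collections import deque


def max_trace_disk_bytes(max_file_size=5_000_000, max_log_files=5):
    """Upper bound on the disk space used by trace logs."""
    return max_file_size * max_log_files


def wrap_log_files(existing_files, new_file, max_log_files=5):
    """Add a new log file; the oldest is dropped once the maximum is reached."""
    files = deque(existing_files, maxlen=max_log_files)
    files.append(new_file)
    return list(files)


print(max_trace_disk_bytes())  # 5 files x 5000000 bytes = 25000000 bytes
print(wrap_log_files(["t1", "t2", "t3", "t4", "t5"], "t6"))
```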

    Trace Level
    The level of detail for logging trace entries. The possible values are ERROR, DEBUG_MIN, DEBUG_MID, and DEBUG_MAX. Higher levels give you more detailed information, which is useful for investigating any problems or errors that occur. Default: ERROR.

    Package Level Trace String
    A regular expression string describing the classes to trace at a specified JLog level. Default: /com/ibm/tivoli/ccm/config\\.*:ERROR.

Event Manager
The Event Manager controls the flow through (forwarding to Netcool/OMNIbus Probe for Tivoli EIF and Simple Mail Transfer Protocol) and the storage of received events.

    Event Cache Time
    The number of hours that events are retained in the local cache, up to 96 hours. Default: 24 hours.

    If the Event list is very long, consider reducing the number of hours that are kept. Or, if you want to see events over a weekend period, increase the number of hours.

    EIF Port
    The port to use for Event Integration Facility operations. Default: 5151.

    EIF Event Target(s)
    The list of host names or IP addresses to which all received Event Integration Facility events are forwarded. For example, if you are forwarding events to the Netcool/OMNIbus Probe for Tivoli EIF, enter the fully qualified host name or IP address of the computer where the probe is installed. Separate each host name with a comma (,), such as 9.87.65.111,9.12.34.115,myhostname.en.ibm.com.
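
A comma-separated target list like the one above is straightforward to split into individual hosts, which can be handy for checking a value before it is pasted into the field. This Python sketch is only an illustration with a hypothetical function name. Ignoring surrounding spaces and empty entries is a choice made here, not documented product behavior.

```python
def parse_eif_targets(target_list):
    """Split a comma-separated list of host names or IP addresses."""
    targets = [entry.strip() for entry in target_list.split(",")]
    return [entry for entry in targets if entry]


targets = parse_eif_targets("9.87.65.111,9.12.34.115,myhostname.en.ibm.com")
print(targets)
```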

SMTP PortThe port number to use for sending SMTP (Simple Mail TransferProtocol) email. Default: 25.

Sending Email AccountThe account to use when sending an SMTP email event.

Sending Email Account PasswordThe password associated with the sending email account.

SMTP Server AddressThe host name of the SMTP server that is used for sending eventsas emails, such as smtp.gmail.com.

SSL SMTP PortThe port number to use for sending SMTP email using the SSLprotocol. Default: 465.


Email Subject LineThe subject line text that will be applied to every forwarded SMTPemail event. Default: CCM Event.

Target Email AddressesThe list of one or more email addresses to which events areforwarded. Separate each address with a comma (,), such [email protected],[email protected],[email protected].

Use SSLWhether to use SSL as the SMTP transport mechanism. Default:False.

HTTP Interface
The HTTP interface is used with the Agent Service Interface to communicate with the data sources. The timeout values that govern the display of the indicator in the Group Dashboard and Group Details dashboard are controlled here. If the indicator is frequently displayed and then not displayed, consider increasing the timeout and retry values.

Connection Timeout
The amount of time, in seconds, before an HTTP connection attempt fails. Default: 2 seconds.

Read Timeout
The amount of time, in seconds, before an HTTP read attempt fails. Default: 4 seconds.
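The difference between the two timeouts can be sketched with plain Python sockets: one limit applies while the connection is being established, and a separate limit applies to each subsequent read. The function name open_with_timeouts is hypothetical, and the sketch does not reproduce the Agent Service Interface; only the default values (2 and 4 seconds) come from the settings above.

```python
import socket


def open_with_timeouts(host, port, connect_timeout=2.0, read_timeout=4.0):
    """Open a TCP connection with separate connect and read timeouts.

    Hypothetical helper for illustration only.
    """
    # The connection attempt fails if it takes longer than connect_timeout.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Once connected, each read fails if no data arrives within read_timeout.
    sock.settimeout(read_timeout)
    return sock


# Example use (requires a reachable host, so it is shown as a comment):
# sock = open_with_timeouts("agent-host.example.com", 80)
# sock.close()
```

Raising either value gives a slow data source more time to respond before it is treated as unreachable.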

Fabric Node Transport
Setup information for the proper operation of the communications used between fabric nodes.

Port
The IP port used for communications between fabric nodes. The fabric node must be restarted if the port is changed. Default: 32105.

SDP Adapter
The Socket Direct Protocol (SDP) adapter is responsible for providing the interface between IBM SmartCloud Monitoring - Application Insight and the underlying cloud platform.

Discovery Interval
The time to wait, in seconds, between discovery cycles. Default: 30 seconds.

Discovery Plug-ins
The list of discovery plug-ins to be started. Valid values are: SCP, VMWARE, and AMAZON. Default: VMWARE.

SDP Adapter - Amazon Plug-in
An SDP adapter that specifically supports integration with the Amazon EC2 environment.

Amazon Access Key
The access key that is associated with the Amazon Web Service account.

Amazon Regional End-point
The URL that represents the entry point for the Amazon Web Service.

Amazon Secret Key
The secret key that is associated with the Amazon Web Service account.

SDP Adapter - SCP Plug-in
A SmartCloud Provisioning (SCP) plug-in that provides support for the IBM SmartCloud Provisioning environment.

Access ID
The ID used for SmartCloud Provisioning authentication.

Private Key
The text form of the private key.

SCP Service Region
The region that defines the SmartCloud Provisioning service type. Default: query

SCP Requests Timeout
The amount of time, in seconds, until a request to SmartCloud Provisioning times out. Default: 30 seconds.

SCP Web Service Host
The SmartCloud Provisioning Web Service host name.

SCP Web Service Port
The SmartCloud Provisioning Web Service port. Default: 5678.

SDP Adapter - VMware Plug-in
An SDP adapter that provides support for the VMware environment.

Virtual Center Password
The password for the Virtual Center User Name.

Use SSL
Whether to use an SSL connection to the Virtual Center. Default: false.

Virtual Center User Name
A user ID that has sufficient privileges to collect monitoring data.

Validate Certificates
A flag that indicates whether certificate host names must be validated. Default: False.

Virtual Center Host
The Virtual Center host name.

Virtual Center Port
The Virtual Center port. Default: 80 when SSL is disabled; 443 when SSL is enabled.
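The dependency between Use SSL and the default Virtual Center Port can be expressed as a small rule. The following Python sketch is illustrative only; the function name virtual_center_port and its parameters are hypothetical.

```python
from typing import Optional


def virtual_center_port(use_ssl: bool, configured_port: Optional[int] = None) -> int:
    """Return the configured port, or the documented default for the SSL setting."""
    if configured_port is not None:
        return configured_port
    # Documented defaults: 443 when SSL is enabled, 80 when it is disabled.
    return 443 if use_ssl else 80


print(virtual_center_port(use_ssl=False))
print(virtual_center_port(use_ssl=True))
```

An explicitly configured port always takes precedence over the default in this sketch.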

Truststore
Internal security key repository.

Java™ Keystore File
The fully qualified path of the keystore file containing the collection of certificates trusted by this client. Default: /opt/ibm/ccm/CCM-Certs/ccm.truststore.

Java Keystore File Password
The password to access the keystore file that contains the collection of certificates trusted by this client.

Managed Virtual Machines Agent Credentials
The default user ID and password to be tried for all data sources that are used to collect data.

Default Password
The default password to be tried for all data sources that are used to collect data.

Default User ID
The default user ID to be tried for all data sources that are used to collect data.

Data sources
The types of applications that you can monitor in IBM SmartCloud Monitoring - Application Insight are your data sources.

The dashboards display metrics from both supported (such as the Linux OS monitoring agent) and unsupported (such as the DB2 monitoring agent) data sources. The supported data sources have predefined thresholds and historical data collections to help you get started with monitoring your consumer VM instances as soon as you log on to the Application Insight console.

Linux OS data source
The Linux OS data source has a set of predefined thresholds and historical data collections to help you get started with monitoring your Linux OS data sources.

Groups
The home dashboard initially displays one group called All Resources. Expand All Resources to see a collapsed row of all the data sources in your monitored environment, including Linux OS. Expand Linux OS to see all the Linux OS data sources, named for their VM instance name and the two-character product code of the Linux OS monitoring agent: LZ.

Click Settings > Group Configuration to create and edit groups that include one or more Linux OS data sources, as described in “Group Configuration” on page 5.

Thresholds
The Linux OS data source has two predefined thresholds:

Linux_CPU_Utilization_High
This threshold is written for the KLZ CPU data set. An event is opened if the aggregate CPU usage is at 80% or more.

Linux_Memory_Utilization_High
This threshold is written for the KLZ VM Stats data set. An event is opened if the memory swap space used is more than 40%.

You can define additional thresholds using these and other Linux OS data sets. Click Settings > Threshold Manager to see the existing Linux OS thresholds and to create and edit thresholds, as described in “Threshold Manager” on page 6. All thresholds for the Linux OS data source are applied to all data sources of the same type.
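The two predefined Linux OS conditions can be sketched as simple comparisons. This Python example is an illustration under stated assumptions: the function and parameter names are hypothetical, and the real thresholds are evaluated by the product against the KLZ data sets, not by code like this.

```python
from typing import List


def linux_threshold_events(cpu_usage_pct: float, swap_used_pct: float) -> List[str]:
    """Return the names of the predefined thresholds that would open an event."""
    events = []
    # Aggregate CPU usage at 80% or more.
    if cpu_usage_pct >= 80:
        events.append("Linux_CPU_Utilization_High")
    # Swap space used strictly more than 40%.
    if swap_used_pct > 40:
        events.append("Linux_Memory_Utilization_High")
    return events


print(linux_threshold_events(cpu_usage_pct=85.0, swap_used_pct=12.0))
```

Note the different boundary behavior: the CPU condition includes exactly 80%, whereas the swap condition excludes exactly 40%.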

To learn more about the Linux OS data sets and the metrics that comprise them, see “Linux OS Attributes” (http://pic.dhe.ibm.com/infocenter/tivihelp/v61r1/topic/com.ibm.itm.doc_6.3/oslinux/attr_lz_overview.htm) in the IBM Tivoli Monitoring: Linux OS Agent User's Guide.



Historical data collection configuration
The Linux OS data source has predefined historical data collections for the following data sets:
- KLZ_CPU
- KLZ_CPU_Averages
- KLZ_Disk
- KLZ_Disk_IO
- KLZ_Disk_Usage_Trends
- KLZ_Network
- KLZ_System_Statistics
- KLZ_User_Logon
- KLZ_VM_Stats
- Linux_Group

Historical data samples are saved for each of these data sets every five minutes at the data source for three days (72 hours) before the oldest samples are deleted to make room for the new data samples. The Linux OS Details: DataSource:LZ dashboard shows historical data for the selected data source.
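With the default values, the retention works out to 72 hours × 60 minutes ÷ 5 minutes = 864 samples per data set. The Python sketch below models that behavior with a fixed-size buffer that discards the oldest sample when a new one arrives. The buffer is an assumption used for illustration; the guide does not describe how the data source stores its samples.

```python
from collections import deque

SAMPLE_INTERVAL_MINUTES = 5   # default collection interval
RETENTION_HOURS = 72          # default retention (three days)

# Number of samples kept per data set with the default values.
MAX_SAMPLES = RETENTION_HOURS * 60 // SAMPLE_INTERVAL_MINUTES

# A deque with maxlen drops the oldest entry automatically when full.
history = deque(maxlen=MAX_SAMPLES)
for sample_number in range(1000):
    history.append(sample_number)

print(MAX_SAMPLES)
print(len(history), history[0])
```

After 1000 samples, only the most recent 864 remain, so the oldest retained sample is number 136.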

Click Settings > Historical Data Configuration to see and configure Linux OS historical data collections, as described in “Historical Data Configuration” on page 8.

Linux OS Details tab and dashboard
In the Group Details dashboard, you have a tab for the Linux OS data type showing metrics for the Linux OS data sources in the group. From here, you can link to a more detailed set of Linux OS metrics from the selected VM.

Use the Linux OS tab of the Group Details: Group dashboard to get general system information for each Linux OS data source in the selected group. From either table, you can open the Linux Details dashboard by clicking the hypertext link for a data source in the Name column.

Application Summary
The Application Summary table lists the Linux OS data sources in the group. The Status indicator reflects the highest severity threshold event on the data source, such as for a Major event. Use the metrics to see the overall health of the system:
- The Up Time column shows the number of hours and minutes since the Linux OS monitoring was started on the VM instance.
- CPU, Memory, and Disk are percentage linear gauges of these metrics so that you can quickly detect unusually high or low usage.
- Network Errors Per Minute with a high rate can indicate problems such as network congestion.

Event Summary
The event summary table is like the one shown in the Overview tab, except that it has one row for each data source. The rows are sorted from the data source with the highest severity events to the one with the lowest. For a representation of the table, see Table 1 on page 3.


Open the Linux OS Details: Data source dashboard from the expanded Group Dashboard or from the Group Details Linux OS tab to get detailed operating system metrics and event status. The following tabs are displayed:

Details
The Details tab has bar charts showing the top 5 processes that are consuming CPU and memory. If CPU or memory usage is high, it is likely that one or more of these processes is responsible.

The line chart plots the last 24 hours of historical CPU Utilization, Memory Utilization, and Disk Utilization data samples from the data source. If fewer than 24 hours of historical data collection has occurred, a shorter period is displayed. The data samples are taken every 5 minutes or as set in the historical collection definition for a set of metrics.

Events
The events that have been opened for this Linux OS data source are listed here in order from highest severity to lowest. You can see at a glance the issues that might be affecting the health of the Linux OS data source.
- The State column indicates whether the event is open or closed and the Time column shows when the state change occurred.
- The Event column shows the threshold name for an open event and the data source name for a closed event.
- The Description column shows the threshold name followed by the condition in square brackets. If the metric is a scaled value, such as percentage, the number shown in the condition is the number you entered multiplied according to the scale. For example, if the condition is idle_cpu > 5, it shows as idle_cpu > 500 because the scale for that metric is 2 (the value is multiplied by 10 to the power of 2).

Select the radio button for an event to see the current values of all the data set metrics in the Event Details table.
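The scaling in the Description column can be reproduced with a one-line calculation. In this Python sketch, the scale is treated as a power of ten, which matches the idle_cpu example (5 becomes 500 with a scale of 2); the function name displayed_condition is hypothetical.

```python
def displayed_condition(metric: str, operator: str, entered_value: int, scale: int) -> str:
    """Return the condition text as it would appear with the scaled value."""
    # A scale of 2 multiplies the entered value by 10**2 = 100.
    scaled_value = entered_value * 10 ** scale
    return f"{metric} {operator} {scaled_value}"


print(displayed_condition("idle_cpu", ">", 5, 2))
```

To read a displayed condition back, divide the number shown by the same power of ten.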

Extended Data
The Extended Data tab enables you to select any of the data sets for the Linux OS data type to see a table of current data samples from all the metrics in that data set. This is the most granular data available, and is useful when you want to see the exact values for a given metric on the selected data source for a VM.

Related reference:
“Page layout and controls” on page 4

Response Time data source
The Response Time data source has a set of predefined thresholds and historical data collections to help you get started with monitoring your Response Time data sources.

Groups
The home dashboard initially displays one group called All Resources. Expand All Resources to see a collapsed row of all the data sources in your monitored environment, including Response Time. Expand Response Time to see all the Response Time data sources, named for their VM instance name and the two-character product code of the Response Time monitoring agent: T5.

Click Settings > Group Configuration to create and edit groups that include one or more Response Time data sources, as described in “Group Configuration” on page 5.

Thresholds
The Response Time data source has predefined thresholds, all of which are written for the WRT Transaction Status data set:

Response_Time_Warning
A moderate percentage of the web transactions have a slow response time. The threshold evaluates to true if Percent_Slow is between 1% and 5% and the Percent_Available is 100%.

Response_Time_Availability_Warn
Tests for a failure rate of under 10%, which indicates that a moderately high percentage of web transactions have failed.

Response_Time_Availability_Crit
Tests for a failure rate of over 10%, which indicates that a high percentage of web transactions have failed.

You can define additional thresholds using these and other Response Time data sets. Click Settings > Threshold Manager to see the existing Response Time thresholds and to create and edit thresholds, as described in “Threshold Manager” on page 6. All Response Time thresholds are applied to all data sources of the same type.
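The three predefined Response Time conditions can be approximated as follows. This Python sketch rests on several assumptions that the descriptions above do not settle: the 1% and 5% bounds are treated as inclusive, the Warn threshold is assumed to require a failure rate greater than zero, and a failure rate of exactly 10% is left unmatched because neither description covers it. The function and parameter names are hypothetical.

```python
from typing import List


def response_time_events(percent_slow: float, percent_available: float,
                         percent_failed: float) -> List[str]:
    """Return the predefined thresholds that would evaluate to true (illustrative)."""
    events = []
    # Assumption: "between 1% and 5%" includes both end points.
    if 1 <= percent_slow <= 5 and percent_available == 100:
        events.append("Response_Time_Warning")
    # Assumption: a failure rate of zero does not raise the Warn threshold.
    if 0 < percent_failed < 10:
        events.append("Response_Time_Availability_Warn")
    elif percent_failed > 10:
        events.append("Response_Time_Availability_Crit")
    return events


print(response_time_events(percent_slow=3, percent_available=100, percent_failed=0))
```

Check the actual threshold definitions in the Threshold Manager before relying on any of these boundary choices.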

To learn more about the Response Time data sets and the metrics that comprise them, see “Response Time - Attributes listed alphabetically” (http://pic.dhe.ibm.com/infocenter/tivihelp/v63r1/topic/com.ibm.itcamt.doc_7.3/rt/Attributes/all_alpha.html) in the IBM Tivoli Composite Application Manager for Transactions User's Guide.

Historical data collection configuration
The Response Time data source has historical configuration defined and ready to begin data collection as soon as the monitoring agent on the data source is started. There are predefined historical data collections for the following data sets:
- WRT_Application_Status
- WRT_Application_Summary
- WRT_Transaction_Status
- WRT_Transaction_Summary

Historical data samples are saved for each of these data sets every five minutes at the data source for three days (72 hours) before the oldest samples are deleted to make room for the new data samples. The Response Time Details: DataSource:T5 dashboard shows historical data for the selected data source.

Click Settings > Historical Data Configuration to see and configure Response Time historical data collections, as described in “Historical Data Configuration” on page 8.

Response Time tab and dashboard
In the Group Details dashboard, you have a tab for the Response Time data type showing significant metrics for the data sources in the group. From here, you can link to a more detailed set of Response Time metrics from the selected application.



The application and transaction status use a five-minute interval for counting and averaging all of their values.

Use the Response Time tab of the Group Details: Group dashboard to get summary information for each Response Time data source in the selected group. From any table, you can open the Response Time Details dashboard by clicking the hypertext link for a data source in the Application Name column.

Application Summary
The Application Summary table lists the internet services on each data source in your monitored environment. Use the metrics to see the range of responses for the transaction. The Status indicator reflects the highest severity threshold event, such as for Critical events.
- Average Response Time (Seconds) is the average response time for a single transaction instance that was observed during the monitoring interval. During each monitoring interval, minimum, maximum, and average response times for the aggregate records are recorded.
- Requests Per Second is the rate at which transaction requests are coming into the server. The measurement is a calculation based on where in the five-minute period the data sampling occurs: the time when the interval began is subtracted from the sampling time to get the number of seconds, which is divided into the total number of requests since the interval began. The result is the requests per second rate.
- Percent Failed and Percent Slow are linear gauges showing the percentage of transactions whose requests were marked as Failed or Slow.
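The Requests Per Second calculation described above can be written out as a short Python sketch. The function name and the decision to return 0.0 when no time has elapsed are assumptions made for the example; the guide does not say how the product handles that case.

```python
from datetime import datetime


def requests_per_second(interval_start: datetime, sample_time: datetime,
                        total_requests: int) -> float:
    """Divide the requests counted so far by the seconds elapsed in the interval."""
    elapsed_seconds = (sample_time - interval_start).total_seconds()
    if elapsed_seconds <= 0:
        # Assumption: avoid dividing by zero at the very start of an interval.
        return 0.0
    return total_requests / elapsed_seconds


start = datetime(2013, 12, 21, 12, 0, 0)
sample = datetime(2013, 12, 21, 12, 2, 30)   # 150 seconds into the interval
print(requests_per_second(start, sample, 300))
```

In this example, 300 requests over 150 seconds give a rate of 2.0 requests per second.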

Transactions: Highest Failure Percentage
For each internet service, the table gives the same Average Response Time (Seconds), Requests Per Second, Percent Failed, and Percent Slow metrics that you see in the Application Summary table but calculated for the highest failure rate.

The table shows only failure rates that are 90% or higher within the five-minute interval. If the failure rate does not reach at least 90% during that time, the table shows a “No items to display” message rather than populating with data.

Transactions: Slowest Average Response Times
For each internet service, the table gives the same Average Response Time (Seconds), Requests Per Second, Percent Failed, and Percent Slow metrics that you see in the Application Summary table but calculated for the slowest average response times.

Use the Response Time For Application Name On VM Host_Name dashboard to see the Response Time metrics for the selected application and the events that have been opened for it. The following tabs are displayed:

Response Time
The Transactions table gives a breakdown by transaction of the Average Response Time (Seconds), Requests Per Second, Percent Failed, and Percent Slow metrics that you see in the Application Summary table (see Application Summary) but for the selected internet service.

Select the radio button for a transaction to update the Response Time in Seconds and Overall Volume charts with the values over time for the selected transaction.

Events
The Response Time events that have been opened for the data source are listed in order from highest severity to lowest. The State column indicates whether the event is open or closed and the Time column shows when the state change occurred.

Select the radio button for an event to see the current values of all the metrics in the data set for that threshold in the Event Details table.

Extended Data
In the Extended Data tab you can select any of the Response Time data sets to see a table of current data samples from all the metrics in that data set. This is the most granular data available, and is useful when you want to see the exact values for a given metric on the selected data source.

Linux OS
If monitoring for Linux OS is occurring on the same VM instance that Response Time monitoring is on, you also get a Linux OS tab, which shows the same metrics that you see in the Details tab of the Linux OS dashboard: bar charts showing the top 5 processes that are consuming CPU and memory; and a line chart showing the last 24 hours of data samples for CPU Utilization, Memory Utilization, and Disk Utilization.

For a description of the Response Time metrics, see “Response Time - Attributes listed alphabetically” (pic.dhe.ibm.com/infocenter/tivihelp/v63r1/topic/com.ibm.itcamt.doc_7.3/rt/Attributes/all_alpha.html) in the IBM Tivoli Composite Application Manager for Transactions User's Guide.

Related reference:
“Page layout and controls” on page 4

Unsupported data sources
If you have monitoring agents installed on the VMs in your environment that are not supported in the current release of IBM SmartCloud Monitoring - Application Insight, you can still see monitoring metrics in the dashboards.

You can also create and edit groups, thresholds, and historical data collections for unsupported agents.

Be aware that, because the monitoring agents are not supported, communications might be unreliable and the dashboard displays are limited. If you see the indicator in the Group Dashboard and Group Details dashboard, it means that there is a loss of communication with the monitoring agent. You might need to stop and restart the monitoring agent on the VM instance before communications can succeed.

Likewise, if you attempt to create or edit a group, threshold, or historical data collection and see no data sources of that type or get an error message, stop and restart the monitoring agent on the indicated VM instance, then try again.





