VMware Management Solution Technical Overview Iwan ‘e1’ Rahabok Senior Systems Consultant [email protected] | virtual-red-dot.blogspot.com | 9119-9226 VCAP-DCD

State of VMware Deployment in ASEAN

Architecture

Operation

Maturity?

Challenges after virtualisation:
• Doing 3x more with 3x less.
• Sharing of resources. Contention must be managed.
• Much less headroom for the future.
• Very easy to make changes.

“Amplified” and “Updated” Needs:
• Performance monitoring & troubleshooting
• Capacity planning
• Configuration management
• Change management
• Compliance

New Need: Chargeback

Before:

After:

200 physical servers (150 Prod, 50 Non Prod)

60 ESXi servers, serving 600 VM (200 Prod, 200 DR, 200 Test/Dev/etc)
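The before/after figures can be checked with quick arithmetic. A minimal sketch using only the numbers on this slide; the variable names are illustrative, and counting each physical server as one workload is an assumption.

```python
# Figures taken from the Before/After slide.
physical_servers_before = 200
esxi_hosts_after = 60
vms_after = 600

# Assumption: one workload per physical server before virtualisation.
workload_growth = vms_after / physical_servers_before        # 600 / 200
host_reduction = physical_servers_before / esxi_hosts_after  # 200 / 60
vms_per_host = vms_after / esxi_hosts_after                  # 600 / 60

print(f"{workload_growth:.1f}x more workloads on {host_reduction:.1f}x fewer hosts, "
      f"about {vms_per_host:.0f} VMs per host on average")
```

With an average of ten VMs sharing each host, contention between neighbours becomes something that has to be managed.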

Automation >< Orchestration

More Engineering More Management

Including physical

management

Private Cloud Self-Service Solution Bundle: IaaS

Infrastructure & Operations Performance, Capacity, Configuration

Security & Compliance vShield + VCM: Operational and Regulatory Compliance

IT Service Management Problem, incident, change, config

Application Management App Release + Performance

vCenter Operations

• vCenter Operations Standard: Performance Management (up to 1500 VM)
• vCenter Operations Advanced: + Capacity Management
• vCenter Operations Enterprise: + Configuration & Compliance Management (vCenter Configuration Manager), + Other VMware & 3rd Party Integrations (View, management, servers, storage)

Scope: vSphere, VMware Cloud / vCenter, Non-VMware (incl. physical) environments

Purpose Built Capacity Planning & Analysis
• Integrated capacity analysis and forecasting
• Decision support & automation via views, alerts, reports
• VM right sizing and capacity reclamation

Automated Configuration & Compliance
• Automated Patching and Provisioning
• Comprehensive change tracking to isolate root cause
• Single-click rollback to remediate and return to normal

Patented Performance Analytics
• Self-learning of “normal” performance conditions
• Service health baseline and trending
• Smart alerts of impending performance degradation

Automating Operations
• Progressive INTEGRATION
• Patented ANALYTICS
• Powerful VISUALIZATION

Performance + Capacity + Configuration

Comparing the Editions: Standard vs Enterprise

Scope
• Data Sources
  • Standard: vCenter x 1
  • Enterprise: any 3rd party monitoring tools’ time series data, change events, multiple vCenter Servers
• Objects
  • Standard: vCenter objects (i.e. Data Centers, Clusters, ESX Hosts, Datastores, VMs x 1500)
  • Enterprise: unlimited scope (i.e. Applications, Network Infrastructure, Storage, Hosts (ESX, Win, Linux, etc), VMs)
• Users
  • Standard: Infrastructure (e.g. VI Admins)
  • Enterprise: Operations, Infrastructure, Application Teams, Business Owners, CxOs

Function
• Dynamic Thresholds: Standard Yes, Enterprise Yes
• Performance Root Cause: Standard Yes, Enterprise Yes
• Proactive Alerting: Standard No, Enterprise Yes
• Customizable Dashboards: Standard No, Enterprise Yes
• Notifications: Standard No, Enterprise Yes

vCenter Operations – Standard Edition

vCenter Environment - Workload

• Workload Measures
  • Demand for resources vs. resources currently used
  • Result is a percentage of Workload
    • Low number is Good – Object has the resources it needs
    • Can go above 100% - Object is “Starving”
• Workload summarized across critical resources
  • CPU
  • Storage
  • Network
  • Memory
• Workload Details View
  • View the state of the Peer and Parent Objects and troubleshoot
    • Am I a victim or a villain?
    • Is this a population problem?
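The Workload percentage can be illustrated with a small sketch. The slide does not give the exact formula, so this assumes Workload is demand divided by the resources the object can actually use, and that the summary follows the most constrained resource. The function names and sample figures are hypothetical.

```python
from typing import Dict, Tuple


def workload_pct(demand: float, usable: float) -> float:
    """Workload as a percentage: demand relative to what the object can use."""
    if usable <= 0:
        raise ValueError("usable must be positive")
    return 100.0 * demand / usable


def overall_workload(per_resource: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return the most constrained resource and its Workload percentage.

    Assumption: the summary badge follows the worst individual resource.
    """
    scores = {name: workload_pct(d, u) for name, (d, u) in per_resource.items()}
    worst = max(scores, key=scores.get)
    return worst, scores[worst]


# Hypothetical VM: (demand, usable) per resource, in the resource's own unit.
vm = {
    "cpu": (1800.0, 2000.0),     # MHz
    "memory": (5.0, 4.0),        # GB: demand exceeds what the VM gets
    "network": (100.0, 1000.0),  # Mbps
    "storage": (300.0, 1000.0),  # IOPS
}
worst, score = overall_workload(vm)
print(worst, score)
```

A score above 100 for memory means the VM wants more than it receives, which is the "Starving" case from the slide.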

vCenter Environment - Health

• Health Measures
  • How normal is this object behaving: 0-100 (Higher is Healthier or Normal)
  • Learns dynamic ranges of “Normal” for each metric
  • Learns patterns of behavior and identifies metric abnormalities
  • Healthy = no abnormalities
• Health and Workload together
  • Workload High & Health High – Normal Behavior for this timeframe
  • Workload High & Health Low – Something is amiss!
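Reading Health and Workload together can be sketched as a simple triage rule. The cut-off values (80 for high Workload, 50 for low Health) are illustrative assumptions, not product defaults.

```python
def triage(workload: float, health: float,
           high_workload: float = 80.0, low_health: float = 50.0) -> str:
    """Combine the two badges into a first-pass reading (hypothetical cut-offs)."""
    busy = workload >= high_workload
    abnormal = health < low_health
    if busy and abnormal:
        return "busy and abnormal: something is amiss, investigate"
    if busy:
        return "busy but normal for this timeframe"
    if abnormal:
        return "acting differently than normal, not necessarily a problem"
    return "normal"


print(triage(workload=95, health=90))
print(triage(workload=95, health=20))
```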

Important Note: Low Health does not imply a problem. It tells you that the object is acting differently than normal.

Learn Normal Behavior and Identify Abnormalities

• Doesn’t assume IT data has a normal bell-shaped distribution
• Sophisticated Analytics – 8 different algorithms
• Learns your dynamic ranges of “Normal” without templates
• Learns patterns of behavior and identifies Abnormalities

BLUE LINE: Metric’s current value

GRAY BAR: Upper and lower band of Dynamic Threshold (“Normal”)

RED BAR: Breached Dynamic Threshold (“Abnormal”)
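The idea of a dynamic threshold can be sketched without any bell-curve assumption. The product's eight algorithms are not described here, so this stand-in simply learns a percentile band per hour of day; the function names, the 5th/95th percentile choice and the sample data are all assumptions.

```python
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank style percentile of a sorted, non-empty list (p in 0..100)."""
    idx = (p * (len(sorted_values) - 1) + 50) // 100  # integer round-half-up
    return sorted_values[idx]


def learn_bands(samples, low_p=5, high_p=95):
    """samples: iterable of (hour_of_day, value). Returns {hour: (lower, upper)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        ordered = sorted(values)
        bands[hour] = (percentile(ordered, low_p), percentile(ordered, high_p))
    return bands


def is_abnormal(bands, hour, value):
    """True when the current value (blue line) leaves the band (gray bar)."""
    lower, upper = bands[hour]
    return value < lower or value > upper


# Hypothetical history: busy at 09:00, quiet at 03:00.
history = [(9, v) for v in range(40, 61)] + [(3, v) for v in range(5, 16)]
bands = learn_bands(history)
print(bands[9], bands[3])
print(is_abnormal(bands, 9, 55), is_abnormal(bands, 3, 55))
```

The same reading of 55 is normal at 09:00 and abnormal at 03:00, which is the difference between a dynamic threshold and a single static one.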

vCenter Environment - Capacity

• Capacity
  • How much time before capacity runs out?
  • 0-100: Higher number, longer time.
  • Thresholds are user configurable
    • 30 Days Left = Red
    • 60 Days Left = Orange
    • Etc.
  • Unlike Workload, Capacity is long-term.
  • Capacity measured for critical resources
    • CPU, RAM, Storage, Network
  • Capacity Details View
    • Shows the chart and trend for each of the above resources
    • Denotes current state
    • Projected breach point and days left
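The "days left" projection can be sketched as a straight-line trend over daily usage. A linear fit is an assumption (the forecasting method is not stated here), the red and orange cut-offs come from the slide, and "green" is a placeholder for everything beyond them.

```python
def days_left(daily_usage, capacity):
    """Fit a least-squares line to daily usage and return days until it reaches capacity.

    Returns None when usage is flat or shrinking (no projected breach).
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    breach_day = (capacity - intercept) / slope   # projected breach point
    return max(0.0, breach_day - (n - 1))         # counted from the latest sample


def badge_colour(days, red=30, orange=60):
    """Map days left to a colour; the thresholds are user configurable."""
    if days is None or days > orange:
        return "green"
    return "red" if days <= red else "orange"


usage_gb = [500 + 5 * d for d in range(30)]   # hypothetical: grows 5 GB per day
remaining = days_left(usage_gb, capacity=1000)
print(remaining, badge_colour(remaining))
```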

Performance Visibility Across the Virtualized Datacenter

Full visibility up and down the datacenter stack

Aggregates 100s of metrics into 1 intelligent score

Drill into ESX server for further details

Intuitive, Web RIA-based user-friendly interface

Context sensitive object hierarchy

Breadcrumbs to track object hierarchy

Search and filter

Continuous, automatic learning of normal behavior for key metrics

Workload issue correlated to net I/O constraints

Quickly show Reservation vs Demand vs Usage

Drilldown to track changes

Diagnostics relative to parent, peer and child objects

Detailed display of events and health score changes

Visibility into Disk and Network IO performance

Disk subsystem performance details by datastores and LUNs

Network statistics for every NIC

Quiz: what’s the difference between Total & Host?

Quickly identify “suspect” performance metric

KPI history with timestamp to indicate root cause

Relative scores to prioritize any remediation efforts

Health tree with topology mapping

Top-down visibility into health changes

Time-series charts for individual metric

Individual performance metric details

Single view that correlates multiple metrics

Detailed list of all metrics indicating smart alerts

Visualisation quickly pinpoints hotspots

Single click drill down for further details

Storage

• Since all the datastores are on the same array, how do we quickly tell the relative workload generated by every one of them?

• For each of these datastores, how do we know the relative workload generated by the VM?

• For every VM, how do we know the latency is within a reasonable range?

• How do we show all the above data in “one chart”, without the need to show a lot of numbers?
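One way to answer all four questions in a single chart is a heatmap where each VM is a box grouped by datastore, sized by its share of the array's workload and coloured by latency. A minimal sketch of preparing that data; the sample rows, the use of IOPS as the workload measure and the latency cut-offs are assumptions.

```python
from collections import defaultdict

# Hypothetical samples: (datastore, vm, iops, latency_ms)
samples = [
    ("ds01", "vm-a", 900, 4.0),
    ("ds01", "vm-b", 100, 35.0),
    ("ds02", "vm-c", 500, 12.0),
]


def heatmap_cells(rows, warn_ms=10.0, crit_ms=30.0):
    """Group VMs by datastore; size = share of array IOPS, colour = latency band."""
    total_iops = sum(iops for _, _, iops, _ in rows)
    if total_iops == 0:
        raise ValueError("no workload to chart")
    groups = defaultdict(list)
    for datastore, vm, iops, latency in rows:
        colour = "red" if latency >= crit_ms else "yellow" if latency >= warn_ms else "green"
        groups[datastore].append({"vm": vm, "size": iops / total_iops, "colour": colour})
    return dict(groups)


cells = heatmap_cells(samples)
for datastore, boxes in cells.items():
    print(datastore, boxes)
```

Large boxes show who generates the load, and a small red box such as vm-b stands out as a possible victim without anyone having to read the numbers.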

Heatmap customisation

vCenter Operations Standard Architecture

Four Main Services: Collector, Analytics, Web, ActiveMQ

Bundled DB:
• PostgreSQL DB
• File-based DB (FSDB) for raw metric storage

Single Collector for vCenter. Embedded in appliance

vCenter Operations Standard Processing

1a: vCenter Collector collects metrics, topology & change events from vCenter (ongoing)

1b: Data stored in FSDB

2a: Analytics runs daily to determine hour-by-hour Dynamic Thresholds for the next 24 hours

2b: Full FSDB is scanned by the analytic algorithms to determine, per metric, the best match for the next 24-hour period

2c: Store metric Dynamic Thresholds data in PostgreSQL DB

3: Incoming data points are tested against Dynamic Threshold bands and used to calculate Health, Workload and Capacity

4: Results provided to UI: update “Badges”, provide Root Cause for Health scores, etc.
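The flow between the components can be sketched in a few functions. This is a toy model, not the appliance's implementation: in-memory dictionaries stand in for the FSDB and the threshold store, a per-hour min/max band stands in for the analytic algorithms, and the health formula is hypothetical.

```python
from collections import defaultdict

fsdb = defaultdict(list)   # stand-in for the FSDB: metric -> [(hour, value)]
threshold_db = {}          # stand-in for the threshold store: (metric, hour) -> (low, high)


def collect(metric, hour, value):
    """Steps 1a/1b: the collector stores each data point as it arrives."""
    fsdb[metric].append((hour, value))


def run_daily_analytics():
    """Steps 2a-2c: scan the full history and save hour-by-hour bands.

    Min/max per hour is a placeholder for the real algorithms.
    """
    for metric, points in fsdb.items():
        by_hour = defaultdict(list)
        for hour, value in points:
            by_hour[hour].append(value)
        for hour, values in by_hour.items():
            threshold_db[(metric, hour)] = (min(values), max(values))


def check_point(metric, hour, value):
    """Step 3: True when an incoming point falls outside its stored band."""
    low, high = threshold_db[(metric, hour)]
    return not (low <= value <= high)


def health_badge(abnormal_flags):
    """Step 4: hypothetical score, the share of metrics inside their bands."""
    flags = list(abnormal_flags)
    return round(100 * (1 - sum(flags) / len(flags)))


for day in range(7):                       # one week of 09:00 samples
    collect("cpu_usage", 9, 50 + day)
    collect("net_io", 9, 200 + day)
run_daily_analytics()
flags = [check_point("cpu_usage", 9, 53), check_point("net_io", 9, 900)]
print(flags, health_badge(flags))
```

Keeping the expensive scan in a daily job means each incoming point only needs a lookup and a comparison.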

VMware Configuration Manager

View available compliance templates

Report on overall compliance posture

Drill in to see detailed compliance analysis results

Pinpoint which systems failed which checks

Fix compliance violations
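The template, report, drill-in and fix steps can be sketched generically. This does not use the Configuration Manager API; the template, the setting names and the system data are hypothetical and only show the shape of the workflow.

```python
# Hypothetical compliance template: setting name -> required value.
template = {"ssh_enabled": False, "ntp_configured": True, "lockdown_mode": True}

# Hypothetical collected configuration per system.
systems = {
    "esx01": {"ssh_enabled": False, "ntp_configured": True, "lockdown_mode": True},
    "esx02": {"ssh_enabled": True, "ntp_configured": True, "lockdown_mode": False},
}


def failed_checks(config, rules):
    """Names of the rules a system does not satisfy (a missing setting counts as failed)."""
    return sorted(name for name, required in rules.items() if config.get(name) != required)


def compliance_report(all_systems, rules):
    """Overall posture as a percentage, plus the failed checks per system."""
    failures = {name: failed_checks(cfg, rules) for name, cfg in all_systems.items()}
    total = len(rules) * len(all_systems)
    failed = sum(len(f) for f in failures.values())
    return 100 * (total - failed) / total, failures


def remediate(config, rules, names):
    """Return a copy of the configuration with the failing settings corrected."""
    fixed = dict(config)
    for name in names:
        fixed[name] = rules[name]
    return fixed


posture, failures = compliance_report(systems, template)
print(round(posture, 1), failures)
systems["esx02"] = remediate(systems["esx02"], template, failures["esx02"])
print(compliance_report(systems, template))
```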

Thank you!
