VMware Management Solution: Technical Overview
Iwan ‘e1’ Rahabok | Senior Systems [email protected] | virtual-red-dot.blogspot.com | 9119-9226
VCAP-DCD
State of VMware Deployment in ASEAN
Architecture
Operation
Maturity?
Challenges after virtualisation:
• Doing 3x more with 3x less
• Sharing of resources: contention must be managed
• Much less headroom for the future
• Very easy to make changes
“Amplified” and “Updated” Needs:
• Performance monitoring & troubleshooting
• Capacity planning
• Configuration management
• Change management
• Compliance
New Need: Chargeback
Before: 200 physical servers (150 Prod, 50 Non Prod)
After: 60 ESXi servers, serving 600 VMs (200 Prod, 200 DR, 200 Test/Dev/etc.)
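The before-and-after figures can be checked with a little arithmetic. This is a minimal sketch: the function and key names are illustrative, and it assumes each VM counts as one workload comparable to a former physical server.

```python
def consolidation(physical_before: int, hosts_after: int, vms_after: int) -> dict:
    """Summarise a consolidation from before/after counts."""
    return {
        # How many more workloads are being run (assumes 1 VM ~ 1 old server).
        "workload_growth": vms_after / physical_before,
        # How many times fewer physical machines are carrying them.
        "host_reduction": physical_before / hosts_after,
        # Average density on each remaining host.
        "vms_per_host": vms_after / hosts_after,
    }


stats = consolidation(physical_before=200, hosts_after=60, vms_after=600)
print(stats)
```

With these numbers the environment runs 3x the workloads on roughly 3.3x fewer hosts, about 10 VMs per ESXi server, which is why contention and headroom become management concerns.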
Automation >< Orchestration
More Engineering More Management
Including physical management
1. Private Cloud: Self-Service Solution Bundle (IaaS)
2. Infrastructure & Operations: Performance, Capacity, Configuration
3. Security & Compliance: vShield + VCM (Operational and Regulatory Compliance)
4. IT Service Management: Problem, incident, change, config
5. Application Management: App Release + Performance
vCenter Operations
vCenter Operations Advanced
vCenter Operations Enterprise
+ Configuration & Compliance Management (vCenter Configuration Manager)
+ Other VMware & 3rd Party Integrations (View, management, servers, storage)
Non-VMware (incl. physical) environments
VMware Cloud / vCenter
vSphere
vCenter Operations Standard
Capacity Management
Performance Management
(up to 1500 VM)
Purpose Built Capacity Planning & Analysis
• Integrated capacity analysis and forecasting
• Decision support & automation via views, alerts, reports
• VM right sizing and capacity reclamation
Automated Configuration & Compliance
• Automated Patching and Provisioning
• Comprehensive change tracking to isolate root cause
• Single-click rollback to remediate and return to normal
Patented Performance Analytics
• Self-learning of “normal” performance conditions
• Service health baseline and trending
• Smart alerts of impending performance degradation
Automating Operations
Progressive INTEGRATION
Patented ANALYTICS
Powerful VISUALIZATION
Performance + Capacity + Configuration
Comparing the Editions
Category | Item | Standard | Enterprise
Scope | Data Sources | vCenter x 1 | Multiple vCenter Servers; any 3rd party monitoring tools’ time series data; change events
Scope | Objects | vCenter objects: Data Centers, Clusters, ESX Hosts, Datastores, VMs x 1500 | Unlimited scope: Applications, Network Infrastructure, Storage, Hosts (ESX, Win, Linux, etc.), VMs
Scope | Users | Infrastructure (e.g. VI Admins) | Operations, Infrastructure, Application Teams, Business Owners, CxOs
Function | Dynamic Thresholds | Yes | Yes
Function | Performance Root Cause | Yes | Yes
Function | Proactive Alerting | No | Yes
Function | Customizable Dashboards | No | Yes
Function | Notifications | No | Yes
vCenter Operations – Standard Edition
vCenter Environment - Workload
• Workload Measures
  • Demand for resources vs. Resources currently used
  • Result is a percentage of Workload
  • Low number is Good: Object has the resources it needs
  • Can go above 100%: Object is “Starving”
• Workload summarized across critical resources
  • CPU
  • Storage
  • Network
  • Memory
• Workload Details View
  • View the state of the Peer and Parent Objects and troubleshoot
  • Am I a victim or a villain?
  • Is this a population problem?
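The Workload measure can be sketched in a few lines. This is an illustration under stated assumptions, not the product's actual formula: it assumes workload is demand divided by the capacity available to the object, and that the summary across resources is simply the most constrained one. All names and sample figures are hypothetical.

```python
def workload_pct(demand: float, capacity: float) -> float:
    """Demand as a percentage of available capacity; may exceed 100."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * demand / capacity


def workload_badge(resources: dict) -> tuple:
    """Return (most constrained resource, its workload %).

    Assumption: the summary score is the highest per-resource workload.
    """
    scores = {name: workload_pct(d, c) for name, (d, c) in resources.items()}
    worst = max(scores, key=scores.get)
    return worst, scores[worst]


# Hypothetical VM: (demand, capacity) per critical resource.
vm = {
    "cpu": (2600.0, 2000.0),     # MHz demanded vs. MHz available
    "memory": (3.0, 4.0),        # GB
    "storage": (150.0, 500.0),   # IOPS
    "network": (40.0, 1000.0),   # Mbps
}
name, score = workload_badge(vm)
print(name, score)
```

Here the VM demands more CPU than it can get, so its workload exceeds 100% and it is "starving" on CPU even though the other resources are lightly used.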
vCenter Environment - Health
• Health Measures
  • How normally is this object behaving?
  • 0-100 (Higher is Healthier or Normal)
  • Learns dynamic ranges of “Normal” for each metric
  • Learns patterns of behavior and identifies metric abnormalities
  • Healthy = no abnormalities
• Health and Workload together
  • Workload High & Health High: Normal Behavior for this timeframe
  • Workload High & Health Low: Something is amiss!
Important Note: Low Health does not imply a problem. It tells you that the object is acting differently than normal.
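Reading Health and Workload together amounts to a small decision table. The sketch below is illustrative only: the cut-off values for "high" workload and "low" health are assumptions chosen for the example, not product defaults.

```python
def interpret(workload: float, health: float,
              high_workload: float = 80.0, low_health: float = 50.0) -> str:
    """Combine the two scores into a plain-language reading.

    The two cut-offs are hypothetical; tune them to your environment.
    """
    busy = workload >= high_workload
    unusual = health < low_health
    if busy and unusual:
        return "investigate: busy and behaving abnormally"
    if busy:
        return "busy, but normal for this timeframe"
    if unusual:
        return "acting differently than normal; not necessarily a problem"
    return "normal"


print(interpret(workload=90, health=95))
print(interpret(workload=90, health=20))
```

The third branch reflects the note that low Health alone only signals a deviation from learned behavior, not a confirmed fault.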
Learn Normal Behavior and Identify Abnormalities
• Doesn’t assume IT data has a normal bell-shaped distribution
• Sophisticated Analytics: 8 different algorithms
• Learns your dynamic ranges of “Normal” without templates
• Learns patterns of behavior and identifies Abnormalities
Chart legend:
• BLUE LINE: Metric’s Current Value
• GRAY BAR: Upper and Lower band of Dynamic Threshold (“Normal”)
• RED BAR: Breached Dynamic Threshold (“Abnormal”)
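One way to picture a threshold band that does not assume a bell-shaped distribution is to take empirical percentiles of a metric's own history. The sketch below is a stand-in for illustration only: the product uses several algorithms that are not described here, and the 5th/95th percentile choice and sample data are assumptions.

```python
import statistics


def dynamic_band(history, n=20):
    """Return (lower, upper) as the first and last of n-quantile cut points.

    With n=20 this is roughly the 5th and 95th percentile of the history,
    computed from the data itself rather than from a mean and deviation.
    """
    cuts = statistics.quantiles(history, n=n)
    return cuts[0], cuts[-1]


def classify(value, band):
    """'normal' inside the band (gray bar), 'abnormal' outside (red bar)."""
    lower, upper = band
    return "abnormal" if value < lower or value > upper else "normal"


# Hypothetical, right-skewed latency history in milliseconds.
history = [5, 6, 5, 7, 6, 5, 8, 6, 7, 5, 6, 12, 6, 5, 7, 6, 5, 6, 7, 6]
band = dynamic_band(history)
print(band, classify(6, band), classify(60, band))
```

Because the band comes from observed values, an occasional spike in the history widens the upper bound without pulling the lower bound below what the metric has actually done.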
vCenter Environment - Capacity
• Capacity
  • How much time before Capacity runs out?
  • 0-100: Higher number, longer time
  • Thresholds are user configurable
    • 30 Days Left = Red
    • 60 Days Left = Orange
    • Etc.
• Unlike Workload, Capacity is long-term
• Capacity measured for critical resources
  • CPU, RAM, Storage, Network
• Capacity Details View
  • Shows the chart and trend for each of the above resources
  • Denotes current state
  • Projected breach point and days left
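A projected breach point and "days left" figure can be approximated by fitting a trend to recent usage. This is a minimal sketch that assumes a straight-line (least-squares) trend over daily samples; the product's forecasting method is not described here, and the function names are hypothetical. The 30-day and 60-day colour thresholds follow the example values above.

```python
def days_left(usage_history, capacity):
    """Days until a least-squares trend through daily usage reaches capacity.

    Returns None when usage is flat or shrinking (no projected breach).
    """
    n = len(usage_history)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage_history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage_history))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    breach_day = (capacity - intercept) / slope
    # Count from the most recent sample (day n - 1), never below zero.
    return max(0.0, breach_day - (n - 1))


def capacity_colour(days, red_days=30, orange_days=60):
    """Map days left to a colour using user-configurable thresholds."""
    if days is None or days > orange_days:
        return "green"
    return "red" if days <= red_days else "orange"


usage = [100 + 2 * day for day in range(10)]   # grows 2 units per day
remaining = days_left(usage, capacity=200)
print(remaining, capacity_colour(remaining))
```

In this example usage reaches 200 on day 50 and the latest sample is day 9, so about 41 days remain, which falls in the orange band.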
Performance Visibility Across the Virtualized Datacenter
Full visibility up and down the datacenter stack
Aggregates 100s of metrics into 1 intelligent score
Drill into ESX server for further details
Intuitive, Web RIA-based user-friendly interface
Context sensitive object hierarchy
Breadcrumbs to track object hierarchy
Search and filter
Continuous, automatic learning of normal behavior for key metrics
Workload issue correlated to net I/O constraints
Quickly show Reservation vs Demand vs Usage
Drilldown to track changes
Diagnostics relative to parent, peer and child objects
Detailed display of events and health score changes
Visibility into Disk and Network IO performance
Disk subsystem performance details by datastores and LUNs
Network statistics for every NIC
Quiz: what’s the difference between Total & Host?
Quickly identify “suspect” performance metric
KPI history with timestamp to indicate root cause
Relative scores to prioritize any remediation efforts
Health tree with topology mapping
Top-down visibility into health changes
Time-series charts for individual metric
Individual performance metric details
Single view that correlates multiple metrics
Detailed list of all metrics indicating smart alerts
Visualisation quickly pinpoints hotspots
Single click drill down for further details
Storage
• Since all the datastores are on the same array, how do we quickly tell the relative workload generated by every one of them?
• For each of these datastores, how do we know the relative workload generated by each VM?
• For every VM, how do we know the latency is within a reasonable range?
• How do we show all the above data in “one chart”, without the need to show a lot of numbers?
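A heatmap answers these questions by encoding each value visually: group boxes by datastore, size each VM's box by the workload it generates, and colour it by latency. The sketch below only builds the data behind such a chart. The field names, sample VMs and latency cut-offs are assumptions for illustration, not product settings.

```python
from collections import defaultdict


def latency_colour(latency_ms, ok_ms=10.0, warn_ms=20.0):
    """Colour a cell by latency; both cut-offs are hypothetical."""
    if latency_ms <= ok_ms:
        return "green"
    if latency_ms <= warn_ms:
        return "yellow"
    return "red"


def build_heatmap(vms):
    """Group VMs by datastore; cell size = IOPS, cell colour = latency."""
    groups = defaultdict(list)
    for vm in vms:
        groups[vm["datastore"]].append({
            "name": vm["name"],
            "size": vm["iops"],
            "colour": latency_colour(vm["latency_ms"]),
        })
    # A datastore's box is as large as the workload of the VMs inside it.
    return {
        ds: {"size": sum(cell["size"] for cell in cells), "cells": cells}
        for ds, cells in groups.items()
    }


sample = [
    {"name": "vm-a", "datastore": "ds-1", "iops": 800, "latency_ms": 6.0},
    {"name": "vm-b", "datastore": "ds-1", "iops": 200, "latency_ms": 25.0},
    {"name": "vm-c", "datastore": "ds-2", "iops": 100, "latency_ms": 12.0},
]
heatmap = build_heatmap(sample)
print(heatmap)
```

Relative box sizes show which datastore and which VM generate the most load, and colour shows latency, so no numbers need to be printed on the chart itself.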
Heatmap customisation
vCenter Operations Standard Architecture
Four Main Services: Collector, Analytics, Web, ActiveMQ
Bundled DB:
• PostgreSQL DB
• File-based DB (FSDB) for raw metric storage
Single Collector for vCenter, embedded in the appliance
vCenter Operations Standard Processing
1a: vCenter Collector collects metrics, topology & change events from vCenter (ongoing)
1b: Data is stored in FSDB
2a: Analytics runs daily to determine hour-by-hour Dynamic Thresholds for the next 24 hours
2b: The full FSDB is scanned by the analytic algorithms to determine, per metric, the best match for the next 24-hour period
2c: Metric Dynamic Threshold data is stored in the PostgreSQL DB
3: Incoming data points are tested against Dynamic Threshold bands and used to calculate Health, Workload and Capacity
4: Results are provided to the UI: update “Badges”, provide Root Cause for Health scores, etc.
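The split between the daily threshold job and the continuous test of incoming points can be sketched as two functions. This is a simplified illustration: plain Python collections stand in for FSDB and the threshold store, and a per-hour min/max band stands in for the product's analytics, which are not described here. All names are hypothetical.

```python
from collections import defaultdict


def hourly_bands(stored_samples):
    """Daily job: scan stored (hour_of_day, value) samples for one metric
    and return {hour: (low, high)} for the next 24 hours.

    Assumption: the band is just the min and max seen for that hour.
    """
    by_hour = defaultdict(list)
    for hour, value in stored_samples:
        by_hour[hour].append(value)
    return {hour: (min(vals), max(vals)) for hour, vals in by_hour.items()}


def breaches(bands, incoming):
    """Ongoing step: test each incoming point against its hour's band."""
    out = []
    for hour, value in incoming:
        band = bands.get(hour)
        if band is None:
            continue  # no learned band for this hour yet
        low, high = band
        if not (low <= value <= high):
            out.append((hour, value))
    return out


# Stand-in for raw metric history: busy at 09:00, quiet at 03:00.
stored = [(9, 40), (9, 50), (9, 45), (3, 5), (3, 8)]
bands = hourly_bands(stored)
flagged = breaches(bands, [(9, 48), (3, 30)])
print(bands, flagged)
```

A value of 30 would be unremarkable at 09:00 but is flagged at 03:00, which is the point of computing thresholds hour by hour rather than once per metric.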
VMware Configuration Manager
View available compliance templates
Report on overall compliance posture
Drill in to see detailed compliance analysis results
Pinpoint which systems failed which checks
Fix compliance violations
Thank you!