Private Cloud: Sample Architectures for >1000 VM
Singapore, Oct 2011
Iwan 'e1' Rahabok, VCAP-DCD
virtual-red-dot.blogspot.com | tinyurl.com/SGP-User-Group
M: +65 9119-9226 | [email protected]
Slide 2
Purpose of This Document

There is a lot of talk about Cloud Computing. But what does it look like at a technical level? How do we really assure SLAs and offer 3 tiers of service? If I have 1000 VM, what does the architecture look like?

This is my personal opinion. Please don't take it as an official and formal VMware Inc recommendation; I'm not authorised to give one. Also, we should generally judge the content rather than the organisation/person behind it. A technical fact is a technical fact, regardless of who said it.

Technology changes. SSD disks, >10 core CPUs, FCoE, CNA, vStorage API, storage virtualisation, etc. will impact the design. A lot of new innovation is coming within the next 2 years. New modules/products from VMware & Ecosystem Partners will also impact the design.

This is just a sample. It is not a Reference Architecture, let alone a Detailed Blueprint. So please don't print it and follow it to the dot. It is for you to think about and tailor.

It is written for hands-on vSphere Admins who have attended the Design Workshop & ICM. You should be at least a VCP 5, preferably VCAP-DCD. There is no explanation of features; a lot of the design considerations are covered in the vSphere Design Workshop.

Folks, some disclaimer is needed, since I am an employee of VMware.
Slide 3
Table of Contents

Introduction
Requirements, Assumptions, Considerations, and Design Summary
vSphere Design: Data Center (Data Center, Cluster: DRS, HA, DPM, Resource Pool)
vSphere Design: Server (ESXi, physical host)
vSphere Design: Network
vSphere Design: Storage
vSphere Design: Security (vCenter roles/permissions)
vSphere Design: VM
vSphere Design: Management (Performance troubleshooting)
Slide 4
Design Methodology

Architecting a Private Cloud is not a sequential process. There are 6 components, and the components are inter-linked, like a mesh. In the >1000 VM category, where implementation takes >2 years, new vSphere releases will change the design.

Even the bigger picture is not sequential. Sometimes you may even have to leave Design and go back to Requirements or Budgeting. Again, there is no perfect answer; below is one example.

This entire document is about Design only. Operations is another big space; I have not taken into account Audit, Change Control, ITIL, etc. The steps are more like this:
Slide 5
Introduction
Slide 6
Assumptions

Assumptions are needed to avoid the infamous "It depends" answer:
The architecture for 50 VM differs from that for 500 VM, which in turn differs from that for 5000 VM.
A design for large VMs (16 vCPU, 128 GB) differs from a design for small VMs (1 vCPU, 1 GB).
A design for a Server farm differs from one for a Desktop farm.

This assumes 100% virtualised, not 99%. It is easier to have 1 platform than 2. Certain things in a company you should have only 1 of (email, directory, office suite, backup). Something as big as a platform should be standardised; that's why they are called platforms.

Out of the 1000 VM, we assume some will be:
Huge: 10 vCPU, 96 GB RAM, 10 TB storage.
Latency sensitive: 0.01 ms end-to-end latency.
Secret: holding company-secret data.

We assume there will be 50 databases, a mix of Oracle and SQL Server, plus other Oracle software (they are charged per cluster).

The design is forward looking: based on 10 GE networking, and assuming the Security team can be convinced on mixed-mode.
Slide 7
Assumptions used in this example

# VM the design needs to cater for: 750 Production, 2250 Non-Production (1:3 ratio); 5 data centers
Data Center: 2 large ones (Singapore + HK) with private connectivity; 5 small ones (to comply with country regulations)
# Desktops/Laptops: 10,000. With remote access + 2-factor authentication (RSA). Need offline VDI and iPad access
DMZ Zone / SSLF Zone: Yes/Yes. Intranet also zoned
Backup: Tape
Network standard: Cisco
ITIL Compliance: In place
Change Management: In place
Overall System Mgmt SW (BMC, CA, etc.): Yes, CA
Database: Oracle, SQL Server. Some have >5 TB databases
MSCS: Required
Audit Team: External & Internal
Oracle software (BEA, DB, etc.): Yes. A sub-cluster will be used
IT Organisation: Different teams for Server, Storage, Network, Security, Database, etc.
Fault Tolerance: Yes (for Tier 0)
Complex App dependency: Yes (some apps span >30 VM)
Slide 8
Application considerations

Type of VM, and its impact on design:

App that holds sensitive data: Should encrypt the data or the entire file system. vSphere 5 can't encrypt the vmdk file yet. If you encrypt the Guest OS, the backup product may not be able to do file-level backup. Should ensure no access by the MS AD Administrators group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement. Check partner products like Intel TXT and HyTrust. Should it be placed on a separate cluster, or even a separate vCenter?

A group of apps with a complex power-on sequence: I recommend setting the HA Isolation Response to shut down the VMs running on the isolated host. If they are shut down, powering them on may need App Owner involvement (especially if it needs manual intervention).

App that takes advantage of a specific CPU instruction set: Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers. EVC will not help, as it's only a mask. See speaker notes.

App that needs <0.01 ms end-to-end latency: Separate cluster.
Slide 9
Application considerations

Type of VM, and its impact on design:

App that requires a software dongle: The dongle must be attached to 1 ESX host; vSphere 4.1 adds this support. Best to use a network dongle. At the DR site, the same dongle must be provided too.

App with high IOPS: May need its own datastore with dedicated spindles. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores.

App that uses a very large block size: SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a GE link (see the sketch after this list). For such applications, FC or FCoE is a better protocol. Any application with a 1 MB block size can easily saturate a 1 GE link.

App with very large RAM or vCPU: This impacts DRS when an HA event occurs, as it needs a host that can house the VM. It will still boot so long as reservation is not set to a high number.

App that is very sensitive to time accuracy: Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds.
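
To see why block size matters: throughput is simply IOPS x block size. A quick PowerShell sketch using the slide's SharePoint numbers (values are illustrative):

  # Throughput = IOPS x block size
  $iops    = 400
  $blockKB = 256                       # SharePoint block size from the slide
  $MBps    = $iops * $blockKB / 1024   # 100 MB/s
  $Mbps    = $MBps * 8                 # 800 Mb/s -- close to saturating 1 GbE
  "{0} IOPS x {1} KB = {2} MB/s ({3} Mb/s)" -f $iops, $blockKB, $MBps, $Mbps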
Slide 10
Architecting a private cloud: what to consider

Architecture is an art: balancing a lot of things, some of which are not even technical. It considers the future (unknown requirements), while trying to stay close to best practice. Not in any particular order, below is what I consider in this vSphere-based architecture.

My personal principle: do not design something you cannot troubleshoot. A good IT Architect does not set up potential risks for the Support person down the line. Not all counters/metrics/info are visible in vSphere.

Consideration: Upgradability
This is unique to the virtual world, and a key component of cloud that people have not talked about much. Once all my apps run on virtual infrastructure, how do I upgrade the virtualisation layer itself? Based on historical data, VMware releases a major upgrade every 2-3 years: vSphere 4.0 was released in May 2009, 5.0 in Sep 2011. If you are laying down an architecture, check with your VMware rep for an NDA roadmap presentation.

Consideration: Debugability
Troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared. There are 3 types of troubleshooting:
Configuration. This does not normally happen in production; once something is configured, it is not normally changed.
Stability. Stability means something hangs, crashes (BSOD, PSOD, etc.) or gets corrupted.
Performance. This is the hardest of the 3, especially if the slow performance is short-lived and the system performs well most of the time.
Slide 11
Architecting a private cloud: what to consider

Consideration: Supportability
This is related to, but not the same as, Debugability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising the logs via syslog and providing intelligent search (e.g. using Splunk or Integrien) improves Supportability. A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world; consider this operational/psychological impact in your design. Support also means using components that are supported by the vendors. For example, SAP support starts from certain versions onwards (old versions are not supported).

Consideration: Availability
Software has bugs. Hardware has faults. We mostly cater for hardware faults; what about software bugs? Cater for software bugs too, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster, while keeping your critical VMs on the other cluster. A Tier 0 can be added that uses fault-tolerant hardware (e.g. Stratus).

Consideration: Reliability
Related to availability, but not the same. Availability is normally achieved by redundancy. Reliability is normally achieved by keeping things simple, using proven components, separating things, and standardising. You will notice a lot of standardisation in the design. The drawback of standardisation is overhead, as we have to round up to the next bracket: a VM that needs 6 GB RAM ends up getting 8 GB.

Consideration: Performance
Storage, Network, VMkernel, VMM, Guest OS, etc. are all considered.
Slide 16

Special scenarios

There are scenarios where you might need to create a separate cluster, or even a separate vCenter:

Large VM (>6 vCPU, >36 GB RAM): Separate cluster, as the ESXi host spec is different?
Databases: Do you group them into 1 cluster (to save licences, give DBAs more access to the cluster, vShield group)? Or put them together with the apps they support? Do you put DBs used by IT in the same cluster as DBs used by the Business?
Oracle VM: Separate cluster or sub-cluster?
VM that needs a hardware dongle: Use a network-based dongle. Separate sub-cluster. Will also need the same at the DR site.
VM holding company secrets: Do you put them in a separate cluster? Can you trust the vCenter Admin? Do you put them on a separate datastore? Do you use VSA because you can't trust the SAN Admin? Enhance the security with vShield.
VM needing 0.1 ms network latency: Do you put them in a separate cluster, as your ESXi has to be configured differently?
VM needing 5 ms disk latency.
VM in the DMZ zone: Same cluster. We will use vShield.
Slide 17
Overall Architecture

This shows an example of a Cloud for >500 VM. It also uses Active/Passive data centers. The overall architecture remains similar to the Large Cloud.
Slide 18
Cluster Design (1 DC)
Slide 19
The need for a Non-Prod Cluster

This is unique to the virtual data center; we don't have Clusters to begin with in a physical DC. A Non-Prod Cluster serves multiple purposes:

Run Non-Production VMs. In our design, all Non-Production runs at the DR Site to save cost. A consequence of our design is that migrating from/to Production can mean copying large data across the WAN.
Disaster Recovery.
Test-bed for infrastructure patching or updates.
Test-bed for infrastructure upgrade or expansion.
Evaluating or implementing new features. In a Virtual Data Centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield. All of the above need proper testing; the Non-Prod Cluster should provide a sufficiently large-scale scope to make testing meaningful.
Upgrade of the core virtual infrastructure, e.g. from vSphere 5 to a future major release. This needs extensive testing and a roll-back plan.

Even with all the above, how are you going to test SRM properly? An SRM test needs 2 vCenters, 2 arrays, 2 SRM servers. If all are used in production, then where is the test environment for SRM?

This new layer (between Business and IT) does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.
Slide 20
The need for an IT Cluster

A special-purpose cluster running all the IT VMs used to manage the virtual DC or provide core services. The central management will reside here too. It is separated for ease of management & security: this separation keeps the Business Cluster clean, strictly for business.

IT VMs in the Large Cloud:
VMware: vCenter (for Server Cloud), vCenter Heartbeat, vCenter Update Manager, Symantec AppHA Server, vCloud Director
Storage: Storage management tool (may need a physical RDM to get fabric info)
Network: Network management tool, Nexus 1000V Manager (VSM)
Core Infra: MS AD 1, MS AD 2, Syslog server, File Server (FTP Server)
Advanced vDC Services: Site Recovery Manager + DB, Chargeback + DB, Agentless AV, Object-based Firewall
Security: Security Management Server, vShield Manager
Admin: Admin client (1 per Sys Admin), VMware Converter, vMA, vCenter Orchestrator
Application Mgmt: App Dependency Manager
Management: vCenter Ops + DB, Help Desk
Desktop: View Managers + DB, ThinApp Update Server, vCenter (for Desktop Cloud)
Slide 21
Cluster Size

I recommend 8 nodes per cluster. Why 8, not 4 or 12 or 16 or 32? It is a balance between too small (4 hosts) and too large (>12 hosts).

DRS: 8 gives DRS sufficient hosts to maneuver. 4 is rather small from the DRS scheduler's point of view. With vSphere 4.1, having 4 hosts does not give enough hosts to do a sub-cluster. For cost reasons, some clusters can be as small as 2 nodes, but then the DPM benefit can't be used.

Best practice for a cluster is the same hardware spec with the same CPU frequency: it eliminates the risk of incompatibility, gives consistent performance (from the user's point of view), and complies with Fault Tolerance & VMware View best practices. So more than 8 means it is more difficult/costly to keep them all the same: you need to buy 8 hosts at a time, upgrading >8 servers at a time is expensive ($$) and complex, and a lot of VMs will be impacted when you upgrade >8 hosts.

Manageability: Too many hosts are harder to manage (patching, performance troubleshooting, too many VMs per cluster, HW upgrades). 8 nodes allow us to isolate 1 host for VM-troubleshooting purposes; at 4 nodes, we can't afford such luxury. Too many paths to a LUN can be complex to manage and troubleshoot. Normally, a LUN is shared by 2 adjacent clusters. 1 ESX host has 4 paths, so 8 hosts mean 32 paths, and 2 clusters mean 64 paths. This is a rather high number (if you compare with the physical world).

N+2 for Tier 1 and N+1 for others: With 8 hosts, you can withstand 2 host failures if you design for it. At 4 nodes, N+2 is too expensive, as the payload is only 50%.

Small cluster size: From an availability and performance point of view, this is rather risky. Say you have a 3-node cluster. You are doing maintenance on Host 1 and suddenly Host 2 goes down: you are exposed with just 1 node. Assuming HA Admission Control is enabled (which it should be), the affected VMs may not even boot: when a host is placed into maintenance mode, or disconnected for that matter, it is taken out of the admission control calculation. Cost: too few hosts results in overhead (the spare host). The capacity math is sketched below; see also the slide notes for more details.
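
A back-of-envelope PowerShell sketch of usable capacity after reserving failover hosts (cluster sizes are assumed examples):

  # Usable capacity left after reserving hosts for N+1 and N+2 failover
  foreach ($hostCount in 2, 4, 8, 16) {
      $n1 = ($hostCount - 1) / $hostCount * 100
      $n2 = ($hostCount - 2) / $hostCount * 100
      "{0,2} hosts: N+1 leaves {1,4:N0}% usable, N+2 leaves {2,4:N0}%" -f $hostCount, $n1, $n2
  }
  # At 4 hosts, N+2 leaves only 50% usable -- hence "too expensive" above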
Slide 22
3-Tier cluster

The host spec can be identical, but the service can be very different. Below is an example of a 3-tier cluster:

Tier 1: 5 hosts (always). Node spec: always identical. Failure tolerance: 2 hosts. MSCS: yes. Max 18 VMs. Monitoring: application level, extensive alerts. Remarks: only for critical apps; no resource overcommit.
Tier 2: 4-8 hosts. Node spec: maybe identical. Failure tolerance: 1 host. MSCS: limited. 10 VMs per (N-1) host. Remarks: apps can be vMotioned to Tier 1 during a critical run.
Tier 3: 4-8 hosts. Node spec: not identical. Failure tolerance: 1 host. MSCS: no. 15 VMs per (N-1) host. Monitoring: infrastructure level, minimal alerts. Remarks: some resource overcommit.
Slide 23
ESXi Host: CPU Sizing

2-4 vCPU per physical core. This is a general guideline, not meant for sizing Tier 1 applications; Tier 1 apps should be given 1:1 sizing. It is more applicable to Test/Dev or Tier 3. A 12-core box therefore supports 24-48 vCPUs.

Design with ~10 VM per box in Production and ~15 VM per box in Non-Production:
~10 VM per box means the impact of downtime when a host fails is capped at ~10 Production VMs.
~10 VM per box in an 8-node cluster means those ~10 VMs may be able to boot across the remaining 7 hosts in an HA event, hence reducing downtime.
Based on a 10:1 consolidation ratio, if all your VMs are 3 vCPU, then you need 30 vCPU, which on a 12-core ESX host gives a 2.5:1 CPU oversubscription (worked below). Based on a 15:1 consolidation ratio, if all your VMs are 2 vCPU, then you also need 30 vCPU.

Buffer for the following: HA events; performance isolation; hardware maintenance; peaks (month end, quarter end, year end); future requirements within 12 months; and DR, if your cluster needs to run VMs from the Production site.
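
The oversubscription arithmetic from above, as a small PowerShell sketch (illustrative values only):

  # vCPU oversubscription for a 12-core host at a 10:1 consolidation ratio
  $cores     = 12
  $vmPerHost = 10      # target VM per host in Production
  $vcpuPerVM = 3       # assumed average VM size
  $ratio     = ($vmPerHost * $vcpuPerVM) / $cores
  "Oversubscription: {0}:1" -f $ratio   # 2.5:1, within the 2-4 vCPU/core guideline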
Slide 24
ESXi: Sample host specification

Estimated hardware cost: US$8K per ESXi host. Configuration included in the above price:
2x Xeon X5650 (the E series has different performance & price attributes)
72 GB RAM (18 slots x 4 GB) or 96 GB RAM (12 slots x 8 GB)
2x 10 GE ports (no hardware iSCSI)
2x 8 Gb FC HBA
5-year warranty (next business day)
2x 50 GB SSD: for the swap-to-host-cache feature in ESXi 5, for running agent VMs that are IO intensive, and handy during troubleshooting. Only 1 HD is needed, as it's for troubleshooting purposes.
PXE boot, so no local disk needed
Installation service
Lights-Out Management: avoid using WoL; use IPMI or HP iLO.
Slide 25
Blade or Rack

Both are good; both have pros and cons. The table below is a relative comparison, not an absolute one. Consult the principal for a specific model; this is just a guideline, and the comparison is only for vSphere purposes, not for other use cases, say HPC or non-VMware.

Blade, relative advantages:
Some blades come with built-in 2x 10 GE ports; to use them, you just need to get a 10 GE switch.
Flexibility: some blades virtualise the 10 GE NIC and can slice it. As usual, adding another layer adds complexity.
Less cabling.
Better power efficiency, rack-space efficiency, and cooling efficiency. The larger fan (4 RU) is better than the small fan (2 RU) used in rack servers.
Some blades can be stateless; the management software can clone 1 ESX to another.
Better management.

Rack, relative advantages:
A typical 2RU rack server normally comes with 4 built-in ports.
Better suited for ...
Slide 26
Server Selection

All Tier 1 vendors (HP, Dell, IBM, Cisco, etc.) make great ESXi hosts, hence the following guidelines are relatively minor compared to the base spec. Additional guidelines for selecting an ESXi server:

Does it have Embedded ESXi?
How much local SSD (capacity and IOPS) can it handle? This is useful for a stateless desktop architecture, and when using local SSD as cache or virtual storage.
Does it have built-in 2x 10 GE ports?
Does the built-in NIC have hardware iSCSI capability?
Memory cost. Most ESXi servers have around 64-128 GB of RAM, mostly around 72 GB. With 4 GB DIMMs, that needs a lot of DIMM slots.
What are the server's unique features for ESXi?
Management integration. The majority of server vendors have integrated management with vCenter. Most are free; Dell's is not free, although it has more features?
DPM support?
Slide 27
SAN Boot

There are 4 methods of ESXi boot: local Compact Flash, local disk, LAN boot (PXE) with Auto Deploy, and SAN boot.

For the 3 sample sizes, we use ESXi Embedded. Environments with >20 ESXi hosts should consider Auto Deploy (a sketch follows below). Auto Deploy is also good for environments where you need to prove to the security team that your ESXi has not been tampered with (you can simply reboot it and it is back to normal).

Advantages of local disk over SAN boot: no SAN complexity (with SAN boot you need to label the LUNs properly). Disadvantages of local disk: you need 2 local disks, mirrored, and certain organisations do not like local disk; a disk is a moving part with lower MTBF, and removing it saves power/cooling.

SAN boot is a step toward stateless ESXi. An ideal ESX host is just pure CPU and RAM: no disk, no PCI card, no identity.
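
For the Auto Deploy option, a minimal PowerCLI sketch using the vSphere 5 Auto Deploy cmdlets (the depot path, image-profile name, and host pattern are hypothetical examples):

  # Register an Auto Deploy rule that PXE-boots matching hosts with a given image
  Add-EsxSoftwareDepot 'C:\depot\VMware-ESXi-5.0-depot.zip'   # offline bundle (example path)
  $profile = Get-EsxImageProfile -Name 'ESXi-5.0.0-standard'
  New-DeployRule -Name 'ProdHosts' -Item $profile -Pattern 'vendor=Dell Inc.'
  Add-DeployRule -DeployRule 'ProdHosts'                      # activate the rule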
Slide 28
Storage Design
Slide 29
Methodology

The flow: SLA -> Datastore -> VM -> Mapping -> QoS.

Define the standard (storage-driven profile). For each VM, gather: capacity (GB), performance (IOPS) requirements, and importance to business (Tier 1, 2, 3). Define the datastore profile (see next slide for detail). Map each VM to a datastore, and map Clusters to Datastores. Create another datastore if one is insufficient (in either capacity or performance).

Once mapping is done, turn on QoS if needed:
Turn on Storage IO Control if a particular VM needs a certain guarantee.
Turn on Storage IO Control if we want fairness among all VMs within the datastore.
Storage IO Control is per datastore. If the underlying LUN shares spindles with other LUNs, then it may not achieve the desired result. Consult your storage vendor on this, as they have whole-array visibility/control.
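
A hedged PowerCLI sketch for the QoS step (the vCenter name and datastore naming pattern are assumptions based on this design's standards):

  # Enable Storage I/O Control on all Tier 1 datastores
  Connect-VIServer -Server vcenter01.corp.local   # hypothetical vCenter
  Get-Datastore -Name 'Tier1-*' |
      Set-Datastore -StorageIOControlEnabled:$true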
Slide 30
SLA: Types of Datastores

Not all datastores are equal. Always know the underlying IOPS & SLA that the array can provide for a given datastore. You should always know where to place a VM; use datastore groups. Always have a mental picture of where your Tier 1 VMs reside; it can't be "somewhere in the cloud".

Types of datastore:
Business VM: Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM. Each tier may have multiple datastores.
DMZ VM: mounted only by ESX hosts that have the DMZ network?
IT VM
Isolated VM
Template
Desktop VM
SRM Placeholder
Datastore Heartbeat: do we dedicate datastores for it?

1 datastore = 1 LUN. Relative to 1 LUN = many VMFS volumes, this gives better performance due to less SCSI reservation.
Slide 31
Special Purpose Datastores

1 low-cost datastore for ISOs and Templates:
Need 1 per vCenter, and 1 per physical Data Center; otherwise you will transfer GBs of data across the WAN.
Around 500 GB.
ISO directory structure: \ISO\, \OS\Windows, \OS\Linux, \Non OS\ (store things like anti-virus, utilities, etc.)

1 staging/troubleshooting datastore:
To isolate a VM; proof to the Apps team that the datastore is not affected by other VMs.
For storage performance studies or issues; makes it easier to correlate with data from the array.
The underlying spindles should have enough IOPS & size for the single VM. Our sizing: 500 GB.

1 SRM Placeholder datastore:
So you always know where it is; sharing with another datastore may confuse others.
Used in SRM 5 to place the VM's metadata so it can be seen in vCenter.
10 GB is enough. Low performance.
Slide 32
SLA: 3-Tier pools of storage

Create 3 tiers of storage. These become the types of Storage Pool provided to VMs, and pave the way for standardisation: 1 size for each tier. Keep it consistent; choose an easy number. Keep 20% free capacity for VM swap files, snapshots, logs, thin-volume growth, and Storage vMotion (inter-tier). Use thin provisioning at the array level, not the ESX level. Separate Production and Non-Production.

In this example, replication is to the DR site via array replication, not to the same building. Snapshot = protected with array-level snapshots for fast restore. The RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally). VMDKs larger than 1 TB will be provisioned as RDM, in virtual compatibility mode unless the app dictates otherwise.

Tier 1: FC, 3000 IOPS, 10 ms latency, RAID 10, RPO 1 hour, RTO 1 hour with SRM, 1.0 TB size, 70% limit, replicated hourly, snapshots yes, ~10 VMs, EagerZeroedThick.
Tier 2: FC, 2000 IOPS, 15 ms latency, RAID 5, RPO 4 hours, RTO 4 hours with SRM, 2.0 TB size, 80% limit, replicated 4-hourly, no snapshots, ~20 VMs, normal thick.
Tier 3: FC, 1000 IOPS, 20 ms latency, RAID 5, RPO 8 hours, 3.0 TB size, 80% limit, not replicated, ~30 VMs, thin provisioned.

Consult your storage vendor for array-specific design.
Slide 33
3-Tier Storage?

Below is a sample diagram showing disk grouping inside an array. The array has 48 disks. Hot spares are not shown, for simplicity. This example has only 1 RAID Group (2+2), also for simplicity.

Design considerations:
Datastore 1 and Datastore 2 can impact one another's performance, as they share physical spindles. The only way they don't impact each other is if there is a Share and Reservation concept at the meta-slice level.
Datastores 3, 4, 5 and 6 can impact one another's performance.
DS 1 and DS 3 can impact each other, since they share the same controller (or SP). This contention happens if the shared component becomes the bottleneck (e.g. cache, RAM, CPU). The only way to prevent it is to implement Share or Reservation at the SP level.
Slide 34
Mapping: Cluster - Datastore

Always know which cluster mounts which datastores. Keep the diagram simple, without too much info; the idea is to have a mental picture that you can remember. If your diagram has too many lines, too many datastores, or too many clusters, then it may be too complex. Create a Pod when that happens; modularisation can be good. A PowerCLI sketch for listing the mapping follows.
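
A minimal PowerCLI sketch (assuming a standard vCenter connection) that prints which datastores each cluster mounts:

  # Cluster -> datastore mapping, deduplicated per cluster
  foreach ($cluster in Get-Cluster) {
      Get-VMHost -Location $cluster | Get-Datastore |
          Sort-Object Name -Unique |
          ForEach-Object { "{0} -> {1}" -f $cluster.Name, $_.Name }
  }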
Slide 35
Mapping: Datastore Replication
Slide 36
Mapping: Datastore - VM

Criteria to use when placing a VM into a tier:
How critical is the VM? Its importance to the business.
What are its performance and availability requirements?
What are its point-in-time restoration requirements?
What are its backup requirements?
What are its replication requirements?

Have a document that lists which VM resides on which datastore group (as shown below). The content can be generated using PowerCLI or Orchestrator, showing datastores and their VMs. While it rarely happens, you can't rule out datastore metadata getting corrupted.

A VM normally changes tiers throughout its life cycle. Criticality is relative and might change for a variety of reasons, including changes in the organisation, operational processes, regulatory requirements, disaster planning, and so on. Be prepared to do Storage vMotion.

Sample table columns: Datastore Group | VM Name | Size (GB) | IOPS. Example total: 12 VMs, 1 TB, 1400 IOPS.
Slide 37
Storage Calculation

We will split the System Drive and Data Drive. This enables changing the OS by swapping the C:\ vmdk file. We use 10 GB for C:\ to cater for Win08 and give space for defragmentation. We use thin provisioning, preferably at the array level.

The sample calculation below is for our small cloud: 30 Production VMs (26 non-DB + 3 DB + 1 File Server) and 15 Non-Production VMs. Non-DB VM: 100 GB on average. DB VM: 500 GB on average. File server VM: 2 TB.

Production (non-DB): Average D:\ drive is 100 GB. Space needed: 2.6 TB; datastores: 3 x 1 TB each. IOPS: 100 IOPS x 26 VM = 2600 IOPS. Consult the storage team if this is too high; what if other workloads have a similar peak period? The estimate is on the high side, so we don't have to add buffer for swap files, snapshots, or VMFS/NFS overhead.
Production (DB): Average D:\ drive is 500 GB. Space needed: 1.5 TB, tiered within the Production datastores. IOPS: 500 IOPS x 3 VM = 1500 IOPS. On the high side.
IT Cluster: Average D:\ drive is 50 GB; the File Server is 2 TB. IOPS: 100 IOPS per VM, 300 IOPS for the File Server. On the high side.
Non-Production: Average D:\ drive is 100 GB. Space needed: 1.5 TB; datastores: 2 x 1 TB. IOPS: 100 IOPS x 15 VM. On the high side.
Isolated VM: 200 GB, 200 IOPS.
Total: ~6.1 TB, ~6000 IOPS (sanity-checked below).
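
A quick PowerShell sanity check of the IOPS total (the per-VM IT-cluster IOPS are not totalled on the slide, so only the file server is counted there):

  # Re-add the IOPS figures from the sizing table
  $iops = @{
      'Prod non-DB' = 100 * 26    # 2600
      'Prod DB'     = 500 * 3     # 1500
      'IT cluster'  = 300         # file server only
      'Non-Prod'    = 100 * 15    # 1500
      'Isolated VM' = 200
  }
  ($iops.Values | Measure-Object -Sum).Sum   # 6100 -- matches the ~6000 total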
Slide 38
Reasons for FC (partial list)

A network issue does not create a storage issue, and troubleshooting storage does not mean troubleshooting the network too.
8 Gb vs 1 Gb (16 Gb vs 2 Gb in redundant mode). 10 GE is still expensive and needs the uplinks to change too. HP or Cisco blades may provide a good alternative here; consider the total TCO, not just cost per box.
FC vs IP: the FC protocol is more efficient & scalable than the IP protocol for storage. Path failover is also faster and more predictable.
Slide 44

Separation of Duties with vSphere

In the enterprise IT space, keep the VMware Admin separate from the AD Admin. The AD Admin has access to NTFS; this can be too powerful if it holds confidential data.

Segregate the virtual world: split vSphere access into 3 areas (Storage, Server, Network). Give Network to the Network team and Storage to the Storage team. A role with all access to vSphere should be rarely used. VM owners can be given some access that they don't have in the physical world; they will like the empowerment (self service). A role sketch follows.

(Diagram: roles in the vSphere space: VMware Admin, Networking Admin, Server Admin, Storage Admin, Operator, VM Owner; alongside enterprise roles: MS AD Admin, Storage Admin, Network Admin, DBA, Apps Admin.)
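
A hedged PowerCLI sketch of carving out a network-only role (privilege IDs vary by version; check Get-VIPrivilege output in your environment, and the role name here is hypothetical):

  # Build a role containing only network-related privileges
  Connect-VIServer -Server vcenter01.corp.local
  $netPrivs = Get-VIPrivilege | Where-Object { $_.Id -like 'Network.*' }
  New-VIRole -Name 'NetworkTeam' -Privilege $netPrivs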
Slide 45
Folders: Use Them Properly

Do not use Resource Pools to organise VMs. Caveat: the Hosts & Clusters view is the only view where you can see both ESX hosts and VMs. Study the hierarchy on the right: it is folders everywhere. Folders are the way to limit access. Certain objects don't have their own access control and rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch; to set permissions, create a folder on top of it, as sketched below.
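
A sketch of that dvSwitch workaround in PowerCLI (the folder, role, and AD group names are hypothetical):

  # A dvSwitch cannot carry permissions directly; create a network folder
  # above it and grant the permission on the folder instead
  $parent    = Get-Folder -Name 'network' -Type Network   # hidden root network folder
  $netFolder = New-Folder -Name 'DvSwitches' -Location $parent
  New-VIPermission -Entity $netFolder -Role 'NetworkTeam' `
      -Principal 'CORP\NetworkAdmins' -Propagate:$true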
Slide 46
Compliance

How do we track changes made in vCenter by authorised staff? vCenter does not track configuration drift. Tools like vCenter Operations Enterprise provide some level of configuration management, but not all of it.
Slide 47
VM Design
Slide 48
Standard VM sizing: Follow McDonald's

1 VM = 1 App = 1 purpose. No bundling of services: having multiple applications or services in 1 OS tends to create more problems. The Apps team knows this better.

Start with the Small size, especially for CPU & RAM.

CPU: use as few virtual CPUs (vCPUs) as possible.
vCPU count impacts the scheduler, hence performance.
It is hard to take vCPUs back once you give them. Also, the app might be configured to match the processor count (you will not know unless you ask the application team).
Maintaining a consistent memory view among multiple vCPUs consumes resources.
There is a licensing impact if you assign more CPUs; vSphere 4.1 multi-core can help (always verify with the ISV).
Virtual CPUs that are not used still consume timer interrupts and execute the idle loop of the guest OS.
In the physical world, CPU tends to be oversized. Right-size it in the virtual world.

RAM: start with 1 GB, not 512 MB. Patches can be large (330 MB for XP SP3) and need RAM. Size impacts vMotion, ballooning, etc., so you want to trim the fat. The Tier 1 cluster should use Large Pages.

Anything above XL needs to be discussed case by case. Utilise Hot Add to start small (needs DC edition). See speaker notes for more info. A deployment sketch follows.

Sizes: Small VM: 1 CPU, 1 GB RAM, 50 GB disk. Medium VM: 2 CPU, 2 GB RAM, 100 GB disk. Large VM: 3 CPU, 4 GB RAM, 200 GB disk. Custom: 4-8 CPU; 8, 12, 16 GB RAM, etc.; 300, 400 GB disk, etc.
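
A minimal PowerCLI sketch for stamping out the Small standard size (the VM, cluster, and datastore names are placeholders):

  # Deploy a Small-size VM per the standard sizing table
  New-VM -Name 'app01' `
      -ResourcePool (Get-Cluster 'Tier3-Cluster') `
      -Datastore (Get-Datastore 'Tier3-DS01') `
      -NumCpu 1 -MemoryGB 1 -DiskGB 50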
Slide 49
Operational Excellence
Slide 50
Ownership

Where do you draw the line between Storage and vSphere? Who owns the following: VMFS and RDM, VMDK, Storage DRS, the virtual disk SCSI controller? Who can initiate Storage vMotion? Who decides the storage-related design in vSphere?

Where do you draw the line between Network and vSphere? Who decides which one to use: vSwitch, vDS, Nexus? Who decides on the network-related design in vSphere? Who troubleshoots network problems in vSphere?

Where do you draw the line between Security and vSphere?
Slide 51
Management in the Virtual World

A VM is very different from a physical machine.
Slide 52
Thank You