Private Cloud Sample Architectures for >1000 VM


  • Slide 1
  • © 2009 VMware Inc. All rights reserved. Confidential. Private Cloud Sample Architectures for >1000 VM. Singapore, Oct 2011. Iwan e1 Rahabok | virtual-red-dot.blogspot.com | tinyurl.com/SGP-User-Group | M: +65 9119-9226 | [email protected] | VCAP-DCD
  • Slide 2
  • Purpose of This Document
    - There is a lot of talk about Cloud Computing. But what does it look like at the technical level? How do we really assure SLAs and have 3 tiers of service? If I have 1000 VM, what does the architecture look like?
    - This is my personal opinion. Please don't take it as an official and formal VMware Inc recommendation; I'm not authorised to give one. Also, we should generally judge the content rather than the organisation/person behind it. A technical fact is a technical fact, regardless of who said it.
    - Technology changes. SSD disks, >10-core CPUs, FCoE, CNA, vStorage API, storage virtualisation, etc will impact the design. A lot of new innovation is coming within the next 2 years. New modules/products from VMware & Ecosystem Partners will also impact the design.
    - This is just a sample. Not a Reference Architecture, let alone a Detailed Blueprint. So please don't print it and follow it to the dot. This is for you to think about and tailor.
    - It is written for hands-on vSphere Admins who have attended the Design Workshop & ICM. You should be at least a VCP 5, preferably VCAP-DCD. No explanation of features; a lot of the design considerations are covered in the vSphere Design Workshop.
    - Folks, some disclaimers, since I am an employee of VMware.
  • Slide 3
  • Table of Contents
    - Introduction: Requirements, Assumptions, Considerations, and Design Summary
    - vSphere Design: Data Center — Data Center, Cluster (DRS, HA, DPM, Resource Pool)
    - vSphere Design: Server — ESXi, physical host
    - vSphere Design: Network
    - vSphere Design: Storage
    - vSphere Design: Security — vCenter roles/permissions
    - vSphere Design: VM
    - vSphere Design: Management — Performance troubleshooting
  • Slide 4
  • Design Methodology
    - Architecting a Private Cloud is not a sequential process. There are 6 components, and the components are inter-linked, like a mesh. In the >1000 VM category, where implementation takes >2 years, new vSphere releases will change the design.
    - Even the bigger picture is not sequential. Sometimes you may even have to leave Design and go back to Requirements or Budgeting. Again, there is no perfect answer; below is one example.
    - This entire document is about Design only. Operation is another big space. I have not taken into account Audit, Change Control, ITIL, etc.
    - The steps are more like this …
  • Slide 5
  • Introduction
  • Slide 6
  • Assumptions
    - Assumptions are needed to avoid the infamous "It depends" answer. The architecture for 50 VM differs from that for 500 VM, which in turn differs from that for 5000 VM. A design for large VMs (16 vCPU, 128 GB) differs from a design for small VMs (1 vCPU, 1 GB). A design for a Server farm differs from one for a Desktop farm.
    - This assumes 100% virtualised, not 99%. It is easier to have 1 platform than 2. Certain things in a company you should only have 1 of (email, directory, office suite, backup). Something as big as a platform should be standardised; that's why it's called a platform.
    - Out of the 1000 VM, we assume some will be: Huge (10 vCPU, 96 GB RAM, 10 TB storage); Latency sensitive (0.01 ms end-to-end latency); Secret (holding company-secret data).
    - We assume it will have 50 databases, a mix of Oracle and SQL, plus other Oracle software (charged per cluster).
    - The design is forward looking: based on 10 GE network. Assume the Security team can be convinced on mixed-mode.
  • Slide 7
  • Assumptions used in this example
    - # VM that our design needs to cater: 750 Production, 2250 Non-Production (1:3 ratio), across 5 data centers
    - Data Center: 2 large ones (Singapore + HK) with private connectivity; 5 small ones (to comply with country regulations)
    - # Desktops/Laptops: 10000, with remote access + 2FA (RSA); need offline VDI and iPad access
    - DMZ Zone / SSLF Zone: Yes / Yes. Intranet also zoned
    - Backup: Tape
    - Network standard: Cisco
    - ITIL Compliance: In place
    - Change Management: In place
    - Overall System Mgmt SW (BMC, CA, etc): Yes, CA
    - Database: Oracle, SQL. Some have >5 TB databases
    - MSCS: Required
    - Audit Team: External & Internal
    - Oracle software (BEA, DB, etc): Yes; a sub-cluster will be used
    - IT Organisation: Different teams for Server, Storage, Network, Security, Database, etc.
    - Fault Tolerance: Yes (for Tier 0)
    - Complex App dependency: Yes (some apps span >30 VM)
  • Slide 8
  • Application consideration (Type of VM → Impact on Design)
    - App that holds sensitive data: Should encrypt the data or the entire file system; vSphere 5 can't encrypt the vmdk file yet. If you encrypt the Guest OS, the backup product may not be able to do file-level backup. Should ensure no access by the MS AD Administrators group. Find out how it is backed up, and who has access to the tape. If IT does not even have access to the system, then vSphere may not pass the audit requirement; check partner products like Intel TXT and HyTrust. Should it be placed on a separate cluster, or even a separate vCenter?
    - A group of apps with a complex power-on sequence: I recommend setting the HA Isolation response to shut down the VMs running on the isolated host. If they are shut down, powering them on may need App Owner involvement (especially if it needs manual intervention).
    - App that takes advantage of a specific CPU instruction set: Mixing with an older CPU architecture is not possible. This is a small problem if you are buying new servers. EVC will not help, as it's only a mask. See speaker notes.
    - App that needs <0.01 ms end-to-end latency: Separate cluster.
  • Slide 9
  • Application consideration (Type of VM → Impact on Design)
    - App that requires a software dongle: The dongle must be attached to 1 ESXi host; vSphere 4.1 adds this support. Best to use a network dongle. At the DR site, the same dongle must be provided too.
    - App with high IOPS: May need its own datastore with dedicated spindles. There is no point having dedicated datastores if the underlying spindles are shared among multiple datastores.
    - Apps that use a very large block size: SharePoint uses a 256 KB block size, so a mere 400 IOPS will already saturate a GE link (see the sketch below). For such applications, FC or FCoE is a better protocol. Any application with a 1 MB block size can easily saturate a 1 GE link.
    - App with very large RAM or many vCPUs: This impacts DRS when an HA event occurs, as DRS needs a host that can house the VM. It will still boot so long as reservation is not set to a high number.
    - App that is very sensitive to time accuracy: Time drift is a possibility in the virtual world. Find out the business or technical impact if time deviates by 10 seconds.
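    To see why block size matters, multiply IOPS by block size to get the bandwidth a workload needs. A minimal PowerShell sketch using the slide's SharePoint figures:
      # Bandwidth needed = IOPS x block size (figures from the slide's SharePoint example)
      $iops    = 400
      $blockKB = 256
      $mbps    = $iops * $blockKB * 8 / 1000      # KB/s -> megabits/s
      "{0} IOPS x {1} KB blocks = {2} Mbps" -f $iops, $blockKB, $mbps
      # ~819 Mbps, close to the line rate of a 1 GE link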
  • Slide 10
  • Architecting a private cloud: what to consider
    - Architecture is an art: balancing a lot of things, some not even technical. It considers the future (unknown requirements) while trying to stay close to best practice. Not in any particular order, below is what I consider in this vSphere-based architecture.
    - My personal principle: do not design something you cannot troubleshoot. A good IT Architect does not set up potential risk for the Support person down the line. Not all counters/metrics/info are visible in vSphere.
    - Upgradability: this is unique to the virtual world, and a key component of cloud that people have not talked much about. After all my apps run on virtual infrastructure, how do I upgrade the virtualisation layer itself? Based on historical data, VMware releases a major upgrade every 2-3 years: vSphere 4.0 was released in May 2009, 5.0 in Sep 2011. If you are laying down an architecture, check with your VMware rep for an NDA roadmap presentation.
    - Debugability: troubleshooting in a virtual environment is harder than in a physical one, as boundaries are blurred and physical resources are shared. There are 3 types of troubleshooting: Configuration (does not normally happen in production, as once something is configured it is not normally changed); Stability (something hangs, crashes (BSOD, PSOD, etc) or is corrupted); Performance (the hardest of the 3, especially if the slow performance is short-lived and in most cases the app performs well).
  • Slide 11
  • Architecting a private cloud: what to consider
    - Supportability: related to, but not the same as, debugability. Support relates to things that make day-to-day support easier: monitoring counters, reading logs, setting up alerts, etc. For example, centralising logs via syslog and providing intelligent search (e.g. using Splunk or Integrien) improves supportability; a sketch follows below. A good design makes it harder for the Support team to make human errors. Virtualisation makes tasks easy, sometimes way too easy relative to the physical world; consider this operational/psychological impact in your design. Support also means using components that are supported by the vendors. For example, SAP support starts from certain versions onwards (old versions are not supported).
    - Availability: software has bugs; hardware has faults. We mostly cater for hardware faults. What about software bugs? Cater for software bugs too, which is why the design has 2 VMware clusters with 2 vCenters. This lets you test cluster-related features in one cluster while keeping your critical VMs on the other. A Tier 0 can be added that uses fault-tolerant hardware (e.g. Stratus).
    - Reliability: related to availability, but not the same. Availability is normally achieved by redundancy; reliability is normally achieved by keeping things simple, using proven components, separating things, and standardising. You will notice a lot of standardisation in the design. The drawback of standardisation is overhead, as we have to round up to the next bracket: a VM that needs 6 GB RAM ends up getting 8 GB.
    - Performance: Storage, Network, VMkernel, VMM, Guest OS, etc are considered. We are aiming for …
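    As a hedged illustration of the syslog centralisation mentioned under Supportability, PowerCLI can point every host at a central syslog server. The vCenter and syslog host names here are hypothetical:
      # Point all ESXi hosts at a central syslog server; 'syslog01' is a placeholder
      Connect-VIServer -Server 'vcenter01'     # hypothetical vCenter name
      Get-VMHost | Set-VMHostSysLogServer -SysLogServer 'syslog01' -SysLogServerPort 514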
  • Special scenarios
    - There are scenarios where you might need to create a separate cluster, or even a separate vCenter:
    - Large VM (>6 vCPU, >36 GB RAM): Separate cluster, as the ESXi host spec is different?
    - Databases: Do you group them into 1 cluster (to save licences, give DBAs more access to the cluster, vShield group)? Or put them together with the apps they support? Put DBs used by IT in the same cluster as DBs used by the Business?
    - Oracle VM: Separate cluster or sub-cluster?
    - VM that needs a hardware dongle: Use network-based. Separate sub-cluster. Will also need the same at the DR site.
    - VM holding company secrets: Do you put them in a separate cluster? Can you trust the vCenter Admin? Do you put them in a separate datastore? Do you use VSA since you can't trust the SAN Admin? Enhance the security with vShield.
    - VM with 0.1 ms network latency: Do you put them in a separate cluster, as your ESXi has to be configured differently?
    - VM with 5 ms disk latency
    - VM on DMZ zone: Same cluster. We will use vShield.
  • Slide 17
  • Overall Architecture
    - This shows an example of a Cloud for >500 VM. It also uses Active/Passive data centers. The overall architecture remains similar for a Large Cloud.
  • Slide 18
  • Cluster Design (1 DC)
  • Slide 19
  • The need for a Non-Prod Cluster
    - This is unique to the virtual data center; we don't have Clusters to begin with in a physical DC. A Non-Prod Cluster serves multiple purposes:
    - Run Non-Production VMs. In our design, all Non-Production runs on the DR Site to save cost. A consequence of our design is that migrating from/to Production can mean copying large data across the WAN.
    - Disaster Recovery.
    - Test-bed for infrastructure patching or updates.
    - Test-bed for infrastructure upgrade or expansion, and for evaluating or implementing new features. In the Virtual Data Centre, a lot of enhancements can impact the entire data centre, e.g. Distributed Switch, Nexus 1000V, Fault Tolerance, vShield. All of the above need proper testing; the Non-Prod Cluster should provide a sufficiently large scope to make testing meaningful.
    - Upgrade of the core virtual infrastructure, e.g. from vSphere 5 to a future major release. This needs extensive testing and a roll-back plan.
    - Even with all the above: how are you going to test SRM properly? An SRM test needs 2 vCenters, 2 arrays, 2 SRM servers. If all are used in production, then where is the test environment for SRM?
    - This new layer does not exist in the physical world. It is software, hence it needs its own Non-Prod environment.
  • Slide 20
  • The need for an IT Cluster
    - A special-purpose cluster running all the IT VMs used to manage the virtual DC or provide core services. The Central Management will reside here too.
    - Separated for ease of management & security. This separation keeps the Business Cluster clean, strictly for business.
    - Large Cloud contents:
      VMware: vCenter (for Server Cloud), vCenter Heartbeat, vCenter Update Manager, Symantec AppHA Server, vCloud Director
      Storage: Storage Mgmt tool (may need physical RDM to get fabric info)
      Network: Network Management Tool, Nexus 1000V Manager (VSM)
      Core Infra: MS AD 1, MS AD 2, Syslog server, File Server (FTP Server)
      Advanced vDC Services: Site Recovery Manager + DB, Chargeback + DB, Agentless AV, Object-based Firewall
      Security: Security Management Server, vShield Manager
      Admin: Admin client (1 per Sys Admin), VMware Converter, vMA, vCenter Orchestrator
      Application Mgmt: App Dependency Manager
      Management: vCenter Ops + DB, Help Desk
      Desktop: View Managers + DB, ThinApp Update Server, vCenter (for Desktop Cloud)
  • Slide 21
  • Cluster Size
    - I recommend 8 nodes per cluster. Why 8, not 4 or 12 or 16 or 32? It is a balance between too small (4 hosts) and too large (>12 hosts).
    - DRS: 8 gives DRS sufficient hosts to maneuver; 4 is rather small from the DRS scheduler's point of view. With vSphere 4.1, 4 hosts does not give enough hosts to do sub-clusters.
    - For cost reasons, some clusters can be as small as 2 nodes, but then the DPM benefit can't be used.
    - Best practice for a cluster is the same hardware spec with the same CPU frequency: it eliminates the risk of incompatibility, gives consistent performance (from the user's point of view), and complies with Fault Tolerance & VMware View best practices. So more than 8 means it's more difficult/costly to keep them all the same: you need to buy 8 hosts at a time, and upgrading >8 servers at a time is expensive ($$) and complex. A lot of VMs are impacted when you upgrade >8 hosts.
    - Manageability: too many hosts are harder to manage (patching, performance troubleshooting, too many VMs per cluster, HW upgrades). 8 allows us to isolate 1 host for VM-troubleshooting purposes; at 4 nodes we can't afford such luxury. Too many paths to a LUN can be complex to manage and troubleshoot: normally a LUN is shared by 2 adjacent clusters; 1 ESXi host is 4 paths, so 8 hosts is 32 paths and 2 clusters is 64 paths, which is a rather high number compared with the physical world (see the sketch below).
    - N+2 for Tier 1 and N+1 for others: with 8 hosts you can withstand 2 host failures if you design for it. At 4 nodes it is too expensive, as payload is only 50% at N+2.
    - Small cluster size: from an availability and performance point of view, this is rather risky. Say you have a 3-node cluster, you are doing maintenance on Host 1, and suddenly Host 2 goes down: you are exposed with just 1 node. Assuming HA Admission Control is enabled (which it should be), the affected VMs may not even boot, because a host placed into maintenance mode (or disconnected) is taken out of the admission-control calculation.
    - Cost: too few hosts results in overhead (the spare host). See slide notes for more details.
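    To verify the path arithmetic above (4 paths per host x 8 hosts = 32 paths to a LUN), a PowerCLI sketch; the cluster name is hypothetical:
      # Paths per LUN, per host; multiply by host count for the fabric-wide total
      Get-Cluster 'Prod-Cluster-01' | Get-VMHost | Get-ScsiLun -LunType disk |
        Select-Object @{N='Host';E={$_.VMHost.Name}}, CanonicalName,
                      @{N='Paths';E={($_ | Get-ScsiLunPath).Count}}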
  • Slide 22
  • 3-Tier cluster
    - The host spec can be identical, but the service can be very different. Below is an example of a 3-tier cluster; a sketch of enforcing the failure-tolerance column follows below.
      Tier   | # Host     | Node Spec        | Failure Tolerance | MSCS    | # VM         | Monitoring                           | Remarks
      Tier 1 | 5 (always) | Always identical | 2 hosts           | Yes     | Max 18       | Application level. Extensive alerts  | Only for Critical Apps. No resource overcommit.
      Tier 2 | 4-8        | Maybe            | 1 host            | Limited | 10 per (N-1) |                                      | App can be vMotioned to Tier 1 during critical runs.
      Tier 3 | 4-8        | No               | 1 host            | No      | 15 per (N-1) | Infrastructure level. Minimal alerts | Some resource overcommit.
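    The failure-tolerance column maps to HA admission control. A minimal PowerCLI sketch for the Tier 1 N+2 policy; the cluster name is hypothetical:
      # Reserve capacity for 2 host failures (N+2) on the Tier 1 cluster
      Get-Cluster 'Tier1-Cluster' |
        Set-Cluster -HAEnabled:$true -HAAdmissionControlEnabled:$true `
                    -HAFailoverLevel 2 -Confirm:$false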
  • Slide 23
  • ESXi Host: CPU Sizing
    - 2-4 vCPU per physical core. This is a general guideline, not meant for sizing Tier 1 applications; Tier 1 apps should be given 1:1 sizing. It is more applicable for Test/Dev or Tier 3. A 12-core box gives 24-48 vCPU.
    - Design with ~10 VM per box in Production and ~15 VM per box in Non-Production. ~10 VM per box means the impact of downtime when a host fails is capped at ~10 Production VMs. ~10 VM per box in an 8-node cluster means those ~10 VMs may be able to boot across the remaining 7 hosts in the event of HA, hence reducing downtime.
    - Based on a 10:1 consolidation ratio, if all your VMs are 3 vCPU, you need 30 vCPU, which on a 12-core ESXi host gives 2.5:1 CPU oversubscription (a sketch for checking this follows below). Based on a 15:1 consolidation ratio, if all your VMs are 2 vCPU, you need 30 vCPU.
    - Buffer for the following: HA events; performance isolation; hardware maintenance; peaks (month end, quarter end, year end); future requirements (within 12 months); DR, if your cluster needs to run VMs from the Production site.
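    A quick way to check the resulting oversubscription against the 2-4 vCPU per core guideline; the cluster name is hypothetical:
      # vCPU : physical-core ratio for one cluster
      $cluster = Get-Cluster 'Prod-Cluster-01'
      $pCores  = ($cluster | Get-VMHost | Measure-Object NumCpu -Sum).Sum
      $vCpus   = ($cluster | Get-VM | Where-Object PowerState -eq 'PoweredOn' |
                  Measure-Object NumCpu -Sum).Sum
      '{0:N1} vCPU per physical core' -f ($vCpus / $pCores)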
  • Slide 24
  • ESXi: Sample host specification
    - Estimated hardware cost: US$ 8K per ESXi host. Configuration included in the above price:
    - 2x Xeon X5650 (the E series has different performance & price attributes)
    - 72 GB RAM (18 slots x 4 GB) or 96 GB RAM (12 slots x 8 GB)
    - 2 x 10 GE ports (no hardware iSCSI)
    - 2 x 8 Gb FC HBA
    - 5-year warranty (next business day)
    - 2x 50 GB SSD: for the swap-to-host-cache feature in ESXi 5 and for running agent VMs that are IO intensive; could also be handy during troubleshooting. Only 1 HD is needed, as it's for troubleshooting purposes.
    - PXE boot: no need for local disk.
    - Installation service.
    - Lights-Out Management: avoid using WoL; use IPMI or HP iLO.
  • Slide 25
  • Blade or Rack
    - Both are good; both have pros and cons. The table below is a relative comparison, not an absolute one. Consult the principal for specific models; below is just a guideline. The comparison is only for vSphere purposes, not for other use cases, say HPC or non-VMware.
    - Blade, relative advantages: some blades come with built-in 2x 10 GE ports; to use them you just need a 10 GE switch. Flexibility: some blades virtualise the 10 GE NIC and can slice it (as usual, adding another layer adds complexity). Less cabling. Better power efficiency. Better rack-space efficiency. Better cooling efficiency: the larger fan (4 RU) is better than the small fan (2 RU) used in rack servers. Some blades can be stateless; the management software can clone 1 ESXi to another. Better management.
    - Rack, relative advantages: a typical 2RU rack server normally comes with 4 built-in ports. Better suited for …
  • Slide 26
  • Server Selection
    - All Tier 1 vendors (HP, Dell, IBM, Cisco, etc) make great ESXi hosts, hence the following guidelines are relatively minor compared to the base spec. Additional guidelines for selecting an ESXi server:
    - Does it have Embedded ESXi?
    - How much local SSD (capacity and IOPS) can it handle? This is useful for a stateless desktop architecture, and when using local SSD as cache or virtual storage.
    - Does it have built-in 2x 10 GE ports? Does the built-in NIC have hardware iSCSI capability?
    - Memory cost: most ESXi servers have around 64-128 GB of RAM, mostly around 72 GB. With 4 GB DIMMs, that needs a lot of DIMM slots.
    - What are the server's unique features for ESXi?
    - Management integration: the majority of server vendors have integrated management with vCenter. Most are free; Dell's is not free, although it has more features?
    - DPM support?
  • Slide 27
  • SAN Boot
    - 4 methods of ESXi boot: local Compact Flash, local disk, LAN boot (PXE) with Auto Deploy, SAN boot.
    - For the 3 sample sizes, we use ESXi Embedded. Environments with >20 ESXi hosts should consider Auto Deploy. Auto Deploy is also good for environments where you need to prove to the security team that your ESXi has not been tampered with (you can simply reboot it and it is back to normal).
    - Advantages of local disk over SAN boot: no SAN complexity (SAN boot needs the LUNs labelled properly).
    - Disadvantages of local disk over SAN boot: need 2 local disks, mirrored. Certain organisations do not like local disk: disk is a moving part, with lower MTBF, and going diskless saves power/cooling.
    - SAN boot is a step toward stateless ESXi. An ideal ESXi host is just pure CPU and RAM: no disk, no PCI card, no identity.
  • Slide 28
  • Storage Design
  • Slide 29
  • Methodology (SLA → Datastore → VM → Mapping → QoS)
    - SLA: define the standard (Storage Driven Profile).
    - Datastore: define the datastore profile. See the next slide for detail.
    - VM: for each VM, gather capacity (GB), performance (IOPS) requirements, and importance to business (Tier 1, 2, 3).
    - Mapping: map each VM to a datastore, and map Cluster to Datastore. Create another DS if insufficient (either capacity or performance).
    - QoS: once mapping is done, turn on QoS if needed. Turn on Storage IO Control if a particular VM needs a certain guarantee, or if we want fairness among all VMs within the DS (a sketch follows below). Storage IO Control is per datastore; if the underlying LUN shares spindles with all the other LUNs, it may not achieve the result. Consult your storage vendor on this, as they have whole-array visibility/control.
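    Where the QoS step calls for Storage IO Control, it is a single property per datastore in PowerCLI; the datastore name is hypothetical:
      # Enable Storage IO Control on a Tier 1 datastore for fairness under contention
      Get-Datastore 'Tier1-DS-01' | Set-Datastore -StorageIOControlEnabled $true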
  • Slide 30
  • SLA: Types of Datastores
    - Not all datastores are equal. Always know the underlying IOPS & SLA the array can provide for a given datastore. You should always know where to place a VM; use datastore groups. Always have a mental picture of where your Tier 1 VMs reside: it can't be just "somewhere in the cloud".
    - Types of datastore: Business VM (Tier 1 VM, Tier 2 VM, Tier 3 VM, Single VM; each tier may have multiple datastores); DMZ VM (mounted only by ESXi hosts that have the DMZ network?); IT VM; Isolated VM; Template; Desktop VM; SRM Placeholder; Datastore Heartbeat (do we dedicate datastores for it?).
    - 1 datastore = 1 LUN. Relative to 1 LUN = many VMFS, this gives better performance due to fewer SCSI reservations.
  • Slide 31
  • Special Purpose Datastores
    - 1 low-cost datastore for ISOs and Templates. Need 1 per vCenter and 1 per physical data center, else you will transfer GBs of data across the WAN. Around 500 GB. ISO directory structure: \ISO\, \OS\Windows, \OS\Linux, \Non OS\ (store things like anti-virus, utilities, etc).
    - 1 staging/troubleshooting datastore. To isolate a VM: prove to the Apps team that the datastore is not affected by other VMs. For storage performance studies or issues: makes it easier to correlate with data from the array. The underlying spindles should have enough IOPS & size for the single VM. Our sizing: 500 GB.
    - 1 SRM Placeholder datastore, so you always know where it is; sharing with other datastores may confuse others. Used in SRM 5 to place the VM's metadata so it can be seen in vCenter. 10 GB is enough; low performance.
  • Slide 32
  • SLA: 3-Tier pools of storage
    - Create 3 tiers of storage. These become the types of storage pool provided to VMs. This paves the way for standardisation: 1 size for each tier. Keep it consistent; choose an easy number.
    - 20% free capacity for VM swap files, snapshots, logs, thin-volume growth, and Storage vMotion (inter-tier). Use thin provisioning at the array level, not the ESXi level. Separate Production and Non-Production.
    - Example: replication is to the DR Site via array replication, not the same building. Snapshot = protected with array-level snapshots for fast restore. RAID level does not matter so much if the array has sufficient cache (battery-backed, naturally). RDM will be used for data drives >1 TB; virtual-compatibility mode is used unless the Apps team says otherwise. VMDKs larger than 1 TB will be provisioned as RDM.
      Tier | Interface | IOPS | Latency | RAID | RPO    | RTO              | Size   | Limit | Replicated    | Snapshot | # VM
      1    | FC        | 3000 | 10 ms   | 10   | 1 hour | 1 hour, with SRM | 1.0 TB | 70%   | Yes, hourly   | Yes      | ~10 VM. EagerZeroedThick
      2    | FC        | 2000 | 15 ms   | 5    | 4 hour | 4 hour, with SRM | 2.0 TB | 80%   | Yes, 4-hourly | No       | ~20 VM. Normal Thick
      3    | FC        | 1000 | 20 ms   | 5    | 8 hour |                  | 3.0 TB | 80%   | No            |          | ~30 VM. Thin Provision
    - Consult your storage vendor for array-specific design.
  • Slide 33
  • 3-Tier Storage?
    - Below is a sample diagram showing disk grouping inside an array. The array has 48 disks. Hot spares are not shown for simplicity. This example only has 1 RAID Group (2+2) for simplicity.
    - Design considerations: Datastore 1 and Datastore 2 can impact one another's performance, as they share physical spindles. The only way they don't impact each other is if there is a Share and Reservation concept at the meta-slice level. Datastores 3, 4, 5, 6 can impact one another's performance. DS 1 and DS 3 can impact each other since they share the same Controller (or SP); this contention happens if the shared component becomes the bottleneck (e.g. cache, RAM, CPU). The only way to prevent it is to implement Shares or Reservations at the SP level.
  • Slide 34
  • Mapping: Cluster - Datastore
    - Always know which cluster mounts which datastores. Keep the diagram simple, without too much info; the idea is to have a mental picture that you can remember (a generated inventory sketch follows below).
    - If your diagram has too many lines, too many datastores, or too many clusters, then it may be too complex. Create a Pod when that happens; modularisation can be good.
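    The mental picture can be backed by a generated inventory. A PowerCLI sketch listing which datastores each cluster's hosts mount:
      # One line per cluster: the datastores its hosts mount
      foreach ($cluster in Get-Cluster) {
        $ds = $cluster | Get-VMHost | Get-Datastore | Sort-Object Name -Unique
        '{0}: {1}' -f $cluster.Name, ($ds.Name -join ', ')
      }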
  • Slide 35
  • Mapping: Datastore Replication
  • Slide 36
  • Mapping: Datastore - VM
    - Criteria to use when placing a VM into a tier: How critical is the VM (importance to business)? What are its performance and availability requirements? What are its point-in-time restoration requirements? What are its backup requirements? What are its replication requirements?
    - Have a document that lists which VM resides on which datastore group. The content can be generated using PowerCLI or Orchestrator, showing datastores and their VMs (a sketch follows below). While it rarely happens, you can't rule out datastore metadata getting corrupted.
    - A VM normally changes tiers throughout its life cycle. Criticality is relative and might change for a variety of reasons, including changes in the organization, operational processes, regulatory requirements, disaster planning, and so on. Be prepared to do Storage vMotion.
      Datastore Group | VM Name | Size (GB) | IOPS
      Total           | 12 VM   | 1 TB      | 1400 IOPS
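    The slide already names PowerCLI for this; a minimal sketch that emits the datastore-to-VM list with provisioned size:
      # Datastore -> VM report; pipe to Export-Csv to keep it as the document
      foreach ($ds in Get-Datastore) {
        Get-VM -Datastore $ds |
          Select-Object @{N='Datastore';E={$ds.Name}}, Name, ProvisionedSpaceGB
      }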
  • Slide 37
  • Storage Calculation
    - We will split the System Drive and Data Drive. This enables changing the OS by swapping the C:\ vmdk file. We use 10 GB for C:\ to cater for Win08 and give space for defragmentation. We use thin provisioning, preferably at the array level.
    - The sample calculation below is for our small cloud: 30 Production VMs (26 non-DB + 3 DB + 1 File Server) and 15 Non-Production VMs. Non-DB VMs: 100 GB on average. DB VMs: 500 GB on average. File server VM: 2 TB.
      Production (non-DB): average D:\ drive is 100 GB. Space needed: 2.6 TB (datastores: 3 x 1 TB). IOPS: 100 x 26 VM = 2600. Consult with the storage team if this is too high; what if they have similar peak periods? This is on the high side, so we don't have to add buffer for swap files, snapshots, or VMFS/NFS buffer.
      Production (DB): average D:\ drive is 500 GB. Space needed: 1.5 TB (tiering in Production DS). IOPS: 500 x 3 VM = 1500. On the high side.
      IT Cluster: average D:\ drive is 50 GB; the File Server is 2 TB. IOPS: 100 per VM, 300 for the File Server. On the high side.
      Non-Production: average D:\ drive is 100 GB. Space needed: 1.5 TB (datastores: 2 x 1 TB). IOPS: 100 x 15 VM. On the high side.
      Isolated VM: 200 GB, 200 IOPS.
      Total: ~6.1 TB, ~6000 IOPS.
  • Slide 38
  • Reasons for FC (partial list)
    - A network issue does not create a storage issue, and troubleshooting storage does not mean troubleshooting the network too.
    - 8 Gb vs 1 Gb (16 Gb vs 2 Gb in redundant mode). 10 GE is still expensive and needs the uplinks to change too; HP or Cisco blades may provide a good alternative here. Consider the total TCO, not just cost per box.
    - FC vs IP: the FC protocol is more efficient & scalable than the IP protocol for storage. Path failover is …
  • Separation of Duties with vSphere
    - VMware Admin >< AD Admin. The AD Admin has access to NTFS; this can be too powerful if it holds confidential data.
    - Segregate the virtual world: split vSphere access into 3 (Storage, Server, Network). Give Network to the Network team; give Storage to the Storage team. A role with all access to vSphere should be rarely used.
    - VM owners can be given some access that they don't have in the physical world. They will like the empowerment (self service).
    - (Diagram: the vSphere space — VMware Admin, Networking Admin, Server Admin, Storage Admin, with Operators and VM Owners — alongside the enterprise IT space: MS AD Admin, Storage Admin, Network Admin, DBA, Apps Admin.)
  • Slide 45
  • Folder: use it properly
    - Do not use Resource Pools to organise VMs. Caveat: the Hosts/Clusters view is the only view where you can see both ESXi hosts and VMs.
    - Study the hierarchy on the right: it is folders everywhere. Folders are the way to limit access. Certain objects don't have their own access control; they rely on folders. E.g. you cannot set permissions directly on a vNetwork Distributed Switch; to set permissions, create a folder on top of it (a sketch follows below).
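    A hedged sketch of the folder-on-top-of-a-vDS pattern; the folder name, principal, and role are hypothetical (the role shown is one of vCenter's sample roles):
      # Create a network folder and grant the network team access;
      # the permission propagates to the distributed switch placed inside it
      $netRoot = Get-Folder -Type Network | Select-Object -First 1
      $folder  = New-Folder -Name 'Prod-DVS' -Location $netRoot
      New-VIPermission -Entity $folder -Principal 'CORP\network-admins' `
        -Role (Get-VIRole -Name 'Network administrator (sample)') -Propagate:$true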
  • Slide 46
  • Compliance
    - How do we track changes made in vCenter by authorised staff? vCenter does not track configuration drift. Tools like vCenter Ops Enterprise provide some level of configuration management, but not all (a sketch of pulling the vCenter event trail follows below).
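    vCenter does keep an event trail even though it does not track drift. A PowerCLI sketch that pulls recent reconfiguration events with the user who made them:
      # Recent 'Reconfigured ...' events, with timestamp and user
      Get-VIEvent -MaxSamples 5000 |
        Where-Object { $_.FullFormattedMessage -match 'Reconfigured' } |
        Select-Object CreatedTime, UserName, FullFormattedMessage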
  • Slide 47
  • VM Design
  • Slide 48
  • Standard VM sizing: follow McDonald's
    - 1 VM = 1 App = 1 purpose. No bundling of services: having multiple applications or services in 1 OS tends to create more problems. The Apps team knows this better.
    - Start with the Small size, especially for CPU & RAM (a provisioning sketch follows below).
    - Use as few virtual CPUs (vCPUs) as possible. vCPU count impacts the scheduler, hence performance, and it is hard to take back once you give it. Also, the app might be configured to match the processor count (you will not know unless you ask the application team). Maintaining a consistent memory view among multiple vCPUs consumes resources. There is a licensing impact if you assign more CPU; vSphere 4.1 multi-core can help (always verify with the ISV). Unused virtual CPUs still consume timer interrupts and execute the idle loop of the guest OS. In the physical world, CPU tends to be oversized; right-size it in the virtual world.
    - RAM starts with 1 GB, not 512 MB: patches can be large (330 MB for XP SP3) and need RAM. Size impacts vMotion, ballooning, etc, so you want to trim the fat. The Tier 1 Cluster should use Large Pages.
    - Anything above XL needs to be discussed case by case. Utilise Hot Add to start small (needs DC edition). See speaker notes for more info.
      Item | Small VM | Medium VM | Large  | Custom
      CPU  | 1        | 2         | 3      | 4-8
      RAM  | 1 GB     | 2 GB      | 4 GB   | 8, 12, 16 GB, etc
      Disk | 50 GB    | 100 GB    | 200 GB | 300, 400, etc GB
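    Provisioning the Small size from the table keeps the standard honest. A PowerCLI sketch; the VM, cluster, datastore names and guest ID are hypothetical:
      # Create a 'Small' standard VM: 1 vCPU, 1 GB RAM, 50 GB disk
      New-VM -Name 'app01' -ResourcePool (Get-Cluster 'Tier3-Cluster') `
        -Datastore (Get-Datastore 'Tier3-DS-01') `
        -NumCpu 1 -MemoryGB 1 -DiskGB 50 -GuestId 'windows7Server64Guest'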
  • Slide 49
  • Operational Excellence
  • Slide 50
  • Ownership
    - Where do you draw the line between Storage and vSphere? Who owns the following: VMFS and RDM, VMDK, Storage DRS? Who can initiate Storage vMotion? Who owns the virtual disk SCSI controller? Who decides the storage-related design in vSphere?
    - Where do you draw the line between Network and vSphere? Who decides which one to use: vSwitch, vDS, Nexus? Who decides the network-related design in vSphere? Who troubleshoots network problems in vSphere?
    - Where do you draw the line between Security and vSphere?
  • Slide 51
  • Management in the Virtual World
    - A VM is very different from a physical machine.
  • Slide 52
  • © 2009 VMware Inc. All rights reserved. Confidential. Thank You