End-to-End Design for a Highly Available Datacenter
Philip Moss, Managing Partner IT, NTTX
CDP-B363
Journey
Platform: building out a software-defined fabric
Disaster recovery: making DR work in the real world
High availability: building an end-to-end HA solution
Software defined platform
Business drivers
The high-availability drivers
You must deliver your HA plan using technologies and systems you already know, to drive down deployment costs
Failures stop people working, and your organization loses money
A business requirement for reliable IT systems with 24/7 operation is now common
High-Availability Goals
At the end of the day, all that matters is keeping users working
Consider: what does that really involve?
Multiple geographic locations
Secure access, redundant power supplies, environmental controlled facilities
Redundant, multi-path Internet connections
High-availability network protection and firewalling
Fault-tolerant hardware, with hot-swap components
Continuously available scale-out storage
Multi-route network switching, with multi-port NIC teaming across switching fabric
Highly available hypervisor clusters
Virtualisation of all service workloads
Application level HA (all applications configured in solution level, highly available configurations)
Environmental
Edge
Hardware
Fabric
Application

Microsoft software-defined datacentre solution
Logical architecture
Storage Spaces
Scale-out CA file-server
SMB Transport
Hyper-V Cluster – General Workloads | Hyper-V Cluster – PVMs (WARP)
Hyper-V Cluster – PVMs (virtual GPU)
DCs Exchange Lync RDSH
SQL DPM DHCP
RDS SharePoint WDS
DNS
Storage
Networking
Compute
SoFS and Storage Spaces
SMB 3.0 and software-defined networking
Hyper-V clustering
Core platform: DA, DNS, DHCP, WSUS
Services: RDS, VDI, DPM
Productivity applications: Exchange, SharePoint, Lync
Location
Location considerations
Active/active · Active/passive · Active/cloud
Location planning considerations: what do you need from a multi-site solution? There are 5 primary options, irrespective of the active or passive choice:
On-prem to on-prem
On-prem to hoster
On-prem to Azure
Hoster to hoster
Hoster to Azure
How do you maintain application consistency between all locations?
Do you implement internal connection redirection?
Does each site maintain a full copy of all data?
In the event of site loss:
Do you maintain N+1 (full) capacity, or operate at reduced resource availability?
During failure, is a key-services model required?
Location planning considerations
Direct on-net connections
VPN or other off-net pipe
Is link redundancy required?
If running active/active, how are incoming connections routed?
Access considerations
Storage
SoFS is inherently an HA solution; CSV shares support dynamic failover
Continuously available (CA) shares must be used; HA shares will result in IO loss
Multi-path access to JBODs provides node-failure protection
Storage Spaces supports multiple disk-redundancy topologies:
2-way mirror
3-way mirror
Parity
Enclosure awareness
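The topologies above trade usable capacity for failure tolerance. A conceptual sketch of that trade-off (not the actual Storage Spaces allocator; the parity column count is an assumed parameter):

```python
def usable_capacity(raw_tb: float, layout: str, columns: int = 7) -> float:
    """Approximate usable capacity for a pool of raw_tb under each layout."""
    if layout == "2-way-mirror":
        return raw_tb / 2                        # two copies of every slab
    if layout == "3-way-mirror":
        return raw_tb / 3                        # three copies of every slab
    if layout == "parity":
        return raw_tb * (columns - 1) / columns  # one column holds parity
    raise ValueError(f"unknown layout: {layout}")

def disk_failures_tolerated(layout: str) -> int:
    """How many simultaneous disk failures each layout survives."""
    return {"2-way-mirror": 1, "3-way-mirror": 2, "parity": 1}[layout]
```

Only 3-way mirror survives two simultaneous disk losses, which is why it is the common choice for VM storage despite costing two-thirds of the raw pool.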
SoFS HA basics
Scenario (to keep things simple): a 2-node, 2 x 2 configuration
2 SoFS nodes
2 JBODs
4 NICs per SoFS node
SoFS HA planning – digging deeper
[Diagram: 2 SoFS nodes, each with redundant SAS connections to 2 JBODs, plus network links]
MPIO: multiple paths from each SoFS node to each JBOD
Direct from server: common in 2 x 2 and 3 x 3 deployments
Via JBOD expanders: common in 4 x 4 deployments
Provides protection against cable or SAS interface failure; no performance gain
SoFS requires MPIO in failover mode
Complex to deploy in a stable configuration, unless using a manufacturer-validated solution
SoFS network planning: SMB Multichannel and clustering
Multichannel is key to a high-performance SoFS; when in a cluster (as the SoFS is), each NIC must be on a separate IP subnet for Multichannel to operate
To simplify network management:
Register only 1 subnet with DNS
Use static IPs on all interfaces
Teaming cannot be used; each NIC is on a fixed IP
The loopback-into-parent workaround cannot be used; no vSwitch
This creates a problem for HA planning
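The subnet rules above can be checked mechanically. A hedged sketch of such a validator (the dictionary shape and function name are invented for illustration):

```python
import ipaddress

def validate_sofs_nics(nics):
    """Check the SoFS NIC rules stated above: every NIC on its own IP subnet,
    and exactly one subnet registered with DNS. Returns a list of problems;
    an empty list means the layout satisfies both rules."""
    problems = []
    subnets = [ipaddress.ip_interface(n["ip"]).network for n in nics]
    if len(set(subnets)) != len(subnets):
        problems.append("two NICs share a subnet; Multichannel will not use both")
    if sum(1 for n in nics if n["dns_registered"]) != 1:
        problems.append("register exactly one subnet with DNS")
    return problems
```

Running this against a planned node layout before deployment catches the two configuration mistakes that silently disable Multichannel or break name resolution.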
Network planning
[Diagram: two SoFS nodes, each with four NICs on subnets A–D; on each node only subnet A is DNS-registered]
Make the SoFS node the fault domain
CSV share setup – balancing load
[Diagram: 4 SoFS nodes, 4 JBODs; on node failure, both of the failed node's CSVs (CSV 1 and CSV 2) land on a single surviving node]
100% load increase on one node
CSV setup – balancing load
[Diagram: 4 SoFS nodes, 4 JBODs; the failed node's CSVs are redistributed one per surviving node]
Equal load increase on all remaining nodes
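The difference between the two failover outcomes above is a placement policy: send each of the failed node's CSVs to the currently least-loaded survivor instead of dumping them all on one node. A sketch (node and CSV names are illustrative):

```python
def rebalance_on_failure(csvs_by_node, failed_node):
    """Move the failed node's CSVs one at a time to the least-loaded survivor."""
    orphaned = csvs_by_node.pop(failed_node)
    for csv in orphaned:
        target = min(csvs_by_node, key=lambda n: len(csvs_by_node[n]))
        csvs_by_node[target].append(csv)
    return csvs_by_node

nodes = {"sofs1": ["csv1", "csv2", "csv3"],
         "sofs2": ["csv4", "csv5", "csv6"],
         "sofs3": ["csv7", "csv8", "csv9"],
         "sofs4": ["csv10", "csv11", "csv12"]}
after = rebalance_on_failure(nodes, "sofs1")
# each survivor picks up exactly one extra CSV: a 33% load increase per node
# rather than a 100% increase on a single node
```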
Consider using more than one SoFS:
Protects against a software bug or failure causing catastrophic failure
Allows you to distribute VM storage across both systems
Does not affect Hyper-V cluster setup; all SoFS volumes are available across the entire network
Limited benefit if VM clustering is to be used: the shared VHDX can only be stored on one of the SoFSs
In 2012 R2 there is no SoFS stretch-clustering solution
Storage final considerations
Network
Switch-agnostic NIC teaming: an integrated solution for network-card resiliency and load balancing
Vendor agnostic and shipped inbox
Enables teams of up to 32 NICs
Aggregates bandwidth from multiple network adapters while providing traffic failover in the event of a NIC outage
Includes multiple modes: switch dependent and switch independent
Multiple traffic-distribution algorithms: Hyper-V switch port, hashing and dynamic load balancing
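The hashing distribution mode above can be sketched as flow-key hashing over the live team members; failover then falls out naturally, because the same hash is simply taken over the surviving set (the team structure here is invented for illustration, not the Windows teaming implementation):

```python
import zlib

def pick_team_member(team, flow_key):
    """team: {nic_name: is_up}. A flow's traffic sticks to one live NIC;
    if that NIC goes down, the same hash lands on a surviving NIC."""
    live = [nic for nic, up in team.items() if up]
    if not live:
        raise RuntimeError("all team members are down")
    return live[zlib.crc32(flow_key.encode()) % len(live)]

team = {"pnic1": True, "pnic2": True, "pnic3": True, "pnic4": True}
flow = "10.0.1.5:49152->10.0.2.9:445"    # e.g. one SMB flow
chosen = pick_team_member(team, flow)
team[chosen] = False                      # simulate a NIC outage
failover = pick_team_member(team, flow)   # traffic moves to a live NIC
```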
NIC Teaming
[Diagram: physical network adapters bound into a team, exposing team network adapters to the operating system]
Parent OS
The vSwitch binds each vNIC to a pNIC in rotation
When a pNIC fails, the vNIC is moved to a working pNIC
The DNS-registered subnet A vNIC will always be bound to a working pNIC
System performance degradation will occur
Switch agnostic team
Hyper-V high-availability
[Diagram: four pNICs in a switch-agnostic team beneath a vSwitch with QoS; vNICs for subnets A, B, C and D]
Do you need fault-tolerant NIC configurations?
“NICs and switch ports are costly. Make the server the fault domain.”
This is the approach NTTX takes in our next-generation DCs
Compute
Hyper-V clusters
Hyper-V Replica / Azure Site Recovery
Application-level clusters: single-site or multi-site
Native application scale-out HA
Your Compute HA Arsenal
Hyper-V: a few bitter pills to swallow
Hyper-V clusters are NOT a true high-availability solution; they are closer to "near-time DR"
Hyper-V Replica has potential weaknesses with complex application types: the classic IP-injection challenges, which are only mitigated through NVGRE
Application HA is the only real solution; it requires end-to-end design and planning
Host dies → VM dies → VM fails over to a new host → user loses session and data
Deploy more than one cluster: live migration allows seamless movement between clusters and prevents a single point of failure
As VHDX storage is delivered over CSV shares, there is no requirement to maintain CSVs across the Hyper-V cluster
Do you even need to cluster? VM clusters and application-level HA provide excellent HA capabilities, and skipping the host cluster simplifies Hyper-V deployment and management
Hyper-V cluster planning
Hyper-V cluster
Preventing all eggs in one (host) basket – the power of affinity
[Diagram: Hyper-V hosts A, B and C; both member VMs of the same guest VM cluster can land on one host, so a single host failure takes down the whole guest cluster]
Without anti-affinity
Hyper-V cluster
Preventing all eggs in one (host) basket – the power of affinity
[Diagram: Hyper-V hosts A, B and C; the member VMs of each guest VM cluster are kept on different hosts, so no single host failure can take down a guest cluster]
With anti-affinity
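The anti-affinity behaviour above amounts to a placement constraint: never co-locate two VMs that belong to the same guest cluster. A greedy scheduler sketch (names are illustrative, not the failover-cluster placement engine):

```python
def place_with_anti_affinity(vms, hosts, group_of):
    """Place VMs on hosts so that no host holds two VMs from the same
    anti-affinity group (here, members of the same guest cluster)."""
    placement = {h: [] for h in hosts}
    for vm in vms:
        # try hosts from least-loaded upward, skipping any that would
        # violate the anti-affinity rule
        for host in sorted(placement, key=lambda h: len(placement[h])):
            if all(group_of[v] != group_of[vm] for v in placement[host]):
                placement[host].append(vm)
                break
        else:
            raise RuntimeError(f"no host satisfies anti-affinity for {vm}")
    return placement

groups = {"vmA1": "cluster1", "vmB1": "cluster1",
          "vmA2": "cluster2", "vmB2": "cluster2"}
result = place_with_anti_affinity(list(groups), ["hostA", "hostB", "hostC"], groups)
```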
Cluster-Aware Updating: greatly simplifies updating clusters
Removes the requirement for manual drain-stop / VM migrations
Drain-stops hosts in turn and migrates workloads to the other nodes
Affinity and anti-affinity rules are maintained; affinity rules are invoked during drain-stop
Rules can be soft or hard; if hard rules cannot be complied with, prioritisation is applied
May be used for all cluster workloads: Hyper-V, SoFS
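The drain-in-turn behaviour above can be sketched as a rolling update: empty one node at a time by migrating its workloads to the others, patch it, then move on, so the workloads never all stop (host and VM names are illustrative; real CAU also honours the affinity rules):

```python
def cluster_aware_update(hosts, vms_on):
    """Rolling-update sketch: drain-stop each host in turn, live-migrating
    its VMs round-robin to the other nodes, then 'patch' the empty host."""
    patched = []
    for host in hosts:
        others = [h for h in hosts if h != host]
        for i, vm in enumerate(vms_on.pop(host, [])):
            vms_on.setdefault(others[i % len(others)], []).append(vm)
        patched.append(host)   # host is updated and rebooted while drained
    return patched, vms_on

hosts = ["hv1", "hv2", "hv3"]
vms = {"hv1": ["vm1", "vm2"], "hv2": ["vm3"], "hv3": ["vm4"]}
order, final = cluster_aware_update(hosts, vms)
```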
VM high-availability NICs: no requirement for multiple vNICs in a VM
The vSwitch takes care of vNIC-to-pNIC mapping and failover; multiple vNICs are only required to meet performance goals
Additional consideration should be applied to this configuration
SR-IOV considerations: key for high-performance and low-latency applications
Direct 1-to-1 mapping of pNIC to vNIC; no inherent failover of the vNIC if the pNIC fails
Multiple NICs must be exposed to the VM
Set up a VM-based network team; with SR-IOV, the pNICs on the host will be dedicated to the VM's use
Creates potential load and pNIC-utilisation challenges on the host
Consider using a non-SR-IOV NIC as the second vNIC:
Provides fault tolerance and partially mitigates the pNIC-usage issues
The non-SR-IOV vNIC will automatically be moved to a working pNIC by the vSwitch; performance degradation will occur
Introduced in 2012 R2: enables a 100% VHDX-based VM storage solution
Removes reliance on synthetic iSCSI or FC for shared storage in a VM
Primary workloads:
HA file servers
Legacy SQL servers
Bespoke line-of-business applications requiring shared disk
Considerations: no support for Hyper-V Replica, therefore no Hyper-V Azure Site Recovery support
Stretch clusters are not supported
VM based clusters using shared VHDx
Hyper-V Cluster
VM Cluster
Scale Out File Server (Continuously Available)
VM A VM B
VHDX (VM A) · VHDX (VM B) · Shared VHDX (cluster shared storage)
Shared VHDX cluster – single storage
Shared VHDX cluster – dual Hyper-V cluster
Hyper-V Cluster BHyper-V Cluster A
VM Cluster
Scale Out File Server (Continuously Available)
VM A VM B
VHDX (VM A) · VHDX (VM B) · Shared VHDX (cluster shared storage)
Shared VHDX – split SoFS
Scale Out File Server B
Hyper-V Cluster BHyper-V Cluster A
VM Cluster
Scale Out File Server A
VM A VM B
VHDX (VM A) · VHDX (VM B) · Shared VHDX (cluster shared storage)
✗ Bad design; do not use, as you gain no additional protection
Deployed using a shared-VHDX cluster; primarily used for all traditional file-server roles
Supports all roles and configuration you would expect in a file server; DFS namespaces are supported
Not to be used for VHDX delivery
Deploy as a traditional file server, not a SoFS: active/passive failover means an IO connection drop
Perfectly acceptable for file-share workloads, as most applications will reconnect
Deploy multiple 2-node file-server clusters in line with system demand
Use DFS to provide a unified namespace if required
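The DFS approach above can be sketched as a mapping: one namespace tree that users see, with each folder transparently backed by whichever 2-node cluster actually hosts it (the `\\corp\files` root and cluster names are hypothetical examples):

```python
def build_dfs_namespace(cluster_shares):
    """Map a single DFS namespace over several 2-node file-server clusters,
    so users see one tree regardless of which cluster hosts a folder."""
    namespace = {}
    for cluster, folders in cluster_shares.items():
        for folder in folders:
            # \\corp\files\<folder>  ->  \\<cluster>\<folder>
            namespace["\\\\corp\\files\\" + folder] = \
                "\\\\" + cluster + "\\" + folder
    return namespace

ns = build_dfs_namespace({"fsclu1": ["finance", "hr"],
                          "fsclu2": ["engineering"]})
```

Adding capacity is then just deploying another 2-node cluster and extending the mapping; the user-visible paths never change.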
VM based HA file server
A few bitter pills: application HA trade-offs
A considerable amount of data duplication; disk is cheap, especially when using commodity technology
Inherent headroom required in application VMs; CPU is cheap and constantly getting cheaper
Careful planning and deep application knowledge are required
Pure IaaS relies on the client's buy-in to end-to-end design; this can be challenging, and ASR is a reasonable workaround
There is no "real" HA solution for true VDI: the broker is not Hyper-V Replica aware, and the broker is not multi-site aware
Core Services
Inherently multi-master
Consider where your GCs are within each site
Make sure auth is supported from all datacentre locations
Build in enough DCs to cope with application demands
Exchange, etc
The DC chicken-and-egg issue no longer applies: 2012 R2 removes the requirement to maintain a hardware-based domain controller to support Hyper-V and cluster start-up
Cluster and Hyper-V will perform authentication retroactively, which allows all DCs to be VM-based
Active Directory
Maintain highly available DHCP on each site: an active/passive 2-node cluster
Do not use DHCP relay as the HA solution between locations:
Creates request-prioritisation issues
Creates challenges in maintaining address allocations and reservations
DHCP
Internal AD DNS is straightforward; it is dealt with as part of AD-integrated replication
Where should your public DNS be located? Potentially externally, to prevent losing all DNS when a datacentre fails: those records are the only way of reaching your exterior
HA DNS is better than the traditional primary/secondary model: if the primary is lost, no updates to the zone may occur, whereas AD integration allows updates to the zone at any time
Load balancers: software and hardware options; provide "front-end" incoming connection routing and commonly provide "dead route" protection
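The "dead route" protection mentioned above is simply health-checked routing: backends that fail a probe are skipped until they recover. A minimal round-robin sketch (class and backend names are invented for illustration):

```python
import itertools

class DeadRouteBalancer:
    """Round-robin front end with 'dead route' protection: backends that
    fail their health probe stop receiving traffic until they recover."""
    def __init__(self, backends):
        self.healthy = {b: True for b in backends}
        self._ring = itertools.cycle(backends)

    def route(self):
        # advance around the ring, skipping dead routes
        for _ in range(len(self.healthy)):
            backend = next(self._ring)
            if self.healthy[backend]:
                return backend
        raise RuntimeError("no healthy backends")

lb = DeadRouteBalancer(["web1", "web2", "web3"])
lb.healthy["web2"] = False              # health probe failed: dead route
picks = [lb.route() for _ in range(6)]  # traffic flows only to web1 and web3
```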
DNS
Services
Network virtualization gateway
[Diagram: Contoso and Fabrikam tenant networks reach the Internet and the service-provider network through resilient HNV gateways running on Hyper-V hosts]
Compute Hyper-V cluster
Edge Hyper-V cluster: physical hosts are dual-homed
HA edge design
[Diagram: general VM workloads on the compute cluster; the edge Hyper-V cluster's dual-homed hosts bridge the internal and external networks]
Hyper-V Edge Cluster A and Hyper-V Edge Cluster B
VM cluster: HNV GW VM 1 and HNV GW VM 2
Each GW VM has one vNIC bound to the internal network and one vNIC bound to the external network
HA edge design
Active/passive: the virtual IP is the GW termination point
Physical servers are dual-homed to allow true network isolation; the VMs maintain a dual-homed configuration, bound to the physical layer, to provide logical isolation
Fault tolerance on host and VM NICs is not required; the HA pair provides the protection
RDS configuration: all services deployed in VMs, with the exception of the RDVH agent
Incoming connections are distributed across the NLB via a virtual IP; the broker maintains the list of VDI desktops and RDSH servers
The broker is active/passive HA, with its database maintained in SQL
RDSH servers sit in a single collection; during maintenance or failure, a server is simply taken offline and client requests hit the other servers in the collection
The RDVH agent running on Hyper-V maintains the location and state of the VDI PDs
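The collection behaviour above can be sketched as a broker that directs each new session to the least-loaded online server and simply skips drained ones (class and server names are invented for illustration, not the RD Connection Broker API):

```python
class SessionBroker:
    """Broker sketch: new sessions go to the least-loaded online RDSH
    server; a server taken offline stops receiving new connections."""
    def __init__(self, collection):
        self.sessions = {s: 0 for s in collection}
        self.online = {s: True for s in collection}

    def connect(self):
        live = [s for s in self.sessions if self.online[s]]
        if not live:
            raise RuntimeError("no RDSH servers available in the collection")
        target = min(live, key=lambda s: self.sessions[s])
        self.sessions[target] += 1
        return target

broker = SessionBroker(["rdsh1", "rdsh2", "rdsh3"])
broker.online["rdsh2"] = False               # taken offline for maintenance
targets = [broker.connect() for _ in range(4)]
```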
RDweb / RD Gateway
RD Broker
RDSH VDI
SQL cluster
NLB
Active / passive
Broker DB storage
RDSH servers in single collection
RDVH agent tracks PD host
Multi-site HA is possible for RDSH / RemoteApp with suitable planning
RDS provides no multi-site HA solution for VDI, even when using Hyper-V Replica
Productivity Applications
Delivering productivity applications on the core HA system: use inherent application HA where possible
Better than agnostic replication or failover solutions; more complicated to deploy, but provides a better user experience
MS core suite options:
Exchange: database availability groups (DAGs) provide DB replication, dynamic failover between active DBs when a user connects, and incoming CAS connection-move support
SQL AlwaysOn: DB log replication; supports synchronous replication; single-site or multi-site
Lync: inherent support via a multi-master topology; plan for dropped conference-call meetings during failover; direct calls will be maintained, as they are peer-to-peer
SharePoint: load-balanced front-end servers; DB operations delivered via SQL AlwaysOn; multi-site deployment requires careful planning to avoid data-integrity issues
Productivity software considerations
Understanding the HA stack: failover layers and options
Storage: how to manage load during failover; removing single points of failure
Network: software-defined teaming; maintaining HA in the vSwitch
Compute: Hyper-V cluster planning; anti-affinity and VM placement; shared-VHDX VM clusters
Services and applications: HNV gateway; core datacentre services; productivity applications
Summary
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
Developer Network
http://developer.microsoft.com
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
Come visit us in the Microsoft Solutions Experience (MSE)! Look for the Cloud and Datacenter Platform area, TechExpo Hall 7
For more information:
Windows Server Technical Preview: http://technet.microsoft.com/library/dn765472.aspx
Windows Server
Microsoft Azure
Microsoft Azure: http://azure.microsoft.com/en-us/
System Center
System Center Technical Preview: http://technet.microsoft.com/en-us/library/hh546785.aspx
Azure Pack: http://www.microsoft.com/en-us/server-cloud/products/windows-azure-pack
Azure
Microsoft Azure training and certification:
Classroom training (MOC): 10979 Microsoft Azure Fundamentals (coming soon); 20532 Developing Microsoft Azure Solutions; 20533 Implementing Microsoft Azure Infrastructure Solutions
Online training (MVA): Microsoft Azure Fundamentals (coming soon); Architecting Microsoft Azure Solutions (coming soon)
Exams: 532 Developing Microsoft Azure Solutions; 533 Implementing Microsoft Azure Infrastructure Solutions; 534 Architecting Microsoft Azure Solutions (coming soon)
http://bit.ly/Azure-Cert
http://bit.ly/Azure-MVA
http://bit.ly/Azure-Train
Get certified for 1/2 the price at TechEd Europe 2014! http://bit.ly/TechEd-CertDeal
Please complete an evaluation form; your input is important!
TechEd Schedule Builder: CommNet station or PC
TechEd Mobile app: phone or tablet
QR code
Evaluate this session
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.