UCC402
Exchange Server 2010 High Availability Deep Dive
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
Agenda
Exchange Server 2010 High Availability Deep Dive
  Database Availability Group Networks
  Active Manager
  Best Copy Selection
  Datacenter Activation Coordination Mode
Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks
DAG Networks
A DAG network is a logical collection of one or more subnets
There are two types of DAG networks
  MAPI Network - connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.)
    Registered in DNS / DNS configured
    Uses default gateway
    Client for Microsoft Networks/File and Print Sharing enabled
  Replication Network - used for/by continuous replication (log shipping and seeding)
    Not registered in DNS / DNS not configured
    No default gateway
    Client for Microsoft Networks/File and Print Sharing disabled
DAG Networks
All DAGs must have:
  Exactly one MAPI network
  Zero or more Replication networks
    Separate network(s) on separate subnet(s)
    LRU (least recently used) determines which replication network is used with multiple replication networks
DAG networks automatically created when Mailbox server is added to DAG
  Based on cluster’s enumeration of networks, which uses subnets
  One cluster network is created per subnet
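The one-network-per-subnet rule can be sketched in a few lines of Python. This is only an illustration, not how the cluster service is implemented: the function name `enumerate_dag_networks`, the input format, and the first-seen numbering of `DAGNetworkNN` names are assumptions made for the example.

```python
import ipaddress

def enumerate_dag_networks(interfaces):
    """Group (server, "address/prefix") pairs into one network per subnet.

    Hypothetical helper: networks are numbered in the order their subnet
    is first seen, which is an assumption of this sketch.
    """
    by_subnet = {}
    for server, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the address belongs to, e.g. 10.0.0.0/24
        by_subnet.setdefault(iface.network, []).append((server, str(iface.ip)))
    return {
        f"DAGNetwork{index:02d}": {"subnet": str(subnet), "interfaces": members}
        for index, (subnet, members) in enumerate(by_subnet.items(), start=1)
    }

# Two members on the same pair of subnets collapse into two networks.
same_subnets = enumerate_dag_networks([
    ("EX1", "192.168.0.15/24"), ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.0.16/24"), ("EX2", "10.0.0.16/24"),
])

# Two members on different subnets produce four networks.
different_subnets = enumerate_dag_networks([
    ("EX1", "192.168.0.15/24"), ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"), ("EX2", "10.0.1.15/24"),
])
```

With members on different subnets, each subnet becomes its own network, which is why multi-subnet DAGs usually need their networks collapsed by hand.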
DAG Networks
Maximum round-trip latency between all DAG members must be 500 ms or less
Regardless of network latency, validate that the network between all DAG members is capable of satisfying your data protection and availability goals
  May need to increase the number of databases or decrease the number of mailboxes per database to achieve goals
DAG Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True
DAG Networks
Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A
DAG Networks
Collapse DAG networks and disable replication on MAPI network:

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
DAG Networks
Collapse DAG networks and disable replication on MAPI network (result):

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True
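As a rough model of what collapsing does to the network table, the Python sketch below folds one network's subnets and interfaces into another and then drops the absorbed entry. The dictionary layout and the helper name `collapse_networks` are hypothetical; the real change is made with the Set- and Remove-DatabaseAvailabilityGroupNetwork cmdlets.

```python
def collapse_networks(networks, keep, absorb, replication_enabled=None):
    """Return a copy of `networks` with `absorb` folded into `keep`."""
    # Copy the nested lists so the caller's table is left untouched.
    merged = {
        name: dict(net, subnets=list(net["subnets"]), interfaces=list(net["interfaces"]))
        for name, net in networks.items()
    }
    source = merged.pop(absorb)
    target = merged[keep]
    target["subnets"] += source["subnets"]
    target["interfaces"] += source["interfaces"]
    if replication_enabled is not None:
        target["replication"] = replication_enabled
    return merged

before = {
    "DAGNetwork01": {"subnets": ["192.168.0.0/24"], "interfaces": ["EX1 (192.168.0.15)"],
                     "mapi": True, "replication": True},
    "DAGNetwork02": {"subnets": ["10.0.0.0/24"], "interfaces": ["EX1 (10.0.0.15)"],
                     "mapi": False, "replication": True},
    "DAGNetwork03": {"subnets": ["192.168.1.0/24"], "interfaces": ["EX2 (192.168.1.15)"],
                     "mapi": True, "replication": True},
    "DAGNetwork04": {"subnets": ["10.0.1.0/24"], "interfaces": ["EX2 (10.0.1.15)"],
                     "mapi": False, "replication": True},
}

# MAPI networks merge and lose replication; replication networks merge as-is.
after = collapse_networks(before, "DAGNetwork01", "DAGNetwork03", replication_enabled=False)
after = collapse_networks(after, "DAGNetwork02", "DAGNetwork04")
```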
DAG Networks
All DAGs extended to multiple datacenters should have hotfix from KB 2550886 installed
Automatic detection occurs when members added to DAG
If NICs are added after server is member of DAG, you must perform discovery
  Set-DatabaseAvailabilityGroup <DAGName> -DiscoverNetworks
DAG network configuration persisted in cluster database
  HKLM\Cluster\Exchange\DAG Network
DAGs include built-in encryption and compression
  Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
  Compression: Microsoft XPRESS, based on LZ77 algorithm
DAG Networks
When using a single NIC
  It is both the MAPI and the Replication network
    ReplicationEnabled is $True
When using multiple NICs
  One NIC is the MAPI network
    ReplicationEnabled is $False
  Other NIC(s) are Replication network(s)
    Replication uses LRU to pick Replication network to use
    If Replication networks are unavailable, MAPI network is used
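The selection rule above (least recently used replication network first, MAPI network only as a fallback) might be modeled as follows. This is a sketch under stated assumptions: the class name, the logical clock, and the tie-breaking order are invented for illustration and say nothing about how the Replication service tracks usage internally.

```python
import itertools

class ReplicationNetworkPicker:
    """Hypothetical LRU picker with fallback to the MAPI network."""

    def __init__(self, replication_networks, mapi_network):
        self._clock = itertools.count(1)
        # 0 means "never used"; a lower value means used longer ago.
        self._last_used = {name: 0 for name in replication_networks}
        self._mapi = mapi_network

    def pick(self, available):
        """Return the network to use, given the set of reachable networks."""
        candidates = [name for name in self._last_used if name in available]
        if not candidates:
            # No replication network is reachable: use the MAPI network.
            return self._mapi
        choice = min(candidates, key=lambda name: self._last_used[name])
        self._last_used[choice] = next(self._clock)
        return choice

picker = ReplicationNetworkPicker(["ReplA", "ReplB"], mapi_network="MAPI")
first = picker.pick({"ReplA", "ReplB"})   # neither used yet, first listed wins
second = picker.pick({"ReplA", "ReplB"})  # the other network is now least recently used
fallback = picker.pick(set())             # nothing reachable, MAPI carries replication
```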
DAG Networks
Use netsh, router ACLs or other means to block cross-network traffic
[Diagram: Subnets 1 to 4, each with MAPI (M) and Replication (R) interfaces; traffic paths are marked Allowed or Blocked]
DAG Networks
If using iSCSI storage, configure DAG and cluster to ignore iSCSI networks
  Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true
DAG Networks
When a DAG spans multiple subnets you need an IP address on the MAPI network for each subnet
Use DHCP in site resilience configurations to assign IP addresses to Replication network
  Enables delivery of the typically required static routes
  If using static IP addresses, use netsh to configure static routes
Configure a DNS TTL on namespace records consistent with your SLA
  For example, use a TTL of 5 minutes for a 60 minute RTO SLA
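The TTL example is simple arithmetic, sketched here in Python. The assumption (not stated on the slide) is that a client may keep using a cached record for up to one full TTL after the DNS change, so that time comes straight out of the recovery time objective. The function name is made up for the example.

```python
def dns_cutover_budget(ttl_minutes, rto_minutes):
    """Return (share of the RTO spent waiting on DNS, minutes left for everything else)."""
    if ttl_minutes > rto_minutes:
        raise ValueError("TTL alone would exceed the RTO")
    return ttl_minutes / rto_minutes, rto_minutes - ttl_minutes

# The slide's example: a 5 minute TTL against a 60 minute RTO.
share, remaining = dns_cutover_budget(5, 60)
```

Under that assumption, a 5 minute TTL uses one twelfth of a 60 minute RTO and leaves 55 minutes for the decision to fail over and the datacenter switchover itself.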
Exchange Server 2010 High Availability Deep Dive: Active Manager
Active Manager
What are the three Active Manager roles?
  Standalone
  PAM (Primary Active Manager)
  SAM (Standby Active Manager)
Transition of role state logged into Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)
Active Manager Functionality
Mount and Dismount Databases
Provide Database Availability Information
Provide Interface for Administrative Tasks
Monitor for and React to Failures
Maintain Database and Server State Information
Mount / Dismount Database Copy
Mount Database
  An administrator action invoked through a task
  The last part of a move operation
Dismount Database
  An administrator action invoked through a task
  The first part of a move operation
Auto Dismount – DAG Member
Occurs when a DAG loses quorum
All DAG members are running (but may not be participating in the cluster)
Databases dismounted as quickly as possible to avoid split-brain
Information Store service is terminated
Active Manager – Move Database
Move Database
  An administrator action invoked by a task
  Automatic operation initiated by the PAM (failover)
Begins with a Dismount operation and ends with a Mount operation
Exchange Server 2010 High Availability Deep Dive: Best Copy Selection
Best Copy Selection
Active Manager selects the “best” copy to become the new active copy when the existing active copy fails, or when an administrator performs a target-less switchover
BCS is the process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status
During BCS, any servers that are unreachable or activation blocked are ignored
Best Copy Selection – RTM
Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
Selects from sorted list based on which set of criteria is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy
Best Copy Selection – SP1
Sorts copies by activation preference when auto database mount dial is set to Lossless
  Otherwise, sorts copies based on copy queue length, with activation preference used as a secondary sorting key if necessary
Selects from sorted list based on which set of criteria is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy
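The difference between the RTM and SP1 sort orders can be illustrated with a short Python sketch. The `Copy` record and the `sort_for_selection` function are hypothetical names for this example; the sample values are the ones used in the worked examples later in this deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    name: str
    activation_preference: int
    copy_queue_length: int

def sort_for_selection(copies, sp1, mount_dial):
    """Order candidate copies the way the RTM and SP1 slides describe."""
    if sp1 and mount_dial == "Lossless":
        # SP1 with a Lossless dial: activation preference alone decides.
        return sorted(copies, key=lambda c: c.activation_preference)
    # Otherwise: copy queue length first, activation preference breaks ties.
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))

copies = [
    Copy("Server2\\DB1", activation_preference=2, copy_queue_length=4),
    Copy("Server3\\DB1", activation_preference=3, copy_queue_length=2),
    Copy("Server4\\DB1", activation_preference=4, copy_queue_length=10),
]

rtm_order = [c.name for c in sort_for_selection(copies, sp1=False, mount_dial="Lossless")]
sp1_order = [c.name for c in sort_for_selection(copies, sp1=True, mount_dial="Lossless")]
```

Here `rtm_order` starts with Server3\DB1 (shortest copy queue) while `sp1_order` starts with Server2\DB1 (lowest activation preference value).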
Best Copy Selection
Is database mountable?
  Is copy queue length <= AutoDatabaseMountDial?
    If yes, database is marked as current active and mount request is issued
    If not, next best database tried (if one is available)
Best Copy Selection
Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
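One way to read the criteria table is as an ordered list of filters applied to the already sorted copies. The Python sketch below encodes the table that way. It is an interpretation for illustration only: the dictionary keys, the helper names, and the choice to walk every copy for one criteria set before moving to the next set are assumptions, not confirmed product behavior.

```python
# (max copy queue, max replay queue, required content index state); None = not checked
CRITERIA = [
    (10, 50, "Healthy"),      # 1
    (10, 50, "Crawling"),     # 2
    (None, 50, "Healthy"),    # 3
    (None, 50, "Crawling"),   # 4
    (None, 50, None),         # 5
    (10, None, "Healthy"),    # 6
    (10, None, "Crawling"),   # 7
    (None, None, "Healthy"),  # 8
    (None, None, "Crawling"), # 9
]
FALLBACK_STATUSES = {
    "Healthy", "DisconnectedAndHealthy", "DisconnectedAndResynchronizing", "SeedingSource",
}

def meets(copy, number):
    """Does this copy satisfy criteria set `number` (1 to 10)?"""
    if number == 10:
        return copy["status"] in FALLBACK_STATUSES
    max_cql, max_rql, ci = CRITERIA[number - 1]
    return ((max_cql is None or copy["cql"] < max_cql)
            and (max_rql is None or copy["rql"] < max_rql)
            and (ci is None or copy["ci"] == ci))

def select_best(sorted_copies):
    """Return (copy name, criteria set) for the first match, or None."""
    for number in range(1, 11):
        for copy in sorted_copies:
            if meets(copy, number):
                return copy["name"], number
    return None

# Copies already sorted by copy queue length.
candidates = [
    {"name": "Server3\\DB1", "cql": 2, "rql": 2, "ci": "Healthy", "status": "DisconnectedAndHealthy"},
    {"name": "Server2\\DB1", "cql": 4, "rql": 0, "ci": "Healthy", "status": "Healthy"},
    {"name": "Server4\\DB1", "cql": 10, "rql": 0, "ci": "Crawling", "status": "Healthy"},
]
best = select_best(candidates)
```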
Best Copy Selection – RTM
Four copies of DB1
DB1 currently active on Server1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: Server1 through Server4, each hosting a copy of DB1; one copy is marked with an X]
Best Copy Selection – RTM
Sort list of available copies by Copy Queue Length (using Activation Preference as secondary sort key if necessary):
  Server3\DB1
  Server2\DB1
  Server4\DB1
Best Copy Selection – RTM
Only two copies meet first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  Server3\DB1 (lowest copy queue length, tried first)
  Server2\DB1
  Server4\DB1 does not qualify (CQL is 10 and CI is Crawling)
Best Copy Selection – SP1
Four copies of DB1
DB1 currently active on Server1
Auto database mount dial set to Lossless

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: Server1 through Server4, each hosting a copy of DB1; one copy is marked with an X]
Best Copy Selection – SP1
Sort list of available copies by Activation Preference:
  Server2\DB1 (lowest preference value, tried first)
  Server3\DB1
  Server4\DB1
Best Copy Selection
After Active Manager determines the best copy to activate
  The Replication service on the target server tries to copy missing log files from source (ACLL)
    If successful, database will mount with zero data loss
    If unsuccessful (lossy failure), database will mount based on the AutoDatabaseMountDial setting
      If data loss is outside of dial setting, next copy will be tried
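The flow above (try ACLL, mount with zero loss if it succeeds, otherwise compare the missing logs against the dial and move on if the loss is too large) can be sketched in Python. The log limits per dial setting (Lossless 0, GoodAvailability 6, BestAvailability 12) are not given on the slides and are included here as an assumption based on the product documentation; the function and parameter names are hypothetical.

```python
# Assumed maximum number of lost logs tolerated per dial setting (not from the slides).
MOUNT_DIAL_LIMITS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}

def activate(candidates, mount_dial, acll_succeeds):
    """Walk the ordered candidates and return (name, logs lost), or None.

    candidates: list of (copy name, copy queue length), best first.
    acll_succeeds: callable taking a copy name and returning True when the
    missing logs could be copied from the previous active copy.
    """
    limit = MOUNT_DIAL_LIMITS[mount_dial]
    for name, missing_logs in candidates:
        if acll_succeeds(name):
            return name, 0             # all logs recovered: zero data loss
        if missing_logs <= limit:
            return name, missing_logs  # lossy, but within the dial setting
        # Loss is outside the dial setting: try the next copy.
    return None

ordered = [("Server3\\DB1", 2), ("Server2\\DB1", 4), ("Server4\\DB1", 10)]

lossy = activate(ordered, "GoodAvailability", acll_succeeds=lambda name: False)
strict = activate(ordered, "Lossless", acll_succeeds=lambda name: False)
recovered = activate(ordered, "Lossless", acll_succeeds=lambda name: name == "Server2\\DB1")
```

In this model a Lossless dial with a failed ACLL on every copy leaves the database dismounted, because no candidate can mount without losing logs.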
Best Copy Selection
If an activated database copy is mounted
  It will generate new log files (using the same log generation sequence)
  Transport Dumpster requests will be initiated for the mounted database to recover lost messages
  When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode
Datacenter Activation Coordination Mode
DAC mode is a property of a DAG
Acts as an application-level form of quorum
Controls whether or not a Mailbox server attempts to automatically mount its active databases on startup
Designed to prevent multiple copies of same database mounting on different members due to loss of network (split brain)
Also enables use of Site Resilience tasks
  Stop-DatabaseAvailabilityGroup
  Restore-DatabaseAvailabilityGroup
  Start-DatabaseAvailabilityGroup
Datacenter Activation Coordination Mode
RTM: DAC mode is for DAGs with three or more members that are extended to two Active Directory sites
  Don’t enable for two-member DAGs where each member is in a different AD site or DAGs where all members are in the same AD site
SP1: DAC mode can be enabled for all DAGs
If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode
Datacenter Activation Coordination Mode
Uses Datacenter Activation Coordination Protocol (DACP)
A bit in memory (in MSExchangeRepl.exe) set to either:
  0 = can’t auto-mount at startup
  1 = can auto-mount at startup
Datacenter Activation Coordination Mode
Active Manager startup sequence
  DACP is set to 0
  DAG member communicates with other DAG members it can reach to determine the current value for their DACP bits
    If the starting DAG member can communicate with all other members on the StartedMailboxServers list, DACP bit switches to 1
    If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 1, starting DAG member DACP bit switches to 1
    If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 0, starting DAG member DACP bit remains at 0
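The three startup rules can be written as a small decision function. The Python sketch below is a model of the rules as stated on the slide, not of the actual DACP implementation; the function name, the argument shapes, and the handling of an empty list of other members are assumptions.

```python
def dacp_bit_at_startup(self_name, started_mailbox_servers, reachable_bits):
    """Return the DACP bit (0 or 1) a starting DAG member ends up with.

    reachable_bits maps each member the starting server can contact to
    that member's current DACP bit.
    """
    bit = 0  # the bit always starts at 0
    others = set(started_mailbox_servers) - {self_name}
    # Rule 1: every other member on StartedMailboxServers is reachable.
    # (Assumption of this sketch: an empty list of others proves nothing.)
    if others and others <= set(reachable_bits):
        return 1
    # Rule 2: some reachable member already has its bit set to 1.
    if any(value == 1 for value in reachable_bits.values()):
        return 1
    # Rule 3: only members with a 0 bit (or nobody) were reached.
    return bit

members = ["MBX-A", "MBX-B", "MBX-C", "MBX-D"]

isolated = dacp_bit_at_startup("MBX-A", members, {})
partial = dacp_bit_at_startup("MBX-A", members, {"MBX-B": 0})
joined = dacp_bit_at_startup("MBX-A", members, {"MBX-B": 1})
everyone = dacp_bit_at_startup("MBX-A", members, {"MBX-B": 0, "MBX-C": 0, "MBX-D": 0})
```

The `partial` case is the one that prevents split brain: two servers that restart together after a datacenter outage and can only see each other both stay at 0 and do not mount databases.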
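[Diagram: DAG1 extended across a Primary Datacenter and a Secondary Datacenter, both labeled Active; shows Mailbox servers MBX-A, MBX-B, MBX-C, and MBX-D, Client Access servers CAS-Pri and CAS-Sec, HT2010 servers, the witness DAG1FSW, and Outlook clients]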
Datacenter Activation Coordination Mode
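[Diagram: DAG1 extended across a Primary Datacenter and a Secondary Datacenter, both labeled Active; shows Mailbox servers MBX-A, MBX-B, MBX-C, and MBX-D, Client Access servers CAS-Pri and CAS-Sec, HT2010 servers, the witness DAG1FSW, an additional element labeled AWS, and Outlook clients]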
Datacenter Activation Coordination Mode
[Diagram: four DACP bit values shown as 0, 0, 1, 1]
Related Content
UCC305 - Exchange Server 2010 High Availability Design
Resources
Exchange Team Blog: http://aka.ms/EHLO
Exchange 2010 Documentation Library: http://aka.ms/Ex2010Docs
Feedback
Your feedback is very important! Please complete an evaluation form!
Thank you!
Questions?
UCC402
Scott Schnoll
Principal Technical Writer
[email protected]
blogs.technet.com/scottschnoll
Twitter: @schnoll
You can ask me questions at the “Ask the Expert” zone: November 10, 2011 12:30 – 13:30