UCC402

Exchange Server 2010 High Availability Deep Dive

Scott Schnoll
Principal Technical Writer
Microsoft Corporation

Agenda

Exchange Server 2010 High Availability Deep Dive
  Database Availability Group Networks
  Active Manager
  Best Copy Selection
  Datacenter Activation Coordination Mode

Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks

DAG Networks

A DAG network is a logical collection of one or more subnets
There are two types of DAG networks

MAPI Network - connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.)
  Registered in DNS / DNS configured
  Uses default gateway
  Client for Microsoft Networks/File and Print Sharing enabled

Replication Network - used for/by continuous replication (log shipping and seeding)
  Not registered in DNS / DNS not configured
  No default gateway
  Client for Microsoft Networks/File and Print Sharing disabled

DAG Networks

All DAGs must have:
  Exactly one MAPI network
  Zero or more Replication networks
    Separate network(s) on separate subnet(s)
    Least recently used (LRU) selection determines which Replication network is used when there are multiple Replication networks

DAG networks are automatically created when a Mailbox server is added to a DAG
  Based on cluster’s enumeration of networks, which uses subnets
  One cluster network is created per subnet

DAG Networks

Maximum round-trip latency between all DAG members must be 500 ms or less

Regardless of network latency, validate that the network between all DAG members is capable of satisfying your data protection and availability goals
  May need to increase the number of databases or decrease the number of mailboxes per database to achieve goals

DAG Networks

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True

DAG Networks

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A
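The "one network per subnet" rule explains why the same two servers produce two DAG networks on shared subnets and four on separate subnets. The following is a minimal Python sketch of that grouping, not Exchange code: the function name `enumerate_networks` is hypothetical, and numbering the networks in discovery order is an assumption made to match the tables above.

```python
import ipaddress

def enumerate_networks(interfaces):
    """Group (server, "address/bits") pairs into one network per subnet.

    Assumption: networks are numbered in the order their subnet is first seen.
    """
    by_subnet = {}
    for server, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the interface belongs to, e.g. 192.168.0.0/24
        by_subnet.setdefault(iface.network, []).append((server, str(iface.ip)))
    return [
        (f"DAGNetwork{index:02d}", str(subnet), members)
        for index, (subnet, members) in enumerate(by_subnet.items(), start=1)
    ]

# EX1 and EX2 on separate subnets, as in the second pair of tables.
multi_subnet = enumerate_networks([
    ("EX1", "192.168.0.15/24"),
    ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"),
    ("EX2", "10.0.1.15/24"),
])
for name, subnet, members in multi_subnet:
    print(name, subnet, members)
```

With both servers on 192.168.0.0/24 and 10.0.0.0/24 the same function returns two networks, each containing an interface from EX1 and from EX2.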

DAG Networks

Collapse DAG networks and disable replication on MAPI network:

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0

Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03

Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

DAG Networks

Result after collapsing DAG networks and disabling replication on MAPI network:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True

DAG Networks

All DAGs extended to multiple datacenters should have hotfix from KB 2550886 installed
Automatic detection occurs when members added to DAG
  If NICs are added after server is member of DAG, you must perform discovery
    Set-DatabaseAvailabilityGroup <DAGName> -DiscoverNetworks

DAG network configuration persisted in cluster database
  HKLM\Cluster\Exchange\DAG Network

DAGs include built-in encryption and compression
  Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
  Compression: Microsoft XPRESS, based on LZ77 algorithm

DAG Networks

When using a single NIC
  It is both the MAPI and the Replication network
    ReplicationEnabled is $True

When using multiple NICs
  One NIC is the MAPI network
    ReplicationEnabled is $False
  Other NIC(s) are Replication network(s)
  Replication uses LRU to pick Replication network to use
  If Replication networks are unavailable, MAPI network is used
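The selection behavior described above (least recently used Replication network first, MAPI network as the fallback) can be sketched as follows. This is an illustrative Python model only; the class `NetworkPicker` and its bookkeeping are hypothetical and do not reflect how the Replication service is actually implemented.

```python
import itertools

class NetworkPicker:
    """Pick the least recently used Replication network, else fall back to MAPI."""

    def __init__(self, replication_networks, mapi_network):
        self.replication_networks = list(replication_networks)
        self.mapi_network = mapi_network
        self._clock = itertools.count(1)
        # 0 means "never used"; larger numbers mean "used more recently".
        self._last_used = {name: 0 for name in self.replication_networks}

    def pick(self, available):
        """Return the network to use, given the set of currently reachable networks."""
        candidates = [n for n in self.replication_networks if n in available]
        if not candidates:
            # No Replication network is usable, so the MAPI network carries the traffic.
            return self.mapi_network
        choice = min(candidates, key=lambda n: self._last_used[n])
        self._last_used[choice] = next(self._clock)
        return choice

picker = NetworkPicker(["ReplA", "ReplB"], mapi_network="MAPI")
print(picker.pick({"ReplA", "ReplB"}))
print(picker.pick({"ReplA", "ReplB"}))
print(picker.pick(set()))
```

The network names `ReplA`, `ReplB` and `MAPI` are placeholders chosen for the example.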

DAG Networks

Use netsh, router ACLs or other means to block cross-network traffic

[Figure: Subnet 1 through Subnet 4, each with an M and an R interface; legend: Blocked, Allowed]

DAG Networks

If using iSCSI storage, configure DAG and cluster to ignore iSCSI networks
  Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true

DAG Networks

When a DAG spans multiple subnets you need an IP address on the MAPI network for each subnet
Use DHCP in site resilience configurations to assign IP addresses to Replication network
  Enables delivery of the typically required static routes
  If using static IP addresses, use netsh to configure static routes
Configure a DNS TTL on namespace records consistent with your SLA
  For example, use a TTL of 5 minutes for a 60-minute RTO SLA

Exchange Server 2010 High Availability Deep Dive: Active Manager

Active Manager

What are the three Active Manager roles?
  Standalone
  PAM (Primary Active Manager)
  SAM (Standby Active Manager)

Transition of role state logged into Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)

Active Manager Functionality

Mount and Dismount Databases
Provide Database Availability Information
Provide Interface for Administrative Tasks
Monitor for and React to Failures
Maintain Database and Server State Information

Mount / Dismount Database Copy

Mount Database
  An administrator action invoked through a task
  The last part of a move operation

Dismount Database
  An administrator action invoked through a task
  The first part of a move operation

Auto Dismount – DAG Member

Occurs when a DAG loses quorum

All DAG members are running (but may not be participating in the cluster)

Databases dismounted as quickly as possible to avoid split-brain

Information Store service is terminated

Active Manager – Move Database

Move Database
  An administrator action invoked by a task
  Automatic operation initiated by the PAM (failover)

Begins with a Dismount operation and ends with a Mount operation

Exchange Server 2010 High Availability Deep Dive: Best Copy Selection

Best Copy Selection

Active Manager selects the “best” copy to become the new active copy when the existing active copy fails, or when an administrator performs a target-less switchover
BCS is the process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status

During BCS, any servers that are unreachable or activation blocked are ignored

Best Copy Selection – RTM

Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
Selects from sorted list based on which set of criteria is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy

Best Copy Selection – SP1

Sorts copies by activation preference when auto database mount dial is set to Lossless
  Otherwise, sorts copies based on copy queue length, with activation preference used as a secondary sorting key if necessary
Selects from sorted list based on which set of criteria is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy

Best Copy Selection

Is database mountable?
Is copy queue length <= AutoDatabaseMountDial?
  If yes, database is marked as current active and mount request is issued
  If not, next best database tried (if one is available)

Best Copy Selection

Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
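One way to read the criteria table is as an ordered list of filters applied to the already sorted copies: the first copy that satisfies the lowest-numbered criteria set is tried first. The Python sketch below models that reading. It is not Active Manager code: the names `DatabaseCopy`, `meets` and `select_best_copy` are hypothetical, and applying the status check from set 10 to every set is an assumption.

```python
from dataclasses import dataclass

ACCEPTABLE_STATUSES = {
    "Healthy", "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing", "SeedingSource",
}

# (copy queue limit, replay queue limit, required content index state).
# None means the table lists N/A, i.e. the value is not checked.
CRITERIA = [
    (10, 50, "Healthy"), (10, 50, "Crawling"),
    (None, 50, "Healthy"), (None, 50, "Crawling"), (None, 50, None),
    (10, None, "Healthy"), (10, None, "Crawling"),
    (None, None, "Healthy"), (None, None, "Crawling"),
    (None, None, None),  # set 10: copy status only
]

@dataclass
class DatabaseCopy:
    name: str
    copy_queue: int
    replay_queue: int
    ci_state: str
    status: str

def meets(copy, criteria):
    """Return True if the copy satisfies one criteria set."""
    cql_limit, rql_limit, required_ci = criteria
    if copy.status not in ACCEPTABLE_STATUSES:  # assumption: checked for every set
        return False
    if cql_limit is not None and copy.copy_queue >= cql_limit:
        return False
    if rql_limit is not None and copy.replay_queue >= rql_limit:
        return False
    if required_ci is not None and copy.ci_state != required_ci:
        return False
    return True

def select_best_copy(sorted_copies):
    """Walk the criteria sets in order; return (copy, set number) or (None, None)."""
    for number, criteria in enumerate(CRITERIA, start=1):
        for copy in sorted_copies:
            if meets(copy, criteria):
                return copy, number
    return None, None
```

For example, a copy with a copy queue length of 10 and a content index state of Crawling fails sets 1 through 3 and is first accepted under set 4.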

Best Copy Selection – RTM

Four copies of DB1
DB1 currently active on Server1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Figure: Server1 through Server4, each hosting a copy of DB1; one copy is marked with an X]

Best Copy Selection – RTM

Sort list of available copies by Copy Queue Length (using Activation Preference as secondary sort key if necessary):
  Server3\DB1
  Server2\DB1
  Server4\DB1

Best Copy Selection – RTM

Only two copies meet first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  Server3\DB1 (lowest copy queue length, tried first)
  Server2\DB1
Server4\DB1 does not meet the first set of criteria (CQL is 10 and CI is Crawling)

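The RTM walkthrough can be reproduced with a short Python sketch. The values come from the example table; the dictionary keys and the helper names `rtm_sort` and `meets_first_criteria` are hypothetical and chosen for illustration.

```python
# Copy status values from the RTM example (DB1 active on Server1).
copies = [
    {"name": "Server2\\DB1", "preference": 2, "cql": 4, "rql": 0, "ci": "Healthy"},
    {"name": "Server3\\DB1", "preference": 3, "cql": 2, "rql": 2, "ci": "Healthy"},
    {"name": "Server4\\DB1", "preference": 4, "cql": 10, "rql": 0, "ci": "Crawling"},
]

def rtm_sort(copies):
    """Sort by copy queue length, then by activation preference."""
    return sorted(copies, key=lambda c: (c["cql"], c["preference"]))

def meets_first_criteria(copy):
    """First criteria set: CQL < 10, RQL < 50, content index Healthy."""
    return copy["cql"] < 10 and copy["rql"] < 50 and copy["ci"] == "Healthy"

ordered = [c["name"] for c in rtm_sort(copies)]
candidates = [c["name"] for c in rtm_sort(copies) if meets_first_criteria(c)]
print(ordered)      # all three copies, lowest copy queue length first
print(candidates)   # only the copies that pass the first criteria set
```

Server3\DB1 sorts first because its copy queue length of 2 is the lowest, and Server4\DB1 drops out of the candidate list on both copy queue length and content index state.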

Best Copy Selection – SP1

Four copies of DB1
DB1 currently active on Server1
Auto database mount dial set to Lossless

[Figure: Server1 through Server4, each hosting a copy of DB1; one copy is marked with an X]

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

Best Copy Selection – SP1

Sort list of available copies by Activation Preference:
  Server2\DB1 (lowest preference value, tried first)
  Server3\DB1
  Server4\DB1

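The SP1 change is only in the sort key: with the mount dial set to Lossless, activation preference decides the order, otherwise the RTM ordering applies. A minimal Python sketch of that rule follows; the function name `sp1_sort` and the dictionary keys are hypothetical, and the data is taken from the example table.

```python
copies = [
    {"name": "Server2\\DB1", "preference": 2, "cql": 4},
    {"name": "Server3\\DB1", "preference": 3, "cql": 2},
    {"name": "Server4\\DB1", "preference": 4, "cql": 10},
]

def sp1_sort(copies, mount_dial):
    """Order copies the way the SP1 slide describes."""
    if mount_dial == "Lossless":
        # Lossless: activation preference is the only sort key.
        return sorted(copies, key=lambda c: c["preference"])
    # Any other dial setting: copy queue length first, preference as tie-breaker.
    return sorted(copies, key=lambda c: (c["cql"], c["preference"]))

print([c["name"] for c in sp1_sort(copies, "Lossless")])
print([c["name"] for c in sp1_sort(copies, "GoodAvailability")])
```

With Lossless the order starts at Server2\DB1; with any other setting it starts at Server3\DB1, matching the RTM example.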

Best Copy Selection

After Active Manager determines the best copy to activate

The Replication service on the target server tries to copy missing log files from source (ACLL)

If successful, database will mount with zero data loss
If unsuccessful (lossy failure), database will mount based on the AutoDatabaseMountDial setting

If data loss is outside of dial setting, next copy will be tried
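The decision after ACLL can be sketched as a small function. The thresholds below (Lossless = 0, GoodAvailability = 6, BestAvailability = 12 log files) are the documented AutoDatabaseMountDial values for Exchange 2010 rather than figures from these slides, and the function name `mount_decision` and its return strings are hypothetical.

```python
# Maximum number of missing log files tolerated per AutoDatabaseMountDial setting.
# Assumption: values taken from the Exchange 2010 product documentation.
MOUNT_DIAL_MAX_LOST_LOGS = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}

def mount_decision(acll_succeeded, lost_logs, mount_dial):
    """Decide what happens to the selected copy after ACLL has run."""
    if acll_succeeded:
        # All missing logs were copied from the source.
        return "mount with zero data loss"
    if lost_logs <= MOUNT_DIAL_MAX_LOST_LOGS[mount_dial]:
        # Lossy failure, but the loss is within the dial setting.
        return "mount with data loss"
    # Loss is outside the dial setting.
    return "try next copy"

print(mount_decision(True, 0, "Lossless"))
print(mount_decision(False, 4, "GoodAvailability"))
print(mount_decision(False, 4, "Lossless"))
```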

Best Copy Selection

If an activated database copy is mounted
  It will generate new log files (using the same log generation sequence)
  Transport Dumpster requests will be initiated for the mounted database to recover lost messages
  When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed

Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode

Datacenter Activation Coordination Mode

DAC mode is a property of a DAG
Acts as an application-level form of quorum
  Controls whether or not a Mailbox server attempts to automatically mount its active databases on startup
  Designed to prevent multiple copies of same database mounting on different members due to loss of network (split brain)

Also enables use of Site Resilience tasks
  Stop-DatabaseAvailabilityGroup
  Restore-DatabaseAvailabilityGroup
  Start-DatabaseAvailabilityGroup

Datacenter Activation Coordination Mode

RTM: DAC Mode for DAGs with three or more members that are extended to two Active Directory sites

Don’t enable for two-member DAGs where each member is in different AD site or DAGs where all members are in the same AD site

SP1: DAC Mode can be enabled for all DAGs
If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode

Datacenter Activation Coordination Mode

Uses Datacenter Activation Coordination Protocol (DACP)
A bit in memory (in MSExchangeRepl.exe) set to either:
  0 = can’t auto-mount at startup
  1 = can auto-mount at startup

Datacenter Activation Coordination Mode

Active Manager startup sequence
  DACP is set to 0
  DAG member communicates with other DAG members it can reach to determine the current value for their DACP bits
    If the starting DAG member can communicate with all other members on the StartedMailboxServers list, DACP bit switches to 1
    If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 1, starting DAG member DACP bit switches to 1
    If the starting DAG member can communicate with other members, and those members’ DACP bits are all set to 0, starting DAG member DACP bit remains at 0
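The three startup rules can be expressed as a short function. This is a Python sketch of the rules as stated on the slide, not the DACP implementation; the function name `dacp_bit_at_startup`, its parameters, and the server names in the example are hypothetical.

```python
def dacp_bit_at_startup(me, started_mailbox_servers, reachable_bits):
    """Return the DACP bit (0 or 1) for a DAG member that is starting up.

    me: name of the starting member
    started_mailbox_servers: names on the StartedMailboxServers list
    reachable_bits: {member name: DACP bit} for every member it can reach
    """
    others = [s for s in started_mailbox_servers if s != me]
    # Rule 1: every other started member is reachable.
    if all(s in reachable_bits for s in others):
        return 1
    # Rule 2: at least one reachable member already has its bit set to 1.
    if any(bit == 1 for bit in reachable_bits.values()):
        return 1
    # Rule 3: only members with bit 0 (or no members at all) are reachable.
    return 0

started = ["MBX-A", "MBX-B", "MBX-C", "MBX-D"]
print(dacp_bit_at_startup("MBX-A", started, {"MBX-B": 0}))
print(dacp_bit_at_startup("MBX-A", started, {"MBX-B": 0, "MBX-C": 1}))
```

In the first call MBX-A reaches only one member whose bit is 0, so it may not auto-mount; in the second call a reachable member already holds a 1, so MBX-A switches to 1 as well.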

[Figure: DAG1 spanning a Primary Datacenter and a Secondary Datacenter. Labels: MBX-A, MBX-B, MBX-C, MBX-D, CAS-Pri, CAS-Sec, HT2010 (two), Outlook (two), DAG1FSW, Active (two)]

Datacenter Activation Coordination Mode

[Figure: same topology as the previous figure, with an additional label AWS]

Datacenter Activation Coordination Mode

[Figure: bit values 0 0 1 1]

Related Content

UCC305 - Exchange Server 2010 High Availability Design

Resources

Exchange Team Blog: http://aka.ms/EHLO

Exchange 2010 Documentation Library: http://aka.ms/Ex2010Docs

Questions?

UCC402
Scott Schnoll
Principal Technical Writer
scott.schnoll@microsoft.com
http://blogs.technet.com/scottschnoll
Twitter: @schnoll

You can ask me questions at the “Ask the Expert” zone:
November 10, 2011 12:30 – 13:30
