High availability and site redundancy with Exchange 2007: Notes from the field

Gareth Ireland
Infrastructure Consultant

Session Objectives And Takeaways

• Session Objectives:
– Understanding the High Availability requirements and objectives of a business
– Understanding what to protect in an Exchange Server 2007 environment
– Understanding Exchange Server 2007 features and solutions for protecting services and data
– Understanding issues to consider in site resiliency solutions
– Comparing an Exchange 2003 geo-cluster deployment with an Exchange 2007 solution
– Practical demonstration of an Exchange Server 2007 High Availability deployment

• Key Takeaways:
– New High Availability features and solutions reduce the chance of disaster
– New Disaster Recovery features and solutions reduce the time of recovery when disasters do occur
– Demystify the concepts of High Availability features in Exchange Server 2007

High Availability Requirements of Business

Types of Failures

• Small-Scale
– Accidentally deleted items
– Deleted mailbox
– Disk failure
– Disk controller failure
– Database corruption
– Log corruption
– Storage failure (DAS)

• Mid-Scale
– Full server failure
– Complete cluster failure
– Large storage failure, e.g., SAN failure

• Large-Scale
– Total site failure

Exchange Server 2007 High Availability: Overview

[Diagram: an Exchange organization and the Internet, showing the Edge Transport, Hub Transport, Client Access, and Unified Messaging server roles, plus a CCR cluster with an active and a passive Mailbox server role.]

Exchange Server 2007 High Availability Solutions Matrix

What to Protect and How

• Exchange Server 2003
– Requires shared storage
– SMTP, OWA, and Mailbox are cluster-aware
– Single copy of mailbox data
– Up to 8-node Active/Passive
– 2-node Active/Active
– Geo-clusters required synchronized storage replication
– Split-brain scenarios

• Exchange Server 2007
– Requires shared storage
– Mailbox only (simple redundancy for other roles)
– Single copy of mailbox data
– Up to 8-node Active/Passive
– Active/Active cut
– Improvements in install, management, behavior

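[Diagram: an Exchange Server 2003 cluster with quorum (Q), database, and logs hosting SMTP, MB, and OWA, next to an Exchange Server 2007 cluster with quorum (Q), database, and logs hosting MB only.]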

Limitations

• Deployment/operational cost and complexity
• Recovery time varies based on backup technology, but can be lengthy and painful
• Data redundancy requires integration of partner technology


Local Continuous Replication (LCR)

• Standalone server data availability
– Data outages are expensive to recover from
– Significant data loss (hours?)
– Previous versions of Exchange required partner products for replication

• What is LCR?
– Data replication on a single server in a single datacenter
– Enabled per storage group
– Easy to configure

• Key things to know:
– Per storage group, manual configuration
– Adds overhead to server
– Some configuration limitations

• Benefits
– Enables recovery in minutes
– Enables recovery without data loss
– Enables large mailboxes
– Variety of storage and backup options (decreases TCO by enabling I/O offload)
– Within reach of a broad set of customers

[Diagram, Service Pack 1: database and log copies for a CCR cluster (file share witness, passive node) and an SCC cluster (quorum), each shown with mailbox servers (MBX).]

Standby Continuous Replication


• Designed for datacenter recovery
• Enables standby configurations out of the box
– No clustering required between servers
– No single subnet requirement
– Spans multiple AD sites
• Granular configuration
• Flexible configuration
– Many-to-many
• Manual activation

Cluster Continuous Replication (CCR)

• Two-node Active/Passive failover cluster
– File Share Witness (MNS Quorum)
– No shared storage
– Witness on Hub Transport
– Automatic recovery
• Continuous data replication
• Full redundancy
• One or two datacenter solution
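The File Share Witness bullet above can be read as a simple vote count. The following is a minimal sketch, assuming a simplified majority model in which the two nodes and the witness each hold one vote and two of three votes are needed; the function name `has_quorum` is illustrative and not part of any Exchange or Windows API.

```python
def has_quorum(node1_up: bool, node2_up: bool, witness_reachable: bool) -> bool:
    """Simplified majority check: two nodes plus a file share witness = 3 votes.

    Assumption: each participant holds exactly one vote and a strict
    majority (2 of 3) keeps the cluster running.
    """
    votes = sum([node1_up, node2_up, witness_reachable])
    return votes >= 2


# One node down, witness reachable: the surviving node keeps quorum.
print(has_quorum(node1_up=True, node2_up=False, witness_reachable=True))

# One node down and witness unreachable: the surviving node loses quorum.
print(has_quorum(node1_up=True, node2_up=False, witness_reachable=False))
```

In this model the witness matters only as a tie-breaker: with both nodes healthy the cluster has a majority without it.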


• Outage Management
– Easy-to-use scheduled outage support
– Automatic recovery of unscheduled outages
• Symmetric failover
• Resource requirements
• Variety of backup options
• Reduced backup TCO
• Configuration limitations

[Diagram: two nodes, each with its own database and logs, plus a File Share Witness (KB 921181).]

Benefits

• Fast recovery from data problems on the active node
• No single point of failure
• Simplified hardware requirements
• Simplified storage requirements
• Simplified deployment
• Exchange-provided replication solution
• Enables Mailbox server failover to a second datacenter
• Improved management experience
• Ability to offload VSS-based backups

CCR failover behavior

• Cluster service monitors the resources
– Failure detection is not instantaneous
• IP Address or Network Name resource failures cause failover
– A machine, or network access to it, has failed completely
• Exchange service failure or timeout doesn’t cause failover
– The service is restarted on the same node
• Database failure doesn’t cause failover
– Don’t want to move 49 databases because 1 failed
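The failover rules listed on this slide can be summarized as a lookup table. This is only a sketch of the slide's four bullets; the enum and function names are hypothetical and do not correspond to the Cluster service's actual interfaces.

```python
from enum import Enum


class Failure(Enum):
    IP_ADDRESS_RESOURCE = "IP Address resource failure"
    NETWORK_NAME_RESOURCE = "Network Name resource failure"
    EXCHANGE_SERVICE = "Exchange service failure or timeout"
    DATABASE = "Database failure"


class Response(Enum):
    FAILOVER = "fail over to the passive node"
    RESTART_ON_SAME_NODE = "restart the service on the same node"
    STAY_ON_NODE = "no failover"


# Mapping taken directly from the slide; names are illustrative only.
RESPONSES = {
    Failure.IP_ADDRESS_RESOURCE: Response.FAILOVER,
    Failure.NETWORK_NAME_RESOURCE: Response.FAILOVER,
    Failure.EXCHANGE_SERVICE: Response.RESTART_ON_SAME_NODE,
    Failure.DATABASE: Response.STAY_ON_NODE,
}


def cluster_response(failure: Failure) -> Response:
    """Return the response the slide describes for a given failure type."""
    return RESPONSES[failure]


for failure in Failure:
    print(f"{failure.value}: {cluster_response(failure).value}")
```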

Available configurations

[Diagram: the Store, its database (DB), the Replication service, and the database copy, shown for LCR on a standalone server and for CCR on a cluster.]

Basic architecture

[Diagram: logs pulled from the active node by the passive node.]

• A ‘pull’ model
• Exchange server creates log files normally
• Log files are copied by the Replication service
– Share created on the active node
– Exxnnnnnnnn.log files copied as they appear
• Replication service keeps a copy of the database up to date
– Inspects and replays log files
• Exx.log is copied for handoff/failover
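The pull model described above can be illustrated with a small simulation. This is a toy sketch, not the Replication service's real logic: the class and method names are hypothetical, and the file-name helper assumes the generation number is written as eight hexadecimal digits after the Exx prefix.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def log_name(generation: int, prefix: str = "E00") -> str:
    """Closed log file name: prefix plus an 8-digit hexadecimal generation (assumed)."""
    return f"{prefix}{generation:08X}.log"


@dataclass
class ActiveNode:
    current_generation: int = 1  # generation currently being written to Exx.log
    closed: list[int] = field(default_factory=list)

    def close_current_log(self) -> None:
        """Exx.log fills, becomes Exxnnnnnnnn.log, and a new Exx.log starts."""
        self.closed.append(self.current_generation)
        self.current_generation += 1


@dataclass
class PassiveNode:
    copied: list[int] = field(default_factory=list)
    replayed_through: int = 0  # highest generation replayed into the copy

    def pull(self, active: ActiveNode) -> list[str]:
        """Copy closed logs that have appeared on the active node's share."""
        new = [g for g in active.closed if g not in self.copied]
        self.copied.extend(new)
        return [log_name(g) for g in new]

    def replay(self) -> None:
        """Replay copied logs in order, never skipping a missing generation."""
        for g in sorted(self.copied):
            if g == self.replayed_through + 1:
                self.replayed_through = g


active = ActiveNode()
for _ in range(3):
    active.close_current_log()

passive = PassiveNode()
print(passive.pull(active))      # names of the logs copied in this pass
passive.replay()
print(passive.replayed_through)  # highest generation in the passive copy
```

The open Exx.log is never pulled in this sketch, which matches the slide: it is copied only at handoff or failover.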

Cluster Continuous Replication

[Diagram: Node1 (active) and Node2 (passive), each with a database and logs. After an online seed, Node2 copies and verifies logs (E0000000011.log, E0000000012.log) from the share \\node1\GUID while E00.log remains the current log on Node1, then advances its database copy by replaying the logs to produce an updated DB.]

Cluster Continuous Replication Failover Scenarios

• Scheduled outage
• Scheduled outage to correct corruption (logs available)
• Scheduled outage to correct corruption (no logs available)
– Transport Dumpster
• Store crash
• OS blue screen
– Incremental replay
• Active network failure
– Logs copied
• Geographically dispersed cluster: single machine failure
• Geographically dispersed cluster: datacenter failure

DEMO: Useful CCR cmdlets

• Get-ClusterMailboxServerStatus
– Status information of the cluster
• Get-StorageGroupCopyStatus
– Complete status information of a CCR or LCR copy
• Move-ClusterMailboxServer
– Scheduled (lossless) move of the Exchange resource
• Update-StorageGroupCopy
– Initiate or resync a CCR or LCR copy (use the Suspend-StorageGroupCopy and Resume-StorageGroupCopy cmdlets as required)
• Get-TransportConfig and Set-TransportConfig
– Get and set transport dumpster configuration

Scheduled outage

• Passive node copies log files
– Exx.log is in use
• On move, Exx.log is copied
• Designations are now reversed

[Diagram: log files on Node 1 and Node 2 during a scheduled move. Closed logs E00 0000 0001 to E00 0000 0005 and the current log E00 (generations 2 to 6) are shown as the active designation moves between nodes.]

• Failover without copying all log files is called “lossy”
• Passive DB is not completely up to date
• Log generation numbers are reused
• Log files have different content!
• Database might be different!

[Diagram: log files on Node 1 and Node 2 after a lossy failover. Closed logs E00 0000 0001 to E00 0000 0005 and current logs E00 (generations 2 to 6) are shown; the file names for generations 4 and 5 appear more than once, illustrating reused generation numbers.]
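To make the reuse of log generation numbers after a lossy failover concrete, here is a hedged sketch that compares two nodes' log streams and reports the first generation whose content differs. The dictionaries map a generation number to a stand-in for the log's content; this is an illustration only and not how Exchange itself detects divergence.

```python
from typing import Dict, Optional


def first_divergent_generation(
    old_active: Dict[int, str], new_active: Dict[int, str]
) -> Optional[int]:
    """Return the lowest generation present on both nodes whose content differs."""
    for generation in sorted(old_active.keys() & new_active.keys()):
        if old_active[generation] != new_active[generation]:
            return generation
    return None


# Hypothetical state: Node 1 wrote generations 1-5, but only 1-3 reached Node 2.
node1 = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
node2 = {1: "a", 2: "b", 3: "c"}

# Lossy failover: Node 2 becomes active and reuses generation numbers 4 and 5
# for new, different transactions.
node2[4] = "x"
node2[5] = "y"

print(first_divergent_generation(node1, node2))  # prints 4
```

Everything on the old active node from the divergent generation onward no longer matches the new active node, which is why the slide warns that the database itself might be different.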

Transport Dumpster

• Transport Dumpster is a feature that is only enabled for use by Cluster Continuous Replication
• After an unscheduled outage, the transport dumpster resubmits recently delivered mail from the Hub Transport servers
• It is enabled by default and should always be turned on when using CCR
• The transport dumpster is enabled organization-wide by setting the amount of storage available per storage group and the time to retain mail in the dumpster
• What it does:
– The Hub Transport server maintains a queue of mail that was recently delivered to a clustered mailbox server
– In the event of an unplanned failover, CCR automatically requests every Hub Transport server in the site to redeliver mail from the transport dumpster queue
– The information store automatically deletes the duplicates and redelivers mail that was lost
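The redelivery and duplicate-removal steps described for the transport dumpster can be sketched as follows. This is a toy model under stated assumptions: the class, its size and age limits, and the message identifiers are all hypothetical stand-ins for the per-storage-group storage and retention settings mentioned on the slide.

```python
from collections import deque


class TransportDumpster:
    """Toy per-storage-group dumpster bounded by total size and message age."""

    def __init__(self, max_size_bytes: int, max_age_seconds: float) -> None:
        self.max_size_bytes = max_size_bytes
        self.max_age_seconds = max_age_seconds
        self._queue = deque()  # (message_id, size_bytes, delivered_at), oldest first
        self._size = 0

    def record_delivery(self, message_id: str, size_bytes: int, delivered_at: float) -> None:
        """Remember a delivered message, then evict anything over the limits."""
        self._queue.append((message_id, size_bytes, delivered_at))
        self._size += size_bytes
        self._trim(now=delivered_at)

    def _trim(self, now: float) -> None:
        # Drop the oldest entries while the size or age limit is exceeded.
        while self._queue and (
            self._size > self.max_size_bytes
            or now - self._queue[0][2] > self.max_age_seconds
        ):
            _, size, _ = self._queue.popleft()
            self._size -= size

    def redeliver(self) -> list:
        """Return the ids of everything still retained, oldest first."""
        return [message_id for message_id, _, _ in self._queue]


def apply_redelivery(mailbox: set, redelivered: list) -> list:
    """Store-side step: skip duplicates and keep only the mail that was lost."""
    restored = []
    for message_id in redelivered:
        if message_id not in mailbox:
            mailbox.add(message_id)
            restored.append(message_id)
    return restored


dumpster = TransportDumpster(max_size_bytes=100, max_age_seconds=60)
dumpster.record_delivery("m1", 40, delivered_at=0)
dumpster.record_delivery("m2", 40, delivered_at=10)
dumpster.record_delivery("m3", 40, delivered_at=20)  # evicts m1 (size limit)

# After a lossy failover the new active copy has m1 and m2 but never saw m3.
mailbox = {"m1", "m2"}
print(apply_redelivery(mailbox, dumpster.redeliver()))  # prints ['m3']
```

The sketch also shows the limit of the approach: mail that has already aged or been sized out of the dumpster cannot be recovered this way.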

Types of Failures 2003 vs. 2007 Reviewed

Exchange Server 2007 Cluster Continuous Replication

• Stretch CCR on Windows 2003
– 1 node per datacenter
– Integrated data & server redundancy
– Separate storage for each node in each site
– Flexible hardware options
– Mailbox server failover and switchover (manual & automatic)
– File Share Witness quorum
– Requirements:
   AD fix-up for other Exchange roles on site failover
   Windows 2003 still requires a single subnet
   Network pipe between datacenters must carry a wide range of traffic

Stretch CCR with Windows 2003

[Diagram: stretch CCR on Windows 2003 between two datacenters linked by a WAN, with a VLAN that keeps both cluster nodes on the same subnet.
Key: Active Directory logical site; physical datacenter site; Active Directory domain.
Primary Data Center (AD site: Redmond): DC/GC 1 and 2, CAS 1 and 2, HUB 1 and 2, Edge 1 and 2, Internet connection, MX record.
Standby Data Center (AD site: Quincy): DC/GC 3 and 4, CAS 3 and 4, HUB 3 and 4, Edge 3 and 4, Internet connection, MX record.
Cluster Continuous Replication (CCR): clustered mailbox server (CMS) on nodes MBX 1 and MBX 2, with public and private networks and a file share witness (FSW, fswsvr). Clients connect to //mail.tailspin.com/…
Network load = Replication + HUB + CAS + Heartbeats + AD Access + Client Access + AD Replication.
Slide annotations: "Dedicated or non-dedicated?"; "Total site disaster".]

Demo

Demo Lab Setup

Questions & Answers

Blogcasts, Webcasts, & Whitepapers

Support Webcast Microsoft Exchange 2007 Disaster Recovery

http://support.microsoft.com/kb/937563/en-us

CCR: http://msexchangeteam.com/archive/2006/08/09/428642.aspx

SCR: http://msexchangeteam.com/archive/2007/02/23/435699.aspx

LCR: http://msexchangeteam.com/archive/2006/05/24/427788.aspx

Resources

Technical Communities, Webcasts, Blogs, Chats & User Groups
http://www.microsoft.com/communities/default.mspx

Microsoft Developer Network (MSDN) & TechNet
http://microsoft.com/msdn
http://microsoft.com/technet

Trial Software and Virtual Labs
http://www.microsoft.com/technet/downloads/trials/default.mspx

Microsoft Learning and Certification
http://www.microsoft.com/learning/default.mspx

Database Portability
http://technet.microsoft.com/en-us/library/bb123954.aspx

File Share Witness for Cluster Continuous Replication
http://support.microsoft.com/kb/921181

Dial Tone Recovery Using an Alternate Server
http://technet.microsoft.com/en-us/library/bb310785.aspx

Thank you

http://www.microsoft.com/southafrica/ucs/2007
