High availability and site redundancy with Exchange 2007: Notes from the field
Gareth Ireland, Infrastructure Consultant
Session Objectives And Takeaways
• Session Objectives:
– Understanding the High Availability requirements and objectives of a business
– Understanding what to protect in an Exchange Server 2007 environment
– Understanding Exchange Server 2007 features and solutions for protecting services and data
– Understanding issues to consider in site resiliency solutions
– Comparing an Exchange 2003 Geo-cluster deployment to an Exchange 2007 solution
– Practical demonstration of an Exchange Server 2007 High Availability deployment
• Key Takeaways:
– New High Availability features and solutions reduce the chance of disaster
– New Disaster Recovery features and solutions reduce the time of recovery when disasters do occur
– Demystify the concepts of High Availability features in Exchange Server 2007
High Availability Requirements of Business
Types of Failures
• Small-Scale: accidentally deleted items, deleted mailbox, disk failure, disk controller failure, database corruption, log corruption, storage failure (DAS)
• Mid-Scale: full server failure, complete cluster failure, large storage failure, e.g., SAN failure
• Large-Scale: total site failure
Exchange Server 2007 High Availability
Overview
[Diagram: Exchange Organization. The Internet connects to the Edge Transport server role; inside the organization are the Hub Transport, Client Access, and Unified Messaging server roles and a CCR cluster with an active and a passive Mailbox server role.]
Exchange Server 2007 High Availability Solutions Matrix
What to Protect and How
• Exchange Server 2003
– Requires shared storage
– SMTP, OWA, and Mailbox are cluster-aware
– Single copy of mailbox data
– Up to 8-node Active/Passive
– 2-node Active/Active
– Geo-clusters required synchronized storage replication
– Split-brain scenarios
• Exchange Server 2007
– Requires shared storage
– Mailbox only (simple redundancy for other roles)
– Single copy of mailbox data
– Up to 8-node Active/Passive
– Active/Active cut
– Improvements in install, management, behavior
[Diagram: two shared-storage clusters, each with quorum (Q), database (DB), and logs; one serves SMTP, MB, and OWA, the other serves MB only.]
Limitations
• Deployment/operational cost and complexity
• Recovery time varies based on backup technology, but can be lengthy and painful
• Data redundancy requires integration of partner technology
Local Continuous Replication (LCR)
• Standalone server data availability
– Data outages expensive to recover
– Significant data loss (hours?)
– Previous versions of Exchange required partner products for replication
• What is LCR?
– Data replication on a single server in a single datacenter
– Enabled per storage group
– Easy to configure
Local Continuous Replication
• Key things to know:
– Per storage group, manual configuration
– Adds overhead to server
– Some configuration limitations
• Benefits
– Enables recovery in minutes
– Enables recovery without data loss
– Enables large mailboxes
– Variety of storage and backup options (decreases TCO by enabling I/O offload)
– Within reach of broad set of customers
Standby Continuous Replication
[Diagram: Service Pack 1. Standby copies (database and logs) for a CCR cluster with passive node and file share, an SCC cluster with quorum (Q), and standalone Mailbox (MBX) servers.]
• Designed for datacenter recovery
• Enables standby configurations out of the box
– No clustering required between servers
– No single subnet requirement
– Spans multiple AD sites
• Granular configuration
• Flexible configuration
– Many-to-many
• Manual activation
Cluster Continuous Replication (CCR)
• Two-node Active/Passive failover cluster
– File Share Witness (MNS Quorum)
– No shared storage
– Witness on Hub Transport
– Automatic recovery
• Continuous data replication
• Full redundancy
• One or two datacenter solution
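The role of the File Share Witness in a two-node cluster can be illustrated with a small majority calculation. This is only a sketch: it assumes the two nodes and the witness each contribute one vote and that a strict majority is needed to keep the cluster running. The helper name `has_quorum` is hypothetical and is not part of any Exchange or Windows API.

```python
def has_quorum(votes_online: int, total_votes: int = 3) -> bool:
    """Return True when a strict majority of votes is reachable.

    Assumed vote layout for this sketch: node 1, node 2, file share witness.
    """
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return votes_online > total_votes // 2


# One surviving node that can still reach the witness holds 2 of 3 votes.
survivor_with_witness = has_quorum(2)

# A node cut off from both its partner and the witness holds 1 of 3 votes,
# so it cannot keep running on its own. This is what prevents split-brain.
isolated_node = has_quorum(1)
```

Under these assumptions, a network partition can leave at most one side with a majority, so both nodes never believe they are active at the same time.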
• Outage Management
– Easy-to-use scheduled outage support
– Automatic recovery of unscheduled outages
• Symmetric failover
• Resource requirements
• Variety of backup options
• Reduced backup TCO
• Configuration limitations
• File Share Witness: KB 921181
Benefits
• Fast recovery from data problems on active node
• No single point of failure
• Simplified hardware requirements
• Simplified storage requirements
• Simplified deployment
• Exchange-provided replication solution
• Enables Mailbox server failover to second datacenter
• Improved management experience
• Ability to offload VSS-based backups
CCR failover behavior
• Cluster service monitors the resources
– Failure detection is not instantaneous
• IP Address or Network Name resource failures cause failover
– A machine, or network access to it, has failed completely
• Exchange service failure or timeout doesn’t cause failover
– The service is restarted on the same node
• Database failure doesn’t cause failover
– Don’t want to move 49 databases because 1 failed
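The failover rules on this slide can be written down as a small decision table. The sketch below is illustrative only: the `Failure` and `Action` enums and the `cluster_response` function are hypothetical names chosen for this example, not Cluster service or Exchange APIs.

```python
from enum import Enum


class Failure(Enum):
    IP_ADDRESS = "IP Address resource failure"
    NETWORK_NAME = "Network Name resource failure"
    EXCHANGE_SERVICE = "Exchange service failure or timeout"
    DATABASE = "database failure"


class Action(Enum):
    FAILOVER = "fail over to the passive node"
    RESTART_SERVICE = "restart the service on the same node"
    NO_FAILOVER = "stay on the active node"


def cluster_response(failure: Failure) -> Action:
    """Map a detected failure to the response described on the slide."""
    # The machine, or network access to it, has failed completely.
    if failure in (Failure.IP_ADDRESS, Failure.NETWORK_NAME):
        return Action.FAILOVER
    # A failed or timed-out Exchange service is restarted in place.
    if failure is Failure.EXCHANGE_SERVICE:
        return Action.RESTART_SERVICE
    # One failed database should not move every other database.
    return Action.NO_FAILOVER
```

The design point is that only a failure of the whole machine, or of network access to it, moves the clustered mailbox server; narrower failures are handled on the active node.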
Available configurations
[Diagram: LCR on a standalone server and CCR on a cluster. In each configuration the Replication service maintains a copy of the Store database.]
Basic architecture
[Diagram: logs pulled from the active node by the passive node.]
• A ‘pull’ model
• Exchange server creates log files normally
• Log files are copied by Replication service
– Share created on the active node
– Exxnnnnnnnn.log files copied as they appear
• Replication service keeps a copy of the database up-to-date
– Inspects, and replays log files
• Exx.log is copied for handoff/failover
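The pull model above can be sketched as a toy simulation: the active node closes log generations, and the passive node copies each new generation, inspects it, and replays it in order. Everything here is an assumption made for illustration, including the class names, the SHA-256 check standing in for log inspection, and the in-memory dictionaries standing in for the log share; it does not reproduce how the Replication service is implemented.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    """Stand-in for log inspection: a digest recorded when the log is closed."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ActiveNode:
    # generation number -> (log contents, checksum); models the log share
    closed_logs: dict = field(default_factory=dict)

    def close_log(self, generation: int, data: bytes) -> None:
        self.closed_logs[generation] = (data, checksum(data))


@dataclass
class PassiveNode:
    replayed: list = field(default_factory=list)
    last_generation: int = 0

    def pull(self, active: ActiveNode) -> int:
        """Copy, inspect, and replay new generations in strict order."""
        for generation in sorted(active.closed_logs):
            if generation <= self.last_generation:
                continue  # already copied and replayed
            if generation != self.last_generation + 1:
                break  # a gap: later logs cannot be replayed yet
            data, expected = active.closed_logs[generation]
            if checksum(data) != expected:
                break  # inspection failed: do not replay a damaged log
            self.replayed.append(data)
            self.last_generation = generation
        return self.last_generation
```

Calling `pull` repeatedly keeps the passive copy close behind the active node. The log still in use on the active node is never in `closed_logs`, which mirrors the point that Exx.log is only copied at handoff or failover.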
Cluster Continuous Replication
[Diagram: Node1 (active) and Node2 (passive), each with a database and logs. After an online seed, Node2 copies and verifies logs E0000000011.log and E0000000012.log from the \\node1\GUID share, then advances its database by playing the logs to produce an updated DB. E00.log is the log in use on Node1.]
Cluster Continuous Replication Failover Scenarios
• Scheduled outage
• Scheduled outage to correct corruption (logs available)
• Scheduled outage to correct corruption (no logs available)
– Transport Dumpster
• Store crash
• OS blue screen
– Incremental Replay
• Active network failure
– Logs copied
• Geographically dispersed cluster: single machine failure
• Geographically dispersed cluster: datacenter failure
DEMO: Useful CCR cmdlets
• Get-ClusterMailboxServerStatus
– Status information of the cluster
• Get-StorageGroupCopyStatus
– Complete status information of a CCR or LCR copy
• Move-ClusterMailboxServer
– Scheduled (lossless) move of Exchange resources
• Update-StorageGroupCopy
– Initiate or resync a CCR or LCR copy (use the Suspend-StorageGroupCopy and Resume-StorageGroupCopy cmdlets as required)
• Get-TransportConfig and Set-TransportConfig
– Get and set transport dumpster configuration
Scheduled outage
• Passive node copies log files
– Exx.log is in use
• On move, Exx.log is copied
• Designations are now reversed
[Diagram: Node 1 and Node 2 with closed log files E00 0000 0001 to E00 0000 0005 and the current log E00 (Gen 2 to Gen 6). After the move both nodes hold the same logs and the active designation has switched nodes.]
• Failover without copying all log files is called “lossy”
• Passive DB is not completely up-to-date
• Log generation numbers are reused
• Log files have different content!
• Database might be different!
[Diagram: Node 1 and Node 2 with closed log files E00 0000 0001 to E00 0000 0005 and the current log E00 (Gen 2 to Gen 6). After a lossy failover, generations 4 and 5 exist on both nodes with different content.]
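The effect of reused log generation numbers can be shown with a small comparison. In this sketch, which uses made-up log contents and a hypothetical `first_divergence` helper, each node is modelled as a mapping from generation number to log content.

```python
def first_divergence(old_active: dict, new_active: dict):
    """Return the lowest generation held by both nodes whose content differs.

    Returns None when every shared generation matches.
    """
    for generation in sorted(set(old_active) & set(new_active)):
        if old_active[generation] != new_active[generation]:
            return generation
    return None


# Node 1 was active and wrote generations 1 to 5 before it failed.
node1_logs = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

# Node 2 had copied only generations 1 to 3. After the lossy failover it
# became active and wrote its own generations 4 and 5, reusing the numbers.
node2_logs = {1: "a", 2: "b", 3: "c", 4: "x", 5: "y"}

divergence_point = first_divergence(node1_logs, node2_logs)
```

Here the two log streams agree up to generation 3 and differ from generation 4 onward, which is why the old active node's database cannot simply resume replaying logs from the new active node.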
Transport Dumpster
• Transport Dumpster is a feature that is only enabled for use by Cluster Continuous Replication
• After an unscheduled outage, the transport dumpster on the Hub Transport servers resubmits recently delivered mail
• It is enabled by default and should always be turned on when using CCR
• The transport dumpster is enabled organization-wide by setting the amount of storage available per storage group and setting the time to retain mail in the dumpster
• What it does:
– The Hub Transport server maintains a queue of mail that was recently delivered to a clustered mailbox server
– In the event of an unplanned failover, CCR automatically requests every Hub Transport server in the site to redeliver mail from the transport dumpster queue
– The information store automatically deletes the duplicates and redelivers mail that was lost
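The redelivery behaviour described above can be modelled in a few lines. This is a simplified sketch under stated assumptions: `HubTransport` and `MailboxStore` are hypothetical classes, messages are identified by a plain string ID, and the dumpster has no size or retention limit, unlike the real feature.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxStore:
    messages: dict = field(default_factory=dict)  # message ID -> body

    def accept(self, message_id: str, body: str) -> bool:
        """Store a message; return False if it is a duplicate."""
        if message_id in self.messages:
            return False
        self.messages[message_id] = body
        return True


@dataclass
class HubTransport:
    # Recently delivered mail retained for possible redelivery.
    dumpster: list = field(default_factory=list)

    def deliver(self, store: MailboxStore, message_id: str, body: str) -> None:
        store.accept(message_id, body)
        self.dumpster.append((message_id, body))

    def redeliver(self, store: MailboxStore) -> int:
        """Resubmit everything in the dumpster; count messages restored."""
        return sum(store.accept(mid, body) for mid, body in self.dumpster)


# Three messages reach the active copy, but only the first was replicated
# to the passive copy before an unplanned failover.
hub = HubTransport()
active_store = MailboxStore()
for mid in ("m1", "m2", "m3"):
    hub.deliver(active_store, mid, "body of " + mid)
passive_store = MailboxStore(messages={"m1": "body of m1"})

# After failover the passive copy is activated and asks for redelivery.
restored = hub.redeliver(passive_store)
```

In this model `m1` is dropped as a duplicate while `m2` and `m3` are restored, which matches the slide's description of the store discarding duplicates and recovering lost mail.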
Types of Failures 2003 vs. 2007 Reviewed
Exchange Server 2007 Cluster Continuous Replication
• Stretch CCR on Windows 2003
– 1 node per datacenter
– Integrated data & server redundancy
– Separate storage for each node in each site
– Flexible hardware options
– Mailbox server failover and switchover (manual & automatic)
– File Share Witness quorum
– Requirements
• AD fix up for other Exchange roles on site failover
• Windows 2003 still requires single subnet
• Network pipe between datacenters must carry wide range of traffic
[Diagram: Stretch CCR with Windows 2003]
• Key: Active Directory logical site, physical datacenter site, Active Directory domain
• Primary Data Center (AD Site: Redmond): DC/GC 1 and 2, CAS 1 and 2, HUB 1 and 2, Edge 1 and 2, Internet connection with MX record
• Standby Data Center (AD Site: Quincy): DC/GC 3 and 4, CAS 3 and 4, HUB 3 and 4, Edge 3 and 4, Internet connection with MX record
• CCR cluster: MBX 1 and MBX 2 hosting the CMS, with public and private networks on the same subnet (VLAN across the WAN)
• FSW (fswsvr): dedicated or non-dedicated?
• Client access: //mail.tailspin.com/…
• Network load = Replication + HUB + CAS + Heartbeats + AD Access + Client Access + AD Replication
• Scenario shown: total site disaster
Demo
Demo Lab Setup
Questions & Answers
Blogcasts, Webcasts, & Whitepapers
Support Webcast Microsoft Exchange 2007 Disaster Recovery
http://support.microsoft.com/kb/937563/en-us
CCR: http://msexchangeteam.com/archive/2006/08/09/428642.aspx
SCR: http://msexchangeteam.com/archive/2007/02/23/435699.aspx
LCR: http://msexchangeteam.com/archive/2006/05/24/427788.aspx
Resources
Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn http://microsoft.com/technet
Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
Database Portability: http://technet.microsoft.com/en-us/library/bb123954.aspx
File Share Witness for Cluster Continuous Replication: http://support.microsoft.com/kb/921181
Dial Tone Recovery Using an Alternate Server: http://technet.microsoft.com/en-us/library/bb310785.aspx
Thank you
http://www.microsoft.com/southafrica/ucs/2007