Getting Back to Business in Higher Education
Paul Schopis, Jim Gerrity
Ohio Supercomputer Center
Internet2 Fall 2007
San Diego, CA
Disasters can be very bad
Outages can affect large regions
The increasing reach of service affecting events
Agenda
Part 1: Business Resumption Planning
– Identifying the Need for Business Continuity Planning
– What to Plan For
– Identifying Key Function RTO & RPO
– The Key Planning Function
– An Example
Part 2: Off-site Backup & Recovery Considerations and Strategies
– Recovery Strategy Prerequisites
– Recovery technology options and RPO & RTO metrics
– Overview of D2D replication/mirroring solutions
– Q & A
Identifying the Need for Business Continuity Planning
• Student Services
• Grants and Endowments
• General Administration and Finance
• Distance Learning
What to Plan For
• Risk of Common Outages
  – Power loss
  – Cooling (water)
  – Network loss
• Risk of Disaster Impact
  – Reduce the likelihood that the DR site is hit by the same disaster
• Risk of Terrorism
  – Proximity to possible targets
What to Plan For
• Availability of Staff (Pandemic)
• Ability of staff to get to DR location
• Technology Considerations
  – Data replication
    • Asynchronous or Synchronous
  – Data Locations
    • Tapes off-site
    • Delivery to DR location
• Cost Considerations
Identifying Key Function RTO & RPO
A Good Risk Analysis is Important
– Identifies Key Functions
– Provides Recovery Timeframes
– Provides Recovery Point Objectives
– Identifies “cost” of downtime or importance of recovery
The Key Planning Function
Disaster Recovery is:
– A flexible response to a crisis
– A place to recover (location/equipment/network)
– A communications plan
– A defined recovery set
– Reliable backups
– Test / maintain / test
– Service continuity
Disaster Recovery is NOT:
– Recovery of all services
– A business continuity plan
Some Key DR Planning Mistakes
• One recovery plan for all scenarios
  – Modules that fit a broader business continuity plan
• Planning and testing with IT personnel only
  – Adopt an integrated approach to planning and testing
  – Perform a business impact analysis
• Further away is better
  – Conduct a risk impact analysis
  – Invest in infrastructure that ensures availability of resources that are beyond your control
    • Power, telecommunications
Some Key DR Planning Mistakes
• One copy of mirrored data at the recovery site is appropriate
– What happens on resync
• The planned telecommunications bandwidth should exceed the peak data transfer requirements
– Only needed for synchronous remote copy
• Not Planning for transfer back
Establish a Foundation for Business Resumption
• Identify Facilities Required
• Ensure Telecommunications Needs are Met
• Cost Effectiveness
University DR Planning
• Universities should consider a centralized state or regional DR facility
  – Already geographically dispersed
    • Limited impact from a common event
    • Reduced costs
    • Common Network Access
The Internet as a Key Component of a DR Plan
• Ability to transport key data securely
• Reduced storage / recovery costs
• VOIP
• Staff location
An Example
• Statewide Disaster Recovery For Ohio Higher Education Institutions
• Ohio State University and University of Cincinnati
Introduction
The Ohio State University and the University of Cincinnati have collaborated to provide reciprocal disaster preparedness resources to our respective institutions and have subsequently expanded the capability and now offer similar capabilities to other institutions in the state of Ohio.
How The Relationship Began
• OSU and UC happened to sit across from each other at a Microsoft briefing, asking each other “what do you do for DR for your mainframe?”
• This led to:
  • What size is your mainframe?
  • Do you have spare capacity?
  • What’s your storage environment look like?
  • What kind of staff do you have?
  • What skills do they have?
  • What are you doing for open systems?
What We Had Going For Us
• Both data center facilities meet Tier 3 standards for DR operability
• Each has sufficient space to accommodate additional systems in the event of a disaster
• Our facilities are 105 miles apart; neither is in a flood plain or earthquake zone
• Columbus and Cincinnati have separate utility, transportation and telecommunications infrastructures
• We were fairly close technically & determined we could put something basic in place without too much difficulty or expense
Other Factors That We Considered
The ‘state’ of Ohio in June 2003
• Only 2 of 15 public universities had a disaster recovery plan or option in place
• Institutions paid for third party DR options at a cost of over $300,000 per year
• None of the Universities shared services – even if they were just down the road
Our schools represent combined assets of:
• $8 billion total revenues
• 320,000 students
• $625 Million in net assets
The Common Needs Were Obvious
• Both organizations had a need for a functioning DR capability
• Neither had unlimited funding so cost was a major consideration
• We saw the opportunity to be able to start basic and improve our capabilities over time
Result - A Decision to Collaborate
• OSU and UC determined that enough motivation and synergies existed to make a mutual DRP endeavor practical and desirable.
• There was also a reasonable expectation that other institutions in the state would be interested in playing in this space. In addition, we believed that between us we would have the capability to support these institutions.
Our Initial Strategy
Embark on a phased approach that targeted:
1st: Meeting the short term goal of putting a working solution in place, then adding sophistication while addressing the long term needs of our institutions.
2nd: Developing a flexible solution that could be made available to other institutions in the state.
Our Goals - Mainframe
• Target Mission Critical mainframe systems using a tape based recovery approach.
• Be capable of having a recovery environment operational and ready to accept application recovery efforts within 4 hours of an emergency being declared.
• Implement an electronic data exchange so that data could be copied in near real time, virtually reducing data loss to zero by the end of 2006.
Our Goals – Non-mainframe
• Support a drop ship, tape recovery strategy
• Allow for hosting skeleton infrastructure
• Allow for hosting cold, warm or hot systems
• Allow for real time data synchronization
Part 1 / Part 2
Presentation Break
Off-site Backup & Recovery Considerations and Strategies
Leveraging the WDM infrastructure for Business Resumption
© 2006 ADVA Optical Networking. All rights reserved.
Developing Recovery Strategies: Prerequisites
• Executive level sponsorship
• Business Impact Analysis (BIA)
  – Quantifies risk levels: acceptable downtime parameters and financial, legal, social impact for business and academic functions
    • Personnel
    • Processes
    • Technology
  – Findings include two key metrics: RPO & RTO
RPO and RTO
[Figure: timeline marking the last data backup, the moment disaster strikes, and the application back online. RPO is measured back from the disaster to the last data backup; RTO is measured forward from the disaster to the application being back online. Both scales run from seconds through minutes and hours to days.]
Recovery Point Objective (RPO): the point in time to which data must be restored after an outage
Recovery Time Objective (RTO): the period of time within which systems, applications, and functions must be recovered after an outage
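The two metrics can be checked against an incident timeline with simple date arithmetic. The sketch below is illustrative only: the function names, the timestamps, and the 8-hour and 4-hour targets are assumptions made up for the example, not figures from the presentation.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: time between the last good backup and the disaster."""
    return disaster - last_backup


def achieved_rto(disaster: datetime, back_online: datetime) -> timedelta:
    """Downtime window: time between the disaster and the application being back online."""
    return back_online - disaster


def meets_objectives(last_backup, disaster, back_online, rpo_target, rto_target) -> bool:
    """True only if both the data-loss window and the downtime window are within target."""
    return (achieved_rpo(last_backup, disaster) <= rpo_target
            and achieved_rto(disaster, back_online) <= rto_target)


# Hypothetical incident timeline (all three timestamps are invented for illustration).
last_backup = datetime(2007, 10, 8, 2, 0)
disaster = datetime(2007, 10, 8, 9, 30)
back_online = datetime(2007, 10, 8, 13, 0)

print(achieved_rpo(last_backup, disaster))
print(achieved_rto(disaster, back_online))
print(meets_objectives(last_backup, disaster, back_online,
                       rpo_target=timedelta(hours=8), rto_target=timedelta(hours=4)))
```

Tightening either target in the last call shows how the same incident can pass one objective and fail the other.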
Recovery Strategy components
• Back Office Resources
  – Facility, hardware, network, software, data, staff
• Establishing an Alternate Site(s)
• Backup Hardware Technologies
Sample Recovery Strategies and RTO/RPO Considerations
[Figure: chart comparing six recovery strategies (Traditional Recovery, Standby OS, Electronic Vaulting, Remote Journaling, Replication/Mirroring, Clustering) on a time axis from -24 to 84 hours. Axis labels: Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO). Legend: Transaction Recreation, Transactions Not Captured, Declaration, Data Retrieval, Transit, System Restore, IPL & Network, Database Restore.]
Sources: BIA, GIAC
…additional considerations
• Along with RTO/RPO, must factor in backup windows
• Consider the recovery process carefully: what’s involved in restoration… and who can initiate the process
• Security elements
• Optimum recovery solution is a function of ‘Cost of Impact’ vs. ‘Cost of Recovery’
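The trade-off between ‘Cost of Impact’ and ‘Cost of Recovery’ can be made concrete with a small comparison. The sketch below is one simple way to frame it, not a method from the presentation: it assumes downtime cost scales linearly with RTO, and every name and dollar figure in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    annual_cost: float  # hypothetical yearly 'Cost of Recovery' for this capability
    rto_hours: float    # expected hours to resume business with this strategy


def total_annual_cost(s: Strategy, downtime_cost_per_hour: float, outages_per_year: float) -> float:
    """Cost of Recovery plus expected Cost of Impact (assumed linear in downtime hours)."""
    cost_of_impact = s.rto_hours * downtime_cost_per_hour * outages_per_year
    return s.annual_cost + cost_of_impact


def cheapest(strategies, downtime_cost_per_hour: float, outages_per_year: float) -> Strategy:
    """Pick the strategy with the lowest combined yearly cost."""
    return min(strategies,
               key=lambda s: total_annual_cost(s, downtime_cost_per_hour, outages_per_year))


# Illustrative numbers only.
options = [
    Strategy("tape recovery", annual_cost=50_000, rto_hours=48),
    Strategy("disk mirroring", annual_cost=250_000, rto_hours=2),
]

print(cheapest(options, downtime_cost_per_hour=5_000, outages_per_year=0.5).name)
print(cheapest(options, downtime_cost_per_hour=20_000, outages_per_year=0.5).name)
```

With these made-up inputs the cheaper tape option wins while downtime is inexpensive, and mirroring wins once an hour of downtime costs enough to outweigh its higher yearly price.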
Assessing Business Impact and Technology Options for Off-site DR
Class 1 (Essential)
– Characteristics: Severe impact to Bus. Operations; CRM, Financial/Revenue, Clinical Care, Safety.
– Recovery Window (RTO): 0 – 4 hours
– Recovery Technology: Disk mirroring/replication, Clustering
Class 2 (Critical)
– Characteristics: Some impact for Clinical Care. Potential for adverse impact to Student Services, Supply Chain, Grants & Endowments.
– Recovery Window (RTO): 4 – 12 hours
– Recovery Technology: Disk mirroring/replication
Class 3 (Important)
– Characteristics: Some Bus. Ops. not available. No direct impact to revenue, safety.
– Recovery Window (RTO): 12 – 24 hours
– Recovery Technology: Electronic Vaulting (Nearline Tape, ATL), Remote Journaling
Class 4 (As needed)
– Characteristics: Minor impact to business operations. Data stored for long periods; compliance and preservation.
– Recovery Window (RTO): 24 – 72+ hours
– Recovery Technology: Tape Archive
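The ranking above can be read as a lookup from a required recovery window to a class and its candidate technologies. The sketch below encodes that reading. The adjacent windows share their boundary values (4, 12, 24 hours), so the code assumes a boundary value falls into the stricter class; that tie-break and the function name are assumptions of this example.

```python
# (upper bound of RTO window in hours, class label, recovery technologies)
RECOVERY_CLASSES = [
    (4, "Class 1 (Essential)", ["Disk mirroring/replication", "Clustering"]),
    (12, "Class 2 (Critical)", ["Disk mirroring/replication"]),
    (24, "Class 3 (Important)", ["Electronic vaulting", "Remote journaling"]),
    (float("inf"), "Class 4 (As needed)", ["Tape archive"]),
]


def classify(rto_hours: float):
    """Return (class label, technologies) for a required RTO in hours.

    Assumption: an RTO exactly on a boundary goes to the stricter class.
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    for upper_bound, label, technologies in RECOVERY_CLASSES:
        if rto_hours <= upper_bound:
            return label, technologies


for hours in (2, 8, 18, 72):
    label, technologies = classify(hours)
    print(hours, label, technologies)
```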
Disk-to-Disk (D2D) backup replacing tape…
• Tape most widely deployed, but D2D rapidly gaining ground
  – Tape still ‘key’ for archiving, ‘D2D2T’
  – Tape roughly 50% less expensive than ‘Tier 1’ disk-based solutions
  – Tier 1 Disk $$ are decreasing
  – Tier 2 SATA RAID-6, high capacity platforms available and proven
• D2D (including Virtual Tape Libraries (VTL)) remedy for tape reliability and performance issues
• VTL – disk-based but emulates tape libraries
  – Resides between tape libraries and disk on the RPO/RTO continuum
  – Preserves investment in existing tape backup software & systems
  – Can use as part of tiered disk and tape backup strategy
• Data Replication/Mirroring most popular D2D remote backup solution for critical data, applications
• Replication/Mirroring has several flavors…
Disk Mirroring/Replication
Many choices… and combinations…
• Synchronous
• Asynchronous
• Fabric-based
• CDP
• Point-in-Time Copies
• In-band
• Snap Shot Copy
• Host-based
• Array-based
• Data Deduplication
• Virtualization
Disk Mirror - Sync operation
[Figure: Data Center Site-A connected over fiber, up to 200 km, to the Site-B sync mirror. Diagram labels: NMS, Servers/mainframes, Channel Director, Disk (Source), Disk (Target), Tape vault.]
Synchronous operation: a local transaction will only complete when the remote transaction completes
- Servers not required at Site B
Disk mirror – Sync operation
• Provides ‘real-time’ data copy… file level protection
• Transparent to systems being mirrored
• S/W, H/W often vendor proprietary
• Due to response time objectives, subject to distance limitations; up to 200km
  – Must have enough FC ‘buffer credits’ in switch and/or WDM
• Performance dependent on number of I/O’s and bandwidth
  – May configure for multiple, concurrent I/Os to multiple volumes
  – WDM addresses bandwidth
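The distance limit and the buffer-credit requirement both follow from propagation delay. The back-of-the-envelope sketch below assumes light travels through fiber at roughly 5 microseconds per km one way, a full-size Fibre Channel frame of 2148 bytes, and about 200 MB/s of usable throughput on the link. All three are approximations chosen for illustration; real credit sizing depends on the switch, the WDM equipment, and the actual frame mix.

```python
import math

# Assumption: ~5 microseconds per km, one way, for light in optical fiber.
FIBER_DELAY_S_PER_KM = 5e-6


def round_trip_delay_ms(distance_km: float) -> float:
    """Propagation delay out and back, in milliseconds.

    A synchronous write cannot complete sooner than this, since the local
    transaction waits for the remote one.
    """
    return 2 * distance_km * FIBER_DELAY_S_PER_KM * 1000


def buffer_credits_needed(distance_km: float,
                          throughput_bytes_per_s: float,
                          frame_bytes: int = 2148) -> int:
    """Estimate the credits needed to keep the link full.

    One credit covers one frame in flight, so the estimate is the number of
    frames that fit into one round trip at the given throughput.
    """
    rtt_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    bytes_in_flight = rtt_s * throughput_bytes_per_s
    return math.ceil(bytes_in_flight / frame_bytes)


print(round_trip_delay_ms(200))
print(buffer_credits_needed(200, 200e6))
```

Under these assumptions a 200 km link adds about 2 ms to every synchronous write and needs on the order of 190 credits to avoid stalling, which is why credit capacity in the switch or WDM gear matters as distance grows.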
Disk Mirror - Async operation
[Figure: Data Center Site-A connected over fiber, up to 1000s of km, to the Site-B async mirror. Diagram labels: NMS, Servers/mainframes, Channel Director, Disk (Source), Disk (Target), Tape vault.]
Asynchronous operation: no specific link between completion of a local and remote transaction
- Servers not required at Site B
Disk mirror – Async operation
• Provides ‘near real-time’ data copy… file level protection
  – Some data loss may occur
  – “Point-in-Time” Async addresses “file level” issue… but adds to RPO
• Less expensive than Sync
• Transparent to systems being mirrored
• S/W, H/W often vendor proprietary
• Not subject to Sync distance limitations
  – Like Sync, still must have enough FC ‘buffer credits’ in switch and/or WDM
• Performance; supports multiple, concurrent I/O’s
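Why some data loss may occur with async mirroring can be seen by tracking the backlog of writes that have completed locally but have not yet reached the remote site. The sketch below is a toy model, not a description of any vendor's product: it assumes one-second intervals, a fixed link rate, and an invented write pattern.

```python
def async_backlog(writes_mb_per_s, link_mb_per_s):
    """Return the un-replicated backlog (MB) at the end of each one-second interval.

    Writes complete locally right away; the link drains the backlog at a fixed
    rate. Whatever is still queued when a disaster strikes is the data at risk.
    """
    backlog = 0.0
    history = []
    for written in writes_mb_per_s:
        backlog = max(0.0, backlog + written - link_mb_per_s)
        history.append(backlog)
    return history


# Hypothetical workload: a two-second burst above the link rate of 30 MB/s.
history = async_backlog([10, 50, 50, 10, 10], link_mb_per_s=30)
print(history)
print(max(history))
```

The backlog grows only while the write rate exceeds the link rate and drains afterwards, so in this model the link can be sized below the peak write rate at the price of a larger exposure during bursts.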
Combined Sync/Async operation and Tier 1 and Tier 2 storage… and ILM
[Figure: three-site topology linked by fiber. Data center Site-A (Servers/Mainframes, Channel Director, first-copy disk, tape), intermediate Site-B (sync mirror, second-copy disk) and DR Site-C (async mirror, third-copy disk), with link distances of 0-200 km and 0-1000s km. Storage tier labels: Tier 1 ‘FC’ or ‘SCSI’ disk; Tier 1 ‘FC’ or SCSI and Tier 2 ‘SATA’ disk; Tier 2 ‘SATA’ disk. Other labels: NMS, Servers.]
Supports DR, reduces costs, enables Information Life Cycle Management
Host-based replication/mirroring
• Storage platform agnostic
• Servers required at all DR sites
• Software-based; consumes host resources… can affect production application performance
• Operating System dependent
• More complex installing, implementing and troubleshooting problems
• Management complexity increases as backup data increases
New, emerging technologies…
…that complement replication/mirroring to evaluate:
• Continuous Data Protection (CDP)
• Virtualization
• Data Deduplication
WDM benefits for remote storage networking
Enterprise Elasticity
- Platform, protocol and bit rate agnostic
- Support for multiple interfaces and networks
- Low latency; required for most storage networking applications
- Capacity and performance
- Centralized management, distributed GMPLS control plane
- Lower TCO by doing more with less
Reliable, future proof, scalable, flexible, cost-effective
Summary
• No DR/BC strategy will work without Sr. Executive support and a comprehensive Business Impact Analysis (BIA), including an understanding of RPO/RTO of applications and data
• No single backup & restore solution fits an organization’s overall DR/BC plan
• Ensure your D2D investments are compatible and complementary with new and emerging replication technologies
• Look to utilize lower cost SATA disk and VTL technology where applicable (RPO/RTO)
• Regardless of the strategy… backed up data should still be copied to offline media and rotated to off-site storage
Thank You
Jim Gerrity
Director, Enterprise Vertical Markets Development and Storage Solutions
+ 203 483 [email protected]