BC:ISMDR:BEIT:VIII:chap8:Madhu N PIIT - 2
Section Objective
Upon completion of this section, you will be able to:
• Understand the concept of information availability and its measurement
• Describe the backup/recovery purposes and considerations
• Discuss architecture and different backup/recovery topologies
• Describe local replication technologies and their operation
• Describe remote replication technologies and their operation
Chapter Objective
After completing this chapter, you will be able to:
• Define Business Continuity and Information Availability
• Detail the impact of information unavailability
• Define BC measurement and terminologies
• Describe the BC planning process
• Detail BC technology solutions
What is Business Continuity
• Business Continuity is preparing for, responding to, and recovering from an application outage that adversely affects business operations
• Business Continuity solutions address unavailability and degraded application performance
• BC is an integrated, enterprise-wide process and set of activities to ensure “information availability”
What is Information Availability (IA)
• IA refers to the ability of an infrastructure to function according to business expectations during its specified time of operation
• IA can be defined in terms of three parameters:
  – Accessibility
    • Information should be accessible at the right place and to the right user
  – Reliability
    • Information should be reliable and correct
  – Timeliness
    • Information must be available whenever required
Causes of Information Unavailability
• Disaster (<1% of occurrences): natural or man-made
  – Flood, fire, earthquake
  – Contaminated building
• Unplanned outages (20%): failure
  – Database corruption
  – Component failure
  – Human error
• Planned outages (80%): competing workloads
  – Backup, reporting
  – Data warehouse extracts
  – Application and data restore
Impact of Downtime
Know the downtime costs (per hour, day, two days...)
Lost Revenue
• Direct loss
• Compensatory payments
• Lost future revenue
• Billing losses
• Investment losses
Damaged Reputation
• Customers
• Suppliers
• Financial markets
• Banks
• Business partners
Financial Performance
• Revenue recognition
• Cash flow
• Lost discounts (A/P)
• Payment guarantees
• Credit rating
• Stock price
Other Expenses
• Temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...
Lost Productivity
• Number of employees impacted (x hours out * hourly rate)
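The lost-productivity estimate (employees impacted x hours out x hourly rate) can be worked through as a small calculation. This is a minimal sketch: the function name and the sample figures are illustrative assumptions, not values from the slides.

```python
def lost_productivity_cost(employees_impacted: int,
                           hours_out: float,
                           hourly_rate: float) -> float:
    """Estimate lost productivity as employees x hours out x hourly rate."""
    return employees_impacted * hours_out * hourly_rate


# Hypothetical figures: 200 employees idle for 4 hours at a rate of 30 per hour.
cost = lost_productivity_cost(200, 4, 30.0)
print(cost)  # 24000.0
```

The same function can be evaluated for an hour, a day, or two days of outage to build the "know the downtime costs" figures.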
Measuring Information Availability
• MTBF (Mean Time Between Failures): average time available for a system or component to perform its normal operations between failures
• MTTR (Mean Time To Repair): average time required to repair a failed component
• IA = MTBF / (MTBF + MTTR) or IA = uptime / (uptime + downtime)
[Figure: incident timeline. MTTR (time to repair, or ‘downtime’) covers detection, diagnosis, repair, recovery, and restoration, with intervals labelled detection elapsed time, response time, repair time, and recovery time. MTBF (time between failures, or ‘uptime’) is the time between incidents.]
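The formula IA = MTBF / (MTBF + MTTR) can be checked with a short calculation. This is a minimal sketch: the function name and the sample MTBF and MTTR values are assumptions chosen for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Information availability: IA = MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or (mtbf_hours + mttr_hours) == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
ia = availability(999.0, 1.0)
print(f"{ia:.3%}")  # 99.900%
```

Shortening MTTR raises availability just as lengthening MTBF does, which is why detection and repair time both matter.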
Availability Measurement – Levels of ‘9s’ Availability
% Uptime     % Downtime   Downtime per Year   Downtime per Week
98%          2%           7.3 days            3 hrs 22 min
99%          1%           3.65 days           1 hr 41 min
99.8%        0.2%         17 hrs 31 min       20 min 10 sec
99.9%        0.1%         8 hrs 45 min        10 min 5 sec
99.99%       0.01%        52.5 min            1 min
99.999%      0.001%       5.25 min            6 sec
99.9999%     0.0001%      31.5 sec            0.6 sec
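The downtime columns follow directly from the uptime percentage: downtime = (100 - uptime %) / 100 x length of the period. A minimal sketch, assuming a 365-day year and a 7-day week (the constant and function names are illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year
HOURS_PER_WEEK = 7 * 24


def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Allowed downtime, in hours, over a period for a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * period_hours


for pct in (98.0, 99.0, 99.9, 99.99):
    per_year = downtime_hours(pct, HOURS_PER_YEAR)
    per_week = downtime_hours(pct, HOURS_PER_WEEK)
    print(f"{pct}% uptime: {per_year:.2f} h/year, {per_week * 60:.2f} min/week")
```

For example, 99.9% uptime allows about 8.76 hours of downtime per year, which the table rounds to 8 hrs 45 min.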
BC Terminologies
• Disaster recovery
  – Coordinated process of restoring systems, data, and infrastructure required to support ongoing business operations in the event of a disaster
  – Restoring a previous copy of data and applying logs to that copy to bring it to a known point of consistency
  – Generally implies use of backup technology
• Disaster restart
  – Process of restarting from disaster using mirrored consistent copies of data and applications
  – Generally implies use of replication technologies
BC Terminologies (Cont.)
Recovery Point Objective (RPO)
• Point in time to which systems and data must be recovered after an outage
• Amount of data loss that a business can endure
Recovery Time Objective (RTO)
• Time within which systems, applications, or functions must be recovered after an outage
• Amount of downtime that a business can endure and survive
[Figure: recovery-point objective and recovery-time objective scales, each running through seconds, minutes, hours, days, and weeks. RPO side: tape backup, periodic replication, asynchronous replication, synchronous replication. RTO side: tape restore, disk restore, manual migration, global cluster.]
Business Continuity Planning (BCP) Process
• Identifying the critical business functions
• Collecting data on various business processes within those functions
• Business Impact Analysis (BIA)
  – Risk Analysis
    • Assessing, prioritizing, mitigating, and managing risk
• Designing and developing contingency plans and disaster recovery plan (DR Plan)
• Testing, training and maintenance
BC Technology Solutions
• The following solutions and supporting technologies enable business continuity and uninterrupted data availability:
  – Eliminating single points of failure
  – Multi-pathing software
  – Backup and replication
    • Backup/recovery
    • Local replication
    • Remote replication
Resolving Single Points of Failure
[Figure: resolving single points of failure. Labelled components: client, IP, redundant network, clustered servers, heartbeat connection, redundant ports, redundant paths, FC switches, redundant FC switches, redundant arrays, storage arrays, remote site.]
Multi-pathing Software
• Configuring multiple paths increases data availability
• Even with multiple paths, if a path fails, I/O will not reroute unless the system recognizes that it has an alternate path
• Multi-pathing software recognizes and utilizes alternate I/O paths to data
• Multi-pathing software also provides load balancing
• Load balancing improves I/O performance and data path utilization
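The two roles described above, rerouting I/O around a failed path and balancing load across healthy paths, can be illustrated with a toy path selector. This is only a sketch under stated assumptions: the class, its method names, and the path labels are hypothetical, and round-robin is just one possible balancing policy, not the algorithm of any particular multi-pathing product.

```python
class MultipathSelector:
    """Toy round-robin path selector that skips paths marked as failed."""

    def __init__(self, paths):
        self._order = list(paths)
        self._healthy = {p: True for p in self._order}
        self._next = 0  # index of the path to try first on the next I/O

    def mark_failed(self, path):
        self._healthy[path] = False

    def mark_restored(self, path):
        self._healthy[path] = True

    def next_path(self):
        """Return the next healthy path, rotating to spread the I/O load."""
        n = len(self._order)
        for i in range(n):
            candidate = self._order[(self._next + i) % n]
            if self._healthy[candidate]:
                self._next = (self._next + i + 1) % n
                return candidate
        raise RuntimeError("no healthy path to the device")


# Hypothetical path labels for a host with two HBAs.
selector = MultipathSelector(["hba0:port0", "hba1:port1"])
print(selector.next_path())  # hba0:port0
print(selector.next_path())  # hba1:port1
selector.mark_failed("hba0:port0")
print(selector.next_path())  # hba1:port1 (I/O rerouted to the surviving path)
```

Without the health map, the selector would keep issuing I/O to the failed path, which is the situation the second bullet describes.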
Backup and Replication
• Local Replication
  – Data from the production devices is copied to replica devices within the same array
  – The replicas can then be used for restore operations in the event of data corruption or other events
• Remote Replication
  – Data from the production devices is copied to replica devices on a remote array
  – In the event of a failure, applications can continue to run from the target device
• Backup/Restore
  – Backup to tape has been a predominant method to ensure business continuity
  – Frequency of backup depends on RPO/RTO requirements
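The link between backup frequency and RPO can be sketched as a simple check. This assumes periodic backups that complete instantly and are always restorable, so the worst-case data loss equals the interval between backups; the function names and sample intervals are illustrative assumptions.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Under the simplifying assumptions above, a failure just before the
    next backup loses everything written since the previous one."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the backup schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical requirement: at most 6 hours of data loss.
rpo = timedelta(hours=6)
print(meets_rpo(timedelta(hours=24), rpo))  # False: daily backup is too infrequent
print(meets_rpo(timedelta(hours=4), rpo))   # True
```

A real schedule would also need to account for how long each backup takes and how long a restore takes, which is where the RTO comes in.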
Chapter Summary
Key points covered in this chapter:
• Importance of Business Continuity
• Types of outages and their impact on businesses
• Information availability measurements
• Definitions of disaster recovery and restart, RPO and RTO
• Business Continuity technology solutions overview
Concept in Practice – EMC PowerPath
PowerPath Host Based Software
• Resides between application and SCSI device driver
• Provides intelligent I/O path management
• Transparent to the application
• Automatic detection and recovery from host-to-array path failures
[Figure: server and storage. Labelled components: host application(s), PowerPath, SCSI drivers, SCSI controllers, storage network, LUNs.]