1

Availability Policy

(slides from Clement Chen and Craig Lewis)

2

Definition

• The degree to which data or systems are accessible and in functioning condition
• Looking at it another way, the degree to which the system is fulfilling its intended function

3

Availability and Reliability

Availability and Reliability are not the same thing
• Availability means that the system is ready for use
• Reliability means that a device or system can perform its job when called upon to do so

There is overlap, but they are not the same thing

4

Major Causes of Disruption

• Human Interference
  – Operator error
  – Virus and hacker attack
  – Theft or sabotage
• Communication Failure
• Hardware or system failure
• Natural Disasters
• Power Failure
• Water Damage
• Fire

5

Aspects of Availability

• Data Availability
• Network Availability
• Communication Availability
• System Availability
• Power Availability
• People Availability
• Other Resources Availability

6

Data Availability

• Rule 1: Backup
• Rule 2: Backup
• Rule 3: Backup

7

Backup Methods

Full Backup
  – Backs up every file
  – Takes a lot of storage space

Incremental Backup
  – Backs up only files that have been created or modified since the last backup
  – The backup operator needs several tapes to do a complete restoration

Differential Backup
  – Backs up only files that have been created or modified since the last full backup
  – The backup operator needs only the full backup and the one differential backup to restore the system
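The difference between the three methods comes down to which earlier backup is used as the reference point. The sketch below is illustrative only: the `Backup` record, the function names, and the use of plain integers as timestamps are assumptions made for the example, not part of any backup product.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    kind: str  # "full", "incremental", or "differential"
    time: int  # when the backup ran (any increasing clock value)


def files_to_back_up(kind, mtimes, history):
    """Return the file names a backup of the given kind would copy.

    mtimes maps file name -> last modification time.
    history lists earlier Backup records, oldest first.
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "incremental":
        # Reference point: the most recent backup of any kind.
        reference = max((b.time for b in history), default=None)
    elif kind == "differential":
        # Reference point: the most recent full backup.
        reference = max((b.time for b in history if b.kind == "full"), default=None)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if reference is None:
        # Nothing to compare against, so copy everything.
        return sorted(mtimes)
    return sorted(name for name, t in mtimes.items() if t > reference)


def restore_chain(history):
    """Return the backups needed for a complete restoration, oldest first."""
    full_idx = max((i for i, b in enumerate(history) if b.kind == "full"), default=None)
    if full_idx is None:
        raise ValueError("no full backup to restore from")
    chain = [history[full_idx]]
    start = full_idx + 1
    # Only the latest differential after the full backup is needed.
    diff_idx = max(
        (i for i in range(start, len(history)) if history[i].kind == "differential"),
        default=None,
    )
    if diff_idx is not None:
        chain.append(history[diff_idx])
        start = diff_idx + 1
    # Every incremental taken after that point must be replayed in order.
    chain.extend(b for b in history[start:] if b.kind == "incremental")
    return chain
```

With one full backup followed by three incrementals, `restore_chain` returns all four backups. With one full backup followed by three differentials, it returns only the full backup and the latest differential, which is the tape-count tradeoff described above.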

8

Data Retention

Sarbanes-Oxley
• All electronic company information must be retained for at least five years
• Accounting firms that audit publicly traded companies must retain all related documents for 7 years after audit

HIPAA
• Members of health care industry must retain patient information for 6 years

SEC 17a-3 and 17a-4
• Brokers/dealers must retain records for 3-6 years and more

9

Data Vaulting

• Copy of data is saved at a remote site periodically or continuously via network
• Remote site may be own site or at a vendor location
• Minimal or no data may be lost in a disaster
• There is typically some delay before data can actually be used

10

Network Availability

• Prioritize the systems needing network access
• Measure the amount of bandwidth needed to fulfill purpose of each component
• Calculate overhead of protective measures
• Decide what (if anything) can drop
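The four steps above can be sketched as a simple budgeting exercise. This is a minimal illustration, not a recommended algorithm: the `System` record, the `plan_bandwidth` function, and the single `overhead_fraction` applied uniformly to every system are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    priority: int        # 1 = most important
    needed_mbps: float   # measured bandwidth needed to fulfill its purpose


def plan_bandwidth(systems, link_mbps, overhead_fraction):
    """Admit systems in priority order while they still fit on the link.

    overhead_fraction is the assumed share of extra traffic added by
    protective measures, applied to each system's measured need.
    Returns (kept, dropped) lists of system names.
    """
    remaining = link_mbps
    kept, dropped = [], []
    for system in sorted(systems, key=lambda s: s.priority):
        cost = system.needed_mbps * (1 + overhead_fraction)
        if cost <= remaining:
            remaining -= cost
            kept.append(system.name)
        else:
            # Does not fit in what is left of the link: candidate to drop.
            dropped.append(system.name)
    return kept, dropped
```

Note that this greedy pass still admits a lower-priority system if it fits after a higher-priority one was dropped; whether that is acceptable is a policy decision.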

11

Service Level Agreement

• Can the ISP deliver?
• Can your equipment handle it?
• Higher bandwidth – for what?
  – More business
  – Faster customer access
  – Faster music downloads
  – More scanning

12

People and Availability

• People are a source of information
• Staff with knowledge of how to fix a problem not being there to fix it negatively impacts availability
  – Positional redundancy – “Worker X can do that, but she’s not here until tomorrow”
  – Shared knowledge – “What if I get hit by a bus?”
  – Limitations on physical access – “It’s a 30 second fix, but it will take me 10 minutes to get there”
  – Limitations placed by policy – “I know how to fix it, but I’m not allowed to go in the server room”

13

Infrastructure Availability

• Availability of the infrastructure can have a direct impact on availability of information
  – Voice communications
  – Power
  – HVAC
  – Physical access

14

Infrastructure Solutions

Voice
• Cellular Phones
• WiFi Phones
• Walkie-talkies

Power
• Uninterruptible Power Supply (UPS)
• Generators

HVAC
• Portable coolers
• Fans/Blowers

Physical Access
• Security guards
• Transportation shuttles
• Backup/alternative to electronic access controls

15

Measuring Availability

What does it mean to be available, and how can it be measured?

Availability means that systems or data are accessible, but does not guarantee
  – Performance
  – Typical ways of doing things can still be used
  – Full system capacity

16

MTBF & MTTR

Definitions
• Mean Time Between Failure (MTBF) is the average amount of time between failures, where failure is defined as a departure from acceptable service for a system. This is a measure of reliability.
• Mean Time to Recover (MTTR) measures the amount of time required to repair or recover a failed system
• Availability is the ratio of the time a system is actually available to the time it should have been available

Availability = MTBF / (MTBF + MTTR)
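As a worked example of the ratio MTBF / (MTBF + MTTR), the short sketch below computes availability from the two means. The function name and the choice of hours as the unit are assumptions for illustration; any unit works as long as both values use the same one.

```python
def availability(mtbf_hours, mttr_hours):
    """Return availability as a fraction between 0 and 1.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to recover
    """
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total
```

For instance, a system that runs 999 hours between failures on average and takes 1 hour to recover has an availability of 999 / 1000, or 99.9%.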

17

Availability Values

• 1 week

Threshold    Downtime
99%          1.1 hr
99.9%        6.3 min
99.99%       37.8 sec
99.999%      3.8 sec
99.9999%     0.38 sec
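Downtime figures like these follow directly from the threshold and the length of the measurement window. The sketch below shows the arithmetic; the function name and the two window lengths are assumptions for illustration. If the table is read as 99% giving 1.1 hr, 99.9% giving 6.3 min, and so on, those figures match a window of about 105 hours rather than a full 168-hour week. The slides do not state the basis for the window, so that reading is an inference.

```python
def allowed_downtime_seconds(availability_percent, window_hours):
    """Return the downtime, in seconds, permitted within the window."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * window_hours * 3600


if __name__ == "__main__":
    # Compare a full calendar week with the assumed 105-hour window.
    for percent in (99, 99.9, 99.99, 99.999, 99.9999):
        full_week = allowed_downtime_seconds(percent, 168)
        short_week = allowed_downtime_seconds(percent, 105)
        print(f"{percent}%: {full_week:.2f} s per 168 h, {short_week:.2f} s per 105 h")
```

Over a full 168-hour week, 99% availability allows about 1.68 hours of downtime, so it matters which window a service level agreement actually uses.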

18

Business Continuity Planning

• Big deal since 9/11
• Every Business Continuity strategy includes three fundamental components
  – Business Impact Analysis
  – Recovery Strategy
  – Design and Develop the disaster recovery process
• BCP should consider every type of interruption, from a brief power outage up to the worst possible natural disaster or terrorist attack

19

Requirements of a BCP

1. Provide procedures and listing of resources to assist in the recovery process
2. Provide an immediate, accurate, and measured response to emergency situations
3. Identify vendors that may be needed in the recovery process and put agreements in place with selected vendors
4. Avoid confusion experienced during a crisis by documenting, testing, and training plan procedures
5. Provide clear guidance for declaring a disaster
6. Provide the necessary directions to ensure the timely resumption of critical services
7. Document recovery processes so they can be executed by knowledgeable people

20

BC/DR Resources

Survive: The Business Continuity Group
  – http://www.survive.com

Emergency Information Infrastructure Partnership
  – http://www.eiip.org

Disaster Recovery Journal
  – http://www.drj.com

21

Summary

• Lots of parts of availability
• Tradeoffs are essential
• Complexity, complexity, complexity
• Need policy for a roadmap

13

Infrastructure Availability

• Availability of the infrastructure can have a direct impact on the availability of information
  – Voice communications
  – Power
  – HVAC
  – Physical access

14

Infrastructure Solutions

Voice
• Cellular phones
• WiFi phones
• Walkie-talkies

Power
• Uninterruptible Power Supply (UPS)
• Generators

HVAC
• Portable coolers
• Fans/Blowers

Physical Access
• Security guards
• Transportation shuttles
• Backup/alternative to electronic access controls

15

Measuring Availability

What does it mean to be available, and how can it be measured?

Availability means that systems or data are accessible, but it does not guarantee
  – Performance
  – That typical ways of doing things can still be used
  – Full system capacity

16

MTBF & MTTR

Definitions

• Mean Time Between Failures (MTBF) is the average amount of time between failures, where a failure is defined as a departure from acceptable service for a system. This is a measure of reliability.

• Mean Time to Recover (MTTR) measures the average amount of time required to repair or recover a failed system.

• Availability is the ratio of the time a system is actually available to the time it should have been available.

Availability = MTBF / (MTBF + MTTR)
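
As a worked illustration of the formula above, the short Python sketch below computes availability from MTBF and MTTR. The function name `availability` and the sample figures (one failure per 1,000 hours on average, 2 hours to recover) are assumptions chosen for illustration; they do not come from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


# Hypothetical figures: one failure per 1,000 hours on average, 2 hours to recover.
example = availability(1000.0, 2.0)
print(f"{example:.4%}")  # prints 99.8004%
```

Because the result depends only on the ratio MTTR / MTBF, halving the recovery time improves availability exactly as much as doubling the time between failures.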

17

Availability Values

• 1 week

Threshold    Downtime
99%          1.1 hr
99.9%        6.3 min
99.99%       37.8 sec
99.999%      3.8 sec
99.9999%     0.38 sec
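
The arithmetic behind a table like this can be sketched as follows: allowed downtime is the measurement period multiplied by (1 − availability). The function name `allowed_downtime_seconds` and the `period_hours` parameter are assumptions for illustration. The slide does not state how many hours it counts in a week. Read as 1.1 hr, 6.3 min, 37.8 sec, 3.8 sec, and 0.38 sec, its figures appear to correspond to a service window of roughly 105 hours, so the full 168-hour week used as the default here yields larger values.

```python
def allowed_downtime_seconds(availability_percent: float,
                             period_hours: float = 168.0) -> float:
    """Downtime budget in seconds for a given availability threshold.

    period_hours defaults to a full 7 x 24 week (an assumption, not from the slides).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return period_hours * 3600.0 * unavailable_fraction


for threshold in (99.0, 99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(threshold)
    print(f"{threshold}% -> {seconds:.2f} s per 168-hour week")
```

Passing `period_hours=105.0` gives a budget of about 378 seconds at 99.9%, which is the slide's 6.3 minutes.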

18

Business Continuity Planning

• Big deal since 9/11

• Every Business Continuity strategy includes three fundamental components
  – Business Impact Analysis
  – Recovery Strategy
  – Design and develop the disaster recovery process

• BCP should consider every type of interruption, from a brief power outage up to the worst possible natural disaster or terrorist attack

19

Requirements of a BCP

1. Provide procedures and a listing of resources to assist in the recovery process

2. Provide an immediate, accurate, and measured response to emergency situations

3. Identify vendors that may be needed in the recovery process, and put agreements in place with selected vendors

4. Avoid confusion experienced during a crisis by documenting, testing, and training plan procedures

5. Provide clear guidance for declaring a disaster

6. Provide the necessary directions to ensure the timely resumption of critical services

7. Document recovery processes so they can be executed by knowledgeable people

20

BCDR Resources

Survive The Business Continuity Group
  – http://www.survive.com

Emergency Information Infrastructure Partnership
  – http://www.eiip.org

Disaster Recovery Journal
  – http://www.drj.com

21

Summary

• Lots of parts to availability

• Tradeoffs are essential

• Complexity, complexity, complexity

• Need policy for a roadmap
