Backup Methods For a Hot Site

Dieter W. Storr, Los Angeles Times

23 August 2005

23 August 2005 Dieter W. Storr -- www.storrconsulting.com

B/R Methods

• Existing Backup Method
• Experiences
• Mirroring or Replicating
• Fast Copy of Data
• Proposals and Costs
• Future Technology
• Lessons learned

Existing Backup Method

•From disk (databases)

•Copy to

•3490 / 3590-1 / VTS

•Then, copy to

•3590-1 (cartridge)

ADABAS 6.2.2 Back-up at LA Times

Weekly 21:00 - 21:30

ADAPnBKF Online SAVE

ADAPnPLC FEOFPL

ADAPnPLC PLOG Switch

DFDSS Full Volume Back-up

ADAPnBKO Copy Online SAVEs

BRM/ABARS Several Jobs

PDS, GDGs, etc. Disk Pool

2:00 3:00 8:00 - 11:00

Pick-up by Recall

21:30 - 1:15

Job                            3490 tapes   (3590-1)
ADAP1BKO                                2   (1)
ADAP2BKO                               35   (1)
ADAP3BKO                               16   (1)
ADAP4BKO                                8   (1)
ADAP5BKO                                4   (1)
                                       65   (5)
DFDSS / one tape per volume            59   (?)
BRM/ABARS                              22   (?)
TOTAL                                 211   (?)

(Only for ADABAS)

Status: 27 Jan 2005

B/R Methods

Source: http://www.drj.com/articles/spr02/1502-07.html

“Companies that relied on tape or on third-party provider found in many cases they had difficulty meeting their recovery time objectives.”

B/R Methods

Source: 15 Apr 2004 | SearchSecurity.com

“Flaws in tape-based data backup may be leaving enterprises without key information and could lead to legal exposure under emerging laws such as Sarbanes-Oxley, say data backup and recovery experts.”

B/R Methods

A survey of 500 IT departments completed … found that:

• As many as 20% of routine, nightly backups fail to capture all data.
• 40% of IT managers had been unable to recover data from a tape when they needed it.
• More than 23% sought to use data stored on tape backups more than 20 times in a year.

Source: 15 Apr 2004 | SearchSecurity.com

B/R Methods

Are tapes really so bad?

LA Times experiences?

Tape Problems

1 November 2002:

Six tape drive errors

Delay

Tape Problems

24 March 2003:

Only two channel paths per tape controller were provided

Slow restore time

Tape Problems

5 October 2003:

3590 tape drives were not defined to DFSMS (SMS)

ADABAS restore and application test cancelled

Tape Problems

6 December 2003:

VTS problems with GDG datasets

End-user functions couldn’t be tested

Tape Problems

5 August 2004:

Restore jobs had to wait for an input tape that was being used by another restore job

Delay

Tape Problems

30 October 2004:

Packages didn’t arrive in time, due to a thunderstorm that affected FedEx delivery

Major delay

Tape Problems

30 October 2004:

Automated tape library experienced unit address problems during the restore process

Delay

Tape Problems

30 October 2004:

VTS logical tapes were not shipped to Wood Dale (HSM level 2, SAR level 2)

Delay

Tape Problems

30 October 2004:

Confusion about when to load DRP1 and DRP2 tapes, before or after IPL

Delay

Tape Problems

30 October 2004:

ICIS libraries were not backed up to tape

Application tests were not possible

Tape Problems

8 December 2004:

Load problems

Tapes were loaded before IPL instead of after IPL

Major delay

Tape Problems

8 December 2004:

Experienced problems when trying to restore MIG1 data, e.g. DRADABC0 job

Major delay

Tape Problems

8 December 2004:

Recall sent tapes to SunGard by FedEx

One damaged package arrived without tapes

Restored DATA one generation back (-1)

System was generation (0)

Tape Problems

21 March 2005:

Level 2 tapes for VTS were not sent off-site (although they were on the list)

Application team couldn’t test all data

Tape Problems

5 August 2005:

3590-1 cartridges ejected, not found

DSS8370W - TMS SHOWS TAPE N00318 OUT OF AREA “DRP1”,SLOT 00031

Delay

Time Warner employee data missing

May 2, 2005: 5:51 PM EDT

NEW YORK (CNN) - Time Warner Inc. said Monday that data on 600,000 current and former employees stored on computer backup tapes was lost by an outside storage company and that the Secret Service is now investigating.

Lost Backup Tape Held Ameritrade Client Data

Wednesday, April 20, 2005 - LA Times

… package was damaged during shipping between vendors ….. fourth tape is still missing…… The tapes may have included customers’ Social Security numbers …..

Info On 3.9M Citigroup Customers Lost

Monday, June 6, 2005 – CNN.COM

Citigroup, the nation's biggest financial services company, said that UPS lost the tapes while shipping them to a credit bureau in Texas.

Costs for Tape Backups

• SunGard recovery services
• Offsite tape storage
• Tape handling
• Shipping per test
• Special extra pick-ups

Yearly $150,000

Costs

Cost of not being able to restore for one day: $$ ???

Last December: 2 weeks to rebuild customer tables manually (?)

Does it make sense to restore more than 2 days back?

Costs

Example:

20 employees x $140 per day x 10 days

= $28,000

And they couldn’t work on other projects

$140 is based on $51,100 yearly income
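The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch using the figures from the slide; dividing the yearly income by 365 days is an assumption that happens to reproduce the quoted $140 daily rate.

```python
# Figures from the slide. The 365-day year is an assumption that
# reproduces the $140 per day quoted in the example.
YEARLY_INCOME = 51_100
DAYS_PER_YEAR = 365
EMPLOYEES = 20
DAYS_LOST = 10

daily_rate = YEARLY_INCOME / DAYS_PER_YEAR
labor_cost = EMPLOYEES * daily_rate * DAYS_LOST

print(f"Daily rate per employee: ${daily_rate:,.0f}")
print(f"Labor cost of the manual rebuild: ${labor_cost:,.0f}")
```

The figure covers labor only; as the slide notes, the same employees could not work on other projects during that time.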

Quantitative Risk Analysis: Single Loss Expectancy

• SLE = Single Loss Expectancy
• EF = Exposure Factor, for example 50% or .50
• AV = Asset Value, for example $1,000,000

SLE = EF x AV

SLE = .5 x $1,000,000 = $500,000
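The same formula as a small Python function. The function name and the range check on the exposure factor are illustrative choices, not part of the presentation.

```python
def single_loss_expectancy(exposure_factor: float, asset_value: float) -> float:
    """Return SLE = EF x AV, with EF given as a fraction between 0 and 1."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure factor must be between 0 and 1")
    return exposure_factor * asset_value


# Example values from the slide: EF = .5, AV = $1,000,000
sle = single_loss_expectancy(0.5, 1_000_000)
print(f"SLE = ${sle:,.0f}")
```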

B/R Methods

Reducing tapes

B/R Methods

Reducing tapes

• Stacking datasets to 3590-1 cartridges
• Using Delta Save Facility from ADABAS

B/R Methods

Reducing tapes

• Using Forward Index Compression (FIC) from ADABAS
• Using larger block size for 3590 tapes = 256K, supported by ADABAS

Delta Save Facility (DSF)

[Diagram: Delta Save. Components shown: NUCLEUS (DSF=YES) with buffer pool, ASSO and DATA; delta log (DLOG) of changed RABNs; dual protection log (DDPLOGR1, DDPLOGR2); ADARES PLCOPY (DSF=YES) producing the PLOG copy (DDSIAUS1) and the extracted blocks in DSIM (DDDSIM); ADASAV SAVE DELTA (DSF=YES) writing the changed blocks to DDSAVE1.]

Delta Save Facility

[Diagram: ADASAV RESTORE with DSF=YES. Inputs shown: full image save, online/offline (DDREST1); delta saves (DDDELT1-8); DSIM (DDDSIM) with online images and RABNs extracted from the PLOG. Output: ASSO and DATA.]

B/R Methods

Forward Index Compression

Rochester Gas & Electric

Space savings:
• Normal Index: 37% - 55%
• Upper Index: 21% - 69%

Within an index block the part of the index value that is identical to the forward part of the previous index value is suppressed.
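The principle described above is generally known as front (prefix) compression. The following Python sketch illustrates the idea on a sorted list of index values; it is a generic illustration, not the actual ADABAS index block format, and the function names and sample values are hypothetical.

```python
import os


def compress(sorted_values):
    """Store each value as (length of prefix shared with the previous value, rest)."""
    entries = []
    previous = ""
    for value in sorted_values:
        shared = len(os.path.commonprefix([previous, value]))
        entries.append((shared, value[shared:]))
        previous = value
    return entries


def decompress(entries):
    """Rebuild the full values by re-attaching the suppressed prefixes."""
    values = []
    previous = ""
    for shared, suffix in entries:
        value = previous[:shared] + suffix
        values.append(value)
        previous = value
    return values


values = ["SMITH", "SMITHERS", "SMITHSON", "SMYTHE"]
entries = compress(values)
print(entries)
```

The more the neighboring index values have in common, the more is suppressed, which is consistent with savings that vary widely from file to file.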

B/R Methods

IBM Magstar 3494 / Virtual Tape Server (VTS)

LA Times

SunGard

B/R Methods

VTS problems

LA Times:
• Completion code A78 RC 18
• We switched from VTS to 3590-1 cartridges

B/R Methods

VTS problems

Virginia Information Technologies Agency:
• Ran into the same problem in 2003/2004: system completion code A78 RC 18
• We … converted … to 3490/3590 physical tapes
• Problem solved

B/R Methods

Disk to Disk

• Mirroring: Hardware, Software
• Replicating: Software

B/R Methods – Enterprise Server

Enterprise Server

UNIX

NT / 2000 / XP

Hot Site

B/R Methods – Open System

Hot Site

B/R Methods

Marty Stewart, Disaster Recovery Manager, AnMed Health:

“…we’d rather have a server that’s running slower than having no server at all.”

Disk Mirroring

Benefits
• Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
• No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure.

Disk Mirroring

Limitations
• No protection from data corruption.
• Secondary site is not guaranteed to be transactionally consistent in the case of asynchronous mirroring.
• Client applications must be restarted after a failure and need to be aware of the failure.

Disk Mirroring

Limitations
• Synchronous mirroring and RAID devices can add overhead to application performance.
• Redundant/specialized high availability hardware/software can be expensive and restricted to use for backup purposes only.

Disk Mirroring

Limitations
• Secondary copy of data is not available for use: low hardware utilization.
• Need to replicate everything on disk; no selectivity of data replication.

Example For Disk Mirroring

[Diagram: Main Platform (S/390, UNIX) with EMC 5700 storage, SRDF remote mirrored and synchronized over an OC-3 link (12-15 miles) to the Back Up / Hot Site (S/390, UNIX) with EMC 5700 storage.]

B/R Methods

Can we buy used Enterprise Servers?
• Yes, and inexpensively
• Operating system is free for D/R

Search for “selling used mainframes,” for example:

http://www.used-line.com/fdc3236-find-dealer.htm

http://www.azure.co.uk/

etc.

Dedicated line broadband speeds and prices

• T-1: 1.544 megabits per second (24 DS0 lines), avg. cost $400-$650/mo.
• T-3: 43.232 megabits per second (28 T1s), avg. cost $6,000-$16,000/mo.
• OC-3: 155 megabits per second (100 T1s), avg. cost $20,000-$45,000/mo.
• OC-12: 622 megabits per second (4 OC3s), no price
• OC-48: 2.5 gigabits per second (4 OC12s), no price
• OC-192: 9.6 gigabits per second (4 OC48s), no price

Source: http://www.infobahn.com/research-information.htm

prices updated: 12 May 2005
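To relate these line speeds to backup volumes, one can estimate how long a given amount of data takes to transfer. The sketch below uses the speeds listed above; the 100 GB database size, decimal units and full line utilization are assumptions for illustration, not figures from the presentation.

```python
# Line speeds in megabits per second, as listed on the slide.
LINE_SPEEDS_MBPS = {
    "T-1": 1.544,
    "T-3": 43.232,
    "OC-3": 155.0,
    "OC-12": 622.0,
}


def transfer_hours(gigabytes: float, mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `gigabytes` over a line of `mbps` megabits per second.

    `efficiency` is the usable fraction of the line speed (1.0 = no overhead).
    """
    bits = gigabytes * 8 * 1000**3
    seconds = bits / (mbps * 1_000_000 * efficiency)
    return seconds / 3600


DATABASE_GB = 100  # hypothetical size, not taken from the presentation

for name, mbps in LINE_SPEEDS_MBPS.items():
    print(f"{name:>5}: {transfer_hours(DATABASE_GB, mbps):8.1f} hours")
```

Real throughput will be lower because of protocol overhead and competing traffic, which the `efficiency` parameter can approximate.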

Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)

PPRC = 60 miles; PPRC-XD = continent

• IBM ESS DASD
• HDS also supports PPRC

[Diagram: ESS Shark to ESS Shark, with FlashCopy]

Also see TimeFinder from EMC

External Back-up Systems

Fast Copy of Data

Snapshot
• No data movement
• A virtual copy by copying pointers

Copy Process
• Physical copy is made asynchronously from the logical copy
• No impact on applications using the original data
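The snapshot idea can be illustrated with a small copy-on-write model in Python. This is a generic sketch of the technique, not how FlashCopy or TimeFinder are actually implemented; the class names and block contents are hypothetical.

```python
class Volume:
    """A source volume modeled as a list of blocks (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Taking the snapshot moves no data; it only registers a new view.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block_no, data):
        # Before overwriting, let every snapshot keep the old block content.
        for snap in self.snapshots:
            snap.preserve(block_no)
        self.blocks[block_no] = data


class Snapshot:
    """Point-in-time view that shares unchanged blocks with the source."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block number -> content at snapshot time

    def preserve(self, block_no):
        if block_no not in self.saved:
            self.saved[block_no] = self.source.blocks[block_no]

    def read(self, block_no):
        # Changed blocks come from the saved copy, the rest from the source.
        if block_no in self.saved:
            return self.saved[block_no]
        return self.source.blocks[block_no]

    def physical_copy(self):
        # The physical backup can be produced later, from the snapshot view.
        return [self.read(i) for i in range(len(self.source.blocks))]


volume = Volume(["asso-0", "data-0", "data-1"])
snap = volume.snapshot()
volume.write(1, "data-0-updated")  # the application keeps updating the source
backup = snap.physical_copy()      # still the content at snapshot time
print(backup)
```

Only the overwritten block is copied; the untouched blocks are read through to the source, which is why taking the snapshot itself is fast.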

External Back-up Systems

Fast Copy of Data
• Specific hardware required
• Software works only with that hardware
• Works on volume level
• Some snapshot-only tools also work on dataset level

Snapshot & Physical Copy

IBM
• Hardware: Enterprise Storage Server
• Software: FlashCopy
• http://www.share.org/proceedings/sh98/data/S3087.PDF

EMC
• Hardware: Symmetrix Remote Data Facility
• Software: EMC TimeFinder
• http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html

Flash Copy

How It Works

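[Diagram: the source data is in read/update mode. During a pre-defined time window between Suspend and Resume it is read only (update requests are queued) while the snapshot ("snap") is taken. After Resume the source data returns to read/update mode, and the physical backup is made from the snapshot.]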

Source: SAG

ADADBS TRANSACTIONS SUSPEND,TTSYN=60,TRESUME=120

Replication

Benefits
• Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
• Ability to swap more quickly to the standby system in the event of failure, as the backup database is already on-line.
• Data corruption is typically not replicated, as transactions are logically reproduced rather than I/O blocks mirrored.

Replication

Benefits
• Automatic switch-over for clients using a switching mechanism; no client restart needed.
• Originating applications are minimally impacted, as replication takes place asynchronously after commit of the originating transaction.
• The warm standby database is available for read-only operations, allowing better utilization of backup systems.

Replication

Benefits
• Ability to resynchronize and easily switch back to the primary system when it becomes available, without loss of data.

Replication

Limitations
• The warm standby system will be out of date by the transactions committed at the active database that have not yet been applied to the standby.
• Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected, but file systems may not be supported).

Entire Transaction Propagator

The Entire Transaction Propagator allows for asynchronous data replication.

Replicated data can be updated and synchronized with master data at user-specified intervals.

ADABAS Data Replication

• Logical dissemination of ADABAS data to homogeneous or heterogeneous targets
• Near real time propagation
• Event driven at the transaction level
• Implemented at the database/file level for Store, Delete and Update commands
• Define replication rules through subscriptions
• Minimal impact on normal nucleus activity
• Strategic for Enterprise Data Sharing
• Replaces Entire Transaction Propagator

ADABAS Data Replication

[Diagram: replication from an origin DBMS (files, fields) to target DBMSs, target fields and a target table, spread across z/OS Image A, z/OS Image B, z/OS Image C and Unix Server D.]

Possible Hot Site Solutions

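[Diagram: the Enterprise Server in Los Angeles connected over OC3 links to an own Enterprise Server at the hot site. Three storage pairings are shown: Shark to Shark, Shark to EMC, and EMC to EMC. Also shown: ESCON/FICON converter and fiber optic.]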

Costs for Tape Backups

• SunGard recovery services
• Offsite tape storage
• Tape handling
• Shipping per test
• Special extra pick-ups

Yearly $150,000

Costs for Real Disaster

• SunGard Declaration Fee
• D/R Site Daily Usage Fee
• Office Space Daily Usage Fee
• Work Group Declaration Fee
• Work Group Daily Usage Fee
• LAN Bridge Declaration Fee
• LAN Bridge Daily Usage Fee

30 Days $475,000

Costs for Own Hot Site

• Used IBM Z800-0X2 Mainframe
• Used IBM 2105-F20 Shark Storage
• Used IBM 3494 Library, VTS and Tape Drives
• 3rd Party Next Day HW Maintenance
• Printer and Terminal Controller Re-location Costs
• 3490 Tape Drive and Controller Re-location Costs
• Other Costs

1st Year $520,000

After 5 Years Total $735,000

Costs for Own Hot Site

5 Years SunGard = $750,000

30 Days Real Disaster = $475,000

5 Years Own Facility = $735,000
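The comparison can be reproduced from the cost figures given in the presentation. The sketch below makes two assumptions that are not stated on the slides: the SunGard yearly cost stays flat at $150,000, and the own-site cost after the first year is spread evenly over the remaining four years.

```python
# Figures from the cost slides of this presentation.
SUNGARD_YEARLY = 150_000
OWN_SITE_FIRST_YEAR = 520_000
OWN_SITE_FIVE_YEAR_TOTAL = 735_000
REAL_DISASTER_30_DAYS = 475_000
YEARS = 5

# Assumption: the yearly SunGard cost does not change over the five years.
sungard_total = SUNGARD_YEARLY * YEARS
difference = sungard_total - OWN_SITE_FIVE_YEAR_TOTAL

# Assumption: own-site costs after year one are spread evenly over four years.
own_site_later_years = (OWN_SITE_FIVE_YEAR_TOTAL - OWN_SITE_FIRST_YEAR) / (YEARS - 1)

# One declared 30-day disaster adds the declaration and usage fees on top.
sungard_with_disaster = sungard_total + REAL_DISASTER_30_DAYS

print(f"5 years SunGard:               ${sungard_total:,}")
print(f"5 years own facility:          ${OWN_SITE_FIVE_YEAR_TOTAL:,}")
print(f"Difference after 5 years:      ${difference:,}")
print(f"Own site, each later year:     ${own_site_later_years:,.0f}")
print(f"SunGard incl. 30-day disaster: ${sungard_with_disaster:,}")
```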

Restore Times (Min)

Test     IPL done   DB restore
Feb-02          0          229
Jun-02        330          129
Nov-02        425          165
Mar-03        540          221
Oct-03        365            0
Dec-03        420          104
Oct-04        420           63
Dec-04        330           90
Mar-05        270           78
Aug-05        360          136
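The restore-time figures can be totaled per D/R test with a short Python sketch. Adding the two series assumes that "IPL done" and "DB restore" are consecutive phases of one test, which the chart suggests but does not state.

```python
# Minutes per D/R test, transcribed from the restore-time chart.
TESTS = ["Feb-02", "Jun-02", "Nov-02", "Mar-03", "Oct-03",
         "Dec-03", "Oct-04", "Dec-04", "Mar-05", "Aug-05"]
IPL_DONE = [0, 330, 425, 540, 365, 420, 420, 330, 270, 360]
DB_RESTORE = [229, 129, 165, 221, 0, 104, 63, 90, 78, 136]

# Assumption: the two phases run one after the other, so their sum is the
# elapsed time until the databases are restored.
totals = {test: ipl + db for test, ipl, db in zip(TESTS, IPL_DONE, DB_RESTORE)}
slowest = max(totals, key=totals.get)

for test, minutes in totals.items():
    print(f"{test}: {minutes:4d} min ({minutes / 60:.1f} h)")
print(f"Slowest test: {slowest}")
```

The zero values (IPL in Feb-02, DB restore in Oct-03) are kept as charted; they appear to mark a step that was not measured or did not run, so averages over the series should be read with care.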

Benefits of Own Hot Site

• Financial savings > $150,000 annually, providing an almost 5% ROI
• Reduced recovery time
• Reduced impact due to road and airport closures
• Elimination of reliance on external vendors
• Mainframe and open system can use the same facility

Grid Computing

virtual machine

virtual memory

virtual storage

virtual I/O

Grid Computing

[Diagram: grid components: User Interface, Passkey, Resource Broker, Replica Catalog, Information Service, Computer Element(s), Storage Element(s); Hardware, Software, Locations.]

Grid Computing

Grid Computing

BBC builds distributed grid for content sharing (Gridcast).

62.73.167.57/publicdocs/ppt/prompeg307.ppt

Grid Computing For Backup?

Intra or Extra Grid?

Pull or Push?

Grid Software

http://www.gridcomputing.com/

http://www.gridforum.org/ggf_grid_understand.htm

http://gridcafe.web.cern.ch/gridcafe/animations.html

Backup Methods

Mostly used by other companies (Source: DRJ Magazine)

• VTS
• Disk to disk: more and more common for enterprise storage servers and AIX server technology, for example. (Source: @server Magazine)

B/R Methods

Problems for other companies

• High third-party hot site costs, approx. $10,000 - $70,000 per month
• Restore time 24-30 hours

How Far is ‘Far Enough?’ (http://www.drj.com/articles/spr03/1602-02.html)

[Diagram: distance between the Alternate Facility and the Offsite Storage Facility]

Answer = 105 miles, according to the survey

Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)

• Distance is key: streets, bridges, tunnels, airports are closed
• Tape recovery is not effective
• All applications are critical
• Inconsistent back-up is no back-up at all

Lessons Learned (http://www.drj.com/articles/spr02/1502-07.html)

• People-dependent processes do not suffice
• Two sites are not enough
• People are hard to replace but information is irreplaceable

Recommended

… we should have an excellent HOT SITE!