Customer Engagement Workshop IT Service Continuity Phoenix, Aston 6th May 2015 Paul Gant, Head of...

Customer Engagement Workshop: IT Service Continuity

Phoenix, Aston 6th May 2015

Paul Gant, Head of BCM Assurance

David Davies, BCM Assurance Consultant

Agenda

• 11:00 Registration, refreshments and networking.

• 11:30 Why get fit, anyway?

• 11:50 Fictitious live incident.

• 12:10 Post incident review.

• 12:30 Steps to success.

• 12:50 Questions & answers.

• 13:00 Lunch, tours, event close.

• 13:30 BCM Assurance 1-2-1 sessions by appointment.

Why get fit, anyway?

Introducing BCM Assurance – your personal trainers

What if?

Real Recovery (Invocations) is like a Battle

YOUR ENEMIES

• (Lack of) time.

• You can’t recover what you haven’t backed up.

• You can’t upgrade recovery technology during an invocation.

YOUR FRIENDS

• Phoenix.

• Your preparation.

What does “Preparation” involve?

It’s not just about the technology!

But aren’t policies, analysis, plans and reports only there to satisfy the auditor?

Is there any rhyme or reason to them?

Priorities Dependencies Plans Testing Maintenance

IT Service Continuity Management

Focuses on 5 things…

1. What’s needed first?

Sir, is it women and children first…

… or Active Directory and Exchange?

Priorities Dependencies Plans Testing Maintenance

2. What rests on what?
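
One way to make "what rests on what" concrete is to record each service's dependencies and derive a recovery order from them. The sketch below is illustrative only: the dependency map is hypothetical (only Active Directory, Exchange and Filestore are named in the slides, and "Intranet" is invented for the example), and it uses a topological sort from Python's standard library rather than any method described in the workshop.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it rests on.
depends_on = {
    "Active Directory": set(),
    "Exchange": {"Active Directory"},
    "Filestore": {"Active Directory"},
    "Intranet": {"Active Directory", "Filestore"},
}


def recovery_order(deps):
    """Return services ordered so each one follows everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(depends_on)
print(order)
```

Active Directory comes out first because everything else rests on it. A circular dependency would raise `graphlib.CycleError`, which is itself a useful finding before an invocation rather than during one.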

Priorities Dependencies Plans Testing Maintenance

3. Make a plan

Priorities Dependencies Plans Testing Maintenance

4. See if it works

Priorities Dependencies Plans Testing Maintenance

5. Keep it up-to-date

Priorities Dependencies Plans Testing Maintenance

What goes wrong? Issues reported in the media

DATACOM co-location datacentre flood, Melbourne, Australia, March 2010

• Heavy rain broke a ceiling panel and poured water into the datacentre.

• Water damaged SANs, servers and routers.

• All equipment impacted by a 12-hour power outage.

Camera Corner / Connecting Point datacentre fire, Green Bay, Wisconsin, USA, 19th March 2008

• Fire alarms but no fire suppression.

• 75 hosted servers destroyed.

• “10 day outage” reported, with 98% of services resumed by 1st April.

Phoenix Standby Reasons

Phoenix Invocation Reasons


The recurring dangers that we see

• IT recovery requirements haven’t been agreed with the business (through a BIA).

• IT recovery strategy isn’t joined up (i.e. a full end-to-end solution isn’t there).

• Strategy isn’t supported by plans and isn’t tested rigorously enough (resulting in inefficiencies and failures during actual recovery).

Fictitious Live Incident

(Why have a personal trainer to help you?)

Site layout (diagram)

• Offices and server room: 2nd (top) floor.

• Warehouse and second server room (ground floor).

• Backup SAN and tapes.

• Main gate, side gate (footpath), visitor car parking, staff car parking, gardens.

• Network links labelled 100 Mbps, 100 Mbps and 1 Gbps.

Recovery objectives

• Critical systems: Recovery Time Objective 24 hours; Recovery Point Objective 24 hours (disk to disk daily).

• Non-critical systems: Recovery Time Objective 5 days; Recovery Point Objective 1 day (local tape) and 7 days (offsite tape).

Incident timeline

• 08:07 Fire.

• 12:15 Servers onsite.

• 12:45 Exec report.

• 13:15 Start recovery.

• 09:30 Server recovered?

• 11:45 Recovery stalled.
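
The scenario's timeline can be checked against the 24-hour Recovery Time Objective for critical systems with some simple arithmetic. The sketch below is a rough illustration, not part of the workshop material: the calendar dates are hypothetical (the slides give times only), and it assumes the 11:45 "Recovery stalled" entry falls on the day after the fire.

```python
from datetime import datetime, timedelta

RTO_CRITICAL = timedelta(hours=24)  # critical systems RTO from the scenario

# Hypothetical dates; only the times of day come from the scenario.
fire = datetime(2015, 1, 5, 8, 7)
recovery_start = datetime(2015, 1, 5, 13, 15)
# ASSUMPTION: "11:45 Recovery stalled" is on the following day.
stalled = datetime(2015, 1, 6, 11, 45)


def fmt(delta):
    """Format a timedelta as hours and minutes, e.g. 5h08m."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


used_before_start = recovery_start - fire          # time lost before recovery began
remaining_at_start = RTO_CRITICAL - used_before_start
elapsed_at_stall = stalled - fire
overrun = elapsed_at_stall - RTO_CRITICAL

print("Used before recovery started:", fmt(used_before_start))   # 5h08m
print("RTO remaining at start:", fmt(remaining_at_start))        # 18h52m
print("Elapsed when recovery stalled:", fmt(elapsed_at_stall))   # 27h38m
print("Over RTO by:", fmt(overrun))                              # 3h38m
```

Under these assumptions, more than five hours of the 24-hour objective had gone before recovery started, and the objective had already been missed when the recovery stalled.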

Post Incident Review

(What are the consequences of being unfit?)

Post Incident Review

• What went well? (Where were they fit?)

• What went badly? (Where were they unfit?)

• What could the IT manager have done differently during the recovery?

• What could the IT manager have done differently before the recovery? 

IT Service Continuity Issues

Have you experienced any of the issues raised?

• Difficulty in getting board engagement.

• No business requirements for IT recovery (i.e. no BIA).

• Single points of failure in key skill sets.

• Lack of recovery documentation (perhaps no spare time to write it?)

• Lack of formal testing and test reporting.

• Any other issues?

The Barriers and Results

• What’s stopping you / stopped you from making changes?

• What would happen if changes aren’t made and you invoke?

• What would happen if you do make the changes?

Steps to Success

(How to become IT service continuity fit.)

What if?

Steps to success

The Steps to Successful IT Service Continuity

1. Engagement and sponsorship at a strategic level.

2. Balance between the technology and ITSC management.

3. Do all of ITSC, and run it as a repeating programme.

1. Strategy: Talk the Language of the Business

I need to upgrade the NAS by 5 terabytes and research getting an enhanced burstable pipe.

Err… good for you.

1. Strategy: Talk the Language of the Business

I’m concerned that our IT recovery could be inadequate until business requirements are confirmed in a BIA.

At present, our business may struggle to recover from an IT outage.

What? We need to do something about this.

1. Strategy: Engage with the Executive Team

Does the Executive Team know:

• What are the impacts if IT fails?

• What are the risks associated with IT failure?

• What are the RTO and RPO of services, and what do these terms mean?

• What is the recovery and hand back process?

2. Balance Technology with ITSC Management

Priorities Dependencies Plans Testing Maintenance

3. Do all of the Programme Steps, and Repeat

Chart: BC Readiness plotted against Time, with Trigger and PEAK marked. Programme steps shown: Business Impact Analysis, IT Service Continuity Plan, IT Recovery Testing.

Priorities Dependencies Plans Testing Maintenance


What if?

The traps

Trap 1: The Scope Trap

I’ve tested Email and Filestore time and again.

I have complete confidence in their recovery.

Great, what about the other 48 IT services?
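
The scope trap is easy to quantify once there is a list of services to compare against. The sketch below is a minimal illustration: the total of 50 comes from the dialogue above (Email and Filestore plus "the other 48"), while the placeholder service names are hypothetical.

```python
# Hypothetical service catalogue: two named services plus 48 placeholders.
services = ["Email", "Filestore"] + [f"Service {n:02d}" for n in range(3, 51)]

# Services with a successful recovery test on record (from the dialogue).
tested = {"Email", "Filestore"}

untested = [name for name in services if name not in tested]
coverage_pct = 100 * len(tested) / len(services)

print(f"{len(services)} services, {len(untested)} untested")
print(f"Recovery test coverage: {coverage_pct:.0f}%")
```

Complete confidence in two services still leaves recovery test coverage at 4% of the estate.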

Trap 2: The Audit Trap

Quick, we need to dust off the plans to satisfy the auditor.

Then we can forget about ITSC again.

He’ll never know… ha ha!

Trap 3: The Importance and Urgency Trap

We’ve got ten projects going live this quarter.

There’s no time to fully implement and test IT DR, as it will affect “go live” dates.

Well I suppose we can sort it out later.

We don’t want to get in the way of business strategy.

Trap 4: The Gambler’s (or Optimist’s) Trap

It’ll never happen…

If it does, we’ll be all right provided it happens on a Monday and I’ve remembered to take the backup tapes home with me.

Good odds eh?

I’m not bothered, I plan to win the lottery and retire this week.

Trap 5: The Hero Trap

We’ll all pull together and work extra hours to nail it.

Sleep’s for wimps.

Yeah, it’s nothing that a load of pizza and energy drinks can’t solve!

Any Questions?

Thank you for participating.

Lunch is now ready.

Would you like a tour or a meeting?

Thank You