Copyright © 2009 Athens Group, Inc. 15th Annual American Industrial Hygiene Association - Rocky Mountain Section (AIHA-RMS), and the American Society of Safety Engineers (ASSE) Colorado Chapter FALL TECHNICAL CONFERENCE September 16th & 17th, 2009 Arvada Center “Environmental Health & Safety Broadening Our Alliances” Accidents Caused by Software Errors Don Shafer, CSDP Chief Technology and HSE Officer Athens Group, Inc.

Rocky Mtn Safety090917

DESCRIPTION

Software Kills latest version for Rocky Mountain Safety Conference - 17 September 2009

Page 1: Rocky Mtn Safety090917

Copyright © 2009 Athens Group, Inc.

15th Annual American Industrial Hygiene Association - Rocky Mountain Section (AIHA-RMS), and the American Society of Safety Engineers (ASSE) Colorado Chapter

FALL TECHNICAL CONFERENCE

September 16th & 17th, 2009 Arvada Center

“Environmental Health & Safety Broadening Our Alliances”

Accidents Caused by Software Errors

Don Shafer, CSDP

Chief Technology and HSE Officer

Athens Group, Inc.

Page 2: Rocky Mtn Safety090917

A Safety Minute – 17 September 2009

• Safety - Dropped object fatality in the Keppel FELS shipyard

• Since arriving in Singapore, I’ve been mildly shocked at the nonchalance shown around lifting operations. It’s quite common to see cranes lifting loads over active walkways that are not taped off, and the workers using those walkways often take little notice of the lifting operations occurring above them.

• This resulted in tragedy yesterday. A basket of scrap cable was being transferred from a rig to the dock. Reportedly, the crane was blowing its horn as per protocol for transfer operations. The load shifted and a chunk of cable dropped, landing on the head of a dockworker underneath. I don’t believe the dockworker was participating in the lifting operation.

• Even more disturbing: no one on the dock came to his aid, and rig personnel ran over and attempted CPR. The ambulance arrived without paramedics, and rig personnel accompanied the dockworker to the hospital, where he was pronounced dead.

• Be careful out there. Your PPE will protect you from many things, but your awareness will save your life.

Page 3: Rocky Mtn Safety090917

Presentation Outline

• Examples of Software Related Incidents

– Software you can “see”

– Software you cannot “see”

• Proven Practices to Reduce Software Risk

– Life Cycle Recognition

– Configuration Management

– FMECA

Page 4: Rocky Mtn Safety090917

Those old Software Safety Chestnuts!

Page 5: Rocky Mtn Safety090917

And, some not so old!

1. Air France - What is known about the crash of an Air France Airbus on 1 June bears similarities to the little-noticed loss, much earlier, of two computer-controlled passenger jets. Those two crashes raised questions of whether the pilots or the systems were really in control. Airbus said the data showed that the pilots might have received conflicting information about their speed: there was a “divergence in airspeed measurement” by the onboard systems of the Air France aircraft. This is one of the matters being investigated, said Airbus. Data to the onboard computers about air speed came from sensors called pitot tubes, at least one of which was due for replacement. French authorities have suggested that inconsistent air speed readings are not dangerous.

Page 6: Rocky Mtn Safety090917

Software you can “see”

Page 7: Rocky Mtn Safety090917

Safety Incident: Injured Rig Hands

Incident

The elevators and bales of an older-model top drive reacted erratically to a rapid and erroneous user command. The vendor had released a software patch for that model to prevent this erratic behavior, but somehow it had not been communicated to or installed on that drilling unit. There was little or no initial design and testing of the control software, so the software interlock issue was not discovered. Little or no system requirements gathering was done on the control system, and no FMECA was done on the control portion of the top drive. There was no consistent management of change, treating software as an asset on the MODU, between the supplying vendor and the operator.

Result

The bales swung around and injured two of the rig hands, resulting in reportable LTIs.

Estimated Lost Time: 5 days

Day Rate: $310,000

Minimum Cost: $1,550,000

Solution

Initial requirements definition, FMECA of the control system, and software change control protocols would have avoided this incident.

Page 8: Rocky Mtn Safety090917

Safety Incident: Potentially Deadly Mishap

Incident

A driller was performing a test with a riser joint suspended 70 feet (21 meters) above the drill floor. Prior to leaving the drill cabin for a Job Risk Analysis meeting with the roughnecks, the driller selected “standby” mode on the drilling chair. While doing so, he inadvertently pressed the keypad button that activates Pipe Handling mode. In this mode, the drill control system sends a pressure monitoring command to the pipe elevator every three minutes. The driller stepped out onto the drill floor, and three minutes later the pressure monitoring command was sent to the riser handling equipment, which mistook it for an unlock command.

Result

The riser tool released the joint, which fell through the well center into the ocean. The joint fell perfectly through the slips. Neither personal injury nor collateral equipment damage was experienced.

Estimated Lost Time: 1.5 days

Day Rate: $310,000

Minimum Cost: $465,000

Solution

An FMECA of the equipment covering operational states and message flow could have prevented this incident.

3 people

4 people
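The message-flow failure in this incident (a periodic monitoring command acted on as an unlock) can be illustrated with a small sketch. It is not the rig vendor's protocol: the device identifiers, the `Command` values, and the `RiserTool` class are all hypothetical names chosen for illustration. The idea shown is that a receiver should act only on messages addressed to it and explicitly understood by it, and should refuse to unlock while a load is suspended.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    # Hypothetical command set; the real system's messages are not in the slides.
    PRESSURE_MONITOR = "pressure_monitor"
    UNLOCK = "unlock"


@dataclass(frozen=True)
class Message:
    target: str       # hypothetical device identifier
    command: Command


class RiserTool:
    """Hypothetical receiver that rejects anything it does not explicitly accept."""

    def __init__(self, device_id: str, load_suspended: bool):
        self.device_id = device_id
        self.load_suspended = load_suspended
        self.locked = True
        self.rejected: list[Message] = []

    def handle(self, msg: Message) -> bool:
        # Reject messages for other devices and commands other than UNLOCK.
        if msg.target != self.device_id or msg.command is not Command.UNLOCK:
            self.rejected.append(msg)
            return False
        # Interlock: never unlock while a load is suspended.
        if self.load_suspended:
            self.rejected.append(msg)
            return False
        self.locked = False
        return True


tool = RiserTool("riser_tool", load_suspended=True)
stray = Message("pipe_elevator", Command.PRESSURE_MONITOR)
print(tool.handle(stray), tool.locked)
```

With this design the stray monitoring command is logged in `rejected` and the tool stays locked, which is the kind of behavior an FMECA of operational states and message flow would ask about.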

Page 9: Rocky Mtn Safety090917

Safety Incident: Dropped Blocks

Incident

The semisubmersible MODU was in the final stages of pulling the BOP. The BOP was being lifted the last meter to gain clearance for access to the BOP transporter in the moonpool. With the traveling block at the uppermost limit, the Kinetic Energy Management System was ‘tripped’, and the resulting action was not as expected. The anti-bird nesting components were incorrectly installed, thus limiting the 1200 psi used to function the service brake to 200 psi. There was no operator error, and the incident was a result of a disc brake system failure.

Result

Traveling blocks, complete with riser and suspended BOP, descended approximately 50 meters in an uncontrolled manner, until the Top Drive impacted against the riser gimbal at the rig floor level.

Estimated Lost Time: 5 days

Day Rate: $477,000

Minimum Cost: $2,385,000

Solution

An FMECA of the equipment covering operational states and message flow could have prevented this incident. Regression testing of software upgrades and formal change control should have taken place.

Page 10: Rocky Mtn Safety090917

Safety Incident: Top Drive Out of Control

Incident

During the voyage to location, a technician was ‘tweaking’ the zone management parameters on a newbuild. A few minutes later the top drive started rotating by itself. The technician, in his zeal to fix one thing, had broken another, thereby introducing a regression into the system. He was also unable to recover quickly to a previous known state because he wasn’t following software change control protocols.

Result

The technician and the team had to scramble to correct the issue. Fortunately, no equipment was damaged and no personnel were injured.

Estimated Lost Time: 2 days

Day Rate: $380,000

Minimum Cost: $760,000

Solution

Following software change control and testing protocols would have prevented this.
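The inability to return to a previous known state in this incident suggests what a minimal change-control mechanism has to provide. The sketch below is an assumption-laden illustration, not a description of any rig control system: `ParameterStore`, the parameter name `zone_upper_limit_m`, and the change ID format are all hypothetical. It records a snapshot before every change, refuses changes without a change request ID, and can roll back the most recent change.

```python
import copy


class ParameterStore:
    """Hypothetical store for control parameters with a change history."""

    def __init__(self, initial: dict):
        self._current = dict(initial)
        self._history = []  # list of (change_id, snapshot taken before the change)

    def apply_change(self, change_id: str, updates: dict) -> None:
        # Refuse changes that are not tied to an approved change request.
        if not change_id:
            raise ValueError("an approved change request ID is required")
        self._history.append((change_id, copy.deepcopy(self._current)))
        self._current.update(updates)

    def rollback(self) -> str:
        """Restore the state before the most recent change and return its ID."""
        if not self._history:
            raise RuntimeError("no recorded change to roll back")
        change_id, snapshot = self._history.pop()
        self._current = snapshot
        return change_id

    def get(self, name: str):
        return self._current[name]


store = ParameterStore({"zone_upper_limit_m": 30.0})
store.apply_change("CR-001", {"zone_upper_limit_m": 28.5})
undone = store.rollback()
print(undone, store.get("zone_upper_limit_m"))
```

A real software configuration management process would also cover review, testing, and versioned backups of the controller software itself; the sketch only shows the "known previous state" part.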

Page 11: Rocky Mtn Safety090917

Safety Incident: Generator Trip

Incident

A vendor arrived onboard a rig after having been officially requested to make changes to the rig’s automation system. While onboard, a system operator made an unofficial request regarding the numbering of main engine cooling system valves. The vendor either hadn’t completely understood the request or had been distracted, and inadvertently made the change to the wrong valve. Some time later a different operator attempted to give a close command to the valve in preparation for maintenance of the system.

Result

Closing of the incorrect valve caused a generator trip.

Estimated Lost Time: 0.5 days

Day Rate: $310,000

Minimum Cost: $155,000

Solution

Had formal control procedures been adopted, no unofficial change request would have been carried out.

Page 12: Rocky Mtn Safety090917

Safety Incident: Control System Reset Kills 4

Incident

A control system failure occurred on a large offshore construction vessel. Two control units were restarted twice, unsuccessfully. A blinking red lamp on the PLC indicated that a memory reset was required, even though a memory reset had NEVER been requested by control system diagnostics during equipment operations. As soon as the hydraulic power packs started, a loud bang was heard. A quadruple joint of pipe dropped approximately one meter to the welding deck below. A second quadruple joint of pipe in the pipe elevator was released (all clamps opened and the hydraulic safety stop swung away) and fell the full length of the tower, smashing through a crowded access platform to the deck below.

The initialization instruction was pre-loaded in PLC EPROM memory, and the initialization included instructions to OPEN ALL CLAMPS.

Result

Eight personnel were injured – four fatally. All were located on the access platform and several were thrown overboard by the impact.

Estimated Lost Time: 20 days

Day Rate: $510,000

Minimum Cost: $10,200,000

Solution

An FMECA of the equipment covering operational states and message flow could have prevented this incident. Document the impact of resetting control systems during operations.
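Each incident slide derives its minimum cost as estimated lost time multiplied by the day rate. The short sketch below reproduces that arithmetic for the six incidents using the figures stated on the slides; the total at the end is simply the sum of those six figures and does not appear in the presentation itself.

```python
from decimal import Decimal

# (incident, estimated lost days, day rate in USD), as stated on the slides.
INCIDENTS = [
    ("Injured Rig Hands", Decimal("5"), 310_000),
    ("Potentially Deadly Mishap", Decimal("1.5"), 310_000),
    ("Dropped Blocks", Decimal("5"), 477_000),
    ("Top Drive Out of Control", Decimal("2"), 380_000),
    ("Generator Trip", Decimal("0.5"), 310_000),
    ("Control System Reset Kills 4", Decimal("20"), 510_000),
]


def minimum_cost(lost_days: Decimal, day_rate: int) -> Decimal:
    """Minimum cost = estimated lost time (days) x day rate."""
    return lost_days * day_rate


costs = {name: minimum_cost(days, rate) for name, days, rate in INCIDENTS}
total = sum(costs.values())

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")
print(f"Total across the six incidents: ${total:,.0f}")
```

These are lower bounds by construction: they count only lost rig time and exclude injuries, equipment damage, and investigation costs.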

Page 13: Rocky Mtn Safety090917

What Did We Learn?

• Understand the impact of resetting control systems during operations

– When a system is reset, to what configuration do the components return? Predefined, fail-set, or fail-safe state?

– Loss of communications caused a revert to an unanticipated configuration – pipe rams opened unexpectedly and the string was lost in-hole

• There are known instances where systems were reset as a matter of procedure, at established intervals, to prevent incidents

– An incident occurred as a result – loss of station after a failed reboot of the DP processor

• Statistically, most reset/reboot operations are completed without incident

– False sense of security – the perception that the reset is actually preventing incidents
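The question of which configuration components return to after a reset can be made concrete with a small decision function. This is a sketch under stated assumptions, not the logic of any actual PLC: `ClampState`, `FAIL_SAFE`, and `state_after_reset` are hypothetical names, and treating CLOSED as the safe state for a clamp holding pipe is an assumption that a real FMECA would have to confirm for each device.

```python
from enum import Enum
from typing import Optional


class ClampState(Enum):
    OPEN = "open"
    CLOSED = "closed"


# Assumption: for a clamp that holds pipe, CLOSED is treated as the safe state.
FAIL_SAFE = ClampState.CLOSED


def state_after_reset(last_known: Optional[ClampState],
                      load_present: bool) -> ClampState:
    """Choose the clamp state to command after a controller reset.

    last_known is the state persisted before the reset, or None if it was lost.
    """
    if load_present or last_known is None:
        # Never release on reset while loaded, or when the prior state is unknown.
        return FAIL_SAFE
    return last_known


print(state_after_reset(ClampState.OPEN, load_present=True))
```

The contrast with an initialization routine that unconditionally opens all clamps is the point: the post-reset state is decided from the load condition and the persisted state, not from a fixed default.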

Page 14: Rocky Mtn Safety090917

Software you cannot “see”

Complexity is NOT your friend!

Page 15: Rocky Mtn Safety090917

Your IT Network is Safe?

IT contractor indicted for sabotaging offshore rig management system; company had refused to offer him a permanent job, feds say (March 18, 2009):

Mario Azar, 28, of Upland, Calif., was charged with illegally accessing and compromising a computer system used by Pacific Energy Resources Ltd. (PER) to monitor offshore platforms in California and Anchorage and to detect oil leaks. The indictment papers allege that Azar's actions affected the "integrity and availability" of the system and resulted in it becoming temporarily unavailable. Though no oil spill or environmental hazard occurred while the system was compromised, Azar's actions caused thousands of dollars in damage, the indictment said.

Page 16: Rocky Mtn Safety090917

Cyber criminals targeting energy – 15 March 2009

• Based on an analysis of more than 240 billion requests for analysis by corporate users, there was nearly 600% malware growth between like quarters in 2007 and 2008, and a 300% volume ratio increase from January 2008 through December 2008.

• A vertical industry analysis of malware growth found the energy and oil sector to rank in the top five targets in all threat categories. But energy and oil leads the pack by a long shot when it comes to one important category: encounters with unique new variants of data theft Trojans.

• With advances in the technology and sophistication of cyber attacks, malware delivered through the web can be remotely customized and configured once in place, based on the victim’s identity.

Page 17: Rocky Mtn Safety090917

What do the Authorities Say?

How to implement these activities and processes is not prescribed by the recommended practice. The recommended practice is primarily focused on the ‘what to do’.

DNV RP D-201

Recommended Practice for Integrated Control Systems

Page 18: Rocky Mtn Safety090917

How can Software become Safer?

• Awareness of Development Life Cycle

• Software Configuration Management (SCM)

• Failure Mode, Effects, and Criticality Analysis (FMECA)

Page 19: Rocky Mtn Safety090917

Athens Group Deliverables Life Cycle Model

[Figure: life cycle model diagram. Labels recovered from the diagram:]

• Phases: Design, Construction, Acceptance, Operation, with FMECA marked at three points

• System activities: Concept Definition, System Requirements

• Hardware activities: Hardware Requirements, Preliminary Design, Detailed Design, Module Development, Unit Test, Integration and Test

• Software activities: Software Requirements, Preliminary Design, Detailed Design, Coding, Unit Test, Integration and Test

• Later activities: Integrated System Testing, Deployment, Operations, Maintenance

• Deliverables: Contractual Software Standards, Vendor Software Process Assessment, Requirements Validation, Design Verification, Acceptance Planning, Factory Acceptance Testing, Commissioning Planning, Controls & Network Commissioning, Startup Support, Alarm Management, Troubleshooting and Remediation, Vendor Management, Software Change Management

Page 20: Rocky Mtn Safety090917

Software Change Management

Page 21: Rocky Mtn Safety090917

Failure Mode, Effects, and Criticality Analysis (FMECA)

[Figure: FMECA process flow. Steps recovered from the diagram; the sequence is inferred from the extraction:]

• SOW accepted and documents made available

• Iterative questions with customer and vendor(s)

• Block diagrams and initial product/process analysis

• Identification of system and functions

• Identification of failure modes

• Identification of possible causes

• Determination of effects of failure modes

• Participant review of analyzed information

• Documentation of FMECA results

• Implement risk mitigation

Role markers on the flow: Facilitator enters here, Facilitator exits here, SME enters here, SME exits here, Participant enters here, Participant exits here.
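The slides do not say how criticality is scored. One widely used ranking in FMEA-style worksheets is the Risk Priority Number, RPN = severity × occurrence × detection, and the sketch below assumes that scheme purely for illustration. The `FailureMode` class is hypothetical, and the three rows and their scores are invented examples loosely modeled on the incidents in this presentation, not results of any actual analysis.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    function: str
    failure_mode: str
    effect: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (frequent)
    detection: int   # 1 (almost certain to detect) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: an assumed ranking scheme, not from the slides.
        return self.severity * self.occurrence * self.detection


# Illustrative rows only: the scores are invented for this sketch.
worksheet = [
    FailureMode("Pipe elevator clamps", "Clamps open on controller reset",
                "Dropped pipe", severity=10, occurrence=2, detection=8),
    FailureMode("Riser handling tool", "Monitoring message read as unlock",
                "Dropped riser joint", severity=9, occurrence=3, detection=7),
    FailureMode("Engine cooling valve", "Wrong valve tag mapped",
                "Generator trip", severity=5, occurrence=3, detection=4),
]

# Highest RPN first, so mitigation effort goes to the top of the list.
ranked = sorted(worksheet, key=lambda fm: fm.rpn, reverse=True)
for fm in ranked:
    print(f"RPN {fm.rpn:>3}  {fm.function}: {fm.failure_mode} -> {fm.effect}")
```

In practice the scores come out of the facilitated review with SMEs and participants, and high-severity items are usually examined regardless of where their RPN falls.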

Page 22: Rocky Mtn Safety090917

In Conclusion

You can – and MUST – make Software Safer

1. Awareness of Development Life Cycle

2. Software Configuration Management (SCM)

3. Failure Mode, Effects, and Criticality Analysis (FMECA)

Page 23: Rocky Mtn Safety090917

Don Shafer, CSDP

Chief Technology Officer

5608 Parkcrest Drive, Suite 200

Austin, Tx 78731

[email protected]

www.athensgroup.com

512.345.0600 x117

Page 24: Rocky Mtn Safety090917

References

• NORA Symposium 2008: Public Market for Ideas and Partnerships - http://www.cdc.gov/niosh/nora/symp08/posters/035.html

• Fatalities Among Oil and Gas Extraction Workers --- United States, 2003 - 2006 - http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5716a3.htm

• Therac 25 - http://sunnyday.mit.edu/papers/therac.pdf

• Air France 2009 - http://www.computerweekly.com/Articles/2009/06/01/236245/air-france-crash-thought-to-be-caused-by-system-failure.htm and http://www.computerweekly.com/Articles/2009/06/16/236447/air-france-airbus-pitot-sensor-linked-to-two-fatal-crashes.htm

• 8 Software Related Death Incidents - http://www.baselinemag.com/c/a/Projects-Processes/Eight-Fatal-SoftwareRelated-Accidents/

Page 25: Rocky Mtn Safety090917

Speaker Bio

Don Shafer, CSDP, developed Athens Group's oil and gas practice and leads Athens Group engineers in delivering superior rig software services and oil and gas exploration as well as production and pipeline monitoring systems for clients such as BP, Noble, Transocean, Maersk, ExxonMobil, Conoco Phillips and Shell. Prior to co-founding Athens Group, Don led groups developing and marketing hardware and software products for Motorola, AMD and Crystal Semiconductor. He was responsible for managing a $129 million-a-year PC product group that produced the award-winning audio components for Apple. From the development of low-level software drivers in yet-to-be-released Microsoft operating systems to the selection and monitoring of Taiwan semiconductor fabrication facilities, Don has led key product and process efforts.

Don earned a BS degree from the USAF Academy and an MBA from the University of Denver. Treasurer of the IEEE Computer Society Board of Governors, Past Editor-in-Chief of the IEEE Computer Society Press, IEEE Senior Member and software engineering book series author, Shafer is an adjunct professor in the Cockrell School of Engineering at the University of Texas at Austin. An avid writer, Don has contributed to three books, written over 20 published articles, and is co-author of Quality Software Project Management, recently released by Prentice-Hall. He is a contributor to the 2010 edition of the multi-volume Encyclopedia of Software Engineering. His latest patents are in state-based machine control.

Page 26: Rocky Mtn Safety090917

Who We Are

• Founded 1998

• Offices in Houston and Austin, TX

• Pioneered Drilling Technology Assurance℠ (DTA) Services

• 100% Referenceable Customers

– Over 70% have completed more than one project with us

• Committed to supporting innovation and education in the Oil & Gas Industry

Page 27: Rocky Mtn Safety090917

Who We’ve Helped

Page 28: Rocky Mtn Safety090917

What We Deliver