A recovery case study


Computer Fraud & Security Bulletin February 1990

suppressed, and since blackmailing or damaging software can be disguised with labels carrying manufacturers’ names, another approach must be found to avoid damage. No new software should be used until it has been thoroughly tested on a machine whose contents can be deleted if the software turns out to be malicious. Any manipulation detection software must check the whole hard disk, including the boot sectors. The alteration of even a single bit must be detectable, and the files in which changes took place should be identified. After new software has been released to users, a regular control procedure must provide for the detection of unexplained alterations which may occur later (because a trigger technique might leave a damaging routine dormant for a while).
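The detection requirements in the preceding paragraph (check every file, notice a single altered bit, identify the files that changed, and repeat the check after release) can be illustrated with a modern sketch. It is not the authors' method: it assumes Python 3 and uses SHA-256 digests from the standard `hashlib` module, for which any single-bit change yields a different digest for all practical purposes. The names `file_digest`, `snapshot` and `compare` are hypothetical, and boot sectors are not covered, because reading them needs raw disk access beyond the scope of a short example.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: str) -> Dict[str, str]:
    """Record a digest for every file under root (relative path -> digest)."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare(baseline: Dict[str, str],
            current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (changed, added, removed) file lists, each sorted."""
    changed = sorted(p for p in baseline
                     if p in current and current[p] != baseline[p])
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    return changed, added, removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "program.com")
        with open(path, "wb") as f:
            f.write(b"\x00\x01\x02\x03")
        baseline = snapshot(root)  # taken once the software has passed testing

        # Simulate a dormant routine later altering a single bit (0x02 -> 0x03).
        with open(path, "r+b") as f:
            f.seek(2)
            f.write(b"\x03")

        print(compare(baseline, snapshot(root)))  # (['program.com'], [], [])
```

The baseline would have to be stored where the suspect software cannot alter it, for instance on write-protected media; otherwise a malicious routine could simply rewrite the recorded digests along with the files.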

Such an approach means that a user service has to do the test work, and that users and managers must be informed about the risks they face and the control steps they have to perform. It is the auditors’ duty to determine whether these internal controls work. This is the only way to keep damaging software out of microcomputers in regular use.

Hans Gliss and Ralf Herweg

MANAGEMENT AND COMPUTER DISASTERS

A RECOVERY CASE STUDY

Computer disasters are commonly regarded either as unlikely to occur or as events which happen to other people. Because of this feeling, and because of the technical nature of computers, management tend to leave computer disaster planning, and the handling of actual disasters, to the Computer Department rather than becoming actively involved in this area.

This approach can lead to many difficulties should disaster strike, given the extensive use organizations now make of all types of computers and how much they have come to depend on them over the years. Many companies would quickly be brought to a standstill by a prolonged computer breakdown, yet do not realize how seriously such a disaster could affect their staff and their businesses.

The Financial Times suffered a disaster in its accounting computer area in the summer of 1988 and, although we were fortunate to achieve ultimate recovery without too serious consequences, there were many lessons to be learnt from our experience.

At the time of the disaster we were in the process of upgrading our computers from DEC PDP 11/70 machines to MicroVAX IIs, and our equipment at the time of the incident was:

• A VAX 11/75 used for circulation and advertisement billing, sales ledgers and word processing. Fortunately, this machine was unaffected throughout the incident.

• A MicroVAX II used for management accounting.

• A PDP 11/70 for purchase/nominal ledgers, payrolls, newsletter subscriptions and word processing.

• A second MicroVAX II, just delivered to replace the PDP 11/70. This was no help during our disaster, as it had not been installed or tested at the time.

We had not long before sold a second PDP 11/70 to make way for the new MicroVAX II, and so had no internal cover for our remaining PDP 11/70 when the problems with this machine began.

Computer disasters do not always happen with a ‘big bang’, and so are not always readily apparent. This problem started with a routine hardware failure and progressed with a creeping effect, owing to intermittent failures over a three-week period. Engineers repaired the equipment, which then worked for a day or two before breaking down again.

10 ©1990 Elsevier Science Publishers Ltd

Throughout this period the work of the Accounting Department gradually fell further and further behind. However, it was always felt that the backlog would soon be caught up, as both the computer staff and the engineers continued to forecast optimistically that the fault would be cured “tomorrow”.

What the computer specialists did not appreciate was the effect that this ‘stop-go’ process was having on the staff as well as on workloads. Once the computer went down, 20 to 30 data input staff and clerks could no longer perform their jobs, as they were totally reliant on the computer for their everyday tasks. There was often no alternative but to send these staff home since, if left in the office with nothing to do, they had an adverse effect on the productivity of the rest of the staff. Although they were called back to work overtime to catch up when the machine was available, this was not sustainable for any length of time. Many of the staff concerned had other commitments and preferred not to work in the evenings or at weekends.

During this period the users felt frustrated and impotent. They could not influence events and did not know what was wrong, yet their lives were being disrupted. It was essential during this time that management remained visible, ‘walked the floor’ regularly and explained what efforts were being made to put matters right.

Management of both the accounting and computer functions met regularly during this period to review the position and eventually, after three weeks, decided to put our disaster recovery plan into operation. This involved the use of a hot-site at a computer bureau in central London.

The inadequacies of the plan showed up very quickly. All testing at the hot-site had been carried out by computer staff, so users had never previously visited the bureau. Consequently, faced with unfamiliar equipment and surroundings, they were unable to achieve anything like normal levels of productivity. Some of our programs had to be adapted before they would run on the bureau equipment, again slowing work down. Finally, the capacity available at the hot-site was insufficient, because our requirements had increased significantly since the recovery plan was devised.

Therefore, although the move to the hot-site improved morale, as staff were working again and payments to suppliers had resumed, we were still falling further and further behind schedule. The deadlines for the monthly payrolls were fast approaching. In addition, after many months of negotiation, we had obtained the agreement of our printing workforce to move from weekly to monthly pay, and this was to be the month of the first payment. There was now a real possibility of missing these deadlines, and this problem was to dominate management’s thinking over the coming days.

The turning point came one Monday morning when I arrived at the office to find messages from the Computer Department management saying that they had worked all night without success and had gone home exhausted. This forced me to take personal charge of the situation and to review the risks and alternatives facing us. During a hectic day, I drew up an agenda for a management meeting, setting out the issues and outlining a number of possible courses of action. As a result of this meeting, a number of important decisions were taken:

• Get more processing capacity, particularly for payroll.

• Contact our auditors’ consultancy arm.

• Schedule all critical deadlines.

• Explain the position and costs to date to our Chief Executive.


We were quickly able to move the payroll to another bureau, where it stayed until all deadlines had been met and the PDP 11/70 had been working again for some time. In addition, our newsletter subscription processing was put out to a third site.

Our auditors were able to provide people with experience of similar situations to assist us. These people also gave the hard-pressed local management extra resources and, most importantly, were able to give a detached view. Management had become so involved in the day-to-day problems that it was unable to look more than a few days ahead.

Although it was another month before the problems were completely solved and the work returned to our own premises, we were now beginning to get the situation under control and to keep to the revised working schedules. However, managing three outside sites and our own premises, whilst constantly having to take decisions and review priorities on working schedules, in addition to running the rest of the accounting function, placed a very heavy burden on management. The amount of management time involved was immense, and four senior managers within the accounting area worked on little else for at least a month.

We were fortunate to be able to prevent a total disaster, since non-payment of our suppliers for a long period, or failure to pay our employees, would have been tremendously damaging to our business and reputation. However, a number of benefits came out of the disaster, and there were lessons to be learnt from the experience.

The benefits include:

• Better co-operation between the users and the computer staff. Previously there had been something of a ‘them and us’ situation. Both groups had to work together during the disaster and gained a better understanding of each other’s problems.

• A more unified management team. The shared experience taught us to put aside individual differences in order to achieve the immediate and urgent goals.

• Increased awareness by management throughout the organization of the importance of computers and of how much the operations of the business depended upon them.

• Increased support for investing time and money in the development of a more effective disaster recovery plan.

The lessons for management include:

• The need to assess the risks to the business of a failure of your computers, to identify the critical areas and to ascertain how long each section could survive without its computers. Only then will you be able to decide what type and scale of disaster recovery plan is required. This is a task for senior general management and should not be left to the computer specialists, whose task should be to implement and test the plan agreed upon.

• The disaster recovery plan should be reviewed on a regular basis, as changes in levels of business or in the risks involved can soon render the plan out of date.

• Test the disaster recovery plan on a regular basis and involve the users in the tests as well as the specialist computer staff.

• Senior management should check that both the reviews and the tests of the plan are being performed regularly.

• Take outside advice when drawing up the plan, as someone with a detached view will be able to suggest things which would not occur to internal staff.

• Try to take actions which remove or minimize the biggest risk areas. For example, payroll was our number one problem. We have since put this out to a large payroll bureau to guarantee processing, because they have a level of back-up we could never justify.

• Once a disaster is under way, escalate the problems early. There is a natural reluctance at all levels to admit that the situation cannot be handled and to bring in the next level. This applies equally to outside computer engineers.

• Do not underestimate the problems of managing disasters, particularly when a number of sites are involved.

• Always assume the worst and do not believe the over-optimistic predictions of computer engineers. Because we set worst-case targets during our disaster, our schedules were not unduly affected by events such as equipment failure at one of the hot-sites.

• Above all, be decisive and do not panic.
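The first lesson above (identify the critical areas, establish how long each section could survive without its computer, and only then size the plan) amounts to a simple comparison. The sketch below is not from the article: it assumes Python 3, the names `BusinessFunction` and `assess` are hypothetical, and all figures are invented purely to show how a function whose survivable outage is shorter than the plan's recovery time stands out, as payroll did in the account above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BusinessFunction:
    name: str
    max_outage_days: float     # how long the section can survive without its computer
    plan_recovery_days: float  # how long the current plan takes to restore its processing


def assess(functions: List[BusinessFunction]) -> List[Tuple[str, float]]:
    """Rank functions, most critical (shortest survivable outage) first.

    Returns (name, margin) pairs; a negative margin means the plan would
    restore processing only after the section can no longer cope.
    """
    ranked = sorted(functions, key=lambda fn: fn.max_outage_days)
    return [(fn.name, fn.max_outage_days - fn.plan_recovery_days) for fn in ranked]


if __name__ == "__main__":
    # Hypothetical figures for illustration only; not taken from the article.
    functions = [
        BusinessFunction("Purchase ledger", 10, 5),
        BusinessFunction("Payroll", 3, 5),
        BusinessFunction("Newsletter subscriptions", 7, 5),
    ]
    for name, margin in assess(functions):
        status = "OK" if margin >= 0 else f"shortfall of {-margin} days"
        print(f"{name}: {status}")
    # Prints:
    # Payroll: shortfall of 2 days
    # Newsletter subscriptions: OK
    # Purchase ledger: OK
```

Sorting by survivable outage puts the critical areas first, and a negative margin points to where the plan itself, rather than the hardware, is the weak point; in practice the figures would come from the senior management review the lesson calls for.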

David J. Hail Financial Times

THE PEOPLE PROBLEM

THE PERSONNEL SIDE OF COMPUTER SECURITY

Almost the first aspect I look at, in conducting a computer security survey, is people. I do this even if it has never occurred to my client to ask me to examine the area.

Quite simply, if personnel management is bad, the company is at considerable risk, no matter how much money has been spent on computer security facilities or how carefully they have been implemented. Conversely, a company can get away with a surprising amount of security laxness if its staff are fundamentally committed to their employer.

The aspects of computer security that attract attention are all to do with technological ingenuity. In the popular imagination there is a war, akin to the arms race between nations, between ever-guileful perpetrators and equally ingenious designers of security products. Computer security is too often seen in terms of “here is a terrible new threat... and here is a product which will eliminate or (more modestly) reduce it”.

The computer crime case material (we now have some 20 years of it) tells a quite different story. Not only can instances of most of the classic computer crime modus operandi be found as early as 1974, but anyone who reads the detailed accounts in quantity cannot help being struck by a series of common factors:

• the most common kind of computer criminal is an employee or contractor of the victim;

• overwhelmingly, computer crime takes place in the gap between what the computer says is going on and what is really happening in the commercial environment the computer is serving;

• technically sophisticated computer crimes are rare; the skill the computer criminal requires is the ability to spot and exploit an opportunity, typically to fool the computer in the most trivial of ways;

• the discovery of computer crimes by technical means is also rare;

• computer criminals turn out to be remarkably lacking in sophisticated computer skills.

It is the case material which forces the diligent risk analyst/surveyor to place personnel considerations near the top of the survey agenda. Organizations are nearly always much more in the hands of their employees than they realize.

However good the inherent security facilities of a computer system, someone has to have access to it at the most fundamental
