Network Management Brian Bramer Department of Computing Sciences DeMontfort University Leicester UK


Network Management

Brian Bramer

Department of Computing Sciences

DeMontfort University

Leicester UK

1.1 Why is network management needed?

1. Networks encapsulate a corporate asset

The computer system hardware and software, and the information stored, can form a large percentage of the company's assets.

This is in terms of the capital cost of the hardware and software and the less easily quantified value of the information stored, eg lists of customers and contacts, sales data and forecasts, designs of products (circuit diagrams, plans, etc).

Organisations traditionally (over)manage tangible (hardware) assets

The company's hardware assets appear on the inventory and the shareholders would expect these to be managed and operated efficiently. This could range from a few staff required to manage a PC network to 100+ for a large mainframe system.

The data represents a crucial asset:

(a) integrity and security: data must be protected from unauthorised access, theft, and even deliberate corruption. The sale of a company's data to a competitor could cause its failure and bankruptcy.

(b) accessibility: authorised people require access to information when and where they require it.

The traditional mainframe environment was physically centralised, simplifying management and security.

The move to distributed network systems has distributed the problems but not the responsibility (staff have to manage systems at remote sites, often in other countries).

1.2 What is needed?

1. Operational control for operational decisions

The day-to-day operation of the system in terms of fault finding and reporting (hardware and software), backing up file systems, mounting new software, etc.

2. Administrative control for tactical decisions

In a competitive environment the company must keep up to date in terms of its operation and products.

Computer systems play an important role in this, eg what new technology will become the industry standard over the next few years?

3. Performance analysis for tactical & strategic decisions

To enable future planning, detailed performance analysis of the current system is required, eg has the current system any severe problems, what will happen to the overall system if a particular area is expanded or enhanced (eg will a server or network segment become overloaded), etc.


1.3 What is required to support this?

Provision of adequate information

Raw data on the performance of the system in terms of usage, data flows, operational costs, etc.

Tools to support analysis of this information

Masses of raw data are no use to management; they need easy-to-read tables, diagrams, graphs, pie charts, etc.

Procedures to implement resultant decisions

The current system is overloaded or expansion is planned; who is responsible for what?

2 Design Criteria for Networks

Any network design will be subject to constraints and required levels of performance.

These will normally fall into the following categories:

Cost - Availability and Reliability - Throughput - Response - Security

Any design will be a trade-off between these criteria, with the particular applications which the network must support deciding their relative priorities.

2.1 Cost

All networks are expensive, both in terms of equipment (hardware and software) and personnel, but some are enormously expensive.

The first category of cost is for the network hardware.

This is typically a small proportion of the overall cost but will be high relative to the cost of the processing devices which it connects, eg a small office network to connect 16 PCs @ around £500 to £1500 each will require an interface board in each PC (@ around £50 to £150) and a dedicated file server (@ around £3000 to £5000).
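
The arithmetic behind the small-office example can be checked with a short sketch. The prices are the ones quoted above; the helper names are illustrative only, and counting the file server as network hardware follows the example rather than any fixed rule.

```python
# Worked arithmetic for the small-office example (prices taken from the text).
PCS = 16
PC_COST = (500, 1500)         # pounds per PC: (low, high) estimate
INTERFACE_COST = (50, 150)    # pounds per interface board
SERVER_COST = (3000, 5000)    # pounds for the dedicated file server


def cost_range(unit_cost, count=1):
    """Scale a (low, high) unit cost by a count."""
    low, high = unit_cost
    return (low * count, high * count)


def add_ranges(a, b):
    """Add two (low, high) cost ranges."""
    return (a[0] + b[0], a[1] + b[1])


pcs_total = cost_range(PC_COST, PCS)
network_total = add_ranges(cost_range(INTERFACE_COST, PCS), SERVER_COST)

print("PCs:     £%d to £%d" % pcs_total)
print("Network: £%d to £%d" % network_total)
```

On these figures the PCs come to £8000 to £24000 and the network hardware to £3800 to £7400, before any cabling, installation or staff costs are counted.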

The second cost category is for the communication links.

Leased telephone lines are obviously expensive and their costs have not fallen in the way that hardware costs have.

With local area networks the cost of the cabling itself is not high but the costs of physically installing it are.

Alternative: wireless networks - but what about security?

A final category of cost which many organisations forget is that for support staff.

A network does not run itself but needs to be managed - support staff are needed to maintain the configuration of the network, monitor performance, track faults and so forth.

2.2 Availability and Reliability

Availability measures the time that services on the network are accessible to a user as a percentage of the time they should be available.
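
As a minimal sketch of this definition, availability can be computed from the scheduled service time and the downtime recorded within it. The function name and the example figures are assumptions for illustration.

```python
def availability_percent(scheduled_hours, downtime_hours):
    """Accessible time as a percentage of the time the service should be available."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= downtime_hours <= scheduled_hours:
        raise ValueError("downtime_hours must lie between 0 and scheduled_hours")
    return 100.0 * (scheduled_hours - downtime_hours) / scheduled_hours


# Hypothetical week: service scheduled for 50 hours, down for 1 hour.
print(availability_percent(50, 1))
```

Note that only downtime inside the scheduled hours counts; an outage during planned maintenance overnight does not reduce availability under this definition.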

 

Reliability measures how far the network preserves the integrity of network data in the light of faults.

Both can always be improved (although 100% can never be guaranteed) at greater cost.

This is achieved by increasing the redundancy in the network, eg by adding duplicate comms links to provide alternate routes, duplicating equipment, etc.
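
The effect of duplicate links can be sketched with a simple probability model. It assumes (an idealisation not stated in the slides) that links fail independently and that the service is up whenever at least one link is up.

```python
def parallel_availability(link_availability, links):
    """Availability of `links` redundant links, assuming independent failures.

    The service is down only when every link is down at the same time.
    """
    if not 0.0 <= link_availability <= 1.0:
        raise ValueError("link_availability must be between 0 and 1")
    if links < 1:
        raise ValueError("links must be at least 1")
    return 1.0 - (1.0 - link_availability) ** links


for n in (1, 2, 3):
    print(n, "link(s):", parallel_availability(0.99, n))
```

Each extra link reduces the remaining unavailability by a constant factor, so it never reaches zero, which is consistent with the point that 100% cannot be guaranteed. Real failures are often correlated (eg two cables in the same duct), so the true gain is usually smaller.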

Redundancy is also used in communication protocols to provide checks on the integrity of data.
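
One common form of protocol redundancy is a checksum appended to each message. The sketch below uses CRC-32 from Python's standard `zlib` module; the 4-byte trailer layout and the function names are assumptions for illustration, not any particular protocol's frame format.

```python
import zlib


def add_check(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (the redundant data)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(frame: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


frame = add_check(b"sales data")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in transit
print(verify(frame), verify(damaged))
```

The four extra bytes carry no application data, which is why such checks count as overhead when throughput is measured.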

2.3 Throughput

Throughput measures the capacity of the network for sustained transfer of USEFUL data; this measure is often crucial to the success of transaction processing systems such as the Stock Exchange share dealing system.

It is not the same as the network data rate; an Ethernet local area network may have a raw transmission rate of 10 Mbps but the 'application' data which it can carry in a second will be very much less.

Many network protocols add a significant overhead to the data (the redundancy mentioned above) and may operate on a basis which requires the transmission of control messages, retransmission of messages damaged by errors, etc.

A number of network protocols are very badly behaved when load level increases, eg they may stop transmitting ANY useful data above a certain load.
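
The gap between raw data rate and useful throughput can be illustrated by counting header bytes alone. This sketch assumes application data carried over TCP/IP (without options) in standard Ethernet frames; that protocol choice is an assumption, not something the slides specify. It ignores control messages, collisions and retransmissions, all of which reduce throughput further.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet framing).
PREAMBLE_AND_SFD = 8
ETHERNET_HEADER = 14
FRAME_CHECK = 4
INTERFRAME_GAP = 12
MIN_PAYLOAD = 46          # shorter payloads are padded up to this size
MAX_PAYLOAD = 1500
IP_HEADER = 20            # assumed: IPv4 without options
TCP_HEADER = 20           # assumed: TCP without options


def useful_fraction(app_bytes):
    """Fraction of transmitted bytes that are application data in one frame."""
    if not 1 <= app_bytes <= MAX_PAYLOAD - IP_HEADER - TCP_HEADER:
        raise ValueError("app_bytes does not fit in a single frame")
    payload = max(app_bytes + IP_HEADER + TCP_HEADER, MIN_PAYLOAD)
    on_wire = (PREAMBLE_AND_SFD + ETHERNET_HEADER + payload
               + FRAME_CHECK + INTERFRAME_GAP)
    return app_bytes / on_wire


RAW_RATE_MBPS = 10
for size in (1460, 100, 1):
    print(size, "bytes:", round(RAW_RATE_MBPS * useful_fraction(size), 2), "Mbps at best")
```

Full-size frames lose only about 5% to framing, but an application that sends one character per frame (eg terminal traffic) uses barely 1% of the raw rate for useful data.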

2.4 Response time

Response time is the criterion which is the biggest bugbear for most network designers since, after availability, it is usually the aspect to which users are most sensitive.

Response time problems are caused by queuing for network facilities - processor, line capacity or (most often) file access - and increase with load.

The problem can be eased at a cost by increasing the speed of the scarcest resources but it is usually very difficult to predict exactly how much capacity will be required.

The mathematical techniques which exist require a large number of idealised assumptions about the load; these idealised conditions are unlikely to be met in practice.
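
A minimal sketch of one such idealised technique is the M/M/1 queue, chosen here purely as an illustration. It assumes a single server, requests arriving at random (Poisson arrivals) and exponentially distributed service times, in which case the mean response time is 1 / (service rate - arrival rate).

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system (queuing plus service) for an M/M/1 queue.

    Rates are in requests per second. Returns infinity when the load
    reaches or exceeds capacity, because the queue grows without bound.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service_rate positive")
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical file server able to handle 100 requests per second.
for load in (50, 90, 99):
    print(load, "req/s ->", mm1_response_time(load, 100), "s")
```

Even under these idealised assumptions the response time rises sharply as the load approaches capacity, which is why a modest growth in traffic can produce a disproportionate number of user complaints.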

2.5 Security

Security is concerned with preventing unauthorised access to the network, eg by unauthorised users trying to gain access or by passive eavesdropping on the traffic flowing through the network.

Protection against the first is usually implemented by means of physical (badges, toggles) or logical (password) keys.

Protection against the second can be achieved by encryption of all data transmitted around the network and/or shielding the transmission medium.

Data access is normally controlled via a hierarchical series of permissions for the different file operations.
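
One way to read "hierarchical" is that permission levels are ordered, with each level implying those below it. The sketch below takes that reading; the level names, the users and the file name are all hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Ordered permission levels: each level implies all lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 3


# Hypothetical grants: (user, file) -> highest level permitted.
GRANTS = {
    ("alice", "sales.dat"): Access.WRITE,
    ("bob", "sales.dat"): Access.READ,
}


def can(user, filename, operation):
    """True if the user's granted level covers the requested operation."""
    granted = GRANTS.get((user, filename), Access.NONE)
    return granted >= operation


print(can("alice", "sales.dat", Access.READ))    # WRITE implies READ
print(can("bob", "sales.dat", Access.WRITE))     # READ does not imply WRITE
```

Defaulting to `Access.NONE` for unknown users means access has to be granted explicitly rather than denied explicitly.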


3 Network Management Elements

3.1 Configuration Management

1    What have we got where (hardware, software, information, users)?

2    What is it used for?  Who has access to it?

3    What is its status? Operational, being upgraded, faulty, etc.


3.2 Fault Management

1    Detection of fault conditions

2    Diagnosis of problem

3    Recovery of service

4    Progressing fault clearance


3.4 Performance Analysis

1 Monitoring service levels

2 Identifying potential faults

3    Forward capacity planning

Ideally all the above form part of an integrated package.


4 Configuration Management

4.1 Why is it needed?

1. Fault detection and identification

How are faults detected?

Who checks reported faults?

What is wrong? 

2 Reconfiguration for fault isolation & recovery

In a commercial organisation a fault could cause serious disruption and loss of revenue.

Is it possible to reconfigure the system while the fault is being repaired, eg mirrored disks and/or servers, alternative network routes, etc?

3 Facilitating change

The system will need to change due to changes in technology, new software and in response to changes in organisational requirements, eg:

personnel changes - training, backup and replacement of staff

upgrades to hardware or software

system expansion due to new organisational requirements

How is this managed without causing chaos and staff resentment?

4 Supplier performance analysis

How are the hardware and software performing, eg are particular makes of disk giving problems?

What is the quality of the maintenance, eg response time, quality of components, capability of supplier's staff, does the fault reappear, etc?

4.2 What is involved?

A formalised knowledge of all components, their location and status:

(a) inventory

type of equipment and function

identification, eg network address, serial number, etc.

supplier information, eg who to contact when faulty

(b) topography

physical cable paths

bridges, gateways, routers

hubs, network connections and taps

cable type and capacity, eg bandwidth

(c) status

current and historical status, eg processor type, RAM and disk size, etc.

upgradability, eg free memory and bus slots, physical capacity, etc.
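
The inventory, location and status information listed above could be held in a record along the following lines. This is only a sketch: the field names, status values and sample data are hypothetical, and a real configuration database would hold far more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """One entry in a (hypothetical) configuration inventory."""
    name: str
    equipment_type: str        # type of equipment and function
    network_address: str       # identification
    serial_number: str
    supplier_contact: str      # who to contact when faulty
    location: str              # where it sits in the topography
    status: str = "operational"
    status_history: List[str] = field(default_factory=list)

    def set_status(self, new_status: str) -> None:
        """Record the previous status before changing it."""
        self.status_history.append(self.status)
        self.status = new_status


def find_by_status(inventory, status):
    """Answer 'what is its status?' across the whole inventory."""
    return [c.name for c in inventory if c.status == status]


inventory = [
    Component("fs1", "file server", "10.0.0.2", "SN-001", "Supplier A helpdesk", "Room 1.12"),
    Component("br1", "bridge", "10.0.0.3", "SN-002", "Supplier B helpdesk", "Riser 2"),
]
inventory[1].set_status("faulty")
print(find_by_status(inventory, "faulty"))
```

Keeping the status history alongside the current status is what later allows fault records and supplier performance to be analysed per component.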

5 Fault Management

5.1 Why is it needed?

To support rapid response to problems in order to maximise:

1 user satisfaction in the face of real or perceived faults

2 company productivity


5.2 What is involved?

1 Fault detection (signal alarm)

(a) faults reported by users, eg "crash" or degradation of service level

(b) a large network needs continuous monitoring, ideally with automated probes:

programs which monitor network activity and error levels, server workload, disk errors, etc.

this could be done from a remote central site, typically by raising an alarm signal on a network map


2 Problem diagnosis (accept alarm)

(a) check it is a real fault

(b) localise problem and isolate from network

(c) classify and prioritise, eg is a server down?

(d) identify responsibility for repair, eg on-site, supplier, maintenance company, etc.

 

3 Problem recovery (clear alarm)

reconfigure network if possible, eg using mirrors, moving work to other systems, etc.

(i) automatic rerouting

(ii) hot spare: switched in on line, eg mirror disks and servers

(iii) cold spare: install replacement

implies duplication of facilities - expensive

applies to both hardware & software backup


4 Progressing faults: typically based on trouble ticket

(a) identifies component & symptom

(b) raised when fault detected

(c) status of open tickets reported periodically

(d) status updated periodically

(e) identify cause and correction

(f) closed when problem corrected
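
The trouble ticket life cycle in points (a) to (f) can be sketched as a small record with a few operations. The class, field and status names are hypothetical; real ticketing systems differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TroubleTicket:
    """Raised when a fault is detected; identifies component and symptom."""
    component: str
    symptom: str
    status: str = "open"
    updates: List[str] = field(default_factory=list)
    cause: Optional[str] = None
    correction: Optional[str] = None

    def update(self, note: str) -> None:
        """Periodic status update while the fault is being progressed."""
        if self.status == "closed":
            raise ValueError("cannot update a closed ticket")
        self.updates.append(note)

    def close(self, cause: str, correction: str) -> None:
        """Record cause and correction, then close the ticket."""
        self.cause = cause
        self.correction = correction
        self.status = "closed"


def open_tickets(tickets):
    """Periodic report of tickets that are still open."""
    return [t for t in tickets if t.status == "open"]


tickets = [
    TroubleTicket("fs1 disk 2", "high transfer error rate"),
    TroubleTicket("segment B", "users report slow response"),
]
tickets[0].update("supplier engineer booked")
tickets[0].close(cause="failing drive", correction="drive replaced")
print([t.component for t in open_tickets(tickets)])
```

Closed tickets are kept rather than deleted so that they build up the fault history of each component.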

Clearly one needs to maintain a record of the fault history of the various system components, eg have you a 'Friday afternoon' component with a poor history?

5.3 What tools are needed?

Fault management is very expensive in personnel

Maximise automated support - equipment monitors level of faults, etc. and reports when a threshold is exceeded.

Most networks provide data collection tools & probes to support this
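
Threshold-based reporting of this kind can be sketched in a few lines. The metric names and limits below are made up for illustration; in practice the readings would come from the data collection tools and probes.

```python
def check_thresholds(readings, thresholds):
    """Return the names of metrics whose reading exceeds its threshold."""
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    )


# Hypothetical limits and one set of readings from a probe.
thresholds = {"disk_errors_per_hour": 10, "crc_errors_per_hour": 5, "server_load": 0.9}
readings = {"disk_errors_per_hour": 12, "crc_errors_per_hour": 3, "server_load": 0.4}

for metric in check_thresholds(readings, thresholds):
    print("ALARM:", metric, "=", readings[metric])
```

Only the exceptions are reported, so staff see alarms rather than a continuous stream of raw readings.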

Most equipment supports loopback tests - these enable testing of individual pieces of equipment if one is not sure what is wrong.

There are a number of third party suppliers for WANs, PC LANs, etc. - this gives a wide choice of suppliers of services and equipment.

6 Security & Accounting Management

6.1 Why is it needed?

The company's data is a valuable asset and it is important to:

Minimise risk of inadvertent or malicious damage, eg by incompetent operators, disgruntled staff, etc.

Prevent theft or unauthorised disclosure

Minimise or prevent loss of information when a fault occurs, eg a disk crash

Adequately account for consumption of resources: computer systems are expensive to purchase and operate - they must be properly costed and paid for by users.

6.2 What is involved?

Physical access controls on network, eg key operated doors to access terminals, locks on keyboards, etc.

Logical access controls on network & data paths, eg passwords at various security levels within the system.

Filters on the import of files, eg users not allowed to import files from floppy disk or via the internet.

Audit, ie keeping track of what users are doing

Systematic backup of storage

ie continuous/daily/weekly backup either locally to magnetic tape or to remote sites, care of backups (in fireproof safes, copies in other buildings, etc.) - also use of mirrored disks/servers, etc.


Bookkeeping for accounting management

1 storage used

2 peripheral use, eg printers, plotters, etc.

3 cpu usage

4 communications transmission capacity usage
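
The four bookkeeping items above can be combined into a simple charge per user or department. The rates are invented for illustration; choosing rates that users regard as fair is a management decision rather than a technical one.

```python
# Hypothetical charging rates in pounds per unit of each resource.
RATES = {
    "storage_mb": 0.02,         # storage used
    "pages_printed": 0.05,      # peripheral use
    "cpu_seconds": 0.001,       # CPU usage
    "mb_transferred": 0.01,     # communications capacity used
}


def charge(usage, rates=RATES):
    """Total charge for a usage record; an unknown item raises KeyError."""
    return sum(rates[item] * amount for item, amount in usage.items())


sales_dept = {
    "storage_mb": 500,
    "pages_printed": 200,
    "cpu_seconds": 3600,
    "mb_transferred": 1000,
}
print("Sales department: £%.2f" % charge(sales_dept))
```

Raising an error for an unrecognised item is deliberate: silently ignoring it would let some resource use go uncharged.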

6.3 What tools are needed?

Variety of hardware and software access controls, eg badges, passwords etc.

Virus checkers and disinfectants

Automatic spy monitors to check usage, ie programs that monitor what is being run or accessed and by whom.

Usage loggers - easy to do at a node level - remarkably difficult to allocate fairly


7 Performance Analysis

7.1 Why is it needed?

Optimise (not maximise) resource utilization

Proactive fault analysis, ie find a fault before it becomes catastrophic

Support forward capacity planning

7.2 What does it involve?

1    Monitoring objective system performance criteria

throughput, ie number of programs running, amount of data transferred, etc.

response, eg typical response to a query on a database

availability, ie how much system down time?


2    Identifying potential problem areas:

high error incidence on a cable segment, eg possibly damaged cable, bad connector.

abnormal level of retries on a connection

abnormal level of transfer errors from a disk  
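
Deciding what counts as "abnormal" needs a baseline. One simple approach, sketched here as an assumption rather than a recommended method, is to flag a reading that lies more than a chosen number of standard deviations above the historical mean.

```python
from statistics import mean, stdev


def is_abnormal(history, current, k=3.0):
    """True if `current` exceeds the historical mean by more than k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    return current > mean(history) + k * stdev(history)


# Hypothetical retries per hour on one connection over the past week.
retries = [4, 5, 6, 5, 4, 6, 5]
print(is_abnormal(retries, 7))     # within normal variation
print(is_abnormal(retries, 20))    # worth investigating
```

A fixed multiple of the standard deviation is crude, but it adapts to each connection's own normal level, which a single network-wide threshold would not.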

3    Identifying bottlenecks

low disk space, ie need more disk space or move data to another server

excessive collisions on an Ethernet segment, eg need another bridge, move some terminals to another segment, etc.

high reject level from a bridge


7.3 What tools are needed?

Monitors similar to those for fault management.