itft-File design

File Design

FILE DESIGN

Information systems in business are file and database oriented.

Data are accumulated into files that are processed or maintained by the system.

The systems analyst is responsible for designing files, determining their contents and selecting a method for organising the data.

File Components

• Data Item

Individual elements of data are called data items, also known as fields or simply items. For example, a bank cheque consists of the following data items: cheque number, date, payee, numeric amount, script amount, note, bank identification, account number, and signature.

• Record

The complete set of related data pertaining to an entry, such as a bank cheque, is a record, treated as a single unit. The bank cheque is therefore a record consisting of seven separate fields related to the payment transaction. Each field has a defined length and type (alphabetic, alphanumeric, or numeric).

File Components (Example)

RECORD NAME   DATA ITEM NAME      TYPE   LENGTH
Bank cheque   Cheque Originator   C      90
              Cheque Number       N      6
              Date                       8
              Payee               C      24
              Amount              N      8,2
              Bank Number         N      9
              Account Number      N      8
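The record layout above can be sketched in Python as a list of field widths. This is only an illustration: the field names are hypothetical, every field is assumed to be stored as fixed-width text, and "8,2" is assumed to mean eight digits of which the last two are decimals.

```python
# Field widths copied from the example layout. Storing every field as
# fixed-width text is an assumption made for this sketch.
LAYOUT = [
    ("originator", 90),
    ("cheque_number", 6),
    ("date", 8),
    ("payee", 24),
    ("amount", 8),          # N 8,2: assumed 8 digits, the last 2 being decimals
    ("bank_number", 9),
    ("account_number", 8),
]
RECORD_LENGTH = sum(width for _, width in LAYOUT)


def pack_record(values: dict) -> str:
    """Pad (or truncate) each field to its width and join them into one record."""
    return "".join(str(values[name]).ljust(width)[:width] for name, width in LAYOUT)


def unpack_record(record: str) -> dict:
    """Slice a fixed-length record back into its named fields."""
    fields, start = {}, 0
    for name, width in LAYOUT:
        fields[name] = record[start:start + width].rstrip()
        start += width
    return fields
```

Because each field has a defined length, every packed record has the same size, and a field can be found by position alone.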

Fixed and Variable Length Records

Fixed-length records

When the number and size of data items in a record are constant for every record, the record is called a fixed-length record. The advantage of fixed-length records is that they are always of the same size. Thus, the system does not have to determine how long the record is or where it stops and the next one begins, which saves processing time.

Variable-length records

Variable-length records are less common in most business applications than fixed-length designs because the latter are easier to manage and meet most application needs. Record size may vary because the individual data items vary in length (each record can have a different number of bytes) or because the number of data items in a record changes from one occurrence to another.
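The difference in processing effort can be illustrated with a small sketch. The record size and the 2-byte length prefix used for variable-length records are assumptions chosen for this example, not something the slides prescribe.

```python
import io

RECORD_LENGTH = 16  # hypothetical fixed record size in bytes


def read_fixed_record(f, n: int) -> bytes:
    """Return the n-th record (0-based) of a file of fixed-length records."""
    f.seek(n * RECORD_LENGTH)        # the start position is simple arithmetic
    return f.read(RECORD_LENGTH)


def read_variable_records(f):
    """Yield variable-length records, each preceded by a 2-byte length prefix."""
    while True:
        prefix = f.read(2)
        if len(prefix) < 2:          # end of file
            return
        length = int.from_bytes(prefix, "big")
        yield f.read(length)         # the length must be read before the record
```

With fixed-length records the boundaries are known in advance; with variable-length records the program has to discover each record's length before it can find the next one.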

Record Key

• To distinguish one specific record from another, systems analysts select one data item in the record that is likely to be unique in all records of a file and use it for identification purposes.

• This item, called the record key, key attribute, or simply key, is already part of the record, not additional data added to it just for the purpose of identification.

• Common examples of record keys are the part number in an inventory record, the chart number in a patient medical record, the student number in a university record, or the serial number of a manufactured product. Each of these record keys has various other uses in the organisation or business setting, although their function is

Entity

• An entity is any person, place, thing, or event of interest to the organisation and about which data are captured, stored, or processed. Patients and tests are entities of interest in hospitals, while banking entities include customers and cheques.

File and Database

File

A file is a collection of related records. Each record in a file is included because it pertains to the same entity. A file of cheques, for example, consists only of cheques. Inventory records and invoices do not belong in a cheque file, since they pertain to different entities.

Databases

A database is an integrated collection of data. Records for different entities are typically stored in a database (whereas files store records for a single entity). In a university database, for example, records for students, courses, and faculty are interrelated in the same database.

File Organization

Records are stored in files using a file organisation that determines how the records will be

• Stored

• Located

• Retrieved

Sequential Organization

• Sequential organisation is the simplest way to store and retrieve records in a file.

• In a sequential file, records are stored one after the other without concern for the actual value of the data in the records.

• The first record stored is placed at the beginning of the file. The second is stored right after the first (there are no unused positions), the third after the second, and so on. This order never changes in sequential file organisation, unlike the other organisations to be discussed.

Sequential Organization (Reading)

• To read a sequential file, the system always starts at the beginning of the file and reads its way up to the record, one record at a time.

• For example, if a particular record happens to be the tenth one in a file, the system starts at the first record and reads ahead one record at a time until the tenth is reached. It cannot go directly to the tenth record in a sequential file without starting from the beginning.

• In fact, the system does not know it is the tenth record. Depending on the nature of the system being designed, this feature can be an advantage or a drawback.

Sequential Organization (Searching Record)

• Records are accessed in order of their appearance in the file.

• E.g. to find the location of cheque 1258 in a sequential file, we will call the cheque number 1258 the search key.

• The program controls all the processing steps that follow.

• The first record is read and its cheque number compared with the search key: 1240 (let it be the first) versus 1258. Since the cheque number and search key do not match, the process is repeated. The cheque number for the next record is 1244, and it also does not match the search key.

• The process of reading and comparing records continues until the cheque number and the search key match. If the file does not contain a cheque numbered 1258, the reading and comparing process continues until the end of the file is reached.
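The search steps above can be sketched as a simple loop. The record structure and the cheque numbers after 1244 are made up for illustration; only 1240, 1244 and the search key 1258 come from the example.

```python
def sequential_search(records, search_key):
    """Read records in file order until the cheque number matches search_key.

    Returns (record, reads), where reads is the number of records read.
    If no record matches, record is None and reads equals the file length.
    """
    reads = 0
    for record in records:
        reads += 1
        if record["cheque_number"] == search_key:
            return record, reads
    return None, reads


# A small hypothetical file; records appear in the order they were stored.
cheques = [{"cheque_number": n} for n in (1240, 1244, 1250, 1258, 1262)]
```

Calling `sequential_search(cheques, 1258)` reads four records before the match; a key that is absent forces a read of the whole file.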

Direct-Access Organisation

• In contrast to sequential organisation, processing a direct-access file does not require the system to start at the first record in the file.

• Direct-access files are keyed files. They associate a record with a specific key value and a particular storage location.

• All records are stored by key at addresses rather than by position.

• If the program knows the record key, it can determine the location address of a record and retrieve it independently of every other record in the file.

Direct-Access Organisation (Direct Addressing)

• In the cheque example, the direct access of records is demonstrated by using a storage area that has a space reserved for every cheque number from 1240 to 1300.

• The system uses the cheque number as a physical record key.

• Cheque number 1248 is stored at address 1248, the location reserved for the cheque with that number.

• To retrieve that cheque from storage in a computer system, the program is instructed to use the number 1248 as the search key.

Direct-Access Organisation (Direct Addressing, Contd..)

• It knows that the key serves as the address and thus goes directly to the assigned location for the record with the key of 1248 and retrieves the record.

• The attractive feature of direct organisation is that records are retrieved much more quickly than when the file must be searched from the beginning.

• When storage is assigned for the file, it starts at the lowest key value and extends to the highest key value.
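A minimal sketch of direct addressing for the cheque example follows. A Python list stands in for the storage area, so the slot is the key's offset from the lowest key rather than the key itself; that mapping, and the function names, are assumptions of this sketch.

```python
LOWEST_KEY, HIGHEST_KEY = 1240, 1300

# One slot is reserved for every possible cheque number, used or not.
storage = [None] * (HIGHEST_KEY - LOWEST_KEY + 1)


def slot_for(key: int) -> int:
    """Turn a cheque number into its reserved slot without any searching."""
    if not LOWEST_KEY <= key <= HIGHEST_KEY:
        raise KeyError(key)
    return key - LOWEST_KEY


def store(record: dict) -> None:
    storage[slot_for(record["cheque_number"])] = record


def retrieve(key: int):
    """Return the record stored for key, or None if the slot is unused."""
    return storage[slot_for(key)]
```

Retrieval takes one step regardless of file size, at the cost of reserving 61 slots even if only a few cheques are ever stored.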

Direct Access Organization (Drawbacks - Direct Addressing)

• Storage must be allocated even though it will go unused.

• Another problem prohibiting use of direct addressing arises when the keys for the records do not match storage addresses. Even if the analyst wants to use direct addressing, it is impossible to do so if key values and addresses do not correspond. For example, if keys contain characters (e.g., a key of AB1CD), direct addressing is not possible, since there is no address for AB1CD.

Direct Access Organization (Hash Addressing)

• When direct addressing is not possible but direct access is necessary, the analyst specifies the alternative access method of hashing.

• Hashing (also called key transformation or randomising) refers to the process of deriving a storage address from a record key.

• An algorithm (an arithmetic procedure) is devised to change a key value into another value that serves as a storage address. (The data value in the record itself does not change.)

• There is no perfect hashing algorithm, although some are much better than others when it comes to minimising synonyms.

• In practice, synonyms occur when the hashing procedure is applied on different keys and produces the same address in storage.

Direct Access Organization (Hash Addressing, Contd..)

• A separate overflow area is set aside to provide for record storage when synonyms occur. When a record is stored, the hashing algorithm is performed and the address derived.

• The program accesses that storage area, and, if it is unused, the record is stored there. If there is already a record stored there, the new record is written in the overflow area. When the system must retrieve a record, the hashing algorithm is performed and the storage address determined. Then the record in the storage area is checked. If it is not the correct one (meaning that a synonym occurred earlier), the system automatically goes to the overflow area and retrieves the record for processing.
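The store and retrieve steps above can be sketched as follows. The hashing algorithm here (sum of character codes, divided by the size of the storage area, keeping the remainder) is just one simple choice picked for illustration, and the area size of 7 is hypothetical.

```python
MAIN_AREA_SIZE = 7  # hypothetical number of addresses in the main storage area

main_area = [None] * MAIN_AREA_SIZE   # each slot holds (key, data) or None
overflow_area = []                    # synonyms are appended here


def hash_address(key: str) -> int:
    """Derive a storage address from a key (assumed algorithm, not a standard)."""
    return sum(ord(ch) for ch in key) % MAIN_AREA_SIZE


def store(key: str, data) -> None:
    address = hash_address(key)
    if main_area[address] is None:
        main_area[address] = (key, data)      # address unused: store here
    else:
        overflow_area.append((key, data))     # synonym: goes to overflow


def retrieve(key: str):
    slot = main_area[hash_address(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    for stored_key, data in overflow_area:    # not the right record: check overflow
        if stored_key == key:
            return data
    return None
```

With this algorithm the keys AB1CD and BA1CD are synonyms, since they contain the same characters; the second one stored ends up in the overflow area. The sketch does not handle storing the same key twice.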

Indexed Organisation

• A third way of accessing records is through an index.

• The basic form of index includes a record key and the storage address for a record.

• To find a record when the storage address is unknown (unlike with direct addressing and hashing structures, where the address is derived from the key), it is necessary to scan the records. However, the search will be faster if an index is used, since it takes less time to search an index than an entire file of data.

Indexed Organisation (Characteristics)

• An index is a separate file from the master file to which it pertains. Each record in the index contains only two items of data: a record key and a storage address.

• To find a specific record when the file is stored under an indexed organisation, the index is first searched to find the key of the record wanted. When it is found, the corresponding storage address is noted and then the program accesses the record directly.

• This method uses a sequential scan of the index, followed by direct access to the appropriate record. The index helps speed the search compared with a sequential file, but it is slower than direct addressing. When the master file is not in any specific order, this method of file organisation is called indexed non-sequential organisation. There is one entry in the index for every record in the master file.
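A minimal sketch of indexed non-sequential lookup is shown below. The storage addresses, payee names and the dictionary standing in for the master file are all hypothetical.

```python
# Master file: storage address -> record, in no particular key order.
master_file = {
    500: {"cheque_number": 1258, "payee": "A. Kumar"},
    501: {"cheque_number": 1244, "payee": "S. Rao"},
    502: {"cheque_number": 1240, "payee": "P. Singh"},
}

# Index file: one (record key, storage address) entry per master record.
index = [(1240, 502), (1244, 501), (1258, 500)]


def find(key: int):
    """Scan the index sequentially, then access the master record directly."""
    for entry_key, address in index:
        if entry_key == key:
            return master_file[address]
    return None
```

Only the small index entries are scanned; the full records in the master file are touched once, at the address the index supplies.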

Indexed Sequential Organisation

• Indexed sequential organisation, the one most widely used in information systems, creates a pseudo-sequential file. Groups of records are stored in blocks with a capacity for a specified amount of data.

• For example, the blocks can store up to 3150 pieces of data. The first block, starting at address 1345, is in sequential order.

• The master file stores individual blocks of records in sequential order. This is not a sequential file, however, since not all the records are stored in physically adjacent positions; think of it as a file of separate, full or partially full blocks, each in sequential order.

• The adjacent blocks are not in ascending order. For example, to pursue a logical ascending sequence, the record following 1115 at the end of the first block is in the block at address 1349.

Indexed Sequential Organisation (Example)

Record Key   Starting Block Address
1115         1345
1315         1349
1429         1346
1725         1350

Indexed Sequential Organisation (Example, Contd..)

Block Address     Record Keys Stored in the Block
1345              1010 1011 1013 1014 1017 1019 1110 1113 1115
1346              1316 1317 1321 1323 1324 1410 1414 1415 1417 1418 1419 1427 1428 1429
1349              1117 1121 1120 1210 1211 1212 1215 1217 1218 1221 1310 1311 1313 1315
1350              1510 1521 1522 1617 1619 1620 1721 1724 1725
Overflow blocks   (no keys shown)
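A lookup against the example index can be sketched as below. The index entries and record keys are transcribed from the example; reading each index key as the highest key in its block, and using a binary search over the index, are assumptions of this sketch.

```python
from bisect import bisect_left

# (highest record key in block, starting block address), in key order.
INDEX = [(1115, 1345), (1315, 1349), (1429, 1346), (1725, 1350)]

# Block address -> record keys, as transcribed from the example.
BLOCKS = {
    1345: [1010, 1011, 1013, 1014, 1017, 1019, 1110, 1113, 1115],
    1346: [1316, 1317, 1321, 1323, 1324, 1410, 1414, 1415, 1417,
           1418, 1419, 1427, 1428, 1429],
    1349: [1117, 1121, 1120, 1210, 1211, 1212, 1215, 1217, 1218,
           1221, 1310, 1311, 1313, 1315],
    1350: [1510, 1521, 1522, 1617, 1619, 1620, 1721, 1724, 1725],
}


def find_block(key: int):
    """Return the address of the only block that could hold key, or None."""
    highest_keys = [k for k, _ in INDEX]
    pos = bisect_left(highest_keys, key)   # first index entry with key >= search key
    if pos == len(INDEX):
        return None                        # key is beyond the last block
    return INDEX[pos][1]


def contains(key: int) -> bool:
    """Search the index first, then scan only the selected block."""
    address = find_block(key)
    return address is not None and key in BLOCKS[address]
```

For key 1211 the index points to block 1349, so only that block is scanned even though it is not physically adjacent to block 1345.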

Inverted File

• The other type of data structure commonly used in database management systems is an inverted file.

• This approach uses an index to store information about the location of records having particular attributes.

• In a fully inverted file, there is one index for each type of data item in the data set. Each record in the index contains the storage address of each record in the file that meets the attribute.

• Some data items in a database will probably never be used to retrieve data. Therefore, no index will be built for those data items. If not all attributes are indexed, the database is only partially inverted, which is the more common data structure.
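A partially inverted file can be sketched as one index per chosen attribute, each mapping an attribute value to the addresses of matching records. The student data, addresses and attribute names are invented for illustration.

```python
from collections import defaultdict

# Storage address -> record (hypothetical data).
records = {
    100: {"name": "Asha", "course": "BCA", "city": "Chandigarh"},
    101: {"name": "Ravi", "course": "MCA", "city": "Mohali"},
    102: {"name": "Meena", "course": "BCA", "city": "Mohali"},
}


def build_inverted_index(records: dict, attribute: str) -> dict:
    """Map each value of attribute to the addresses of records holding it."""
    index = defaultdict(list)
    for address, record in records.items():
        index[record[attribute]].append(address)
    return dict(index)


# Partially inverted: "name" is assumed never to be used for retrieval,
# so no index is built for it.
indexes = {attr: build_inverted_index(records, attr) for attr in ("course", "city")}
```

Looking up `indexes["course"]["BCA"]` yields the addresses of all BCA students without reading any record in the file.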

OUTPUT DESIGN

• One of the most important features of an information system for users is the output it produces.

• Outputs from computer systems are required primarily to communicate the results of processing to users.

• Without quality output, the entire system may appear to be so unnecessary that users will avoid using it, possibly causing it to fail.

• The term output applies to any information produced by an information system.

Output Objectives

• Convey information about past activities, current status or projections of the future, e.g. a report on stock in hand shows current status; an exception report, e.g. for electricity billing, shows the number of houses found locked in an area.

• Signal important events, opportunities, problems or warnings.

• Trigger an action, e.g. a reorder level report, whether printed or displayed.

• Confirm an action, e.g. a report of goods received.

Key Output Questions

• Who will receive the output?

• What is its planned use?

• How much detail is needed?

• When and how often is the output needed?

• By what method?

Contents of the Outputs

Data Items

The name of each data item, along with its characteristics, should be recorded in a standard form:

• Whether it is alphabetic or numeric

• Valid and specific range of values, e.g. minimum, maximum, fixed values or ranges

• Size of data item

• Position of decimal point, arithmetic sign or any other indicator

The objective is to prevent the same data item being referred to by various names, or the same name being used to describe different items.

Contents of the Outputs (Contd..)

Data Totals

There is often a need to provide totals at various levels. Their source must be identified and they must be defined and registered as data items. The systems analyst must specify:

• At what level(s) they are required, e.g. subtotal, grand total.

• The position, e.g. at the end of a line.

• What will cause them to occur, e.g. change of key or any other condition.
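Totals triggered by a change of key can be sketched as follows. The input is assumed to be already sorted by key, and the line formats and sample values are hypothetical.

```python
def report_with_totals(rows):
    """Build report lines with a subtotal on each change of key and a grand total.

    rows is a list of (key, amount) pairs, assumed to be sorted by key.
    """
    lines = []
    grand_total = 0
    current_key, subtotal = None, 0
    for key, amount in rows:
        if current_key is not None and key != current_key:
            # Change of key: print the subtotal for the group just finished.
            lines.append(f"Subtotal {current_key}: {subtotal}")
            subtotal = 0
        current_key = key
        lines.append(f"{key} {amount}")
        subtotal += amount
        grand_total += amount
    if current_key is not None:
        lines.append(f"Subtotal {current_key}: {subtotal}")
    lines.append(f"Grand total: {grand_total}")
    return lines
```

For the rows `[("North", 10), ("North", 5), ("South", 7)]` the subtotal for North is produced when the key changes to South, and the grand total closes the report.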

Contents of the Outputs (Contd..)

Data Editing

It is not always desirable to print or display data as it is held on a computer. The systems analyst must know whether the form in which it is stored is suitable for the output. If any editing is required, the analyst must specify it, e.g.

• Decimal points to be inserted or not.

• Where the currency symbol should appear, as prefix or suffix.

• Alignment of items, e.g. right, left.
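The editing choices listed above can be sketched for a single amount. The amount is assumed to be stored as a non-negative whole number of hundredths; the function name, default symbol and column width are hypothetical.

```python
def edit_amount(stored: int, symbol: str = "$", prefix: bool = True,
                width: int = 14) -> str:
    """Format a stored amount (in hundredths) for printing or display."""
    if stored < 0:
        raise ValueError("this sketch handles non-negative amounts only")
    units, hundredths = divmod(stored, 100)
    text = f"{units:,}.{hundredths:02d}"                         # insert decimal point
    text = f"{symbol}{text}" if prefix else f"{text}{symbol}"    # symbol position
    return text.rjust(width)                                     # right-align in the column
```

For example, the stored value 1234567 is edited to `$12,345.67`, right-aligned in a 14-character column.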

Contents of the Outputs (Contd..)

Output Media

The systems analyst also has to determine the most appropriate medium for the outputs. This will involve consideration of a wide range of devices, including:

• Line Printer

• Graph plotter

• VDU

• Magnetic Media

• Microfilm

Contents of the Outputs (Contd..)

Considerations while selecting Media

• Suitability of the device to the particular application.

• The need for hard copy and number of copies required.

• The response time required.

• The location of users

• The S/W and H/W available.

• The cost.

Developing A Printed Output Layout

• The design of printed output will determine its usefulness to the recipient.

• An output layout is the arrangement of items on the output medium. When analysts design an output layout, they are building a mock-up of the actual report or document as it will appear after the system is in operation.

Developing A Printed Output Layout (Contd..)

The layout should show the location and position of the following.

• All variable information

  • Item details

  • Summaries and totals

  • Separators, e.g. dash & underline, control breaks

• All pre-printed details

  • Headings

  • Document name

  • Organisation name and address

  • Instructions

  • Notes & comments

Developing A Printed Output Layout (Contd..)

Common notations used in designing an output layout:

• Variable information

  • X to denote that an alphabetic or special character (e.g. *, /) will be printed or displayed.

  • 9 to denote that a number will be printed.

• Constant information

The information is written on the form as it should appear when printed.
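As a rough illustration of the X and 9 notation, the sketch below checks a value against a layout string. It assumes that X accepts any character other than a digit, that 9 accepts only a digit, that any other character in the layout is constant information, and that the value must fill the layout exactly; the function name is hypothetical.

```python
def fits_layout(value: str, picture: str) -> bool:
    """Check a value against an X/9 layout string, position by position."""
    if len(value) != len(picture):
        return False
    for ch, mark in zip(value, picture):
        if mark == "9":
            if not ch.isdigit():       # 9: a number is expected here
                return False
        elif mark == "X":
            if ch.isdigit():           # X: a letter or special character
                return False
        elif ch != mark:               # anything else is constant information
            return False
    return True
```

Under these assumptions a date layout of `99/99/9999` accepts `12/05/2024`, and `XX-999` accepts `AB-123` but rejects `A1-123`.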

Designing Printed Output

• Headings

In every report, the title of the report, the date and the time should be included to tell users what they are working with and on what date it was prepared. The page number provides a quick reference for users who work with data found at various locations throughout the report.

Designing Printed Output (Contd..)

• Column Headings

Before actually marking in the data fields, enter the column headings. It is good practice to use an underline, dash or some other symbol to separate the column headings from the start of data. Every column should have a heading that describes its contents.

Designing Printed Output (Contd..)

• Data & Details

Enter the description of the data below the column headings, using the X and 9 conventions explained earlier, and indicate the size of each data item.

• Summaries

Some report designs specify summary information, column totals or subtotals. Label all titles and headings as you wish them to appear, denote variable data by X or 9 and indicate the maximum length of the field.

Guidelines for Report Design (Summary)

• Reports and documents should be designed to read from left to right and top to bottom.

• The most important items should be easiest to find, e.g. in an inventory report, Item Number is the most important item. It is placed in the first column.

• All pages should have a title and page number and show the date the output was prepared.

• All columns should be labelled.

• Abbreviations should be avoided.

INPUT DESIGN

Introduction

Input specification describes the manner in which data enter the system for processing. Input design features can ensure the reliability of the system and produce results from accurate data. The input design also determines whether the user can interact efficiently with the system.

Objectives of Input Design

• Controlling Amount Of Input

Data preparation and data entry operations depend on people. Because labour costs are high, the cost of preparing and entering data is high, so reducing data requirements can lower costs.

The computer may sit idle while data are being prepared and input for processing. By reducing input requirements, the analyst can speed the entire process from data capture to processing.

Objectives of Input Design (Contd..)

• Avoiding Delay

Avoiding processing delays resulting from data preparation or data entry operations should be one of the objectives of the analyst in designing input.

• Avoiding Errors In Data

The rate at which errors occur depends on the quantity of data: the smaller the amount of data, the fewer the opportunities for errors. The analyst can reduce the number of errors by reducing the volume of data that must be entered for each transaction.

Objectives of Input Design (Contd..)

• Avoiding Extra Steps

When the volume of transactions cannot be reduced, the analyst must be sure the process is as efficient as possible. Input designs that cause extra steps should be avoided.

• Keeping The Process Simple

There should not be so many controls on errors that people will have difficulty using the system. The system should be comfortable to use while still providing the error control methods.

Summary (Lecture 8)

• File Design

• Output Design

• Input Design
