
File and Database Concepts


Match the definitions

A. Record
B. Sequential file
C. Random access
D. Hashing algorithm
E. Collision
F. Real
G. Flat-file database
H. Write
I. Table

1. A single table, unconnected to anything else, that has been designed to supply data to a specific application
2. Reading a file starting with a record other than the first on the disk
3. An organised collection of information about an object
4. A group of records stored in some sort of order
5. An access level allowing the editing of records
6. A collection of related records in a database; the physical implementation of an entity
7. The term for what happens when a hashing algorithm tries to put two records in the same place
8. A data type expressing fractional numbers
9. A way of translating a record key to the memory address at which that record should be stored

Learning Objectives

• Explain how data is stored in files in the form of fixed length records comprising items in fields, with the key as a unique identifier
• Describe the characteristics, advantages and disadvantages of serial, sequential, indexed sequential and random access to data, showing an understanding of the type of application suitable for each
• Describe how data can be organised in secondary memory to facilitate different modes of access, including an explanation of the nature of hashing algorithms (with an example, illustrating what happens if there are collisions) to create addresses from keys, thereby facilitating direct access and the use of an index or set of indexes to facilitate indexed sequential access
• Select appropriate data types for a given set of data, and explain the advantages and disadvantages of alternative data types
• Describe the difference between flat files, relational and hierarchical database systems, discussing the comparative benefits and drawbacks of each
• Be able to describe the following terms: table, key, field, record, relationship, primary key, foreign key, referential integrity, entity, attribute
• Describe the different access levels required for on-line files and databases, identifying the need for the different levels of access with reference to user/supervisor modes and user IDs and passwords

Source: OCR

Records

"A record is an organised collection of information about an object or item"

Ted Postlethwaite
10 West Hill
Highgate
London
N6 3PY

Fixed or variable length records: pros and cons?

Ted | Postlethwaite | 10 West Hill | Highgate | London | N6 3PY

Pros and Cons: Fixed- and Variable-Length Records

Fixed-Length Records
• Pro: Easy to implement
• Pro: You can predict where each record starts and ends, so access is faster
• Con: Wastes space
• Con: Can be inflexible, as the space allocated may become too small

Variable-Length Records
• Pro: Uses disk space economically
• Pro: Flexible
• Con: Difficult to implement
• Con: Slower access
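This minimal Python sketch shows why fixed-length records allow faster access. The field widths in RECORD_FORMAT, the helper names (pack_record, unpack_record, read_record) and the second record are hypothetical and serve only as an illustration. Every record occupies exactly RECORD_SIZE bytes, so record n starts at byte n × RECORD_SIZE and can be read without scanning. The comma-separated (variable-length) version of the same record is smaller but offers no such shortcut.

```python
import struct

# Hypothetical fixed-length layout (widths are illustrative, not from the slides):
# forename 10, surname 15, street 15, district 10, city 10, postcode 8 bytes
RECORD_FORMAT = "=10s15s15s10s10s8s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 68 bytes, the same for every record


def pack_record(fields):
    """Encode six text fields as one fixed-length record; struct pads short fields with zero bytes."""
    return struct.pack(RECORD_FORMAT, *(f.encode("ascii") for f in fields))


def unpack_record(raw):
    """Decode a fixed-length record and strip the zero-byte padding."""
    return [b.rstrip(b"\x00").decode("ascii") for b in struct.unpack(RECORD_FORMAT, raw)]


def read_record(data, n):
    """Jump straight to record n: it always starts at byte n * RECORD_SIZE."""
    start = n * RECORD_SIZE
    return unpack_record(data[start:start + RECORD_SIZE])


records = [
    ["Ted", "Postlethwaite", "10 West Hill", "Highgate", "London", "N6 3PY"],
    ["Amy", "Harris", "1 High St", "Camden", "London", "NW1 1AA"],  # invented sample record
]
data = b"".join(pack_record(r) for r in records)
print(read_record(data, 1))  # ['Amy', 'Harris', '1 High St', 'Camden', 'London', 'NW1 1AA']

# Variable-length alternative: comma-separated fields, no padding
variable = ",".join(records[0]).encode("ascii")
print(RECORD_SIZE, len(variable))  # 68 53
```

Here the fixed-length layout spends 68 bytes on a record that needs only 53 as comma-separated text. That is the "wastes space" trade-off in miniature.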

Data types

Boolean: True/False
Integer: Whole numbers, e.g. 1, 2, 999
Real: Fractional numbers, e.g. 123.456
DateTime: Any date- or time-based value, e.g. 29/11/2005, 13:08, 0.001 seconds
String (text): Names, addresses, emails, etc.

Why use data types?

Storage space and format
• In principle a Boolean needs only one bit of storage space! (Many systems actually use a whole byte, which is still tiny.)
• A 64-bit integer type can store 2^64 different values. That's anything from 0 to 18446744073709551615 (unsigned) or from -9223372036854775808 to 9223372036854775807 (signed)!
• If you use a 64-bit integer to represent something that can only be between 1 and 10, you're wasting a lot of space!
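Python's standard struct module can report the size of fixed-width types, and plain integer arithmetic reproduces the 64-bit ranges quoted above. This is an illustrative sketch. Using an 8-bit integer for a value between 1 and 10 is our own example choice. struct also works in whole bytes, so its Boolean takes one byte rather than one bit.

```python
import struct

# Storage sizes in bytes, using struct's standard sizes ("=" means no alignment padding)
BOOL_BYTES = struct.calcsize("=?")   # 1: struct's smallest unit is a byte, not a bit
INT8_BYTES = struct.calcsize("=b")   # 1: plenty for a value between 1 and 10
INT64_BYTES = struct.calcsize("=q")  # 8: signed 64-bit integer

# Ranges of a 64-bit integer, computed from its 2**64 possible bit patterns
UNSIGNED_MAX = 2**64 - 1
SIGNED_MIN, SIGNED_MAX = -(2**63), 2**63 - 1

print(BOOL_BYTES, INT8_BYTES, INT64_BYTES)  # 1 1 8
print(UNSIGNED_MAX)                         # 18446744073709551615
print(SIGNED_MIN, SIGNED_MAX)               # -9223372036854775808 9223372036854775807
```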

Differences in operations
• You can't do fractional arithmetic with integers. Integer division throws away the remainder, which gives misleading results, e.g. 10 ÷ 6 = 1
• And with text, "x" + "y" = "xy"! (concatenation)
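This quick Python illustration shows the same values behaving differently under different types. Python's // operator stands in for integer division and its float type for Real. Other languages spell these differently.

```python
int_quotient = 10 // 6    # integer division throws away the fraction
real_quotient = 10 / 6    # real (floating-point) division keeps it
joined = "x" + "y"        # with strings, + means concatenation
digits = "1" + "2"        # "12", whereas 1 + 2 == 3

print(int_quotient)   # 1
print(real_quotient)  # 1.6666666666666667
print(joined)         # xy
print(digits)         # 12
```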

Five-Minute Task

What data types would you use for the following?
• A field which holds your account balance
• A field which shows how many times you've logged in to this website
• A field which says whether you are a supervisor or not
• A field which contains your phone number

Five-Minute Task Feedback

• Account balance should be Real, as it holds fractional numbers, e.g. £123.87
• How many times you've logged in will be an Integer, since you can't log in, say, 6.34 times!
• Supervisor field should be Boolean: you either are or you aren't
• Telephone numbers should be Strings:
  - You will never do any arithmetic with them (when did you last add two telephone numbers?)
  - They start with a "0", which would be removed in a numeric type
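Here is one possible way to write the task's answers as a typed record, sketched in Python. The Account class and its field names are hypothetical, float stands in for Real, and Python does not enforce these type hints at run time. The last lines show what storing the phone number as a number would do to the leading zero.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical record type holding the four fields from the task."""
    balance: float       # Real: fractional values such as 123.87
    login_count: int     # Integer: whole numbers only
    is_supervisor: bool  # Boolean: you either are or you aren't
    phone: str           # String: keeps the leading zero and the spaces


acct = Account(balance=123.87, login_count=6, is_supervisor=False, phone="020 7624 6921")

# What would happen if the phone number were stored as a number instead
as_number = int(acct.phone.replace(" ", ""))
print(acct.phone)  # 020 7624 6921
print(as_number)   # 2076246921: the leading 0 has gone
```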

Serial Access

► Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O'Hanlon, Zachary, Christopher

• Records not in order.
• Search starts at the beginning of the file.
• Records read in the order that they are stored.

Sequential Access

• Records are in some order.
• Search starts at the beginning of the file.

► Arnold, Christopher, Hansen, Harris, Heim, Hussein, O'Hanlon, Oworu, Schmidt, Shah, Tooey, Wickham, Zachary

Serial vs Sequential?

Imagine you have two files, each of 100,000 surnames, many of which are repeated.

One file is unordered; the other is alphabetical.

You want to find all occurrences of the name "Smith".

Five-Minute Task

• How much of the serial (unordered) file do you have to search?
• How much of the sequential (ordered) file do you have to search?
• Geek question: in terms of search time, how much more efficient are sequential files than serial files, on average, in this type of search?

Five-Minute Task Feedback: Serial File

With a serial file you would always have to search all of the file.

Why? Because you have no way of knowing that the last record of the file isn't a "Smith"!

Five-Minute Task Feedback: Sequential File

You would only have to search until the end of the "Smiths".

Why? Because the file is in order, once you pass the last "Smith" you know there are no more to look for.
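The two searches can be sketched in Python as follows, counting how many records each one reads. The function names and the sample NAMES list are hypothetical. The sequential search has to read one record beyond the last "Smith" before it knows it can stop.

```python
def serial_find_all(records, target):
    """Unordered file: every record must be read, because the last one might match."""
    matches, reads = [], 0
    for rec in records:
        reads += 1
        if rec == target:
            matches.append(rec)
    return matches, reads


def sequential_find_all(records, target):
    """Ordered file: stop as soon as we read a record that sorts after the target."""
    matches, reads = [], 0
    for rec in records:
        reads += 1
        if rec == target:
            matches.append(rec)
        elif rec > target:
            break  # sorted order guarantees no more matches after this point
    return matches, reads


NAMES = ["Tooey", "Smith", "Arnold", "Wickham", "Smith", "Zachary", "Heim", "Shah"]  # hypothetical sample
print(serial_find_all(NAMES, "Smith"))              # (['Smith', 'Smith'], 8)
print(sequential_find_all(sorted(NAMES), "Smith"))  # (['Smith', 'Smith'], 6)
```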

Five-Minute Task Feedback: Geek Question

Sequential files would be twice as efficient on average.

Why? Of 100,000 sequenced records, the block of Smiths could end anywhere, so (assuming every stopping point is equally likely) you would sometimes need to search 1 record, sometimes 2, sometimes 3… sometimes 99,999, sometimes 100,000. The average of 1 … 100,000 is 100,001 / 2 ≈ 50,000. So on average you would have to search about 50,000 records to get past the block of Smiths. With a serial file we know we always have to search all 100,000 records. That's twice as many as the sequential file, so a sequential file is about twice as efficient for this type of search!
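The averaging step can be written as a worked equation, assuming the search is equally likely to stop at any of the N positions (N = 100,000 here):

```latex
\bar{k} = \frac{1}{N}\sum_{k=1}^{N} k = \frac{1}{N}\cdot\frac{N(N+1)}{2} = \frac{N+1}{2},
\qquad
\frac{N}{\bar{k}} = \frac{2N}{N+1} \approx 2 \quad \text{for large } N
```

With N = 100,000 the average is 50,000.5 records and the ratio is about 1.99998. "Twice as efficient" is therefore a very good approximation.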

Random Access

Looking up "Hansen": record key → Hashing Algorithm → disk address

• Direct access to a single record with no need to search.
• Hashing algorithm creates disk address from record key.

Records on disk: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O'Hanlon, Zachary, Christopher

"Collision" occurs when the hashing algorithm tries to put two records in the same place!

Dealing with Collisions

Diagram: records stored in buckets, e.g. Wickham and Hussein share one bucket. The hashing algorithm sends "Hansen" to a bucket that is FULL, so Hansen overflows into the next bucket.

• Hashing algorithm gives 'bucket' address rather than record address.
• Overflow to next bucket if target bucket is full.
• Performance deteriorates over time as more overflows happen.
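Here is a minimal Python sketch of bucket hashing with overflow. It assumes integer record keys, the N Mod M hash described in this deck, a fixed bucket capacity, and overflow into the next bucket (wrapping round to bucket 0). The BucketFile class and its method names are hypothetical. find() returns how many buckets it examined, which shows lookups slowing down as overflows accumulate.

```python
class BucketFile:
    """Sketch of a hashed file with M buckets of fixed capacity.

    Assumptions (illustrative, not from the slides): integer keys, the
    N Mod M hash, and overflow into the next bucket, wrapping round at the end.
    Records are never deleted, which keeps find() simple.
    """

    def __init__(self, num_buckets, capacity):
        self.m = num_buckets
        self.capacity = capacity
        self.buckets = [[] for _ in range(num_buckets)]

    def home_bucket(self, key):
        return key % self.m  # N Mod M

    def insert(self, key, record):
        """Store the record, overflowing to later buckets if the home bucket is full.
        Returns the bucket number actually used."""
        home = self.home_bucket(key)
        for step in range(self.m):
            b = (home + step) % self.m
            if len(self.buckets[b]) < self.capacity:
                self.buckets[b].append((key, record))
                return b
        raise OverflowError("every bucket is full")

    def find(self, key):
        """Return (record, buckets_examined); record is None if the key is absent."""
        home = self.home_bucket(key)
        for step in range(self.m):
            b = (home + step) % self.m
            for k, rec in self.buckets[b]:
                if k == key:
                    return rec, step + 1
            if len(self.buckets[b]) < self.capacity:
                # Nothing can have overflowed past a bucket that still has room
                return None, step + 1
        return None, self.m


f = BucketFile(num_buckets=11, capacity=2)
print(f.insert(1234, "record A"))  # 2  (1234 Mod 11 = 2)
print(f.insert(1245, "record B"))  # 2  (collision, but bucket 2 still has room)
print(f.insert(1256, "record C"))  # 3  (bucket 2 is full, so overflow to bucket 3)
print(f.find(1256))                # ('record C', 2): two buckets examined
```

Finding 1234 takes one bucket, but finding the overflowed 1256 takes two. Every extra overflow lengthens some searches in the same way.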

How Hashing Works

There are lots of ways, but we'll look at N Mod M, where N is the record key and M is the number of buckets.

Modulo (Mod) is a mathematical operation, like +, −, × and ÷, which gives the remainder when one number is divided by another.

Question: What is the highest possible remainder when a number is divided by M?

N Mod M

Record keys 1234 to 1244, Mod 11, into buckets 0 to 10:

Bucket   Record key
0        1243
1        1244
2        1234
3        1235
4        1236
5        1237
6        1238
7        1239
8        1240
9        1241
10       1242

1234 Mod 11 = 2, so the record with the key 1234 goes into bucket number 2.

We have 11 buckets, so M = 11.
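The N Mod M calculation can be sketched in a few lines of Python. bucket_for is a hypothetical helper name. The sketch reproduces the bucket numbers in the table and shows the next key, 1245, colliding with 1234.

```python
M = 11  # number of buckets


def bucket_for(key, m=M):
    """N Mod M: the remainder when the record key N is divided by M."""
    return key % m


# Keys 1234 to 1244, one per bucket
placement = {bucket_for(key): key for key in range(1234, 1245)}
for bucket in sorted(placement):
    print(bucket, placement[bucket])  # 0 1243, 1 1244, 2 1234, ..., 10 1242

# The next key lands in an already-used bucket: a collision
print(bucket_for(1245))  # 2, the same bucket as 1234
```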

Indexed Sequential

Diagram: looking up "Wickham" via an INDEX into the file (Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O'Hanlon, Zachary, Christopher).

• Supports direct access and sequential access.
• A table with more than one column can have more than one index.
• A very large index may have its own index.
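Here is a minimal Python sketch of indexed sequential access. It assumes the file is sorted on the key and split into blocks of four records. The block size and the one-level sparse index are illustrative choices; the names come from the slide. The index holds the first key of each block. A binary search on the index gives direct access to the right block, and a short sequential scan finishes the job. Reading the blocks in order still gives sequential access to the whole file.

```python
from bisect import bisect_right

# Sorted file split into hypothetical blocks of 4 records
RECORDS = sorted(["Harris", "Shah", "Oworu", "Tooey", "Wickham", "Hussein", "Heim",
                  "Hansen", "Schmidt", "Arnold", "O'Hanlon", "Zachary", "Christopher"])
BLOCK_SIZE = 4
BLOCKS = [RECORDS[i:i + BLOCK_SIZE] for i in range(0, len(RECORDS), BLOCK_SIZE)]

# Sparse index: the first key in each block
INDEX = [block[0] for block in BLOCKS]


def indexed_lookup(key):
    """Direct step: binary-search the index for the right block.
    Sequential step: scan that block in order.
    Returns (block_number, position) or None."""
    block_no = bisect_right(INDEX, key) - 1
    if block_no < 0:
        return None  # key sorts before every key in the file
    for position, rec in enumerate(BLOCKS[block_no]):
        if rec == key:
            return block_no, position
        if rec > key:
            break  # passed where the key would be
    return None


print(INDEX)                     # ['Arnold', 'Heim', 'Schmidt', 'Zachary']
print(indexed_lookup("Wickham"))  # (2, 3)
print(indexed_lookup("Smith"))    # None

# Sequential access: read the blocks in order
print([r for block in BLOCKS for r in block] == RECORDS)  # True
```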

The Power of Indexes

Contestant number one, find… Laguardia, G

Contestant number two, find… 020 7624 6921

From the phone book
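A printed phone book is, in effect, indexed on name only, which is why the first contestant has the easier job. This Python sketch uses hypothetical entries with made-up numbers. Looking up a name uses the sorted order, while looking up a number needs a full scan. Building a second index on number (just as a table with more than one column can have more than one index) makes the reverse lookup direct too.

```python
from bisect import bisect_left

# Hypothetical entries with made-up numbers, kept sorted by name like a printed directory
BOOK = sorted([
    ("Arnold, P", "01632 960001"),
    ("Laguardia, G", "01632 960002"),
    ("Shah, R", "01632 960003"),
    ("Tooey, K", "01632 960004"),
])
NAMES = [name for name, _ in BOOK]


def find_by_name(name):
    """The book's order acts as an index on name: binary search, no full scan."""
    i = bisect_left(NAMES, name)
    if i < len(NAMES) and NAMES[i] == name:
        return BOOK[i][1]
    return None


def find_by_number_scan(number):
    """No index on number: every entry may have to be checked."""
    for name, num in BOOK:
        if num == number:
            return name
    return None


# A second index (number -> name) makes the reverse lookup direct as well
NUMBER_INDEX = {num: name for name, num in BOOK}

print(find_by_name("Laguardia, G"))         # 01632 960002
print(find_by_number_scan("01632 960004"))  # Tooey, K
print(NUMBER_INDEX["01632 960003"])         # Shah, R
```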

Five-Minute Task

In groups, think carefully about exactly what you do when you look up someone's name in the phone book.

• Which access methods do you use?
• What real-world objects correspond to:
  - the key?
  - the record?

Five-Minute Task Feedback

You open the book somewhere in the middle: Direct Access

You skip a few pages till you find the right one and then run your finger down the list to find the right name: Sequential Access

These two add up to: Indexed Sequential Access

The key must be the surname. The record could be either the address and phone number or, if you consider the address as a pointer, the house itself!

Review

So far we have looked at:
• Serial Access
• Sequential Access
• Random Access (using hashing)
• Indexed Sequential Access

Five-minute task: Fill in the table

File organisation        | Access starts | Can access records in sequence? | Example
Serial                   |               |                                 |
Sequential               |               |                                 |
Random Access (hashing)  |               |                                 |
Indexed Sequential       |               |                                 |

Five-minute task feedback: File Organisation Summary

File organisation        | Access starts     | Can access records in sequence? | Example
Serial                   | Beginning of file |                                 | HTTP log
Sequential               | Beginning of file | Yes                             | Video tape
Random Access (hashing)  | Anywhere          |                                 | Booking system
Indexed Sequential       | Anywhere          | Yes                             | Relational database table

3 Types of Database

Flat-file database
• One file
• One record per line
• Essentially a long list

Hierarchical database
• Data held in a tree structure
• Data held at "leaf nodes"
• One-to-many relationship between parent and child nodes

Relational database
• Data held in separate tables
• Each table relates to an "entity" in the data model
• Tables linked by foreign key relationships

Pros and Cons: Flat-file Databases

• Pro: Can be very fast
• Pro: Simple to design and implement
• Con: Inflexible; only useful for the application it was designed for
• Con: Data redundancy (same data repeated throughout the database)

Pros and Cons: Hierarchical Databases

• Pro: Fast access to data
• Pro: Especially good for hierarchical data, e.g. a family tree
• Con: Inflexible structure, hard to reorganise
• Con: Can be hard to do "leaf traversal"

Pros and Cons: Relational Databases

• Pro: Extremely flexible; can model almost any system
• Pro: Little or no data redundancy, provided the tables are well designed
• Con: Complex software
• Con: Requires skilled administration

Relational Database Concepts

Table, Field, Record, Relationship, Foreign Key, Referential Integrity, Entity, Attribute, Primary Key

You must know the definitions for these terms before the exam!
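The relational terms above come together in a small example that uses Python's built-in sqlite3 module with an in-memory database. The customer and booking tables and their columns are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. With it on, an insert that breaks referential integrity raises sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# Each table represents an entity; each column is a field (an attribute of that entity)
conn.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each record
    surname     TEXT NOT NULL
)""")
conn.execute("""CREATE TABLE booking (
    booking_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- foreign key
)""")

conn.execute("INSERT INTO customer VALUES (1, 'Postlethwaite')")
conn.execute("INSERT INTO booking VALUES (100, 1)")  # OK: customer 1 exists

# Referential integrity: a booking for a non-existent customer is rejected
try:
    conn.execute("INSERT INTO booking VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Orphan booking rejected:", rejected)  # Orphan booking rejected: True

# The relationship lets us join the two tables back together
rows = conn.execute("""SELECT b.booking_id, c.surname
                       FROM booking b JOIN customer c ON b.customer_id = c.customer_id""").fetchall()
print(rows)  # [(100, 'Postlethwaite')]
```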

Access Rights and Restrictions

Different levels of access:
• Read
• Write
• Append
• Create
• Delete

Different user types:
• User: day-to-day information use
• Supervisor: information maintenance; technical skill required

Discussion Point: The Managing Director is the boss of the whole company. Does that mean she should have Supervisor rights?

Five-Minute Task

Define these access levels:
• Read
• Write
• Append
• Create
• Delete

Five-Minute Task Feedback

• Read: Can look at information but not change anything
• Write: Can change existing record information
• Append: Can add information to an existing file but not change anything that's already there
• Create: Can create whole new files
• Delete: Can delete records (and perhaps whole files)
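Access levels are often modelled as a set of permission flags. This Python sketch uses the standard enum.Flag type. The Access names mirror the five levels above. The ROLE_RIGHTS assignments for user and supervisor are hypothetical examples, not a recommended policy.

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()    # look at information only
    WRITE = auto()   # change existing records
    APPEND = auto()  # add to an existing file without changing what's there
    CREATE = auto()  # create whole new files
    DELETE = auto()  # delete records (and perhaps whole files)


# Hypothetical role assignments for illustration only
ROLE_RIGHTS = {
    "user": Access.READ | Access.APPEND,
    "supervisor": Access.READ | Access.WRITE | Access.APPEND | Access.CREATE | Access.DELETE,
}


def can(role, action):
    """True if the role's rights include every flag in `action`."""
    return action in ROLE_RIGHTS[role]


print(can("user", Access.READ))                       # True
print(can("user", Access.DELETE))                     # False
print(can("supervisor", Access.WRITE | Access.DELETE))  # True
```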