Overzicht Informatica (Overview of Computer Science), Lecture 9 – November 1. Based on Computer Science: An Overview, Edition 7, by J. Glenn Brookshear. Slides copyright © 2003 Pearson Education, Inc.


CHAPTER 8 (now chap. 9, 2nd part)

File Structures

• Abstractions of the actual data organization on mass storage

• Again: differences between conceptual and actual data organization


8.1: Files, Directories & the Operating System

• OS storage structure:
– conceptual hierarchy of directories and files

[Figure: directory tree with files]


8.1: Files: Conceptual vs. Actual View

• View at OS-level is conceptual
– actual storage may differ significantly!


8.2: Sequential Files

• To ‘remember’ where data resides on disk, the OS maintains a list of sectors for each file

• Result: sequential view of scattered set of data
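The sector list can be pictured with a small sketch. The `disk` contents, the file name `notes.txt`, and the sector numbers are all hypothetical; a real operating system keeps this bookkeeping in its own file-system structures.

```python
# Hypothetical disk: sector number -> bytes stored in that sector.
# The sectors of one file are scattered, not adjacent.
disk = {
    7: b"quen",
    2: b"se",
    9: b"tial",
}

# Per file, the OS remembers the ordered list of sectors holding its data.
sector_list = {"notes.txt": [2, 7, 9]}


def read_file(name: str) -> bytes:
    """Assemble a file's scattered sectors into one sequential byte string."""
    return b"".join(disk[sector] for sector in sector_list[name])


print(read_file("notes.txt"))  # b'sequential'
```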


8.2: Text Files

• Sequential file consisting of a long string of encoded characters (e.g. ASCII code)
– But: the character string is still interpreted by the word processor!

[Figure: the same file shown in “Notepad” and in “MS Word”]


8.2: Text files & Markup Languages (e.g. HTML)


8.2: From actual storage to conceptual view

[Figure: actual storage -> assembly by operating system -> sequential view (sequential buffer) -> interpretation by application program -> conceptual view]


8.2: Data Conversion

• When programming: note that data transfer to/from a file may involve data conversion:
– e.g., from two’s complement notation to ASCII

• So: again it’s about the interpretation of data
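As a rough illustration of such a conversion, the sketch below stores the same number once in two's complement and once as ASCII text. The value -25, the 16-bit width, and the big-endian byte order are assumptions chosen only for the example.

```python
import struct

value = -25

# In main memory: 16-bit two's complement (big-endian is an assumption here).
binary = struct.pack(">h", value)
print(binary.hex())  # ffe7

# In a text file: one ASCII code per character of the decimal notation.
text = str(value).encode("ascii")
print(text.hex())  # 2d3235, i.e. the codes for '-', '2', '5'

# Reading the text back and converting yields the same number again.
restored = int(text.decode("ascii"))
print(restored == struct.unpack(">h", binary)[0])  # True
```

The two byte strings differ completely, although both represent the same value: which one is meant depends on how the bits are interpreted.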


8.3: Quick File Access

• Disadvantage of sequential files:
– no quick access to particular file data
• Two techniques to overcome this problem:
– (1) Indexing or (2) Hashing
• Indexing:

Indexed File (the first field of each record is the key):

12N67  John Smith        23-Jul-71  17,000.00  New York         …
13C08  Andrew White      27-Jun-70  24,500.00  Boston           …
23G19  Mary Jackson      5-Mar-39   41,000.00  San Francisco    …
24X17  Eleanor Tracy     17-Sep-63   9,635.00  Fort Lauderdale  …
26X28  Michael Flanagan  1-Nov-44   18,800.00  Washington       …
32E76  Glenn White       29-Feb-68  17,000.00  Detroit          …
36Z05  Virginia Moore    27-Jun-70  32,000.00  San Francisco    …
:      :                 :          :          :                …

Index (loaded into main memory when opened):

12N67  location
13C08  location
23G19  location
24X17  location
26X28  location
32E76  location
36Z05  location
:      :
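A minimal sketch of an index lookup, assuming fixed-width records and an in-memory stand-in for the file (`io.BytesIO`). The record size of 32 bytes and the helper name `lookup` are hypothetical; only the keys and names come from the table above.

```python
import io

# Fixed-width records; the first 5 characters are the key field.
RECORD_SIZE = 32
records = [
    "12N67 John Smith",
    "13C08 Andrew White",
    "23G19 Mary Jackson",
]
data_file = io.BytesIO(
    b"".join(r.ljust(RECORD_SIZE).encode("ascii") for r in records)
)

# The index maps each key to the record's location (byte offset) in the file.
# It is built once and kept in main memory while the file is open.
index = {r[:5]: i * RECORD_SIZE for i, r in enumerate(records)}


def lookup(key: str) -> str:
    """Jump straight to the record for `key` without scanning the file."""
    data_file.seek(index[key])
    return data_file.read(RECORD_SIZE).decode("ascii").rstrip()


print(lookup("23G19"))  # 23G19 Mary Jackson
```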


Exercise: Chapter 8 - Problem 10

Why is a ‘patient identification number’ a better choice for a key field than the last name of each patient?

• If the key is unique:
– an additional sequential search is never required

• A patient’s last name is not always unique


8.3: Inverted Files

• Variation on (single) indexing: inverted file


8.4: Hashing

• Disadvantage of indexing is… the index
– requires extra space + includes 1 extra indirection

• Solution: ‘hashing’
– finds position in file using a key value (as in indexing)…
– … simply by identifying location directly from the key

• How?
– define set of ‘buckets’ & ‘hash function’ that converts keys to bucket numbers …

[Figure: key value -> hash function -> bucket number (0 1 2 3 … N)]


8.4: Hash Function: Example

• If storage space is divided into 40 buckets and the hash function is division (the remainder is the bucket number):
– key values 14, 54, & 94 all map onto the same bucket, number 14 (collision)
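The collision can be checked with a few lines. This is a sketch of the division hash described on the slide; the function name `bucket_of` is made up for the example.

```python
NUM_BUCKETS = 40


def bucket_of(key: int) -> int:
    """Division hash: the remainder after dividing the key by the bucket count."""
    return key % NUM_BUCKETS


for key in (14, 54, 94):
    print(key, "->", bucket_of(key))  # each key lands in bucket 14
```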


8.4: Key field value can be anything


8.4: Handling Bucket Overflow

• When bucket sizes are fixed:
– buckets can fill up and overflow

• One solution:
– designate a special overflow storage area (not fixed in size!)
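One way to picture this is the sketch below. The bucket capacity of 2, the record strings, and the helper names `insert` and `find` are assumptions for illustration; the slide does not prescribe how the overflow area is organized.

```python
NUM_BUCKETS = 40
BUCKET_SIZE = 2  # hypothetical fixed capacity per bucket

buckets = [[] for _ in range(NUM_BUCKETS)]
overflow = []  # overflow area: an ordinary list, so not fixed in size


def insert(key: int, record: str) -> None:
    """Store the record in its bucket, or in the overflow area if it is full."""
    bucket = buckets[key % NUM_BUCKETS]
    if len(bucket) < BUCKET_SIZE:
        bucket.append((key, record))
    else:
        overflow.append((key, record))


def find(key: int):
    """Search the key's bucket first, then fall back to the overflow area."""
    for k, record in buckets[key % NUM_BUCKETS] + overflow:
        if k == key:
            return record
    return None


for key in (14, 54, 94):
    insert(key, f"record {key}")

print(buckets[14])  # [(14, 'record 14'), (54, 'record 54')]
print(overflow)     # [(94, 'record 94')]
print(find(94))     # record 94
```

Note that finding key 94 needs a sequential search through both the bucket and the overflow area, which is the cost of a collision.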


Exercise: Chapter 8 - Problem 22

If we use division as a hash function and have 23 buckets, in which bucket should we search to find the record whose key is interpreted as the integer value 101?

Division: 101 / 23 = 4, remainder 9

So: key value 101 maps to bucket number 9 (the 23 buckets are numbered 0 1 2 … 9 … 22)
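The arithmetic can be verified directly; this short check only restates the division above.

```python
quotient, remainder = divmod(101, 23)
print(quotient, remainder)  # 4 9

bucket = 101 % 23  # the remainder is the bucket number
print(bucket in range(23))  # True: the 23 buckets are numbered 0 through 22
```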


Exercise: Chapter 8 - Problem 16

a) What advantage does an indexed file have over a hash file?
b) What advantage does a hash file have over an indexed file?

• a) When the key is unique, the index points directly to the required data, while hashing often requires an additional (sequential) bucket search (incl. the bucket overflow area).

• b) No additional index file storage is required.


Chapter 8 - File Structures: Conclusions

• File Structures:
– abstractions of actual data organization on mass storage

• Changes of ‘view’:
– actual storage -> sequential view by OS -> conceptual view presented to user

• Quick access to particular file data by
– (1) indexing (many forms)
– (2) hashing (requires no index, but requires bucket search!)


CHAPTER 9

Database Structures

• (Large) integrated collections of data that can be accessed quickly

• Combination of data structures (chap. 7) and file structures (chap. 8)


9.1: Historical Perspective

• Originally: departments of large organizations stored all data separately in flat files

• Problems: redundancy & inconsistencies


9.1: Integrated Database System

• Better approach: integrate all data in a single system, to be accessed by all departments


9.1: Disadvantages of Data Integration

• Disadvantages:
– Control of access to sensitive data?!
  • For example: the personnel department has no business with personal data stored by the company doctor!
– Misinterpretation of integrated data
  • A supermarket database says that a customer buys a lot of medicine. What does this mean? What if this customer applies for a job at the supermarket chain?
– What about the right to hold/collect/interpret data?
  • Does a credit card company have the right to use/sell data about people's purchasing behavior?


9.2: Conceptual Database Layers

• Compare:

[Figure: actual data storage -> Operating System -> data seen in terms of a sequential view]


9.3: The Relational Model

• Relational Model
– shows data as being stored in rectangular tables, called relations
– a row in a relation is called a ‘tuple’
– a column in a relation is called an ‘attribute’


9.3: Issues of Relational Design

• So, relations make up a relational database…
• … but this is not so straightforward:

• Problem: more than one concept combined in single relation


9.3: Redesign by extraction of 3 concepts

Any information obtained by combining information from multiple relations


9.3: Example:

• Finding all departments in which employee 23Y34 has worked:


9.3: Relational Operations

• Extracting information from a relational database by way of relational operations
– Most important ones:
  • (1) extract tuples (rows): SELECT
  • (2) extract attributes (columns): PROJECT
  • (3) combine relations: JOIN

• Such operations on relations produce other relations
– so: they can be used in combination, to create complex database requests (or ‘queries’)
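The three operations can be sketched as plain functions over lists of dictionaries, where each dictionary is a tuple and each key an attribute. This is only an illustration: the function signatures and the `STAFF` and `DEPT` relations are hypothetical, and a real database system implements these operations very differently.

```python
def select(relation, predicate):
    """SELECT: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the named attributes (columns), dropping duplicates."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, left_name, right_name, predicate):
    """JOIN: combine each pair of tuples that satisfies the predicate.
    Attribute names are prefixed with the name of their relation."""
    result = []
    for lrow in left:
        for rrow in right:
            if predicate(lrow, rrow):
                combined = {f"{left_name}.{k}": v for k, v in lrow.items()}
                combined.update({f"{right_name}.{k}": v for k, v in rrow.items()})
                result.append(combined)
    return result


# Hypothetical relations, only for illustration.
STAFF = [
    {"Id": 1, "Dept": "Sales"},
    {"Id": 2, "Dept": "Legal"},
    {"Id": 3, "Dept": "Sales"},
]
DEPT = [{"Dept": "Sales", "Floor": 2}, {"Dept": "Legal", "Floor": 5}]

# Each operation returns a relation, so the operations can be chained.
sales_ids = project(select(STAFF, lambda t: t["Dept"] == "Sales"), ["Id"])
print(sales_ids)  # [{'Id': 1}, {'Id': 3}]

joined = join(STAFF, DEPT, "STAFF", "DEPT", lambda s, d: s["Dept"] == d["Dept"])
print(project(joined, ["STAFF.Id", "DEPT.Floor"]))
```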


9.3: The SELECT operation


9.3: The PROJECT operation


9.3: The JOIN operation


Exercise: Chapter 9 - Problem 10

X relation
U  V  W
A  Z  5
B  D  3
C  Q  5

Y relation
R  S
3  J
4  K

• RESULT := PROJECT W from X
• RESULT := SELECT from X where W=5
• RESULT := PROJECT S from Y
• RESULT := JOIN X and Y where X.W > Y.R

RESULT of the JOIN:
X.U  X.V  X.W  Y.R  Y.S
A    Z    5    3    J
A    Z    5    4    K
C    Q    5    3    J
C    Q    5    4    K
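The four operations of this exercise can be replayed on the X and Y relations with ordinary comprehensions. This is a sketch, not the book's notation: the `XRow` and `YRow` types are made up, and PROJECT is assumed to treat a relation as a set, so duplicate values collapse.

```python
from collections import namedtuple

XRow = namedtuple("XRow", "U V W")
YRow = namedtuple("YRow", "R S")

X = [XRow("A", "Z", 5), XRow("B", "D", 3), XRow("C", "Q", 5)]
Y = [YRow(3, "J"), YRow(4, "K")]

# PROJECT W from X (assumption: duplicates collapse, as in a set)
project_w = sorted({x.W for x in X})

# SELECT from X where W=5
select_w5 = [x for x in X if x.W == 5]

# PROJECT S from Y
project_s = [y.S for y in Y]

# JOIN X and Y where X.W > Y.R
joined = [tuple(x) + tuple(y) for x in X for y in Y if x.W > y.R]

print(project_w)  # [3, 5]
print(select_w5)  # the tuples A Z 5 and C Q 5
print(project_s)  # ['J', 'K']
for row in joined:
    print(*row)   # A Z 5 3 J / A Z 5 4 K / C Q 5 3 J / C Q 5 4 K
```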


Exercise: Chapter 9 - Problem 11

PART relation
PartName  Weight
Bolt 2X   1
Bolt 2Z   1.5
Nut V5    0.5

MANUFACTURER relation
CompanyName  PartName  Cost
Company X    Bolt 2Z   .03
Company X    Nut V5    .01
Company Y    Bolt 2X   .02
Company Y    Nut V5    .01
Company Y    Bolt 2Z   .04
Company Z    Nut V5    .01

• a) Which companies make Bolt 2Z?
– NEW := SELECT from MANUFACTURER where PartName = Bolt 2Z
– RESULT := PROJECT CompanyName from NEW


Exercise: Chapter 9 - Problem 11

• b) Obtain a list of the parts (+ cost) made by Company X.
– NEW := SELECT from MANUFACTURER where CompanyName = Company X
– RESULT := PROJECT PartName, Cost from NEW


Exercise: Chapter 9 - Problem 11

• c) Which companies make a part with weight 1?
– NEW1 := JOIN MANUFACTURER and PART where MANUFACTURER.PartName = PART.PartName
– NEW2 := SELECT from NEW1 where PART.Weight = 1
– RESULT := PROJECT MANUFACTURER.CompanyName from NEW2


Exercise: Chapter 9 - Problem 11

• c) Which companies make a part with weight 1? (alternative: SELECT before JOIN)
– NEW1 := SELECT from PART where Weight = 1
– NEW2 := JOIN MANUFACTURER and NEW1 where MANUFACTURER.PartName = NEW1.PartName
– RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
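Both orderings for part c) can be compared in a short sketch. The relations are written as plain Python tuples (an assumption for brevity); the data is that of the PART and MANUFACTURER relations of this exercise.

```python
PART = [("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5)]
MANUFACTURER = [
    ("Company X", "Bolt 2Z", 0.03),
    ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02),
    ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04),
    ("Company Z", "Nut V5", 0.01),
]

# Plan 1: JOIN first, then SELECT on weight, then PROJECT the company name.
# A joined row is (CompanyName, PartName, Cost, PartName, Weight).
new1 = [m + p for m in MANUFACTURER for p in PART if m[1] == p[0]]
new2 = [row for row in new1 if row[4] == 1]
plan1 = {row[0] for row in new2}

# Plan 2: SELECT on PART first, so the JOIN produces fewer tuples.
light = [p for p in PART if p[1] == 1]
joined = [m + p for m in MANUFACTURER for p in light if m[1] == p[0]]
plan2 = {row[0] for row in joined}

print(plan1, plan2)            # {'Company Y'} {'Company Y'}
print(len(new1), len(joined))  # 6 1
```

Both plans give the same answer; selecting first merely keeps the intermediate relation smaller.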


Chapter 9 - Database Structures: Conclusions

• Database Structures:
– (large) integrated collections of data that can be accessed quickly

• Database Management System
– provides high-level view of actual data storage (database model)

• Relational Model most often used
– relational operations: SELECT, PROJECT, JOIN, …
– high-level language for database access: SQL
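To give an impression of SQL, the sketch below asks which companies make Bolt 2Z, using Python's built-in `sqlite3` module and an in-memory database. The table and column names are chosen for this example; the data is that of the MANUFACTURER relation from the exercises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Manufacturer (CompanyName TEXT, PartName TEXT, Cost REAL)"
)
conn.executemany(
    "INSERT INTO Manufacturer VALUES (?, ?, ?)",
    [
        ("Company X", "Bolt 2Z", 0.03),
        ("Company X", "Nut V5", 0.01),
        ("Company Y", "Bolt 2X", 0.02),
        ("Company Y", "Nut V5", 0.01),
        ("Company Y", "Bolt 2Z", 0.04),
        ("Company Z", "Nut V5", 0.01),
    ],
)

# The WHERE clause plays the role of the relational SELECT;
# the column list after the SQL keyword SELECT plays the role of PROJECT.
rows = conn.execute(
    "SELECT CompanyName FROM Manufacturer WHERE PartName = ? "
    "ORDER BY CompanyName",
    ("Bolt 2Z",),
).fetchall()
print(rows)  # [('Company X',), ('Company Y',)]
conn.close()
```

Note the naming clash: the SQL keyword SELECT mostly corresponds to the relational PROJECT, while the relational SELECT is expressed by WHERE.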


Overzicht Informatica – Exam (1)

• Most important sections (edition 7) & keywords:

– Ch. 0 - 1, 3, 4: abstraction / algorithm

– Ch. 1 - 1, 2, 3, 4, 5, 6, 7: bits / data storage & representation (ASCII, etc.) / Boolean operations / flip-flops / memory types and characteristics / number systems (binary, hexadecimal, etc.) / overflow & truncation errors

– Ch. 2 - 1, 2, 3, 4, 6: CPU architecture / machine language & instructions / program execution / machine cycle / alternative architectures

– Ch. 3 - 1, 2, 3, 4: operating systems / batch processing / time-sharing / multitasking / OS components / process vs. program / competition

– Ch. 4 - 1, 2, 3, 4, 5, 6: algorithm (formal) / primitives / pseudocode / syntax / semantics / iteration / loop control / recursion / efficiency


Overzicht Informatica – Exam (2)

• Most important sections (edition 7) & keywords:

– Ch. 5 - 1, 2, 3, 4, 5: generations: 1st, 2nd, 3rd / assembly language / compilers / machine independence / paradigms / imperative / object-oriented / programming concepts / procedures / parameters / call by value/reference

– Ch. 6 - 1, 2, 3: software life cycle / development phase / modularity / coupling / cohesion / documentation / complexity measure for software

– Ch. 7 - 1, (2-5): data structures / abstraction / static vs. dynamic / pointers / (arrays, lists, stacks, queues, etc.)

– Ch. 8 - 1, 2, 3, 4: files / sequential / text / indexed / hashing

– Ch. 9 - 1, 2, 3: databases vs. ‘flat’ files / relations / tuples / attributes / relational operations: SELECT, PROJECT, JOIN


Overzicht Informatica – Exam (3)

• Not exam material:
– Ch. 3.5 - 3.7 (edition 7): Networks
– Ch. 4 (edition 8): Networking and the Internet
– Ch. 10 (editions 7 & 8): Artificial Intelligence
– Ch. 11 (editions 7 & 8): Theory of Computation

Good luck!