Slide 8-1 Copyright © 2003 Pearson Education, Inc.
Overzicht Informatica (Overview of Computer Science)
Lecture 9 – November 1
Computer Science: An Overview
Edition 7, J. Glenn Brookshear
Slide 8-2
CHAPTER 8 (now chap. 9, 2nd part)
File Structures
• Abstractions of the actual data organization on mass storage
• Again: differences between conceptual and actual data organization
Slide 8-3
8.1: Files, Directories & the Operating System
• OS storage structure:
  – conceptual hierarchy of directories and files
[Figure: directory tree containing files]
Slide 8-4
8.1: Files: Conceptual vs. Actual View
• View at OS-level is conceptual
  – actual storage may differ significantly!
Slide 8-5
8.2: Sequential Files
• To ‘remember’ where data resides on disk, the OS maintains a list of sectors for each file
• Result: sequential view of scattered set of data
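The sector list idea can be sketched in a few lines of Python. This is a toy model only: the sector numbers, sector contents, and the file name greeting.txt are invented for illustration, and a real OS stores this bookkeeping very differently.

```python
# Toy model: the "disk" maps a sector number to the raw bytes stored there.
# Sector numbers and contents are invented for illustration.
disk = {
    7: b"Hello, ",
    2: b"seque",
    9: b"ntial ",
    4: b"world!",
}

# Per file, the OS keeps the ordered list of sectors that hold its data.
sector_lists = {"greeting.txt": [7, 2, 9, 4]}


def read_sequential(filename):
    """Yield the file's bytes in order by following its sector list."""
    for sector in sector_lists[filename]:
        yield disk[sector]


content = b"".join(read_sequential("greeting.txt"))
print(content.decode("ascii"))  # Hello, sequential world!
```

The data is scattered over sectors 7, 2, 9 and 4, yet the reader of the file only sees one continuous stream.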
Slide 8-6
8.2: Text Files
• Sequential file consisting of long string of encoded characters (e.g. ASCII-code)
  – But: character-string still interpreted by word processor!
[Figure: the same file shown in “Notepad” and in “MS Word”]
Slide 8-7
8.2: Text files & Markup Languages (e.g. HTML)
Slide 8-8
8.2: From actual storage to conceptual view
[Figure: actual storage -> assembly by Operating System -> sequential view (sequential buffer) -> interpretation by Application Program -> conceptual view]
Slide 8-9
8.2: Data Conversion
• When programming: note that data transfer to/from file may involve data conversion:
  – e.g., from two’s complement notation to ASCII
• So: again it’s about the interpretation of data
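A minimal sketch of such a conversion, assuming a 16-bit big-endian two's complement representation in memory (the width, the byte order, and the value -25 are choices made for this example, not taken from the slides):

```python
import struct

value = -25

# In memory: 16-bit two's complement (big-endian, chosen for readability).
binary_form = struct.pack(">h", value)

# In a text file: the characters '-', '2', '5', each encoded in ASCII.
text_form = str(value).encode("ascii")

print(binary_form.hex())  # ffe7
print(text_form.hex())    # 2d3235

# Reading the data back reverses each conversion.
print(struct.unpack(">h", binary_form)[0])  # -25
print(int(text_form.decode("ascii")))       # -25
```

Both byte sequences represent the same number; only the interpretation differs.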
Slide 8-10
8.3: Quick File Access
• Disadvantage of sequential files:
  – no quick access to particular file data
• Two techniques to overcome this problem:
  – (1) Indexing or (2) Hashing
• Indexing:

Indexed file (the first column holds the keys):
12N67  John Smith        23-Jul-71  17,000.00  New York         …
13C08  Andrew White      27-Jun-70  24,500.00  Boston           …
23G19  Mary Jackson      5-Mar-39   41,000.00  San Francisco    …
24X17  Eleanor Tracy     17-Sep-63   9,635.00  Fort Lauderdale  …
26X28  Michael Flanagan  1-Nov-44   18,800.00  Washington       …
32E76  Glenn White       29-Feb-68  17,000.00  Detroit          …
36Z05  Virginia Moore    27-Jun-70  32,000.00  San Francisco    …
:      :                 :          :          :                …

Index (loaded into main memory when the file is opened):
12N67  location
13C08  location
23G19  location
24X17  location
26X28  location
32E76  location
36Z05  location
:      :
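As a rough sketch of indexing, the index can be modelled as an in-memory dictionary from key to record location. Here the "location" is assumed to be a byte offset, and an io.BytesIO object stands in for the file on mass storage; both are assumptions of this example.

```python
import io

# A few records from the slide, one per line; the first field is the key.
records = [
    "12N67 John Smith 23-Jul-71 17,000.00 New York",
    "13C08 Andrew White 27-Jun-70 24,500.00 Boston",
    "23G19 Mary Jackson 5-Mar-39 41,000.00 San Francisco",
]

data_file = io.BytesIO()  # stand-in for the file on mass storage
index = {}                # key -> byte offset of the record ("location")
for record in records:
    key = record.split()[0]
    index[key] = data_file.tell()
    data_file.write(record.encode("ascii") + b"\n")


def lookup(key):
    """Jump straight to the record via the index; no sequential scan."""
    data_file.seek(index[key])
    return data_file.readline().decode("ascii").rstrip("\n")


print(lookup("23G19"))  # 23G19 Mary Jackson 5-Mar-39 41,000.00 San Francisco
```

The extra indirection is visible in lookup: first the index is consulted, then the file is read at the location found there.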
Slide 8-11
Exercise: Chapter 8 - Problem 10
Why is a ‘patient identification number’ a better choice for a key field than the last name of each patient?
• If key unique:
  – additional sequential search never required
• Patient’s last name is not always unique
Slide 8-12
8.3: Inverted Files
• Variation to (single) indexing: inverted file
Slide 8-13
8.4: Hashing
• Disadvantage of indexing is… the index
  – requires extra space + includes 1 extra indirection
• Solution: ‘hashing’
  – finds position in file using a key value (as in indexing)…
  – … simply by identifying location directly from the key
• How?
  – define set of ‘buckets’ & ‘hash function’ that converts keys to bucket numbers …
[Figure: key value -> hash function -> bucket number (buckets 0 1 2 3 … N)]
Slide 8-14
8.4: Hash Function: Example
• If storage space is divided into 40 buckets and the hash function is division (bucket number = remainder of key / 40):
  – key values 14, 54, & 94 all map onto the same bucket, number 14 (collision)
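The collision in this example can be checked with a short sketch. The function name bucket_number is made up for this illustration.

```python
NUM_BUCKETS = 40


def bucket_number(key):
    """Division hashing: the bucket is the remainder of key / NUM_BUCKETS."""
    return key % NUM_BUCKETS


for key in (14, 54, 94):
    print(key, "->", bucket_number(key))
# 14 -> 14
# 54 -> 14
# 94 -> 14
```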
Slide 8-15
8.4: Key field value can be anything
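One possible way to hash a non-numeric key, sketched under assumptions: the key's ASCII bytes are read as one large integer, which is then divided by the number of buckets. The function name bucket_for and the bucket count of 41 are choices made for this example.

```python
def bucket_for(key, num_buckets):
    """Hash any text key by reading its ASCII bytes as one large integer."""
    as_integer = int.from_bytes(key.encode("ascii"), byteorder="big")
    return as_integer % num_buckets


# A one-character key is easy to check by hand: 'A' is 65 in ASCII, 65 % 41 = 24.
print(bucket_for("A", 41))  # 24

# Longer keys, such as the record keys used earlier, work the same way.
print(bucket_for("12N67", 41))
```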
Slide 8-16
8.4: Handling Bucket Overflow
• When bucket sizes are fixed:
  – buckets can fill up and overflow
• One solution:
  – designate special overflow storage area (not fixed in size!)
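A minimal sketch of fixed-size buckets with a shared overflow area. The bucket capacity of 2, the 40 buckets, and the record contents are assumptions chosen to force an overflow with the keys 14, 54 and 94.

```python
BUCKET_SIZE = 2   # fixed capacity per bucket (assumed for the demo)
NUM_BUCKETS = 40

buckets = [[] for _ in range(NUM_BUCKETS)]
overflow_area = []  # not fixed in size


def insert(key, record):
    """Store the record in its bucket, or in the overflow area if the bucket is full."""
    bucket = buckets[key % NUM_BUCKETS]
    if len(bucket) < BUCKET_SIZE:
        bucket.append((key, record))
    else:
        overflow_area.append((key, record))


def find(key):
    """Search the key's bucket first, then the overflow area."""
    for stored_key, record in buckets[key % NUM_BUCKETS] + overflow_area:
        if stored_key == key:
            return record
    return None


for k in (14, 54, 94):
    insert(k, f"record {k}")

print(buckets[14])     # [(14, 'record 14'), (54, 'record 54')]
print(overflow_area)   # [(94, 'record 94')]
print(find(94))        # record 94
```

Finding key 94 requires a sequential search of bucket 14 followed by a search of the overflow area, which is the cost that hashing pays for not having an index.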
Slide 8-17
Exercise: Chapter 8 - Problem 22
If we use division as a hash function and have 23 buckets, in which bucket should we search to find the record whose key is interpreted as the integer value 101?
• Division: 101 / 23 = 4, remainder 9
• So: search in bucket number 9 (the 23 buckets are numbered 0 1 2 … 9 … 22)
Slide 8-18
Exercise: Chapter 8 - Problem 16
a) What advantage does an indexed file have over a hash file?
b) What advantage does a hash file have over an indexed file?
• a) When the key is unique: the index points directly to the required data, while hashing often requires an additional (sequential) bucket search (incl. bucket overflow).
• b) No additional index file storage is required.
Slide 8-19
Chapter 8 - File Structures: Conclusions
• File Structures:
  – abstractions of actual data organization on mass storage
• Changes of ‘view’:
  – actual storage -> sequential view by OS -> conceptual view presented to user
• Quick access to particular file data by
  – (1) indexing (many forms)
  – (2) hashing (requires no index, but requires bucket search!)
Slide 8-20
CHAPTER 9
Database Structures
• (Large) integrated collections of data that can be accessed quickly
• Combination of data structures (chap. 7) and file structures (chap. 8)
Slide 8-21
9.1: Historical Perspective
• Originally: departments of large organizations stored all data separately in flat files
• Problems: redundancy & inconsistencies
Slide 8-22
9.1: Integrated Database System
• Better approach: integrate all data in a single system, to be accessed by all departments
Slide 8-23
9.1: Disadvantages of Data Integration
• Disadvantages:
  – Control of access to sensitive data?!
    • For example: the human resources department has no business with personal data stored by the company doctor!
  – Misinterpretation of integrated data
    • A supermarket database says that a customer buys a lot of medicine. What does this mean? What if this customer applies for a job at the supermarket chain?
  – What about the right to hold/collect/interpret data?
    • Does a credit card company have the right to use/sell data about people's purchasing behavior?
Slide 8-24
9.2: Conceptual Database Layers
• Compare:
[Figure: Operating System layer: actual data storage -> data seen in terms of a sequential view]
Slide 8-25
9.3: The Relational Model
• Relational Model
  – shows data as being stored in rectangular tables, called relations
  – row in a relation is called ‘tuple’
  – column in a relation is called ‘attribute’
Slide 8-26
9.3: Issues of Relational Design
• So, relations make up a relational database…
• … but this is not so straightforward:
• Problem: more than one concept combined in single relation
Slide 8-27
9.3: Redesign by extraction of 3 concepts
• Any information obtained by combining information from multiple relations
Slide 8-28
9.3: Example:
• Finding all departments in which employee 23Y34 has worked:
Slide 8-29
9.3: Relational Operations
• Extracting information from a relational database by way of relational operations
  – Most important ones:
    • (1) extract tuples (rows) : SELECT
    • (2) extract attributes (columns) : PROJECT
    • (3) combine relations : JOIN
• Such operations on relations produce other relations
  – so: they can be used in combination, to create complex database requests (or ‘queries’)
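The three operations can be sketched in Python, assuming a relation is represented as a list of dictionaries (one dictionary per tuple, keyed by attribute name). The function names and this representation are choices made for illustration, not how a database system implements them. The X and Y relations are those of Chapter 9, Problem 10.

```python
def select(relation, predicate):
    """SELECT: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the named attributes (columns), dropping duplicates."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, left_name, right_name, predicate):
    """JOIN: pair each left row with each right row and keep pairs passing the test.

    Attribute names are prefixed with the relation name, as in X.U or Y.R.
    """
    result = []
    for lrow in left:
        for rrow in right:
            if predicate(lrow, rrow):
                combined = {f"{left_name}.{k}": v for k, v in lrow.items()}
                combined.update({f"{right_name}.{k}": v for k, v in rrow.items()})
                result.append(combined)
    return result


X = [{"U": "A", "V": "Z", "W": 5},
     {"U": "B", "V": "D", "W": 3},
     {"U": "C", "V": "Q", "W": 5}]
Y = [{"R": 3, "S": "J"}, {"R": 4, "S": "K"}]

print(project(X, ["W"]))                      # [{'W': 5}, {'W': 3}]
print(select(X, lambda row: row["W"] == 5))   # the rows for A and C
joined = join(X, Y, "X", "Y", lambda x, y: x["W"] > y["R"])
print(len(joined))                            # 4
```

Each function returns a new relation in the same representation, so the results can be fed into further operations.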
Slide 8-30
9.3: The SELECT operation
Slide 8-31
9.3: The PROJECT operation
Slide 8-32
9.3: The JOIN operation
Slide 8-33
Exercise: Chapter 9 - Problem 10
X relation
U  V  W
A  Z  5
B  D  3
C  Q  5

Y relation
R  S
3  J
4  K

• RESULT := PROJECT W from X
• RESULT := SELECT from X where W=5
• RESULT := PROJECT S from Y
• RESULT := JOIN X and Y where X.W > Y.R

RESULT (shown for the JOIN request)
X.U  X.V  X.W  Y.R  Y.S
A    Z    5    3    J
A    Z    5    4    K
C    Q    5    3    J
C    Q    5    4    K
Slide 8-34
Exercise: Chapter 9 - Problem 11
PART relation
PartName  Weight
Bolt 2X   1
Bolt 2Z   1.5
Nut V5    0.5

MANUFACTURER relation
CompanyName  PartName  Cost
Company X    Bolt 2Z   .03
Company X    Nut V5    .01
Company Y    Bolt 2X   .02
Company Y    Nut V5    .01
Company Y    Bolt 2Z   .04
Company Z    Nut V5    .01

• a) Which companies make Bolt 2Z?
  – NEW := SELECT from MANUFACTURER where PartName = Bolt 2Z
  – RESULT := PROJECT CompanyName from NEW
Slide 8-35
Exercise: Chapter 9 - Problem 11
(PART and MANUFACTURER relations as on slide 8-34)
• b) Obtain a list of the parts (+cost) made by Company X
  – NEW := SELECT from MANUFACTURER where CompanyName = Company X
  – RESULT := PROJECT PartName, Cost from NEW
Slide 8-36
Exercise: Chapter 9 - Problem 11
(PART and MANUFACTURER relations as on slide 8-34)
• c) Which companies make a part with weight 1?
  – NEW1 := JOIN MANUFACTURER and PART where MANUFACTURER.PartName = PART.PartName
  – NEW2 := SELECT from NEW1 where PART.Weight = 1
  – RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
Slide 8-37
Exercise: Chapter 9 - Problem 11
(PART and MANUFACTURER relations as on slide 8-34)
• c) Which companies make a part with weight 1? (alternative: SELECT first, then JOIN)
  – NEW1 := SELECT from PART where Weight = 1
  – NEW2 := JOIN MANUFACTURER and NEW1 where MANUFACTURER.PartName = NEW1.PartName
  – RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
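The two answers to part c) can be compared with a small sketch. Relations are represented here as lists of plain Python tuples, which is an assumption of this example; the variable names are made up.

```python
# (PartName, Weight)
PART = [("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5)]

# (CompanyName, PartName, Cost)
MANUFACTURER = [
    ("Company X", "Bolt 2Z", 0.03),
    ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02),
    ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04),
    ("Company Z", "Nut V5", 0.01),
]

# Plan 1: JOIN first, then SELECT, then PROJECT.
joined_all = [(m, p) for m in MANUFACTURER for p in PART if m[1] == p[0]]
selected = [(m, p) for (m, p) in joined_all if p[1] == 1]
result_1 = {m[0] for (m, p) in selected}

# Plan 2: SELECT first, then JOIN, then PROJECT.
light_parts = [p for p in PART if p[1] == 1]
joined_few = [(m, p) for m in MANUFACTURER for p in light_parts if m[1] == p[0]]
result_2 = {m[0] for (m, p) in joined_few}

print(result_1, result_2)                 # {'Company Y'} {'Company Y'}
print(len(joined_all), len(joined_few))   # 6 1
```

Both plans give the same answer. In this data set the second plan builds a much smaller intermediate relation (1 row instead of 6), which suggests why the order of operations can matter.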
Slide 8-38
Chapter 9 - Database Structures: Conclusions
• Database Structures:
  – (large) integrated collections of data that can be accessed quickly
• Database Management System
  – provides high-level view of actual data storage (database model)
• Relational Model most often used
  – relational operations: SELECT, PROJECT, JOIN, …
  – high-level language for database access: SQL
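As an illustration of SQL, the query "Which companies make a part with weight 1?" from Problem 11 can be run against an in-memory SQLite database using Python's standard sqlite3 module. The choice of SQLite and the column types are assumptions of this sketch; the comments mark which part of the statement plays the role of each relational operation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART (PartName TEXT, Weight REAL)")
conn.execute("CREATE TABLE MANUFACTURER (CompanyName TEXT, PartName TEXT, Cost REAL)")
conn.executemany(
    "INSERT INTO PART VALUES (?, ?)",
    [("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5)],
)
conn.executemany(
    "INSERT INTO MANUFACTURER VALUES (?, ?, ?)",
    [
        ("Company X", "Bolt 2Z", 0.03),
        ("Company X", "Nut V5", 0.01),
        ("Company Y", "Bolt 2X", 0.02),
        ("Company Y", "Nut V5", 0.01),
        ("Company Y", "Bolt 2Z", 0.04),
        ("Company Z", "Nut V5", 0.01),
    ],
)

rows = conn.execute(
    """
    SELECT DISTINCT MANUFACTURER.CompanyName               -- PROJECT
    FROM MANUFACTURER
    JOIN PART ON MANUFACTURER.PartName = PART.PartName     -- JOIN
    WHERE PART.Weight = 1                                  -- SELECT
    """
).fetchall()
print(rows)  # [('Company Y',)]
conn.close()
```

Note that the SQL keyword SELECT covers more than the relational SELECT operation: the column list after it corresponds to PROJECT, while the WHERE clause does the row selection.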
Slide 8-39
Overzicht Informatica – Exam (1)
• Most important sections (edition 7) & keywords:
  – Ch. 0 - 1, 3, 4: abstraction / algorithm
  – Ch. 1 - 1, 2, 3, 4, 5, 6, 7: bits / data storage & representation (ASCII, etc.) / Boolean operations / flip-flops / memory types and characteristics / number systems (binary, hexadecimal, etc.) / overflow & truncation errors
  – Ch. 2 - 1, 2, 3, 4, 6: CPU architecture / machine language & instructions / program execution / machine cycle / alternative architectures
  – Ch. 3 - 1, 2, 3, 4: operating systems / batch processing / time-sharing / multitasking / OS components / process vs. program / competition
  – Ch. 4 - 1, 2, 3, 4, 5, 6: algorithm (formal) / primitives / pseudocode / syntax / semantics / iteration / loop control / recursion / efficiency
Slide 8-40
Overzicht Informatica – Exam (2)
• Most important sections (edition 7) & keywords:
  – Ch. 5 - 1, 2, 3, 4, 5: generations: 1st, 2nd, 3rd / assembly language / compilers / machine independence / paradigms / imperative / object-oriented / programming concepts / procedures / parameters / call by value/reference
  – Ch. 6 - 1, 2, 3: software life cycle / development phase / modularity / coupling / cohesion / documentation / complexity measure for software
  – Ch. 7 - 1, (2-5): data structures / abstraction / static vs. dynamic / pointers / (arrays, lists, stacks, queues, etc.)
  – Ch. 8 - 1, 2, 3, 4: files / sequential / text / indexed / hashing
  – Ch. 9 - 1, 2, 3: databases vs. ‘flat’ files / relations / tuples / attributes / relational operations: SELECT, PROJECT, JOIN
Slide 8-41
Overzicht Informatica – Exam (3)
• Not exam material:
  – Ch. 3.5 - 3.7 (edition 7) : Networks
  – Ch. 4 (edition 8) : Networking and the Internet
  – Ch. 10 (edition 7 & 8) : Artificial Intelligence
  – Ch. 11 (edition 7 & 8) : Theory of Computation
Good luck!