VCE IT Theory Slideshows - ITA
By Mark Kelly, McKinnon Secondary College, Vceit.com
Updated by Jenny Gielb, Chisholm Institute of TAFE, Dandenong
Database Normalisation
Version 1
Contents
• What is normalisation?
• Why normalise?
• Normal forms 1, 2, 3
What is normalisation?
• Organising the data in a relational database so…
– Data repetition is minimised
– Data access is maximised
Why normalise?
• Removing data repetition saves storage space, speeds up data access and reduces errors.
• Changes need only be made in one place rather than in many places.
• More powerful data access is possible.
• Allows more information to be easily stored.
• Allows users to get all sorts of information out of the stored data, e.g. How many widgets did we sell last month?
The normal forms
• Are called 1NF (first normal form) to 5NF, but only 1-3 matter here.
• Are guidelines (not laws) for structuring database tables and fields.
• Note: they are often applied instinctively as part of skilled database design, and are not an extra step to do after databases are created.
REMEMBER – 1st and 2nd normal forms are stages/steps towards achieving the objective, which is 3rd normal form.
History of Data Storage Techniques
• Data first stored as records only, everything on one line, usually on a tape – Sequential
• To get to a certain record you had to read through all the other records first, and start at the beginning each time. Took forever!
History of Data Storage Techniques
• Hard disks, and indexing, allowed businesses to store data more effectively.
• The data can then be stored in different areas on the hard disk and an index used to access it
Database Indexes
• Indexes become very important
• An index is a list that records where everything is placed on the hard disks
– The disk/platter
– The track number
– The section of track
Database Indexes
• This meant that data could be stored anywhere on hard disks; it didn’t all have to be together
• The index would find the required data no matter what information you entered
• Also, computers were getting much faster, so accessing this data became faster and easier, and more complex indexes could be built
• Have you ever looked up the index of a recipe book? You can look up Chocolate Sponge cake under Chocolate and Cakes
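The recipe-book idea can be sketched in a few lines of Python. This is only an illustration, not how any particular database builds its indexes: the recipes other than Chocolate Sponge Cake and the function names are made up for the example.

```python
# Hypothetical records, stored in no particular order (like data scattered on a disk).
records = [
    {"title": "Chocolate Sponge Cake", "keywords": ["Chocolate", "Cakes"]},
    {"title": "Lemon Tart", "keywords": ["Lemon", "Tarts"]},
    {"title": "Carrot Cake", "keywords": ["Carrot", "Cakes"]},
]


def build_index(rows):
    """Map each keyword to the positions of the records filed under it."""
    index = {}
    for position, row in enumerate(rows):
        for keyword in row["keywords"]:
            index.setdefault(keyword, []).append(position)
    return index


def look_up(index, rows, keyword):
    """Jump straight to the matching records instead of reading every record."""
    return [rows[position]["title"] for position in index.get(keyword, [])]


index = build_index(records)
print(look_up(index, records, "Chocolate"))  # ['Chocolate Sponge Cake']
print(look_up(index, records, "Cakes"))      # ['Chocolate Sponge Cake', 'Carrot Cake']
```

The same record is reachable under more than one heading, which is exactly what the recipe-book index does.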
Hierarchical databases
• The first types of databases
Hierarchical databases
• Data flowed from top to bottom.
• To get the price of cucumber, you had to know that it was Produce.
• Slow, could only answer a few questions, and needed complex programs to use them.
• Could not answer the question ‘What aisle are the lettuces in?’ Quicker to go find a shelf-packer.
Relational Databases
Then Edgar F. Codd (in 1970) proposed the relational model, a more flexible way of organising data that:
• allowed access to all the data from any angle,
• used codes to link tables together,
• used ‘relationships’ to show the links between tables
Relational Databases
To answer the question – What aisle are Lebanese cucumbers in? The database uses the Item Type Code to look for the Contents Code to get the answer – Produce in Aisle 1
The Challenge
• The challenge is to get data into these meaningful, organised groupings
• The data that you, as a programmer, will be presented with will be in a mess!
• If you are lucky, important information will be in spreadsheets, but it could be in files, handwritten on scraps of paper, stuck on the side of the filing cabinet, even on the back of the office toilet door!
Steps
1. Collect all the data
2. Find out what information the users want from the data
3. Design the database
4. Organise the data:
– Break it down into meaningful groups of data
– Work out your linking codes so that each table points to another one
– Work out which data is being changed all the time and which data is changed rarely
As you organise the data, you usually go through stages. Working through these stages is called normalising the data.
The Normal Forms
• First Normal Form (1NF) – fields split up properly
• Second Normal Form (2NF) – first stage of breaking up the data into meaningful groupings called tables, some codes used
• Third Normal Form (3NF) – data completely broken up into tables and linked by codes
1NF
• First Normal Form sets the most basic rules for an organised database
• The 1NF guidelines are common sense.
1. Eliminate duplicate data where possible
2. Break up fields so only one data item is in each field
3. Convert any data into the correct format
4. Start to organise the data into meaningful groupings
Things 1NF wants
• No duplicate rows (records). Each row must be unique in some way.
• Each field entry can only contain one piece of data.
– A name field containing “Fred Smith” has surname and first name, violating 1NF.
– A phone number field with more than one phone number entered for a person also violates 1NF.
Things 1NF wants
Each field entry can only contain one piece of data. Why?
• You cannot easily access the data embedded in the single field (e.g. grab a postcode)
• You can’t use embedded data for sorting
• You can’t use data like “2kg” as a number for calculations, sorting, summaries etc.
Your turn… repair this!
Customer ID  Name        Phone
111          Fred Smith  4566 3456
222          Mary Jones  4567 8900
333          Tim Blogs   3254 5676
Repaired!
Customer ID  FirstName  Surname
111          Fred       Smith
222          Mary       Jones
333          Tim        Blogs
Now, customers can be sorted and searched by first name and/or surname separately.
Also, the names can be used individually, like “Dear Fred” instead of “Dear Fred Smith”
Repair This!
Product ID  Colour  Weight
A345        Red     4kg
A568        Blue    300g
B695        White   1.5kg
Repaired!
Product ID  Colour  Weight (g)
A345        Red     4000
A568        Blue    300
B695        White   1500
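The weight repair can be sketched as a small conversion function. This is a minimal sketch that assumes every weight ends in either “kg” or “g”; the function name to_grams is made up for the example.

```python
def to_grams(weight_text):
    """Convert a weight such as '4kg' or '300g' into a whole number of grams."""
    text = weight_text.strip().lower()
    if text.endswith("kg"):
        return round(float(text[:-2]) * 1000)
    if text.endswith("g"):
        return round(float(text[:-1]))
    raise ValueError(f"Unrecognised weight: {weight_text!r}")


# The unrepaired Weight column from the table above.
weights = {"A345": "4kg", "A568": "300g", "B695": "1.5kg"}

grams = {product_id: to_grams(text) for product_id, text in weights.items()}
print(grams)                         # {'A345': 4000, 'A568': 300, 'B695': 1500}
print(sorted(grams, key=grams.get))  # ['A568', 'B695', 'A345'] (lightest first)
```

Once the weights are plain numbers in one unit, sorting and totals work as expected.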
Repair This!
• An address like “3 Fred St, Sale, 3586” has 3 pieces of data: street address, town, postcode.
Customer ID  Address
111          66 Lake Rd, Mentone, 3198
222          2/45 Richmond Lane, Richmond, 3121
333          135 Spring St, Melbourne, 3000
Repaired!
Now each field can be searched & sorted and used individually (e.g. addressing envelopes)
Customer ID  Street              Suburb     Postcode
111          66 Lake Rd          Mentone    3198
222          2/45 Richmond Lane  Richmond   3121
333          135 Spring St       Melbourne  3000
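Splitting the address field could look something like this. The sketch assumes every address is written as “street, suburb, postcode” with exactly two commas, which is true of the three sample rows but not of real-world data; split_address is a made-up name.

```python
def split_address(address):
    """Break 'street, suburb, postcode' into three separate fields."""
    street, suburb, postcode = [part.strip() for part in address.split(",")]
    return {"Street": street, "Suburb": suburb, "Postcode": postcode}


addresses = {
    111: "66 Lake Rd, Mentone, 3198",
    222: "2/45 Richmond Lane, Richmond, 3121",
    333: "135 Spring St, Melbourne, 3000",
}

repaired = {customer_id: split_address(text) for customer_id, text in addresses.items()}
print(repaired[222]["Suburb"])  # Richmond

# Each piece can now be used on its own, e.g. sorting customers by postcode.
by_postcode = sorted(repaired, key=lambda customer_id: repaired[customer_id]["Postcode"])
print(by_postcode)  # [333, 222, 111]
```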
2NF
2NF – Second Normal Form
• Achieving 2NF means 1NF has already been achieved
• Each normal form builds on the previous forms
• Removes more duplicate data.
• Deals with design problems that could threaten data integrity.
2NF – Second Normal Form
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors using unique keys.
CUSTOMER (raw data)
Customer ID  Name        Phone
111          Fred Smith  4566 3456
222          Mary Jones  4567 8900 (BH), 3456 2314 (AH)
333          Tim Blogs   3254 5676, 0402 697 495
First normal form…
Repetition removed. Fields broken up.
but…
Customer ID  Last Name  First Name  Phone1     Phone2
111          Smith      Fred        4566 3456
222          Jones      Mary        4567 8900  3456 2314
333          Blogs      Tim         3254 5676  0402 697 495
Problems:
• Trouble querying the table: “Which customer has phone # 3456 2314?” Have to search more than one field… messy.
• Can’t enforce validation rules to prevent duplicate phone #s
• Can’t enter three or more phone numbers
• Waste of space for all people with only 1 number
• If Mary Jones got married and changed her name, changes would need to be made in more than one record. If one change were missed, the integrity of the data would be damaged.
• Making multiple changes like this is also time-consuming, and the repeated data eats up storage space.
Solution: Put the phone numbers into their own table as there can be more than one phone number for each name.
2nd Normal Form (2NF)
CUSTOMER TABLE
Customer ID  Last Name  First Name
111          Smith      Fred
222          Jones      Mary
333          Blogs      Tim
PHONE TABLE
Customer ID  Phone
111          4566 3456
222          4567 8900
222          3456 2314
333          3254 5676
333          0402 697 495
Relationship
Called a ‘1 to many relationship’: one customer record to many phone numbers
Also written as 1:many or 1:∞
Database Design
• The design would be drawn like this
Benefits:
• Name changes now only need to be made once.
• Unlimited phone numbers for everyone!
• No need to search multiple Phone fields
• No need to search through all text to extract a particular phone number
• All we need is a 1:many relationship between the customer name table and the customer phone table, using the Customer ID as the key field.
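Here is one way the two tables and their 1:many relationship might be set up, using Python's built-in sqlite3 module. The table and column names, and Mary's new surname, are invented for the sketch; the rows are the ones from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    last_name   TEXT,
    first_name  TEXT
);
CREATE TABLE customer_phone (
    customer_id INTEGER REFERENCES customer(customer_id),  -- the key field linking the tables
    phone       TEXT UNIQUE                                -- rejects duplicate phone numbers
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (111, "Smith", "Fred"), (222, "Jones", "Mary"), (333, "Blogs", "Tim"),
])
conn.executemany("INSERT INTO customer_phone VALUES (?, ?)", [
    (111, "4566 3456"), (222, "4567 8900"), (222, "3456 2314"),
    (333, "3254 5676"), (333, "0402 697 495"),
])


def owner_of(phone):
    """Answer 'Which customer has this phone number?' by searching one field only."""
    return conn.execute(
        "SELECT c.first_name, c.last_name FROM customer c "
        "JOIN customer_phone p ON p.customer_id = c.customer_id "
        "WHERE p.phone = ?", (phone,)).fetchone()


print(owner_of("3456 2314"))  # ('Mary', 'Jones')

# A name change (hypothetical new surname) is made once, whatever the number of phones.
conn.execute("UPDATE customer SET last_name = ? WHERE customer_id = ?", ("Brown", 222))
print(owner_of("4567 8900"))  # ('Mary', 'Brown')

# The UNIQUE rule stops the same number being entered twice.
try:
    conn.execute("INSERT INTO customer_phone VALUES (?, ?)", (111, "3456 2314"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```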
2nd Normal Form (2NF)
Without 2NF: flat file. With 2NF: relational.
Department data is only stored once. So:
• Less storage space required
• Department changes are now only made once, not once for each worker in that dept!
Another example
2NF
Electric Toothbrush Models
Manufacturer  Model        Model Full Name       Manufacturer Country
Forte         X-Prime      Forte X-Prime         Italy
Forte         Ultraclean   Forte Ultraclean      Italy
Dent-o-Fresh  EZBrush      Dent-o-Fresh EZBrush  USA
Kobayashi     ST-60        Kobayashi ST-60       Japan
Hoch          Toothmaster  Hoch Toothmaster      Germany
Hoch          X-Prime      Hoch X-Prime          Germany
The table above is a problem. Let’s say {Model Full Name} is the primary key. The {Manufacturer Country} field is based on the {Manufacturer} field, so if a manufacturer changes its location, every row for that manufacturer has to be updated.
To be properly 2NF, you’d need to do this…
2NF
Manufacturer  Manufacturer Country
Forte         Italy
Forte         Italy
Dent-o-Fresh  USA
Kobayashi     Japan
Hoch          Germany
Hoch          Germany
Model        ModelFullName
X-Prime      Forte X-Prime
Ultraclean   Forte Ultraclean
EZBrush      Dent-o-Fresh EZBrush
ST-60        Kobayashi ST-60
Toothmaster  Hoch Toothmaster
X-Prime      Hoch X-Prime
Now the data is grouped – Manufacturer details in one table, Model details in the other, BUT how do you know which manufacturer makes which model now?
2NF
Make the same key fields in each table
Manufacturer  Manufacturer Country
Forte         Italy
Dent-o-Fresh  USA
Kobayashi     Japan
Hoch          Germany
Manufacturer  Model        ModelFullName
Forte         X-Prime      Forte X-Prime
Forte         Ultraclean   Forte Ultraclean
Dent-o-Fresh  EZBrush      Dent-o-Fresh EZBrush
Kobayashi     ST-60        Kobayashi ST-60
Hoch          Toothmaster  Hoch Toothmaster
Hoch          X-Prime      Hoch X-Prime
Set up the relationship between the key fields in each table
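To see how the shared Manufacturer field answers “which manufacturer makes which model?”, here is a rough sketch using plain Python dictionaries in place of tables. The function name and the change of country for Hoch are made up for illustration.

```python
# Manufacturer table: one row per manufacturer.
manufacturers = {"Forte": "Italy", "Dent-o-Fresh": "USA", "Kobayashi": "Japan", "Hoch": "Germany"}

# Model table: the Manufacturer field is the key that links back to the table above.
models = [
    ("Forte", "X-Prime"), ("Forte", "Ultraclean"), ("Dent-o-Fresh", "EZBrush"),
    ("Kobayashi", "ST-60"), ("Hoch", "Toothmaster"), ("Hoch", "X-Prime"),
]


def countries_making(model_name):
    """Follow the Manufacturer key from the model table into the manufacturer table."""
    return sorted(manufacturers[maker] for maker, model in models if model == model_name)


print(countries_making("X-Prime"))      # ['Germany', 'Italy'] (two manufacturers use this name)
print(countries_making("Toothmaster"))  # ['Germany']

# Hypothetical move: the country is changed in ONE place and every model follows.
manufacturers["Hoch"] = "Austria"
print(countries_making("Toothmaster"))  # ['Austria']
```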
3NF
Third normal form (3NF) goes one step further
• Use codes to minimise the amount of storage
• Use codes as links to other tables so you can find any information
• Sets up relationships between tables
• Each table only needs the fields that are dependent on its primary key
• Also divides data into reference and transaction data.
Using the previous example - 2NF
Manufacturer  Manufacturer Country
Forte         Italy
Dent-o-Fresh  USA
Kobayashi     Japan
Hoch          Germany
Manufacturer  Model        Model Full Name
Forte         X-Prime      Forte X-Prime
Forte         Ultraclean   Forte Ultraclean
Dent-o-Fresh  EZBrush      Dent-o-Fresh EZBrush
Kobayashi     ST-60        Kobayashi ST-60
Hoch          Toothmaster  Hoch Toothmaster
Hoch          X-Prime      Hoch X-Prime
3NF
To get it to 3rd normal form, replace repeating data with codes.
MCode  Manufacturer  Manufacturer Country
1      Forte         Italy
2      Dent-o-Fresh  USA
3      Kobayashi     Japan
4      Hoch          Germany
MCode  Model        ModelFullName
1      X-Prime      Forte X-Prime
1      Ultraclean   Forte Ultraclean
2      EZBrush      Dent-o-Fresh EZBrush
3      ST-60        Kobayashi ST-60
4      Toothmaster  Hoch Toothmaster
4      X-Prime      Hoch X-Prime
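Swapping the repeated manufacturer names for codes can be sketched like this. It is just one way to hand out codes (numbering from 1 in order of first appearance, an assumption that happens to match the MCode values above); assign_codes is a made-up helper.

```python
# The 2NF model table: the manufacturer name is repeated on every row.
rows_2nf = [
    ("Forte", "X-Prime"), ("Forte", "Ultraclean"), ("Dent-o-Fresh", "EZBrush"),
    ("Kobayashi", "ST-60"), ("Hoch", "Toothmaster"), ("Hoch", "X-Prime"),
]


def assign_codes(names):
    """Give each distinct name a code, numbered from 1 in order of first appearance."""
    codes = {}
    for name in names:
        if name not in codes:
            codes[name] = len(codes) + 1
    return codes


mcodes = assign_codes(manufacturer for manufacturer, _ in rows_2nf)
print(mcodes)  # {'Forte': 1, 'Dent-o-Fresh': 2, 'Kobayashi': 3, 'Hoch': 4}

# The 3NF model table stores only the short code, not the repeated name.
models_3nf = [(mcodes[manufacturer], model) for manufacturer, model in rows_2nf]
print(models_3nf[-2:])  # [(4, 'Toothmaster'), (4, 'X-Prime')]
```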
Reference and Transaction Data
All data can be classified as either reference data or transaction data.
Reference Data is data that rarely changes and is ‘referred’ to (or used in lookups):
• people’s names
• addresses
• products
Starts with a unique code that is used in other tables.
Reference and Transaction Data
Transaction Data is data that is regularly changed (edited, added or deleted):
• when a customer buys something,
• when someone withdraws money,
• when someone wins a tournament.
Usually has a unique code, a date, and information about the transaction, e.g. the purchase price and who made the purchase. Uses the codes set up in Reference Data tables.
3NF
Field name underlining indicates key fields.
You may have a gut feeling that this table is not good. But why?
3NF
Each attribute (‘field’) should be giving information about the key field (a particular tournament + year).
3NF
This is wrong because the DOB does not describe the key field (tournament). It describes a looked-up value (the tournament’s winner).
3NF FAIL
It’s like your mum keeping her knickers in your sock drawer because you’re related to her.
They don’t belong there!
Raw Data
1NF
First Name  Last Name    DOB         Tournament            Year
Chip        Masterton    14/03/1977  Indiana Invitational  1999
Al          Fredrickson  21/07/1975  Indiana Invitational  1998
Bob         Albertson    28/09/1968  Cleveland Open        1999
Al          Fredrickson  21/07/1975  Des Moines Masters    1999
• Data broken up into separate fields
• Date of birth converted into proper format
2NF
• Data grouped but …
• Data is still repeated
Players
Player Code  First Name  Last Name    DOB
1            Chip        Masterton    14/03/1977
2            Al          Fredrickson  21/07/1975
3            Bob         Albertson    28/09/1968
Tournament Winners
Player Code  First Name  Last Name    Tournament            Year
1            Chip        Masterton    Indiana Invitational  1999
2            Al          Fredrickson  Indiana Invitational  1998
3            Bob         Albertson    Cleveland Open        1999
2            Al          Fredrickson  Des Moines Masters    1999
3NF
• Data grouped meaningfully - Tournaments, Players, Winners
• No repeating data
• Codes used to link tables
• Relationships created
Tournaments
TournamentCode  Tournament
1               Indiana Invitational
2               Cleveland Open
3               Des Moines Masters
Players
Player Code  First Name  Last Name    DOB
1            Chip        Masterton    14/03/1977
2            Al          Fredrickson  21/07/1975
3            Bob         Albertson    28/09/1968
Tournament Winners
Player Code  TournamentCode  Year
1            1               1999
2            1               1998
2            3               1999
3            2               1999
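The three 3NF tables can be tried out with Python's built-in sqlite3 module. This is a sketch only: the table and column names are invented, and making (tournament code, year) the key of the winners table is an assumption based on the earlier slide about the key field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tournaments (tournament_code INTEGER PRIMARY KEY, tournament TEXT);
CREATE TABLE players (
    player_code INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, dob TEXT
);
CREATE TABLE tournament_winners (
    player_code     INTEGER REFERENCES players(player_code),
    tournament_code INTEGER REFERENCES tournaments(tournament_code),
    year            INTEGER,
    PRIMARY KEY (tournament_code, year)  -- one winner per tournament per year
);
""")
conn.executemany("INSERT INTO tournaments VALUES (?, ?)", [
    (1, "Indiana Invitational"), (2, "Cleveland Open"), (3, "Des Moines Masters"),
])
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)", [
    (1, "Chip", "Masterton", "14/03/1977"),
    (2, "Al", "Fredrickson", "21/07/1975"),
    (3, "Bob", "Albertson", "28/09/1968"),
])
conn.executemany("INSERT INTO tournament_winners VALUES (?, ?, ?)", [
    (1, 1, 1999), (2, 1, 1998), (2, 3, 1999), (3, 2, 1999),
])


def winner_of(tournament, year):
    """Follow the codes from the winners table out to the two lookup tables."""
    return conn.execute("""
        SELECT p.first_name, p.last_name, p.dob
        FROM tournament_winners w
        JOIN players p ON p.player_code = w.player_code
        JOIN tournaments t ON t.tournament_code = w.tournament_code
        WHERE t.tournament = ? AND w.year = ?
    """, (tournament, year)).fetchone()


print(winner_of("Des Moines Masters", 1999))  # ('Al', 'Fredrickson', '21/07/1975')
```

Al's date of birth is stored once, in the Players table, even though he has won two tournaments.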
Reference and Transaction Data
• Transaction Data
– The Tournament Winners table is regularly updated, every time someone wins a tournament
• Reference Data
– The Players table only changes when a player joins or leaves a tournament
– The Tournaments table changes when a tournament name changes or tournaments are added or deleted.
Reference Data
• Unique code
• Lookup data
• Changed rarely
Tournaments
TournamentCode  Tournament
1               Indiana Invitational
2               Cleveland Open
3               Des Moines Masters
Players
Player Code  First Name  Last Name    DOB
1            Chip        Masterton    14/03/1977
2            Al          Fredrickson  21/07/1975
3            Bob         Albertson    28/09/1968
Transaction Data
• Uses codes from reference data
• Has extra information about event
• Changes frequently
Tournament Winners table
Player Code  TournamentCode  Year
1            1               1999
2            1               1998
2            3               1999
3            2               1999
Entering the data
Don’t worry about the logistics of putting the codes into the data yet. This is dealt with later in the program.
Normalise this data
Bounces Online Books
Name        Address                                 Book purchased                 Item Cost  Date of purchase  Quantity  Total Cost
Tom Jones   56 Latrobe Street, Melbourne, VIC 3000  The Girl in the Hornet's Nest  $24.95     08/03/2011        1         $24.95
Tom Jones   65 Latrobe Street, Melbourne, VIC 3000  Curiosity Killed the Cat       $14.95     08/03/2011        1         $14.95
Mary Small  236 Smith Street, Collingwood VIC 3002  Lord of the Necklaces          $18.95     10/03/2011        2         $37.90
Mary Small  237 Smith Street, Collingwood VIC 3002  The Girl in the Hornet's Nest  $24.95     10/03/2011        1         $24.95
Fred Blogs  45 High Street, Sydney, NSW, 2000       The Hobby                      $13.95     12/03/2011        2         $27.90
Fred Blogs  45 High Street, Sydney, NSW, 2000       Lord of the Necklaces          $24.95     12/03/2011        1         $24.95
Fred Blogs  45 High Street, Newcastle, NSW, 2000    The Girl in the Hornet's Nest  $24.95     12/03/2011        1         $24.95
First stage - 1NF
First Name  Last Name  Address1           Address2  Suburb       State  Postcode  Book purchased                 Item Cost  Date of purchase  Quantity  Total Cost
Tom         Jones      56 Latrobe Street            Melbourne    VIC    3000      The Girl in the Hornet's Nest  $24.95     08/03/2011        1         $24.95
Tom         Jones      65 Latrobe Street            Melbourne    VIC    3000      Curiosity Killed the Cat       $14.95     08/03/2011        1         $14.95
Mary        Small      236 Smith Street             Collingwood  VIC    3002      Lord of the Necklaces          $18.95     10/03/2011        2         $37.90
Mary        Small      236 Smith Street             Collingwood  VIC    3002      The Girl in the Hornet's Nest  $24.95     10/03/2011        1         $24.95
Fred        Blogs      45 High Street               Sydney       NSW    2000      The Hobby                      $13.95     12/03/2011        2         $27.90
Fred        Blogs      45 High Street               Sydney       NSW    2000      Lord of the Necklaces          $24.95     12/03/2011        1         $24.95
Fred        Blogs      45 High Street               Sydney       NSW    2000      The Girl in the Hornet's Nest  $24.95     12/03/2011        1         $24.95
Second Stage – 2NF
Customer table
CustomerCode  First Name  Last Name  Address1           Address2  Suburb       State  Postcode
116           Tom         Jones      56 Latrobe Street            Melbourne    VIC    3000
457           Mary        Small      236 Smith Street             Collingwood  VIC    3002
890           Fred        Blogs      45 High Street               Sydney       NSW    2000
Books Purchased table
CustomerCode  Book purchased                 Item Cost  Date of purchase  Quantity  Total Cost
116           The Girl in the Hornet's Nest  $24.95     08/03/2011        1         $24.95
116           Curiosity Killed the Cat       $14.95     08/03/2011        1         $14.95
457           Lord of the Necklaces          $18.95     10/03/2011        2         $37.90
457           The Girl in the Hornet's Nest  $24.95     10/03/2011        1         $24.95
890           The Hobby                      $13.95     12/03/2011        2         $27.90
890           Lord of the Necklaces          $24.95     12/03/2011        1         $24.95
890           The Girl in the Hornet's Nest  $24.95     12/03/2011        1         $24.95
Third Stage - 3NF
Customer Table
CustomerCode  First Name  Last Name  Address1           Address2  Suburb       State  Postcode
116           Tom         Jones      56 Latrobe Street            Melbourne    VIC    3000
457           Mary        Small      236 Smith Street             Collingwood  VIC    3002
890           Fred        Blogs      45 High Street               Sydney       NSW    2000
Purchases Table
CustomerCode  BookCode  Date of purchase  Quantity  Total
116           1         08/03/2011        1         $24.95
116           15        08/03/2011        1         $14.95
457           36        10/03/2011        2         $37.90
457           1         10/03/2011        1         $24.95
890           4         12/03/2011        2         $27.90
890           36        12/03/2011        1         $18.95
890           1         12/03/2011        1         $24.95
Books Table
BookCode  Book Name                      Genre           Item Cost
1         The Girl in the Hornet's Nest  Murder Mystery  $24.95
15        Curiosity Killed the Cat       Romance         $14.95
36        Lord of the Necklaces          Fantasy         $18.95
4         The Hobby                      Fantasy         $13.95
Reference and Transaction Data
• Which tables are Reference Data tables?
– Customer table
– Books table
• Which table is a Transaction data table?
– Purchases table
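As a rough sketch of how the transaction table leans on the reference tables, the Python below turns a coded Purchases row back into something readable. The dictionaries stand in for the tables, prices are held in cents to avoid rounding trouble (an assumption of this sketch), and describe is a made-up function name.

```python
# Reference data: looked up by code, rarely changed.
customers = {116: ("Tom", "Jones"), 457: ("Mary", "Small"), 890: ("Fred", "Blogs")}
books = {
    1: ("The Girl in the Hornet's Nest", 2495),
    15: ("Curiosity Killed the Cat", 1495),
    36: ("Lord of the Necklaces", 1895),
    4: ("The Hobby", 1395),
}

# Transaction data: (CustomerCode, BookCode, date of purchase, quantity).
purchases = [
    (116, 1, "08/03/2011", 1), (116, 15, "08/03/2011", 1),
    (457, 36, "10/03/2011", 2), (457, 1, "10/03/2011", 1),
    (890, 4, "12/03/2011", 2), (890, 36, "12/03/2011", 1), (890, 1, "12/03/2011", 1),
]


def describe(purchase):
    """Use the codes in a purchase row to look up the customer and the book."""
    customer_code, book_code, date, quantity = purchase
    first_name, last_name = customers[customer_code]
    title, cost_cents = books[book_code]
    total = quantity * cost_cents / 100
    return f"{date}: {first_name} {last_name} bought {quantity} x {title} (${total:.2f})"


print(describe(purchases[2]))
# 10/03/2011: Mary Small bought 2 x Lord of the Necklaces ($37.90)
```

Because the item cost lives only in the Books table, a price is typed once and every purchase row picks it up through the BookCode.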
The front-end screen would look something like this:
Purchases data is entered into the Transaction table, with drop-down lists which use data from the Reference Data tables
In other words
• Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.
’kay?
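For the curious, that formal statement can be checked mechanically. The sketch below is not a full 3NF tester: it assumes the table has a single key, so “non-key attribute” just means “not part of that key”, and the field name Winner is made up for the tournament example. An FD is written as a pair (X, A-set) meaning X → A.

```python
def closure(attributes, fds):
    """Return every attribute that `attributes` determines under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result


def third_nf_violations(fds, all_attributes, key):
    """List each nontrivial X -> A where A is a non-key attribute and X is not a superkey."""
    violations = []
    for left, right in fds:
        for attribute in sorted(right - left):  # skip the trivial part of the FD
            is_non_key = attribute not in key
            is_superkey = closure(left, fds) >= all_attributes
            if is_non_key and not is_superkey:
                violations.append((sorted(left), attribute))
    return violations


# The bad tournament table: DOB describes the winner, not the tournament + year.
attributes = {"Tournament", "Year", "Winner", "DOB"}
key = {"Tournament", "Year"}
fds = [
    ({"Tournament", "Year"}, {"Winner"}),
    ({"Winner"}, {"DOB"}),
]
print(third_nf_violations(fds, attributes, key))  # [(['Winner'], 'DOB')]

# After moving DOB into its own Players table, the winners table passes.
winners_only = third_nf_violations(
    [({"Tournament", "Year"}, {"Winner"})], {"Tournament", "Year", "Winner"}, key)
print(winners_only)  # []
```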
By Mark Kelly, McKinnon Secondary College, vceit.com
These slideshows may be freely used, modified or distributed by teachers and students anywhere on the planet (but not elsewhere).
They may NOT be sold. They must NOT be redistributed if you modify them.
VCE IT THEORY SLIDESHOWS