IS6125 Database Analysis and Design
Lecture 11: Normalization of Data Tables
Rob Gleasure

Today’s session
- Normalisation
- Revision: subjects covered and the types of questions to expect
  - Essay-style questions
  - Modelling questions
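
Normalisation
- Not actually as terrifying as it sounds…
- It is about removing redundancy: breaking big tables that repeat the same data into smaller tables that store each fact once, so that updating, inserting or deleting a row cannot leave the data inconsistent
- We do this by taking advantage of functional dependencies (X → Y means that any two rows with the same value of X must also have the same value of Y)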
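
The definition of a functional dependency can be checked against a table's rows. The sketch below is a minimal illustration, assuming each row is a Python dict keyed by column name; `fd_holds` is a hypothetical helper, not a library function. A table's contents can only ever refute a dependency: whether X → Y really holds is a rule about what the data means, so the dependencies used in normalisation come from the business rules rather than from the sample rows.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the dependency lhs -> rhs is consistent with these rows.

    lhs -> rhs holds in a table when any two rows that agree on every
    attribute in lhs also agree on every attribute in rhs.
    """
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value;
        # any later row with the same lhs value must repeat it.
        if seen.setdefault(x, y) != y:
            return False
    return True


# A few rows from the lecture's Orders example.
orders = [
    {"Order_ID": "S345", "Date": "31/12/2014", "Product": "Football", "Units": "2"},
    {"Order_ID": "S345", "Date": "31/12/2014", "Product": "Gloves", "Units": "1"},
    {"Order_ID": "S354", "Date": "01/01/2015", "Product": "Socks", "Units": "5"},
]

print(fd_holds(orders, ["Order_ID"], ["Date"]))     # True
print(fd_holds(orders, ["Order_ID"], ["Product"]))  # False: S345 has two products
```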

Inferring Functional Dependencies (The Armstrong Axioms)
1. Reflexivity: if $Y \subseteq X$, then $X \to Y$
2. Augmentation: if $X \to Y$, then $XZ \to YZ$
3. Transitivity: if $X \to Y$ and $Y \to Z$, then $X \to Z$
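
In practice the axioms are applied through the attribute-closure algorithm: start from a set of attributes X and keep adding the right-hand side of any dependency whose left-hand side is already covered, until nothing changes. The result, X+, contains exactly the attributes Y for which X → Y follows from the axioms. This is a minimal Python sketch; the dependency list is our reading of the Orders example, and the helper names `fd`, `closure` and `implies` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency from space-separated names, e.g. fd("Address", "Zone")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> FrozenSet[str]:
    """Return X+, the set of all attributes that X functionally determines."""
    result = set(attrs)  # reflexivity: X determines every attribute in X
    fd_list: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # X -> lhs is already known (lhs is inside X+), so by
            # transitivity X -> rhs; add rhs to X+ and keep looping.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds: Iterable[FD], lhs: str, rhs: str) -> bool:
    """True if lhs -> rhs can be derived from fds with the Armstrong axioms."""
    return frozenset(rhs.split()) <= closure(lhs.split(), fds)


# Dependencies read off the Orders example (an interpretation, not given data).
fds = [
    fd("Cust_ID", "First_Name Last_Name Address"),
    fd("Address", "Zone"),
    fd("Order_ID", "Cust_ID Date"),
    fd("Product_ID", "Product Cost"),
    fd("Order_ID Product_ID", "Units"),
]

print(sorted(closure(["Cust_ID"], fds)))
# ['Address', 'Cust_ID', 'First_Name', 'Last_Name', 'Zone']
print(implies(fds, "Order_ID", "Zone"))  # True, by transitivity
```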

Normalisation: Orders Table

| Full_Name | Address | Zone | Order_ID | Date | Product_1 | Cost_P1 | Units_P1 | Product_2 | Cost_P2 | Units_P2 | Product_3 | Cost_P3 | Units_P3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2 | Gloves | $53.50 | 1 | Whistle | $5.00 | 1 |
| Mary Byrne | Kildaman-fadar | Rural | R367 | 09/09/2014 | Helmet | $30.50 | 1 |  |  |  |  |  |  |
| Anne Dunne | 123 Fake St | Inner-city | N654 | 10/06/2014 | Pants | $13.75 | 2 | Hat | $11.00 | 2 |  |  |  |
| Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2 | Boots | $75.95 | 1 |  |  |  |
| John Murphy | 123 Fake St | Inner-city | S354 | 01/01/2015 | Socks | $3.50 | 5 |  |  |  |  |  |  |

Normalisation: First Normal Form

| Full_Name | Address | Zone | Order_ID | Date | Product | Cost | Units |
|---|---|---|---|---|---|---|---|
| John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2 |
| John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Gloves | $53.50 | 1 |
| John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Whistle | $5.00 | 1 |
| John Murphy | 123 Fake St | Inner-city | S354 | 01/01/2015 | Socks | $3.50 | 5 |
| Mary Byrne | Kildaman-fadar | Rural | R367 | 09/09/2014 | Helmet | $30.50 | 1 |
| Anne Dunne | 123 Fake St | Inner-city | N654 | 10/06/2014 | Pants | $13.75 | 2 |
| Anne Dunne | 123 Fake St | Inner-city | N654 | 10/06/2014 | Hat | $11.00 | 2 |
| Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2 |
| Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Boots | $75.95 | 1 |

First Normal Form (continued)

| First_Name | Last_Name | Address | Zone | Order_ID | Date | Product | Cost | Units |
|---|---|---|---|---|---|---|---|---|
| John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2 |
| John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Gloves | $53.50 | 1 |
| John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Whistle | $5.00 | 1 |
| John | Murphy | 123 Fake St | Inner-city | S354 | 01/01/2015 | Socks | $3.50 | 5 |
| Mary | Byrne | Kildaman-fadar | Rural | R367 | 09/09/2014 | Helmet | $30.50 | 1 |
| Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/06/2014 | Pants | $13.75 | 2 |
| Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/06/2014 | Hat | $11.00 | 2 |
| Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2 |
| Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Boots | $75.95 | 1 |
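
Summary of First Normal Form (1NF)
A database is in first normal form when
- Attributes store only atomic values (e.g. Full_Name is split into First_Name and Last_Name)
- Repeating groups of columns (Product_1, Product_2, Product_3, and their costs and units) are removed, giving one row per product ordered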
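
Both 1NF rules can be applied mechanically to the unnormalised Orders table. This sketch assumes rows arrive as Python dicts using the column names of the Orders table (Full_Name, Product_1, Cost_P1, Units_P1 and so on) and that a name splits into first and last name at its first space; `to_first_normal_form` is a hypothetical helper name.

```python
from typing import Dict, List

Row = Dict[str, str]


def to_first_normal_form(rows: List[Row], max_products: int = 3) -> List[Row]:
    """Give each (Product_n, Cost_Pn, Units_Pn) group its own row and split
    Full_Name into atomic First_Name and Last_Name columns."""
    flat: List[Row] = []
    for row in rows:
        # Assumes the first space separates first name from last name,
        # which holds for the names in this example.
        first, last = row["Full_Name"].split(" ", 1)
        for n in range(1, max_products + 1):
            product = row.get(f"Product_{n}")
            if not product:  # unused slot in the repeating group
                continue
            flat.append({
                "First_Name": first,
                "Last_Name": last,
                "Address": row["Address"],
                "Zone": row["Zone"],
                "Order_ID": row["Order_ID"],
                "Date": row["Date"],
                "Product": product,
                "Cost": row[f"Cost_P{n}"],
                "Units": row[f"Units_P{n}"],
            })
    return flat


# Two rows of the unnormalised Orders table.
raw_orders: List[Row] = [
    {"Full_Name": "John Murphy", "Address": "123 Fake St", "Zone": "Inner-city",
     "Order_ID": "S345", "Date": "31/12/2014",
     "Product_1": "Football", "Cost_P1": "$20.00", "Units_P1": "2",
     "Product_2": "Gloves", "Cost_P2": "$53.50", "Units_P2": "1",
     "Product_3": "Whistle", "Cost_P3": "$5.00", "Units_P3": "1"},
    {"Full_Name": "Mary Byrne", "Address": "Kildaman-fadar", "Zone": "Rural",
     "Order_ID": "R367", "Date": "09/09/2014",
     "Product_1": "Helmet", "Cost_P1": "$30.50", "Units_P1": "1"},
]

for r in to_first_normal_form(raw_orders):
    print(r["Order_ID"], r["First_Name"], r["Last_Name"], r["Product"], r["Units"])
```

Running it prints one line per product ordered: three rows for order S345 and one for R367.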

Second Normal Form

| Cust_ID | Order_ID | Date | Product | Cost | Units |
|---|---|---|---|---|---|
| 1 | S345 | 31/12/2014 | Football | $20.00 | 2 |
| 1 | S345 | 31/12/2014 | Gloves | $53.50 | 1 |
| 1 | S345 | 31/12/2014 | Whistle | $5.00 | 1 |
| 1 | S354 | 01/01/2015 | Socks | $3.50 | 5 |
| 2 | R367 | 09/09/2014 | Helmet | $30.50 | 1 |
| 3 | N654 | 10/06/2014 | Pants | $13.75 | 2 |
| 3 | N654 | 10/06/2014 | Hat | $11.00 | 2 |
| 4 | D896 | 13/06/2014 | Hat | $28.75 | 2 |
| 4 | D896 | 13/06/2014 | Boots | $75.95 | 1 |

| Cust_ID | First_Name | Last_Name | Address | Zone |
|---|---|---|---|---|
| 1 | John | Murphy | 123 Fake St | Inner-city |
| 2 | Mary | Byrne | Kildaman-fadar | Rural |
| 3 | Anne | Dunne | 123 Fake St | Inner-city |
| 4 | Jim | Feltz | 20c Fake St | Inner-city |
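
The step from 1NF to the first 2NF slide moves the customer columns into their own table and replaces them with a Cust_ID. A minimal sketch of that projection, assuming dict rows; `split_out_customers` is a hypothetical name, and treating two rows with the same name, address and zone as the same customer is an assumption that only suits this example (a real system would assign Cust_ID when the customer is created).

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
CUSTOMER_COLS = ("First_Name", "Last_Name", "Address", "Zone")


def split_out_customers(rows: List[Row]) -> Tuple[List[Row], Dict[int, Row]]:
    """Move the customer columns into their own table keyed by a new Cust_ID.

    IDs are handed out 1, 2, 3, ... in order of first appearance, and each
    remaining row keeps only Cust_ID as a reference to its customer.
    """
    ids: Dict[Tuple[str, ...], int] = {}
    customers: Dict[int, Row] = {}
    remaining: List[Row] = []
    for row in rows:
        # Hypothetical identity rule: same name + address + zone = same customer.
        key = tuple(row[c] for c in CUSTOMER_COLS)
        if key not in ids:
            ids[key] = len(ids) + 1
            customers[ids[key]] = dict(zip(CUSTOMER_COLS, key))
        rest = {k: v for k, v in row.items() if k not in CUSTOMER_COLS}
        remaining.append({"Cust_ID": str(ids[key]), **rest})
    return remaining, customers


rows = [
    {"First_Name": "John", "Last_Name": "Murphy", "Address": "123 Fake St",
     "Zone": "Inner-city", "Order_ID": "S345", "Product": "Football", "Units": "2"},
    {"First_Name": "Mary", "Last_Name": "Byrne", "Address": "Kildaman-fadar",
     "Zone": "Rural", "Order_ID": "R367", "Product": "Helmet", "Units": "1"},
    {"First_Name": "John", "Last_Name": "Murphy", "Address": "123 Fake St",
     "Zone": "Inner-city", "Order_ID": "S354", "Product": "Socks", "Units": "5"},
]
order_lines, customers = split_out_customers(rows)
print(len(customers))  # 2: John Murphy is stored once
print(order_lines[2])  # {'Cust_ID': '1', 'Order_ID': 'S354', 'Product': 'Socks', 'Units': '5'}
```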
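
Second Normal Form (Continued)

| Cust_ID | Order_ID | Date | Product_ID | Units |
|---|---|---|---|---|
| 1 | S345 | 31/12/2014 | 1 | 2 |
| 1 | S345 | 31/12/2014 | 2 | 1 |
| 1 | S345 | 31/12/2014 | 3 | 1 |
| 1 | S354 | 01/01/2015 | 4 | 5 |
| 2 | R367 | 09/09/2014 | 5 | 1 |
| 3 | N654 | 10/06/2014 | 6 | 2 |
| 3 | N654 | 10/06/2014 | 7 | 2 |
| 4 | D896 | 13/06/2014 | 7 | 2 |
| 4 | D896 | 13/06/2014 | 8 | 1 |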

| Cust_ID | First_Name | Last_Name | Address | Zone |
|---|---|---|---|---|
| 1 | John | Murphy | 123 Fake St | Inner-city |
| 2 | Mary | Byrne | Kildaman-fadar | Rural |
| 3 | Anne | Dunne | 123 Fake St | Inner-city |
| 4 | Jim | Feltz | 20c Fake St | Inner-city |

| Product_ID | Product | Cost |
|---|---|---|
| 1 | Football | $20.00 |
| 2 | Gloves | $53.50 |
| 3 | Whistle | $5.00 |
| 4 | Socks | $3.50 |
| 5 | Helmet | $30.50 |
| 6 | Pants | $13.75 |
| 7 | Hat | $11.00 |
| 8 | Boots | $75.95 |
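
Second Normal Form (Continued)

| Cust_ID | Order_ID | Product_ID | Units |
|---|---|---|---|
| 1 | S345 | 1 | 2 |
| 1 | S345 | 2 | 1 |
| 1 | S345 | 3 | 1 |
| 1 | S354 | 4 | 5 |
| 2 | R367 | 5 | 1 |
| 3 | N654 | 6 | 2 |
| 3 | N654 | 7 | 2 |
| 4 | D896 | 7 | 2 |
| 4 | D896 | 8 | 1 |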

| Cust_ID | First_Name | Last_Name | Address | Zone |
|---|---|---|---|---|
| 1 | John | Murphy | 123 Fake St | Inner-city |
| 2 | Mary | Byrne | Kildaman-fadar | Rural |
| 3 | Anne | Dunne | 123 Fake St | Inner-city |
| 4 | Jim | Feltz | 20c Fake St | Inner-city |

| Product_ID | Product | Cost |
|---|---|---|
| 1 | Football | $20.00 |
| 2 | Gloves | $53.50 |
| 3 | Whistle | $5.00 |
| 4 | Socks | $3.50 |
| 5 | Helmet | $30.50 |
| 6 | Pants | $13.75 |
| 7 | Hat | $11.00 |
| 8 | Boots | $75.95 |

| Order_ID | Date |
|---|---|
| S345 | 31/12/2014 |
| S354 | 01/01/2015 |
| R367 | 09/09/2014 |
| N654 | 10/06/2014 |
| D896 | 13/06/2014 |

Second Normal Form (Continued)

| Order_ID | Product_ID | Units |
|---|---|---|
| S345 | 1 | 2 |
| S345 | 2 | 1 |
| S345 | 3 | 1 |
| S354 | 4 | 5 |
| R367 | 5 | 1 |
| N654 | 6 | 2 |
| N654 | 7 | 2 |
| D896 | 7 | 2 |
| D896 | 8 | 1 |

| Cust_ID | First_Name | Last_Name | Address | Zone |
|---|---|---|---|---|
| 1 | John | Murphy | 123 Fake St | Inner-city |
| 2 | Mary | Byrne | Kildaman-fadar | Rural |
| 3 | Anne | Dunne | 123 Fake St | Inner-city |
| 4 | Jim | Feltz | 20c Fake St | Inner-city |

| Product_ID | Product | Cost |
|---|---|---|
| 1 | Football | $20.00 |
| 2 | Gloves | $53.50 |
| 3 | Whistle | $5.00 |
| 4 | Socks | $3.50 |
| 5 | Helmet | $30.50 |
| 6 | Pants | $13.75 |
| 7 | Hat | $11.00 |
| 8 | Boots | $75.95 |

| Order_ID | Cust_ID |
|---|---|
| S345 | 1 |
| S354 | 1 |
| R367 | 2 |
| N654 | 3 |
| D896 | 4 |

| Order_ID | Date |
|---|---|
| S345 | 31/12/2014 |
| S354 | 01/01/2015 |
| R367 | 09/09/2014 |
| N654 | 10/06/2014 |
| D896 | 13/06/2014 |

Summary of Second Normal Form (2NF)
A database is in second normal form when
- It satisfies the criteria for first normal form
- Every non-key attribute depends on the whole candidate key, not just part of it (i.e. values repeated across multiple rows are moved into their own tables)
- Put differently, there are no partial dependencies on part of a concatenated key
- Takes advantage of reflexivity and augmentation
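
The 2NF condition can be tested by computing the closure of every proper subset of the key and looking for non-key attributes it already determines. A sketch under stated assumptions: the 1NF table is taken to carry the Cust_ID and Product_ID columns used in the 2NF tables, its key is (Order_ID, Product_ID), the dependency list is our reading of the example, and every attribute outside that single key counts as non-key (a full 2NF test would consider every candidate key). The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """All attributes determined by attrs (repeated use of the Armstrong axioms)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def partial_dependencies(key: Iterable[str], attributes: Iterable[str],
                         fds: List[FD]) -> List[FD]:
    """List (proper subset of key, non-key attributes it determines) pairs.

    An empty list means there are no partial dependencies on this key.
    """
    key_set = frozenset(key)
    non_key = frozenset(attributes) - key_set
    found: List[FD] = []
    for size in range(1, len(key_set)):  # proper subsets only
        for part in combinations(sorted(key_set), size):
            determined = closure(part, fds) & non_key
            if determined:
                found.append((frozenset(part), determined))
    return found


fds = [
    fd("Cust_ID", "First_Name Last_Name Address"),
    fd("Address", "Zone"),
    fd("Order_ID", "Cust_ID Date"),
    fd("Product_ID", "Product Cost"),
    fd("Order_ID Product_ID", "Units"),
]
KEY = ["Order_ID", "Product_ID"]
ATTRS = KEY + ["Cust_ID", "First_Name", "Last_Name", "Address", "Zone",
               "Date", "Product", "Cost", "Units"]

for lhs, rhs in partial_dependencies(KEY, ATTRS, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['Order_ID'] -> ['Address', 'Cust_ID', 'Date', 'First_Name', 'Last_Name', 'Zone']
# ['Product_ID'] -> ['Cost', 'Product']
```

The two partial dependencies it reports are the ones the 2NF slides remove: everything determined by Order_ID alone, and the product details determined by Product_ID alone. Only Units depends on the whole key, which is why it stays in the (Order_ID, Product_ID) table.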

Moving to Third Normal Form

| Order_ID | Product_ID | Units |
|---|---|---|
| S345 | 1 | 2 |
| S345 | 2 | 1 |
| S345 | 3 | 1 |
| S354 | 4 | 5 |
| R367 | 5 | 1 |
| N654 | 6 | 2 |
| N654 | 7 | 2 |
| D896 | 7 | 2 |
| D896 | 8 | 1 |

| Cust_ID | First_Name | Last_Name | Address |
|---|---|---|---|
| 1 | John | Murphy | 123 Fake St |
| 2 | Mary | Byrne | Kildaman-fadar |
| 3 | Anne | Dunne | 123 Fake St |
| 4 | Jim | Feltz | 20c Fake St |

| Product_ID | Product | Cost |
|---|---|---|
| 1 | Football | $20.00 |
| 2 | Gloves | $53.50 |
| 3 | Whistle | $5.00 |
| 4 | Socks | $3.50 |
| 5 | Helmet | $30.50 |
| 6 | Pants | $13.75 |
| 7 | Hat | $11.00 |
| 8 | Boots | $75.95 |

| Order_ID | Cust_ID |
|---|---|
| S345 | 1 |
| S354 | 1 |
| R367 | 2 |
| N654 | 3 |
| D896 | 4 |

| Address | Zone |
|---|---|
| 123 Fake St | Inner-city |
| 20c Fake St | Inner-city |
| Kildaman-fadar | Rural |

| Order_ID | Date |
|---|---|
| S345 | 31/12/2014 |
| S354 | 01/01/2015 |
| R367 | 09/09/2014 |
| N654 | 10/06/2014 |
| D896 | 13/06/2014 |
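
The final 3NF tables can be written as a relational schema. This sketch uses Python's built-in sqlite3 module with an in-memory database; the table and column names (order_customer, order_line, cost_cents and so on) are hypothetical, costs are stored as whole cents, and dates are converted to ISO format. Joining the tables back together rebuilds the 1NF-style rows for an order.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE address_zone (address TEXT PRIMARY KEY, zone TEXT NOT NULL);
CREATE TABLE customer (
    cust_id    INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    address    TEXT NOT NULL REFERENCES address_zone(address)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    product    TEXT NOT NULL,
    cost_cents INTEGER NOT NULL  -- money kept as whole cents
);
CREATE TABLE order_customer (
    order_id TEXT PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customer(cust_id)
);
CREATE TABLE order_date (
    order_id   TEXT PRIMARY KEY REFERENCES order_customer(order_id),
    order_date TEXT NOT NULL  -- ISO 8601 (YYYY-MM-DD)
);
CREATE TABLE order_line (
    order_id   TEXT NOT NULL REFERENCES order_customer(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    units      INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Referenced tables are filled first so the foreign keys are satisfied.
conn.executemany("INSERT INTO address_zone VALUES (?, ?)", [
    ("123 Fake St", "Inner-city"), ("20c Fake St", "Inner-city"),
    ("Kildaman-fadar", "Rural")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (1, "John", "Murphy", "123 Fake St"), (2, "Mary", "Byrne", "Kildaman-fadar"),
    (3, "Anne", "Dunne", "123 Fake St"), (4, "Jim", "Feltz", "20c Fake St")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "Football", 2000), (2, "Gloves", 5350), (3, "Whistle", 500),
    (4, "Socks", 350), (5, "Helmet", 3050), (6, "Pants", 1375),
    (7, "Hat", 1100), (8, "Boots", 7595)])
conn.executemany("INSERT INTO order_customer VALUES (?, ?)", [
    ("S345", 1), ("S354", 1), ("R367", 2), ("N654", 3), ("D896", 4)])
conn.executemany("INSERT INTO order_date VALUES (?, ?)", [
    ("S345", "2014-12-31"), ("S354", "2015-01-01"), ("R367", "2014-09-09"),
    ("N654", "2014-06-10"), ("D896", "2014-06-13")])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", [
    ("S345", 1, 2), ("S345", 2, 1), ("S345", 3, 1), ("S354", 4, 5),
    ("R367", 5, 1), ("N654", 6, 2), ("N654", 7, 2), ("D896", 7, 2),
    ("D896", 8, 1)])
conn.commit()


def order_rows(order_id: str) -> List[Tuple]:
    """Join the 3NF tables back into 1NF-style rows for one order."""
    return conn.execute("""
        SELECT c.first_name, c.last_name, a.zone, ol.order_id, d.order_date,
               p.product, p.cost_cents, ol.units
        FROM order_line ol
        JOIN order_customer oc ON oc.order_id = ol.order_id
        JOIN customer c        ON c.cust_id = oc.cust_id
        JOIN address_zone a    ON a.address = c.address
        JOIN order_date d      ON d.order_id = ol.order_id
        JOIN product p         ON p.product_id = ol.product_id
        WHERE ol.order_id = ?
        ORDER BY p.product_id
    """, (order_id,)).fetchall()


for row in order_rows("S345"):
    print(row)
```

With the schema in this form, changing the zone for 123 Fake St is a single UPDATE on address_zone, and it applies to both John Murphy and Anne Dunne.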

Summary of Third Normal Form (3NF)
A database is in third normal form when
- It satisfies the criteria for second normal form
- Every non-key attribute that depends on something other than the entire primary key is moved out (e.g. Zone depends on Address rather than on Cust_ID, so it gets its own table); this removes the insertion, update and deletion anomalies that such dependencies cause
- Put differently, there are no transitive dependencies via non-key attributes
- Takes advantage of transitivity
Exam Revision
Image from http://www.studentmoneysaver.co.uk/article/6-revision-tips-which-actually-work/
Essay-Style Questions
Topics covered: The cloud and datafication
- What is data and how does something become ‘datafied’?
- How and why did cloud technologies evolve?
- What does it mean in terms of technological and business capabilities?
- What is the Internet of Things?
- What does the future hold?
- Can you use contrasting examples of different businesses to discuss each of these headings?
Essay-Style Questions
Topics covered: Big data
- When is data ‘big data’?
- How and why did we get from ‘small data’ to ‘big data’?
- What does big data let businesses do that they couldn’t do previously?
- Which businesses are good examples of this?
- What are the issues and challenges arising from big data?
- Can you use contrasting examples of different businesses to discuss each of these headings?
Essay-Style Questions
Topics covered: Business intelligence
- What kind of intelligence is enabled as we increase our measurement and analysis capabilities?
- How do we get from an individual case to a large-scale pattern, and back again?
- What are the challenges of translating intelligence from an individual case to large-scale patterns, and back again?
- What businesses exemplify the ability to generate intelligence from the increased capacity for data handling, and why?
- Can you use contrasting examples of different businesses to discuss each of these headings?
Modelling Questions
Topics covered
A modelling question will expect
1. A model
2. Constraints
3. Assumptions
You may also be asked to discuss issues such as
- Differences between stages of ER modelling
- The reasons for a staggered approach to data modelling
- Commonly encountered issues
Answering Questions
Exam technique
- Manage your time
- Plan your answers
- Sketch out your diagrams very quickly as rough work if you’re not sure how to make them fit together
- Answer your best questions first
- Use examples
  - Have these lined up as part of your revision
Readings
Some more descriptions of normal forms:
- http://databases.about.com/od/specificproducts/a/normalization.htm
- http://phlonx.com/resources/nf3/
- http://www.bkent.net/Doc/simple5.htm