IS6125 Database Analysis and Design
Lecture 11: Normalization of Data Tables
Rob Gleasure
R.Gleasure@ucc.ie
www.robgleasure.com

IS6125: Today’s session
- Normalisation
- Revision: subjects covered and the types of questions to expect
  - Essay-style questions
  - Modelling questions

Normalisation

Not actually as terrifying as it sounds…

Just about making a database as efficient as possible by breaking big tables with redundant data into smaller tables with less redundant data

We do this by taking advantage of functional dependencies
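To make "functional dependency" concrete: X → Y holds in a table if no two rows agree on X but differ on Y. The sketch below checks this on a few rows taken from the lecture's Orders data. The function name `fd_holds` and the list-of-dicts representation are illustrative assumptions, not something the lecture defines.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True


rows = [
    {"Order_ID": "S345", "Date": "31/12/2014", "Product": "Football", "Units": 2},
    {"Order_ID": "S345", "Date": "31/12/2014", "Product": "Gloves", "Units": 1},
    {"Order_ID": "N654", "Date": "10/6/2014", "Product": "Pants", "Units": 2},
    {"Order_ID": "N654", "Date": "10/6/2014", "Product": "Hat", "Units": 2},
]

print(fd_holds(rows, ["Order_ID"], ["Date"]))     # True
print(fd_holds(rows, ["Order_ID"], ["Product"]))  # False
```

A check like this can only show that sample data is consistent with a dependency. Whether the dependency really holds is a business rule about all valid data, so a small sample can suggest dependencies that are only coincidences.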

Inferring Functional Dependencies (The Armstrong Axioms)

1. Reflexivity: If Y is a subset of X, then X → Y

2. Augmentation: If X → Y, then XZ → YZ

3. Transitivity: If X → Y and Y → Z, then X → Z
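The axioms can be applied mechanically. A common way to do this is to compute the closure of a set of attributes, i.e. everything those attributes determine. The sketch below is one minimal version. The function name `closure` and the three dependencies listed are assumptions read off the lecture's example tables, not rules stated in the slides.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines every subset of itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # attrs -> result is known and lhs is a subset of result, so
            # attrs -> lhs; with the given lhs -> rhs, transitivity adds rhs
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed dependencies for the Orders example (hypothetical, for illustration)
fds = [
    ({"Order_ID"}, {"Cust_ID", "Date"}),
    ({"Cust_ID"}, {"Address"}),
    ({"Address"}, {"Zone"}),
]

print(sorted(closure({"Order_ID"}, fds)))
# ['Address', 'Cust_ID', 'Date', 'Order_ID', 'Zone']
```

Under these assumed dependencies, Order_ID determines Zone even though no single rule says so directly. That inferred chain is what the later normal forms look for.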

Normalisation: Orders Table

Full_Name | Address | Zone | Order_ID | Date | Product_1 | Cost_P1 | Units_P1 | Product_2 | Cost_P2 | Units_P2 | Product_3 | Cost_P3 | Units_P3
John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2 | Gloves | $53.50 | 1 | Whistle | $5.00 | 1
Mary Byrne | Kildaman-fadar | Rural | R367 | 9/9/2014 | Helmet | $30.50 | 1 | | | | | |
Anne Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Pants | $13.75 | 2 | Hat | $11.00 | 2 | | |
Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2 | Boots | $75.95 | 1 | | |
John Murphy | 123 Fake St | Inner-city | S354 | 1/01/2015 | Socks | $3.50 | 5 | | | | | |

Normalisation: First Normal Form

Full_Name | Address | Zone | Order_ID | Date | Product | Cost | Units
John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2
John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Gloves | $53.50 | 1
John Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Whistle | $5.00 | 1
John Murphy | 123 Fake St | Inner-city | S354 | 31/12/2014 | Socks | $3.50 | 5
Mary Byrne | Kildaman-fadar | Rural | R367 | 9/9/2014 | Helmet | $30.50 | 1
Anne Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Pants | $13.75 | 2
Anne Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Hat | $11.00 | 2
Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2
Jim Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Boots | $75.95 | 1

First Normal Form (continued)

First_Name | Last_Name | Address | Zone | Order_ID | Date | Product | Cost | Units
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Gloves | $53.50 | 1
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Whistle | $5.00 | 1
John | Murphy | 123 Fake St | Inner-city | S354 | 31/12/2014 | Socks | $3.50 | 5
Mary | Byrne | Kildaman-fadar | Rural | R367 | 9/9/2014 | Helmet | $30.50 | 1
Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Pants | $13.75 | 2
Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Hat | $11.00 | 2
Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2
Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Boots | $75.95 | 1

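Summary of First Normal Form (1NF)

A database is in the first normal form when
- Attributes store only atomic values (e.g. Full_Name is split into First_Name and Last_Name)
- Duplicate columns are removed (e.g. Product_1, Product_2 and Product_3 become a single Product column with one row per product)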
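The move from the original Orders table to 1NF can be sketched in a few lines. This is an illustration only: the function name `to_first_normal_form` is hypothetical, only two of the lecture's rows are included, and the name split assumes every name has exactly the form "First Last".

```python
wide_rows = [
    {
        "Full_Name": "John Murphy", "Address": "123 Fake St", "Zone": "Inner-city",
        "Order_ID": "S345", "Date": "31/12/2014",
        "Product_1": "Football", "Cost_P1": "$20.00", "Units_P1": 2,
        "Product_2": "Gloves", "Cost_P2": "$53.50", "Units_P2": 1,
        "Product_3": "Whistle", "Cost_P3": "$5.00", "Units_P3": 1,
    },
    {
        "Full_Name": "Mary Byrne", "Address": "Kildaman-fadar", "Zone": "Rural",
        "Order_ID": "R367", "Date": "9/9/2014",
        "Product_1": "Helmet", "Cost_P1": "$30.50", "Units_P1": 1,
    },
]


def to_first_normal_form(rows, max_products=3):
    """Turn Product_n/Cost_Pn/Units_Pn column groups into one row per product."""
    flat = []
    for row in rows:
        # Split the non-atomic name (assumes exactly "First Last")
        first, last = row["Full_Name"].split(" ", 1)
        for n in range(1, max_products + 1):
            product = row.get(f"Product_{n}")
            if product is None:  # this slot of the repeating group is empty
                continue
            flat.append({
                "First_Name": first, "Last_Name": last,
                "Address": row["Address"], "Zone": row["Zone"],
                "Order_ID": row["Order_ID"], "Date": row["Date"],
                "Product": product,
                "Cost": row[f"Cost_P{n}"],
                "Units": row[f"Units_P{n}"],
            })
    return flat


for r in to_first_normal_form(wide_rows):
    print(r["Order_ID"], r["First_Name"], r["Last_Name"], r["Product"], r["Cost"], r["Units"])
```

Each product now has its own row, at the cost of repeating the customer and order details on every row. Removing that repetition is the job of the second and third normal forms.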

Moving to Second Normal Form

First_Name | Last_Name | Address | Zone | Order_ID | Date | Product | Cost | Units
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Football | $20.00 | 2
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Gloves | $53.50 | 1
John | Murphy | 123 Fake St | Inner-city | S345 | 31/12/2014 | Whistle | $5.00 | 1
John | Murphy | 123 Fake St | Inner-city | S354 | 31/12/2014 | Socks | $3.50 | 5
Mary | Byrne | Kildaman-fadar | Rural | R367 | 9/9/2014 | Helmet | $30.50 | 1
Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Pants | $13.75 | 2
Anne | Dunne | 123 Fake St | Inner-city | N654 | 10/6/2014 | Hat | $11.00 | 2
Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Hat | $28.75 | 2
Jim | Feltz | 20c Fake St | Inner-city | D896 | 13/06/2014 | Boots | $75.95 | 1

Second Normal Form

Cust_ID | Order_ID | Date | Product | Cost | Units
1 | S345 | 31/12/2014 | Football | $20.00 | 2
1 | S345 | 31/12/2014 | Gloves | $53.50 | 1
1 | S345 | 31/12/2014 | Whistle | $5.00 | 1
1 | S354 | 31/12/2014 | Socks | $3.50 | 5
2 | R367 | 9/9/2014 | Helmet | $30.50 | 1
3 | N654 | 10/6/2014 | Pants | $13.75 | 2
3 | N654 | 10/6/2014 | Hat | $11.00 | 2
4 | D896 | 13/06/2014 | Hat | $28.75 | 2
4 | D896 | 13/06/2014 | Boots | $75.95 | 1

Cust_ID | First_Name | Last_Name | Address | Zone
1 | John | Murphy | 123 Fake St | Inner-city
2 | Mary | Byrne | Kildaman-fadar | Rural
3 | Anne | Dunne | 123 Fake St | Inner-city
4 | Jim | Feltz | 20c Fake St | Inner-city

Second Normal Form (Continued)

Cust_ID | Order_ID | Date | Product_ID | Units
1 | S345 | 31/12/2014 | 1 | 2
1 | S345 | 31/12/2014 | 2 | 1
1 | S345 | 31/12/2014 | 3 | 1
1 | S354 | 31/12/2014 | 4 | 5
2 | R367 | 9/9/2014 | 5 | 1
3 | N654 | 10/6/2014 | 6 | 2
3 | N654 | 10/6/2014 | 7 | 2
4 | D896 | 13/06/2014 | 7 | 2
4 | D896 | 13/06/2014 | 8 | 1

Cust_ID | First_Name | Last_Name | Address | Zone
1 | John | Murphy | 123 Fake St | Inner-city
2 | Mary | Byrne | Kildaman-fadar | Rural
3 | Anne | Dunne | 123 Fake St | Inner-city
4 | Jim | Feltz | 20c Fake St | Inner-city

Product_ID | Product | Cost
1 | Football | $20.00
2 | Gloves | $53.50
3 | Whistle | $5.00
4 | Socks | $3.50
5 | Helmet | $30.50
6 | Pants | $13.75
7 | Hat | $11.00
8 | Boots | $75.95

Second Normal Form (Continued)

Cust_ID | Order_ID | Product_ID | Units
1 | S345 | 1 | 2
1 | S345 | 2 | 1
1 | S345 | 3 | 1
1 | S354 | 4 | 5
2 | R367 | 5 | 1
3 | N654 | 6 | 2
3 | N654 | 7 | 2
4 | D896 | 7 | 2
4 | D896 | 8 | 1

Cust_ID | First_Name | Last_Name | Address | Zone
1 | John | Murphy | 123 Fake St | Inner-city
2 | Mary | Byrne | Kildaman-fadar | Rural
3 | Anne | Dunne | 123 Fake St | Inner-city
4 | Jim | Feltz | 20c Fake St | Inner-city

Product_ID | Product | Cost
1 | Football | $20.00
2 | Gloves | $53.50
3 | Whistle | $5.00
4 | Socks | $3.50
5 | Helmet | $30.50
6 | Pants | $13.75
7 | Hat | $11.00
8 | Boots | $75.95

Order_ID | Date
S345 | 31/12/2014
S354 | 31/12/2014
R367 | 09/09/2014
N654 | 10/6/2014
D896 | 13/06/2014

Second Normal Form (Continued)

Order_ID | Product_ID | Units
S345 | 1 | 2
S345 | 2 | 1
S345 | 3 | 1
S354 | 4 | 5
R367 | 5 | 1
N654 | 6 | 2
N654 | 7 | 2
D896 | 7 | 2
D896 | 8 | 1

Cust_ID | First_Name | Last_Name | Address | Zone
1 | John | Murphy | 123 Fake St | Inner-city
2 | Mary | Byrne | Kildaman-fadar | Rural
3 | Anne | Dunne | 123 Fake St | Inner-city
4 | Jim | Feltz | 20c Fake St | Inner-city

Product_ID | Product | Cost
1 | Football | $20.00
2 | Gloves | $53.50
3 | Whistle | $5.00
4 | Socks | $3.50
5 | Helmet | $30.50
6 | Pants | $13.75
7 | Hat | $11.00
8 | Boots | $75.95

Order_ID | Cust_ID
S345 | 1
S354 | 1
R367 | 2
N654 | 3
D896 | 4

Order_ID | Date
S345 | 31/12/2014
S354 | 31/12/2014
R367 | 09/09/2014
N654 | 10/6/2014
D896 | 13/06/2014

Summary of Second Normal Form (2NF)

A database is in the second normal form when
- It satisfies the criteria for the first normal form
- Each non-key attribute is dependent on the whole candidate key (i.e. subsets of data across multiple rows are removed)
- Put differently, we have no partial dependencies via a concatenated key

Takes advantage of reflexivity and augmentation
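A partial dependency can be found by taking each proper subset of the concatenated key and asking what it determines on its own. The sketch below does this with an attribute closure. The function names, the key (Order_ID, Product_ID) and the listed dependencies are assumptions based on the lecture's example, not definitions from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    violations = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds)
            for attr in sorted(non_key):
                if attr in determined:
                    violations.append((subset, attr))
    return violations


# Assumed dependencies for the order table (hypothetical, for illustration)
fds = [
    ({"Order_ID"}, {"Cust_ID", "Date"}),
    ({"Product_ID"}, {"Product", "Cost"}),
    ({"Order_ID", "Product_ID"}, {"Units"}),
]
key = {"Order_ID", "Product_ID"}
non_key = {"Cust_ID", "Date", "Product", "Cost", "Units"}

for subset, attr in partial_dependencies(key, non_key, fds):
    print(subset, "->", attr)
```

Under these assumptions, Cust_ID and Date hang off Order_ID alone, and Product and Cost hang off Product_ID alone, so each group moves to its own table. Units needs the whole key and stays in the order-line table, which matches the decomposition shown in the slides.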

Moving to Third Normal Form

Order_ID | Product_ID | Units
S345 | 1 | 2
S345 | 2 | 1
S345 | 3 | 1
S354 | 4 | 5
R367 | 5 | 1
N654 | 6 | 2
N654 | 7 | 2
D896 | 7 | 2
D896 | 8 | 1

Cust_ID | First_Name | Last_Name | Address | Zone
1 | John | Murphy | 123 Fake St | Inner-city
2 | Mary | Byrne | Kildaman-fadar | Rural
3 | Anne | Dunne | 123 Fake St | Inner-city
4 | Jim | Feltz | 20c Fake St | Inner-city

Product_ID | Product | Cost
1 | Football | $20.00
2 | Gloves | $53.50
3 | Whistle | $5.00
4 | Socks | $3.50
5 | Helmet | $30.50
6 | Pants | $13.75
7 | Hat | $11.00
8 | Boots | $75.95

Order_ID | Cust_ID
S345 | 1
S354 | 1
R367 | 2
N654 | 3
D896 | 4

Order_ID | Date
S345 | 31/12/2014
S354 | 31/12/2014
R367 | 09/09/2014
N654 | 10/6/2014
D896 | 13/06/2014

Moving to Third Normal Form

Order_ID | Product_ID | Units
S345 | 1 | 2
S345 | 2 | 1
S345 | 3 | 1
S354 | 4 | 5
R367 | 5 | 1
N654 | 6 | 2
N654 | 7 | 2
D896 | 7 | 2
D896 | 8 | 1

Cust_ID | First_Name | Last_Name | Address
1 | John | Murphy | 123 Fake St
2 | Mary | Byrne | Kildaman-fadar
3 | Anne | Dunne | 123 Fake St
4 | Jim | Feltz | 20c Fake St

Product_ID | Product | Cost
1 | Football | $20.00
2 | Gloves | $53.50
3 | Whistle | $5.00
4 | Socks | $3.50
5 | Helmet | $30.50
6 | Pants | $13.75
7 | Hat | $11.00
8 | Boots | $75.95

Order_ID | Cust_ID
S345 | 1
S354 | 1
R367 | 2
N654 | 3
D896 | 4

Address | Zone
123 Fake St | Inner-city
20c Fake St | Inner-city
Kildaman-fadar | Rural

Order_ID | Date
S345 | 31/12/2014
S354 | 31/12/2014
R367 | 09/09/2014
N654 | 10/6/2014
D896 | 13/06/2014

Summary of Third Normal Form (3NF)

A database is in the third normal form when
- It satisfies the criteria for the second normal form
- Each non-key attribute that depends on anything other than the entire primary key is removed (this avoids the insertion anomalies that such dependencies cause)
- Put differently, we have no transitive dependencies via non-key attributes

Takes advantage of transitivity
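The customer table from the example shows the transitive case: Cust_ID determines Address, and Address determines Zone. A minimal sketch of the split, and of joining the pieces back together, might look like this. The variable names are illustrative, and Address → Zone is taken as an assumption from the lecture's data.

```python
customers = [
    {"Cust_ID": 1, "First_Name": "John", "Last_Name": "Murphy",
     "Address": "123 Fake St", "Zone": "Inner-city"},
    {"Cust_ID": 2, "First_Name": "Mary", "Last_Name": "Byrne",
     "Address": "Kildaman-fadar", "Zone": "Rural"},
    {"Cust_ID": 3, "First_Name": "Anne", "Last_Name": "Dunne",
     "Address": "123 Fake St", "Zone": "Inner-city"},
    {"Cust_ID": 4, "First_Name": "Jim", "Last_Name": "Feltz",
     "Address": "20c Fake St", "Zone": "Inner-city"},
]

# Address -> Zone is a transitive dependency (Cust_ID -> Address -> Zone),
# so Zone moves to its own table keyed by Address.
zones = {c["Address"]: c["Zone"] for c in customers}
customers_3nf = [{k: v for k, v in c.items() if k != "Zone"} for c in customers]

# Joining the two tables on Address recovers the original rows,
# so the decomposition loses no information.
rejoined = [{**c, "Zone": zones[c["Address"]]} for c in customers_3nf]

print(zones)
print(rejoined == customers)  # True
```

The zone for "123 Fake St" is now stored once rather than once per customer living there, so it cannot be updated for one customer and forgotten for another.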

Exam Revision

Image from http://www.studentmoneysaver.co.uk/article/6-revision-tips-which-actually-work/

Essay-Style Questions

Topics covered: The cloud and datafication
- What is data and how does something become ‘datafied’?
- How and why did cloud technologies evolve?
- What does it mean in terms of technological and business capabilities?
- What is the Internet of Things?
- What does the future hold?
- Can you use contrasting examples of different businesses to discuss each of these headings?

Essay-Style Questions

Topics covered: Big data
- When is data ‘big data’?
- How and why did we get from ‘small data’ to ‘big data’?
- What does big data let businesses do that they couldn’t do previously?
- What businesses are a good example of this?
- What are the issues and challenges arising from big data?
- Can you use contrasting examples of different businesses to discuss each of these headings?

Essay-Style Questions

Topics covered: Business intelligence
- What kind of intelligence is enabled as we increase our measurement and analysis capabilities?
- How do we get from an individual case to a large-scale pattern, and back again?
- What are the challenges of translating intelligence from an individual case to large-scale patterns, and back again?
- What businesses exemplify the ability to generate intelligence from the increased capacity for data handling, and why?
- Can you use contrasting examples of different businesses to discuss each of these headings?

Modelling Questions

Topics covered

Modelling questions will expect
1. A model
2. Constraints
3. Assumptions

You may also be asked to discuss issues, such as
- Differences between stages of ER modelling
- The reasons for a staggered approach to data modelling
- Commonly encountered issues

Answering Questions

Exam technique
- Manage your time
- Plan your answers
- Sketch out your diagrams very quickly as roughwork if you’re not sure how to make them fit together
- Answer your best questions first
- Use examples
  - Have these lined up as part of your revision

Readings

Some more descriptions of normal forms
- http://databases.about.com/od/specificproducts/a/normalization.htm
- http://phlonx.com/resources/nf3/
- http://www.bkent.net/Doc/simple5.htm