Relational Data Analysis II

Plan

• Introduction

• Structured Methods
– Data Flow Modelling
– Data Modelling
– Relational Data Analysis

• Feasibility

• Maintenance

Definitions

• A relation corresponds to a table
• A tuple is a row in a table
• An attribute is a column in a table
• A Primary Key is the attribute (or combination of attributes) by which we uniquely identify each row
• The number of rows in a table is called the cardinality
• The number of attributes in a table is called the degree

Example Relation (Table)

Student

Student ID | Student Name       | Course             | Module Code | Module Name      | Grade
1000001    | Peter Stringfellow | BSc Basket Weaving | W1001       | Flower Arranging | A
1001234    | Terrence Halfwit   | BA Surfing Studies | S2003       | Hazardous Fishes | B
1234567    | Big John           | BSc Business       | B3333       | Selling Stuff    | E
1234567    | Big John           | BSc Business       | B3334       | Buying Stuff     | A
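As a rough illustration of the definitions, the Student relation can be held in Python as a list of tuples, from which the cardinality and degree fall out directly. The variable names here are assumptions made for this sketch, not part of the method.

```python
# Attribute names of the Student relation (the column headings of the example table).
ATTRIBUTES = ("Student ID", "Student Name", "Course",
              "Module Code", "Module Name", "Grade")

# Each tuple is one row of the example table.
STUDENT = [
    (1000001, "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    (1001234, "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    (1234567, "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    (1234567, "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]

cardinality = len(STUDENT)   # number of rows (tuples)
degree = len(ATTRIBUTES)     # number of columns (attributes)

print(f"cardinality = {cardinality}, degree = {degree}")
# prints: cardinality = 4, degree = 6
```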

Example Relation (Table)

• The table can also be described without its data as follows:

Student (Student ID, Student Name, Course, Module Code, Module Name, Grade)

• Or as a list of attributes:

Student ID
Student Name
Course
Module Code
Module Name
Grade

Rules

• No two rows in a table are identical
– i.e. there are no duplicate tuples/rows
• Every relation has a Primary Key attribute
• Each tuple has a primary key value
• The sequence of the rows should not be significant
• The sequence of the columns should not be significant
• Each attribute must have a unique name
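Two of these rules can be checked mechanically. The sketch below, using helper names invented for illustration, tests the example Student rows for duplicate tuples and for whether a proposed key really identifies each row.

```python
ATTRIBUTES = ("Student ID", "Student Name", "Course",
              "Module Code", "Module Name", "Grade")

STUDENT = [
    (1000001, "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    (1001234, "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    (1234567, "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    (1234567, "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]


def has_duplicate_rows(rows):
    """True if any two rows are identical, which the rules forbid."""
    return len(set(rows)) != len(rows)


def identifies_rows(rows, attributes, key):
    """True if the combination of `key` attributes has a different value in every row."""
    positions = [attributes.index(name) for name in key]
    key_values = [tuple(row[p] for p in positions) for row in rows]
    return len(set(key_values)) == len(key_values)


no_duplicates = not has_duplicate_rows(STUDENT)
id_alone = identifies_rows(STUDENT, ATTRIBUTES, ["Student ID"])
id_and_module = identifies_rows(STUDENT, ATTRIBUTES, ["Student ID", "Module Code"])

print(no_duplicates, id_alone, id_and_module)
# prints: True False True
```

In the example data Student ID repeats for Big John, so on its own it does not identify a row; the combination of Student ID and Module Code does.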

Problems with Tables

• Problems with tables can be classified into three groups:
– Insert Anomalies – Problems caused when inserting new information
– Update Anomalies – Problems caused when updating existing data
– Delete Anomalies – Problems caused when deleting data
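The three anomalies can be reproduced on the single Student table from the earlier example. This is only a sketch: the new course name and the new module are hypothetical values invented to trigger each problem.

```python
import copy

ATTRIBUTES = ("Student ID", "Student Name", "Course",
              "Module Code", "Module Name", "Grade")

# The un-normalised Student table, one dict per row.
FLAT = [dict(zip(ATTRIBUTES, values)) for values in [
    (1000001, "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    (1001234, "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    (1234567, "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    (1234567, "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]]

# Update anomaly: the course is stored once per module row, so changing it
# in one row only leaves the same student recorded on two different courses.
updated = copy.deepcopy(FLAT)
updated[2]["Course"] = "BSc Commerce"  # hypothetical new value, applied to one row only
courses_for_big_john = {row["Course"] for row in updated if row["Student ID"] == 1234567}
print(sorted(courses_for_big_john))
# prints: ['BSc Business', 'BSc Commerce']

# Delete anomaly: removing the only student on a module removes the module too.
remaining_modules = {row["Module Code"] for row in FLAT if row["Student ID"] != 1000001}
print("W1001" in remaining_modules)
# prints: False

# Insert anomaly: a module with no student has no Student ID, so the row
# would break the rule that every tuple has a primary key value.
new_module = dict.fromkeys(ATTRIBUTES)            # every attribute starts as None
new_module["Module Code"] = "X0001"               # hypothetical module
new_module["Module Name"] = "Hypothetical Module"
has_primary_key = new_module["Student ID"] is not None
print(has_primary_key)
# prints: False
```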

The Solution?

• To remove these anomalies we must re-arrange the data and create new tables

• The process for doing this is called Normalisation

First Normal Form

• All data in a table must be dependent on the key

• In order to do this we must remove “repeating groups”

• This is done by analysing the relationship between the primary key and the rest of the data

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
• Module Code
• Module Name
• Grade

• Attributes are moved if there is more than one for each instance of the primary key
• For each Student ID, how many Student Names are there? 1 or Many?
• For each Student ID, how many Module Codes are there?

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
– Module Code
– Module Name
– Grade

• Indented data is a repeating group
• We need to put it into a new table
• This table will describe the module a student is taking
• We will call it Student Module

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Module Name
• Grade

• We now have two tables
• Student details
– Primary Key = Student ID
• Student’s module details
– PK = Student ID, Module Code
– Called a compound key
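The split into two tables can be sketched in Python as follows. The function names are assumptions made for this illustration, and Course ID is left out because the example table gives no values for it. Joining the two tables back on Student ID reproduces the original rows, which is why the key is copied into the new table.

```python
ATTRIBUTES = ("Student ID", "Student Name", "Course",
              "Module Code", "Module Name", "Grade")

FLAT = [dict(zip(ATTRIBUTES, values)) for values in [
    (1000001, "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    (1001234, "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    (1234567, "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    (1234567, "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]]

REPEATING_GROUP = ("Module Code", "Module Name", "Grade")


def split_repeating_group(rows, key, group):
    """Move the repeating group into its own table, keeping the key in both."""
    students = {}          # one entry per key value
    student_modules = []   # one entry per original row
    for row in rows:
        students[row[key]] = {name: value for name, value in row.items()
                              if name not in group}
        student_modules.append({key: row[key], **{name: row[name] for name in group}})
    return list(students.values()), student_modules


def rejoin(students, student_modules, key):
    """Join the two tables back together on the shared key."""
    by_key = {student[key]: student for student in students}
    return [{**by_key[module[key]], **module} for module in student_modules]


students, student_modules = split_repeating_group(FLAT, "Student ID", REPEATING_GROUP)
print(len(students), len(student_modules))
# prints: 3 4
print(rejoin(students, student_modules, "Student ID") == FLAT)
# prints: True
```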

Yes… But… No… But…

• There are still Anomalies…

• Update
– Cannot change a module name without finding all students on it
• Insert
– Cannot add a new module unless we have a student enrolled
• Delete
– When a student leaves we could lose course information

• Further Normalisation is therefore required…

Example 2 - Library

• Student ID
• Name
• Faculty
• Book ID
• Title
• Author
• Return Date

• Put this data into First Normal Form

Example 2 - Library

• Student ID
• Name
• Faculty
– Book ID
– Title
– Author
– Return Date

• Identify Repeating group

Example 2 - Library

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Title
• Author
• Return Date

• Create a New table
• Remember to keep the original PK as part of the new table’s PK
• This maintains the relationship between the two tables

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
• Stock ID
• Title
• Format

• Put this data into First Normal Form

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
– Stock ID
– Title
– Format

• Identify Repeating group

Example 3 – Borrowing Videos

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID
• Title
• Format

• Create New table

Remember

• 1NF can be considered as Normalised

• But it doesn’t solve all of our problems

Second Normal Form

• Only Applies to tables with compound keys

• Data in a table must depend on the whole key

• We must remove any partial dependencies

Example 1 – Students (1NF)

• Student ID
• Student Name
• Course
• Course ID

• This table is already in 2NF as it does not have a compound key

• Student ID
• Module Code
• Module Name
• Grade

• This table may not be in 2NF
• Need to analyse the relationship between attributes and the key

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’
• If we removed the Student ID would we expect the module name to remain in our system? Yes
• This tells us that Module Name IS NOT dependent on Student ID
• If we removed the Module Code would we expect the module name to remain in our system? No
• This tells us that Module Name IS dependent on Module Code
• Module Name is therefore dependent on only PART of the primary key
• This is called a partial dependency and must be removed

Example 1 – 2NF

• Student ID
• Module Code
– Module Name
• Grade

• Examine the attribute ‘Grade’
• Is it dependent on Student ID? Yes
• Is it dependent on Module Code? Yes
• There is no partial dependency so it stays in this table
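The dependency questions above can be tested against sample rows. The sketch below uses a helper named `determines`, invented for this illustration, which reports whether the same determinant values always go with the same dependent value. One extra enrolment has been added as a hypothetical row so that a module has more than one student. Sample data can only disprove a dependency; whether one really holds is a matter of business rules.

```python
STUDENT_MODULE = [
    {"Student ID": 1000001, "Module Code": "W1001", "Module Name": "Flower Arranging", "Grade": "A"},
    {"Student ID": 1001234, "Module Code": "S2003", "Module Name": "Hazardous Fishes", "Grade": "B"},
    {"Student ID": 1234567, "Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "E"},
    {"Student ID": 1234567, "Module Code": "B3334", "Module Name": "Buying Stuff", "Grade": "A"},
    # Hypothetical extra enrolment, not in the original example data.
    {"Student ID": 1000001, "Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "C"},
]


def determines(rows, determinant, dependent):
    """True if rows that agree on `determinant` never disagree on `dependent`."""
    seen = {}
    for row in rows:
        left = tuple(row[name] for name in determinant)
        # setdefault stores the first value seen and returns the stored one.
        if seen.setdefault(left, row[dependent]) != row[dependent]:
            return False
    return True


# Module Name depends on Module Code alone: a partial dependency.
print(determines(STUDENT_MODULE, ["Module Code"], "Module Name"))             # prints: True
print(determines(STUDENT_MODULE, ["Student ID"], "Module Name"))              # prints: False

# Grade needs both parts of the compound key.
print(determines(STUDENT_MODULE, ["Student ID"], "Grade"))                    # prints: False
print(determines(STUDENT_MODULE, ["Module Code"], "Grade"))                   # prints: False
print(determines(STUDENT_MODULE, ["Student ID", "Module Code"], "Grade"))     # prints: True
```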

Example 1 – 2NF

• Student ID
• Module Code
– Module Name
• Grade

• Module Name must be removed

Example 1 – 2NF

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

• Module Name is moved to a new table
• We need to give it a primary key
• This will be the part of the key on which it is dependent
• The data is now in 2NF

Example 1 – 2NF

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Grade

• Module Code
• Module Name
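Moving to 2NF amounts to two projections of the Student Module table. A minimal sketch follows, with a `project` helper written for this illustration and one hypothetical extra enrolment added so that a module name would otherwise be stored twice.

```python
STUDENT_MODULE = [
    {"Student ID": 1000001, "Module Code": "W1001", "Module Name": "Flower Arranging", "Grade": "A"},
    {"Student ID": 1001234, "Module Code": "S2003", "Module Name": "Hazardous Fishes", "Grade": "B"},
    {"Student ID": 1234567, "Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "E"},
    {"Student ID": 1234567, "Module Code": "B3334", "Module Name": "Buying Stuff", "Grade": "A"},
    # Hypothetical extra enrolment, not in the original example data.
    {"Student ID": 1000001, "Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "C"},
]


def project(rows, attributes):
    """Keep only the named attributes and drop any duplicate rows that result."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[name] for name in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# Grade depends on the whole key, so it stays with the compound key.
student_module = project(STUDENT_MODULE, ["Student ID", "Module Code", "Grade"])
# Module Name depends on Module Code only, so it moves to its own table.
module = project(STUDENT_MODULE, ["Module Code", "Module Name"])

print(len(student_module), len(module))
# prints: 5 4
```

Each module name is now held once, however many students take the module.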

Example 2

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Title
• Author
• Return Date

• Take this to 2NF

Example 2 - 2NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID
• Title
• Format

• Take this to 2NF

Example 3 - 2NF

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format

Example 1 - Anomalies

• There are still problems with Example 1
• Insert
– Still cannot add a course unless there are students taking it
• Update
– Cannot update course name without finding all students on the course
• Delete
– If we delete a student then we could also lose course information

Third Normal Form

• Applies to all tables

• Data in a table must depend on Nothing but the Key

• We must remove any non-key dependencies

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Problems all seem to affect Course Data

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Student Name”
• If we removed the Student ID would we expect the Student Name to remain in our system? No
• Therefore “Student Name” is dependent on Student ID and is in the correct table

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Course”, which is the name of a course
• If we removed the Student ID would we expect the Course to remain in our system? Yes
• Therefore “Course” IS NOT dependent on Student ID and must be moved

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• The new table needs an appropriate Primary Key
• Course ID is the logical option

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Examine the attribute “Course ID”
• If we removed the Student ID would we expect the student’s Course ID to remain in our system? No
• Course ID is dependent on Student ID so must remain in the existing table
• It acts as a link between student and course

Example 1 – 3NF

• Check the remaining tables to ensure they are in 3NF

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Third Normal Form

• The Data is now in 3NF

• Data in a table must depend on
– The Key
– The Whole Key
– And Nothing but the Key

• All Anomalies have now been removed
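To see the effect, the four 3NF tables of Example 1 can be built in an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under stated assumptions: the table and column names, the Course ID values (C1 to C4), the extra course and the new module name are all hypothetical, since the example gives none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    course    TEXT NOT NULL
);
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL,
    course_id    TEXT NOT NULL REFERENCES course(course_id)
);
CREATE TABLE module (
    module_code TEXT PRIMARY KEY,
    module_name TEXT NOT NULL
);
CREATE TABLE student_module (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    module_code TEXT NOT NULL REFERENCES module(module_code),
    grade       TEXT,
    PRIMARY KEY (student_id, module_code)  -- compound key
);
""")

# Course IDs are hypothetical: the example never gives Course ID values.
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("C1", "BSc Basket Weaving"), ("C2", "BA Surfing Studies"), ("C3", "BSc Business")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1000001, "Peter Stringfellow", "C1"),
    (1001234, "Terrence Halfwit", "C2"),
    (1234567, "Big John", "C3")])
conn.executemany("INSERT INTO module VALUES (?, ?)", [
    ("W1001", "Flower Arranging"), ("S2003", "Hazardous Fishes"),
    ("B3333", "Selling Stuff"), ("B3334", "Buying Stuff")])
conn.executemany("INSERT INTO student_module VALUES (?, ?, ?)", [
    (1000001, "W1001", "A"), (1001234, "S2003", "B"),
    (1234567, "B3333", "E"), (1234567, "B3334", "A")])

# Insert: a course can now exist before any student takes it (hypothetical course).
conn.execute("INSERT INTO course VALUES (?, ?)", ("C4", "BSc Example Studies"))

# Update: a module name is changed in exactly one row (hypothetical new name).
updated_rows = conn.execute(
    "UPDATE module SET module_name = ? WHERE module_code = ?",
    ("Selling More Stuff", "B3333")).rowcount

# Delete: removing a student no longer removes the course or module details.
conn.execute("DELETE FROM student_module WHERE student_id = ?", (1000001,))
conn.execute("DELETE FROM student WHERE student_id = ?", (1000001,))
courses_left = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
modules_left = conn.execute("SELECT COUNT(*) FROM module").fetchone()[0]

print(updated_rows, courses_left, modules_left)
# prints: 1 4 4
```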

Example 2 - 2NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

• Take this to 3NF

Example 2 - 3NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

• Already in 3NF

Example 3 - 2NF

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format

• Take this to 3NF

Example 3 - 3NF

• Customer ID
• Customer Name
• Address
• Branch No

• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format
