Relational Data Analysis II

Relational Data Analysis II. Plan Introduction Structured Methods –Data Flow Modelling –Data Modelling –Relational Data Analysis Feasibility Maintenance

Relational Data Analysis II

Plan

• Introduction

• Structured Methods
  – Data Flow Modelling
  – Data Modelling
  – Relational Data Analysis

• Feasibility

• Maintenance

Definitions

• A relation corresponds to a table
• A tuple is a row in a table
• An attribute is a column in a table
• A Primary Key is the attribute (or combination of attributes) by which we uniquely identify each row
• The number of rows in a table is called the cardinality
• The number of attributes in a table is called the degree

Example Relation (Table)

Student

Student ID | Student Name       | Course             | Module Code | Module Name      | Grade
1000001    | Peter Stringfellow | BSc Basket Weaving | W1001       | Flower Arranging | A
1001234    | Terrence Halfwit   | BA Surfing Studies | S2003       | Hazardous Fishes | B
1234567    | Big John           | BSc Business       | B3333       | Selling Stuff    | E
1234567    | Big John           | BSc Business       | B3334       | Buying Stuff     | A
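As a minimal sketch (the Python names ATTRIBUTES and STUDENT are illustrative and not part of the slides), the example relation can be held as a list of tuples, which makes the cardinality and degree defined earlier easy to read off:

```python
# Column names and rows copied from the example Student table.
ATTRIBUTES = ("Student ID", "Student Name", "Course",
              "Module Code", "Module Name", "Grade")

STUDENT = [
    ("1000001", "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    ("1001234", "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    ("1234567", "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    ("1234567", "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]

cardinality = len(STUDENT)   # number of tuples (rows)
degree = len(ATTRIBUTES)     # number of attributes (columns)

print(cardinality, degree)   # 4 6
```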

Example Relation (Table)

• The table can also be described without its data as follows:
  Student (Student ID, Student Name, Course, Module Code, Module Name, Grade)

• Or:
  Student ID
  Student Name
  Course
  Module Code
  Module Name
  Grade

Rules

• No two rows in a table are identical
  – i.e. there are no duplicate tuples/rows
• Every relation has a Primary Key attribute
• Each tuple has a primary key value
• The sequence of the rows should not be significant
• The sequence of the columns should not be significant
• Each attribute must have a unique name
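The uniqueness rules can be checked mechanically. The following is a minimal sketch, assuming rows are held as Python dictionaries; the helper name identifies_rows is hypothetical. On the example Student data, Student ID alone fails the test because 1234567 appears twice, while Student ID together with Module Code passes:

```python
def identifies_rows(rows, key_columns):
    """Return True if no two rows share the same values in key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Three columns of the example Student table are enough for the check.
rows = [
    {"Student ID": "1000001", "Module Code": "W1001", "Grade": "A"},
    {"Student ID": "1001234", "Module Code": "S2003", "Grade": "B"},
    {"Student ID": "1234567", "Module Code": "B3333", "Grade": "E"},
    {"Student ID": "1234567", "Module Code": "B3334", "Grade": "A"},
]

print(identifies_rows(rows, ["Student ID"]))                 # False
print(identifies_rows(rows, ["Student ID", "Module Code"]))  # True
```

A check like this can only show that a key fails on the data at hand; whether a key is right in general is a decision about what the data means.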

Problems with Tables

• Problems with tables can be classified into three groups:
  – Insert Anomalies – Problems caused when inserting new information
  – Update Anomalies – Problems caused when updating existing data
  – Delete Anomalies – Problems caused when deleting data

The Solution?

• To remove these anomalies we must re-arrange the data and create new tables

• The process for doing this is called Normalisation

First Normal Form

• All data in a table must be dependent on the key

• In order to do this we must remove “repeating groups”

• This is done by analysing the relationship between the primary key and the rest of the data

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
• Module Code
• Module Name
• Grade

• Attributes are moved if there is more than one for each instance of the primary key

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
• Module Code
• Module Name
• Grade

• For each Student ID
• How many Student Names are there?
• 1 or Many?

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
  – Module Code
• Module Name
• Grade

• For each Student ID
• How many Module Codes are there?

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
  – Module Code
  – Module Name
  – Grade

• Indented data is a repeating group

• We need to put it into a new table

• This table will describe the module a student is taking

• We will call it Student Module

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Module Name
• Grade

• We now have two tables

• Student details
  – Primary Key = Student ID

• Student’s module details
  – PK = Student ID, Module Code
  – Called a compound Key
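The move to First Normal Form can be sketched in a few lines. This is only an illustration under stated assumptions: the nested "Modules" list, the variable names and the function to_first_normal_form are hypothetical, and Course ID is left out because the example table gives no values for it. Each repeating group is moved to its own row, and the original key (Student ID) is copied into the new table:

```python
# Unnormalised records: each student carries a repeating group of modules.
unnormalised = [
    {"Student ID": "1000001", "Student Name": "Peter Stringfellow",
     "Course": "BSc Basket Weaving",
     "Modules": [
         {"Module Code": "W1001", "Module Name": "Flower Arranging", "Grade": "A"},
     ]},
    {"Student ID": "1001234", "Student Name": "Terrence Halfwit",
     "Course": "BA Surfing Studies",
     "Modules": [
         {"Module Code": "S2003", "Module Name": "Hazardous Fishes", "Grade": "B"},
     ]},
    {"Student ID": "1234567", "Student Name": "Big John",
     "Course": "BSc Business",
     "Modules": [
         {"Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "E"},
         {"Module Code": "B3334", "Module Name": "Buying Stuff", "Grade": "A"},
     ]},
]


def to_first_normal_form(records):
    """Split records into a Student table and a Student Module table."""
    students, student_modules = [], []
    for record in records:
        # Everything except the repeating group stays with the student.
        students.append({k: v for k, v in record.items() if k != "Modules"})
        # Each repeated item becomes a row keyed by Student ID + Module Code.
        for module in record["Modules"]:
            student_modules.append({"Student ID": record["Student ID"], **module})
    return students, student_modules


students, student_modules = to_first_normal_form(unnormalised)
print(len(students), len(student_modules))  # 3 4
```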

Yes… But… No… But…

• There are still Anomalies…

• Update
  – Cannot change a module name without finding all students on it

• Insert
  – Cannot add a new module unless we have a student enrolled

• Delete
  – When a student leaves we could lose course information

• Further Normalisation is therefore required…
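The delete anomaly can be reproduced on the example data. This is a small sketch, with variable and function names chosen for illustration; it shows the effect for module details, which sit in the Student Module table at this stage. Student 1000001 is the only student on module W1001, so removing that student also removes the only record of the module's name:

```python
# Student Module table in 1NF: (Student ID, Module Code, Module Name, Grade).
student_module = [
    ("1000001", "W1001", "Flower Arranging", "A"),
    ("1001234", "S2003", "Hazardous Fishes", "B"),
    ("1234567", "B3333", "Selling Stuff", "E"),
    ("1234567", "B3334", "Buying Stuff", "A"),
]


def known_modules(rows):
    """Map each module code that appears in the table to its name."""
    return {code: name for _student, code, name, _grade in rows}


before = known_modules(student_module)

# Student 1000001 leaves, so their rows are deleted.
remaining = [row for row in student_module if row[0] != "1000001"]
after = known_modules(remaining)

lost = set(before) - set(after)
print(lost)  # {'W1001'}
```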

Example 2 - Library

• Student ID
• Name
• Faculty
• Book ID
• Title
• Author
• Return Date

• Put this data into First Normal Form

Example 2 - Library

• Student ID
• Name
• Faculty
  – Book ID
  – Title
  – Author
  – Return Date

• Identify Repeating group

Example 2 - Library

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Title
• Author
• Return Date

• Create a New table

• Remember to keep the original PK in that of the new table

• This maintains the relationship between the two tables

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
• Stock ID
• Title
• Format

• Put this data into First Normal Form

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
  – Stock ID
  – Title
  – Format

• Identify Repeating group

Example 3 – Borrowing Videos

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID
• Title
• Format

• Create New table

Remember

• 1NF can be considered as Normalised

• But it doesn’t solve all of our problems

Second Normal Form

• Only Applies to tables with compound keys

• Data in a table must depend on the whole key

• We must remove any partial dependencies

Example 1 – Students (1NF)

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Module Name
• Grade

• The first table (key: Student ID) is already in 2NF as it does not have a compound key

• The second table (key: Student ID, Module Code) may not be in 2NF

• Need to analyse the relationship between attributes and the key

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• If we removed the Student ID would we expect the module name to remain in our system?

• Yes or No?

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• If we removed the Student ID would we expect the module name to remain in our system?

• Yes

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• This tells us that Module Name IS NOT dependent on StudentID

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• If we removed the Module Code would we expect the module name to remain in our system?

• Yes or No

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• If we removed the Module Code would we expect the module name to remain in our system?

• No

Example 1 – 2NF

• Student ID
• Module Code
• Module Name
• Grade

• Examine the attribute ‘Module Name’

• This tells us that Module Name IS dependent on Module Code

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Examine the attribute ‘Module Name’

• Module Name is therefore dependent on only PART of the primary key

• This is called a partial dependency and must be removed

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Examine the attribute ‘Grade’

• Is it dependent on Student ID?

• Is it dependent on Module Code?

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Examine the attribute ‘Grade’

• Is it dependent on Student ID? Yes

• Is it dependent on Module Code?

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Examine the attribute ‘Grade’

• Is it dependent on Student ID? Yes

• Is it dependent on Module Code? Yes

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Examine the attribute ‘Grade’

• There is no partial dependency so it stays in this table

Example 1 – 2NF

• Student ID
• Module Code
  – Module Name
• Grade

• Module Name must be removed

Example 1 – 2NF

• Student ID
• Module Code
• Grade

• Module Name

• Module Name must be removed

Example 1 – 2NF

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

• Module Name must be removed

• We need to give it a primary key

• This will be the part of the key on which it is dependent

• The data is now in 2NF
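The split into Second Normal Form amounts to two projections of the Student Module table. This is a minimal sketch, assuming rows are held as Python dictionaries; the helper project and the variable names are hypothetical. Rejoining the two results on Module Code gives back the original rows, so nothing is lost by the split:

```python
student_module = [
    {"Student ID": "1000001", "Module Code": "W1001", "Module Name": "Flower Arranging", "Grade": "A"},
    {"Student ID": "1001234", "Module Code": "S2003", "Module Name": "Hazardous Fishes", "Grade": "B"},
    {"Student ID": "1234567", "Module Code": "B3333", "Module Name": "Selling Stuff", "Grade": "E"},
    {"Student ID": "1234567", "Module Code": "B3334", "Module Name": "Buying Stuff", "Grade": "A"},
]


def project(rows, columns):
    """Keep only the named columns and drop duplicate rows, preserving order."""
    seen, result = set(), []
    for row in rows:
        item = tuple(row[column] for column in columns)
        if item not in seen:
            seen.add(item)
            result.append(dict(zip(columns, item)))
    return result


# Grade depends on the whole key, so it stays with Student ID + Module Code.
grades = project(student_module, ["Student ID", "Module Code", "Grade"])
# Module Name depends only on Module Code, so it moves to its own table.
modules = project(student_module, ["Module Code", "Module Name"])

# Rejoin on Module Code to confirm the original rows can be rebuilt.
name_of = {m["Module Code"]: m["Module Name"] for m in modules}
rejoined = [{**g, "Module Name": name_of[g["Module Code"]]} for g in grades]
print(rejoined == student_module)  # True
```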

Example 1 – 2NF

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Example 2

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Title
• Author
• Return Date

• Take this to 2NF

Example 2 - 2NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID
• Title
• Format

• Take this to 2NF

Example 3 - 2NF

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format

Example 1 - Anomalies

• There are still problems with Example 1

• Insert
  – Still cannot add a course unless there are students taking it

• Update
  – Cannot update course name without finding all students on the course

• Delete
  – If we delete a student then we could also lose course information

Third Normal Form

• Applies to all tables

• Data in a table must depend on Nothing but the Key

• We must remove any non-key dependencies

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Problems all seem to affect Course Data

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Student Name”

• If we removed the Student ID would we expect the Student Name to remain in our system?

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Student Name”

• If we removed the Student ID would we expect the Student Name to remain in our system? No

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Student Name”

• Therefore “Student Name” is dependent on Student ID and is in the correct table

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Course”, which is the name of a course

• If we removed the Student ID would we expect the Course to remain in our system?

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Course”, which is the name of a course

• If we removed the Student ID would we expect the Course to remain in our system? Yes

Example 1 – 3NF

• Student ID
• Student Name
• Course
• Course ID

• Examine the attribute “Course”

• Therefore “Course” IS NOT dependent on Student ID and must be moved

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Examine the attribute “Course”

• The new table needs an appropriate Primary Key

• CourseID is the logical option

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Examine the attribute “Course ID”

• If we removed the Student ID would we expect the student’s Course ID to remain in our system?

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Examine the attribute “Course ID”

• Course ID is dependent on Student ID so must remain in the existing table

• Acts as a link between student and course

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Examine the attribute “Course ID”

• If we removed the Student ID would we expect the student’s Course ID to remain in our system? No

Example 1 – 3NF

• Check the remaining tables to ensure they are in 3NF

Example 1 – 3NF

• Student ID
• Student Name
• Course ID

• Course ID
• Course

• Student ID
• Module Code
• Grade

• Module Code
• Module Name

Third Normal Form

• The Data is now in 3NF

• Data in a table must depend on
  – The Key
  – The Whole Key
  – And Nothing but the Key

• The anomalies identified in Example 1 have now been removed
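The four 3NF tables from Example 1 can be tried out in an in-memory SQLite database. This is a sketch under stated assumptions: the table and column names are one possible spelling of the slide attributes, and the course identifier "C1" is made up because the slides give no Course ID values. It shows a course being added before any student exists, and a student being deleted without losing course or module details:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        course    TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id   TEXT PRIMARY KEY,
        student_name TEXT NOT NULL,
        course_id    TEXT NOT NULL REFERENCES course (course_id)
    );
    CREATE TABLE module (
        module_code TEXT PRIMARY KEY,
        module_name TEXT NOT NULL
    );
    CREATE TABLE student_module (
        student_id  TEXT NOT NULL REFERENCES student (student_id),
        module_code TEXT NOT NULL REFERENCES module (module_code),
        grade       TEXT,
        PRIMARY KEY (student_id, module_code)
    );
""")

# Insert: a course can exist before any student is enrolled on it.
conn.execute("INSERT INTO course VALUES (?, ?)", ("C1", "BSc Business"))  # "C1" is made up
students_before = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

conn.execute("INSERT INTO module VALUES (?, ?)", ("B3333", "Selling Stuff"))
conn.execute("INSERT INTO student VALUES (?, ?, ?)", ("1234567", "Big John", "C1"))
conn.execute("INSERT INTO student_module VALUES (?, ?, ?)", ("1234567", "B3333", "E"))

# Delete: removing the student leaves the course and module rows in place.
conn.execute("DELETE FROM student_module WHERE student_id = ?", ("1234567",))
conn.execute("DELETE FROM student WHERE student_id = ?", ("1234567",))
remaining_courses = conn.execute("SELECT course FROM course").fetchall()
remaining_modules = conn.execute("SELECT module_name FROM module").fetchall()

print(students_before, remaining_courses, remaining_modules)
```

Because each course name and module name is stored in exactly one row, an update to either touches a single row rather than one row per student.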

Example 2 - 2NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

• Take this to 3NF

Example 2 - 3NF

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Return Date

• Book ID
• Title
• Author

• Already in 3NF

Example 3 - 2NF

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format

• Take this to 3NF

Example 3 - 3NF

• Customer ID
• Customer Name
• Address
• Branch No

• Branch No
• Branch Manager

• Customer ID
• Stock ID

• Stock ID
• Title
• Format
