Relational Data Analysis

Relational Data Analysis. Plan Introduction Structured Methods –Data Flow Modelling –Data Modelling –Relational Data Analysis Feasibility Maintenance

Plan

• Introduction

• Structured Methods
  – Data Flow Modelling
  – Data Modelling
  – Relational Data Analysis

• Feasibility

• Maintenance

Relational Data Analysis

• Prepares a business's data for representation using the relational model

• The relational model is implemented in a number of popular database systems:
  – Access
  – Oracle
  – MySQL
  – DB2

The Relational Model

• A relation is a table of data

• A relational database is therefore one in which tables are used to store data
  – This implies that there are other ways of storing data

• Tables will be related to each other in some way
  – Because the data held in them is related
  – The context of the system we are developing governs which data items are related and how they are related

Relational Data Analysis

• Relational data analysis therefore involves:
  – Building related tables of data
  – Retrieving data from one or more related tables
  – Inserting, updating and deleting data in related tables
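As a minimal sketch of these activities, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table names `Student` and `StudentModule`, the cut-down column set and the changed grade are illustrative assumptions based on the Student example later in these notes, not part of the original slides.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# 1. Building related tables: StudentModule refers back to Student via StudentID.
conn.executescript("""
CREATE TABLE Student (
    StudentID   TEXT PRIMARY KEY,
    StudentName TEXT NOT NULL
);
CREATE TABLE StudentModule (
    StudentID  TEXT NOT NULL REFERENCES Student(StudentID),
    ModuleCode TEXT NOT NULL,
    Grade      TEXT,
    PRIMARY KEY (StudentID, ModuleCode)
);
""")

# 2. Inserting data into the related tables.
conn.execute("INSERT INTO Student VALUES ('1234567', 'Big John')")
conn.executemany(
    "INSERT INTO StudentModule VALUES (?, ?, ?)",
    [("1234567", "B3333", "E"), ("1234567", "B3334", "A")],
)

# 3. Retrieving data from both tables at once with a join on the shared column.
joined = conn.execute(
    """
    SELECT s.StudentName, m.ModuleCode, m.Grade
    FROM Student AS s JOIN StudentModule AS m ON s.StudentID = m.StudentID
    ORDER BY m.ModuleCode
    """
).fetchall()
print(joined)  # [('Big John', 'B3333', 'E'), ('Big John', 'B3334', 'A')]

# 4. Updating and deleting rows in a related table (hypothetical changes).
conn.execute(
    "UPDATE StudentModule SET Grade = ? WHERE StudentID = ? AND ModuleCode = ?",
    ("C", "1234567", "B3333"),
)
conn.execute("DELETE FROM StudentModule WHERE ModuleCode = ?", ("B3334",))
remaining = conn.execute("SELECT * FROM StudentModule").fetchall()
print(remaining)  # [('1234567', 'B3333', 'C')]
```

The join is what "retrieval from one or more related tables" looks like in practice: the shared StudentID column is the link between the two tables.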

Relational Data Analysis

• Relational data analysis is quite formal:
  – It is based on set theory
  – It uses relational algebra to define operations on tables

• We will take a less formal approach
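To give a feel for what "operations on tables" means without the formal notation, two basic relational algebra operations, selection (restriction) and projection, can be sketched in a few lines of Python. This is an informal illustration, assuming a relation is held as a heading plus a set of tuples; the function names `select` and `project` are our own, and the data is a cut-down version of the Student example used later.

```python
# A relation modelled as in set theory: a heading (tuple of attribute names)
# and a body that is a frozenset of tuples, so duplicate rows and row order
# cannot exist.

def select(relation, predicate):
    """Selection (restriction): keep only the tuples for which predicate(row) is true."""
    heading, body = relation
    kept = frozenset(t for t in body if predicate(dict(zip(heading, t))))
    return heading, kept


def project(relation, attributes):
    """Projection: keep only the named attributes; duplicates collapse in the set."""
    heading, body = relation
    positions = [heading.index(a) for a in attributes]
    return tuple(attributes), frozenset(tuple(t[i] for i in positions) for t in body)


student = (
    ("Student ID", "Student Name", "Module Code", "Grade"),
    frozenset({
        ("1000001", "Peter Stringfellow", "W1001", "A"),
        ("1001234", "Terrence Halfwit", "S2003", "B"),
        ("1234567", "Big John", "B3333", "E"),
        ("1234567", "Big John", "B3334", "A"),
    }),
)

grade_a = select(student, lambda row: row["Grade"] == "A")
names = project(student, ["Student ID", "Student Name"])
print(len(grade_a[1]), len(names[1]))  # 2 3
```

Projecting onto Student ID and Student Name gives three tuples, not four, because Big John's two rows become identical and a set holds each tuple once.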

Definitions

• A relation corresponds to a table
• A tuple is a row in a table
• An attribute is a column in a table
• A primary key is the attribute (or combination of attributes) by which we uniquely identify each row
• The number of rows in a table is called its cardinality
• The number of attributes in a table is called its degree

Example Relation (Table)

Student ID | Student Name | Course | Module Code | Module Name | Grade
1000001 | Peter Stringfellow | BSc Basket Weaving | W1001 | Flower Arranging | A
1001234 | Terrence Halfwit | BA Surfing Studies | S2003 | Hazardous Fishes | B
1234567 | Big John | BSc Business | B3333 | Selling Stuff | E
1234567 | Big John | BSc Business | B3334 | Buying Stuff | A

Student

Example Relation (Table)

• The table can also be described without its data, as follows:
  – Student (Student ID, Student Name, Course, Module Code, Module Name, Grade)

Example Relation (Tuple)

• Each row of the Student table is a tuple; for example, the row for student 1000001, Peter Stringfellow

Example Relation (Attribute)

• Each column of the Student table is an attribute; for example, Student Name

Example Relation (Table)

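• Our example has a cardinality of 4 (four rows) and a degree of 6 (six attributes)

• Student ID is the natural choice of primary key, since each student should have exactly one ID
  – We cannot know this without having an understanding of the data
  – Note, however, that Big John (1234567) appears in two rows, so in this table Student ID alone does not yet uniquely identify each row; this problem is returned to below

• If there is no existing primary key, then we must invent one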
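A minimal Python sketch of these ideas using the Student table: cardinality and degree are simply counts, and a proposed key can be tested by checking that no two rows share its values. The helper name `uniquely_identifies` is hypothetical. A passing test only shows that the sample data does not contradict the key; as the slide says, only an understanding of the data can confirm it.

```python
attributes = ["Student ID", "Student Name", "Course", "Module Code", "Module Name", "Grade"]
rows = [
    ("1000001", "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    ("1001234", "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    ("1234567", "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    ("1234567", "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]

cardinality = len(rows)   # number of rows (tuples)
degree = len(attributes)  # number of columns (attributes)


def uniquely_identifies(key, attributes, rows):
    """True if no two rows share the same values for the attributes in `key`."""
    positions = [attributes.index(a) for a in key]
    values = [tuple(r[i] for i in positions) for r in rows]
    return len(values) == len(set(values))


print(cardinality, degree)  # 4 6
print(uniquely_identifies(["Student ID"], attributes, rows))                 # False
print(uniquely_identifies(["Student ID", "Module Code"], attributes, rows))  # True
```

Student ID fails because of Big John's two rows, while the combination of Student ID and Module Code passes on this data.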

Exercise

Name | Number | Town | No of contracts | Depot
Tom | 0050065 | Manchester | 2 | Manchester
Dick | 0338178 | Leeds | 1 | Manchester
Harry | 1922029 | Manchester | 3 | Stoke
Sue | 0002911 | Oxford | 1 | Reading
Frieda | 1001001 | Cardiff | 7 | Cardiff
Imran | 23455678 | Manchester | 1 | Stoke
Yue | 32156545 | Manchester | 7 | London

Exercise

• What is the cardinality of the table?

• What is the degree of the table?

• Identify the primary key of the table.

Exercise: Answers

• What is the cardinality of the table?
  – How many rows? 7

• What is the degree of the table?
  – How many attributes? 5

• Identify the primary key of the table.
  – Number
  – But how do we know? Why not Name?
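The first two answers can be checked mechanically with a short Python sketch. It also shows why the data alone cannot settle the last question: in this sample both Name and Number happen to be unique, so choosing Number relies on knowing that a number, unlike a name, is issued to only one person. That business rule is our assumption about the data, not something the code can discover.

```python
attributes = ["Name", "Number", "Town", "No of contracts", "Depot"]
# Numbers are kept as strings so leading zeros (e.g. 0050065) are not lost.
rows = [
    ("Tom", "0050065", "Manchester", 2, "Manchester"),
    ("Dick", "0338178", "Leeds", 1, "Manchester"),
    ("Harry", "1922029", "Manchester", 3, "Stoke"),
    ("Sue", "0002911", "Oxford", 1, "Reading"),
    ("Frieda", "1001001", "Cardiff", 7, "Cardiff"),
    ("Imran", "23455678", "Manchester", 1, "Stoke"),
    ("Yue", "32156545", "Manchester", 7, "London"),
]

print(len(rows), len(attributes))  # 7 5

# Which single attributes have no repeated values in this sample?
unique_in_sample = [
    name
    for i, name in enumerate(attributes)
    if len({row[i] for row in rows}) == len(rows)
]
print(unique_in_sample)  # ['Name', 'Number']
```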

Tables and Entities

• Each table is equivalent to an entity in an ERD

• Each attribute is equivalent to an attribute in an ERD

• Each tuple is an occurrence of an entity in an ERD

• The primary key is equivalent to the key attribute in an ERD entity

Rules

• No two rows in a table are identical
  – i.e. there are no duplicate tuples (rows)

• Every relation has a primary key
• The sequence of the rows should not be significant
• The sequence of the columns should not be significant
• Each attribute must have a unique name
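Three of these rules can be checked against a table's contents; the two ordering rules describe how we treat a table rather than something a single table can break. A minimal sketch, with the hypothetical helper `rule_violations`:

```python
from collections import Counter


def rule_violations(attributes, rows, key):
    """Report breaches of the checkable table rules (empty list means none found)."""
    problems = []
    repeated_names = [a for a, n in Counter(attributes).items() if n > 1]
    if repeated_names:
        problems.append(f"attribute names not unique: {repeated_names}")
    repeated_rows = [r for r, n in Counter(rows).items() if n > 1]
    if repeated_rows:
        problems.append(f"duplicate rows: {repeated_rows}")
    if not key:
        problems.append("no primary key")
    else:
        positions = [attributes.index(a) for a in key]
        key_values = [tuple(r[i] for i in positions) for r in rows]
        if len(key_values) != len(set(key_values)):
            problems.append(f"primary key {key} does not identify each row")
    return problems


print(rule_violations(["ID", "Name"], [("1", "Tom"), ("2", "Sue")], ["ID"]))  # []
print(len(rule_violations(["ID", "ID"], [("1", "Tom"), ("1", "Tom")], ["ID"])))  # 3
```

Treating row order as insignificant amounts to comparing tables as sets, for example `set(rows_a) == set(rows_b)`.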

Problems with Tables

• Problems with tables can be classified into three groups:
  – Insert anomalies: problems caused when inserting new information
  – Update anomalies: problems caused when updating existing data
  – Delete anomalies: problems caused when deleting data

Problems with Tables

• For the Student table in the handout:
  – The primary key doesn't uniquely identify each row

• Insert anomaly
  – We cannot add a new course unless we have a student ID to store it against

• Perhaps we chose the wrong primary key?
• Try using a different one to see if it helps

Problems with Tables

• Update anomaly
  – Big John changes his name
  – We now have to find all instances of Big John and change them
  – This could take some time
  – We could miss one
  – What if there is more than one Big John? Can we be sure we are changing the right one?
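The update anomaly can be reproduced with a short Python sketch, using a few columns of the Student table. The new name "John Large" and both helper functions are hypothetical. Renaming correctly means touching every one of Big John's rows; an update that misses one leaves the table holding two different names for the same Student ID.

```python
import copy

student = [
    {"Student ID": "1000001", "Student Name": "Peter Stringfellow", "Module Code": "W1001"},
    {"Student ID": "1001234", "Student Name": "Terrence Halfwit", "Module Code": "S2003"},
    {"Student ID": "1234567", "Student Name": "Big John", "Module Code": "B3333"},
    {"Student ID": "1234567", "Student Name": "Big John", "Module Code": "B3334"},
]


def rename_everywhere(rows, student_id, new_name):
    """Correct update: change every row for this student; return how many were touched."""
    touched = 0
    for row in rows:
        if row["Student ID"] == student_id:
            row["Student Name"] = new_name
            touched += 1
    return touched


def rename_first_only(rows, student_id, new_name):
    """Careless update that stops at the first match (the 'we could miss one' case)."""
    for row in rows:
        if row["Student ID"] == student_id:
            row["Student Name"] = new_name
            return


def names_for(rows, student_id):
    return {r["Student Name"] for r in rows if r["Student ID"] == student_id}


good = copy.deepcopy(student)
touched = rename_everywhere(good, "1234567", "John Large")
print(touched)  # 2

bad = copy.deepcopy(student)
rename_first_only(bad, "1234567", "John Large")
print(sorted(names_for(bad, "1234567")))  # ['Big John', 'John Large']
```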

Problems with Tables

• Delete anomaly
  – Terrence Halfwit decides he no longer wishes to take Module S2003
  – If we delete this from Terrence's row, we lose all information about Module S2003, as no one else is taking it at the moment
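A minimal Python sketch of the delete anomaly, assuming the enrolment is removed by deleting Terrence's row (in a single table there is nowhere else to record it). The helper `modules_known` is hypothetical. After the delete, the table no longer knows that module S2003 is called Hazardous Fishes, and Terrence himself has disappeared too.

```python
student = [
    {"Student ID": "1000001", "Student Name": "Peter Stringfellow",
     "Module Code": "W1001", "Module Name": "Flower Arranging"},
    {"Student ID": "1001234", "Student Name": "Terrence Halfwit",
     "Module Code": "S2003", "Module Name": "Hazardous Fishes"},
    {"Student ID": "1234567", "Student Name": "Big John",
     "Module Code": "B3333", "Module Name": "Selling Stuff"},
    {"Student ID": "1234567", "Student Name": "Big John",
     "Module Code": "B3334", "Module Name": "Buying Stuff"},
]


def modules_known(rows):
    """Every (code, name) pair the table can still tell us about."""
    return {(r["Module Code"], r["Module Name"]) for r in rows}


# Assumption: dropping the enrolment means deleting Terrence's S2003 row.
after = [
    r for r in student
    if not (r["Student ID"] == "1001234" and r["Module Code"] == "S2003")
]

print(("S2003", "Hazardous Fishes") in modules_known(student))  # True
print(("S2003", "Hazardous Fishes") in modules_known(after))    # False
print(any(r["Student ID"] == "1001234" for r in after))         # False
```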

The Solution?

• To remove these anomalies we must re-arrange the data and create new tables

• The process for doing this is called Normalisation

Normalisation

• The first three stages:
  – First Normal Form (1NF)
  – Second Normal Form (2NF)
  – Third Normal Form (3NF)

• A table in 1NF can be considered normalised
  – But there will still be problems
  – All common problems are solved by 3NF
  – Further normalisation will solve rarer problems

First Normal Form

• All data in a table must be dependent on the key

• In order to do this we must remove “repeating groups”

• This is done by analysing the relationship between the primary key and the rest of the data

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
• Module Code
• Module Name
• Grade

• An attribute is moved out if there can be more than one value of it for each instance of the primary key
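The "one or many?" question can be asked of sample data with a short Python sketch: for each attribute, collect the distinct values seen for each Student ID, and flag the attribute if any ID has more than one. The function name is hypothetical, and the sample Student table has no Course ID column, so that attribute is omitted here. Sample data can only reveal a repeating group that happens to show up in it (had Big John scored A in both modules, Grade would not be flagged), so the business rules remain the real test.

```python
from collections import defaultdict

attributes = ["Student ID", "Student Name", "Course", "Module Code", "Module Name", "Grade"]
rows = [
    ("1000001", "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    ("1001234", "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    ("1234567", "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    ("1234567", "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]


def repeating_attributes(attributes, rows, key):
    """Attributes that take more than one value for at least one key value."""
    k = attributes.index(key)
    repeating = []
    for i, attribute in enumerate(attributes):
        if i == k:
            continue
        values_per_key = defaultdict(set)  # key value -> distinct values of this attribute
        for row in rows:
            values_per_key[row[k]].add(row[i])
        if any(len(values) > 1 for values in values_per_key.values()):
            repeating.append(attribute)
    return repeating


print(repeating_attributes(attributes, rows, "Student ID"))
# ['Module Code', 'Module Name', 'Grade']
```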

Example 1 - Students

• For each Student ID, ask how many values of each other attribute there are: one or many?
  – Student Name: one
  – Course: one
  – Course ID: one
  – Module Code: many, so it is moved out (shown indented)
  – Module Name: many, so it is moved out
  – Grade: many, so it is moved out

Example 1 - Students

• Student ID
• Student Name
• Course
• Course ID
  – Module Code
  – Module Name
  – Grade

• Indented data is a repeating group

• We need to put it into a new table

• This table will describe the module a student is taking

• We will call it Student Module

Example 1 - Students

• Student (Student ID, Student Name, Course, Course ID)

• Student Module (Student ID, Module Code, Module Name, Grade)

• We now have two tables

• Student details
  – Primary key = Student ID

• Student's module details
  – Primary key = Student ID, Module Code
  – A key made up of more than one attribute is called a compound key
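The split can be sketched in Python by projecting the original rows onto the two sets of columns and dropping duplicates. The helper `project_rows` is hypothetical, and Course ID is left out because the sample table has no such column. Joining the two results back on Student ID recovers exactly the original rows, which is a useful check that the split has lost no information.

```python
attributes = ["Student ID", "Student Name", "Course", "Module Code", "Module Name", "Grade"]
rows = [
    ("1000001", "Peter Stringfellow", "BSc Basket Weaving", "W1001", "Flower Arranging", "A"),
    ("1001234", "Terrence Halfwit", "BA Surfing Studies", "S2003", "Hazardous Fishes", "B"),
    ("1234567", "Big John", "BSc Business", "B3333", "Selling Stuff", "E"),
    ("1234567", "Big John", "BSc Business", "B3334", "Buying Stuff", "A"),
]


def project_rows(attributes, rows, keep):
    """Keep the named columns and drop duplicate rows, preserving first-seen order."""
    positions = [attributes.index(a) for a in keep]
    return list(dict.fromkeys(tuple(r[i] for i in positions) for r in rows))


student = project_rows(attributes, rows, ["Student ID", "Student Name", "Course"])
student_module = project_rows(
    attributes, rows, ["Student ID", "Module Code", "Module Name", "Grade"]
)
print(len(student), len(student_module))  # 3 4

# Join back on Student ID: (ID, Name, Course) + (Module Code, Module Name, Grade).
by_id = {s[0]: s for s in student}
rebuilt = {by_id[m[0]] + m[1:] for m in student_module}
print(rebuilt == set(rows))  # True
```

Big John now appears once in the Student table, which is why his name change becomes a single-row update.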

Does this help?

• Insert
  – We can now add students who have no modules

• Delete
  – We can now keep students when they leave modules
  – We can keep Terrence's details even if he leaves the module

• Update
  – We now only need to change student details once
  – Big John's name could be changed easily without error

Yes… But… No… But…

• There are still anomalies…

• Update
  – If the Creative Accounting module name is changed…

• Insert
  – We cannot add a new module unless we have a student enrolled on it

• Delete
  – When a student leaves, we could lose module information

• These are dealt with by later normal forms

Example 2 - Library

• Student ID
• Name
• Faculty
• Book ID
• Title
• Author
• Return Date

• Put this data into First Normal Form

Example 2 - Library

• Student ID
• Name
• Faculty
  – Book ID
  – Title
  – Author
  – Return Date

• Identify the repeating group

Example 2 - Library

• Student ID
• Name
• Faculty

• Student ID
• Book ID
• Title
• Author
• Return Date

• Create a new table

• Remember to keep the original PK as part of the key of the new table

• This maintains the relationship between the two tables
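How the kept key "maintains the relationship" can be seen by declaring the two library tables in SQL. A minimal sketch using Python's `sqlite3`; the table name `StudentLoan`, the column types and the sample rows are hypothetical, since the slide does not name the new table. The compound key (StudentID, BookID) follows the pattern of Example 1, and the `REFERENCES` clause makes the database reject a loan for a student who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

conn.executescript("""
CREATE TABLE Student (
    StudentID TEXT PRIMARY KEY,
    Name      TEXT,
    Faculty   TEXT
);
-- The repeating group moves here; the original key StudentID is kept
-- as part of the new compound key and refers back to Student.
CREATE TABLE StudentLoan (
    StudentID  TEXT NOT NULL REFERENCES Student(StudentID),
    BookID     TEXT NOT NULL,
    Title      TEXT,
    Author     TEXT,
    ReturnDate TEXT,
    PRIMARY KEY (StudentID, BookID)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Student VALUES ('S001', 'Ann Example', 'Science')")
conn.execute(
    "INSERT INTO StudentLoan VALUES ('S001', 'B042', 'Some Title', 'Some Author', '2024-06-01')"
)

# A loan for a non-existent student is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO StudentLoan VALUES ('S999', 'B042', 'Some Title', 'Some Author', '2024-06-01')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```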

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
• Stock ID
• Title
• Format

• Put this data into First Normal Form

Example 3

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager
  – Stock ID
  – Title
  – Format

• Identify the repeating group

Example 3 – Borrowing Videos

• Customer ID
• Customer Name
• Address
• Branch No
• Branch Manager

• Customer ID
• Stock ID
• Title
• Format

• Create a new table
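Because the same move is made in every example, it can be written once as a schema-level function. A minimal sketch applied to the video data: the table names `Customer` and `Customer Video` are our own, since the slides do not name the tables, and we assume Stock ID is what distinguishes one borrowed item from another, giving the new table a compound key as in Example 1.

```python
def remove_repeating_group(table, attributes, key, group, new_table):
    """Split an unnormalised schema into two 1NF schemas.

    Assumes the first attribute of `group` identifies a member of the group,
    so the new table's compound key is the original key plus that attribute.
    """
    parent = (table, [a for a in attributes if a not in group], [key])
    child = (new_table, [key] + group, [key, group[0]])
    return parent, child


def describe(schema):
    name, attributes, key = schema
    return f"{name} ({', '.join(attributes)}), key = {', '.join(key)}"


customer, borrowing = remove_repeating_group(
    "Customer",
    ["Customer ID", "Customer Name", "Address", "Branch No", "Branch Manager",
     "Stock ID", "Title", "Format"],
    "Customer ID",
    ["Stock ID", "Title", "Format"],
    "Customer Video",
)
print(describe(customer))
# Customer (Customer ID, Customer Name, Address, Branch No, Branch Manager), key = Customer ID
print(describe(borrowing))
# Customer Video (Customer ID, Stock ID, Title, Format), key = Customer ID, Stock ID
```

The same call with the library attributes and ["Book ID", "Title", "Author", "Return Date"] as the group reproduces Example 2.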

Remember

• A table in 1NF can be considered normalised

• But 1NF doesn't solve all of our problems

• We will go through second and third normal forms in tutorials and next week

Second Normal Form

• Only applies to tables with compound keys

• Data in a table must depend on the whole key

• We must remove any partial dependencies (attributes that depend on only part of the compound key)
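Partial dependencies are found from the rules of the business rather than from sample rows: in the four-row Student table each module appears only once, so the data cannot distinguish what depends on Module Code alone. A minimal sketch, assuming the analyst writes the functional dependencies down as (determinant, dependent attributes) pairs; the dependencies listed for the Student Module table are our assumptions, though Module Name depending on Module Code alone is exactly the kind of partial dependency 2NF removes.

```python
def partial_dependencies(key, dependencies):
    """Return the dependencies whose left-hand side is a proper part of the key."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in dependencies if set(lhs) < key]


# Assumed business rules for the Student Module table from Example 1.
student_module_key = ["Student ID", "Module Code"]
dependencies = [
    (("Student ID", "Module Code"), ("Grade",)),  # a grade belongs to one student on one module
    (("Module Code",), ("Module Name",)),         # a module's name depends on the module alone
]

print(partial_dependencies(student_module_key, dependencies))
# [(('Module Code',), ('Module Name',))]
```

With a single-attribute key there is no smaller part for an attribute to depend on, which is why 2NF only applies to tables with compound keys.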
