
CSCI N207: Data Analysis Using Spreadsheets Copyright ©2005 Department of Computer & Information Science Introducing Databases



Goals

By the end of this unit, you should understand …

• … what a database is.
• … what components make up a database.
• … what a Database Management System is.
• … the differences among the types of database structures.
• … generally, how database administrators construct databases.


So, what is a database?

• In a general sense, a database is any organized collection of data.
• Examples:
  – Grocery List
  – Audio CD Catalog
  – Phone Book
  – Airline Ticketing Software
  – Tax Preparation Software
  – Oncourse
  – Google
  – MapQuest
  – Amazon
  – eBay


Databases in the Digital World

• When we think of the applications we commonly use, we often think of word processors as tools for projects that require us to write, and of spreadsheets as tools that help us solve problems dealing with numbers (statistics, averages, etc.).
• Whereas spreadsheets are good at answering questions involving numbers ("What is the average … ?"), databases are good at answering other types of questions ("Are there any compact discs available by … ?").


Databases in the Digital World (continued)

• Word processors process text.
• Spreadsheets process number data.
• Databases process data.

(from geekgirl's plain-english computing)


Data vs. Information

• For the user of a database, the end goal is to view meaningful information.
• Raw data, the values we store in a database, are essentially useless by themselves. For instance, do we know what the value 85215 means? Is it a zip code? Is it a student ID number? Is it a code for a billing application? We don't know … (Hernandez)


Data Processing

• When we process data, we connect sets of data to make meaningful information.
• For instance, if we connect the value 85215 to the value "Tax Preparation – 1040 (Schedule C)", we're probably able to discern that the value 85215 is a code that represents some type of billable service: tax preparation, in this case (Hernandez).
• The end result of data processing is meaningful information.
• Data is stored; information is retrieved.
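
The idea can be sketched in a few lines of Python. This is only an illustration: the `billing_codes` lookup and the `describe` helper are hypothetical names, and the single entry is built from the example value on this slide. On its own, `85215` is just data; once it is connected to a description, it becomes information.

```python
# Hypothetical lookup: the name and its one entry are assumptions for illustration.
billing_codes = {85215: "Tax Preparation – 1040 (Schedule C)"}

def describe(code):
    """Connect a raw value to its stored description, if there is one."""
    label = billing_codes.get(code)
    if label is None:
        return f"{code}: meaning unknown"
    return f"{code}: billable service, {label}"

info = describe(85215)      # data connected to a description
unknown = describe(12345)   # raw data with no context
```

The stored values never change; what differs is whether the retrieval step can connect them to something that gives them meaning.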


Types of Modern Databases

Operational Databases
• Used for online transaction processing (OLTP)
• Dynamic in nature ("just in time" information)
• Used heavily by commercial entities

Analytical Databases
• Used for online analytical processing (OLAP)
• Static in nature
• Often populated with data from OLTP databases
• Used heavily by research entities

- from Hernandez


Historical Database Models

• A database model speaks to how we create a database.
• Throughout the years, people have used these models for creating databases:
  – The Hierarchical Model
  – The Network Model
  – The Relational Model (most commonly used today)
  – The Object-Oriented Model (the future?)


The Hierarchical Model

• The hierarchical model connects tables of data via parent/child relationships. In such relations, a parent table can have 1 or more children, but a child table must have 1 and only 1 parent.
• Tables connect using the physical arrangement of records.
• The hierarchical model requires that a user know the structure of the database. Access always starts at the root table.


Hierarchical Model Example

[Figure: hierarchy diagram. Root level: Agents. Second level: Entertainers, Clients. Third level: Schedule, Engagements, Payments.]

- Figure 1.1 from Hernandez
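
A rough Python sketch of "access always starts at the root table" follows. The table names come from the figure, but the parent/child links in `parent_of` are an assumption, because the flattened figure shows only the three levels and not which table hangs under which. The `path_from_root` helper is hypothetical.

```python
# Assumed parent/child links; the figure text only gives the three levels.
# A dict maps each child to a single parent, so "1 and only 1 parent" holds.
parent_of = {
    "Entertainers": "Agents",
    "Clients": "Agents",
    "Schedule": "Entertainers",
    "Engagements": "Clients",
    "Payments": "Clients",
}

def path_from_root(table):
    """Return the chain of tables a user must walk through, root first."""
    path = [table]
    while path[-1] in parent_of:          # the root table has no parent
        path.append(parent_of[path[-1]])
    return list(reversed(path))

payments_path = path_from_root("Payments")
```

Under these assumed links, reaching Payments means walking down from Agents through Clients, which is why the user has to know the structure in advance.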


Network Database Model

• Introduces nodes and set structures. Nodes are collections of records, and set structures are the relationships in the database.
• A relationship between nodes has 1 node as the owner node, with 1 or more member nodes. A record in a member node can be related to only 1 record in an owner node. Records in a member node cannot exist without being related to a record in an owner node.


Network Model Example

[Figure: network diagram. Nodes: Agents; Clients, Entertainers; Payments, Engagements, Musical Styles. Set structures: Represent, Manage; Make, Schedule, Perform, Play.]

- Figure 1.3 from Hernandez


Relational Model

• Derived from two branches of mathematics: set theory and first-order predicate logic.
• Stores data in relations (tables). Each table is composed of tuples (records) and attributes (fields).
• Two features of this model allow us to access data without knowing the database structure:
  – The physical order of the records and fields in a table doesn’t matter.
  – We identify each individual record in a table by a unique value.


Table Relationships

• We categorize table relationships in the Relational Model as follows:
  – One-to-One (1:1)
  – One-to-Many (1:N)
  – Many-to-Many (N:N)
• To establish a relationship between tables, we need to match values of a shared field.


Relationship Example

Agent ID   Agent First Name   Agent Last Name   Hire Date
100        Mike               Hernandez         05/16/95
101        Greg               Piercy            10/15/95
102        Katherine          Ehrlich           03/01/96

Client ID   Agent ID   Client First Name   Client Last Name
9001        100        Stewart             Jameson
9002        100        Shannon             McLain
9003        102        Estella             Pundt

- Figure 1.5 from Hernandez
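
As a sketch of matching values of a shared field, the two tables above can be loaded into an in-memory SQLite database through Python's standard `sqlite3` module and joined on Agent ID. The table and column names (`agents`, `clients`, `agent_id`, and so on) are assumptions chosen for the example; the slides do not prescribe a particular DBMS or naming scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, first_name TEXT,
                     last_name TEXT, hire_date TEXT);
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, agent_id INTEGER,
                      first_name TEXT, last_name TEXT);
INSERT INTO agents VALUES (100, 'Mike', 'Hernandez', '05/16/95'),
                          (101, 'Greg', 'Piercy', '10/15/95'),
                          (102, 'Katherine', 'Ehrlich', '03/01/96');
INSERT INTO clients VALUES (9001, 100, 'Stewart', 'Jameson'),
                           (9002, 100, 'Shannon', 'McLain'),
                           (9003, 102, 'Estella', 'Pundt');
""")

# Relate each client to an agent by matching the shared Agent ID values.
rows = conn.execute("""
    SELECT c.last_name, a.last_name
    FROM clients AS c
    JOIN agents AS a ON a.agent_id = c.agent_id
    ORDER BY c.client_id
""").fetchall()
```

Each tuple in `rows` pairs a client's last name with the last name of the agent who shares that client's Agent ID value.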


Advantages of Relational Databases

• Layers of data integrity
  – Table-level data integrity: ensures records aren’t duplicated and key values are present
  – Relationship-level data integrity: ensures that the relationship between two tables is valid
  – Business-level data integrity: ensures that data is accurate in terms of business rules
• Data consistency & accuracy: a result of built-in data integrity
• Independence from physical structure
• Easy data retrieval


Database Management Software

• Relational database management systems (RDBMS) are applications used to “create, maintain, modify and manipulate” a database.
• Typically, RDBMSs include:
  – Tools to build tables and establish table relationships
  – Tools for creating forms for user input/output
  – Tools for querying a database (asking the database a question)
  – Tools for creating reports for output


Phases of Database Design

1. Requirements Analysis – Using interviews to understand a business client's information needs and their current (and future) business environment.
2. Data Modeling – Modeling the database structure using one of the established data-modeling methods, like entity-relationship diagrams; the end goal is to visually represent the database structure.


Phases of Database Design (cont.)

3. Data Normalization – Breaking large tables into smaller ones to eliminate redundant data and avoid problems when manipulating data.
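
A small Python sketch of what normalization accomplishes, under stated assumptions: the `flat` list is a hypothetical un-normalized table in which each client row repeats the agent's name (the values are borrowed from the Agents and Clients example used elsewhere in these slides). Splitting it stores each agent once and leaves only the Agent ID in the client rows.

```python
# Hypothetical flat rows: the agent's name repeats for every client.
flat = [
    (9001, "Stewart", "Jameson", 100, "Mike", "Hernandez"),
    (9002, "Shannon", "McLain", 100, "Mike", "Hernandez"),
    (9003, "Estella", "Pundt", 102, "Katherine", "Ehrlich"),
]

agents = {}    # agent_id -> (first name, last name), stored once per agent
clients = []   # (client_id, agent_id, first name, last name)
for client_id, c_first, c_last, agent_id, a_first, a_last in flat:
    agents[agent_id] = (a_first, a_last)
    clients.append((client_id, agent_id, c_first, c_last))
```

After the split, correcting an agent's name means changing one entry in `agents` instead of every client row that mentions that agent.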


Database Tables

• A database stores data in relations, perceived by the user as tables. Tables are:
  – Composed of tuples (records) and attributes (fields)
  – The chief structures in a database
  – Independent of the logical and physical order of their fields and records
• Every table must contain a Primary Key Field, which uniquely identifies each of the table’s records.
• Tables can represent objects or events.


Types of Tables

• Data Table
  – Most common type of table in a relational database
  – Stores data that supplies information
  – Dynamic in nature
• Validation Table (Lookup Table)
  – Stores data used when enforcing data integrity
  – Usually static in nature
  – Examples: job codes, city names, billing categories, etc.
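
One way a validation table can enforce data integrity is sketched below with Python's standard `sqlite3` module. The `billing_categories` and `invoices` tables and their contents are hypothetical, and pairing a lookup table with a foreign-key reference is just one possible mechanism; the slides do not say how a given DBMS implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
-- Validation (lookup) table: a small, mostly static list of allowed values.
CREATE TABLE billing_categories (category TEXT PRIMARY KEY);
-- Data table: every invoice must use a category found in the lookup table.
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    category TEXT NOT NULL REFERENCES billing_categories(category)
);
INSERT INTO billing_categories VALUES ('Tax Preparation'), ('Consulting');
INSERT INTO invoices VALUES (1, 'Consulting');
""")

try:
    conn.execute("INSERT INTO invoices VALUES (2, 'Snacks')")  # not an allowed category
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The data table grows as invoices arrive, while the validation table changes only when the list of allowed categories does.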


Fields

• A field, or attribute, is the smallest structure in a database.
• Represents a characteristic of the subject of the table to which it belongs.
• The quality of information retrieved from the database depends heavily on the time invested in ensuring the structural and data integrity of fields (more on that later …).
• A field should contain 1 and only 1 distinct value (FirstName or LastName versus FullName, for instance).


Records

• A record, or tuple, is a specific instance of the subject of a table. A record is made up of all fields in a table. Some fields may have empty values.
• The primary key field stores a value that uniquely identifies the record throughout the database.


Record & Field Example

Table Name is Students

Student ID   Student First Name   Student Last Name   Student Major 1
40853        William              Harden              Political Science
98364        Maria                Garcia-Grande       Nursing
15792        Michael              Bobersky            Psychology

(Each column is a field; each row is a record.)


Views

• A view, or a virtual table or saved query, is made up of fields from other tables in the database. The contributing tables are called base tables.
• Since data is stored in other tables, databases do not store data associated with views (thus eliminating redundancy). Databases only store the structure of the view.
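
The sketch below uses Python's standard `sqlite3` module to illustrate that only a view's structure is stored. The table, column, and view names are assumptions, the rows are a subset of the Agents and Clients example from these slides, and the replacement last name is made up. Because the view holds no data of its own, a change to a base table shows up the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, agent_id INTEGER, last_name TEXT);
INSERT INTO agents VALUES (100, 'Hernandez'), (102, 'Ehrlich');
INSERT INTO clients VALUES (9001, 100, 'Jameson'), (9003, 102, 'Pundt');
-- Only this definition is stored; the rows stay in the base tables.
CREATE VIEW client_agents AS
    SELECT c.last_name AS client, a.last_name AS agent
    FROM clients AS c JOIN agents AS a ON a.agent_id = c.agent_id;
""")

query = "SELECT client, agent FROM client_agents ORDER BY client"
before = conn.execute(query).fetchall()
# Change a base table (hypothetical new name); the view itself is untouched.
conn.execute("UPDATE agents SET last_name = 'Ehrlich-Lee' WHERE agent_id = 102")
after = conn.execute(query).fetchall()
```

Comparing `before` and `after` shows the second row picking up the new last name even though nothing was written to the view.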


Advantages of Views

• You can work with data from multiple base tables simultaneously.
• Security: views prevent restricted users from manipulating data stored in base tables.
• Views are useful for implementing data integrity (a validation view).


Primary Keys

• A primary key is a field or group of fields that uniquely identifies a record. A primary key composed of two or more fields is called a composite primary key. Every table must have a primary key!
• The most important key in a table:
  – Uniquely identifies a specific record throughout a database
  – Identifies a specific table throughout the database
  – Enforces table-level integrity
  – Helps to establish relationships between tables
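
A brief sketch of table-level integrity, using Python's standard `sqlite3` module. The `enrollments` table, the course codes, and the `try_insert` helper are hypothetical; they are here only to show a single-field key and a composite key side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, last_name TEXT);
-- Hypothetical table with a composite primary key (two fields together).
CREATE TABLE enrollments (
    student_id INTEGER,
    course_code TEXT,
    PRIMARY KEY (student_id, course_code)
);
INSERT INTO agents VALUES (100, 'Hernandez');
INSERT INTO enrollments VALUES (40853, 'N207');
""")

def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_agent = try_insert("INSERT INTO agents VALUES (100, 'Piercy')")
same_student_new_course = try_insert("INSERT INTO enrollments VALUES (40853, 'N241')")
duplicate_enrollment = try_insert("INSERT INTO enrollments VALUES (40853, 'N207')")
```

With the composite key, neither field has to be unique by itself; only the combination of student and course must be.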


Foreign Keys

• A foreign key is important when we establish relationships between tables.
• To create a foreign key, you take a primary key from one table and copy it into a second table. In the second table, the key becomes a foreign key.
• Foreign keys enforce relationship-level integrity: values in one table's foreign key field must match exactly the corresponding values of a second table's primary key field.


Example of Primary & Foreign Keys

Agents Table

Agent ID   Agent First Name   Agent Last Name   Hire Date
100        Mike               Hernandez         05/16/95
101        Greg               Piercy            10/15/95
102        Katherine          Ehrlich           03/01/96

Clients Table

Client ID   Agent ID   Client First Name   Client Last Name
9001        100        Stewart             Jameson
9002        100        Shannon             McLain
9003        102        Estella             Pundt

Agent ID is the Primary Key in the Agents Table and a Foreign Key in the Clients Table.

- Adapted from Figure 3.11 from Hernandez
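
Relationship-level integrity for these two tables might look like the following sketch, again using Python's standard `sqlite3` module with assumed table and column names. Client 9004 and agent 999 are made-up values used to trigger the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    agent_id INTEGER REFERENCES agents(agent_id),  -- foreign key
    last_name TEXT
);
INSERT INTO agents VALUES (100, 'Hernandez'), (101, 'Piercy'), (102, 'Ehrlich');
INSERT INTO clients VALUES (9001, 100, 'Jameson'), (9002, 100, 'McLain'),
                           (9003, 102, 'Pundt');
""")

try:
    # Agent 999 does not exist, so this client's foreign key matches nothing.
    conn.execute("INSERT INTO clients VALUES (9004, 999, 'Nobody')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

client_count = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
```

The rejected row never reaches the Clients table, so every stored Agent ID in it still matches a row in the Agents table.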


Relationships

• We can build a relationship between tables if we can relate the records in one table with the records in the joining table.
• Two methods for building a relationship:
  – Linking primary and foreign keys
  – Linking tables via a third table called a linking table or associative table
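
The second method can be sketched with a hypothetical linking table. All table names and rows below are invented for illustration, and Python's standard `sqlite3` module stands in for whatever DBMS is actually used. Each row of `entertainer_styles` pairs one entertainer with one style, so an entertainer can have many styles and a style can belong to many entertainers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entertainers (entertainer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE styles (style_id INTEGER PRIMARY KEY, name TEXT);
-- Linking (associative) table: one row per entertainer/style pairing.
CREATE TABLE entertainer_styles (
    entertainer_id INTEGER REFERENCES entertainers(entertainer_id),
    style_id INTEGER REFERENCES styles(style_id),
    PRIMARY KEY (entertainer_id, style_id)
);
INSERT INTO entertainers VALUES (1, 'Act A'), (2, 'Act B');
INSERT INTO styles VALUES (10, 'Jazz'), (20, 'Folk');
INSERT INTO entertainer_styles VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow the links in both directions through the linking table.
jazz_acts = [row[0] for row in conn.execute("""
    SELECT e.name
    FROM entertainers AS e
    JOIN entertainer_styles AS es ON es.entertainer_id = e.entertainer_id
    JOIN styles AS s ON s.style_id = es.style_id
    WHERE s.name = 'Jazz'
    ORDER BY e.name
""")]
```

The linking table's primary key is a composite of the two foreign keys, which keeps the same pairing from being recorded twice.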


Importance of Relationships

• Relationships allow users to establish views based on multiple base tables.
• Relationships help to reduce data redundancy and eliminate duplicate data, thus reinforcing data integrity.


Categorizing Relationships

• We categorize relationships between tables in three ways:
  – The type of relationship between the tables
  – The way that each table participates in the relationship
  – The degree to which each table participates in the relationship

Different Types of Relationships

• One-to-One Relationship (1:1)
• One-to-Many Relationship (1:N)
• Many-to-Many Relationship (N:N)

One-To-One Relationships (1:1)

• In a one-to-one relationship (1:1), we relate one and only one record from a parent table to one and only one record in a second table (a child table).

• To create a 1:1 relationship, we copy the primary key of a parent table into a child table, where it becomes a foreign key.

• This type of relationship is unique because both tables share the same primary key. The primary key in the child table serves both as that table's primary key and a foreign key.

Example of a 1:1 Relationship

Employee Table

Employee ID   Employee First Name   Employee Last Name
100           Zachary               Erlich
101           Susan                 McClain
102           Joe                   Rosales

Compensation Table

Employee ID   Hourly Rate   Commission Rate
100           25.00         5.0%
101           19.75         3.5%
102           22.50         5.0%

- Adapted from Figure 3.13 from Hernandez

Employee ID is the Primary Key for both tables and also a Foreign Key in the Compensation Table.
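The 1:1 example above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names (employee, compensation, commission_pct) are invented here, and SQLite stands in for whatever DBMS the course uses. The key point is that employee_id is both the primary key of the child table and a foreign key to the parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL
);
CREATE TABLE compensation (
    -- The same column is this table's primary key and a foreign key to employee,
    -- so each employee can have at most one compensation row.
    employee_id    INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    hourly_rate    REAL NOT NULL,
    commission_pct REAL NOT NULL
);
""")

conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(100, "Zachary", "Erlich"), (101, "Susan", "McClain"), (102, "Joe", "Rosales")])
conn.executemany("INSERT INTO compensation VALUES (?, ?, ?)",
                 [(100, 25.00, 5.0), (101, 19.75, 3.5), (102, 22.50, 5.0)])

def second_row_rejected(connection):
    """Return True if a second compensation row for employee 100 is refused."""
    try:
        connection.execute("INSERT INTO compensation VALUES (100, 30.00, 6.0)")
    except sqlite3.IntegrityError:
        return True
    return False

# Join the two tables on the shared key to view each employee with their pay.
rows = conn.execute("""
    SELECT e.first_name, e.last_name, c.hourly_rate
    FROM employee AS e JOIN compensation AS c USING (employee_id)
    ORDER BY e.employee_id
""").fetchall()
```

Because the shared key is a primary key in both tables, the database itself refuses a second compensation record for the same employee, which is what makes the relationship one-to-one.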

One-To-Many Relationships (1:N)

• In a one-to-many (1:N) relationship, we relate a record in one table (a parent table) to many records in a second table (a child table).

• To create a 1:N relationship, we copy the primary key of a parent table into a child table, where it becomes a foreign key.

• This type of relationship is the most common type of relationship in the relational database model.

Example of a 1:N Relationship

Agents Table

Agent ID   Agent First Name   Agent Last Name   Hire Date
100        Mike               Hernandez         05/16/95
101        Greg               Piercy            10/15/95
102        Katherine          Ehrlich           03/01/96

Clients Table

Client ID   Agent ID   Client First Name   Client Last Name
9001        100        Stewart             Jameson
9002        100        Shannon             McLain
9003        102        Estella             Pundt

- Adapted from Figure 3.14 from Hernandez

Agent ID is the Primary Key in the Agents Table and a Foreign Key in the Clients Table.
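As a rough sketch of the 1:N example above, the following uses Python's sqlite3 module. The lowercase table and column names are assumptions made for this illustration, and hire dates are stored as plain text exactly as the slide shows them. Unlike the 1:1 case, agent_id in the child table is an ordinary foreign key column, so several clients may share one agent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.executescript("""
CREATE TABLE agents (
    agent_id   INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    hire_date  TEXT NOT NULL
);
CREATE TABLE clients (
    client_id  INTEGER PRIMARY KEY,
    agent_id   INTEGER NOT NULL REFERENCES agents(agent_id),  -- foreign key only, not unique
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO agents VALUES (?, ?, ?, ?)", [
    (100, "Mike", "Hernandez", "05/16/95"),
    (101, "Greg", "Piercy", "10/15/95"),
    (102, "Katherine", "Ehrlich", "03/01/96"),
])
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", [
    (9001, 100, "Stewart", "Jameson"),
    (9002, 100, "Shannon", "McLain"),
    (9003, 102, "Estella", "Pundt"),
])

# Count the clients attached to each agent; LEFT JOIN keeps agents with no clients.
clients_per_agent = conn.execute("""
    SELECT a.agent_id, COUNT(c.client_id)
    FROM agents AS a LEFT JOIN clients AS c ON c.agent_id = a.agent_id
    GROUP BY a.agent_id
    ORDER BY a.agent_id
""").fetchall()
```

With the slide's data, agent 100 has two clients, agent 101 has none, and agent 102 has one, which is the "one parent, many children" pattern.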

Many-To-Many Relationships (N:N)

• In a many-to-many relationship, we relate many records in one table to many records in a second table.

• We cannot create an N:N relationship directly. Instead, we resolve an N:N relationship by copying the primary keys of each table into a third table, called a linking (associative) table. Together, the copied keys form a composite primary key. Individually, each copied key serves as a foreign key that refers back to its original table.

Example of Resolving an N:N Relationship
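The original figure for this slide is not available in this transcript, so the following is a hypothetical stand-in rather than the author's example. It assumes two invented tables, students and courses, and resolves their N:N relationship with a linking table called enrollments whose two foreign keys together form a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Linking (associative) table: one row per student/course pairing.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)   -- composite primary key
);
""")

conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO courses VALUES (?, ?)", [(10, "Databases"), (20, "Spreadsheets")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

def courses_for(student_id):
    """Titles of all courses one student is enrolled in."""
    rows = conn.execute("""
        SELECT c.title FROM enrollments AS e
        JOIN courses AS c ON c.course_id = e.course_id
        WHERE e.student_id = ? ORDER BY c.title
    """, (student_id,))
    return [title for (title,) in rows]

def students_in(course_id):
    """Names of all students enrolled in one course."""
    rows = conn.execute("""
        SELECT s.name FROM enrollments AS e
        JOIN students AS s ON s.student_id = e.student_id
        WHERE e.course_id = ? ORDER BY s.name
    """, (course_id,))
    return [name for (name,) in rows]
```

Each base table has a 1:N relationship with the linking table, and the composite key prevents the same student from being enrolled in the same course twice.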

Relationship Participation

• There are two ways that we categorize relationships based on participation:
– Mandatory Participation: A user MUST enter at least one record into a parent table before s/he may enter records in a child table.
– Optional Participation: A user MAY enter records in a child table without entering records in the parent table.
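One common way to approximate the two kinds of participation in a relational DBMS is through the nullability of the foreign key. The sketch below is an assumption-laden illustration using Python's sqlite3 module: the table names clients_mandatory and clients_optional are invented, and mapping "mandatory" to a NOT NULL foreign key is this example's interpretation, not something the slides prescribe.

```python
import sqlite3

def try_insert(conn, sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Mandatory participation: every client row must point at an existing agent.
CREATE TABLE clients_mandatory (
    client_id INTEGER PRIMARY KEY,
    agent_id  INTEGER NOT NULL REFERENCES agents(agent_id)
);

-- Optional participation: a client row may be stored with no agent yet.
CREATE TABLE clients_optional (
    client_id INTEGER PRIMARY KEY,
    agent_id  INTEGER REFERENCES agents(agent_id)
);
""")

# No agent exists yet, so the mandatory table refuses the child record...
before_parent = try_insert(conn, "INSERT INTO clients_mandatory VALUES (?, ?)", (9001, 100))
# ...while the optional table accepts a client with no agent assigned.
no_agent_ok = try_insert(conn, "INSERT INTO clients_optional VALUES (?, ?)", (9001, None))

# Once the parent record is entered, the mandatory child record is accepted.
conn.execute("INSERT INTO agents VALUES (100, 'Mike Hernandez')")
after_parent = try_insert(conn, "INSERT INTO clients_mandatory VALUES (?, ?)", (9001, 100))
```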

Degrees of Participation

• We calculate a table's degree of participation by:
– The minimum number of records it must associate with a single record in the related table.
– The maximum number of records that a related table may associate with a single record in the given table.

• Think of the degree of participation as the minimum and maximum number of relationships for a single record in a table.

Example of Degree of Association

• Assume that for a Department, advisors are assigned at least 1 student and up to 50 students, but no more.

• The degree of participation of the Advisor Table would be 1,50. That is, an advisor must be assigned to at least one student in the Student Table, but has a limit of 50 students in the Student Table.
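A degree of participation such as 1,50 is usually a rule that has to be checked against the data rather than something a key alone can express. As a minimal sketch, the hypothetical helper below (its name and the sample advisor and student IDs are invented for illustration) counts students per advisor and reports any advisor who falls outside the 1 to 50 range.

```python
from collections import Counter

MIN_STUDENTS, MAX_STUDENTS = 1, 50  # degree of participation 1,50 from the example

def participation_violations(advisor_ids, student_advisor_pairs):
    """Return {advisor_id: student_count} for advisors outside the allowed range."""
    # Count how many student records point at each advisor.
    counts = Counter(advisor for _student, advisor in student_advisor_pairs)
    return {a: counts[a] for a in advisor_ids
            if not MIN_STUDENTS <= counts[a] <= MAX_STUDENTS}

# Invented sample data: A1 has 51 students, A2 has 1, A3 has none.
advisors = ["A1", "A2", "A3"]
pairs = [(f"S{i}", "A1") for i in range(51)] + [("S100", "A2")]
violations = participation_violations(advisors, pairs)
```

Here A1 breaks the maximum and A3 breaks the minimum, while A2 satisfies the rule.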

Field Specification

• Field Specification (also called domain) includes all of the elements of a field. There are three types of field elements:
– General Elements: Include all of the basic information about a field, including the field name, the field description and a field's parent table.
– Physical Elements: Include information on how the field is constructed and how a user views the field; data type, field length and display format are all physical elements.

Field Specification (continued)

– Logical Elements: Describe the values that a field can store, including required values, range of values and default values.

• Field specification is an important part of database design because it helps to enforce field-level integrity of a database.
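Parts of a field specification can be written directly into a table definition. The sketch below uses Python's sqlite3 module and is only illustrative: the column names and the numeric ranges are assumptions chosen for this example. The declared type corresponds to a physical element, while NOT NULL, CHECK and DEFAULT correspond to the logical elements (required values, range of values, default values). Note that SQLite treats declared types loosely, so other systems may enforce the data type more strictly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE compensation (
    employee_id    INTEGER PRIMARY KEY,               -- general: name; physical: data type
    hourly_rate    REAL NOT NULL                      -- logical: required value
                   CHECK (hourly_rate BETWEEN 0 AND 500),     -- logical: range (assumed limits)
    commission_pct REAL NOT NULL DEFAULT 0.0          -- logical: default value
                   CHECK (commission_pct BETWEEN 0 AND 100)   -- logical: range (assumed limits)
)
""")

def accepted(sql, params=()):
    """Run one INSERT and report whether the field rules allowed it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert = "INSERT INTO compensation (employee_id, hourly_rate) VALUES (?, ?)"

valid_row = accepted(insert, (100, 25.00))       # satisfies every rule
default_pct = conn.execute(
    "SELECT commission_pct FROM compensation WHERE employee_id = 100"
).fetchone()[0]                                   # default filled in by the database
negative_rate = accepted(insert, (101, -5.0))    # violates the range rule
missing_rate = accepted(insert, (102, None))     # violates the required-value rule
```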

Data Integrity

• "Data integrity refers to the validity, consistency, and accuracy of the data in a database." (Hernandez, p. 71)

• Four Types of Data Integrity:
– Table-level integrity
– Field-level integrity
– Relationship-level integrity
– Business rules

Table-Level Integrity

• Also known as entity integrity
• Ensures there are no duplicate records throughout a database
• Makes sure that primary keys within a table are unique and never null
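A short sketch of table-level integrity, using Python's sqlite3 module with an invented students table: declaring a primary key lets the database itself reject duplicate and null key values. The explicit NOT NULL is deliberate, because SQLite (unlike most systems) permits NULL in a non-integer primary key column unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE students (
    student_id TEXT PRIMARY KEY NOT NULL,  -- unique and never null
    name       TEXT NOT NULL
)
""")

def accepted(student_id, name):
    """Try to add one student and report whether the key rules allowed it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

first_insert = accepted("S001", "Ana")   # new key value
duplicate_key = accepted("S001", "Ben")  # same key value again
null_key = accepted(None, "Cy")          # missing key value
```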

Field-Level Integrity

• Also known as domain integrity
• Guarantees that the structure of each field is sound:
– Values are "valid, consistent and accurate" (Hernandez, p. 71)
– Fields of the same type are defined consistently (for instance, we would define fields related to an academic major in a consistent manner throughout the database).

Relationship-Level Integrity

• Also known as referential integrity
• Checks to make sure that the relationships between tables are sound.

• Also, ensures that records in related tables are synchronized when someone enters data, deletes data or otherwise manipulates it.
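As one possible illustration of referential integrity, the sketch below reuses the agents and clients data from the earlier 1:N example in Python's sqlite3 module. The ON DELETE CASCADE rule is an assumption chosen for this example; a real design might instead forbid deleting an agent who still has clients. It shows the database rejecting an orphaned child record and keeping the two tables synchronized on delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.executescript("""
CREATE TABLE agents (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    agent_id  INTEGER NOT NULL
              REFERENCES agents(agent_id) ON DELETE CASCADE,  -- assumed delete rule
    name      TEXT NOT NULL
);
INSERT INTO agents  VALUES (100, 'Mike Hernandez'), (102, 'Katherine Ehrlich');
INSERT INTO clients VALUES (9001, 100, 'Stewart Jameson'),
                           (9002, 100, 'Shannon McLain'),
                           (9003, 102, 'Estella Pundt');
""")

# Entering data: a client that points at a nonexistent agent is refused.
try:
    conn.execute("INSERT INTO clients VALUES (9004, 999, 'No Agent')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting data: removing agent 100 also removes that agent's clients.
conn.execute("DELETE FROM agents WHERE agent_id = 100")
remaining_clients = [cid for (cid,) in
                     conn.execute("SELECT client_id FROM clients ORDER BY client_id")]
```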

Business Rules

• A database is framed to fit the ways in which an organization runs its business.

• Business rules may affect several aspects of database design, including:
– Field ranges and valid values
– Types of table relationships
– Degree of relationships
– Degree of participation
– Synchronization of tables

Questions?

References

• geekgirl's plain-english computing (website): http://www.geekgirls.com/menu_databases.htm

• Database Design for Mere Mortals, 2nd Edition by Michael Hernandez (Addison-Wesley, 2004)