robert-jackson
Relational Databases
Find Databases here…
And here…
The “Deep Web”
• Dynamic pages, generated from databases
• Not easily discovered using crawling
• Perhaps 400-500 times larger than the surface Web
• Fastest growing source of new information
Deep Web
• 60 Deep Sites Exceed Surface Web by 40 Times

Name | Type | URL | Web Size (GBs)
National Climatic Data Center (NOAA) | Public | http://www.ncdc.noaa.gov/ol/satellite/satelliteresources.html | 366,000
NASA EOSDIS | Public | http://harp.gsfc.nasa.gov/~imswww/pub/imswelcome/plain.html | 219,600
National Oceanographic (combined with Geophysical) Data Center (NOAA) | Public/Fee | http://www.nodc.noaa.gov/, http://www.ngdc.noaa.gov/ | 32,940
Alexa | Public (partial) | http://www.alexa.com/ | 15,860
Right-to-Know Network (RTK Net) | Public | http://www.rtk.net/ | 14,640
MP3.com | Public | http://www.mp3.com/ |
Content of the Deep Web
Database Basics
• What is a database?
  – Collection of data, organized to support access
  – Models some aspects of reality
• Components of a relational database:
  – Field = an “atomic” unit of data
  – Record = a collection of related fields
  – Table = a collection of related records
    • Each record is one row in the table
    • Each field is one column in the table
  – Primary Key = the field (or combination of fields) that uniquely identifies a record
  – Database = a collection of tables
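These terms map directly onto SQL. Below is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the table and column names are illustrative. Each column is a field and each inserted row is a record. The primary key rejects a second record that reuses a Student ID.

```python
import sqlite3

# An in-memory database (a collection of tables; here just one).
conn = sqlite3.connect(":memory:")

# A table: each column is a field, each row will be a record.
# StudentID is the primary key, so no two records may share it.
conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        LastName  TEXT,
        FirstName TEXT
    )
""")

conn.execute("INSERT INTO Student VALUES (1, 'Arrows', 'John')")
conn.execute("INSERT INTO Student VALUES (2, 'Peters', 'Kathy')")

# A second record with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (1, 'Smith', 'Chris')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```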
Why “Relational”?
• Databases model some aspects of reality
• A relational database views the world in terms of entities and relations between them
The Registrar Example
• What do we need to know (i.e., model)?
  – Something about the students (e.g., first name, last name, email, department)
  – Something about the courses (e.g., course ID, description, enrolled students, grades)
  – Which students are in which courses
A First Try
Put everything in a big table…
Discussion: Why is this a bad idea?

Student ID | Last Name | First Name | Dept ID | Dept | Course ID | Course name | Grade | email
1 | Arrows | John | EE | EE | lbsc690 | Information Technology | 90 | jarrows@wam
1 | Arrows | John | EE | Elec Engin | ee750 | Communication | 95 | ja_2002@yahoo
2 | Peters | Kathy | HIST | HIST | lbsc690 | Informatino Technology | 95 | kpeters2@wam
2 | Peters | Kathy | HIST | history | hist405 | American History | 80 | kpeters2@wma
3 | Smith | Chris | HIST | history | hist405 | American History | 90 | smith2002@glue
4 | Smith | John | CLIS | Info Sci | lbsc690 | Information Technology | 98 | js03@wam
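One way to see what goes wrong is a rough Python sketch that uses the slide's rows typed in by hand. The variable names are illustrative. Each student's facts are repeated on every enrollment row, so the copies can drift apart. The check below finds students with more than one email and course IDs with more than one spelling.

```python
# Rows of the single big table from the slide:
# (student_id, last, first, dept_id, dept, course_id, course_name, grade, email)
big_table = [
    (1, "Arrows", "John", "EE", "EE", "lbsc690", "Information Technology", 90, "jarrows@wam"),
    (1, "Arrows", "John", "EE", "Elec Engin", "ee750", "Communication", 95, "ja_2002@yahoo"),
    (2, "Peters", "Kathy", "HIST", "HIST", "lbsc690", "Informatino Technology", 95, "kpeters2@wam"),
    (2, "Peters", "Kathy", "HIST", "history", "hist405", "American History", 80, "kpeters2@wma"),
    (3, "Smith", "Chris", "HIST", "history", "hist405", "American History", 90, "smith2002@glue"),
    (4, "Smith", "John", "CLIS", "Info Sci", "lbsc690", "Information Technology", 98, "js03@wam"),
]

# Collect every email recorded for each student ID.
emails = {}
for row in big_table:
    emails.setdefault(row[0], set()).add(row[8])

# More than one email for a student means contradictory copies of one fact.
inconsistent_students = sorted(sid for sid, vals in emails.items() if len(vals) > 1)
print(inconsistent_students)  # [1, 2]

# The same check for course names: one course ID, several spellings.
course_names = {}
for row in big_table:
    course_names.setdefault(row[5], set()).add(row[6])
inconsistent_courses = sorted(cid for cid, vals in course_names.items() if len(vals) > 1)
print(inconsistent_courses)  # ['lbsc690']
```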
Good Database Design
• Save space
  – Save each fact only once
• More rapid updates
  – Every fact only needs to be updated once
• More rapid search
  – Finding something once is good enough
• Avoid inconsistency
  – Changing data once changes it everywhere
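The last point can be shown with a small Python sketch. The rename of "History" to "History Department" is invented purely for illustration. Each department name is stored once, and students hold only its ID, so a single update reaches every student that refers to it.

```python
# Department names stored once, keyed by department ID.
departments = {"EE": "Electrical Engineering", "HIST": "History", "CLIS": "Information Studies"}

# Students refer to a department by ID instead of repeating its name.
students = [
    (1, "Arrows", "EE"),
    (2, "Peters", "HIST"),
    (3, "Smith", "HIST"),
    (4, "Smith", "CLIS"),
]

def department_of(student):
    """Look up the department name through the stored department ID."""
    return departments[student[2]]

# Hypothetical rename: one update to the single stored fact...
departments["HIST"] = "History Department"

# ...is seen by every student that refers to it.
hist_names = [department_of(s) for s in students if s[2] == "HIST"]
print(hist_names)  # ['History Department', 'History Department']
```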
Another Try...
Department Table
Department ID | Department
EE | Electrical Engineering
HIST | History
CLIS | Information Studies

Course Table
Course ID | Course Name
lbsc690 | Information Technology
ee750 | Communication
hist405 | American History

Enrollment Table
Student ID | Course ID | Grade
1 | lbsc690 | 90
1 | ee750 | 95
2 | lbsc690 | 95
2 | hist405 | 80
3 | hist405 | 90
4 | lbsc690 | 98

Student Table
Student ID | Last Name | First Name | Department ID | email
1 | Arrows | John | EE | jarrows@wam
2 | Peters | Kathy | HIST | kpeters2@wam
3 | Smith | Chris | HIST | smith2002@glue
4 | Smith | John | CLIS | js03@wam
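The four tables can be sketched in SQL, run here through Python's sqlite3 with an in-memory database. The column names drop spaces, and the keys are assumptions: the (StudentID, CourseID) pair is taken as the Enrollment key. The final query joins the tables back together, which rebuilds every row of the big table without storing any fact twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartmentID TEXT PRIMARY KEY, Department TEXT);
    CREATE TABLE Course (CourseID TEXT PRIMARY KEY, CourseName TEXT);
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        LastName TEXT, FirstName TEXT,
        DepartmentID TEXT REFERENCES Department(DepartmentID),
        Email TEXT
    );
    -- One row per (student, course) pair; the pair is assumed to be the key.
    CREATE TABLE Enrollment (
        StudentID INTEGER REFERENCES Student(StudentID),
        CourseID TEXT REFERENCES Course(CourseID),
        Grade INTEGER,
        PRIMARY KEY (StudentID, CourseID)
    );
""")
conn.executemany("INSERT INTO Department VALUES (?, ?)", [
    ("EE", "Electrical Engineering"), ("HIST", "History"), ("CLIS", "Information Studies")])
conn.executemany("INSERT INTO Course VALUES (?, ?)", [
    ("lbsc690", "Information Technology"), ("ee750", "Communication"),
    ("hist405", "American History")])
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", [
    (1, "Arrows", "John", "EE", "jarrows@wam"), (2, "Peters", "Kathy", "HIST", "kpeters2@wam"),
    (3, "Smith", "Chris", "HIST", "smith2002@glue"), (4, "Smith", "John", "CLIS", "js03@wam")])
conn.executemany("INSERT INTO Enrollment VALUES (?, ?, ?)", [
    (1, "lbsc690", 90), (1, "ee750", 95), (2, "lbsc690", 95),
    (2, "hist405", 80), (3, "hist405", 90), (4, "lbsc690", 98)])

# Joining the four tables rebuilds one row per enrollment.
big = conn.execute("""
    SELECT s.StudentID, s.LastName, d.Department, c.CourseName, e.Grade
    FROM Enrollment e
    JOIN Student s    ON s.StudentID = e.StudentID
    JOIN Department d ON d.DepartmentID = s.DepartmentID
    JOIN Course c     ON c.CourseID = e.CourseID
    ORDER BY s.StudentID, c.CourseID
""").fetchall()
for row in big:
    print(row)
```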
Approaches to Normalization
• For simple problems:
  – Start with “binary relationships”: pairs of fields that are related
  – Group together wherever possible
  – Add keys where necessary
• For more complicated problems:
  – Entity relationship modeling (LBSC 670)
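The simple-problem recipe can be sketched mechanically in Python. The pair list is a hypothetical reading of the registrar example, in which each pair means "the left side determines the right side". Fields that share a left side are grouped into one table, and that left side becomes the table's key. On this input the result is the four tables of the second try.

```python
from collections import defaultdict

# Hypothetical related-field pairs for the registrar example.
# The left side is a tuple so that a pair of fields can act together.
pairs = [
    (("Student ID",), "Last Name"),
    (("Student ID",), "First Name"),
    (("Student ID",), "email"),
    (("Student ID",), "Dept ID"),
    (("Dept ID",), "Dept"),
    (("Course ID",), "Course name"),
    (("Student ID", "Course ID"), "Grade"),
]

# Group together every field determined by the same left side;
# that left side becomes the key of the resulting table.
tables = defaultdict(list)
for key, field in pairs:
    tables[key].append(field)

for key, fields in tables.items():
    print(" + ".join(key), "->", fields)
```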
Some Lingo
• “Primary Key” uniquely identifies a record
  – e.g., student ID in the student table
• “Foreign Key” is a field that is the primary key of another table
  – e.g., department ID in the student table
  – It need not be unique in this table
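A foreign key can be declared and enforced in SQL. This sketch uses Python's sqlite3; SQLite checks foreign keys only when `PRAGMA foreign_keys = ON` is set. Two students share the department ID 'HIST', so the foreign key is not unique. The hypothetical student 'Jones' with the nonexistent department 'MATH' is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (DepartmentID TEXT PRIMARY KEY, Department TEXT);
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        LastName TEXT,
        DepartmentID TEXT REFERENCES Department(DepartmentID)
    );
""")
conn.execute("INSERT INTO Department VALUES ('HIST', 'History')")

# The foreign key need not be unique: two students share 'HIST'.
conn.execute("INSERT INTO Student VALUES (2, 'Peters', 'HIST')")
conn.execute("INSERT INTO Student VALUES (3, 'Smith', 'HIST')")

# But each value must match a primary key in the other table
# (hypothetical student 5 in a nonexistent 'MATH' department).
try:
    conn.execute("INSERT INTO Student VALUES (5, 'Jones', 'MATH')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(orphan_rejected, student_count)  # True 2
```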
The Data Model
Student Table: Student ID (primary key), Last Name, First Name, Department ID (foreign key → Department Table), email
Department Table: Department ID (primary key), Department
Course Table: Course ID (primary key), Course Name
Enrollment Table: Student ID (foreign key → Student Table), Course ID (foreign key → Course Table), Grade; together, Student ID and Course ID identify a record
Project
SELECT Student ID, Department
Student ID | Last Name | First Name | Dept ID | Department | email
1 | Arrows | John | EE | Electrical Engineering | jarrows@wam
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
4 | Smith | John | CLIS | Information Studies | js03@wam

Result:
Student ID | Department
1 | Electrical Engineering
2 | History
3 | History
4 | Information Studies
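Projection can be sketched in plain Python, with the joined table held as a list of dicts. `project` is a hypothetical helper, not a library function. It keeps only the columns named by label and leaves every row in place.

```python
# The joined student table from the slide, one dict per record.
students = [
    {"Student ID": 1, "Last Name": "Arrows", "First Name": "John", "Dept ID": "EE",
     "Department": "Electrical Engineering", "email": "jarrows@wam"},
    {"Student ID": 2, "Last Name": "Peters", "First Name": "Kathy", "Dept ID": "HIST",
     "Department": "History", "email": "kpeters2@wam"},
    {"Student ID": 3, "Last Name": "Smith", "First Name": "Chris", "Dept ID": "HIST",
     "Department": "History", "email": "smith2002@glue"},
    {"Student ID": 4, "Last Name": "Smith", "First Name": "John", "Dept ID": "CLIS",
     "Department": "Information Studies", "email": "js03@wam"},
]

def project(table, columns):
    """Keep only the named columns of every record (columns chosen by label)."""
    return [{col: record[col] for col in columns} for record in table]

result = project(students, ["Student ID", "Department"])
for record in result:
    print(record["Student ID"], record["Department"])
```

Projecting onto Department alone would list History twice. SQL keeps such repeats unless the query says SELECT DISTINCT.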
Restrict
WHERE Department ID = “HIST”

Student ID | Last Name | First Name | Dept ID | Department | email
1 | Arrows | John | EE | Electrical Engineering | jarrows@wam
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
4 | Smith | John | CLIS | Information Studies | js03@wam

Result:
Student ID | Last Name | First Name | Department ID | Department | email
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
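Restriction chooses rows by their contents. In this Python sketch, `restrict` is a hypothetical helper that takes the condition as a function, and the table is trimmed to a few columns for brevity.

```python
# The student table from the slide, trimmed to a few columns.
students = [
    {"Student ID": 1, "Last Name": "Arrows", "First Name": "John", "Dept ID": "EE"},
    {"Student ID": 2, "Last Name": "Peters", "First Name": "Kathy", "Dept ID": "HIST"},
    {"Student ID": 3, "Last Name": "Smith", "First Name": "Chris", "Dept ID": "HIST"},
    {"Student ID": 4, "Last Name": "Smith", "First Name": "John", "Dept ID": "CLIS"},
]

def restrict(table, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [record for record in table if predicate(record)]

# Equivalent of WHERE Department ID = "HIST".
history = restrict(students, lambda r: r["Dept ID"] == "HIST")
for r in history:
    print(r["Student ID"], r["Last Name"], r["First Name"])
```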
Join
Student Table
Student ID | Last Name | First Name | Department ID | email
1 | Arrows | John | EE | jarrows@wam
2 | Peters | Kathy | HIST | kpeters2@wam
3 | Smith | Chris | HIST | smith2002@glue
4 | Smith | John | CLIS | js03@wam

Department Table
Department ID | Department
EE | Electrical Engineering
HIST | History
CLIS | Information Studies

“Joined” Table
Student ID | Last Name | First Name | Dept ID | Department | email
1 | Arrows | John | EE | Electrical Engineering | jarrows@wam
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
4 | Smith | John | CLIS | Information Studies | js03@wam
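The join can be sketched as a simple nested-loop equi-join in Python. `join` is a hypothetical helper, and real database systems usually pick faster join strategies. Each student record is paired with the department record that has the same Department ID, and the two are merged into one row of the joined table.

```python
student_table = [
    {"Student ID": 1, "Last Name": "Arrows", "First Name": "John", "Department ID": "EE", "email": "jarrows@wam"},
    {"Student ID": 2, "Last Name": "Peters", "First Name": "Kathy", "Department ID": "HIST", "email": "kpeters2@wam"},
    {"Student ID": 3, "Last Name": "Smith", "First Name": "Chris", "Department ID": "HIST", "email": "smith2002@glue"},
    {"Student ID": 4, "Last Name": "Smith", "First Name": "John", "Department ID": "CLIS", "email": "js03@wam"},
]
department_table = [
    {"Department ID": "EE", "Department": "Electrical Engineering"},
    {"Department ID": "HIST", "Department": "History"},
    {"Department ID": "CLIS", "Department": "Information Studies"},
]

def join(left, right, column):
    """Merge every left record with every right record that has the same value in `column`."""
    joined = []
    for left_rec in left:
        for right_rec in right:
            if left_rec[column] == right_rec[column]:
                # The shared column appears once in the merged record.
                joined.append({**left_rec, **right_rec})
    return joined

joined_table = join(student_table, department_table, "Department ID")
for rec in joined_table:
    print(rec["Student ID"], rec["Last Name"], rec["Department"])
```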
Relational Operations
• Choosing columns: SELECT
  – Based on their label
• Choosing rows: WHERE
  – Based on their contents
• Joining tables: JOIN
• These can be specified together, e.g.:
  SELECT Student ID, Dept WHERE Dept = “History”
  (with this data, equivalent to WHERE department ID = “HIST”)
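All three operations can go into one SQL statement. This sketch runs one through Python's sqlite3. The table and column names (Student, Department, DeptID, Dept) are assumptions, and ORDER BY is added only to make the output order predictable. JOIN combines the tables, WHERE restricts the rows, and the SELECT list projects the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, LastName TEXT, DeptID TEXT);
    CREATE TABLE Department (DeptID TEXT PRIMARY KEY, Dept TEXT);
    INSERT INTO Student VALUES (1, 'Arrows', 'EE'), (2, 'Peters', 'HIST'),
                               (3, 'Smith', 'HIST'), (4, 'Smith', 'CLIS');
    INSERT INTO Department VALUES ('EE', 'Electrical Engineering'),
                                  ('HIST', 'History'), ('CLIS', 'Information Studies');
""")

# Join, restrict (WHERE) and project (the SELECT list) in one statement.
rows = conn.execute("""
    SELECT Student.StudentID, Department.Dept
    FROM Student JOIN Department ON Student.DeptID = Department.DeptID
    WHERE Department.Dept = 'History'
    ORDER BY Student.StudentID
""").fetchall()
print(rows)  # [(2, 'History'), (3, 'History')]
```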
Some SQL
• SQL = Structured Query Language
• Used in many types of database systems
Select query
• SELECT LastName, FirstName from StudentTable

StudentTable
Student ID | Last Name | First Name | Dept ID | Department | email
1 | Arrows | John | EE | Electrical Engineering | jarrows@wam
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
4 | Smith | John | CLIS | Information Studies | js03@wam
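The query can be tried as written using Python's sqlite3 and an in-memory copy of the table, trimmed to four columns with names taken from the query. It returns two columns for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE StudentTable (
    StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, DeptID TEXT)""")
conn.executemany("INSERT INTO StudentTable VALUES (?, ?, ?, ?)", [
    (1, "Arrows", "John", "EE"), (2, "Peters", "Kathy", "HIST"),
    (3, "Smith", "Chris", "HIST"), (4, "Smith", "John", "CLIS")])

# The query from the slide: two columns, every row.
names = conn.execute("SELECT LastName, FirstName FROM StudentTable").fetchall()
for last, first in names:
    print(f"{last}, {first}")
```

Without an ORDER BY clause, SQL does not promise any particular row order.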
Select with Restriction
• SELECT LastName, FirstName from StudentTable where DeptID = ‘HIST’
• Will return
  – Peters, Kathy
  – Smith, Chris

Student ID | Last Name | First Name | Dept ID | Department | email
1 | Arrows | John | EE | Electrical Engineering | jarrows@wam
2 | Peters | Kathy | HIST | History | kpeters2@wam
3 | Smith | Chris | HIST | History | smith2002@glue
4 | Smith | John | CLIS | Information Studies | js03@wam
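Programs that run this query usually pass the restriction value as a parameter instead of pasting it into the SQL text. This sketch uses Python's sqlite3 and its `?` placeholder. The trimmed table and the added ORDER BY are assumptions made for a predictable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE StudentTable (
    StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, DeptID TEXT)""")
conn.executemany("INSERT INTO StudentTable VALUES (?, ?, ?, ?)", [
    (1, "Arrows", "John", "EE"), (2, "Peters", "Kathy", "HIST"),
    (3, "Smith", "Chris", "HIST"), (4, "Smith", "John", "CLIS")])

dept = "HIST"
# The ? placeholder lets sqlite3 bind the value safely instead of building SQL by string concatenation.
hist = conn.execute(
    "SELECT LastName, FirstName FROM StudentTable WHERE DeptID = ? ORDER BY StudentID",
    (dept,),
).fetchall()
print(hist)  # [('Peters', 'Kathy'), ('Smith', 'Chris')]
```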
Select with Restriction
• SELECT StudentID from EnrollmentTable where Grade > 81

Enrollment Table
Student ID | Course ID | Grade
1 | lbsc690 | 90
1 | ee750 | 95
2 | lbsc690 | 95
2 | hist405 | 80
3 | hist405 | 90
4 | lbsc690 | 98
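Running this query on the slide's data shows a detail worth knowing. The result has one row per qualifying enrollment, so student 1 appears twice. SELECT DISTINCT removes the repeat. This sketch uses Python's sqlite3, and the ORDER BY clauses are added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EnrollmentTable (StudentID INTEGER, CourseID TEXT, Grade INTEGER)")
conn.executemany("INSERT INTO EnrollmentTable VALUES (?, ?, ?)", [
    (1, "lbsc690", 90), (1, "ee750", 95), (2, "lbsc690", 95),
    (2, "hist405", 80), (3, "hist405", 90), (4, "lbsc690", 98)])

# The slide's query: one result row per enrollment with Grade > 81.
ids = [r[0] for r in conn.execute(
    "SELECT StudentID FROM EnrollmentTable WHERE Grade > 81 ORDER BY StudentID")]
print(ids)  # [1, 1, 2, 3, 4]

# DISTINCT collapses repeats, so each student appears once.
distinct_ids = [r[0] for r in conn.execute(
    "SELECT DISTINCT StudentID FROM EnrollmentTable WHERE Grade > 81 ORDER BY StudentID")]
print(distinct_ids)  # [1, 2, 3, 4]
```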
Select with Join
SELECT LastName, FirstName from StudentTable JOIN EnrollmentTable on StudentTable.StudentID = EnrollmentTable.StudentID where EnrollmentTable.Grade > 95
Results: Smith, John

Enrollment Table
Student ID | Course ID | Grade
1 | lbsc690 | 90
1 | ee750 | 95
2 | lbsc690 | 95
2 | hist405 | 80
3 | hist405 | 90
4 | lbsc690 | 98

Student Table
Student ID | Last Name | First Name | Department ID | email
1 | Arrows | John | EE | jarrows@wam
2 | Peters | Kathy | HIST | kpeters2@wam
3 | Smith | Chris | HIST | smith2002@glue
4 | Smith | John | CLIS | js03@wam
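The join query can be checked by running it through Python's sqlite3 against the two tables, trimmed to the columns it uses. Because `>` is strict, the two grades of exactly 95 are excluded, and only the 98 qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentTable (StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE EnrollmentTable (StudentID INTEGER, CourseID TEXT, Grade INTEGER);
    INSERT INTO StudentTable VALUES (1, 'Arrows', 'John'), (2, 'Peters', 'Kathy'),
                                    (3, 'Smith', 'Chris'), (4, 'Smith', 'John');
    INSERT INTO EnrollmentTable VALUES (1, 'lbsc690', 90), (1, 'ee750', 95),
        (2, 'lbsc690', 95), (2, 'hist405', 80), (3, 'hist405', 90), (4, 'lbsc690', 98);
""")

# The slide's query: match enrollments to students, then keep grades above 95.
result = conn.execute("""
    SELECT LastName, FirstName
    FROM StudentTable JOIN EnrollmentTable
      ON StudentTable.StudentID = EnrollmentTable.StudentID
    WHERE EnrollmentTable.Grade > 95
""").fetchall()
print(result)  # [('Smith', 'John')]
```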
Discussion Point
• How is a relational database different from a spreadsheet?