CS 541 Lecture Slides
Sunil Prabhakar
CS541 Database Systems


April 21, 2023 Sunil Prabhakar 2

Instructor

Sunil Prabhakar
  LWSN 2142C
  Office Hours: catch me or by appointment
  [email protected]
  http://www.cs.purdue.edu/homes/sunil/

Teaching Assistant: Yasin Silva
  [email protected]
  Office hours: TBA
  Assignments and Projects


Course Information

Web page: http://www.cs.purdue.edu/homes/sunil/syllabi/CS541_Fall2004.html
  Projects, Assignments, Solutions, Slides

Email alias
  Announcements: IMPORTANT
  [email protected]
  mailer add me to cs541

WebCT
  Grades
  Check that you can log in


Course Description

Introductory graduate course on databases
  Fundamental concepts & internals
  Some coverage of database use (Oracle projects)
  Will not teach how to use databases!
Focus on relational databases


Topics

DBMS Concepts and Architecture
Relational Database Model
Relational Languages (Algebra, Calculus, SQL)
Storage and Indexing
Query Processing
Query Optimization
Transaction Processing
  Concurrency Control
  Recovery
Advanced Topics: TBD (Mining, Indexing, Sensors, …)


Pre-Requisites

Data Structures
  Notions of trees, hashing, linked lists, etc.
Operating Systems
  I/O
Java
  Project 3 will be done in Java
  RMI
  Simple GUI


Text

Database System Concepts (4th Edition)
  Silberschatz, Korth, Sudarshan
  ISBN: 0-07-228363-7
  McGraw Hill
Supplemental Text: Concurrency Control and Recovery in Database Systems
  Bernstein, Hadzilacos, Goodman
  Out of print: available free on the Internet
  Link from the course web page


Grading Policy

Tentative
  Written Assignments (2)        20%
  Programming Projects (3-4)     40%
  Mid-term Exam                  20%
  Final Exam                     20%
Final exam is not comprehensive
Grading is curved
No extra credit assignments


Academic Integrity

CS Policy
  IMPORTANT: visit, read and accept!
  https://portals.cs.purdue.edu/student
  Requires a CS login and password.
Cheating will be taken very seriously.
  Make sure that you are familiar with what CS considers to be cheating!
  You may discuss the problems, but the final solution must be your own.


Course Policy

NO LATE SUBMISSIONS
NO EXTENSIONS*

* Extensions are granted only for documented medical reasons or family emergencies.


Databases

What is a database?
  Software to manage data.
Why do we need a database?
  Ease of development
  Efficiency
  Concurrency
  Reliability
  Ease of administration
  Data independence
Importance of databases?
  Increasing or decreasing? What is changing?


What is interesting?

Essential to modern applications?
  Data is a valuable commodity.
Is there anything challenging?
  Encompasses PL, OS, logic, theory, …
  Novel solutions with wider applicability: transactions, locking, …
What remains to be done?
  Modern applications: multimedia, sensors, streams, data warehouses, data mining, privacy and security, knowledge, data on the Web, XML, …


Abstraction

How to provide a generic, application-independent solution?
Data Models
  Abstract view of data
  Database efficiently supports this model
  Examples: Network, Relational, OO, O-R, …
  Most successful model: RELATIONAL
Users access the database as a black box that supports the model.
Languages are used to interact with this box: Relational Algebra, SQL
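To make the "black box plus language" idea concrete, here is a minimal sketch that asks the same question twice: once with hand-written relational algebra operators over a set of tuples, and once in SQL against Python's built-in sqlite3 module, used here only as a convenient stand-in for a DBMS (the course projects use Oracle). The Student relation, its attributes, and the helpers select and project are hypothetical illustrations.

```python
import sqlite3

# Hypothetical relation Student(sid, name, dept), stored as tuples.
rows = [(1, "Ann", "CS"), (2, "Bob", "EE"), (3, "Cho", "CS")]

def select(relation, pred):
    """sigma_pred(relation): keep the tuples that satisfy pred."""
    return {t for t in relation if pred(t)}

def project(relation, *idx):
    """pi_idx(relation): keep the listed attribute positions (set semantics drops duplicates)."""
    return {tuple(t[i] for i in idx) for t in relation}

# Relational algebra: pi_name(sigma_dept='CS'(Student))
algebra_result = project(select(set(rows), lambda t: t[2] == "CS"), 1)

# SQL: the same question posed to the DBMS "black box".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", rows)
sql_result = set(conn.execute("SELECT DISTINCT name FROM Student WHERE dept = 'CS'"))
conn.close()

print(sorted(algebra_result))        # [('Ann',), ('Cho',)]
print(algebra_result == sql_result)  # True
```

The algebra version spells out how to compute the answer step by step; the SQL version only states what is wanted and leaves the evaluation strategy to the system.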


Independence

Databases allow applications and users to be shielded from the internal details:
  Physical data independence
    How data is stored (bits, pages, formats, etc.)
    Compare with the flat-file alternative
  Logical data independence
    How data is structured logically
    Allows the logical organization of data to change without having to rebuild applications
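Both kinds of independence can be seen in a small sqlite3 sketch, assuming SQLite as a stand-in DBMS and a hypothetical Employee table. The application's query text never changes, yet the database first gains an index (a physical change) and is then split into two tables behind a view that keeps the old name (a logical change). Views are one common mechanism for logical data independence; the slide itself does not prescribe a specific one.

```python
import sqlite3

# The "application" only knows this query text, not how the data is stored.
APP_QUERY = "SELECT name FROM Employee WHERE dept = 'Sales' ORDER BY name"

def run_app(conn):
    return [r[0] for r in conn.execute(APP_QUERY)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eid INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Bob", "HR"), (3, "Cho", "Sales")])
before = run_app(conn)

# Physical change: add an index. Only the access path may change, not the query.
conn.execute("CREATE INDEX emp_dept ON Employee(dept)")
after_index = run_app(conn)

# Logical change: split Employee into two tables, then expose the old name as a view.
conn.executescript("""
    CREATE TABLE Person (eid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE WorksIn (eid INTEGER PRIMARY KEY REFERENCES Person(eid), dept TEXT);
    INSERT INTO Person SELECT eid, name FROM Employee;
    INSERT INTO WorksIn SELECT eid, dept FROM Employee;
    DROP TABLE Employee;
    CREATE VIEW Employee AS
        SELECT p.eid, p.name, w.dept FROM Person p JOIN WorksIn w ON p.eid = w.eid;
""")
after_restructure = run_app(conn)
conn.close()

print(before, after_index, after_restructure)  # ['Ann', 'Cho'] ['Ann', 'Cho'] ['Ann', 'Cho']
```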


Concurrency Control & Recovery

Two highly desirable requirements:
  Enable multiple users to access the data at the same time.
  Automatic recovery from crashes.
Challenge:
  How to do this in an application-independent manner?
Solution:
  Transactions
  A "contract" between the DB black box and users.
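The transaction "contract" can be illustrated with a hypothetical funds transfer in sqlite3 (again a stand-in DBMS, with made-up table and account names). The sketch only shows atomicity: when the second update violates a constraint, the first update is rolled back as well, so the database never exposes a half-finished transfer. Isolation between concurrent users, the other requirement on the slide, is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Run both updates as one transaction: either both take effect or neither does."""
    try:
        with conn:  # commits on success; rolls back and re-raises if an exception escapes
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok1 = transfer(conn, "A", "B", 30)   # succeeds: A=70, B=80
ok2 = transfer(conn, "A", "B", 500)  # second UPDATE violates CHECK, so the credit to B is undone too
balances = dict(conn.execute("SELECT id, balance FROM Account ORDER BY id"))
conn.close()

print(ok1, ok2, balances)
```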


Performance

Critical for databases
  Research focus for many years
  Must be transparent to the users
  Query processing & optimization
  Indexing, storage organization (data independence)
Challenge:
  How to optimize without understanding the semantics of an application?
Solution:
  Relational data model: a clean mathematical abstraction that allows for alternative, equivalent evaluations
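A small pure-Python sketch of "alternative equivalent evaluations", assuming two hypothetical relations R(rid, dept) and S(dept, building) and a naive nested-loop join. Because the predicate dept = 'CS' can be evaluated on R alone, σ_{dept='CS'}(R ⋈ S) and σ_{dept='CS'}(R) ⋈ S return the same tuples, but the second plan examines far fewer tuple pairs. Choosing between such equivalent plans is what a query optimizer does; real optimizers rely on cost estimates rather than counting work after the fact as this sketch does.

```python
from itertools import product

# Hypothetical relations R(rid, dept) and S(dept, building).
R = [(i, "CS" if i % 10 == 0 else "EE") for i in range(1000)]
S = [("CS", "Bldg1"), ("EE", "Bldg2")]

def join(r, s):
    """Nested-loop natural join on dept; also returns the number of tuple pairs examined."""
    out, pairs = [], 0
    for (rid, dept), (sdept, bldg) in product(r, s):
        pairs += 1
        if dept == sdept:
            out.append((rid, dept, bldg))
    return out, pairs

def select(rel, pred):
    return [t for t in rel if pred(t)]

# Plan 1: sigma_{dept='CS'}(R join S)
joined, work1 = join(R, S)
plan1 = select(joined, lambda t: t[1] == "CS")

# Plan 2: sigma_{dept='CS'}(R) join S  (selection pushed below the join)
plan2, work2 = join(select(R, lambda t: t[1] == "CS"), S)

print(sorted(plan1) == sorted(plan2), work1, work2)  # True 2000 200
```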


This course

Study the relational model, ER model, languages.
Transactions
  Concurrency Control
  Recovery
Storage and File Structures
Indexing and Hashing
Query Processing and Optimization
Advanced Topics
  New data types, applications, multi-dimensional data, data warehousing, data mining, design, …