26
CENG 351 1 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

Embed Size (px)

Citation preview

Page 1: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 1

CENG 351 Introduction to Data Management

and File Structures

Nihan Kesim Çiçekli

Department of Computer Engineering

METU

Page 2: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 2

CENG 351-Section 2

• Instructor: Nihan Kesim Çiçekli• Office: A308• Email: [email protected]• Lecture Hours:

Tue. 15:40; Thu. 11:40,12:40 (BMB3)

• Course Web page: http://cow.ceng.metu.edu.tr• Teaching Assistants:

Hande Çelikkanat, Cüneyt Mertayak, Çağatay Çallı

Page 3: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 3

References

1. Betty Salzberg, File Structures: An Analytic Approach, Prentice Hall, 1988.

2. Raghu Ramakrishnan, Database Management Systems (3rd. ed.), McGraw Hill, 2003.

3. Michael J. Folk, Bill Zoellick and Greg Riccardi, File Structures, An object oriented approach with C++, Addison-Wesley, 1998.

4. R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 4th edition, Addison-Wesley, 2004.

Page 4: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 4

Course Outline1. Introduction: Secondary storage devices2. Fundamental File Structure Concepts:

Sequential Files 3. External Sorting4. Indexed Sequential Files (B-trees)5. Direct access (Hashing)6. Introduction to Database Systems:

E/R modeling, relational model, 7. Query languages: Relational algebra, relational

calculus, SQL8. Query Evaluation

Page 5: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 5

Grading

3 written HW, 3 programming assignments 30%

Midterm Exam 1 20%

Midterm Exam 2 20%

Final Exam 30%

Tentative Exam Dates:

Midterm Exam 1: Nov. 6, 2008

Midterm Exam 2: Dec. 18, 2007

Page 6: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 6

Grading Policies

• Policy on missed midterm: – no make-up exam

• Lateness policy:– Late assignments are penalized up to 10% per day.

• All assignments and programs are to be your own work. No group projects or assignments are allowed.

Page 7: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 7

Introduction to File management

Page 8: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 8

Motivation Most computers are used for data processing

(over $100 billion/year). A big growth area in the “information age”

This course covers data processing from a computer science perspective:

– Storage of data– Organization of data– Access to data– Processing of data

Page 9: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 9

Data Structures vs File Structures

• Both involve:– Representation of Data

+– Operations for accessing data

• Difference:– Data structures: deal with data in main memory– File structures: deal with data in secondary

storage

Page 10: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 10

Hardware

Operating System

DBMS

File system

Application

Where do File Structures fit in Computer Science?

Page 11: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 11

Computer Architecture

Main Memory (RAM)

Secondary

Storage

data transfer

data is manipulated here

data is stored here

- Semiconductors

- Fast, expensive, volatile, small

- disks, tape

- Slow,cheap, stable, large

Page 12: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 12

Advantages• Main memory is fast• Secondary storage is big (because it is cheap)• Secondary storage is stable (non-volatile) i.e.

data is not lost during power failures

Disadvantages• Main memory is small. Many databases are too

large to fit in main memory (MM).• Main memory is volatile, i.e. data is lost during

power failures.• Secondary storage is slow (10,000 times slower

than MM)

Page 13: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 13

How fast is main memory?

• Typical time for getting info from:Main memory: ~12 nanosec = 120 x 10-9 secMagnetic disks: ~30 milisec = 30 x 10-3 sec

• An analogy keeping same time proportion as above:Looking at the index of a book : 20 sec

versusGoing to the library: 58 days

Page 14: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 14

Normal Arrangement

• Secondary storage (SS) provides reliable, long-term storage for large volumes of data

• At any given time, we are usually interested in only a small portion of the data

• This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.

• As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.

Page 15: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 15

Goal of the file structures

• Minimize the number of trips to the disk in order to get desired information

• Grouping related information so that we are likely to get everything we need with only one trip to the disk.

Page 16: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 16

Physical Files and Logical Files• physical file: a collection of bytes stored on a disk or

tape• logical file: a "channel" (like a telephone line) that

connects the program to a physical file• The program (application) sends (or receives) bytes

to (from) a file through the logical file. The program knows nothing about where the bytes go (came from).

• The operating system is responsible for associating a logical file in a program to a physical file in disk or tape. Writing to or reading from a file in a program is done through the operating system.

Page 17: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 17

Files

• The physical file has a name, for instance myfile.txt

• The logical file has a logical name (a varibale) inside the program.– In C :

FILE * outfile;

– In C++:fstream outfile;

Page 18: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 18

Basic File Processing Operations

• Opening

• Closing

• Reading

• Writing

• Seeking

Page 19: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 19

Opening Files

• Opening Files:– links a logical file to a physical file.

• In C:FILE * outfile;

outfile = fopen(“myfile.txt”, “w”);

• In C++:fstream outfile;

outfile.open(“myfile.txt”, ios::out);

Page 20: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 20

Closing Files

• Cuts the link between the physical and logical files. • After closing a file, the logical name is free to be

associated to another physical file.• Closing a file used for output guarantees everything

has been written to the physical file. (When the file is closed the leftover from the buffer is flushed to the file.)

• In C :fclose(outfile);

• In C++ :outfile.close();

Page 21: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 21

Reading• Read data from a file and place it in a variable

inside the program.• In C:

char c;FILE * infile;infile = fopen(“myfile.txt”,”r”);fread(&c, 1, 1, infile);

• In C++:char c;fstream infile;infile.open(“myfile.txt”,ios::in);infile >> c;

Page 22: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 22

Writing

• Write data from a variable inside the program into the file.

• In C:char c;FILE * outfile;outfile = fopen(“mynew.txt”,”w”);fwrite(&c, 1, 1, outfile);

• In C++:char c;fstream outfile;outfile.open(“mynew.txt”,ios::out);outfile << c;

Page 23: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 23

Seeking

• Used for direct access; an item can be accessed by specifying its position in the file.

• In C:fseek(infile,0, 0); // moves to the beginningfseek(infile, 0, 2); // moves to the endfseek(infile,-10, 1); //moves 10 bytes from //current position

• In C++:infile.seekg(0,ios::beg);infile.seekg(0,ios::end);infile.seekg(-10,ios::cur);

Page 24: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 24

File Systems

• Data is not scattered hither and thither on disk.

• Instead, it is organized into files.

• Files are organized into records.

• Records are organized into fields.

Page 25: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 25

Example• A student file may be a collection of student

records, one record for each student• Each student record may have several fields, such

as– Name– Address– Student number– Gender– Age– GPA

• Typically, each record in a file has the same fields.

Page 26: CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU

CENG 351 26

Properties of Files

1) Persistance: Data written into a file persists after the program stops, so the data can be used later.

2) Sharability: Data stored in files can be shared by many programs and users simultaneously.

3) Size: Data files can be very large. Typically, they cannot fit into MM.