25
EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Embed Size (px)

Citation preview

Page 1: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

EXTERNAL SORTING ALGORITHMSAND IMPLEMENTATIONS

05011004 Ayhan KARGIN05011027 Ahmet MERAL

Page 2: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Contents External Sorting Needs and Usage Areas External Sorting Algorithms Environments for Implementing External Sorting Algorithms Used Technologies Phases of External Sorting K-Way Merge Sort Multi-Step K-Way Merge Sort Replacement Selection Sort SimpleDB Layered Components of SimpleDB Classes of SimpleDB Query Layer of SimpleDB Relational Algebra, that SimpleDB Supports Relational Algebra Preparatory Work External Sorting on SimpleDB

Page 3: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

External Sorting Needs and Usage Areas

DBMS Group By, Join, Order By

Data Warehouse (ETL) Data Mining Data Processing

Page 4: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

External Sorting Algorithms

K-Way Merge Sort Multi-Step K-Way Merge Sort Replacement Selection Sort

Page 5: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Environments for ImplementingExternal Sorting Algorithms

MinSQL PosgreSQL SimpleDB

Page 6: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Used Technologies

Java VM Java SE JDBC Java RMI Eclipse

Page 7: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Phases of External Sorting

Run Generation Phase Merge Phase

Page 8: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

K-Way Merge Sort

Page 9: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Multi-Step K-Way Merge Sort

Page 10: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Multi-Step K-Way Merge Sort

Page 11: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Replacement Selection

Page 12: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Replacement Selection

Test results depending on stage area size

Note: Main memory is 8x.

Stage Area Disc Access Count Run Time K (Run Count)

1x 3817 0.690 17

2x 420 0.742 8

3x 252 1.049 6

4x 258 1.420 4

5x 266 1.952 4

6x 279 2.333 3

Page 13: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

SimpleDB

The Client Side That contains the JDBC interfaces and

implements the JDBC driver The Basic Server

Which provides complete funcionality of DB but ignores efficiency issues

Extensions To the basic server that support

efficient query processing.

Page 14: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Layered Components of SimpleDB

Remote Perform JDBC requests received from clients.

Parse Extract the tables, fields, and predicate mentioned in an SQL statement.

Planner Create an execution strategy for an SQL statement, and translate it to a

relational algebra plan. Query

Implement queries expressed in relational algebra. Metadata

Maintain metadata about the tables in the database, so that its records and fields are accessible.

Page 15: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Layered Components of SimpleDB

Record Provide methods for storing data records in pages

Transaction Support concurrency by restricting page access. Enable recovery by

logging changes to pages. Buffer

Maintain a cache of pages in memory to hold recently-accessed user data.

Log Append log records to the log file, and scan the records in the log file.

File Read and write between file blocks and memory pages.

Page 16: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Classes of SimpleDB

Page 17: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Classes of SimpleDB

Page 18: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Classes of SimpleDB

Page 19: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Query Layer of SimpleDB

Relational Algebra Select Project Product Sort

Page 20: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Relational Algebra, that SimpleDB Supports

Relational Algebra

SelectProjectProduct

Sort

SQL

WhereSelect

JoinOrder By

Page 21: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Relational Algebra

Select SId, SName, DName from STUDENT, DEPT where

MajorId=DId and DName='math'

Q1: Product (STUDENT, DEPT)Q2: Select (Q2, MajorId=DId )

Q3: Select (Q2, DName='math')Q4: Project (Q3, {SId, SName,

DName })

Page 22: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Relational Algebra

Page 23: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Preparatory Work

We have to introduce order by operator (in SQL) to parser layer of SimpleDB.

The columns, which will be sort, must be presenced in QueryData class in structural form.

If sorting will be happened, sorting plans must be called by BasicQueryPlanner class.

Calculating number of random accesses per file on FileMgr layer. Calculating duration of transaction with adding start time and end

time to Transaction class. Calculating number of unsorted records on each transaction. We may have to compare, copy and exchange congeneric records

on different scans. So, RecordExchange class is written for these purposes.

StepCalculator class is written. It splits any number to two close integer multipliers. For example; from 29, 6 and 5; from 49, 7 and 7; from 50, 8 and 7; etc.

Page 24: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

External Sorting on SimpleDB

Materialize Plan Edward Sciore’s Merge Sort Plan&Scan K-Way Merge Sort Plan&Scan Multi-Step K-Way Merge Sort Plan&Scan Replacement Selection Sort Plan&Scan Multi-Step Replacement Selection Sort

Plan&Scan

Page 25: EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS 05011004 Ayhan KARGIN 05011027 Ahmet MERAL

Algorithms Choice Dependency