25
The Case for a Learned Sorting Algorithm Authors: Ani Kristo, Kapil Vaidya, Ugur Cetintemel, Sanchit Misra, Tim Kraska Presenter: Terryn Brunelle

Sorting Algorithm The Case for a Learned

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sorting Algorithm The Case for a Learned

The Case for a Learned Sorting AlgorithmAuthors: Ani Kristo, Kapil Vaidya, Ugur Cetintemel, Sanchit Misra, Tim Kraska

Presenter: Terryn Brunelle

Page 2: Sorting Algorithm The Case for a Learned

Motivation

● Fundamental CS problem

● Database operations○ Sort query results

○ Perform joins

Page 3: Sorting Algorithm The Case for a Learned

Existing Work

● Comparison sort

● Distribution sort○ Counting sort

○ Radix sort

● ML-enhanced algorithms

Page 4: Sorting Algorithm The Case for a Learned

Learned Sort

● Train CDF model

● Use predicted prob for each key to predict final position for every key

in sorted output

Linear time possible!(if have perfect model)

Page 5: Sorting Algorithm The Case for a Learned

Problems

● Perfect model = expensive to train

● Random-access memory problem

Page 6: Sorting Algorithm The Case for a Learned

Algorithm 1

Page 7: Sorting Algorithm The Case for a Learned

Cache-Efficient Learned Sort

Page 8: Sorting Algorithm The Case for a Learned

Pseudo-Code: Step 1

Page 9: Sorting Algorithm The Case for a Learned

Pseudo-Code: Steps 2-4

Page 10: Sorting Algorithm The Case for a Learned

● Process elements in batches (cache locality)

● One bucket at a time (temporal locality)

● Bucket buffer space (reduce overflows)

Optimizations

Page 11: Sorting Algorithm The Case for a Learned

CDF Model

Page 12: Sorting Algorithm The Case for a Learned

CDF Model

Page 13: Sorting Algorithm The Case for a Learned

Theoretical Results

● Step 1: O(N*L)

● Step 2: O(N)

● Step 3: O(Nt) (non-dominant)

● Step 4: O(s log s) + O(N)

Space complexity: order of O(N)

Page 14: Sorting Algorithm The Case for a Learned

Experimental Results

Page 15: Sorting Algorithm The Case for a Learned

Experimental Results

Page 16: Sorting Algorithm The Case for a Learned

Experimental Results

Page 17: Sorting Algorithm The Case for a Learned

In-Place Sorting

Page 18: Sorting Algorithm The Case for a Learned

Performance Decomposition

Page 19: Sorting Algorithm The Case for a Learned

Strengths/Weaknesses

Strengths

● Performance on real-world data

● Improvement over default Java/Python

sorting function

● Cache-efficient

● Model training time accounted for

Weaknesses

● Other CDF implementations?

● Duplicate keys

Page 20: Sorting Algorithm The Case for a Learned

Directions for Future Work

● Sorting complex objects

● Parallel Sorting

● Using in DB systems

Page 21: Sorting Algorithm The Case for a Learned

Discussion Questions

● Can you think of adversarial inputs that may be good to evaluate this

specific approach on?

● What parallelization techniques may apply to this algorithm/sorting

algorithms in general?

● What are other ways through which collisions might be handled? What

is attractive about the spill bucket method?

Page 22: Sorting Algorithm The Case for a Learned

Additional Materials

Page 23: Sorting Algorithm The Case for a Learned

String Sorting

Page 24: Sorting Algorithm The Case for a Learned

Duplicates

Page 25: Sorting Algorithm The Case for a Learned

CDF Model Training