Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Analysis of TimSort AlgorithmNicolas Auger, Vincent Jugé, Cyril Nicaud, Carine Pivoteau
MOA – LIGM (UMR 8049)
A Brief History of TimSort
old TimSort
2001 ’02 ’03 ’04 ’05 ’06 ’07 ’08 ’09 ’10 ’11 ’12 ’13 ’14 ’15 ’16 ’17 ’18 ’19 ’20 ’21
invented by Tim Peterssorting algorithm in Python
used in AndroidJava
Java’s fix
certified fix
bug uncovered! [1]Python → certified fixJava → custom fix
Octave
O(n log n) running time proved [2]
new bug in Java! [3]refined analysis: O(n + nH)
Our contributions are written in blue.
TimSort Principle
Natural decomposition of a sequence into maximal monotonic runs:
a b x h f e d c b a b b c d a b3 7 4 2
Algorithm
get a run decomposition of the initial sequencestart with an empty stackwhile the run decomposition is not empty do
take the next run and push it onto the stack #1
while this is possible doapply one of the merge rules #2 #3 #4 #5
while the stack has more than 1 element domerge the two topmost runs
1, 2, 3, 4, . . . are the top-most runs on the stack.The length of the ith runis denoted by ri.
123. . .
r1 > r3
#2
if then
1
new 2
. . .
merge 2 & 3
12
. . .r1 > r2
#3
elseif or
123
. . .r1 + r2 > r3
#4
or then
1234. . .
r2 + r3 > r4
#5
new 1
new 2new 3. . .
merge 1 & 2
Example of stack evolution for runs of length 24, 18, 50, 28, 20, 6, 4, 8, 1:(stack values represent run lengths)
24#1
2418#1
1824
50#1
5042
#2
92#3
9228#1
2892
20#1
202892
6#1
6202892
4#1
46292892
8#1
810202892
#2
18202892
#5
382892
#4
6692
#36692
1#1
Results
13 years after its announcement by Tim Peters, we obtained the firstproof of its worst-case running time:
Theorem [2, 3]: TimSort runs in O(n log n) time.
By design TimSort is well suited for partially sorted data. It fallsinto the adaptive sort family. Taking the number of runs ρ as a(natural) parameter for a refined analysis we obtained:
Theorem [3]: TimSort runs in O(n + n log ρ) time.
Going further, we introduce the notion of entropy H to take the vari-ations in run lengths into account, defined by:
H = −∑ ri
nlog2
(rin
)Theorem [3]: TimSort runs in O(n + nH) time.
Our in-depth analysis of the algorithm enabled us to produce a bugin Java sorting! It was fixed within a few weeks after we reported it.
References
[1] S. de Gouw, J. Rot, F. de Boer, R. Bubel, and R. Hähnle. OpenJDK’s Java.utils.Collection.sort() is broken: The good, the bad and the worst case. In CAV, 2015.[2] N. Auger, C. Nicaud, and C. Pivoteau. Merge strategies: From Merge Sort to TimSort. Research Report hal-01212839, hal, 2015.[3] N. Auger, V. Jugé, C. Nicaud, and C. Pivoteau. On the worst-case complexity of TimSort. In ESA, 2018.