-
Notifications
You must be signed in to change notification settings - Fork 18
Description
I'm opening this issue following this suggestion: #363 (comment).
Data.List.sort computes its output incrementally thanks to lazy evaluation. In fact, it is an optimal incremental sorting function, meaning that take k . sort has O(n + k log k) time complexity. However, when one rather wants the entire output immediately, strict (tail-recursive) mergesort is more than twice faster for sufficiently long input.
Code can be found in the tailrec branch of my fork of the sort-perf repo. I obtained the following benchmark result. It corresponds to the commit 9251ca2 in my fork.
All
List tests
correctness
Old (GHC < 9.12.1, 2-way merge): OK
+++ OK, passed 100 tests.
New (GHC >= 9.12.1, 4-way merge): OK
+++ OK, passed 100 tests.
NewStrict (tail-recursive 4-way merge): OK
+++ OK, passed 100 tests.
stability
Old (GHC < 9.12.1, 2-way merge): OK
+++ OK, passed 100 tests.
New (GHC >= 9.12.1, 4-way merge): OK
+++ OK, passed 100 tests.
NewStrict (tail-recursive 4-way merge): OK
+++ OK, passed 100 tests.
1 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
82.0 ns ± 5.2 ns, 0.97x
New (GHC >= 9.12.1, 4-way merge): OK
84.3 ns ± 5.5 ns
NewStrict (tail-recursive 4-way merge): OK
101 ns ± 5.2 ns, 1.20x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
0 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
0 comparisons
NewStrict (tail-recursive 4-way merge): OK
0 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
87.0 ns ± 5.3 ns, 0.92x
New (GHC >= 9.12.1, 4-way merge): OK
94.3 ns ± 7.1 ns
NewStrict (tail-recursive 4-way merge): OK
115 ns ± 5.6 ns, 1.21x
5 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
542 ns ± 45 ns, 1.35x
New (GHC >= 9.12.1, 4-way merge): OK
402 ns ± 22 ns
NewStrict (tail-recursive 4-way merge): OK
432 ns ± 25 ns, 1.07x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
10 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
10 comparisons
NewStrict (tail-recursive 4-way merge): OK
10 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
328 ns ± 22 ns, 1.21x
New (GHC >= 9.12.1, 4-way merge): OK
271 ns ± 22 ns
NewStrict (tail-recursive 4-way merge): OK
406 ns ± 24 ns, 1.50x
25 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
4.36 μs ± 353 ns, 1.60x
New (GHC >= 9.12.1, 4-way merge): OK
2.72 μs ± 165 ns
NewStrict (tail-recursive 4-way merge): OK
2.51 μs ± 189 ns, 0.92x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
96 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
96 comparisons
NewStrict (tail-recursive 4-way merge): OK
95 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
1.40 μs ± 92 ns, 1.43x
New (GHC >= 9.12.1, 4-way merge): OK
983 ns ± 88 ns
NewStrict (tail-recursive 4-way merge): OK
2.31 μs ± 219 ns, 2.35x
100 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
28.7 μs ± 2.8 μs, 1.56x
New (GHC >= 9.12.1, 4-way merge): OK
18.4 μs ± 1.4 μs
NewStrict (tail-recursive 4-way merge): OK
14.6 μs ± 740 ns, 0.79x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
601 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
604 comparisons
NewStrict (tail-recursive 4-way merge): OK
599 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
5.85 μs ± 359 ns, 1.34x
New (GHC >= 9.12.1, 4-way merge): OK
4.37 μs ± 342 ns
NewStrict (tail-recursive 4-way merge): OK
13.7 μs ± 684 ns, 3.14x
1000 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
504 μs ± 44 μs, 1.39x
New (GHC >= 9.12.1, 4-way merge): OK
364 μs ± 23 μs
NewStrict (tail-recursive 4-way merge): OK
262 μs ± 20 μs, 0.72x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
9216 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
9183 comparisons
NewStrict (tail-recursive 4-way merge): OK
9131 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
61.6 μs ± 5.4 μs, 1.30x
New (GHC >= 9.12.1, 4-way merge): OK
47.3 μs ± 2.7 μs
NewStrict (tail-recursive 4-way merge): OK
246 μs ± 23 μs, 5.22x
10000 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
10.1 ms ± 709 μs, 1.79x
New (GHC >= 9.12.1, 4-way merge): OK
5.63 ms ± 345 μs
NewStrict (tail-recursive 4-way merge): OK
3.38 ms ± 199 μs, 0.60x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
132567 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
125353 comparisons
NewStrict (tail-recursive 4-way merge): OK
124153 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
690 μs ± 44 μs, 1.34x
New (GHC >= 9.12.1, 4-way merge): OK
517 μs ± 48 μs
NewStrict (tail-recursive 4-way merge): OK
3.30 ms ± 198 μs, 6.39x
100000 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
281 ms ± 10 ms, 1.87x
New (GHC >= 9.12.1, 4-way merge): OK
150 ms ± 14 ms
NewStrict (tail-recursive 4-way merge): OK
48.4 ms ± 2.7 ms, 0.32x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
1612216 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
1600983 comparisons
NewStrict (tail-recursive 4-way merge): OK
1580830 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
16.2 ms ± 868 μs, 1.42x
New (GHC >= 9.12.1, 4-way merge): OK
11.4 ms ± 702 μs
NewStrict (tail-recursive 4-way merge): OK
53.6 ms ± 2.9 ms, 4.71x
1000000 Elements
sort
Old (GHC < 9.12.1, 2-way merge): OK
4.348 s ± 16 ms, 1.64x
New (GHC >= 9.12.1, 4-way merge): OK
2.646 s ± 84 ms
NewStrict (tail-recursive 4-way merge): OK
1.083 s ± 15 ms, 0.41x
comparisons
Old (GHC < 9.12.1, 2-way merge): OK
19234811 comparisons
New (GHC >= 9.12.1, 4-way merge): OK
19164388 comparisons
NewStrict (tail-recursive 4-way merge): OK
19154090 comparisons
min by sort
Old (GHC < 9.12.1, 2-way merge): OK
205 ms ± 938 μs, 1.42x
New (GHC >= 9.12.1, 4-way merge): OK
144 ms ± 13 ms
NewStrict (tail-recursive 4-way merge): OK
1.084 s ± 58 ms, 7.50x
All 78 tests passed (98.63s)
As you can see, NewStrict is more than twice faster than New (the current implementation) when the input has more than 100000 elements. Obviously, it loses the laziness (see min by sort). So, I propose to add the tail-recursive versions as separate functions, e.g., Data.List.sort'.