Skip to content

Add strict versions of sorting functions in Data.List #364

@pi8027

Description

@pi8027

I'm opening this issue following this suggestion: #363 (comment).

Data.List.sort computes its output incrementally thanks to lazy evaluation. In fact, it is an optimal incremental sorting function, meaning that take k . sort has O(n + k log k) time complexity. However, when one rather wants the entire output immediately, strict (tail-recursive) mergesort is more than twice faster for sufficiently long input.

Code can be found in the tailrec branch of my fork of the sort-perf repo. I obtained the following benchmark result. It corresponds to the commit 9251ca2 in my fork.

All
  List tests
    correctness
      Old (GHC < 9.12.1, 2-way merge):        OK
        +++ OK, passed 100 tests.
      New (GHC >= 9.12.1, 4-way merge):       OK
        +++ OK, passed 100 tests.
      NewStrict (tail-recursive 4-way merge): OK
        +++ OK, passed 100 tests.
    stability
      Old (GHC < 9.12.1, 2-way merge):        OK
        +++ OK, passed 100 tests.
      New (GHC >= 9.12.1, 4-way merge):       OK
        +++ OK, passed 100 tests.
      NewStrict (tail-recursive 4-way merge): OK
        +++ OK, passed 100 tests.
  1 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        82.0 ns ± 5.2 ns, 0.97x
      New (GHC >= 9.12.1, 4-way merge):       OK
        84.3 ns ± 5.5 ns
      NewStrict (tail-recursive 4-way merge): OK
        101  ns ± 5.2 ns, 1.20x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        0 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        0 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        0 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        87.0 ns ± 5.3 ns, 0.92x
      New (GHC >= 9.12.1, 4-way merge):       OK
        94.3 ns ± 7.1 ns
      NewStrict (tail-recursive 4-way merge): OK
        115  ns ± 5.6 ns, 1.21x
  5 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        542  ns ±  45 ns, 1.35x
      New (GHC >= 9.12.1, 4-way merge):       OK
        402  ns ±  22 ns
      NewStrict (tail-recursive 4-way merge): OK
        432  ns ±  25 ns, 1.07x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        10 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        10 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        10 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        328  ns ±  22 ns, 1.21x
      New (GHC >= 9.12.1, 4-way merge):       OK
        271  ns ±  22 ns
      NewStrict (tail-recursive 4-way merge): OK
        406  ns ±  24 ns, 1.50x
  25 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        4.36 μs ± 353 ns, 1.60x
      New (GHC >= 9.12.1, 4-way merge):       OK
        2.72 μs ± 165 ns
      NewStrict (tail-recursive 4-way merge): OK
        2.51 μs ± 189 ns, 0.92x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        96 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        96 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        95 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        1.40 μs ±  92 ns, 1.43x
      New (GHC >= 9.12.1, 4-way merge):       OK
        983  ns ±  88 ns
      NewStrict (tail-recursive 4-way merge): OK
        2.31 μs ± 219 ns, 2.35x
  100 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        28.7 μs ± 2.8 μs, 1.56x
      New (GHC >= 9.12.1, 4-way merge):       OK
        18.4 μs ± 1.4 μs
      NewStrict (tail-recursive 4-way merge): OK
        14.6 μs ± 740 ns, 0.79x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        601 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        604 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        599 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        5.85 μs ± 359 ns, 1.34x
      New (GHC >= 9.12.1, 4-way merge):       OK
        4.37 μs ± 342 ns
      NewStrict (tail-recursive 4-way merge): OK
        13.7 μs ± 684 ns, 3.14x
  1000 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        504  μs ±  44 μs, 1.39x
      New (GHC >= 9.12.1, 4-way merge):       OK
        364  μs ±  23 μs
      NewStrict (tail-recursive 4-way merge): OK
        262  μs ±  20 μs, 0.72x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        9216 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        9183 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        9131 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        61.6 μs ± 5.4 μs, 1.30x
      New (GHC >= 9.12.1, 4-way merge):       OK
        47.3 μs ± 2.7 μs
      NewStrict (tail-recursive 4-way merge): OK
        246  μs ±  23 μs, 5.22x
  10000 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        10.1 ms ± 709 μs, 1.79x
      New (GHC >= 9.12.1, 4-way merge):       OK
        5.63 ms ± 345 μs
      NewStrict (tail-recursive 4-way merge): OK
        3.38 ms ± 199 μs, 0.60x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        132567 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        125353 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        124153 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        690  μs ±  44 μs, 1.34x
      New (GHC >= 9.12.1, 4-way merge):       OK
        517  μs ±  48 μs
      NewStrict (tail-recursive 4-way merge): OK
        3.30 ms ± 198 μs, 6.39x
  100000 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        281  ms ±  10 ms, 1.87x
      New (GHC >= 9.12.1, 4-way merge):       OK
        150  ms ±  14 ms
      NewStrict (tail-recursive 4-way merge): OK
        48.4 ms ± 2.7 ms, 0.32x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        1612216 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        1600983 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        1580830 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        16.2 ms ± 868 μs, 1.42x
      New (GHC >= 9.12.1, 4-way merge):       OK
        11.4 ms ± 702 μs
      NewStrict (tail-recursive 4-way merge): OK
        53.6 ms ± 2.9 ms, 4.71x
  1000000 Elements
    sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        4.348 s ±  16 ms, 1.64x
      New (GHC >= 9.12.1, 4-way merge):       OK
        2.646 s ±  84 ms
      NewStrict (tail-recursive 4-way merge): OK
        1.083 s ±  15 ms, 0.41x
    comparisons
      Old (GHC < 9.12.1, 2-way merge):        OK
        19234811 comparisons
      New (GHC >= 9.12.1, 4-way merge):       OK
        19164388 comparisons
      NewStrict (tail-recursive 4-way merge): OK
        19154090 comparisons
    min by sort
      Old (GHC < 9.12.1, 2-way merge):        OK
        205  ms ± 938 μs, 1.42x
      New (GHC >= 9.12.1, 4-way merge):       OK
        144  ms ±  13 ms
      NewStrict (tail-recursive 4-way merge): OK
        1.084 s ±  58 ms, 7.50x

All 78 tests passed (98.63s)

As you can see, NewStrict is more than twice faster than New (the current implementation) when the input has more than 100000 elements. Obviously, it loses the laziness (see min by sort). So, I propose to add the tail-recursive versions as separate functions, e.g., Data.List.sort'.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions