Conversation

@omus omus commented Sep 17, 2020

Fixes #289. While looking into ways to avoid having `interpret` create a vector, I realized that improvements to `transition_range` could let us avoid performing checks later in `interpret`. Doing this made `transition_range` slightly slower but allowed us to use a generator in `interpret`, which avoided allocations. I then found myself wanting to index into a generator to avoid constructing unnecessary `ZonedDateTime`s, which led me to create the `IndexableGenerator` (I'll try to make a Base PR for this). Finally, I noticed one more allocation in the `ZonedDateTime(dt::DateTime, tz::VariableTimeZone; from_utc::Bool)` constructor, which I addressed by introducing a function barrier.

The end result brings us down to a single allocation for the `ZonedDateTime`, and zero allocations when using the isbits-type PR.
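As a rough sketch of the `IndexableGenerator` idea mentioned above (the type name comes from the PR, but this implementation is a simplified illustration, not the package's actual code): a lazy, map-like generator that also supports `getindex`, so a single element can be computed without materializing the rest.

```julia
# Hypothetical, simplified sketch of an indexable generator: applies `f`
# lazily over an indexable collection, supporting both iteration and
# random access.
struct IndexableGenerator{F,I}
    f::F
    itr::I
end

Base.length(g::IndexableGenerator) = length(g.itr)
Base.firstindex(g::IndexableGenerator) = firstindex(g.itr)
Base.lastindex(g::IndexableGenerator) = lastindex(g.itr)

# Random access: compute only the requested element.
Base.getindex(g::IndexableGenerator, i::Integer) = g.f(g.itr[i])

# Iteration: walk the underlying indices, computing elements on demand.
function Base.iterate(g::IndexableGenerator, i=firstindex(g))
    i > lastindex(g) ? nothing : (g[i], i + 1)
end

g = IndexableGenerator(x -> x^2, 1:5)
g[3]        # 9 — computes only the third element
collect(g)  # [1, 4, 9, 16, 25]
```

This is what lets `interpret` inspect a single candidate without constructing every `ZonedDateTime` in the range.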

Benchmarks on Julia 1.5.1:

PR:

julia> using TimeZones, BenchmarkTools

julia> @benchmark TimeZones.transition_range($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), $Local)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     22.847 ns (0.00% GC)
  median time:      22.910 ns (0.00% GC)
  mean time:        30.983 ns (0.00% GC)
  maximum time:     1.253 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> @benchmark TimeZones.interpret($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), $Local)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     24.096 ns (0.00% GC)
  median time:      24.993 ns (0.00% GC)
  mean time:        35.547 ns (0.00% GC)
  maximum time:     1.875 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

julia> @benchmark ZonedDateTime($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), 2)
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     38.280 ns (0.00% GC)
  median time:      38.915 ns (0.00% GC)
  mean time:        50.162 ns (2.74% GC)
  maximum time:     2.441 μs (96.29% GC)
  --------------
  samples:          10000
  evals/sample:     992

julia> @benchmark ZonedDateTime($(DateTime(2015,6,1)), $(tz"America/Winnipeg"), from_utc=true)
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     42.385 ns (0.00% GC)
  median time:      43.016 ns (0.00% GC)
  mean time:        53.853 ns (1.98% GC)
  maximum time:     2.580 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

Current master (48fe988):

julia> using TimeZones, BenchmarkTools

julia> @benchmark TimeZones.transition_range($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), $Local)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.793 ns (0.00% GC)
  median time:      15.833 ns (0.00% GC)
  mean time:        19.110 ns (0.00% GC)
  maximum time:     926.346 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark TimeZones.interpret($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), $Local)
BenchmarkTools.Trial:
  memory estimate:  352 bytes
  allocs estimate:  4
  --------------
  minimum time:     98.732 ns (0.00% GC)
  median time:      101.708 ns (0.00% GC)
  mean time:        134.360 ns (9.06% GC)
  maximum time:     2.939 μs (94.61% GC)
  --------------
  samples:          10000
  evals/sample:     944

julia> @benchmark ZonedDateTime($(DateTime(2015,11,1,1)), $(tz"America/Winnipeg"), 2)
BenchmarkTools.Trial:
  memory estimate:  352 bytes
  allocs estimate:  4
  --------------
  minimum time:     104.684 ns (0.00% GC)
  median time:      107.021 ns (0.00% GC)
  mean time:        139.278 ns (8.58% GC)
  maximum time:     3.300 μs (94.68% GC)
  --------------
  samples:          10000
  evals/sample:     919

julia> @benchmark ZonedDateTime($(DateTime(2015,6,1)), $(tz"America/Winnipeg"), from_utc=true)
BenchmarkTools.Trial:
  memory estimate:  176 bytes
  allocs estimate:  2
  --------------
  minimum time:     55.916 ns (0.00% GC)
  median time:      57.818 ns (0.00% GC)
  mean time:        76.190 ns (5.72% GC)
  maximum time:     2.519 μs (96.18% GC)
  --------------
  samples:          10000
  evals/sample:     983

codecov-commenter commented Sep 17, 2020

Codecov Report

Merging #291 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
- Coverage   93.53%   93.51%   -0.02%     
==========================================
  Files          31       32       +1     
  Lines        1531     1527       -4     
==========================================
- Hits         1432     1428       -4     
  Misses         99       99              
| Impacted Files | Coverage Δ |
|---|---|
| src/TimeZones.jl | 100.00% <ø> (ø) |
| src/indexable_generator.jl | 100.00% <100.00%> (ø) |
| src/interpret.jl | 100.00% <100.00%> (ø) |
| src/types/variabletimezone.jl | 100.00% <100.00%> (ø) |
| src/types/zoneddatetime.jl | 94.66% <100.00%> (+0.14%) ⬆️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 60ab3e9...e07eb66.

omus commented Sep 17, 2020

Increased code coverage. Also removed imports that are no longer needed.

omus commented Sep 18, 2020

I'll apply my fixups.

@omus omus force-pushed the cv/transition-range-refactor branch from 7257d9d to f423b4f on September 18, 2020 14:09
omus commented Sep 18, 2020

I'm performing some final validation (I definitely need to work on formalizing benchmarks for this package next) and have discovered a regression.

On Julia 1.5.1:

Current PR:

julia> @benchmark collect($(ZonedDateTime(2020, tz"America/Winnipeg"):Hour(1):ZonedDateTime(2021, tz"America/Winnipeg")))
BenchmarkTools.Trial:
  memory estimate:  1.27 MiB
  allocs estimate:  26381
  --------------
  minimum time:     1.054 ms (0.00% GC)
  median time:      1.361 ms (0.00% GC)
  mean time:        1.574 ms (5.58% GC)
  maximum time:     9.904 ms (0.00% GC)
  --------------
  samples:          3175
  evals/sample:     1

Current master (48fe988):

julia> @benchmark collect($(ZonedDateTime(2020, tz"America/Winnipeg"):Hour(1):ZonedDateTime(2021, tz"America/Winnipeg")))
BenchmarkTools.Trial:
  memory estimate:  1.81 MiB
  allocs estimate:  17588
  --------------
  minimum time:     787.054 μs (0.00% GC)
  median time:      1.169 ms (0.00% GC)
  mean time:        1.978 ms (6.01% GC)
  maximum time:     107.202 ms (0.00% GC)
  --------------
  samples:          2526
  evals/sample:     1

The increase in allocations is especially interesting. I'll try to get to the bottom of this before proceeding.

omus commented Sep 18, 2020

I've managed to address the regression and improve things further. It turns out the introduction of a function barrier when using a `FixedTimeZone` caused the slowdown and the increased allocations:

Current PR (1ed4d09):

julia> @benchmark collect($(ZonedDateTime(2020, tz"America/Winnipeg"):Hour(1):ZonedDateTime(2021, tz"America/Winnipeg")))
BenchmarkTools.Trial:
  memory estimate:  755.44 KiB
  allocs estimate:  8795
  --------------
  minimum time:     444.120 μs (0.00% GC)
  median time:      683.956 μs (0.00% GC)
  mean time:        859.080 μs (6.40% GC)
  maximum time:     15.121 ms (0.00% GC)
  --------------
  samples:          5815
  evals/sample:     1

Current master (48fe988):

julia> @benchmark collect($(ZonedDateTime(2020, tz"America/Winnipeg"):Hour(1):ZonedDateTime(2021, tz"America/Winnipeg")))
BenchmarkTools.Trial:
  memory estimate:  1.81 MiB
  allocs estimate:  17588
  --------------
  minimum time:     788.836 ÎĽs (0.00% GC)
  median time:      998.876 ÎĽs (0.00% GC)
  mean time:        1.210 ms (5.79% GC)
  maximum time:     25.093 ms (0.00% GC)
  --------------
  samples:          4128
  evals/sample:     1

The benchmarks initially posted in the description are still accurate.
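For background, the function-barrier pattern discussed above can be sketched as follows. The types and functions here are invented for illustration; they are not the package's actual code. The idea is that when a value's concrete type is unknown at a call site (e.g. a time zone that may be fixed or variable), moving the work behind an inner function call lets Julia compile a specialized method per concrete type.

```julia
# Hypothetical stand-ins for fixed/variable time zones.
abstract type AbstractZone end
struct FixedZone <: AbstractZone
    offset::Int
end
struct VariableZone <: AbstractZone
    offsets::Vector{Int}
end

# Function barrier: `_offset` dispatches to a method compiled for the
# concrete zone type, keeping dynamic dispatch out of the inner work.
_offset(z::FixedZone) = z.offset
_offset(z::VariableZone) = sum(z.offsets)

# Outer function: elements of `zones` are abstractly typed here, so each
# `_offset(z)` call is the barrier behind which code specializes.
total_offset(zones::Vector{AbstractZone}) = sum(_offset(z) for z in zones)

zones = AbstractZone[FixedZone(3), VariableZone([1, 2])]
total_offset(zones)  # 6
```

As the regression above showed, the barrier is a trade-off: when the concrete type is already known (the `FixedTimeZone` path), the extra call can hurt rather than help.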

omus commented Sep 19, 2020

Will merge on Monday. I want to get the benchmark suite merged in first.

@omus omus mentioned this pull request Sep 23, 2020
@omus
Copy link
Member Author

omus commented Sep 23, 2020

Due to unrelated time constraints I didn't get to this on Monday. The benchmark suite has now been added in #292, and I'll rebase these changes after I merge #293.

The previous implementation could return indexes which were not
applicable for the local datetime provided. This would require callers
of this function to perform additional checks to ensure correct answers
were provided.
By setting `start` to `length(tz.transitions)` when
`finish >= length(tz.transitions)` we avoid any possible issues with
`searchsortedlast` creating a `finish` value `> length(tz.transitions)`
which would cause Julia to crash.
Adding the function barrier with FixedTimeZone ended up increasing
allocations for this benchmark:

```julia
@benchmark collect($(ZonedDateTime(2020, tz"America/Winnipeg"):Hour(1):ZonedDateTime(2021, tz"America/Winnipeg")))
```
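The `searchsortedlast` behavior the commit message above guards against can be seen directly in the REPL. This snippet just demonstrates the Base function's semantics on a stand-in array; it is not the package's `transition_range` code.

```julia
# `searchsortedlast(v, x)` returns the index of the last element of the
# sorted array `v` that is <= x. Its results at the boundaries are why
# indices must be clamped before being used to index `tz.transitions`.
v = [10, 20, 30]  # stand-in for sorted transition instants

searchsortedlast(v, 25)  # 2 — last element <= 25 is v[2]
searchsortedlast(v, 99)  # 3 — past the end, returns length(v)
searchsortedlast(v, 5)   # 0 — before the start; invalid as an index
```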
@omus omus force-pushed the cv/transition-range-refactor branch from 1ed4d09 to e07eb66 on September 23, 2020 18:04
omus commented Sep 23, 2020

Rebased to be able to use the package benchmarks. Here are the results from running them locally:

Benchmark Report for /Users/omus/.julia/dev/TimeZones

Job Properties

  • Time of benchmarks:
    • Target: 23 Sep 2020 - 13:02
    • Baseline: 23 Sep 2020 - 13:03
  • Package commits:
    • Target: e07eb6
    • Baseline: 60ab3e
  • Julia commits:
    • Target: 697e78
    • Baseline: 697e78
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with âś…). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["ZonedDateTime", "date-period-range"] | 0.79 (5%) ✅ | 0.25 (1%) ✅ |
| ["ZonedDateTime", "local", "ambiguous"] | 0.38 (5%) ✅ | 0.14 (1%) ✅ |
| ["ZonedDateTime", "local", "standard"] | 0.76 (5%) ✅ | 0.16 (1%) ✅ |
| ["ZonedDateTime", "time-period-range"] | 0.59 (5%) ✅ | 0.41 (1%) ✅ |
| ["ZonedDateTime", "utc"] | 0.83 (5%) ✅ | 0.27 (1%) ✅ |
| ["arithmetic", "DatePeriod"] | 0.79 (5%) ✅ | 0.16 (1%) ✅ |
| ["arithmetic", "TimePeriod"] | 0.58 (5%) ✅ | 0.27 (1%) ✅ |
| ["interpret", "local", "ambiguous"] | 0.25 (5%) ✅ | 0.00 (1%) ✅ |
| ["interpret", "local", "non-existent"] | 0.47 (5%) ✅ | 0.00 (1%) ✅ |
| ["interpret", "local", "standard"] | 0.28 (5%) ✅ | 0.00 (1%) ✅ |
| ["interpret", "utc"] | 0.23 (5%) ✅ | 0.00 (1%) ✅ |
| ["parse", "issue#25"] | 1.03 (5%) | 0.75 (1%) ✅ |
| ["transition_range", "local", "ambiguous"] | 1.52 (5%) ❌ | 1.00 (1%) |
| ["transition_range", "local", "non-existent"] | 1.35 (5%) ❌ | 1.00 (1%) |
| ["transition_range", "local", "standard"] | 1.44 (5%) ❌ | 1.00 (1%) |
| ["transition_range", "utc"] | 0.87 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["ZonedDateTime"]
  • ["ZonedDateTime", "local"]
  • ["arithmetic"]
  • ["interpret", "local"]
  • ["interpret"]
  • ["parse"]
  • ["transition_range", "local"]
  • ["transition_range"]
  • ["tryparsenext_fixedtz"]
  • ["tryparsenext_tz"]
  • ["tz_data"]

Julia versioninfo

Target

Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Thu Jun 18 20:50:10 PDT 2020; root:xnu-4903.278.43~1/RELEASE_X86_64 x86_64 i386
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz:
              speed         user         nice          sys         idle          irq
       #1  3500 MHz     101488 s          0 s      91551 s     655122 s          0 s
       #2  3500 MHz      60850 s          0 s      33044 s     754262 s          0 s
       #3  3500 MHz     103518 s          0 s      63707 s     680931 s          0 s
       #4  3500 MHz      55042 s          0 s      27664 s     765450 s          0 s

  Memory: 16.0 GB (53.5 MB free)
  Uptime: 84816.0 sec
  Load Avg:  3.88671875  3.37353515625  3.076171875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

Baseline

Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Thu Jun 18 20:50:10 PDT 2020; root:xnu-4903.278.43~1/RELEASE_X86_64 x86_64 i386
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz:
              speed         user         nice          sys         idle          irq
       #1  3500 MHz     101916 s          0 s      91652 s     655409 s          0 s
       #2  3500 MHz      61195 s          0 s      33104 s     754674 s          0 s
       #3  3500 MHz     103981 s          0 s      63800 s     681191 s          0 s
       #4  3500 MHz      55372 s          0 s      27721 s     765880 s          0 s

  Memory: 16.0 GB (115.89453125 MB free)
  Uptime: 84898.0 sec
  Load Avg:  9.79638671875  5.681640625  3.99462890625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

omus commented Sep 23, 2020

The slowdown in the transition_range tests is expected and is one of the main reasons we see massive improvements elsewhere. Just waiting on CI before merging.

omus commented Sep 23, 2020

Expected failure on macOS Julia nightly.

@omus omus merged commit 42bc067 into master Sep 23, 2020
@omus omus deleted the cv/transition-range-refactor branch September 23, 2020 18:43
@omus omus added the performance Changes that impact code performance label Sep 23, 2020
iamed2 commented Sep 23, 2020

Is it worth even benchmarking transition_range since it's not ever called by users? Could we benchmark functions that call it instead?

omus commented Sep 24, 2020

The function is still indirectly called by users. Having it benchmarked just makes it easier to track down sources of regressions.
