Skip to content

refactor: cache signature structure in ops.testing state classes#1499

Merged
tonyandrewmeyer merged 4 commits intocanonical:mainfrom
tonyandrewmeyer:cache-scenario-maxargs-calculations
Dec 17, 2024
Merged

refactor: cache signature structure in ops.testing state classes#1499
tonyandrewmeyer merged 4 commits intocanonical:mainfrom
tonyandrewmeyer:cache-scenario-maxargs-calculations

Conversation

@tonyandrewmeyer
Copy link
Collaborator

@tonyandrewmeyer tonyandrewmeyer commented Dec 13, 2024

The Scenario code that limits the number of positional arguments (needed in Python 3.8 since the dataclasses there do not have KW_ONLY) is quite expensive. We have a method of providing this functionality in 3.8, but it is unfortunately expensive, and therefore a regression from older versions of Scenario.

I considered 4 options here:

  • Manually write out the dozen or so __init__ signatures (as we had at one point in the original PR). I'd rather not for the same reasons we made the decision at that time.
  • Figure out a way to have a syntax that allows using regular dataclasses.KW_ONLY with modern Python and only use our custom code on older Python. I spent some time looking into this, and couldn't figure out a way to do it where the code still seemed readable and maintainable enough.
  • Cache the most expensive work (what's in this PR).
  • Don't bother doing anything now, and eagerly wait for the ability to drop 3.8 support. A roughly 5% performance regression felt worth fixing as long as the change is fairly straightforward (although this change only gets about half of that back).

The most expensive parts of the __new__ method are the ones that work with inspect: inspect.signature in particular, but also getting the default value of the parameters. If we assume that no-one is run-time altering the signature of the class (I believe no-one should be) then the values of these never actually change, but we are currently calculating them every time an instance of the class is created. This PR changes that to cache those three values the first time they are needed.

There's one additional performance tweak in the branch that doesn't make a significant difference, but is trivial to do: when checking if the YAML files exist, skip the filesystem exists() call if we just created an empty temporary directory a few lines above, since we know that it will never exist.

A drive-by: I happened to notice while working on this branch Harness referring to options.yaml, which does not exist (any more?) as far as I know, so a docs tweak to address that.

Timing (best of 3):

Suite main branch
operator unit tests (no concurrency) 165.25s 161.89s
traefik scenario tests 45.49 44.30s
kafka-k8s-operator scenario tests 4.48s 4.38s

Refs #1434

@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review December 13, 2024 07:31
Copy link
Contributor

@james-garner-canonical james-garner-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, clever and simple solution, and a nice performance win.

I was briefly confused when tracing the new vs old logic, because required_args in __new__ stores required (as args -- not kwargs), while cls._init_required_args stores required (as args or kwargs). Idk if there's a clearer short name for _init_required_args though, and it seems easy enough to follow when not trying to go side by side (so fine for future readers).

Copy link
Contributor

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level: the time saved is marginal, but then code is rather straightforward and I'm not sure what would the alternative be, maybe custom types where the inspected bits are inlined or hardcoded?

Anyway +1 on this.

).parameters
cls._init_kw_only = {
name
for name in tuple(parameters)[cls._max_positional_args :]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this was auto-formatted by ruff, the dangling :] is kinda funny :]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was, yeah - I lazily just wrote it out on one line and let tox -e fmt clean up.

Does the trailing happy face bother you? I could do something like move tuple(parameters) out to a separate line to get the dict comprehension to be on a single line, if that would be better.

@tonyandrewmeyer tonyandrewmeyer merged commit eb80926 into canonical:main Dec 17, 2024
31 checks passed
@tonyandrewmeyer tonyandrewmeyer deleted the cache-scenario-maxargs-calculations branch December 17, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants