Conversation

@cakedev0
Contributor

@cakedev0 cakedev0 commented Oct 21, 2025

Resolves #340

Specs:

  • no weights: ND-support, propagate NaNs, supported methods: "linear", "inverted_cdf", "averaged_inverted_cdf"
    • adding support for omitting NaNs shouldn't be too hard (if you assume NaNs are sorted at the end, which is what everyone does), but I don't think it's needed for now.
  • with weights: Only 1D or 2D support, propagate or omit NaNs, supported methods: "inverted_cdf", "averaged_inverted_cdf"
  • Delegates to all major supported backends when possible (numpy, cupy, torch and jax), except dask, as it was raising errors. (A usage sketch follows this list.)
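To make the specs concrete, here is a minimal usage sketch. The parameter names (weights, method, nan_policy, axis) are assumed from the description above and may differ from the final merged signature:

import numpy as np
import array_api_extra as xpx  # function under review; signature assumed from the specs above

a = np.asarray([[0.0, 1.0, 2.0, np.nan],
                [3.0, 4.0, 5.0, 6.0]])
w = np.asarray([[1.0, 2.0, 1.0, 1.0],
                [1.0, 1.0, 3.0, 1.0]])

# No weights: ND support, NaNs propagate.
xpx.quantile(a, 0.5, axis=1, method="linear")  # -> [nan, 4.5]

# With weights: only the two inverted-CDF methods; NaNs may be omitted.
xpx.quantile(a, 0.5, axis=1, weights=w,
             method="averaged_inverted_cdf", nan_policy="omit")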

Comparison with NumPy:

These specs match NumPy's quantile for nan_policy="propagate" and nanquantile for nan_policy="omit". The differences are:

  • using nan_policy="propagate"|"omit" (inspired by scipy) instead of two functions like numpy (quantile for nan_policy="propagate" and nanquantile for nan_policy="omit"). Why? Less code/doc duplication, and clearer behavior IMO.
  • weighted case:
    • support for "averaged_inverted_cdf" method (only "inverted_cdf" method is supported by numpy): we need this in scikit-learn.
    • only up to 2D: we don't need more for now.
  • non-weighted case:
    • fewer methods. Why? We don't need more: I've found only one call in scikit-learn that passes a method, and it's a deprecated one.
    • no support for nan_policy="omit": there are a few calls to nanpercentile in scikit-learn. Would be easy to implement, following what has been done in scipy.
    • implementation: we sort instead of relying on partitioning like numpy does. A partitioning-based implementation would be quite a bit more complex and would not benefit performance in most cases, as xpx.partition currently falls back to sorting when not delegating. This could be worked on later if needed; a sketch of the sort-based approach follows this list.
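For reference, a minimal sketch of the sort-based "linear" method (Hyndman & Fan's definition 7) for a scalar q, ignoring NaN handling and broadcasting; this is illustrative, not the PR's actual code:

import numpy as np

def linear_quantile_sorted(a, q, axis=-1):
    # Sort the whole axis; a partition would only need the two neighbours
    # of the virtual index, but xpx.partition sorts anyway when not delegating.
    a = np.sort(a, axis=axis)
    n = a.shape[axis]
    h = q * (n - 1)              # virtual (fractional) index
    lo = int(np.floor(h))
    hi = int(np.ceil(h))
    lo_val = np.take(a, lo, axis=axis)
    hi_val = np.take(a, hi, axis=axis)
    return lo_val + (h - lo) * (hi_val - lo_val)

linear_quantile_sorted(np.array([3.0, 4.0, 5.0, 6.0]), 0.5)  # -> 4.5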

Comparison with SciPy:

The main difference is the broadcasting/reduction behavior. We aligned with NumPy on this one.

Also, SciPy doesn't support weights.

Implementation for the non-weighted case is more or less copy-pasted from SciPy, except that I've only kept support for 3 methods.

Future use of xpx.quantile in scikit-learn

These specs should be enough to replace sklearn.utils.stats._weighted_percentile and most uses of np.quantile/np.percentile when rewriting numpy-based functions to support the Array API standard.

Note that the implementation for the weighted case differs a bit from sklearn's _weighted_percentile: it's mostly the same approach (sort the weights by the order of a and compute a cumulative sum), but they differ in the way edge cases are handled (null weights mostly). I believe my implementation to be easier to read and to get right, and equivalent in terms of performance (dominated by argsort for big inputs).
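For illustration, a 1D sketch of that weighted "inverted_cdf" approach (no NaN or zero-weight handling; not the PR's actual code):

import numpy as np

def weighted_inverted_cdf(a, weights, q):
    order = np.argsort(a)                 # dominant cost for big inputs
    a, weights = a[order], weights[order]
    cw = np.cumsum(weights)               # steps of the weighted empirical CDF
    # Smallest value whose cumulative weight reaches q * total weight.
    idx = np.searchsorted(cw, q * cw[-1], side="left")
    return a[idx]

weighted_inverted_cdf(np.array([3., 4., 5., 6.]),
                      np.array([1., 1., 3., 1.]), 0.5)  # -> 5.0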


Still TODO:

  • decide whether to support dask. I don't think we should: it's very slow and doesn't match the purpose of dask as mentioned here. Supporting dask would also slow down the CI quite a bit.
  • read scikit-learn's tests for _weighted_percentile to see if there is something I missed in my tests

@cakedev0 cakedev0 marked this pull request as draft October 21, 2025 10:05
@lucascolley lucascolley changed the title from "FEA: Add quantile function - method="linear", no weights" to "ENH: add quantile function - method="linear", no weights" Oct 21, 2025
@lucascolley lucascolley added the "enhancement (New feature or request)" and "new function" labels Oct 21, 2025
@lucascolley lucascolley mentioned this pull request Oct 21, 2025
@lucyleeow
Copy link

From scipy/scipy#23832

There is a slight difference between NumPy gufunc and scipy.stats rules that has to do with prepending 1s to the shapes when the arguments have different dimensionality. Specifically, in scipy.stats, axis identifies the core dimension after 1s are prepended to match the dimensionalities.

I am assuming all delegate packages do what numpy does, in terms of broadcasting? If that is the case we should probably follow unless we think scipy has better rules?

@cakedev0 cakedev0 changed the title from "ENH: add quantile function - method="linear", no weights" to "ENH: add quantile function with weights support" Oct 23, 2025
@cakedev0
Copy link
Contributor Author

cakedev0 commented Oct 23, 2025

This PR is now ~95% finished, so marking it ready for review. I've updated the PR description accordingly.

@cakedev0 cakedev0 marked this pull request as ready for review October 23, 2025 15:19
    raise ValueError(msg)
else:
    if method not in {"inverted_cdf", "averaged_inverted_cdf"}:
        msg = f"`method` '{method}' not supported with weights."


only methods x/y support weights?


methods = {"linear", "inverted_cdf", "averaged_inverted_cdf"}
if method not in methods:
    msg = f"`method` must be one of {methods}"


Sort methods to get a deterministic output?

Contributor Author


I think it's deterministic already. But do you mean declaring methods in the sorted order?

Like this: methods = {"averaged_inverted_cdf", "inverted_cdf", "linear"}


I thought this was not deterministic - but maybe this was in older python versions or between different OS, or maybe I just misremembered? In any case - sorry about the noise.

    raise ValueError(msg)
nan_policies = {"propagate", "omit"}
if nan_policy not in nan_policies:
    msg = f"`nan_policy` must be one of {nan_policies}"


ditto

@lucascolley
Member

@cakedev0 would you be able to fill out the PR description a little more to justify the API decisions, primarily explaining how the implementation compares to what is available in NumPy and SciPy respectively?

Is there a link to a PR which would use this in sklearn, or are we not at that stage yet?

@cakedev0
Contributor Author

Added some comparisons with numpy/scipy in the PR description.

Is there a link to a PR which would use this in sklearn, or are we not at that stage yet?

We are not at this stage yet, but I would be happy to open an "experiment PR" where I replace most calls to np.percentile/np.quantile/_weighted_percentile with calls to xpx.quantile to check that the tests pass. This would help a lot with building confidence in this PR (in terms of API decisions and implementation correctness).

@lucascolley
Member

Thanks, yeah, that sounds like a good idea.

@cakedev0
Contributor Author

Updates from working on the "experiment PR" (you can see the diff here: cakedev0/scikit-learn#3):

  • all tests pass 🎉 (including tests for _weighted_percentile)
  • supporting nan_policy="omit" is important/needed. It unblocks 3 preprocessors: QuantileTransformer, RobustScaler, SplineTransformer, which is more or less half of the scikit-learn items needing an xpx.quantile on the list Make more of the "tools" of scikit-learn Array API compatible scikit-learn/scikit-learn#26024 - so I'll work on adding it.
  • supporting more methods would be a bit useful for QuantileTransformer (but I think it's fine not to support all the methods for all the backends...). Question: do we need to raise an error for a method that numpy implements but xpx doesn't, when the backend is numpy?
  • it would be convenient to have a "default" method that resolves to "averaged_inverted_cdf" in the weighted case and "linear" in the non-weighted case. Unsure whether this is good practice in terms of interface, though?

@cakedev0
Contributor Author

(putting this in draft mode while I'm working on supporting nan_policy="omit" for the non-weighted case)

@cakedev0 cakedev0 marked this pull request as draft October 27, 2025 19:43
@lucyleeow

would you be able to fill out the PR description a little more to justify API decisions,

FYI @cakedev0, there is currently a discussion about nan API in numpy: https://mail.python.org/archives/list/[email protected]/thread/P4EEL5P2PNDASWLI24CPHOW2KG2LZ3YY/

supporting more methods would be a bit useful for QuantileTransformer

Can you expand?

@cakedev0
Contributor Author

cakedev0 commented Oct 28, 2025

supporting more methods would be a bit useful for QuantileTransformer

Sorry, not the QuantileTransformer but the KBinsDiscretizer, it supports various quantile methods: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/preprocessing/_discretization.py#L59-L66

(but I guess it's fine not to support all the methods in the array API version)

@mdhaber
Contributor

mdhaber commented Nov 4, 2025

Perhaps I can't be entirely objective, but I think there's also an objective argument against a split between xpx.quantile and stats.quantile.

It sounds like the pain points are the broadcasting behavior and weights. Let's see if I can address those.

Main difference is the broadcasting/reduction behavior. We aligned on numpy for this one.

NumPy is self-inconsistent on this one. NEP 5 specifies what generalized universal functions should look like. I don't know the history - maybe quantile/percentile existed first - but they could (in principle) follow the NEP and don't. That behavior makes it impossible to take quantiles at different probabilities for different slices, which has real-world uses (e.g. BCa bootstrap). The behavior of stats.quantile permits that and permits the Cartesian product calculation.

Now, this is not to say that either of the NumPy precedents should be the final answer for the array API or array-api-extra. I think we should do what's right for the ecosystem. Consistency with other array API functions (does anything else "broadcast" by taking a Cartesian product? Imagine if this were the standard for other array operations like addition!) and functionality (do we really want to preclude the possibility of different probabilities corresponding with each slice?) needs to be considered. If sklearn only needs the Cartesian product behavior and doesn't like the standard broadcasting rules, a lightweight wrapper can easily translate.
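To make the two behaviors concrete, a small shape comparison (assuming scipy >= 1.15 for stats.quantile, and the broadcasting rules as described above):

import numpy as np
from scipy import stats  # stats.quantile needs scipy >= 1.15

x = np.random.default_rng(0).random((3, 10))  # 3 slices of 10 samples
p = np.array([0.25, 0.5, 0.75])

# NumPy: Cartesian product - every probability for every slice.
np.quantile(x, p, axis=-1).shape                    # (3, 3)

# SciPy: p broadcasts against x, so each slice can get its own probability.
stats.quantile(x, p[:, np.newaxis], axis=-1).shape  # (3,)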

Also, SciPy doesn't support weights.

weights were requested in scipy/scipy#22894. I was willing to help with it, but there was no response to scipy/scipy#22794 (comment). A PR for weights appeared, but it didn't address the concerns in scipy/scipy#22894. I didn't say anything to block it, but apparently nobody else is interested, so neither has progressed. weights are not very hard to implement; it's the theoretical/interface question that is tricky.

If I can get some feedback about scipy/scipy#22644 - which is about quantile and predates the weights request - and get that merged, I would be more proactive about adding weights support to SciPy's quantile. That PR actually has a similar interface question about how to accept method-specific options, which we might have if weights were not supported by the continuous methods.

@lorentzenchr

After getting weights into numpy quantile/percentile, I was too exhausted to continue pushing it elsewhere. At least now there is one reference, numpy. I would love to see it adopted elsewhere. Quantiles are just very important in many fields.

@mdhaber
Contributor

mdhaber commented Nov 5, 2025

I drafted it for stats.quantile last night, and I studied existing uses of weights in scipy.stats. I'll submit a PR today or tomorrow.

The same code works for all H&F methods. I was a little concerned about non-integer weights before, but I don't think there's a real problem. For integer weights, it does the same thing as repeated samples, so we can document that these are frequency weights1. It's just that for most methods, the interpolation actually does depend on the total amount of data / total weight.

import numpy as np
import matplotlib.pyplot as plt

# With method='linear' the interpolation depends on the sample size, so
# repeating every sample (i.e. a frequency weight of 2 or 3) changes the curve.
p = np.linspace(0, 1, 300)
res = np.quantile([0, 1], p, method='linear')
res2 = np.quantile([0, 0, 1, 1], p, method='linear')
res3 = np.quantile([0, 0, 0, 1, 1, 1], p, method='linear')
plt.plot(p, res, p, res2, p, res3)
plt.show()

[figure: the three quantile curves; the interpolation ramp steepens as the samples are repeated]

Footnotes

  1. If they can be interpreted as some other kind of weight, too, great. But almost all uses of weights in stats turn out to be frequency weights. ↩

@mdhaber
Contributor

mdhaber commented Nov 6, 2025

PR is up at scipy/scipy#23941. Here's a wrapper that hamstrings stats.quantile to get NumPy's Cartesian product behavior:

import numpy as np
from scipy import stats

def quantile_cartesian_product(x, p, method='linear', axis=0):
    p_shape = p.shape
    p = np.ravel(p)
    x = np.moveaxis(x, axis, -1)
    res = stats.quantile(x, p, method=method, axis=-1, keepdims=True)
    if len(p_shape):
        res = np.moveaxis(res, -1, 0)
        res = np.reshape(res, p_shape + res.shape[1:])
    else:
        res = np.squeeze(res, axis=-1)
    return res

# test
rng = np.random.default_rng(234892398458345982)
x = rng.random((6, 7, 8))

for axis in [-1, 0, 1]:
    for p_shape in [(), (1,), (2,), (2, 3)]:
        p = rng.random(p_shape)
        kwargs = dict(axis=axis, method='weibull')  # use the loop's axis
        res = quantile_cartesian_product(x, p, **kwargs)
        ref = np.quantile(x, p, **kwargs)
        np.testing.assert_allclose(res, ref)

@lucascolley
Member

where does this stand now that scipy/scipy#23941 has merged?

@cakedev0
Contributor Author

cakedev0 commented Dec 8, 2025

Personally, I don't really agree with some decisions that were made in scipy.stats but I was unable to make comments on the PR due to my wrist issues.

Also, I'm busier lately and I don't want to spend time porting the quantile function from scipy here: it's really big now and relies on a lot of internal scipy helpers, so it would be a lot of work.

Being very new to the ecosystem, I'm not sure my opinion is well informed; still, here are my disagreements with the choices that were made in scipy:

About the broadcasting behavior: I agree that the behavior of scipy.stats.quantile is indeed the most flexible.
But I don't think it's natural. In the doc, it's written that it behaves the same as other functions in scipy.stats but I disagree. Other functions I could find with broadcasting behavior in scipy.stats all work this way:
f(x, y, axis) -> broadcast(x, y); apply f along axis (and reduce it away if keepdims=False).
In those, x and y are of the same nature, so it's natural to broadcast them together, and then axis is reduced away. It's equivalent to the behavior you'd get from f(x, y, axis): return (x + y).sum(axis=axis). Very natural.
For the quantile, first it's not very natural to broadcast "data" with "parameters", and secondly the reduction behavior is much more complex. One thing I really don't like is that the output dimension is not determined by the input dimensions, for instance: both quantile([0, 1], [0.5]) and quantile([0, 1], 0.5) have dimension 0 but quantile([0, 1], [0.5, 0.9]) has dimension 1.

So for me, it's not clear which option is the best. IMO, here are the pros and cons of the scipy approach:

  • +++ more powerful: it unlocks things that would otherwise require a Python loop. But how much would that loop cost? It's generally very OK to loop over, say, 100 features.
  • -- more complex behavior for users, and a change from what they are used to for this function (all other array libraries align with numpy, I think).
  • - for array-api-extra: more complex implementation, more complex delegation code, and fewer cases where delegation is possible.

About the weights: personally, I don't understand the interest of frequency weights when they have a valid interpretation only for integers. Those should be called counts, not weights; they are just an optimization, not a feature, because f(x, weights=counts) returns the same result as f(xp.repeat(x, counts)), just computed more efficiently.
So I don't think supporting weights as frequency weights for all methods is a good idea, it would be better to implement a counts argument if the optimization is needed.
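A quick numerical illustration of that equivalence, assuming NumPy >= 2.0 (which accepts weights together with method="inverted_cdf"):

import numpy as np  # weighted quantile needs NumPy >= 2.0

x = np.array([0., 1., 2., 3.])
counts = np.array([1, 3, 2, 1])

a = np.quantile(x, 0.5, weights=counts, method="inverted_cdf")
b = np.quantile(np.repeat(x, counts), 0.5, method="inverted_cdf")
assert a == b  # frequency weights behave exactly like repeated samples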

For methods "inverted_cdf" and "averaged_inverted_cdf", there are clear interpretations for real-valued weights: the one in the numpy doc, or the link with the argmin of the weighted pinball loss (which is why scikit-learn needs weighted quantile in many cases I think).

@lucascolley
Member

I don't have a horse in the race w.r.t. the nicest user interface, so I will leave that discussion to those already involved.

What I will say is that from an ecosystem perspective, now that SciPy has committed to the public API, I would in principle be happy with a situation where we:

  • offer a certain API in array-api-extra which has some limitations but is more lightweight and familiar for users of preceding tools
  • direct users to SciPy for use-cases not covered by that implementation, or for situations where things like the SciPy broadcasting may result in code that is easier to understand

The downside of having two functions with the same name providing different semantics is real. But if the consensus from the technical discussion is that there are real tradeoffs such that each set of semantics has situations in which it is preferable, that might be acceptable.

@mdhaber
Contributor

mdhaber commented Dec 8, 2025

Naturalness is subjective, so we valued consistency with the rest of scipy.stats1. Unfortunately, this meant lack of consistency with np.quantile because np.quantile was an anomaly and didn't respect decisions made by NumPy previously (NEP 5).

One thing I really don't like is that the output dimension is not determined by the input dimensions,

Of course they are. But like every other stats function, they also depend on keepdims, which controls whether the axis is reduced away. The default of keepdims is False in all other functions because it can always be False. We tried to keep in line with that by making the default False where possible. (It has to be True if the second argument has length > 1 along axis.)

Now, I don't necessarily like the existence of keepdims as a parameter throughout stats. But it is what it is.

But I don't think it's natural...

Every other independent two-sample statistic in SciPy (e.g. ttest_ind, mannwhitneyu) broadcasts the arrays exactly as quantile does. You can tell because they use the same function to do the broadcasting. So quantile does f(x, y, axis) -> broadcast(x, y); apply f along axis (and reduce it away if keepdims=False) just as described.

If you mean "they don't broadcast the arrays exactly like np.broadcast_arrays", then you're right. We make one modification to the broadcast_arrays rule because we have to generalize to the case in which the length along axis is not equal.

After that, the only difference is that the length of the output core dimension of quantile is that of the second argument, and the length of the output core dimension of other statistics is 1.

About the weights: personally, I don't understand the interest of frequency weights when you have a valid interpretation only for integers.

About a dozen functions in SciPy accept frequency weights via the weights parameter, so we stuck with that name instead of inventing a new one like counts. And also because sometimes the implementations of frequency and other kinds of weights are exactly the same: for inverted_cdf and averaged_inverted_cdf, the weighted statistic is calculated the same way as in NumPy and as proposed here. So the weights are not limited to integers, and you can interpret them however you wish. There is no limitation here; we just added the opportunity to use frequency weights with other methods.

Footnotes

  1. which I find quite natural, thinking about arrays as arrays. I do not introduce a notion of some arrays being "data" and others being "parameters"; from that perspective, I can see how array broadcasting in general would be unintuitive. ↩
