ENH Use scipy.special.inv_boxcox in PowerTransformer #27875
Conversation
@pytest.mark.parametrize(
"method, lmbda",
[
("box-cox", 0.1),
("box-cox", 0.5),
("yeo-johnson", 0.1),
("yeo-johnson", 0.5),
("yeo-johnson", 1.0),
],
)
def test_optimization_power_transformer(method, lmbda):
# Test the optimization procedure:
# - set a predefined value for lambda
# - apply inverse_transform to a normal dist (we get X_inv)
# - apply fit_transform to X_inv (we get X_inv_trans)
# - check that X_inv_trans is roughly equal to X
rng = np.random.RandomState(0)
n_samples = 20000
X = rng.normal(loc=0, scale=1, size=(n_samples, 1))
pt = PowerTransformer(method=method, standardize=False)
pt.lambdas_ = [lmbda]
X_inv = pt.inverse_transform(X)
pt = PowerTransformer(method=method, standardize=False)
X_inv_trans = pt.fit_transform(X_inv)
assert_almost_equal(0, np.linalg.norm(X - X_inv_trans) / n_samples, decimal=2)
assert_almost_equal(0, X_inv_trans.mean(), decimal=1)
    assert_almost_equal(1, X_inv_trans.std(), decimal=1)

I think we need to reconsider the test above, since some of the generated values are not valid inputs for the inverse Box-Cox transform with the chosen lmbda. Recall that the Box-Cox transformation requires strictly positive data, so its inverse is only defined when lmbda * x + 1 > 0. And the build error occurs while computing the inverse.

lmbda = 0.5
x = -2.1
print((x * lmbda + 1) ** (1 / lmbda))       # 0.0025000000000000044 (scikit-learn's inverse of box-cox)
print(np.exp(np.log1p(lmbda * x) / lmbda))  # nan (scipy's inverse of box-cox)
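To make the discrepancy concrete (an illustrative check, not part of the PR): with lmbda = 0.5 the exponent 1 / lmbda is the even integer 2, so the hand-rolled power formula silently maps the negative base -0.05 to a positive value, while scipy's log-based inverse reports nan.

import numpy as np
from scipy.special import inv_boxcox

lmbda, x = 0.5, -2.1
base = x * lmbda + 1           # -0.05, outside the valid domain (needs > 0)
print(base ** (1 / lmbda))     # 0.0025 -- (-0.05) ** 2 hides the domain error
print(inv_boxcox(x, lmbda))    # nan -- scipy flags the invalid input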
In the original PR #10210, it's not clear why the inverse was implemented by hand rather than with scipy. Maybe @glemaitre would remember? @jjerphan might also have an idea here.
Sorry, I don't have any idea, nor the time to have a look at it at the moment. 😕
@xuefeng-xu you're right regarding the test as well. It would be nice if you could fix and improve the test as part of this change. We can also add a test to make sure that if the input is not reasonable, we output nan.
It seems that the minimum supported scipy at that time didn't have special.inv_boxcox; maybe that's why we implemented it ourselves? I have fixed the broken test and added a new one.
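For reference, a minimal sketch of what the change boils down to, assuming a hypothetical helper name (_box_cox_inverse is not the actual scikit-learn function): delegate the inverse to scipy.special.inv_boxcox instead of re-implementing (x * lmbda + 1) ** (1 / lmbda) by hand.

import numpy as np
from scipy.special import boxcox, inv_boxcox

def _box_cox_inverse(x, lmbda):
    # Hypothetical helper: out-of-domain inputs (lmbda * x + 1 <= 0 for
    # lmbda != 0) now come back as nan instead of a spurious finite value.
    return inv_boxcox(x, lmbda)

x = np.array([0.5, 1.0, 2.0])
assert np.allclose(_box_cox_inverse(boxcox(x, 0.3), 0.3), x)  # round trip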
LGTM. This might require a changelog entry since it's changing the behavior slightly though.
cc @glemaitre
Thank you for the PR @xuefeng-xu !
@@ -2312,7 +2312,7 @@ def test_power_transformer_lambda_one():
     "method, lmbda",
     [
         ("box-cox", 0.1),
-        ("box-cox", 0.5),
+        ("box-cox", 0.2),
I think the test is not generating valid data for the given value of lmbda. Setting lmbda=0.2 resolves the issue because all of the generated data then happens to be greater than -1 / lmbda = -5. Can we work around this by clipping X at -1 / lmbda when the method is box-cox?
X = rng.normal(loc=0, scale=1, size=(n_samples, 1))
if method == "box-cox":
    # The inverse of box-cox requires lmbda * y + 1 > 0, i.e. y > -1 / lmbda.
    # Clip the data here to make sure the inequality holds.
    X = np.clip(X, -1 / lmbda, None)
Thank you! Clipping the data is a good idea, but we may need to add a small safety margin, e.g. 1e-5:
X = np.clip(X, - 1 / lmbda + 1e-5, None)
from scipy.special import inv_boxcox
lmbda = 0.5
inv_boxcox(-1/lmbda, lmbda) # 0.0 (boxcox requires strictly positive data)
inv_boxcox(-1/lmbda + 1e-5, lmbda) # 2.5e-11
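To illustrate why the margin matters (an illustrative check, not from the PR): with the margin, the inverse stays strictly positive, so box-cox can be re-fitted on it; clipping exactly at -1 / lmbda yields an inverse of 0.0, which box-cox rejects.

import numpy as np
from scipy.special import boxcox, inv_boxcox

lmbda = 0.5
x = np.clip(np.array([-3.0, 0.0, 1.5]), -1 / lmbda + 1e-5, None)
x_inv = inv_boxcox(x, lmbda)                 # strictly positive thanks to the margin
assert np.all(x_inv > 0)
assert np.allclose(boxcox(x_inv, lmbda), x)  # round trip recovers the clipped data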
Yup, adding 1e-5 makes sense to me.
LGTM
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Since we already use scipy.special.boxcox, I think we can use scipy.special.inv_boxcox for the inverse too.
https://github.com/scipy/scipy/blob/fcf7b652bc27e47d215557bda61c84d19adc3aae/scipy/special/_boxcox.pxd#L30-L34
Any other comments?
The Box-Cox transformation.
