Custom estimator's fit() method throws "RuntimeWarning: invalid value encountered in cast" in Linux Python 3.11/3.12 #29929

@cacti77

Describe the bug

We have a custom estimator class that inherits from sklearn.base.BaseEstimator and RegressorMixin. We run automated unit tests in Azure DevOps pipelines on both Windows Server 2022 and Ubuntu 22.04.1. All the tests pass on Windows. On Python 3.12.6 on Linux, the test with the stack trace shown below fails with:

RuntimeWarning: invalid value encountered in cast

Because we set PYTHONWARNINGS=error before running the tests, the warning is raised as an exception, which fails the test and hence the build. On Python 3.11.10 on Linux this test passes, but a different test using the same custom estimator fails with an identical stack trace. That second test, in turn, passes on Python 3.12 on Linux!

Note this change in numpy 1.24.0: https://numpy.org/doc/stable/release/1.24.0-notes.html#numpy-now-gives-floating-point-errors-in-casts; especially this bit:

The precise behavior is subject to the C99 standard and its implementation in both software and hardware.

I can probably work around this error in our tests by using a numpy.errstate context manager, but could there be a bug in sklearn?
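
A minimal sketch of the numpy.errstate workaround mentioned above, assuming the warning comes from a float-to-integer cast of invalid values. The helper name cast_quietly is hypothetical, and warnings.simplefilter("error") stands in for PYTHONWARNINGS=error:

```python
import warnings

import numpy as np


def cast_quietly(values, dtype):
    """Cast values to dtype while ignoring 'invalid value' floating-point errors.

    Hypothetical helper for illustration; not part of numpy or sklearn.
    """
    with np.errstate(invalid="ignore"):
        return np.asarray(values).astype(dtype)


# Turn warnings into exceptions, as PYTHONWARNINGS=error would.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    # The NaN entry would normally trigger "invalid value encountered in cast"
    # on affected platforms; inside errstate(invalid="ignore") it is not reported.
    out = cast_quietly(np.array([1.0, np.nan]), np.int64)

# The integer produced for NaN is platform-dependent, so only the finite entry is shown.
print(out.dtype, out[0])
```

In our tests the same context manager would presumably wrap the grid_search.fit(...) call. That only hides the symptom; it does not explain where the invalid values come from.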

I don't know if this issue is related to #25319. AFAIK the test data has no NaN values; the feature data columns are all float64.

Steps/Code to Reproduce

Sorry, this is proprietary code which I didn't write and don't understand!

Expected Results

The call to fit() succeeds without throwing a RuntimeWarning.

Actual Results

Stacktrace from Python 3.12.6 x64 on Linux (Ubuntu 22.04.1):

Traceback (most recent call last):
  File "/home/vsts/work/1/tests/<our_test_module>", line 76, in test_gen_data
    grid_search.fit(data[features].values)
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/model_selection/_search.py", line 1019, in fit
    self._run_search(evaluate_candidates)
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/model_selection/_search.py", line 1573, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/model_selection/_search.py", line 1013, in evaluate_candidates
    results = self._format_results(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/model_selection/_search.py", line 1137, in _format_results
    for param, ma in _yield_masked_array_for_each_param(candidate_params):
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/sklearn/model_selection/_search.py", line 429, in _yield_masked_array_for_each_param
    ma = MaskedArray(np.empty(n_candidates), mask=True, dtype=arr_dtype)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.6/x64/lib/python3.12/site-packages/numpy/ma/core.py", line 2820, in __new__
    _data = np.array(data, dtype=dtype, copy=copy,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeWarning: invalid value encountered in cast
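
The failing line casts the result of np.empty(n_candidates), which is uninitialized float64 memory, to arr_dtype. One possible explanation for the platform-dependent behaviour, assuming arr_dtype is an integer dtype in our case (the traceback does not confirm this), is that the uninitialized buffer sometimes happens to contain NaN or out-of-range values. The sketch below imitates that line with an explicit NaN buffer in place of np.empty. The helper build_masked is hypothetical, and whether the NaN case actually warns depends on the platform and numpy version:

```python
import warnings

import numpy as np
from numpy.ma import MaskedArray


def build_masked(data, dtype):
    """Imitate the MaskedArray construction from the traceback.

    Returns the masked array and whether a RuntimeWarning was emitted.
    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ma = MaskedArray(data, mask=True, dtype=dtype)
    warned = any(issubclass(w.category, RuntimeWarning) for w in caught)
    return ma, warned


n_candidates = 4

# A buffer of finite values casts to int64 without any floating-point error.
ma_clean, warned_clean = build_masked(np.zeros(n_candidates), np.int64)

# A NaN buffer stands in for whatever np.empty might return by chance.
ma_dirty, warned_dirty = build_masked(np.full(n_candidates, np.nan), np.int64)

print("finite buffer warned:", warned_clean)
print("NaN buffer warned:   ", warned_dirty)
```

Every entry is masked in both cases, so the underlying values never matter to the result; only the cast itself can emit the warning.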

Versions

Relevant pip-installed package versions, which were all the same in Python 3.11 and 3.12 in both Linux and Windows on Azure DevOps:

numpy           1.26.4
pandas          2.2.3
scikit-learn    1.5.2
scipy           1.14.1
