-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
TST Skip test when unstable openblas configuration #21702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
544d564
to
b4c5650
Compare
Force-pushing to include @adrinjalali in 544d564 as a co-author. |
sklearn/utils/_testing.py
Outdated
@@ -452,6 +457,10 @@ def set_random_state(estimator, random_state=0): | |||
not joblib.parallel.mp, reason="joblib is in serial mode" | |||
) | |||
|
|||
fails_if_unstable_openblas = pytest.mark.xfail( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if xfail is a good idea for code that can segfault. Maybe skip is a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was marked resolved without being addressed or commented on: I am pretty sure this is a bad idea to execute code that can segfault, especially when pytest-xdist
is not used. XFAIL will run the code, SKIP will not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake multitasking and forgetting to take your suggestion in the patch of this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A green CI is a good one.
Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
4c8374e
to
5ca918a
Compare
I force-pushed to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prescott kernel will cause a segfault on any OpenBLAS version when called on non-memory aligned data, whatever the version of OpenBLAS. What changes in old version of OpenBLAS is that the prescott kernel is detected instead of Haswell on SkylakeX machines indirectly causing the bug.
But the bug could still happen if the test is actually run on a prescott machine with a recent version of OpenBLAS.
Instead of introducing a utility function in sklearn.utils
with a generic name such as _in_unstable_openblas_configuration
, I would rather change the function sklearn.utils.estimators_checks.check_classifiers_train
to call create_memmap_backed_data
with aligned=has_prescott_openblas
once #21654 is merged, has_prescott_openblas
is defined as:
# OpenBLAS is known to segfault with unaligned data on the prescott architecture
has_prescott_openblas = any(
True
for info in threadpool_info()
if info["internal_api"] == "openblas"
and info.get("architecture").lower() == "prescott"
)
Also for the sake of keeping the git history easier to understand, would it be possible to separate the fix for the calibration plots labels separate from the fix for openblas segfaults? Those are unrelated problems, right? Since we squash merge PRs, I would rather have those 2 fixes in 2 separate commits in the git history, hence 2 separate PRs.
I wrote the previous review against before this fix, please ignore that part of the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming CI is green, +1 for merge with the last comment below:
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Green! @adrinjalali still +1 for merge with the new way to fix this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A much cleaner solution indeed
Actually I just found out that it does not work :) $ OPENBLAS_CORETYPE=Prescott pytest -v sklearn/tests/test_common.py -k "check_classifiers_train and svc"
[...]
def create_memmap_backed_data(data, mmap_mode="r", return_folder=False, aligned=False):
"""
Parameters
----------
data
mmap_mode : str, default='r'
return_folder : bool, default=False
aligned : bool, default=False
If True, if input is a single numpy array and if the input array is aligned,
the memory mapped array will also be aligned. This is a workaround for
https://github.com/joblib/joblib/issues/563.
"""
temp_folder = tempfile.mkdtemp(prefix="sklearn_testing_")
atexit.register(functools.partial(_delete_folder, temp_folder, warn=True))
if aligned:
if isinstance(data, np.ndarray) and data.flags.aligned:
# https://numpy.org/doc/stable/reference/generated/numpy.memmap.html
filename = op.join(temp_folder, "data.dat")
fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape)
fp[:] = data[:] # write data to memmap array
fp.flush()
memmap_backed_data = np.memmap(
filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape
)
else:
> raise ValueError("If aligned=True, input must be a single numpy array.")
E ValueError: If aligned=True, input must be a single numpy array. |
I will open a PR with a quick fix for the above. |
…rn#21702) * TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference scikit-learn#21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…rn#21702) * TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference scikit-learn#21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…rn#21702) * TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference scikit-learn#21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…rn#21702) * TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference scikit-learn#21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* TST XFAIL test when unstable openblas configuration * Adapt versions parsing Co-authored-By: Adrin Jalali <adrin.jalali@gmail.com> * DOC Prefer ifskip and reference #21361 Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Fix Julien's scheduler incorrect dispatch * Simplify Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Do not remove blank line * fixup! Do not remove blank line * Retrigger CI * Prudently assume Prescott might be the architecture if it is unknown Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Temporarily fixes #21361.
What does this implement/fix? Explain your changes.
This is a temporary fix for some tests on
main
.This fixes some CI jobs on
main
.This is a first fix toward preventing contributors spending their days exploring logs
of issues they don't understand the problems being reported because they aren't
related to their PRs.
As informally proposed by @ogrisel, a long term fix would be to pin the maximum
version dependencies and have a process to update each dependency version
when possible.
Any other comments?
This fix can be made more specific on some values of the parametrisation.