"The Python kernel is unresponsive" when fitting a reasonable sized sparse matrix into NearestNeighbors #31059
Comments
Hey @fabienarnaud, it looks like the issue might be related to OpenMP threading conflicts or an unstable OpenBLAS version.
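If it helps to test that hypothesis, the native thread pools can be capped via threadpoolctl (already a scikit-learn dependency). This is only a minimal diagnostic sketch; the matrix below is a small random stand-in, not the reporter's data, and all parameters are assumptions:

```python
# Sketch: cap OpenMP/BLAS threads to check for a threading conflict.
from scipy.sparse import random as sparse_random
from sklearn.neighbors import NearestNeighbors
from threadpoolctl import threadpool_limits

# Small random CSR stand-in (shape and density are hypothetical).
X = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)

with threadpool_limits(limits=1):  # single-threaded OpenMP and BLAS, for diagnosis only
    nn = NearestNeighbors(n_neighbors=5).fit(X)
    distances, indices = nn.kneighbors(X)
```

If the fit succeeds with the thread pools capped but hangs without the limit, that would point toward a threading/BLAS interaction rather than scikit-learn itself.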
Thanks for reporting this. I think "Kernel not responsive" means that you are running your code in a Jupyter notebook or something similar. It isn't an error message from scikit-learn. I recommend you take this up with Databricks support. Once a version of scikit-learn is released, it does not change retroactively.
Thank you both for your quick response! I will report this back to Databricks.
I know that we introduced some improvements in pairwise distances, but that was in 1.2.
Hi @glemaitre, I tried to collect the traceback, but since the Databricks notebook crashes, my code never gets to print the traces from the exception handler. I'm pasting the Log4j output of the Databricks cluster below, in case it can help.
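In case it's useful, the standard-library faulthandler module can write a traceback straight to a file when the process receives a fatal signal, so the trace survives even if the kernel dies before any exception handler runs. A minimal sketch, where the log path is just an example:

```python
import faulthandler

# Dump tracebacks of all threads to a file on a hard crash (segfault/abort).
# The path is hypothetical; any writable location outside the notebook works.
crash_log = open("/tmp/nearest_neighbors_crash.log", "w")
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the NearestNeighbors fitting code after enabling the handler ...
```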
Describe the bug
Hi all,
I have Python code that has been running every day for years, which uses NearestNeighbors to find best matches.
All of a sudden, in both our TEST and PRD environments, the code has been crashing in the NearestNeighbors call with the following message: "The Python kernel is unresponsive". This started last Friday, 21 March 2025.
What puzzles me is that we haven't made any modifications to our code, the data hasn't changed (at least in our TEST environment), and we didn't change the version of scikit-learn.
The exact command that throws the error is:
where X is a compressed sparse row (CSR) matrix of shape 38506x53709.
We run the code on Databricks (runtime 15.4 LTS, where scikit-learn is at 1.3.0).
I also tried with scikit-learn 1.4.2 (preinstalled in Databricks runtime 16.2) but had the same issue.
The error suggests a memory issue, but I'm struggling to understand why this would happen now, while the context is exactly the same as it was before. Furthermore, we use the same code on the same Databricks cluster for another data set that is at least 6x bigger, and that one runs successfully in just a few seconds.
I'm not a data scientist and therefore quite confused as to why this would no longer run. Since our environment didn't change, I was wondering if anything might have changed with respect to scikit-learn v1.3.0 for any odd reason, or if you have heard anything similar recently from other users?
Steps/Code to Reproduce
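The original snippet is not reproduced here; the following is only a hedged sketch of the call described above, with random CSR data of the reported shape standing in for the real matrix and default NearestNeighbors parameters assumed:

```python
from scipy.sparse import random as sparse_random
from sklearn.neighbors import NearestNeighbors

# Random stand-in for the real data: a CSR matrix of the reported shape.
# Density and all parameters are assumptions, not taken from the original report.
X = sparse_random(38506, 53709, density=0.001, format="csr", random_state=0)

nn = NearestNeighbors(n_neighbors=5)
nn.fit(X)                                   # the call reported to make the kernel unresponsive
distances, indices = nn.kneighbors(X[:10])  # query a few rows against the fitted index
```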
Expected Results
No error should be thrown
Actual Results
The NearestNeighbors call now returns this:
Versions