I've been fitting an SVC to experimental data and came across an issue where RocCurveDisplay sometimes plots the ROC curves upside down.
I know this happens if the positive label gets mixed up between fitting and plotting, but here it is happening randomly.
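For reference, the label mix-up case looks like the following. This is only a small sketch with made-up labels and scores (not my experimental data), showing that scoring against the wrong `pos_label` mirrors the curve, so the area becomes one minus the correct value.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Made-up labels and scores, purely for illustration.
y_true = np.array(["neg", "neg", "pos", "pos"])
scores = np.array([0.1, 0.4, 0.35, 0.8])  # higher score means "more likely pos"

# Correct: the scores refer to the "pos" class.
fpr, tpr, _ = roc_curve(y_true, scores, pos_label="pos")
auc_right = auc(fpr, tpr)

# Mixed up: the same scores evaluated as if they referred to "neg".
fpr, tpr, _ = roc_curve(y_true, scores, pos_label="neg")
auc_wrong = auc(fpr, tpr)

print(f"AUC with correct pos_label: {auc_right:.2f}")
print(f"AUC with swapped pos_label: {auc_wrong:.2f}")
```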
It only happens when I pass probability=True to the model, which the docs say requires randomness to compute the probabilities. I did a bit more investigation to get to a much smaller code example. I found that passing in a RandomState gives repeatable behaviour: with some random states the ROC curve is plotted correctly, and with others it is plotted wrongly.
From this I found that predict_proba seems to do one of two things depending on the random state, and this leads to one of the two ROC curves.
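A minimal sketch of how this seed dependence could be checked, assuming synthetic one-dimensional data in place of my experimental data (so it may well not reproduce the flip). The helper name `scan_seeds` is hypothetical. It compares the AUC computed from `decision_function`, which should not depend on the probability calibration, with the AUC computed from `predict_proba`; if the two fall on opposite sides of 0.5, the probabilities are ordered the opposite way to the decision values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def scan_seeds(X, y, seeds):
    """Fit one SVC per seed and return {seed: (auc_decision, auc_proba)}."""
    results = {}
    for seed in seeds:
        clf = SVC(probability=True, random_state=seed).fit(X, y)
        # Look up the predict_proba column for label 1 instead of assuming it.
        pos_col = list(clf.classes_).index(1)
        auc_decision = roc_auc_score(y, clf.decision_function(X))
        auc_proba = roc_auc_score(y, clf.predict_proba(X)[:, pos_col])
        results[seed] = (auc_decision, auc_proba)
    return results


# Assumption: synthetic, sorted, single-feature data standing in for the real data.
rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=(60, 1)), axis=0)
y = (X[:, 0] + rng.normal(scale=0.5, size=60) > 0).astype(int)

results = scan_seeds(X, y, range(5))
for seed, (auc_decision, auc_proba) in results.items():
    # Opposite sides of 0.5 means the two scores rank the samples in reverse.
    flipped = (auc_decision - 0.5) * (auc_proba - 0.5) < 0
    note = "  <-- orderings disagree" if flipped else ""
    print(f"seed={seed}  decision AUC={auc_decision:.3f}  proba AUC={auc_proba:.3f}{note}")
```

Running the same scan on the real data with a larger range of seeds would show which seeds give the inverted curve.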
I couldn't get any further because it seems very data dependent: just deleting a few data rows made the problem go away, although I was able to manually truncate my data to 2 decimal places without anything changing. Sorting the data by increasing value also seems to be necessary.
Here is an illustration - the first column is a 'correct' ROC curve and the corresponding probabilities, and the second is a 'wrong' one. Note that the y-scale of probabilities is very different.
Any comments welcome, especially if I am doing something daft here.
Thanks.
Here is the code, from a Jupyter notebook
I am using scikit-learn 1.5.2, numpy 2.1.3 and python 3.12.10.