[MRG] Replaced float with int in examples #8040
Conversation
@@ -43,7 +43,7 @@ def _not_in_sphinx():
 def atomic_benchmark_estimator(estimator, X_test, verbose=False):
     """Measure runtime prediction of each instance."""
     n_instances = X_test.shape[0]
-    runtimes = np.zeros(n_instances, dtype=np.float)
+    runtimes = np.zeros(n_instances, dtype=np.int)
Can you explain why this solves the problem? If I remember well, VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future is raised when you use a float as an index, but I don't see where runtimes is used as an index.
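For reference, a minimal sketch of the behaviour under discussion (assuming a numpy of that era; current numpy rejects a float index with an IndexError rather than a deprecation warning):
import numpy as np

a = np.arange(10)

# Integer division keeps the index integral, so this is always accepted.
print(a[len(a) // 2])

# True division gives a float under Python 3; older numpy emitted
# VisibleDeprecationWarning when it was used as an index, newer numpy
# raises an error outright.
try:
    print(a[len(a) / 2])
except (IndexError, TypeError) as exc:
    print("float index rejected:", exc)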
I myself was not sure while making the change whether this would help or not. The problem is, as I mentioned above, that I couldn't reproduce most of the VisibleDeprecationWarnings in the examples present in the build linked in the issue thread. This was one such example. I added it here to check whether the build shows the warning or not.
Could you please help me with this problem of being unable to reproduce?
I still don't see why this solves the problem.
Yes, I don't think this solves anything really now. I'll remove it.
In fact I can reproduce it. If I do:
cd doc
sphinx-build -b html -d _build/doctrees . _build/html/stable
the first error is:
Now if I do
If you go on the corresponding line you will find a float used as an index.
@tguillemot The problem is that the example showing this warning in the build runs without it locally, e.g.:
Executing file ../examples/plot_kernel_ridge_regression.py
SVR complexity and bandwidth selected and model fitted in 0.948 s
KRR complexity and bandwidth selected and model fitted in 0.540 s
Support vector ratio: 0.320
SVR prediction for 100000 inputs in 0.157 s
KRR prediction for 100000 inputs in 0.423 s
../examples/plot_kernel_ridge_regression.py ran in : 32 seconds
This is what I originally stated: there is a mismatch between the examples showing the warning when run locally and those shown in the build. Do you then suggest that I remove the warnings I get from examples run locally?
Interesting indeed. I don't know why they are different.
Yes, there are some VisibleDeprecationWarnings.
Yes, I agree with that.
@@ -54,7 +54,7 @@
 y = np.sin(X).ravel()

 # Add noise to targets
-y[::5] += 3 * (0.5 - rng.rand(X.shape[0]/5))
+y[::5] += 3 * (0.5 - rng.rand(int(X.shape[0]/5)))
Why?
Unclear to me too. But for some reason, after I changed this, the VisibleDeprecationWarning was gone.
As @amueller said, I think you can use integer division (//) for those issues.
oh, this is the shape, not the scale... alright then. But yes, use //
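A sketch of the suggested // fix in context (the setup values here are assumed, loosely following the noisy-sine examples; only the last line is the point of the suggestion):
import numpy as np

rng = np.random.RandomState(0)
X = 5 * rng.rand(100, 1)
y = np.sin(X).ravel()

# Integer division keeps the size argument an int, so rng.rand never
# receives a float and no deprecation warning is emitted.
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))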
 # Now predict the value of the digit on the second half:
-data_test, targets_test = data[n_samples / 2:], digits.target[n_samples / 2:]
 #data_test = scaler.transform(data_test)
+data_test, targets_test = (data[int(n_samples / 2):],
Depending on whether n_samples is even or not, this might actually be a line that raises a warning. But why not just use integer division?
Yes, much cleaner. I'll change this.
int(n_samples / 2) -> n_samples // 2
You can make the warnings become errors, btw, which will make it easier to debug them.
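One way to do that (a minimal sketch; np.VisibleDeprecationWarning is assumed to be available, as it was on the numpy versions of this era, while very recent numpy moves it to np.exceptions):
import warnings
import numpy as np

# Promote the deprecation warnings to errors so the offending line shows
# up with a full traceback instead of a one-off warning message.
warnings.simplefilter("error", np.VisibleDeprecationWarning)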
On a side note, I get the following error:
../examples/applications/plot_out_of_core_classification.py failed to execute correctly:
Traceback (most recent call last):
  File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/doc/sphinxext/sphinx_gallery/gen_rst.py", line 518, in execute_code_block
    exec(code_block, example_globals)
  File "<string>", line 57, in <module>
  File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/sklearn/feature_extraction/text.py", line 487, in transform
    X = self._get_hasher().transform(analyzer(doc) for doc in X)
  File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/sklearn/feature_extraction/hashing.py", line 146, in transform
    raise ValueError("Cannot vectorize empty sequence.")
ValueError: Cannot vectorize empty sequence.
Should I create an issue for this?
Not getting any more VisibleDeprecationWarnings.
@dalmia Thx
I have investigated the problem and I think it is linked to #8058. Let's see once it's solved. Anyway, when I do
@tguillemot Right on it. Thanks for looking into it.
@tguillemot While I'm at it, could you please have a look at the issue I mentioned above about the example not running correctly? We could solve it as a part of this PR or create a separate issue for it. Let me know your opinion.
Okay, I have tried for a very long time now but have been unable to reproduce the warnings. I have uninstalled and reinstalled numpy and scikit-learn and rebuilt many times. The result is that I am unable to reproduce even the warnings I was previously getting on the master branch. For some unknown reason, the examples showing the warnings are not being run. For example, a few examples do run:
Executing file ../examples/feature_stacker.py
../examples/feature_stacker.py ran in : 0 seconds
Executing file ../examples/hetero_feature_union.py
../examples/hetero_feature_union.py ran in : 0 seconds
Executing file ../examples/missing_values.py
../examples/missing_values.py ran in : 0 seconds
Executing file ../examples/applications/face_recognition.py
../examples/applications/face_recognition.py ran in : 0 seconds
Then I get this error:
/examples/applications/plot_out_of_core_classification.py failed to execute correctly:
Traceback (most recent call last):
  File "/media/aman/BE66ECBA66EC7515/Open Source/scikit-learn/doc/sphinxext/sphinx_gallery/gen_rst.py", line 518, in execute_code_block
    exec(code_block, example_globals)
  File "<string>", line 57, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 487, in transform
    X = self._get_hasher().transform(analyzer(doc) for doc in X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/hashing.py", line 146, in transform
    raise ValueError("Cannot vectorize empty sequence.")
ValueError: Cannot vectorize empty sequence.
And the examples supposed to show the warnings appear only here throughout the build:
Computation time summary:
- plot_out_of_core_classification.py : 0.0011 sec
- plot_compare_reduction.py : 0 sec
- plot_kernel_approximation.py : 0 sec
- plot_partial_dependence.py : 0 sec
- plot_learning_curve.py : 0 sec
- plot_label_propagation_digits_active_learning.py : 0 sec
I am really out of ideas now. Please suggest an alternative.
The above error in the example seems to be solved now.
I can't explain why you don't see the warnings. I build the doc with:
cd doc
make html
Maybe if it helps, I use:
I have checked and there are other real VisibleDeprecationWarnings.
@tguillemot Thank you for looking into it! I am using Python 2.7. Is this a deprecation for 3.5? If it is, sorry, I was not aware of that.
I tried doing the whole process on another PC by cloning the master repo, installing it with python setup.py develop and then running the two commands you mentioned above, and did not get those warnings.
In python 2.7, the division of two ints is an integer division, so the warnings do not appear.
Maybe it is because you use python 2.7. I have given you the lines where the warnings come from; you can see that they do not use integer division. In python 2.7 there is no problem, but in python 3.5 the result is a float, hence the warning. If you cannot find the warnings, maybe it's better to merge your work and correct the other warnings later.
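A minimal illustration of the division difference described above (the values are arbitrary):
n_samples = 7

# Python 2: 7 / 2 == 3 (an int), so sizes and indices stay integral and
# the deprecation warning never fires.
# Python 3: 7 / 2 == 3.5 (a float), and passing it as an index or a size
# is what triggers VisibleDeprecationWarning on the numpy of this era.
print(n_samples / 2)

# Floor division gives an int on both Python 2 and Python 3.
print(n_samples // 2)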
@tguillemot I did try as you said, running the files with python3. On the examples that I have corrected, wherever the warning was being raised because a division produced a float, I was indeed able to see it. However, I don't think this would work for cases where the indices are defined as float arrays. Thank you very much for looking into it; I was able to make some progress. However, since I am still unable to reproduce it on the examples that you mentioned, I guess the only option is to go over the examples and try to find each case.
What is this blocking on? I've not kept abreast of the conversation and an executive summary may be helpful!
@jnothman The issue asks us to fix the VisibleDeprecationWarnings raised in the examples.
Sorry for the late answer.
So, after trying several things, I have been unable to reproduce the warnings in the files that CircleCI shows. However, I have found VisibleDeprecationWarnings in other examples.
@tguillemot Please have a look. Thanks!
Also, on observing the code below:
pdp, axes = partial_dependence(clf, target_feature,
                               X=X_train, grid_resolution=50)
XX, YY = np.meshgrid(axes[0], axes[1])
Z = pdp[0].reshape(list(map(np.size, axes))).T
ax = Axes3D(fig)
surf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1, cmap=plt.cm.BuPu)
Please let me know what you think.
I don't know either.
X_test = test[:, :np.ceil(0.5 * n_pixels)]
y_test = test[:, np.floor(0.5 * n_pixels):]
# Upper half of the faces
X_train = train[:, :np.int64(np.ceil(0.5 * n_pixels))]
n_pixels//2
np.ceil returns float as the dtype, so this conversion can't be avoided.
(n_pixels + 1) // 2 ?
Using (n_pixels // 2), the value is 2048, which becomes 2048.0 from np.ceil.
>>> np.ceil(2048)
2048.0
We don't need ceil, or am I missing something? Just remove np.int64(np.ceil(0.5 * n_pixels)) and replace it with (n_pixels + 1) // 2:
>>> np.int64(np.ceil(0.5 * 110))
55
>>> (110 + 1) // 2
55
>>> np.int64(np.ceil(0.5 * 111))
56
>>> (111 + 1) // 2
56
Oh! Sorry I had misunderstood what you were trying to say. I'll make the change. Thanks.
No pb, I wasn't clear enough :)
examples/plot_compare_reduction.py
Outdated
# scores are in the order of param_grid iteration, which is alphabetical
mean_scores = mean_scores.reshape(len(C_OPTIONS), -1, len(N_FEATURES_OPTIONS))
# select score for best C
mean_scores = mean_scores.max(axis=0)
bar_offsets = (np.arange(len(N_FEATURES_OPTIONS)) *
               (len(reducer_labels) + 1) + .5)
bar_offsets = np.array(bar_offsets, dtype=np.int)
Why does this solve the problem?
True, this is not required. Reverting it.
@@ -82,6 +82,7 @@
 for j in range(repeat):

     rng = np.random.RandomState(i * j)
+    n_outliers = np.int64(n_outliers)
Can we change the dtype of range_n_outliers to be np.int?
@@ -43,7 +43,7 @@ def _not_in_sphinx():
 def atomic_benchmark_estimator(estimator, X_test, verbose=False):
     """Measure runtime prediction of each instance."""
     n_instances = X_test.shape[0]
-    runtimes = np.zeros(n_instances, dtype=np.float)
+    runtimes = np.zeros(n_instances, dtype=np.int)
I still don't see why this solves the problem.
Please rebase on master to get rid of the CircleCI failure, as noted in #7565 (comment).
Seems ready for merge then.
examples/plot_compare_reduction.py
Outdated
@@ -53,7 +53,7 @@
 digits = load_digits()
 grid.fit(digits.data, digits.target)

-mean_scores = np.array(grid.cv_results_['mean_test_score'])
+mean_scores = np.array(grid.cv_results_['mean_test_score'], dtype=np.int)
Sorry, what? Why do you turn scores into ints?
Also this line is too convoluted anyway, it could be:
mean_scores = grid.cv_results_['mean_test_score']
which is already an array.
Best is not to modify this example; I don't see any warning when I run it.
@@ -68,6 +68,7 @@
 range_n_outliers = np.concatenate(
     (np.linspace(0, n_samples / 8, 5),
      np.linspace(n_samples / 8, n_samples / 2, 5)[1:-1]))
+range_n_outliers = np.array(range_n_outliers, dtype=np.int)
Use .astype(np.int) on the previous array (you can do everything in a single statement).
Oh yes, I didn't see it. I'll make the change.
The Travis error seems like a glitch to me. The tests are passing and the status is red somehow... ignore it for now.
I have no idea what is happening with Travis; I'll restart the build to see if that goes away...
Is it showing this behavior in other PRs too?
I am seeing a similar thing in one of my joblib PRs.
So, is it safe enough to just merge the PR?
We could merge the PR but then the risk is that master is red and I'd rather avoid unnecessary confusion. I am hoping that Travis will be fixed in a reasonable amount of time.
Fair enough. Waiting for the fix then.
I restarted the builds a few times and now it seems like Travis is passing. Let's merge this one and see what happens. Thanks @dalmia!
Thank you @lesteve.
Thanks @dalmia
For the record, after merging this PR, master is red. I opened an issue on the Travis repo about this:
* Fix tests on numpy master (#7946). Until now we were in an edge case on assert_array_equal.
* Fix tests on numpy master (#8355). numpy.apply_along_axis has changed behaviour when the function passed in returns a 2d array.
* [MRG] Updated plot_stock_market.py to use Google Finance (#9010)
* DOC updated plot_stock_market.py to use Google Finance. The implementation is intentionally very basic so as not to distract users from the example. Specifically, unlike ``quotes_historical_yahoo_ochl`` it does not cache downloaded data. I also had to remove some symbols because they have no data on Google for the specified date interval. These are WBA, LMT, KFT and MTU. Closes #8899
* DOC removed plot_stock_market.py from expected failing examples
* Addressed review comments
* Addressed another pass of review comments
* [MRG] Remove DeprecationWarnings in examples due to using floats instead of ints (#8040)
Reference Issue
Fixes #8017
What does this implement/fix? Explain your changes.
This replaces float with int in the examples to remove the VisibleDeprecationWarning encountered due to the indices being of a non-integer type.
Any other comments?
As mentioned in the issue thread, I wasn't able to replicate most of the warnings appearing in the build that was linked to. However, I encountered warnings in other examples that were not present in the build, and hence I chose not to modify them. Please give your suggestions in this regard.