1d array support in MinMaxScaler #1549
Comments
So to me, the best way to do this is to:
Is there a more elegant way to deal with rank 1 vs rank n issues in Numpy? |
Another related question: does it make sense to give MinMaxScaler a Numpy array with ndim > 2? Right now, this doesn't raise any error. I wonder if there might be "hidden logic" that assumes ndim = 2 in various preprocessing routines. |
@kyleabeauchamp that is a good comment. I would have imagined that |
One thing that worries me in retrospect about this issue is that reshaping a 1d X and reshaping a 1d y are not the same operation (reshape(1, -1) for one and reshape(-1, 1) for the other). So if we do decide to implement this feature, we need to document clearly that 1d array support is for y, not X (or we could add a constructor option to specify the way to reshape). |
To me, I would think that 1D X and 1D y would be the same. My thinking is that if you reshape a 1D y vector, you are actually treating y as a single feature from a hypothetical feature matrix. If you have a 1D X, my first guess would be the case of linear regression, where you have a single feature with several observations. I think regardless, we might want to add tighter checking of ndim. |
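The "tighter checking of ndim" idea raised above could look something like the following. This is only a sketch: `require_2d` is a hypothetical helper name, not an existing scikit-learn function, and the error wording is an assumption.

```python
import numpy as np


def require_2d(X, name="X"):
    """Hypothetical helper: accept only arrays of shape (n_samples, n_features)."""
    X = np.asarray(X)
    if X.ndim != 2:
        # Refuse to guess how a 1D (or 3D+) array should be interpreted.
        raise ValueError(
            f"{name} must be 2D with shape (n_samples, n_features); "
            f"got ndim={X.ndim}, shape={X.shape}. Use reshape(-1, 1) for a "
            "single feature or reshape(1, -1) for a single sample."
        )
    return X


ok = require_2d(np.zeros((3, 2)))  # passes through unchanged
```

Raising instead of silently reshaping leaves the choice between "one feature" and "one sample" to the caller, which is the ambiguity this thread keeps running into.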
Our utility function
>>> from sklearn.utils.validation import array2d
>>> array2d([1,2,3]).shape
(1, 3)
So For
>>> np.array([1,2,3]).reshape(-1,1).shape
(3, 1)
Or maybe I'm just confused. |
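To make the two interpretations concrete, here is a small NumPy-only sketch. `np.atleast_2d` stands in for the row-vector behaviour shown above, and `minmax_columns` is a hand-rolled, hypothetical column-wise scaler written for illustration; it is not scikit-learn's implementation.

```python
import numpy as np


def minmax_columns(X):
    """Hypothetical column-wise min-max scaling to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - lo) / span


v = np.array([1, 2, 3])
as_row = np.atleast_2d(v)      # one sample, three features: shape (1, 3)
as_column = v.reshape(-1, 1)   # three samples, one feature: shape (3, 1)

row_scaled = minmax_columns(as_row)        # every column has a single value
column_scaled = minmax_columns(as_column)  # values spread over [0, 1]
```

With the row interpretation each feature sees only one value, so the scaled result carries no information; with the column interpretation the three values are scaled relative to each other.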
@mblondel I think you are right. Though the docs and tests on this issue (treating single column X) are not so great :-/ |
I think that a single sample (1, n_features) is more useful for |
We need to add tests that the treatment of 1D arrays is consistent everywhere, preferably before merging any code that changes the behavior. |
-1 on overloading the meaning of 1-d arrays; I'd rather issue a warning (or even an exception) with an explanatory message when one is handed to |
So utils.check_arrays() doesn't actually check the shapes of arrays. The docstring is somewhat misleading: "Checks whether all objects in arrays have the same shape or length." To me, I think what we need is to break check_arrays() into several sub-functions or maybe a class:
It's not clear how we would go about checking y versus x--in that case, we want to basically ignore point 4. |
Number 4. is not done by Interestingly, when you start a line with |
To avoid taking over the original intent of this issue (MinMaxScaler), I created a new issue for us to discuss the ndim / shape stuff: #1597 . |
thanks @kyleabeauchamp, sorry I have been super busy the last two days. |
Hi, I tried passing a 1D array and found that MinMaxScaler.fit does not support an array of shape (N,). However, it works if the array has shape (N, 1) or (1, N). For instance, the following code won't work
However if we reshape x1d by either So I think we could simply do a reshape if len(x1d.shape) == 1. I also think we should reshape it to a column vector (N, 1) with x1d = x1d.reshape(-1, 1). This way, the 1D input array is treated as N samples of a single feature. That makes more sense than reshaping to a row vector of shape (1, N), because we want to fit data that has a number of samples, not data that has only one sample. I am new here and just wanted to start working on some easy issues :). I could be wrong, and I'd appreciate hearing your opinions. |
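The column-vector reshape proposed above can also be done on the caller's side. The sketch below assumes a scikit-learn version that provides `MinMaxScaler` in `sklearn.preprocessing`; the values in `x1d` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative 1D data of shape (N,), e.g. a regression target.
x1d = np.array([1.0, 2.0, 3.0, 5.0])

scaler = MinMaxScaler()
# Treat the 1D array as N samples of one feature: shape (N, 1).
scaled = scaler.fit_transform(x1d.reshape(-1, 1))

scaled_1d = scaled.ravel()                            # back to shape (N,)
restored = scaler.inverse_transform(scaled).ravel()   # undo the scaling
```

Keeping the reshape explicit means the scaler itself never has to guess whether a 1D input is one feature or one sample.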
That's expected. X should be a 2D array, of shape (n_samples, n_features) |
I'm closing the issue since reshaping |
MinMaxScaler is useful for regression to scale y. For this reason, it would be nice to support 1d arrays in fit and transform.