Should our custom DTypes have a __len__ ? #24033

Closed
jorisvandenbossche opened this issue Dec 1, 2018 · 3 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Dec 1, 2018

This came up in scikit-learn/scikit-learn#12699.

NumPy dtypes have a length defined. For normal dtypes, this is 0:

In [21]: len(np.dtype('int64'))
Out[21]: 0

but for structured dtypes this is the number of fields.
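
To make the structured case concrete, here is a minimal sketch comparing the two. The field names `x` and `y` are made up for illustration:

```python
import numpy as np

# A plain (non-structured) dtype has no fields, so its length is 0.
plain = np.dtype("int64")

# A structured dtype with two illustrative fields, "x" and "y".
structured = np.dtype([("x", "int64"), ("y", "float64")])

print(len(plain))        # 0
print(len(structured))   # 2
print(structured.names)  # ('x', 'y')
```

So `len(dtype)` on a NumPy dtype effectively answers "how many fields does this dtype have?".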

For our custom dtypes, you get a TypeError:

In [25]: s = pd.Series(['a', 'b', 'c']).astype('category')

In [26]: s.dtypes  
Out[26]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

In [27]: len(s.dtypes) 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-3d39f9ba3f74> in <module>
----> 1 len(s.dtypes)

TypeError: object of type 'CategoricalDtype' has no len()

In hindsight, using the len of the dtypes in sklearn was maybe not the most robust idea. That said, should we follow numpy here and also define a __len__ on our custom dtypes?
(Personally, I am not fully sure it is needed.)
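
For illustration only, one way a caller could avoid depending on `len(dtype)` is to look at the `names` attribute instead, which NumPy sets to `None` for dtypes without fields. This is a sketch under assumptions, not necessarily what scikit-learn does or did: `n_fields` and `NoLenDtype` are hypothetical names, and `NoLenDtype` only stands in for a dtype-like object that does not define `__len__`.

```python
import numpy as np


def n_fields(dtype):
    """Return the number of fields of a dtype-like object.

    Hypothetical helper: reads ``dtype.names`` rather than calling
    ``len(dtype)``, so objects without ``__len__`` do not raise.
    """
    names = getattr(dtype, "names", None)
    return 0 if names is None else len(names)


class NoLenDtype:
    """Stand-in for a dtype-like object that defines no __len__."""


print(n_fields(np.dtype("int64")))                   # 0
print(n_fields(np.dtype([("x", "i8"), ("y", "f8")])))  # 2
print(n_fields(NoLenDtype()))                        # 0
```

The `getattr` default means the helper treats any object lacking a `names` attribute as having no fields, instead of raising a `TypeError`.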

@TomAugspurger
Contributor

Is scikit-learn likely to change its check? If so, I'd rather not define __len__.

@jorisvandenbossche
Member Author

Yes, in any case we will fix this in sklearn, since we want it to work with released versions of pandas (PR scikit-learn/scikit-learn#12706).

Given that, I also don't see strong reasons to define it for pandas. The question is more how far we want our dtypes to be compatible with numpy dtypes.

@mroeschke
Member

Sounds like it's unlikely that this will be implemented. Closing, but happy to reopen if there's further interest.
