Should our custom DTypes have a __len__ ? #24033

jorisvandenbossche · 2018-12-01T07:46:59Z

This came up in scikit-learn/scikit-learn#12699.

NumPy dtypes have a length defined. For normal dtypes, this is 0:

In [21]: len(np.dtype('int64'))
Out[21]: 0

but for structured dtypes this is the number of fields.
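
As a quick illustration of the two cases above, here is a minimal sketch; the field names `x` and `y` are made up for the example:

```python
import numpy as np

# A plain (non-structured) dtype has no fields, so its length is 0.
plain = np.dtype("int64")

# A structured dtype with two fields; the names "x" and "y" are illustrative.
structured = np.dtype([("x", "int64"), ("y", "float64")])

print(len(plain))       # 0
print(len(structured))  # 2
```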

For our custom dtypes, you get a TypeError:

In [25]: s = pd.Series(['a', 'b', 'c']).astype('category')

In [26]: s.dtypes  
Out[26]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

In [27]: len(s.dtypes) 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-3d39f9ba3f74> in <module>
----> 1 len(s.dtypes)

TypeError: object of type 'CategoricalDtype' has no len()

In hindsight, using the len of the dtypes in sklearn was maybe not the most robust idea, but that said: should we follow numpy here and also define a __len__ on our custom dtypes?
(I am personally not fully sure it is needed)
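
For concreteness, a rough sketch of what "following numpy" could look like. `HypotheticalDtype` is an invented stand-in class, not a pandas class, and returning 0 is an assumption that mirrors numpy's behaviour for dtypes without fields:

```python
class HypotheticalDtype:
    """Invented stand-in for a custom dtype; not part of pandas."""

    name = "hypothetical"

    def __len__(self):
        # Assumption: mimic numpy, where a dtype without fields has length 0.
        return 0


dtype = HypotheticalDtype()
print(len(dtype))  # 0
```

One side effect of this choice: an object whose `__len__` returns 0 is falsy unless it also defines `__bool__`, so `bool(dtype)` would be `False` here.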

TomAugspurger · 2018-12-01T12:58:40Z

Is scikit-learn likely to change its check? If so, I'd rather not define __len__.

jorisvandenbossche · 2018-12-01T13:14:58Z

Yes, in any case we will fix this in sklearn, since we want it to work with released versions of pandas (PR scikit-learn/scikit-learn#12706).

Given that, I also don't see strong reasons to define it for pandas. The broader question is how far we want our dtypes to be compatible with numpy dtypes.
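
For reference, a check on the caller's side does not need `len()` at all. The sketch below is only one possible approach and is not the code from the scikit-learn PR; `count_fields` and `FakeExtensionDtype` are hypothetical names introduced for the example:

```python
import numpy as np


def count_fields(dtype):
    """Return the number of structured fields, or 0 when there are none.

    Hypothetical helper: it reads the ``names`` attribute (a tuple of field
    names on structured numpy dtypes, None otherwise) instead of calling
    len(), so dtype objects without __len__ do not raise TypeError.
    """
    names = getattr(dtype, "names", None)
    return 0 if names is None else len(names)


class FakeExtensionDtype:
    """Stand-in for a dtype object that does not define __len__."""


print(count_fields(np.dtype("int64")))
print(count_fields(np.dtype([("x", "int64"), ("y", "float64")])))
print(count_fields(FakeExtensionDtype()))
```

The three calls cover a plain numpy dtype, a structured numpy dtype with two fields, and an object without `__len__`; none of them raises.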

mroeschke · 2021-06-23T05:03:18Z

Sounds like it's unlikely that this will be implemented. Closing, but happy to reopen if there's further interest

jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 1, 2018

jorisvandenbossche mentioned this issue Dec 1, 2018

utils.validation.check_array throws bad TypeError pandas series is passed in scikit-learn/scikit-learn#12699

Closed

mroeschke closed this as completed Jun 23, 2021
