Skip to content

BUG/ENH: Improve output for structured non-void types #10381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 15, 2018

Conversation

eric-wieser
Copy link
Member

Fixes gh-9821

Based on the 1.14 branch point, in case we want to backport.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Mostly nitpicks about comments, to help possible future developers.

@@ -394,7 +391,12 @@ def _get_format_function(data, **options):
elif issubclass(dtypeobj, _nt.object_):
return formatdict['object']()
elif issubclass(dtypeobj, _nt.void):
return formatdict['void']()
# StructureFormat relies on np.void.__getitem__, so we can't use it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would seem helpful only if you thought StructureFormat should be used elsewhere (as it was originally, but which is not all that obvious). Maybe just state here something along the lines of "'void' only deals with unstructured bytes; we use StructureFormat if they are assigned to fields. Note that we cannot use StructureFormat where the void is in a union with a non-void dtype; those will be caught above and the typeset with the regular dtype (though the full dtype will be printed, to indicate to the user it is a union dtype)"

Copy link
Member Author

@eric-wieser eric-wieser Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you thought StructureFormat should be used elsewhere

That depends on what you think counts as a structured type:

  • issubdtype(dt, np.void) and dt.names is not None
  • dt.names is not None
  • issubdtype(dt, np.flexible) (a lot of old code refers to flexible types when it means structured types)

Renaming it to StructuredVoidFormat would make that clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is confusing, but this is one place where one can explain what it is in the code!

@@ -1236,7 +1238,7 @@ def dtype_is_implied(dtype):
dtype = np.dtype(dtype)
if _format_options['legacy'] == '1.13' and dtype.type == bool_:
return False
return dtype.type in _typelessdata
return dtype.type in _typelessdata and dtype.names is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're adding comments, maybe here note that the latter part guards against union dtypes?

CustomScalar.__module__ = None

dt_custom = np.dtype((CustomScalar, fields))
str(dt_custom) # segfault?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well test the output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output is meaningless with __module__ assigned to garbage, I think. Can __module__ ever be None in real code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, true. I tried it myself and indeed got __main__.CustomScalar - I just wondered whether this is tested anywhere at all...


ret = PyUString_FromString("(");
if (modulestr != NULL) {
/* Note: if modulestr == NULL, the type is unpicklable */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More background behind this comment would be useful. Can this ever happen?

Copy link
Member

@ahaldane ahaldane Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm my memory of this is vague now. I think this code had to do with "old style" vs "new style" classes in python2. I think "old style" classes fill in the __module__ entry incompletely when defined in C, or something along those lines. I will have to look it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, happily I left some kind of explanation of that in a github comment. See here (link).

Now as to whether this can be NULL, I think it can, for types defined in C in python 2. If you go to https://docs.python.org/2/extending/newtypes.html, and copy the first "noddy" example, you will find:

>>> import noddy
>>> mynoddy = noddy.Noddy()
>>> mynoddy.__module__
AttributeError: 'noddy.Noddy' object has no attribute '__module__'
>>> import cPickle
>>>  cPickle.dumps(mynoddy)
TypeError: can't pickle Noddy objects

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should the code continue to be able to deal with __module__ == NULL? Or is the new behaviour of raising an exception fine? It would probably be good to add a comment with the conclusion regardless...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably fine to raise an exception if it is NULL, let's go with these changes.I would change the comment, though, it is technically wrong: Instead of this should never happen since types always have these attributes it should say something like we do not expect this to happen happen since types always have these attributes, except in some very special cases (module may be null for "statically allocated" types in python2).

Probably these "special cases" never come up in practice, since they will cause people other problems (eg, unpicklable types).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment: I looked more closely at this and I have not been able to construct an example where __module__ is NULL. The type from numpy.core.test_rational import rational is what I expected to fail, since it does not properly define a module, but if you do rational.__module__ you get __builtin__, which is not NULL. I tested that np.dtype((rational, 'i4,i4')) works properly with this PR.

So in summary, this code is fine.

@eric-wieser
Copy link
Member Author

@mhvk: Comments updated

@charris charris added this to the 1.14.1 release milestone Feb 5, 2018
@charris
Copy link
Member

charris commented Feb 7, 2018

I'll leave this to @mhvk and @ahaldane .

@@ -1192,13 +1199,16 @@ def __call__(self, x):
else:
return "({})".format(", ".join(str_fields))

# for backwards compatibility
StructureFormat = StructuredVoidFormat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a deprecation warning for use of StructureFormat, the same way we did for LongFloatFormat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, wouldn't hurt

namestr = PyObject_GetAttr((PyObject*)(dtype->typeobj), str_name);
Py_DECREF(str_name);
namestr = PyObject_GetAttrString((PyObject *)dtype->typeobj, "__name__");
modulestr = PyObject_GetAttrString((PyObject *)dtype->typeobj, "__module__");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these lines are too long, too

@ahaldane
Copy link
Member

ahaldane commented Feb 7, 2018

Everything else LGTM!

@charris
Copy link
Member

charris commented Feb 12, 2018

Be good to finish this up.

@eric-wieser
Copy link
Member Author

Will pick this up in the next few days

@eric-wieser
Copy link
Member Author

I'm gonna roll back the segfault fix - we need a much wider audit of PyUString_ConcatAndDel, even within that file.

@eric-wieser eric-wieser changed the title BUG/ENH: Fix segfault and improve output for structured non-void types BUG/ENH: Improve output for structured non-void types Feb 15, 2018
@eric-wieser
Copy link
Member Author

Alright, that's the fixes for regular structured scalars at least

@ahaldane
Copy link
Member

LGTM. Circleci fails, but I'll merge anyway.

Thanks Eric!

@ahaldane ahaldane merged commit 69fa37f into numpy:master Feb 15, 2018
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Feb 16, 2018
@charris charris removed this from the 1.14.1 release milestone Feb 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants