Skip to content

With NumPy 1.20, SymPy generated code cannot be serialized with dill #18547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cuihantao opened this issue Mar 4, 2021 · 16 comments · Fixed by #23020
Closed

With NumPy 1.20, SymPy generated code cannot be serialized with dill #18547

cuihantao opened this issue Mar 4, 2021 · 16 comments · Fixed by #23020

Comments

@cuihantao
Copy link

cuihantao commented Mar 4, 2021

I have been using SymPy to generate NumPy code through lambdify and using dill to serialize the code. Since upgraded to NumPy 1.20.1, some generated code cannot be serialized correctly due to a RecursionError. The example code works with NumPy 1.19.

I have posted this issue in the SymPy and NumPy Gitter rooms, and I'm posting my bisecting results here.

Reproducing code example:

Prerequisites: SymPy 1.7.1, dill 0.3.3, NumPy 1.20.1

import dill
import sympy
from sympy.abc import x

dill.settings['recurse'] = True

expr = sympy.sympify('re(x)')   # fails in 1.20.1, works in 1.19

# expr = sympy.sympify('log(x)')   # works in both 1.20.1 and 1.19

lfunc = sympy.lambdify(x, expr, 'numpy')

with open("out.pkl", 'wb') as f:
    dill.dump(lfunc, f)

Based on my nonexhausive testings, the error occurs when the function includes re or im. Other functions can be successfully serialized..

Error message:

Traceback (most recent call last):
  File "test_dill.py", line 12, in <module>
    dill.dump(lfunc, f)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/_dill.py", line 267, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/_dill.py", line 454, in dump
    StockPickler.dump(self, obj)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/_dill.py", line 1422, in save_function
    globs = globalvars(obj, recurse=True, builtin=True)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/detect.py", line 220, in globalvars
    func.update(globalvars(nested_func, True, builtin))
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/detect.py", line 220, in globalvars
    func.update(globalvars(nested_func, True, builtin))
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/detect.py", line 220, in globalvars
    func.update(globalvars(nested_func, True, builtin))
  [Previous line repeated 981 more times]
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/detect.py", line 213, in globalvars
    func.update(nestedglobals(getattr(orig_func, func_code)))
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/site-packages/dill/detect.py", line 170, in nestedglobals
    dis.dis(func) #XXX: dis.dis(None) disassembles last traceback
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 79, in dis
    _disassemble_recursive(x, file=file, depth=depth)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 373, in _disassemble_recursive
    disassemble(co, file=file)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 369, in disassemble
    _disassemble_bytes(co.co_code, lasti, co.co_varnames, co.co_names,
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 401, in _disassemble_bytes
    for instr in _get_instructions_bytes(code, varnames, names,
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 321, in _get_instructions_bytes
    labels = findlabels(code)
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 437, in findlabels
    for offset, op, arg in _unpack_opargs(code):
  File "/Users/hcui7/miniconda3/envs/np120/lib/python3.8/dis.py", line 421, in _unpack_opargs
    for i in range(0, len(code), 2):
RecursionError: maximum recursion depth exceeded in comparison

Bisecting Results

4cd6e4b is the first bad commit.

Could anyone help?

NumPy/Python version information:

1.20.1 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:12:38)
[Clang 11.0.1 ]

@seberg
Copy link
Member

seberg commented Mar 9, 2021

Well, I dug a bit. And it gets stuck asanyarray. I guess asanyarray effectively calls itself recursively. So thats probably the reason, but I don't really understand why that is so bad. Maybe dill thinks that its not possible that the module doesn't match the __globals__ or so...
Not sure how to best approach it though, likely just breaking the recursion, unless you can fix dill to cope with it ;)

@cuihantao
Copy link
Author

Thanks, @seberg. This is probably beyond my current knowledge of NumPy and dill. Let me ask if dill's maintainer can offer some help.

Hi @mmckerns, do you see any chance of fixing this issue? Thanks.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Mar 10, 2021
@seberg
Copy link
Member

seberg commented Mar 10, 2021

Hmm, I had marked this for triage review, but we don't know this well. Is this severe enough that it is something that we should try to fix in 1.21.2? Do we have any lead on how to fix it easily?

@seberg seberg added this to the 1.20.2 release milestone Mar 10, 2021
@seberg
Copy link
Member

seberg commented Mar 10, 2021

@pentschev, I am a bit stumped right now. It might just be a dill bug or harmless. But if you got an idea, that would be good as well :).

@pentschev
Copy link
Contributor

I don't have any immediate ideas, but I could see that on my side as well. The globalvars in dill is recursive and not immediately clear to me what codepath is being followed. If there's a minimal reproducer, I'm happy to take a look.

@asmeurer
Copy link
Member

asmeurer commented Mar 11, 2021

The use of lambdify here seems to be irrelevant. I get the same issue with

import dill
from numpy import real

dill.settings['recurse'] = True

def lfunc(x):
    return real(x)

dill.dumps(lfunc)

Setting recurse=True is relevant, though, and without it, I get

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    dill.dumps(lfunc)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/site-packages/dill/_dill.py", line 273, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/site-packages/dill/_dill.py", line 267, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/site-packages/dill/_dill.py", line 454, in dump
    StockPickler.dump(self, obj)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/site-packages/dill/_dill.py", line 1447, in save_function
    obj.__dict__, fkwdefaults), obj=obj)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/site-packages/dill/_dill.py", line 941, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/Users/aaronmeurer/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle PyCapsule objects

which doesn't happen for the non-lambdified version (without recurse=True, the non-lambdified version pickles without any errors).

@cuihantao
Copy link
Author

which doesn't happen for the non-lambdified version (without recurse=True, the non-lambdified version pickles without any errors).

Thanks for chiming in. I always have recurse=True set. Since the non-lambdified version works, then I thought lambdify was involved.

@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label Mar 12, 2021
@mmckerns
Copy link

mmckerns commented Mar 14, 2021

The dill inspection tool sheds some light on potentially whats going on, see here: uqfoundation/dill#405 (comment)
also see my reply: uqfoundation/dill#405 (comment). Translation is that something changed with regard to lambdify (in numpy or sympy) and an argument of the function used for pickling now gets punted to pickle when previously it was handled by dill. Needs further investigation.

@charris
Copy link
Member

charris commented Mar 27, 2021

@mmckerns Any progress?

@charris
Copy link
Member

charris commented May 6, 2021

@mmckerns What is the current status with dill? I'm going to kick this off to Numpy 1.22.0 for tracking. If there is a fix we can backport to the upcoming 1.21.x branch.

@charris charris modified the milestones: 1.20.3 release, 1.22.0 release May 6, 2021
@seberg
Copy link
Member

seberg commented Nov 12, 2021

Bumping the milestone, but if were have to look into breaking the cycle in NumPy, please bump this, because IIRC the cycle seems pretty natural and not that bad.

@seberg seberg modified the milestones: 1.22.0 release, 1.23.0 Nov 12, 2021
@seberg
Copy link
Member

seberg commented May 4, 2022

I am going to remove the milestone, since it was pushed off before already. There were no more complaints so we are assuming it is not very pressing.

Please ping if this is important and needs to be prioritized after all.

@seberg seberg removed this from the 1.23.0 release milestone May 4, 2022
@cuihantao
Copy link
Author

I had a workaround by avoiding serializing generated code, so this issue is no longer pressing. I still hope that @mmckerns can fix this issue as I ran into "maximum recursion depth reached" elsewhere. But that might not have to do directly with NumPy.

@mmckerns
Copy link

mmckerns commented Jun 5, 2022

I haven't checked in on this thread in a while...

How recursion is handled has recently changed in dill, and there's also a PyCapsule PR that should get pulled imminently. Checking the above code with the current dill...

>>> import dill
>>> from numpy import real
>>> 
>>> def lfunc(x):
...     return real(x)
... 
>>> dill.dumps(lfunc)
b'\x80\x04\x95\xba\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x10_create_function\x94\x93\x94(h\x00\x8c\x0c_create_code\x94\x93\x94(K\x01K\x00K\x00K\x01K\x02KCC\x08t\x00|\x00\x83\x01S\x00\x94N\x85\x94\x8c\x04real\x94\x85\x94\x8c\x01x\x94\x85\x94\x8c\x07<stdin>\x94\x8c\x05lfunc\x94K\x01C\x02\x00\x01\x94))t\x94R\x94c__builtin__\n__main__\nh\x0cNNt\x94R\x94}\x94}\x94\x8c\x0f__annotations__\x94}\x94s\x86\x94b.'

So, it looks like this works. However, fails for recurse = True: with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/_dill.py", line 364, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/_dill.py", line 336, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/_dill.py", line 620, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pickle.py", line 487, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/_dill.py", line 2058, in save_function
    globs_copy = globalvars(obj, recurse=True, builtin=True)
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 239, in globalvars
    func.update(globalvars(nested_func, True, builtin))
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 223, in globalvars
    _vars = globalvars(cell_contents, recurse, builtin) or {}
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 223, in globalvars
    _vars = globalvars(cell_contents, recurse, builtin) or {}
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 223, in globalvars
    _vars = globalvars(cell_contents, recurse, builtin) or {}
  [Previous line repeated 980 more times]
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 232, in globalvars
    func.update(nestedglobals(getattr(orig_func, func_code)))
  File "/Users/mmckerns/lib/python3.9/site-packages/dill-0.3.6.dev0-py3.9.egg/dill/detect.py", line 181, in nestedglobals
    dis.dis(func) #XXX: dis.dis(None) disassembles last traceback
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 79, in dis
    _disassemble_recursive(x, file=file, depth=depth)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 373, in _disassemble_recursive
    disassemble(co, file=file)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 369, in disassemble
    _disassemble_bytes(co.co_code, lasti, co.co_varnames, co.co_names,
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 401, in _disassemble_bytes
    for instr in _get_instructions_bytes(code, varnames, names,
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 321, in _get_instructions_bytes
    labels = findlabels(code)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 437, in findlabels
    for offset, op, arg in _unpack_opargs(code):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/dis.py", line 421, in _unpack_opargs
    for i in range(0, len(code), 2):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> 

With regard to the originating code, recurse = False yields an error trying to pickle a PyCapsule, while recurse = True yields a RecursionError.

>>> numpy.__version__
'1.22.4'
>>> dill.__version__
'0.3.6.dev0'
>>> sympy.__version__
'1.10.1'

I'll check back again after the PyCapsule PR goes in.

seberg added a commit to seberg/numpy that referenced this issue Jan 16, 2023
This moves dispatching for `__array_function__` into a C-wrapper.  This
helps speed for multiple reasons:
* Avoids one additional dispatching function call to C
* Avoids the use of `*args, **kwargs` which is slower.
* For simple NumPy calls we can stay in the faster "vectorcall" world

This speeds up things generally a little, but can speed things up a lot
when keyword arguments are used on lightweight functions, for example::

    np.can_cast(arr, dtype, casting="same_kind")

is more than twice as fast with this.

There is one alternative in principle to get best speed:  We could inline
the "relevant argument"/dispatcher extraction.  That changes behavior in
an acceptable but larger way (passes default arguments).
Unless the C-entry point seems unwanted, this should be a decent step
in the right direction even if we want to do that eventually, though.

Closes numpygh-20790
Closes numpygh-18547  (although not quite sure why)
seberg added a commit to seberg/numpy that referenced this issue Jan 16, 2023
This moves dispatching for `__array_function__` into a C-wrapper.  This
helps speed for multiple reasons:
* Avoids one additional dispatching function call to C
* Avoids the use of `*args, **kwargs` which is slower.
* For simple NumPy calls we can stay in the faster "vectorcall" world

This speeds up things generally a little, but can speed things up a lot
when keyword arguments are used on lightweight functions, for example::

    np.can_cast(arr, dtype, casting="same_kind")

is more than twice as fast with this.

There is one alternative in principle to get best speed:  We could inline
the "relevant argument"/dispatcher extraction.  That changes behavior in
an acceptable but larger way (passes default arguments).
Unless the C-entry point seems unwanted, this should be a decent step
in the right direction even if we want to do that eventually, though.

Closes numpygh-20790
Closes numpygh-18547  (although not quite sure why)
seberg added a commit to seberg/numpy that referenced this issue Jan 17, 2023
This moves dispatching for `__array_function__` into a C-wrapper.  This
helps speed for multiple reasons:
* Avoids one additional dispatching function call to C
* Avoids the use of `*args, **kwargs` which is slower.
* For simple NumPy calls we can stay in the faster "vectorcall" world

This speeds up things generally a little, but can speed things up a lot
when keyword arguments are used on lightweight functions, for example::

    np.can_cast(arr, dtype, casting="same_kind")

is more than twice as fast with this.

There is one alternative in principle to get best speed:  We could inline
the "relevant argument"/dispatcher extraction.  That changes behavior in
an acceptable but larger way (passes default arguments).
Unless the C-entry point seems unwanted, this should be a decent step
in the right direction even if we want to do that eventually, though.

Closes numpygh-20790
Closes numpygh-18547  (although not quite sure why)
@seberg
Copy link
Member

seberg commented Jan 17, 2023

I had tested this with gh-23020 so it should be fixed. I am not sure why, please do open a new issue if this or similar shows up still. I think it shouldn't be hard to resolve this.

@mmckerns
Copy link

FYI: A quick test shows after the PyCapsule patch went into dill, and some adjustments to improve pickling ufunc, all of the above work as expected for dill.settings['recurse'] = False. However, a RecursionError still occurs for dill.settings['recurse'] = True. numpy 1.24.1; sympy 1.11.1, dill 0.3.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants