dump/load sessions with non-arrays (hdf, pickle) #153

Open
gdementen opened this issue Mar 16, 2017 · 7 comments
@gdementen
Contributor

I am thinking mostly of groups and scalars.

This would be especially important for users of the standalone interface (see #88).

When using the standalone interface, they will probably also want to dump functions defined within the console in the same file, so that they can resume working exactly where they left off. We could use a specific format for that (e.g. pickle the whole session), but in that case, using from larray import * will become problematic.
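To illustrate the "pickle the whole session" idea, here is a minimal sketch using only the standard library. The helper names (dump_session, load_session) and the sample session are hypothetical, not part of larray. It pickles each entry separately so that one unpicklable object does not block the rest. Standard pickle stores functions by reference (module and name) rather than by value, which is why the lambda below is skipped instead of saved.

```python
import pickle


def dump_session(namespace):
    """Pickle every entry on its own; return (blob, names that could not be pickled)."""
    saved, skipped = {}, []
    for name, value in namespace.items():
        try:
            saved[name] = pickle.dumps(value)
        except (pickle.PicklingError, AttributeError, TypeError):
            # e.g. lambdas and other objects pickle cannot find by name
            skipped.append(name)
    return pickle.dumps(saved), skipped


def load_session(blob):
    """Inverse of dump_session: rebuild the dict of picklable entries."""
    return {name: pickle.loads(data) for name, data in pickle.loads(blob).items()}


# Hypothetical session content: two plain values and one console-defined function.
session = {"rate": 0.05, "years": [2017, 2018], "double": lambda x: 2 * x}
blob, skipped = dump_session(session)
restored = load_session(blob)
```

Under these assumptions, the scalars and the list come back unchanged, while the function is only reported in skipped. Saving console-defined functions would need their source code or a by-value pickler.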

@gdementen
Contributor Author

gdementen commented Mar 21, 2017

This might be a crazy idea, but if we want to save Python functions inside an .xlsx, we could place them inside macros, potentially wrapped inside xlwings RunPython so that they are even executable from within Excel. For maintainability and collaboration it is probably better to save the Python file externally (so that you can put it in a VCS), but there might be cases where this could be useful.

In that case, this could be useful:
http://stackoverflow.com/questions/17197259/use-python-to-inject-macros-into-spreadsheets

@gdementen
Contributor Author

We might want to use something like https://github.com/h5io/h5io or https://github.com/telegraphic/hickle

@gdementen
Contributor Author

Or, we could roll our own stuff. If we go that route and want to continue using Pandas HDFStore, this might be useful:

http://stackoverflow.com/questions/29129095/save-additional-attributes-in-pandas-dataframe/29130146#29130146

In [170]: import numpy as np
In [171]: import pandas as pd
In [172]: df = pd.DataFrame(np.random.randn(8, 3))
In [173]: store = pd.HDFStore('test.h5')
In [174]: store.put('df', df)
# you can store an arbitrary python object via pickle
In [175]: store.get_storer('df').attrs.my_attribute = dict(A=10)
In [176]: store.get_storer('df').attrs.my_attribute
Out[176]: {'A': 10}

@gdementen
Contributor Author

gdementen commented Sep 20, 2017

Here are links to stuff that might help (or not). This is a dump of all the browser windows I have been keeping open for months. ;-)

  • standard pickles (if using a low enough protocol) are guaranteed to work across Python versions. Marshal (used to create .pyc files), on the other hand, is not.

  • to pickle tracebacks: https://github.com/ionelmc/python-tblib

  • better pickles (pickle more kinds of objects than standard pickle):

  • jsonpickle (serialize most python objects as json -- but with the same kind of security implications as pickle!). The format is nicer than pickle, but I am unsure it brings any value to the table if we do not plan to exchange sessions across languages.
    https://jsonpickle.github.io/

  • teleport: JSON with "types"/predefined structure. A nice alternative if we only want to serialize data.
    http://www.teleport-json.org/python/

  • signed pickles (python2 only): http://trustedpickle.sourceforge.net/
    This works by generating a private/public key pair, signing outgoing pickles with the private key and checking that the sender is "trusted" by using the public key. This obviously does not prevent an attacker from signing their own code, so it wouldn't make the usual case (here is my session, could you try it?) any safer. Given that it would probably be "tedious" for users to set up, it will probably not be worth the trouble for a looong time. But the idea is interesting nevertheless, if we need to regularly exchange data with the same external users and want to provide some security.
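As a rough illustration of two of the points above (a low pickle protocol and signed pickles), here is a standard-library sketch. Note that it swaps in a different technique from trustedpickle: it uses an HMAC with a shared secret instead of a private/public key pair, so both sides must hold the same key. The function names (dump_signed, load_signed) and the key are hypothetical. Protocol 2 is chosen because it is the highest protocol that Python 2 can also read.

```python
import hashlib
import hmac
import pickle

# Length in bytes of a SHA-256 digest, used to split signature from payload.
DIGEST_SIZE = hashlib.sha256().digest_size


def dump_signed(obj, key, protocol=2):
    """Pickle obj with a low protocol and prepend an HMAC-SHA256 signature."""
    payload = pickle.dumps(obj, protocol=protocol)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def load_signed(blob, key):
    """Verify the signature first; only unpickle if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


# Hypothetical shared secret, for illustration only.
key = b"shared-secret-for-illustration-only"
blob = dump_signed({"rate": 0.05, "years": [2017, 2018]}, key)
restored = load_signed(blob, key)
```

As noted above, this only shows that the pickle came from someone holding the key. It does nothing for the "here is my session, could you try it?" case between people who do not already share a secret.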

@alixdamman
Collaborator

@gdementen What is the difference between this issue and #578?

If the difference is to handle objects that are not LArray, Axis or Group (like a session's metadata), I suggest renaming this issue to "dump/load sessions with metadata".

@gdementen
Contributor Author

#578 is about the same thing as this issue except that it is more limited. This issue is about saving an interactive session to disk, closing the editor, then loading the session back and continuing to work. Issue #578 will indeed bring us closer to that goal but not all the way to it. We need at least scalars and "simple python structures" of supported types to get there. Possibly functions/arbitrary objects too, but that can come later.
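One possible reading of "scalars and simple python structures of supported types" is sketched below. The set of supported types, the helper names (is_simple, split_session) and the sample namespace are all assumptions for illustration, not a decided design. The sketch separates what could be saved in a plain format such as JSON from what would need another mechanism.

```python
import json

# Assumed set of supported scalar types (hypothetical, for illustration).
SCALAR_TYPES = (bool, int, float, str, type(None))


def is_simple(value):
    """True for scalars and for lists/tuples/dicts built only from simple values."""
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_simple(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_simple(v) for k, v in value.items())
    return False


def split_session(namespace):
    """Return (entries that are simple, sorted names of everything else)."""
    simple = {name: value for name, value in namespace.items() if is_simple(value)}
    other = sorted(name for name in namespace if name not in simple)
    return simple, other


# Hypothetical session content.
namespace = {
    "nb_periods": 10,
    "label": "baseline",
    "params": {"growth": [0.01, 0.02]},
    "helper": len,  # a function: not a simple structure
}
simple, other = split_session(namespace)
text = json.dumps(simple, sort_keys=True)
```

One caveat of using JSON for this: tuples are written as lists, so a round trip does not preserve the tuple type.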

Note that neither issue speaks about session metadata. I think I never thought about that but users will indeed want this at some point, so please create an issue for this (to be done after or at the same time as metadata for LArray #78 and #79).

@alixdamman
Collaborator

alixdamman commented Apr 6, 2018

To tell the truth, I'm thinking of trying to implement a JupyterLab extension for LArray and then abandoning the editor (not soon).

OK to create a new issue --> see #615

@alixdamman alixdamman modified the milestones: 0.29, nice_to_have Apr 6, 2018
@gdementen gdementen removed this from the nice_to_have milestone Aug 1, 2019
@alixdamman alixdamman added this to the nice_to_have milestone Oct 10, 2019
@alixdamman alixdamman changed the title dump/load sessions with non-arrays dump/load sessions with non-arrays (hdf, pickle) Nov 29, 2019