-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Introducing pyscript.fs namespace/module #2289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8a58df8
to
2aabb6e
Compare
for more information, see https://pre-commit.ci
2aabb6e
to
bf53aba
Compare
for more information, see https://pre-commit.ci
99f7beb
to
2a36f3e
Compare
OK, while things might (and likely will) change or be discussed, I've published the current state on npm as @ntoll I will demo this during today PyScript Fun Community call but feel free to chime in with thoughts or comments, if you have any. I do like the current state of the MR:
|
2a36f3e
to
fd57c56
Compare
FYI I've pinged @dpgeorge in Discord around this issue as I think some work needs to be addressed on the MicroPython side of affairs and we all would love to have this API work in there too without further thinking from our users. |
fd57c56
to
da4d451
Compare
Update this now works in MicrpoPython too through exact same logic Pyodide implemented ... it's up and running as |
da4d451
to
fd9ae86
Compare
This is outstanding work. Bravo. Some thoughts:
from pyscript import fs
await with fs.MountPoint("foo"):
... do stuff ..
# At this point, now out of scope of the `MountPoint` context handler, the fs is synced
# and unmounted. An example implementation (off the top of my head) might be: # Somewhere in the fs module...
class MountPoint:
"""
An async context handler for working with a mounted local filesystem. Automatically
syncs and unmounts the referenced mount-point once out of scope.
"""
def __init__(self, path, mode="readwrite", root="", id="pyscript"):
"""
Expects exactly the same args as the fs.mount function.
"""
self.path = path
self.mode = mode
self.root = root
self.id= id
async def __aenter__(self):
await mount(self.path, self.mode, self.root, self.id)
async def __aexit__(self, *args):
await sync(self.path)
# If unmount is also implemented
await unmount(self.path) I actually prefer your (@WebReflection's) more explicit way of working with the fs... but Pythonistas (like @fpliger) would really appreciate the expected "Pythonic" idiom of context handlers. This feels like a classic PyScript case of trying to integrate aspects of both the browser and Python in a coherent and aesthetic manner. Once again, really great work. Bravo. |
@ntoll what does unmount mean?
I would like to understand use cases for unmounting because Pyodide didn't consider it and I could not think of any use case for desiring that operation plus it's problematic and it brings nothing to the plate (unless use cases are explicit indeed). If we remove the handler from IndexedDB all my effort to make it work the wonderful way it does now through Workers and with or without SharedArrayBuffer would be vane and that's unfortunate because we have an opportunity to provide something nobody has done before even on JS side of affairs. Last, but not least, having that All demo around this feature in Pyodide are also based on my approach for a reason: it cannot be seamlessly integrated like a context manager would so your class sketched idea breaks in unexpected way if operations are not performed after the modal granted access or whenever a So, as soon as all these questions are answered by you or @fpliger I can say if it's worth it or not ... so far I think what's offered is exactly what our users explicitly stated (in Discord) would be awesome to have and it's extremely simple (imho) to reason about. edit on top of all this, a transient user action is not something you want to spread around your logic ... it's something by Web design and specs you should ask once and never again ... having that randomly asked every single time that mounted file is needed, in case we want to unmount and erase the access right each time, feels counter productive and some added friction for no reason ... pythonista or not, here we're still dealing with Web related security flows, not just ideal APIs we could have ignoring these Web related constraints so we should really be careful here in changing the current approach, imho, but happy to work on the best outcome we can and still I would like to understand use cases or concrete examples that are better, thanks. |
To expand further, please read pyodide mount ability and realize there is not a single unmount operation in there. On top of that, with the current logic provided by this PR it means that no Python code needs to know, learn, or understand how the underlying path has been mounted or provided, it's fully transparent, meaning that if Requiring a different code/logic/class no other Python env knows or understand out there means breaking compatibility and portability across already working code. Here one could just add a single Providing a new API nobody (so far) asked for, would require changes all over previously working code written in pure Python. That being said, calling sync per each mounted operation is also deadly slow so what we can do there is await for the time to be right when Emscripten will implement its own auto sync FS feature which is still not there yet, so that we won't diverge in logic and capabilities from what Emscripten, hence MicroPython and Pyodide, will provide too. |
Hey hey @WebReflection - thanks for the feedback. I was, as I know you realise, thinking out loud given the Pythonic However, I'd failed to take into account the transient activation - which means using Regarding unmount. 😉 OK... I had a "thought experiment" use case in mind. As always feedback most welcome! Say you have some sort of data science tool delivered via PyScript. Because of size of download the datasets to be processed by the tool are to be loaded from the user's local filesystem. I imagine that different datasets for such analysis may be in different folders on the user's filesystem, and so being able to unmount and remount different local folders to the same location in the browser's virtual FS might be useful should you wish to change datasets being analysed by the tool. The same probably also applies for LLM models too - different models mounted to the same virtual location for use with a PyScript delivered LLM library would be rather useful. Also, this may apply to spreadsheet data for PySheets? @laffra, is this just a silly thought experiment or would this be useful for you? However, I TOTALLY get this is a thought experiment - and I'm mentioning it here merely to promote a discussion. I wonder what @fpliger thinks? In any case, as a first step, the totality of this PR is magnificent. Perhaps we can keep this open for feedback from Fabio, but then I say merge and release asap..? This is a big deal, as the reaction on discord shows. |
@ntoll With access to the local filesystem, PySheets can load local Excel or CSV files, process them in PyOdide in the browser, and then save the resulting result in JSON, CSV, PDF, HTML, or PNG. This would make PySheets a lot more practical as a research tool. |
that's not possible though so it will be rather confusing. edit amend, there is a possible workaround to mount different folders over the same path, see #2289 (comment) ... we still need to dig further to see if that's beneficial or practical, but there is hope with a great default and an even greater extra parameter to have one Python reachable folder able to reach different paths (one per time, of course) ! The stored handler with previously granted rights works only for the selected folder ... the name you give or use in Python code to that folder is irrelevant for the Web FileSystem Directory Picker API so your idea of swapping folders at the Python level out of a granted access fails in practice due the way the thing work ... on the other hand, you have a perfect use case to keep the current API exactly as it is, assuming you grant accesso to your Downloads/data folder and in there you have multiple sub-folders you can crawl whenever you need ... mount that as These are the reason I wanted this landed because there are technical limitations we need to consider:
Last, but not least, it's true that data could be huge and the current Pyodide implementation (hence its MicroPython port) uses MEMFS so if you mount 5 folders with 10Gb of models each, stuff won't work ... however, there's literally nothing we can do (right now) about it, we need to wait for WasmFS to land (it did already) and be complete enough to be adopted by Pyodide ... we'll follow up after with MicroPython and that should give us "infinite access without RAM constraints" which is currently the case.
It is a big deal but most importantly there's really no other way around ... we're delivering the best approach Pyodide and Emscripten offer to date to deal with that API while other parts are still moving (WasmFS) so that primitives are all there and it's through those primitive we can eventually refine, improve, or offer a contextual with class in the near future but what works in here will remain for a while out there as "the only way" due constraints around the stack and feature which is also experimental on the Web, on Pyodide, and so on. To clarify, I never meant to say "this is our new fs namespace and it's frozen", this MR was meant to rather state "we're working on edge experimental capabilities as soon as these are available to provide most basic bricks to build even more awesome flows on top of that right after" which I hope we share as sentiment in the PysRcript team and I hope @fpliger agrees on that too. |
@laffra that's already possible with this MR and the currently published npm module: you select the folder with CSV files, you work with those without needing to list all of them in the |
@ntoll just because I've looked at the code (then I'll be likely off 'til Monday) the only possibility to mount multiple handlers into the same path is to use an fs.mount("/same-path", name="project2") that would ask for a new transient activation flow in case This is doable right now where the default name can be just an empty string and it feels like something extra we can provide. However the fact we don't get to unmount it remains, but I can try to figure out out of the Emscripten filesystem if that's a possibility so we'll have that extra utility with actually a meaning, without revoking access to that previously mounted folder (the directory handler is not so heavy in terms of used bytes but it's gold in terms of potentials). Could this be a compromise that let us all move forward happier than now? I have no objection around this compromise but happy to hear back from you if that feels reasonable and it is a decent DX too (which I think it is, it just requires extra documentation around this part of the API). If you are OK, on Monday:
Last thing is that eventually I would like to also investigate if an edit on a late second thought, this would allow Pyscript to read from a folder and store into another through the same path ... which I think is an extra awesome capability ... I hope my findings on Monday will make me scream Hooooraaay! |
this is a quick one for @fpliger: if my latest idea works I think it's superior to the contextual with because it will work with any That's the summary, if things go smoothly, and I hope to hear you OK around this summary 👋 |
Excellent work @WebReflection ! This is really awesome. Honestly, I think there's a bit of disconnect between possible use cases and flows on how this can be used.... IIRC, the context manager case was to have a pythonic way to make sure I do think that having a context manager that calls I'm +1 on merging and discussing an additional follow up :) |
@fpliger awesome, then we all agree ... my idea is that Like I've said, ideally any Updates on Monday, or after, you all enjoy your weekend 👋 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* verified that RAM gets freed * allowed to mount different handlers within the same path through different `id` as that's the Web best way to do so
4a89e78
to
6388736
Compare
for more information, see https://pre-commit.ci
Thanks @ntoll for approval + I have added other changes and here are both changes and findings:
Quick examples of the current basic API: from pyscript import fs
# ask once for permission to mount any local folder
# into the Virtual FileSystem handled by Pyodide/MicroPython
await fs.mount("/local")
# if changes were made, ensure these are persistent in the local system folder
await fs.sync("/local")
# if needed to free RAM or that specific path, sync and unmount
await fs.unmount("/local") variants # mount a local folder specifying a different handler
# this requires user explicit transient action (once)
await fs.mount("/local", id="v1")
# ... operate on that folder ...
await fs.unmount("/local")
# mount a local folder specifying a different handler
# this requires user explicit transient action (once)
await fs.mount("/local", id="v2")
# ... operate on that folder ...
await fs.unmount("/local")
# go back to the original handler or previous one
# no transient action required now
await fs.mount("/local", id="v1")
# ... operate again on that folder ... The The This is the entirety of the current Last, but not least, we should declare in our docs that Happy to expand or clarify further but right now I am super happy about the current state and, like we stated already, we can also improve later on with ease. |
* introducing pyscript.fs namespace/module * Added proper rejection when showDirectoryPicker is not supported * Improved exports to make explicit import in 3rd party modules easier * implemented `fs.unmount(path)`: * verified that RAM gets freed * allowed to mount different handlers within the same path through different `id` as that's the Web best way to do so
* introducing pyscript.fs namespace/module * Added proper rejection when showDirectoryPicker is not supported * Improved exports to make explicit import in 3rd party modules easier * implemented `fs.unmount(path)`: * verified that RAM gets freed * allowed to mount different handlers within the same path through different `id` as that's the Web best way to do so
Description
We have recently discovered the pyodide ability to mount users' local folders via mountNativeFS API and we would like to provide that experiment via our own
pyscript.fs
module.This MR provides all the basic "bricks" to have the ability to
await fs.mount("/path")
and later on, if changes are made to such path, the ability toawait fs.sync("/path")
so that changes become persistent and available for any further visit of the same page/app/tab/domain.Works only in Chrome/ium⚠️
I wouldn't know how to best reflect the fact this feature currently works only in chrome as it requires showDirectoryPicker which is supported even on Android but nowhere around Firefox or Safari.
No Polyfill⚠️
Due the inability to persist the directory handler via the only polyfill I could find, it's unclear how users could decide different strategies when it comes to storing files ... although we do provide a
pyscript.store
API so they should be covered.The check on their side would be as simple as:
Changes
IDBMap
module and its Sync counterpart get exported, as it's the best/easiest way to actually persist data between either main or workers threads. The latest version also monkey-patch MicroPython interpreter providing the exact same functionality Pyodide provides aroundmountNativeFS
Checklist
make build
works locally.