Skip to content

tools/mpremote: Recursive directory sync, regression tests, transport filesystem API, hash command. #11777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 9, 2024

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Jun 14, 2023

This PR:

  • Implements the start of a transport-agnostic filesystem API.
  • Re-implements the cat, cp, mkdir, etc commands in terms of this.
  • Adds a new hash command.
  • Adds a new implementation of cp -r that works with arbitrary source/dest paths (i.e. host->device, device->host, device->device, host->host). Ensures that os.path is used for local paths and split("/") is used for remote paths (hopefully this will avoid Windows path issues).
  • Uses file hash to avoid copying over an already-identical file (This is a 10x speedup for recursive copying a directory of files with only a small number of changes). cc @peterhinch
  • Adds the beginning of a set of regression tests for the command line interface. Written in bash for now because it was very simple and I just wanted to ensure that these changes didn't break anything and got the recursive copy behavior correct, but we may choose to re-implement this in Python using system/subprocess later.

TODO:

  • Docs update.
  • Testing on Windows.

This work was funded through GitHub Sponsors.

@github-actions
Copy link

github-actions bot commented Jun 14, 2023

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@codecov
Copy link

codecov bot commented Jun 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.57%. Comparing base (6835743) to head (e5eeaa7).
Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11777   +/-   ##
=======================================
  Coverage   98.57%   98.57%           
=======================================
  Files         164      164           
  Lines       21336    21336           
=======================================
  Hits        21031    21031           
  Misses        305      305           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jimmo
Copy link
Member Author

jimmo commented Jun 15, 2023

Note on the method naming. We now have fs_readfile, fs_writefile, fs_printfile, fs_hashfile, but fs_touch, fs_rm, etc.

I quite like the file suffix, mainly because fs_read and fs_write are ambiguous. It also fits with fs_rmdir and fs_mkdir. So I suggest we rename fs_touch and fs_rm to add file.

@dpgeorge
Copy link
Member

So I suggest we rename fs_touch and fs_rm to add file

Yes that sounds fine.

@dpgeorge dpgeorge added the tools Relates to tools/ directory in source, or other tooling label Jun 15, 2023
@jimmo jimmo force-pushed the mpremote-fs-api branch from 18eb600 to 6a40fc1 Compare June 15, 2023 07:25
@jimmo
Copy link
Member Author

jimmo commented Jun 15, 2023

Yes that sounds fine.

Done

@jimmo jimmo force-pushed the mpremote-fs-api branch 2 times, most recently from 563095b to 6f55e00 Compare June 19, 2023 01:53
@jimmo
Copy link
Member Author

jimmo commented Jun 19, 2023

Updated the filesystem test to use a remote ramdisk instead of internal flash (significantly speeds up the test, and allows it to run on all ports instead of being stm32-specific).

@peterhinch
Copy link
Contributor

If I can lob in another suggestion. My fork of rshell has an option to enable rsync to ignore certain files. This is documented here. Obvious uses are to avoid copying to the device files such as documents or backups or private subdirectories.

@jimmo
Copy link
Member Author

jimmo commented Jun 19, 2023

@peterhinch Great suggestion, I'll look into it. Thanks! Will probably do it in a future PR though.

@Josverl
Copy link
Contributor

Josverl commented Jun 27, 2023

@jimmo,
I took a swing at testing this on Windows and try to draw up a quick how-to on that.
Not quite sure how to best merge that with this PR , so both results and testme.md in a Gist for now

In short:

  • The testing works with Git Bash on Windows, It needed a bit of a hack due to the differences between *nix and Windows, and my quite limited understanding how MINGW64 tries to overcome these.
  • 2 of the current tests fail : ./test_filesystem.sh: FAIL, ./test_mount.sh: FAIL
    • most of it appear to be changes in path notation, but there are also some other differences such as the extra * in the hash results.

image

https://gist.github.com/Josverl/8b1804c9ea23582bdd675da1338a8d4d

@Josverl
Copy link
Contributor

Josverl commented Jul 9, 2023

@jimmo , I did a quick PoC of using the command APIs

  • It is significantly faster than working via subprocess run
  • the function/method signatures are a bit hard to use , I was able to figure it out - but its no fun needing to go through argparse namespaces for a lot of calls.
    I think this can be solved by some refactoring
  • there are still a lot of functions that print or even exit rather than return a (list of) values . again I think this is more refactoring to do.
  • I think the exposed apis should be properly typed and some docstrings added - happy to pitch in here , ( although Im not sure how to to that efficiently on a PR that i can only read.

Detailed findings in a Jupyter notebook:

@Wind-stormger
Copy link
Contributor

Currently I have only tested this command in windows: mpremote cp -r path/. :./

Both relative and absolute paths are tested.

path/. allows all files or folders in this path to be copied to the root directory of the device without creating another folder with the same name in the root directory of the device.

In contrast, path/ will first create a folder with the same name and then copy the file.

mpremote cp -r path/. :./ will probably be the most commonly used command.

@andrewleech
Copy link
Contributor

I'm not seeing the file hash speedup of copying single files... should I?

cd micropython
gh pr checkout 11777
cd tools/mpremote
python -m pip install .
anl@STEP: ~ $ mpremote --version
mpremote 1.20.0.post226+g6f55e0075d.d20231030
anl@STEP: ~ $ time mpremote cp hello.py :
cp hello.py :
                                        
real    0m1.373s
user    0m0.096s
sys     0m0.000s
anl@STEP: ~ $ time mpremote cp hello.py :
cp hello.py :
                                        
real    0m1.366s
user    0m0.093s
sys     0m0.002s
anl@STEP: ~ $ time mpremote cp hello.py :
cp hello.py :
                                        
real    0m1.416s
user    0m0.071s
sys     0m0.035s

The same command works with mpremote installed in windows

py.exe -m pip install .
PS D:\Downloads> mpremote.exe --version
mpremote 1.20.0.post226+g6f55e0075d.d20231030

PS D:\Downloads> Measure-Command -Expression { mpremote.exe cp hello.py :  | Out-Default } | select TotalSeconds
cp hello.py :

TotalSeconds
------------
    1.5850042

PS D:\Downloads> Measure-Command -Expression { mpremote.exe cp hello.py :  | Out-Default } | select TotalSeconds
cp hello.py :

TotalSeconds
------------
   1.5931215

@zcattacz
Copy link

I am not sure when people would use mpremote cp for host->host operation. I used it a few times, all by mistake.
If this's also true for others, maybe it's reasonable to let mpremote cp make some noise before doing host->host.

@dpgeorge
Copy link
Member

I updated this PR:

  • rebased on latest master, fixed conflicts
  • make tests more deterministic by sorting ls output
  • factor stdout_write_bytes and make it work without sys.stdout.buffer
  • fix copying a file to the host, remove the leading :
  • rename fs hash to fs sha256sum
  • make fs cp always check hash and skip if file is the same, by default (even when copying a single file)
  • add --force/-f option to fs cp to force a copy and not check the hash

@dpgeorge
Copy link
Member

I'm not seeing the file hash speedup of copying single files

That's now fixed, all copying will check the hash first and skip if it's the same file.

@dpgeorge
Copy link
Member

I added some documentation for the new features.

@dpgeorge
Copy link
Member

I did some basic tests on Windows 10, recursive copy to and from the device. It seems to work correctly (including the hash checking if files have changed or not).

@peterhinch
Copy link
Contributor

Is it possible to direct cp, ls etc to a mounted device such as a Pyboard's SD card? Or hardware mounted by main.py?

@Josverl
Copy link
Contributor

Josverl commented Sep 27, 2024

I can do some more tests on w11 with different devices and SD cards over the weekend

@dpgeorge
Copy link
Member

Is it possible to direct cp, ls etc to a mounted device such as a Pyboard's SD card?

Yes, if that's mounted by boot.py (or by the system, eg stm32 will mount SD card automatically), you should be able to just reference it as normal, eg mpremote cp foo :/sd.

@peterhinch
Copy link
Contributor

peterhinch commented Sep 28, 2024

Thanks for that. The required syntax might be more evident if mpremote ls : listed the mounted filesystems

/sd
/flash

On a device with multiple mounted filesystems it's not clear to where mpremote cp foo : would direct the file. I guess one way to resolve the ambiguity would be for mpremote to have a means of listing active mountpoints.

@dpgeorge
Copy link
Member

On a device with multiple mounted filesystems it's not clear to where mpremote cp foo : would direct the file

It defaults to the current directory on the target, which is usually either / or /flash.

I guess one way to resolve the ambiguity would be for mpremote to have a means of listing active mountpoints.

You should be able to do this with mpremote ls /.

Also mpremote df is intended to show all mount points and how much space they have used/remaining.

@mattytrentini
Copy link
Contributor

mattytrentini commented Sep 30, 2024

If anyone wants to test this version of mpremote, one way is to install this PR with uv.

Switch your MicroPython clone to this PR and install (from the root of your clone):

gh pr checkout 11777
uv tool install tools/mpremote

Or if you don't have a handy MicroPython clone, check out Jim's source branch and install:

git clone -b mpremote-fs-api --single-branch https://github.com/jimmo/micropython/ jimmo-micropython
uv tool install jimmo-micropython/tools/mpremote

I'm sure there are other ways! (I couldn't figure out how to install a tool with uv from a subdirectory of a branch of a github repository.)

@peterhinch
Copy link
Contributor

It defaults to the current directory on the target, which is usually either / or /flash.

To the naive user this results in nondeterministic results. On a Pyboard ls : sometimes shows the Flash, sometimes the SD card. In particular the destination of cp foo.py : was opaque. I concluded that support for multiple filesystems was incomplete and used rshell on these targets.

While mpremote works well when you know how, It would be more intuitive and discoverable (IMO) if ls : worked as follows:

  • On a device with one filesystem, as at present.
  • Otherwise show the mountpoints.

@dpgeorge
Copy link
Member

dpgeorge commented Oct 1, 2024

It would be more intuitive and discoverable (IMO) if ls : worked as follows:

* On a device with one filesystem, as at present.

* Otherwise show the mountpoints.

The problem with making ls : show the mount points, is that means it's listing /, ie ls :/. Then when you do a copy, eg cp foo : one would expect that to go in the same location as ls :, but it won't (and it can't because the root is not writable on these systems).

Well, changing ls : to do ls :/ would be a pretty big breaking change, and mean that users now need to do ls :/flash to get the existing behaviour.

@Josverl
Copy link
Contributor

Josverl commented Oct 1, 2024

for mpremote to have a means of listing active mountpoints.
@peterhinch , I agree that this is a hassle , but the cause is not mpremote, in my experience the different ports ( and families) have different behaviors with the 'default path' after a restart.
Another common case is where a previous mpremote command, or an application running on the device could have os.chdir('/elsewhere') and mpremote resume is used to avoid to restart the MCU.

I have not found a way to enumerate the mount points ,
In order to be able to run the same scripts reliably across systems; I have needed to implement a bit of a kludge/ trial and error approach. While it mostly works for my purposes, but I know it will return unreliable results on an esp32 with an SD card mounted, or with folders matching the names of expected mount points.
https://github.com/Josverl/micropython-stubber/blob/main/src/stubber/board/createstubs.py#L652-L666

As I understand it there are two things that would be good to expose:

functionality MicroPython mpremote
cwd os.getcwd() mpremote resume eval 'os.getcwd()'
default cwd at boot custom boot.py mpremote exec "import os;print(os.getcwd())" (Note: No resume - resets MCU)
list mountpoints not possible AFAIK <-

list mountpoints:

  • os.stat() does not return a complete os.stat_result, specifically st_dev / 3rd item in the tuple, is missing / 0. if there is a way can return an mountpoint id if the path is a mountpoint , then that would allow differntiating between a mountpoint and a folder, and thus of enumerating the mountpoints.
  • os.statvfs() just shows info on the vfs that the given path resolves to

then based on that mpremote could have a additional standard or custom command to list the mount mountpoints.

@peterhinch
Copy link
Contributor

The problem with making ls : show the mount points, is that means it's listing /, ie ls :/. Then when you do a copy, eg cp foo : one would expect that to go in the same location as ls :, but it won't (and it can't because the root is not writable on these systems).

I don't follow. I'm suggesting that on a device with a single filesystem, mpremote works exactly as at present.

On a system with multiple mounted filesystems, ls : returns a list of mountpoints. cp foo.py : would throw an error as / is not writeable. The user would issue cp foo.py :/sd/ - as at present. If ls : returns a list of mountpoints it would be evident to the user that cp foo.py : would be expected to fail.

Currently the destination of cp foo.py : cannot be determined without knowledge of the state of the target. It seems reasonable to disallow this apparently nondeterministic behaviour.

If the behaviour is to remain as at present, it should be documented. I have used mpremote daily since its inception (superb tool, btw) and read the docs many times, without realising the capability of handling multiple filesystems existed. I doubt I'm alone.

@dpgeorge
Copy link
Member

dpgeorge commented Oct 3, 2024

@peterhinch @Josverl I think we should open a separate issue about listing mount points and the default target directory when : is used, eg ls :. That discussion is pretty much independent to this PR (which does not change anything in that regard), and it would be good to get this one merged.

@Josverl did you get a chance to test on Windows 11?

@Josverl
Copy link
Contributor

Josverl commented Oct 4, 2024

@dpgeorge,
testing thusfar has been good.
Windows 11 24H2
to ports:

  • stm32 ( to :/sd, :/flash, and ':')
  • samd51
  • rp2040
  • esp32

I have created some pytest tests to repeat across different hardware and MCUs, both with static files and folders that are copied back and forth and compared on essentials. As well as some random generated file content.
I want to add some more folder name variations using Hypothesis, but no issues other that that I needed to special case the stm32 mountpoints in the tests.

I agree that the mountpoint conversions is best handled in its own issue and PR

@dpgeorge
Copy link
Member

dpgeorge commented Oct 4, 2024

OK, thanks for the update. Your testing sounds quite thorough.

I think this is probably ready to be merged now, but if you want to do more tests then let me know when you're done and happy with this.

@dpgeorge
Copy link
Member

dpgeorge commented Oct 9, 2024

I will merge this now. Any further fixes/improvements can happen separately.

jimmo and others added 7 commits October 9, 2024 15:56
This is a step towards making the transport expose a Python API rather than
functions that mostly print to stdout.

Most use cases of `transport.eval()` are to get some state back from the
device, so have it return as a value directly by default.

Updates uses of `transport.eval()` to remove the parse argument where it
now isn't needed, make the `rtc` command use eval/exec, and update the
`mip` command to use eval's parsing.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This introduces a Python filesystem API on `Transport` that is implemented
entirely with eval/exec provided by the underlying transport subclass.

Updates existing mpremote filesystem commands (and `edit) to use this API.

Also re-implements recursive `cp` to allow arbitrary source / destination.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
Changes in this commit:
- Adds transport API `fs_hashfile` to compute the hash of a file with given
  algorithm.
- Adds commands `mpremote <...>sum file` to compute and print hashes of
  various algorithms.
- Adds shortcut `mpremote sha256sum file`.
- Uses the hash computation to improve speed of recursive file copy to
  avoid copying a file where the target is identical.

For recursive copy, if possible it will use the board's support (e.g.
built-in hashlib or hashlib from micropython-lib), but will fall back to
downloading the file and using the local implementation.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
Makes the filesystem command give standard error messages rather than
just printing the exception from the device.

Makes the distinction between CommandError and TransportError clearer.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
These tests are specifically for the command-line interface and cover:
 - resume/soft-reset/connect/disconnect
 - mount
 - fs cp,touch,mkdir,cat,sha256sum,rm,rmdir
 - eval/exec/run

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
This adds a -f/--force option to the "cp" command, which forces
unconditional copies, in particular does not check the hash.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
@dpgeorge dpgeorge merged commit e5eeaa7 into micropython:master Oct 9, 2024
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tools Relates to tools/ directory in source, or other tooling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants