-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for annex add --batch and use it if more than a single file (and not directories) is provided #6977
base: master
Are you sure you want to change the base?
Conversation
Note: benchmarks are ok to fail ATM since we are adding all the --batch overhead for **every** `add` ATM due to TEMP `if True`. But they fail not because of slow down but because some calls/ fail (TODO)before after ratio
[dc77a088] [428f3ccc]
<bm/merge-target> <bm/pr>
! 1.87±0.06s failed n/a api.supers.time_createadd
! 1.97±0.09s failed n/a api.supers.time_createadd_to_dataset
failed failed n/a api.supers.time_diff
failed failed n/a api.supers.time_diff_recursive
! 6.98±0.01s failed n/a api.supers.time_installr
! 487±10ms failed n/a api.supers.time_ls
! 2.65±0.04s failed n/a api.supers.time_ls_recursive
! 2.85±0s failed n/a api.supers.time_ls_recursive_long_all
failed failed n/a api.supers.time_remove
! 1.49±0.01s failed n/a api.supers.time_status
! 2.76±0.06s failed n/a api.supers.time_status_recursive
! 120±5ms failed n/a api.supers.time_subdatasets
! 4[11](https://github.com/datalad/datalad/runs/7998607905?check_suite_focus=true#step:8:12)±4ms failed n/a api.supers.time_subdatasets_recursive
! [12](https://github.com/datalad/datalad/runs/7998607905?check_suite_focus=true#step:8:13)3±3ms failed n/a api.supers.time_subdatasets_recursive_first
! 421±10ms failed n/a api.supers.time_uninstall
806±10ms 817±10ms 1.01 api.testds.time_create_test_dataset1
! 5.44±0.01s failed n/a api.testds.time_create_test_dataset2x2
290±3ms 300±8ms 1.04 cli.startup.time_command_execution
286±9ms 295±9ms 1.03 cli.startup.time_command_help_np
292±10ms 296±4ms 1.01 cli.startup.time_command_short_help
625±6ms 640±10ms 1.02 cli.startup.time_help_np
293±7ms 289±2ms 0.99 cli.startup.time_short_help
274±3ms 272±5ms 1.00 cli.startup.time_usage_advice
184±8ms 188±6ms 1.02 core.startup.time_import
6[13](https://github.com/datalad/datalad/runs/7998607905?check_suite_focus=true#step:8:14)±5ms 619±3ms 1.01 core.startup.time_import_api
3.19±0.03ms 3.20±0.08ms 1.01 core.witlessrunner.time_echo
3.76±0.07ms 3.64±0.04ms 0.97 core.witlessrunner.time_echo_gitrunner
3.98±0.04ms 4.40±0.3ms ~1.10 core.witlessrunner.time_echo_gitrunner_fullcapture
738±10ms 757±20ms 1.03 plugins.addurls.addurls1.time_addurls('*')
2.38±0.05s 2.46±0.05s 1.03 plugins.addurls.addurls1.time_addurls(None)
! 20.4±2ms failed n/a repo.gitrepo.time_get_content_info
8.89±0.4ms 8.68±0.02ms 0.98 support.path.get_parent_paths.time_allsubmods_toplevel
8.11±0.2ms 8.01±0ms 0.99 support.path.get_parent_paths.time_allsubmods_toplevel_only
392±0.3ns 4[15](https://github.com/datalad/datalad/runs/7998607905?check_suite_focus=true#step:8:16)±5ns 1.06 support.path.get_parent_paths.time_no_submods
6.96±0.03ms 6.97±0.03ms 1.00 support.path.get_parent_paths.time_one_submod_subdir
7.06±0.09ms 7.[21](https://github.com/datalad/datalad/runs/7998607905?check_suite_focus=true#step:8:22)±0.6ms 1.02 support.path.get_parent_paths.time_one_submod_toplevel
57.4±0.2s 57.0±0.3s 0.99 usecases.study_forrest.time_make_studyforrest_mockup |
# python -m nose -s -v datalad/support/tests/test_annexrepo.py:test_files_split | ||
# | ||
# It seems that BatchedCommand lost `yield_` in | ||
# 9f066efa6a0b316a133b6d6d0d4e4c2e1b856ce0 and ability to be used as a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yield_
was removed because it was not used anymore. It is no problem to bring it back, the proc1
-method still exists and exhibits the original semantics.
If you have specific requirements that go beyond the original yield_
, I am happy to incorporate them :-) (the current runner-based BatchedCommand-implementation does faithfully recreate the behavior of the previous implementation with no further extensions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no immediate other requirements -- just want to have a generator to avoid the overhead of pre/post logic which we otherwise invoke per each file. Do you think you could provide needed changes within this PR @christian-monch ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @christian-monch on this elderly request -- I am back to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into the required changes
for f in files: | ||
yield batched_get.proc1(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is the replacement for yield_
, if files is a list of strings.
59f2a5a
to
05f9560
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #6977 +/- ##
===========================================
- Coverage 91.59% 73.00% -18.59%
===========================================
Files 325 354 +29
Lines 43427 46347 +2920
Branches 5821 6613 +792
===========================================
- Hits 39777 33837 -5940
- Misses 3635 12486 +8851
- Partials 15 24 +9
☔ View full report in Codecov by Sentry. |
FTR: rebased, pushed. |
Yarik said: If he doesn't get to it in the next week, feel free to close |
05f9560
to
a0c365f
Compare
I pushed into |
a0c365f
to
fdd1df9
Compare
Closes datalad#7614 === Do not change lines below === { "chain": [], "cmd": "git-sedi JSOB JSON", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
fdd1df9
to
02ba927
Compare
Code Climate has analyzed commit 02ba927 and detected 3 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
Resurrects #5722 which could not be reopened since I force-pushed
If ever finished, Closes #5721, #6977 (hopefully)
If to proceed
if True
to always try batch mode0.92 plugins.addurls.addurls1.time_addurls('*')
is happierbatch=None
logic handling is sound (taken fromget_file_key
), so I would just keep it but could be improved to just reuse a batchedadd
if already one running.total_nbytes
(I did not even look into what it is about)