Remove usage of fsspec in HF consolidation script #159392
Moving towards supporting only local storage, to take advantage of HF APIs such as safe_open. This PR removes the fsspec usages and relies on local storage only.
Differential Revision: [D78997975](https://our.internmc.facebook.com/intern/diff/D78997975/)
Dr. CI: ✅ No failures as of commit d3f0cda with merge base eed9dbf. Artifacts and rendered test results: hud.pytorch.org/pr/159392
This pull request was exported from Phabricator. Differential Revision: D78997975
return f.read(end_offset - start_offset)
# Use mmap for efficient access
with open(file_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
Put both context managers on the same with statement to reduce unnecessary indentation.
This doesn't work because the mmap call needs to be nested in the open.
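For reference, a minimal sketch of the two forms discussed in this thread. Per the Python language reference, a `with` statement with several items behaves like nested `with` statements, so the name bound by the first item is already available when the second item is evaluated. The function names `read_range_nested` and `read_range_flat` and their parameters are hypothetical and are not taken from the PR.

```python
import mmap


def read_range_nested(file_path: str, start: int, end: int) -> bytes:
    """Read bytes [start, end) using nested with statements, as in the diff."""
    with open(file_path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:end]  # slicing an mmap returns a bytes copy


def read_range_flat(file_path: str, start: int, end: int) -> bytes:
    """Same read with both context managers on one with statement."""
    # Items are entered left to right, so `f` is bound before mmap.mmap runs.
    with open(file_path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        return mm[start:end]
```

Both functions close the mapping before the file. Which form reads better is a style choice; the nested form keeps each resource on its own line.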
output_files_data[output_path] = _OutputFileData()

# Find all safetensors files in the input directory
safetensors_files = []
for file in input_fs.ls(input_dir, detail=False):
for file in os.listdir(input_dir):
Since this is now local, why not just use glob or iglob with the suffix? That would remove the need for the os.path joins below, and it would turn the loop into one line: glob.glob(os.path.join(input_dir, f"*{SUFFIX}"))
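A minimal sketch comparing the `os.listdir` loop with the suggested `glob.glob` one-liner. The value of `SUFFIX` and the function names `list_with_listdir` and `list_with_glob` are assumptions for illustration; the PR's actual constant and surrounding code are not shown in this thread.

```python
import glob
import os

SUFFIX = ".safetensors"  # assumed value; the PR's constant is not shown here


def list_with_listdir(input_dir: str) -> list[str]:
    """Filter os.listdir results by suffix and join each name with the directory."""
    files = []
    for name in os.listdir(input_dir):
        if name.endswith(SUFFIX):
            files.append(os.path.join(input_dir, name))
    return sorted(files)


def list_with_glob(input_dir: str) -> list[str]:
    """Let glob do the filtering; the results already include the directory."""
    return sorted(glob.glob(os.path.join(input_dir, f"*{SUFFIX}")))
```

The two are not strictly identical: by default `*` in `glob` does not match file names that begin with a dot, and characters such as `[` in `input_dir` are treated as pattern syntax unless the directory is passed through `glob.escape`.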
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Stack from ghstack (oldest at bottom):
Moving towards supporting only local storage, to take advantage of HF APIs such as safe_open. This was already done in the Storage component in #159405. This PR removes the fsspec usages in the consolidation script and relies on local storage only.
Differential Revision: D78997975
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta