Remove usage of fsspec in HF consolidation script #159392
Moving towards supporting only local storage, to take advantage of HF APIs such as safe_open. This PR removes fsspec usages and relies on local storage only. Differential Revision: [D78997975](https://our.internmc.facebook.com/intern/diff/D78997975/)
This pull request was exported from Phabricator. Differential Revision: D78997975
return f.read(end_offset - start_offset)
# Use mmap for efficient access
with open(file_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
Put both context managers in the same with statement to avoid the unnecessary indent.
This doesn't work because the mmap call needs to be nested inside the open.
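For reference, Python's with statement accepts several context managers in one statement and enters them left to right, so a later item can use a name bound by an earlier one. The sketch below compares the nested and single-statement spellings on a throwaway file. `read_range_nested`, `read_range_flat`, and `demo` are hypothetical helpers written for this illustration, not functions from the PR, and the file is assumed to be non-empty (mmap cannot map an empty file with length 0).

```python
import mmap
import os
import tempfile


def read_range_nested(file_path, start_offset, end_offset):
    """Read bytes [start_offset, end_offset) using nested with statements."""
    with open(file_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing an mmap object returns a bytes copy of that range.
            return mm[start_offset:end_offset]


def read_range_flat(file_path, start_offset, end_offset):
    """Same read with both context managers in one with statement."""
    # `f` is already bound when the second item is evaluated.
    with open(file_path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        return mm[start_offset:end_offset]


def demo():
    """Write a small file and read the same range with both helpers."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "blob.bin")  # hypothetical file name
        with open(path, "wb") as out:
            out.write(b"0123456789abcdef")
        return read_range_nested(path, 4, 10), read_range_flat(path, 4, 10)
```

Both helpers close the mapping before the file, in reverse order of entry, so the choice between them is about indentation rather than behavior.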
  output_files_data[output_path] = _OutputFileData()

  # Find all safetensors files in the input directory
  safetensors_files = []
- for file in input_fs.ls(input_dir, detail=False):
+ for file in os.listdir(input_dir):
Since this is now local, why not just use glob or iglob with the suffix? That would remove the need for the os.path joins below and turn the loop into one line: glob.glob(os.path.join(input_dir, f"*{SUFFIX}"))
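The suggestion above can be sketched roughly as follows. `SUFFIX` is assumed here to be ".safetensors", and `find_with_listdir`, `find_with_glob`, and `demo` are hypothetical names for illustration; the PR's actual constant and loop body are not shown in this excerpt.

```python
import glob
import os
import tempfile

SUFFIX = ".safetensors"  # assumption: the real constant lives in the PR's module


def find_with_listdir(input_dir):
    """List matching files by filtering os.listdir and joining paths by hand."""
    return sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if name.endswith(SUFFIX)
    )


def find_with_glob(input_dir):
    """List matching files with a single glob call, as the reviewer suggests."""
    return sorted(glob.glob(os.path.join(input_dir, f"*{SUFFIX}")))


def demo():
    """Create a scratch directory and compare the two approaches."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name in (
            "model-00001-of-00002.safetensors",
            "model-00002-of-00002.safetensors",
            "config.json",
        ):
            with open(os.path.join(tmp_dir, name), "wb"):
                pass  # empty placeholder files are enough for listing
        via_listdir = [os.path.basename(p) for p in find_with_listdir(tmp_dir)]
        via_glob = [os.path.basename(p) for p in find_with_glob(tmp_dir)]
        return via_listdir, via_glob
```

Two differences are worth keeping in mind if the loop is switched over: a `*` pattern in glob skips names that start with a dot, which os.listdir returns, and glob treats characters such as `[` or `*` in `input_dir` as pattern syntax unless the directory is passed through glob.escape first.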
Stack from ghstack (oldest at bottom):
Moving towards supporting only local storage, to take advantage of HF APIs such as safe_open. This was already done in the Storage component in #159405. This PR removes the fsspec usages in the consolidation script and relies on local storage only.
Differential Revision: D78997975
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta