Skip to content

HF component update to not use fsspec components #159405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

ankitageorge
Copy link
Contributor

@ankitageorge ankitageorge commented Jul 29, 2025

Stack from ghstack (oldest at bottom):

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors (30s vs 4s for load of 8b model), which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now. Also, we can use the official methods provided by HF instead of relying on reading the metadata by bytes and loading it

Differential Revision: D78993550

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors, which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now.

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159405

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d3a9c53 with merge base 93da995 (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

  • pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
    /var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78993550

)
super().__init__(
path=path,
serialization_format=SerializationFormat.SAFETENSORS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe open cannot use file like API of fsspec? That is surprising to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya it requires a path unfortunately and doesn't take the file-like API.

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors, which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now.

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta

[ghstack-poisoned]
ankitageorge added a commit that referenced this pull request Aug 1, 2025
Pull Request resolved: #159405

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors, which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now.
ghstack-source-id: 300108423
@exported-using-ghexport

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78993550

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors, which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now.

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta

[ghstack-poisoned]
ankitageorge added a commit that referenced this pull request Aug 1, 2025
Pull Request resolved: #159405

Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors, which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now.
ghstack-source-id: 300168979
@exported-using-ghexport

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78993550

@@ -52,7 +52,7 @@
__all__ = ["HuggingFaceStorageWriter", "HuggingFaceStorageReader"]


class HuggingFaceStorageWriter(FsspecWriter):
class HuggingFaceStorageWriter(FileSystemWriter):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankitageorge If any user needs the FSSpec based component, they should be able to write a wrapper extending the HuggingFaceStorageWriter. Could you please confirm if thats the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya if they write a class to inherit from both HuggingFaceStorageWriter and FsspecWriter, then it should work.

@@ -52,7 +52,7 @@
__all__ = ["HuggingFaceStorageWriter", "HuggingFaceStorageReader"]


class HuggingFaceStorageWriter(FsspecWriter):
class HuggingFaceStorageWriter(FileSystemWriter):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be backward compatible? Its a very new feature so probably no major adoption yet but we enabled a few partners to start using the FSSpec component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only usage I could find is torchtune, but considering it's being deprecated I think it is okay. The flow will just go back to users having to download files from HF hub before running, which was the old experience.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 7, 2025
Update HF components to not inherit from fsspec components and instead use filesystem writer/reader. The reason is because there doesn't seem to be much of a need for fsspec, since users are using mounted storage. Using local storage will allow for performance improvements because we can take advantage of the safe_open API provided by HF safetensors (30s vs 4s for load of 8b model), which is signifcant performance wins over reading bytes and converting to tensors which is what we are doing now. Also, we can use the official methods provided by HF instead of relying on reading the metadata by bytes and loading it

Differential Revision: [D78993550](https://our.internmc.facebook.com/intern/diff/D78993550/)

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D78993550

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #159681

@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #159406

pytorchmergebot pushed a commit that referenced this pull request Aug 7, 2025
Reading the bytes and converting to tensors is much slower than using safe_open. For a 8B model across 8 ranks, took ~30s to load before this change and ~4s after.

Differential Revision: [D78994259](https://our.internmc.facebook.com/intern/diff/D78994259/)

Pull Request resolved: #159406
Approved by: https://github.com/saumishr
ghstack dependencies: #159405
pytorchmergebot pushed a commit that referenced this pull request Aug 7, 2025
Get rid of the logic to read the metadata from the header of the safetensors file manually and use the functions as part of safe_open() to get the metadata. This is much cleaner and allows us to not rely on our own custom methods to get metadata, but use safetensors provided APIs

Differential Revision: [D79460272](https://our.internmc.facebook.com/intern/diff/D79460272/)

Pull Request resolved: #159681
Approved by: https://github.com/saumishr
ghstack dependencies: #159405, #159406
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (checkpoint)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants