Update RandomSampler docstring. data_source must be Sized not Dataset #158857

Status: Open (2 commits to be merged into base branch main)

dsashidh
Contributor

Fixes #158631

The docstring said data_source was a Dataset, but RandomSampler only needs an object that implements `__len__`. This PR updates the docstring to say Sized instead, which matches the type annotation already used in the constructor.
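
To illustrate what "Sized" means here, the following is a minimal sketch using only the Python standard library. `LengthOnly` and `NoLength` are hypothetical classes made up for this example and are not part of PyTorch. It shows that `collections.abc.Sized` is satisfied by any object defining `__len__`, whether or not it is a Dataset.

```python
from collections.abc import Sized


class LengthOnly:
    """Hypothetical container: defines __len__ but is not a Dataset."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __len__(self) -> int:
        return self._n


class NoLength:
    """Hypothetical object without __len__; it does not satisfy Sized."""


# Sized is a structural check: any class that defines __len__ passes.
print(isinstance(LengthOnly(3), Sized))              # True
print(isinstance([10, 20, 30], Sized))               # True
print(isinstance(range(5), Sized))                   # True
print(isinstance(NoLength(), Sized))                 # False
print(isinstance((x for x in range(3)), Sized))      # False: generators have no __len__
```

Under this reading, a list, a range, or a custom length-only object would all be acceptable as data_source, which is why Sized is the more accurate description.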

pytorch-bot bot commented Jul 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158857

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 4 Unrelated Failures

As of commit 911c8b2 with merge base 7a08755:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the "release notes: dataloader" label (release notes category) on Jul 22, 2025
@dsashidh
Contributor Author

@pytorchbot label "topic: not user facing"

Contributor

@divyanshk divyanshk left a comment

Can we also update the docstrings for the Sampler class and the SequentialSampler class?

@dsashidh
Contributor Author

> Can we also update the docstrings for the Sampler class and the SequentialSampler class?

Hi @divyanshk, I updated the docstring in SequentialSampler to use Sized instead of Dataset, since it calls len(data_source) just like RandomSampler does.

I left Sampler as is, because its data_source argument is unused and marked for removal in 2.2.0. Changing the type there didn't seem necessary to me, but please let me know if you'd prefer that I update that one too.
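
The point about both samplers only calling len(data_source) can be sketched with toy stand-ins. This is an illustration under stated assumptions, not the PyTorch implementation: `ToySequentialSampler`, `ToyRandomSampler`, `OnlyLen`, and the `seed` parameter are hypothetical names invented for this example.

```python
import random
from typing import Iterator, Sized


class ToySequentialSampler:
    """Illustrative stand-in, NOT torch.utils.data.SequentialSampler."""

    def __init__(self, data_source: Sized) -> None:
        self.data_source = data_source

    def __iter__(self) -> Iterator[int]:
        # Only the length is consulted; elements are never accessed.
        return iter(range(len(self.data_source)))

    def __len__(self) -> int:
        return len(self.data_source)


class ToyRandomSampler(ToySequentialSampler):
    """Illustrative stand-in, NOT torch.utils.data.RandomSampler."""

    def __init__(self, data_source: Sized, seed: int = 0) -> None:
        super().__init__(data_source)
        self._rng = random.Random(seed)  # hypothetical seeding, for repeatability

    def __iter__(self) -> Iterator[int]:
        # Shuffle the index range; again only len(data_source) is needed.
        indices = list(range(len(self.data_source)))
        self._rng.shuffle(indices)
        return iter(indices)


class OnlyLen:
    """Hypothetical object with __len__ but no __getitem__."""

    def __len__(self) -> int:
        return 4


print(list(ToySequentialSampler(OnlyLen())))      # [0, 1, 2, 3]
print(sorted(ToyRandomSampler(OnlyLen())))        # [0, 1, 2, 3]
```

Because `OnlyLen` cannot be indexed at all, the sketch only works if the samplers never touch individual elements, which is the behavior the Sized annotation describes.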

@dsashidh dsashidh requested a review from divyanshk July 23, 2025 17:50
@soulitzer added the "triaged" label (This issue has been looked at a team member, and triaged and prioritized into an appropriate module) on Jul 23, 2025
Contributor

@divyanshk divyanshk left a comment

Looks good to me! Thank you!

@divyanshk
Contributor

@pytorchbot merge

@pytorch-bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) on Aug 6, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@divyanshk
Contributor

Hi @dsashidh, can you resolve the CI errors? Thanks.

@dsashidh
Contributor Author

dsashidh commented Aug 7, 2025

> Hi @dsashidh, can you resolve the CI errors? Thanks.

Hi @divyanshk, the CI failures appear to be permission-related issues rather than problems with the code change:

ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/var/lib/jenkins/ci_env/bin/uv'
Check the permissions.

Could you please re-trigger the CI run when you get a chance? Thanks!

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
open source
release notes: dataloader (release notes category)
topic: not user facing (topic category)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RandomSampler docs wrongly states type for data_source is Dataset, when it is Sized
5 participants