Skip to content

Fix DataLoader to Pass List to getitems When Using BatchSampler. Fixes Issue_#154810 #154844

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

aritra3520
Copy link

@aritra3520 aritra3520 commented Jun 2, 2025

This PR fixes an issue where PyTorch's DataLoader did not always pass a list of indices to custom Datasets implementing getitems when using a BatchSampler. Instead, a single integer could be passed, causing 2D tensor outputs to be incorrectly flattened in the collate function.

Root cause:
In MapDatasetFetcher.fetch, the type of indices (possibly_batched_index) was not checked before being passed to getitems. If the indices were not already a list, this led to unexpected results for batch-aware Datasets.

What’s fixed:

Ensured that DataLoader always passes a list to __getitems__ if a list is expected, by wrapping possibly_batched_index as a list if needed.

Test coverage:

Added TestDataLoaderBatchSamplerList.test_passes_list_to_getitems in test/test_dataloader.py to verify that, when using a BatchSampler, __getitems__ receives a list and returns correctly shaped batched tensors.

Fixes #154810

Copy link

pytorch-bot bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154844

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Cancelled Jobs, 1 Unrelated Failure

As of commit 3a2b25e with merge base ff4515f (image):

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Jun 2, 2025
@aritra3520
Copy link
Author

@atalman @malfet , can you people take a look?

@malfet
Copy link
Contributor

malfet commented Jun 2, 2025

@atalman @malfet , can you people take a look?

@aritra3520 why are you pinging people the minute you create an issue?

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description lacks any explanation of what is fixed and how (I.e. title should not simply reference the issue number )

Also please add test

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 2, 2025
@aritra3520
Copy link
Author

@atalman @malfet , can you people take a look?

@aritra3520 why are you pinging people the minute you create an issue?

Hi I am sorry for any inconvenience caused, actually i have started new in this upstream contribuiton. Actually I was unaware of the protocols that should be maintained. I will adhere to them in future. But again am really sorry.

Also thank you for the review I will surely make the changes that you stated.

@aritra3520 aritra3520 changed the title Fixes Issue_##154810 Fix DataLoader to Pass List to getitems When Using BatchSampler. Fixes Issue_#154810 Jun 3, 2025
@aritra3520
Copy link
Author

PR description lacks any explanation of what is fixed and how (I.e. title should not simply reference the issue number )

Also please add test

I have updated with the necessary changes

@Skylion007 Skylion007 requested a review from malfet June 3, 2025 18:09
Copy link
Contributor

github-actions bot commented Aug 6, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open source release notes: dataloader release notes category Stale triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-dimensional tensors in datasets might get incorrectly flattened when fetching data from dataloader which is specified 'batch_sampler' when created
4 participants