Skip to content

Fix UB, panic and compiler error in bevy_tasks #20352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 10, 2025

Conversation

NthTensor
Copy link
Contributor

@NthTensor NthTensor commented Jul 31, 2025

Objective

Fixes several issues with bevy_tasks uncovered by @IceSentry and @Ratysz:

  • For web builds, Task::is_finished is missing, causing compile failures for cross-platform apps.
  • Scopes running on the single-threaded executor have possible UB due to mistakenly using a 'env bound instead of a 'scope lifetime bound on the scope ref.
  • Scopes running on the single-threaded executor fail to poll tasks spawned on them to completion when all tasks go to sleep at the same time (causing panics and potential UB).

Solution

  • Replace futures_channel::oneshot with async_channel::bounded(1), and use Receiver::is_empty to implement Task::finished. The async_channel channel crate was chosen because it allows Task to be Send + Sync + Unpin and because it has a special-case oneshot implementation for bounded size-one channels (inherited from concurrent-queue).
  • Switch around the incorrect lifetime bounds.
  • Use referencing counting to spin while ticking the executor until all tasks spawned on a scope are complete.

Since these are all fairly small changes I rolled them together into a single PR. I can split any of the three out if they prove controversial.

Testing

  • Changes are currently untested. I need to check that this fixes the bugs we've been seeing at work, and I'd like to run this through miri.

It's concerning to me that miri failed to catch the bad lifetime bound on the scope impl.

@NthTensor NthTensor added this to the 0.17 milestone Jul 31, 2025
@NthTensor NthTensor added D-Trivial Nice and easy! A great choice to get started with Bevy P-Crash A sudden unexpected crash A-Tasks Tools for parallel and async work P-Compile-Failure A failure to compile Bevy apps P-Unsound A bug that results in undefined compiler behavior labels Jul 31, 2025
@NthTensor NthTensor added the S-Needs-Review Needs reviewer attention (from anyone!) to move forward label Jul 31, 2025
@NthTensor NthTensor requested review from Ratysz and IceSentry July 31, 2025 15:53
Copy link
Contributor

@MalekiRe MalekiRe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@NthTensor
Copy link
Contributor Author

Looks like EdgeExecutor is going to make this annoying.

@james7132
Copy link
Member

For web builds, Task::is_finished is missing, causing compile failures for cross-platform apps.

I'm surprised this was not caught by CI. Are we somehow skipping web builds?

@NthTensor
Copy link
Contributor Author

NthTensor commented Jul 31, 2025

I'm surprised this was not caught by CI. Are we somehow skipping web builds?

There's nothing that would enforce this ATM. I don't think we have tests to enforce that the wasm Task impl has the same API as async_tasks::Task. If it's not used in our examples it won't get caught by CI.

@NthTensor
Copy link
Contributor Author

I circulated this at work have have gotten some internal feedback. Working on some fixes, and marking this as waiting-on-author for the moment.

@NthTensor NthTensor added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jul 31, 2025
@NthTensor
Copy link
Contributor Author

Pushed a change that should fix some of the missing trait bounds on wasm tasks (Send + Sync + Unpin). I've checked the docs and wasm tasks now implement exactly the same traits as the ones from async_tasks.

@NthTensor
Copy link
Contributor Author

NthTensor commented Aug 1, 2025

I'd like to get #19091 merged so I can rebase my work on it. Anyone interested in this PR should go review that one too.

Edit: This was adopted as #20369.

@NthTensor NthTensor added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Aug 5, 2025
@alice-i-cecile
Copy link
Member

@NthTensor what are your plans here for the 0.17 release?

@NthTensor
Copy link
Contributor Author

I need to make this panic safe, @hymm has pointed out an place where UB can be caused by causing a panic in a scope and then later catching it. It's partly done, should be ready this evening. After that, I think we should be good to merge.

@NthTensor
Copy link
Contributor Author

NthTensor commented Aug 7, 2025

I've addressed the feedback from the latest reviews. Most notably:

  • I've chosen to be less aggressive with refactoring the existing code.
  • I've chosen to use a fresh executor for every scope, which mostly eliminates the UB concern.
  • I've switched back from wrapping LocalExecutor to be send to just Executor.
  • I've added a test for the scope issue. It crashes on main but passes here.

Seems like we are pretty close to being merge ready.

@NthTensor NthTensor requested review from hymm and james7132 August 7, 2025 03:22
@NthTensor NthTensor added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Aug 7, 2025
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though there's two things worth trying to see if we could improve the performance/readability here.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Aug 9, 2025
@NthTensor
Copy link
Contributor Author

Alright, last round of feedback addressed. As follow up to this, I want to investigate integrating async_task directly with web_bindgen_futures. But that's a longer term initiative.

I think this should be good for merge now, but I'm going to go ahead and re-request reviews.

@NthTensor NthTensor requested a review from james7132 August 10, 2025 03:21
@james7132 james7132 added this pull request to the merge queue Aug 10, 2025
Merged via the queue into bevyengine:main with commit 7681b25 Aug 10, 2025
36 checks passed
gwafotapa pushed a commit to gwafotapa/bevy that referenced this pull request Aug 12, 2025
# Objective

Fixes several issues with `bevy_tasks` uncovered by @IceSentry and
@Ratysz:
+ For web builds, `Task::is_finished` is missing, causing compile
failures for cross-platform apps.
+ Scopes running on the single-threaded executor have possible UB due to
mistakenly using a `'env` bound instead of a `'scope` lifetime bound on
the scope ref.
+ Scopes running on the single-threaded executor fail to poll tasks
spawned on them to completion when all tasks go to sleep at the same
time (causing panics and potential UB).

## Solution

- Replace `futures_channel::oneshot` with `async_channel::bounded(1)`,
and use `Receiver::is_empty` to implement `Task::finished`. The
`async_channel` channel crate was chosen because it allows `Task` to be
`Send + Sync + Unpin` and because it has a special-case oneshot
implementation for bounded size-one channels (inherited from
`concurrent-queue`).
- Switch around the incorrect lifetime bounds.
- Use referencing counting to spin while ticking the executor until all
tasks spawned on a scope are complete.

Since these are all fairly small changes I rolled them together into a
single PR. I can split any of the three out if they prove controversial.

## Testing

- Changes are currently untested. I need to check that this fixes the
bugs we've been seeing at work, and I'd like to run this through miri.

It's concerning to me that miri failed to catch the bad lifetime bound
on the scope impl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Tasks Tools for parallel and async work D-Async Deals with asynchronous abstractions D-Complex Quite challenging from either a design or technical perspective. Ask for help! D-Unsafe Touches with unsafe code in some way P-Compile-Failure A failure to compile Bevy apps P-Crash A sudden unexpected crash P-Unsound A bug that results in undefined compiler behavior S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants