Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to a single GPU context in Blade #20853

Merged
merged 4 commits into from
Dec 18, 2024

Conversation

kvark
Copy link
Contributor

@kvark kvark commented Nov 19, 2024

Closes #17005

Release Notes:

  • Improved GPU context management: share a single context with multiple surfaces.

High Level

Blade got a proper support for Surface objects in kvark/blade#203.
That was mainly motivated by Zed needing to draw multiple windows. With the Surface API, Zed is now able to have the GPU context tied to the "Platform" instead of "Window". Practically speaking, this means:

  • architecture more sound
  • faster to open/close windows
  • less surprises, more robust

Concerns

  1. Zed has been using a temporary workaround for the platform bug on some Intel+Nvidia machines that makes us unable to present - Try to actually acquire a frame on Intel/Linux upon adapter inspection kvark/blade#144 . This workaround is no longer available with the new architecture. I'm looking for ideas on how to approach this better.
  2. Metal-rs dependency is switched to https://github.com/kvark/metal-rs/tree/blade, since upstream isn't responsive in merging changes that are required for Blade. Hopefully, temporary.
    • we can also hack around it by just transmuting the texture references, since we know those are unchanged in the branch. That would allow Blade to use it's own version of Metal, temporarily, if switching metal-rs in the workspace is a concern.
    • merged my metal-rs changes and updated Zed to use the upstream github reference

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Nov 19, 2024
@kvark kvark mentioned this pull request Nov 19, 2024
5 tasks
@kvark kvark force-pushed the blade-window branch 7 times, most recently from d3ff403 to 677c1a6 Compare November 21, 2024 06:43
@kvark kvark marked this pull request as ready for review November 21, 2024 06:52
@kvark kvark changed the title (WIP) Switch to a single GPU context in Blade Switch to a single GPU context in Blade Nov 21, 2024
@mikayla-maki
Copy link
Contributor

This workaround is no longer available with the new architecture.

Does merging this PR mean dropping support for those Intel and Nvidia machines, at least temporarily?

@kvark
Copy link
Contributor Author

kvark commented Nov 21, 2024

Does merging this PR mean dropping support for those Intel and Nvidia machines, at least temporarily?

Yes, I think so. One of the challenges here is to even gather information to understand who is affected.

We know it's about Linux systems with both Intel and Nvidia. I think, both X11 and Wayland are affected, possibly in different ways.
It's a 3 years old upstream issue with the platform, which is still unresolved - https://gitlab.freedesktop.org/mesa/mesa/-/issues/4688

The current workaround in Blade is checking for the prime-select:

// See https://github.com/canonical/nvidia-prime/blob/587c5012be9dddcc17ab4d958f10a24fa3342b4d/prime-select#L56
fn is_nvidia_prime_forced() -> bool {
    match fs::read_to_string("/etc/prime-discrete") {
        Ok(contents) => contents == "on\n",
        Err(_) => false,
    }
}

However, I recall issue threads in Zed where users were still having issues, and thus we tried other things. But I didn't get sufficient feedback about what actually worked. That's the most recent I see - #17022 (comment)

If I had, say, vulkaninfo and other system outputs from all the machines where the "main" workaround failed, it would certainly help to look for a better alternative. In the meantime, I'll try to find those issue threads, upstream issues, and other projects dealing with the same thing (e.g. wgpu), and come up with ideas.

@kvark
Copy link
Contributor Author

kvark commented Nov 21, 2024

Here is what I gathered - kvark/blade#205
Looks like there are at least 3 issues mixed in under this umbrella:

  1. X11 Nvidia Optimus systems, which are unable to present with Intel.
  2. Wayland Nvidia Optimus systems, which are unable to present with Nvidia.
  3. systems where the preferred GPU is indeed the first, but it's rejected because of the older Vulkan driver.

mgsloan added a commit that referenced this pull request Nov 22, 2024
* Fixes registration of event handler for xinput-2 device changes,
revealed by this improvement.

* Pushes `.unwrap()` panic-ing outwards to callers.

* Includes a description of what the X11 call was doing when a failure
was encountered.

* Fixes a variety of places where the X11 reply wasn't being inspected
for failures.

* Destroys windows on failure during setup. New structure makes it
possible for the caller of `open_window` to carry on despite failures,
and so partially initialized window should be removed (though all
calls I looked at also panic currently).

Considered pushing this through `linux/x11/client.rs` too but figured it'd be nice to minimize merge conflicts with #20853.
mgsloan added a commit that referenced this pull request Nov 22, 2024
…21079)

* Fixes registration of event handler for xinput-2 device changes,
revealed by this improvement.

* Pushes `.unwrap()` panic-ing outwards to callers.

* Includes a description of what the X11 call was doing when a failure
was encountered.

* Fixes a variety of places where the X11 reply wasn't being inspected
for failures.

* Destroys windows on failure during setup. New structure makes it
possible for the caller of `open_window` to carry on despite failures,
and so partially initialized window should be removed (though all calls
I looked at also panic currently).

Considered pushing this through `linux/x11/client.rs` too but figured
it'd be nice to minimize merge conflicts with #20853.

Release Notes:

- N/A
@kvark kvark force-pushed the blade-window branch 2 times, most recently from 0a192a8 to 0822e3d Compare November 23, 2024 04:45
@kvark
Copy link
Contributor Author

kvark commented Nov 24, 2024

I think we should proceed with this and have a bold note around the Linux/Windows releases recommending to update the driver in case Zed fails to start. That is the best solution for both X11/Wayland presentation issues and "KHR_dynamic_rendering".

@kvark
Copy link
Contributor Author

kvark commented Nov 24, 2024

Also thinking that it would help for the users to have a workaround in case they are affected, and the user isn't able to fix it at the platform level (e.g. by updating the drivers). Perhaps, we could have a Zed configuration option to force a specific Device ID? Feedback on kvark/blade#207 would be welcome!

@notpeter
Copy link
Member

If this gets merged and will negatively impact many users, we should consider doing it on a Wednesday afternoon after releases. That way we'll get 7days of feedback in dev and the opportunity to gather data / improve error messaging before it lands in Preview.

@kvark kvark force-pushed the blade-window branch 3 times, most recently from 3da76e1 to c5e48f9 Compare December 7, 2024 04:58
@mikayla-maki
Copy link
Contributor

Apologies for the long wait on a review.

Summary of the problem:

  • We're currently using a branch of Blade that does some extra checks of the available adapters, so that we can automatically select the adapters without these bugs
  • This PR removes the capacity to run those checks, meaning that Blade could select an adapter that we can't actually use for opaque driver bug related reasons
  • Users updating their drivers will probably fix the issue.
  • If users are still having problems, we could use new Blade APIs to let users point Blade to select a specific device ID, which could be another way to solve the problem.
  • This PR should be merged anyway, because we get better window opening performance (no need to re-initialize Blade)

If that's the state of the world, I think I'm ok with merging this, with a big warning label that you might need to update your drivers. I'd like to get a second opinion from @ConradIrwin first though.

@gretchenfrage
Copy link

gretchenfrage commented Dec 17, 2024

FWIW, my current workflow for using Zed on Linux (Wayland) is to check out this branch and run it.

@ConradIrwin ConradIrwin merged commit 298b9df into zed-industries:main Dec 18, 2024
12 checks passed
@ConradIrwin
Copy link
Member

Alright, let's give this a whirl!

We won't release next week, so this will get two-weeks to bake on main, and will hit preview in ~2 weeks.

@kvark kvark deleted the blade-window branch December 19, 2024 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed The user has signed the Contributor License Agreement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Closing a window crashes other Zed windows
5 participants