
Add Green Context Support for fsdp2 #160272

@shadow150519

Description


🚀 The feature, motivation and pitch

PyTorch relies on CUDA streams and events to overlap computation and communication. However, overlapped kernels can interfere with one another, causing throughput degradation and unstable performance. We propose introducing CUDA Green Contexts [1] in fsdp2 to provide contexts that isolate SM resources for computation and communication. For example, on an H100 we can split its 132 SMs into two partitions: 104 SMs for compute and 24 SMs for communication (wasting 4 SMs, since green contexts require each partition's SM count to be a multiple of 8).
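The partition arithmetic above can be sketched in a few lines of plain Python (no GPU needed; the helper name `split_sms` is illustrative):

```python
# Sketch of the SM-partition arithmetic described above. Green contexts
# allocate SMs in multiples of 8 on H100, so a 104/24 split of 132 SMs
# leaves 4 SMs unused.
def split_sms(total_sms: int, comm_sms: int, granularity: int = 8):
    """Round each partition to the granularity and report wasted SMs."""
    comm = (comm_sms // granularity) * granularity          # round comm down
    compute = ((total_sms - comm) // granularity) * granularity
    wasted = total_sms - compute - comm
    return compute, comm, wasted

print(split_sms(132, 24))  # -> (104, 24, 4)
```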
FlashInfer [2] has already integrated this experimental feature into their framework using the APIs provided by cuda-python.
I think a naive implementation would be:
1. Split the SMs into two green contexts, one for computation and one for communication (a single context for all-gather/reduce-scatter/all-reduce).
2. Create streams from each context: one stream for overlapped compute, and for communication, replace the normal CUDA streams with streams created from the communication green context.
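The two steps above might look roughly like the following with cuda-python. The driver entry points named here (`cuDeviceGetDevResource`, `cuDevSmResourceSplitByCount`, `cuDevResourceGenerateDesc`, `cuGreenCtxCreate`, `cuGreenCtxStreamCreate`) are real green-context driver APIs from CUDA 12.4+, but the exact cuda-python signatures, enum names, and return tuples below are assumptions; treat this as a sketch, and note that a real fsdp2 integration would likely live in PyTorch's C++ layer instead:

```python
# Hedged sketch: split SMs into compute/communication green contexts and
# create one stream per context via cuda-python's driver bindings.
# Exact Python-binding signatures are assumptions; only the underlying
# CUDA driver functions are confirmed to exist (CUDA >= 12.4).
try:
    from cuda import cuda  # cuda-python driver bindings
    HAVE_CUDA_PYTHON = True
except ImportError:
    HAVE_CUDA_PYTHON = False

def create_green_streams(comm_sms: int = 24):
    """Split the device's SMs and return (compute_stream, comm_stream)."""
    cuda.cuInit(0)
    _, dev = cuda.cuDeviceGet(0)
    # Query the device's full SM resource.
    _, sm_resource = cuda.cuDeviceGetDevResource(
        dev, cuda.CUdevResourceType.CU_DEV_RESOURCE_TYPE_SM)
    # Carve out one group of `comm_sms` SMs; the remainder goes to compute.
    _, comm_groups, _, compute_remainder = cuda.cuDevSmResourceSplitByCount(
        1, sm_resource, 0, comm_sms)
    streams = []
    for resource in (compute_remainder, comm_groups[0]):
        _, desc = cuda.cuDevResourceGenerateDesc([resource], 1)
        _, gctx = cuda.cuGreenCtxCreate(
            desc, dev, cuda.CUgreenCtxCreate_flags.CU_GREEN_CTX_DEFAULT_STREAM)
        _, stream = cuda.cuGreenCtxStreamCreate(
            gctx, cuda.CUstream_flags.CU_STREAM_NON_BLOCKING, 0)
        streams.append(stream)
    return tuple(streams)
```

The resulting stream handles could then be wrapped for use from PyTorch (e.g. via `torch.cuda.ExternalStream`), so fsdp2's communication ops run on the green-context stream.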
Since we want to use the green-context stream for overlapped computation and the default stream for non-overlapped computation, we need a hook-like mechanism that switches the current CUDA stream to the green-context stream just before an all-gather or reduce-scatter call begins, and switches back to the default stream when the communication finishes, so that we can fully utilize all GPU resources. However, I'm not sure whether such frequent stream switching would introduce significant overhead. Perhaps instead of swapping streams back and forth, we could dispatch different kernels onto two separate streams from the start; this way we overlap communication on the green-context stream with computation on the default stream without incurring the cost of repeated stream switches.
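The hook-like switching described above can be illustrated with a GPU-free simulation; the "streams" here are plain objects standing in for CUDA streams, and all names are illustrative. The point is the control flow: swap to the communication green-context stream around each collective, then restore the default stream:

```python
# GPU-free sketch of the pre/post-collective stream-switching hook.
from contextlib import contextmanager

class FakeStream:
    """Stand-in for a CUDA stream; records the ops issued on it."""
    def __init__(self, name):
        self.name = name
        self.ops = []

current_stream = None

@contextmanager
def use_stream(stream):
    """Temporarily make `stream` current, restoring the previous one on exit."""
    global current_stream
    prev, current_stream = current_stream, stream
    try:
        yield
    finally:
        current_stream = prev

default_stream = FakeStream("default")    # non-overlapped compute
comm_stream = FakeStream("green-comm")    # stream from the comm green context
current_stream = default_stream

def all_gather(tensor_name):
    # A real hook would enqueue the collective on the current CUDA stream.
    current_stream.ops.append(("all_gather", tensor_name))

# Pre-collective hook swaps streams; the context exit swaps back.
with use_stream(comm_stream):
    all_gather("layer0.weight")

print(comm_stream.ops)     # [('all_gather', 'layer0.weight')]
print(current_stream.name) # default
```

Dispatching onto two fixed streams from the start, as suggested above, would replace the `with use_stream(...)` swap with issuing communication ops directly on `comm_stream`.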

[1] https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GREEN__CONTEXTS.html
[2] flashinfer-ai/flashinfer#1163

Alternatives

No response

Additional context

No response

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

Labels

oncall: distributed