故きを温ねて新しきを知る Fast Multi-GPU collectives with NCCL NCCL: Collective Operations Collective communication: theory, practice, and experience A Generalization of the Allreduce Operation TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce Recent Improvements of MPI Communicati