optimize reshard #10925

Open: wants to merge 1 commit into base: develop


fjjF77 (Contributor) commented Aug 11, 2025

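  • Greatly reduces the total training time per batch when resharding: with gbs=32, train TP4, and rollout TP2, total time drops by 36.13% on average.
  • Splits each micro_data_group across DP ranks within the group, which shortens model generation time when resharding.
  • Reproduces the verl hybridflow optimization strategy: improves the layout of the parallel groups after reshard to reduce communication overhead.
  • Optimizes the reshard model export logic to avoid unnecessary exports.
  • Modifies globald_model_dict used during reshard to support strategies such as sp and fused_qkv.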
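
The regrouping and in-group data split described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: all function names are hypothetical, and the weights are assumed to be sharded evenly along a single dimension so that training rank i of TP4 holds the i-th quarter. Under those assumptions, a strided regrouping lets every rank reuse its own training shard as part of the rollout shard it needs, while a contiguous regrouping does not.

```python
from typing import List, Sequence


def strided_rollout_groups(train_tp: int, rollout_tp: int) -> List[List[int]]:
    """Regroup local ranks 0..train_tp-1 into rollout TP groups using a stride."""
    if train_tp % rollout_tp != 0:
        raise ValueError("train_tp must be a multiple of rollout_tp")
    num_groups = train_tp // rollout_tp
    return [list(range(g, train_tp, num_groups)) for g in range(num_groups)]


def contiguous_rollout_groups(train_tp: int, rollout_tp: int) -> List[List[int]]:
    """Regroup local ranks into consecutive rollout TP groups (for comparison)."""
    if train_tp % rollout_tp != 0:
        raise ValueError("train_tp must be a multiple of rollout_tp")
    return [list(range(s, s + rollout_tp)) for s in range(0, train_tp, rollout_tp)]


def ranks_reusing_local_shard(groups: List[List[int]], train_tp: int) -> int:
    """Count ranks whose training shard lies inside the rollout shard they need.

    Assumption: training rank i holds the i-th of train_tp equal slices, and the
    rank at position j of a rollout group needs the j-th of rollout_tp slices.
    """
    rollout_tp = len(groups[0])
    ratio = train_tp // rollout_tp  # training slices per rollout slice
    reused = 0
    for group in groups:
        for local_idx, rank in enumerate(group):
            needed = range(local_idx * ratio, (local_idx + 1) * ratio)
            if rank in needed:
                reused += 1
    return reused


def split_micro_batch(samples: Sequence, num_groups: int) -> List[list]:
    """Split samples into num_groups near-equal contiguous shards."""
    base, extra = divmod(len(samples), num_groups)
    shards, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        shards.append(list(samples[start:start + size]))
        start += size
    return shards


if __name__ == "__main__":
    strided = strided_rollout_groups(train_tp=4, rollout_tp=2)
    contiguous = contiguous_rollout_groups(train_tp=4, rollout_tp=2)
    print("strided groups:", strided)
    print("ranks reusing own shard (strided):", ranks_reusing_local_shard(strided, 4))
    print("ranks reusing own shard (contiguous):", ranks_reusing_local_shard(contiguous, 4))
    # Each rollout group generates only its own share of the micro batch.
    for ranks, shard in zip(strided, split_micro_batch(list(range(8)), len(strided))):
        print(ranks, shard)
```

With train TP4 and rollout TP2, the two rollout groups each handle half of the samples that the four training ranks previously shared, which is the kind of in-group DP split the second item refers to.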

PR types

Function optimization

PR changes

APIs

Description

Accuracy comparison
(image: reshard loss diff)
Time cost
(image: time_used)

paddle-bot bot commented Aug 11, 2025

Thanks for your contribution!

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


fujinji seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

3 participants