llm-rlhf

Here are 3 public repositories matching this topic...

ReasonFlux Series - A family of LLM post-training algorithms focusing on data selection, reinforcement learning, and inference scaling

Implements reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs

Topics: lora, reward, trl, llm, rlhf, trlx, llm-rlhf

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
