TRL/Original DPO
Experiments comparing the TRL implementation of DPO with the original implementation.
Created on October 10|Last edited on October 24
[Charts and run table: run set of 38 runs; each panel shows the first 10 runs]
Results:
As the graphs and table above show, the original implementation with fp32 and 2 models performs better than TRL + LoRA + bf16, and the gap is quite large. I will try more LoRA parameter settings in the hope of improving the model's quality.
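For context, here is a minimal sketch of the per-pair DPO loss that both setups are meant to optimise, assuming the summed sequence log-probabilities have already been computed. The function names and the default `beta=0.1` are illustrative assumptions, not taken from either code base. The reference log-probabilities are one place where the setups can differ: with 2 models they come from a separate frozen reference model, while a LoRA setup can obtain them from the base model with the adapters disabled.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not taken from the runs above
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs."""
    # How much more likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin between the chosen and rejected log-ratios.
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) written as softplus(-logits) for stability.
    return softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favours the chosen response more than the reference does.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Because the loss depends only on differences of log-probabilities, small numerical errors in the policy or reference log-probs (for example from lower precision) feed directly into the margin, which is one thing worth checking when comparing fp32 and bf16 runs.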