
[Feature] I have a few questions regarding the InternVL2-8B-MPO model. #717

Open
ajskdlf64 opened this issue Nov 21, 2024 · 0 comments
Motivation

  1. Regarding Table 3, does the InternVL2-8B-SFT model utilize MMPR data with the chosen responses for supervised fine-tuning (SFT)? I am curious about the specific impact of the MMPR dataset.

  2. Regarding Table 8, what is the MathVista performance in the baseline setting when applying DPO alone? The DPO+ results are quite competitive with MPO, so I would like to understand the standalone effectiveness of DPO.

  3. Regarding Table 8, the MathVista score under DPO+ is reported as 66.4. Is this the CoT score? If so, could you also share the Direct score for comparison?

  4. Regarding Figure 2, the paper states that the prompts used to create the MMPR dataset are publicly available, but I could not locate them. Could you clarify where to find them?

Thank you very much for your time and for the valuable contributions your work has made to the community. I look forward to hearing from you.

Related resources

No response

Additional context

No response
