
[Feature] I have a few questions regarding the InternVL2-8B-MPO model. #717

Open
ajskdlf64 opened this issue Nov 21, 2024 · 0 comments
Motivation

  1. Regarding Table 3, does the InternVL2-8B-SFT model utilize MMPR data with the chosen responses for supervised fine-tuning (SFT)? I am curious about the specific impact of the MMPR dataset.

  2. Regarding Table 8, what is the MathVista performance in the baseline setting when applying DPO alone? The DPO+ results are quite competitive with MPO, so I would like to understand the standalone effectiveness of DPO.

  3. Regarding Table 8, the MathVista score under DPO+ is reported as 66.4. Is this the CoT score? If so, could you also share the Direct score for comparison?

  4. Regarding Figure 2, the paper states that the prompts used to create the MMPR dataset are publicly available, but I could not locate them. Could you clarify where to find them?

Thank you very much for your time and for the valuable contributions your work has made to the community. I look forward to hearing from you.

Related resources

No response

Additional context

No response
