Skip to content

DOC Add section on resolving conflicts in lock files to developer guide #29882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 22, 2024

Conversation

StefanieSenger
Copy link
Contributor

What does this implement/fix? Explain your changes.

This PR adds a section to the developer guide on how to resolve conflicts in lock files.
Please have a look, @lesteve

Any other comments?

I have also deleted a PEP8 standard answer for reviewing PRs, that is probably not needed anymore.

Copy link

github-actions bot commented Sep 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 86a4408. Link to the linter CI: here


python build_tools/update_environments_and_lock_files.py

In order to run this script you might need to add `conda-lock` and `pip-tools` to your
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to mention that the version of conda-lock to be used is defined in the _min_dependencies.py file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the decision to keep this short and sweet, we should not mention conda-lock and pip-tools at all.
So I left it out now.

@lesteve
Copy link
Member

lesteve commented Sep 19, 2024

I was personally hoping for a quick and dirty list of commands that work in most scenarios rather than a detailed explanation of how to deal with lock-file conflicts in general. I now realize that it may be more complicated than I naively thought ...

Maybe we should put something like this in the developer tips explaining some of the caveats?

# merge upstream/main
git pull upstream main --no-rebase
# there are conflicts so say keep the version from upstream/main
git checkout --theirs build_tools
git add build_tools
git merge --continue
# rerun lock-file update if needed
python build_tools/update_environments_and_lock_files.py

I would say that for most contributors using @scikit-learn-bot please update lock-files is the easiest thing to do.

Maybe the list of commands can be made more robust (for example you probably want to keep changes to build_tools/update_environment_and_lock_files.py and solve conflicts there if any) and the bot lock-file enhanced to deal with the simple kind of conflicts ...

@StefanieSenger
Copy link
Contributor Author

I was personally hoping for a quick and dirty list of commands that work in most scenarios rather than a detailed explanation of how to deal with lock-file conflicts in general.

I get your point @lesteve. If you're doing this more often, you want everything in one cell for a quick copy paste. At the same time I think less experienced people can benefit from the detailed explanations to understand the whys.

Do you think we can put everything in here: the quick-and-dirty first followed by the step-by-step explanation? I feel that the latter is quite general and I am not aware of any other caveats that I could put into the developer tips (which are quite hidden). But I can also move it over there and leave a link here?

@lesteve
Copy link
Member

lesteve commented Sep 19, 2024

Honestly this is not easy to get the balance right and opinions probably vary. My personal opinion is that the contributing doc is already quite long. Also in this particular case only a small fraction of contributors will have to deal with lock-files and only a fraction of them will have conflict in their lock-files.

What I had in mind was a snippet that works in most cases that scikit-learn maintainers could point to or find when needed. Comments can certainly be added in the snippet to make things more clear.

Not entirely sure this is a great idea, but you could imagine (if the list of commands I mentioned can be made a bit more robust) that one day there is something like this @scikit-learn-bot please update lock-files --fix-conflicts ...

For completeness, the kind of caveat I can imagine with my snippet:

  • git checkout --theirs build_tools will take the versions from main for all the build_tools/. I think it would be better to apply this only to environment and lock files
  • this may not be enough to fix all the conflicts and you need user intervention for this. It would only work if the only conflicts are in the environment and lock files.
  • python build_tools/update_environment_and_lock_files.py does not work on Windows, this is why @Charlie-XIAO got motivated to work on the lock-file bot idea.

@Charlie-XIAO
Copy link
Contributor

Just to add a point, I think we want to recommend contributors to use the comment to trigger the bot instead of updating on their own, because we are confident that the bot commit does nothing malicious but human commits may do (and we cannot tell because lock files are not reviewable).

@StefanieSenger
Copy link
Contributor Author

I can see your point and agree to have it short only then, @lesteve. I will make up my mind on how to do that after PyData.

@lesteve
Copy link
Member

lesteve commented Sep 20, 2024

Just to add a point, I think we want to recommend contributors to use the comment to trigger the bot instead of updating on their own, because we are confident that the bot commit does nothing malicious but human commits may do (and we cannot tell because lock files are not reviewable).

A bit off-topic but I am not overly worried about security implications (I could well be wrong ...). Personally, I do browse the lock-file diff to double-check nothing weird is happening. I would not say that lock-files are unreviewable but I would agree it is easy to miss something ...

Copy link
Contributor Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our discussion and the conversation we had, we could do it like this:

Resolve conflicts in lock files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When lock files cause conflicts with the `upstream/main` branch, `upstream/main` must
take precedence in defining the environment files and the lock files in the
`build_tools` directory. Therefore, merge conflicts should not be resolved on the GitHub
PR interface, but locally, favouring the `upstream/main` branch:

.. prompt:: bash

  # pull latest upstream/main
  git pull upstream main --no-rebase
  # resolve conflicts - keeping the upstream/main version for specific files
  git checkout --theirs build_tools/*/*.lock build_tools/*/*environment.yml \
  build_tools/*/*lock.txt build_tools/*/*requirements.txt
  git add build_tools
  git merge --continue

This preserves all other changes in the PR, including updates to `build_tools`, except
for lock files and environment files. Double check, if in doubt.

Finally, re-generate the dependencies and lock files for the CIs, as described in
:ref:`Build lock files <build_lock_files>`, or by running:

.. prompt:: bash

  python build_tools/update_environments_and_lock_files.py

But now I am doubting why we want to keep exactly those files from main, that are auto-generated.

This part:

  git checkout --theirs build_tools/*/*.lock build_tools/*/*environment.yml \
  build_tools/*/*lock.txt build_tools/*/*requirements.txt

My assumption is that if there are conflicts in the lock files, then both main and our branch must have changed files that define the dependencies for the whole project.

Should we then not keep those files from main, that change the dependencies for the whole project instead? Then apply our own changes on these files on top, then re-generate everything?
Or shorter: apply both changes to the relevant files and then re-generate lock and dependency files?

Is there something I am missing, @lesteve?


python build_tools/update_environments_and_lock_files.py

In order to run this script you might need to add `conda-lock` and `pip-tools` to your
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the decision to keep this short and sweet, we should not mention conda-lock and pip-tools at all.
So I left it out now.

@lesteve
Copy link
Member

lesteve commented Nov 18, 2024

But now I am doubting why we want to keep exactly those files from main, that are auto-generated.

We want to resolve the conflicts in the simplest possible way, knowing that we will regenerate the lock-files afterwards. In other words it does not really matter which version we pick (from main of from our branch), since we will update the lock-files again by running the script build_tools/update_environments_and_lock_files.py.

Rereading the commands I think we should be consistent in the git checkout and git add i.e. something like:

environment_and_lock_files=$(ls build_tools/*/*.lock build_tools/*/*environment.yml build_tools/*/*lock.txt build_tools/*/*requirements.txt)
git checkout --theirs $environment_and_lock_files
git add $environment_and_lock_files

@StefanieSenger
Copy link
Contributor Author

Thanks for clarifying, @lesteve. After your comment and after talking with @adrinjalali, I have understood that here we're only dealing with resolving the lock file conflicts, not considering the reasons that lead to it (possible changes of the project wide dependency files), and I could make the separation in my mind.

I have updated the docs, please have a look.

However, I didn't manage to make it work with the $environment_and_lock_files when I tested it. One reason it that git checkout --theirs needs to get the paths separated by a space, not by a new line. But also after adjusting this, it couldn't take apart the environmental variable that way it is supposed to. This is why I passed the paths to git checkout --theirs and to git add individually.

@lesteve lesteve enabled auto-merge (squash) November 22, 2024 10:26
@lesteve
Copy link
Member

lesteve commented Nov 22, 2024

Thanks, I set auto-merge. This looks good, and if the need arises this can always be improved in the future.

@adrinjalali
Copy link
Member

We have an issue with our docbuild on circleci which is unrelated to this PR, and reproduced on other PRs as well.

@lesteve
Copy link
Member

lesteve commented Nov 22, 2024

With #30333 merged, the CircleCI issue should be fixed now, I merged main so hopefully this should get merged when CI is green.

@lesteve lesteve merged commit 32a228d into scikit-learn:main Nov 22, 2024
28 checks passed
@StefanieSenger StefanieSenger deleted the doc_lock_conflict branch November 22, 2024 18:46
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 4, 2024
virchan pushed a commit to virchan/scikit-learn that referenced this pull request Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants