-
-
Notifications
You must be signed in to change notification settings - Fork 8
switching to git directly for getting an event's diff #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
posted by @shenxianpeng in #1 (comment) |
Yes. I've been playing around with this idea locally using
That's the main intention. It would also allow better usage of the pkg in local dev env, namely the |
Switching to git for getting event diff, I assume we need to
Because adding comments or annotation also needs to call API, probably calling the API to get the event diff is still the best way. |
Only for PRs, but I haven't assessed the best way to address that event. This will be less of a problem in local dev context. If it is a problem, then we can try to use git to accommodate for shallow history.
I have rarely seen this occur. The only time that I had a problem during local checkout was when the repo itself was extremely large (like Google's Material Icons repo). Even then, the Github runner is better able to do a checkout of any repo hosted on Github with ease (likely because they share the same servers).
Using git directly won't require a
|
For example, when I clone llvm-project and linux repos. -sh-4.2$ time git clone git@github.com:llvm/llvm-project.git
Cloning into 'llvm-project'...
remote: Enumerating objects: 5005074, done.
remote: Counting objects: 100% (2813/2813), done.
remote: Compressing objects: 100% (485/485), done.
remote: Total 5005074 (delta 2509), reused 2441 (delta 2326), pack-reused 5002261
Receiving objects: 100% (5005074/5005074), 1.96 GiB | 51.08 MiB/s, done.
Resolving deltas: 100% (4098747/4098747), done.
Checking out files: 100% (119993/119993), done.
real 3m53.050s
user 5m40.690s
sys 0m32.289s
-sh-4.2$ time git clone git@github.com:torvalds/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 8978158, done.
remote: Total 8978158 (delta 0), reused 0 (delta 0), pack-reused 8978158
Receiving objects: 100% (8978158/8978158), 3.71 GiB | 62.09 MiB/s, done.
Resolving deltas: 100% (7452118/7452118), done.
Checking out files: 100% (78011/78011), done.
real 5m23.826s
user 7m36.899s
sys 0m59.142s
-sh-4.2$ du -h --max-depth=1
3.6G ./llvm-project
5.4G ./linux I also see some Git repos mixed code and binaries together (for some repos migrate from SVN to GIT), The main concern is we don't know how users use git, so checkout with all history will have performance issues and takes extreme space. Also, supporting and maintaining other SCM's REST APIs might help us explore commercialization. |
I'm trying to avoid having to do this because I envision a "spaghetti code" solution (riddled with I don't think we should be held accountable for the size of the user's repo. The full history would only be required for PR events when using git. This implementation depends on knowing how to get the diff (including filenames with paths relative to the repo's root) for the push/PR event. We may not need to use git in the CI runners, but it would help to use git locally when not using a CI runner. More importantly, I was hoping to stop needing the GITHUB_TOKEN (as objected to in cpp-linter/cpp-linter-action#55) because it could potentially compromise user security (in many ways). |
I guess so. Different SCM could create different py files to deal with.
Analyzing PR events is the most important scenario for each project's CI workflow. I suspect that using REST API the performance would be better than git directly most of the time, if using git directly also works for PR events you could also try.
To access GitHub API it's common practice to use a GitHub token, like the following actions:
Switching to git directly, it seems getting event diffs can stop using GITHUB_TOKEN, but it is still needed when adding a comment. |
Well, I'm rather inexperienced with using git directly, but I think I got it working on the use-git-directly branch. I've often preferred to use tools like gitkraken to do complex git commands because I always tend to mess things up when I use git manually in CLI. I had some trouble with getting the diff in CI because the default for actions/checkout action is an extremely shallow history. Meaning that the reported diff was showing everything in the repo as a new file. I tried using subprocess.run(["git", "fetch", "--depth=3"], check=True) but that didn't seem to work. So, I had to set the depth using the action - uses: actions/checkout@v3
with:
fetch-depth: 3 I haven't added any unit tests to verify the integrity of the diff reported yet. And, there was some regex parsing error (specifically when getting the file name) when the depth was 1 (still don't know why)... I think it would be safer to use git directly from a local machine (eg. during a pre-commit hook) which should allow using the As for compatibility with non-GitHub SCM, I'm not sure yet how we could abstract the REST API usage for others. I've been looking at GitLab's REST API, and the response schema is wildly different because the runners for GitLab are less "pre-configured". I still don't know how to get the JSON payload describing what event triggered the GitLab CI run. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Please don't close this. I still have a branch for this but it seems to only work for local clones of a repo. In the CI, the depth needs to be >= 3 (not shallow). |
Added label pinned, this issue will not be closed. |
I went with a different approach concerning
This should ensure that we are getting the "patch" from REST API calls. And, this allows the same diff parsing algorithm used for either CI env or local env. Essentially, this approach doesn't fully satisfy my original idea, but working with shallow checkouts in the CI has proved rather painful when using git directly in CI. In the end, we should be able to now use |
It looks like you've made good progress, it's nice to support cpp-linter as a pre-commit hook. |
I just realized that by using the diff format from REST API, we will not be getting the |
I just found out about pygit2. It is a python binding for the C library that git uses under the hood (called libgit2). I'm currently working on changing the current solution for this issue (using git CLI) into using pygit2. PS - I'm also implementing some performance improvements I discovered when experimenting with cpp_linter_dart, so stayed tuned for another PR. |
I've had an idea about switching to git directly for getting a push event's diff (and list of changed files). Maybe such feature would warrant v1.5.0 tag also.
I'm just not sure how use git to list a diff for a PR. Determining the head and base commits in that context seems non-trivial.
Originally posted by @2bndy5 in #1 (comment)
The text was updated successfully, but these errors were encountered: