fix: repository.Init() to Add gitRepoPaths using the proper new cache key format #20962

jpbelangerupgrade · 2024-11-26T14:44:20Z

Regression from #18388

The gitRepoPaths cache key was updated to now include not just the repository url but the repository Project. Since during init we have no way of knowing the repository project. At least add that repository into cache as the "default" / undefined project.

Note: Cherrypicking this in all active release would be great given 2.12+ are all impacted.

Context:
Our manifest repository is ~20Gi or so. It takes about 5mins minimum to get it cloned properly. What happens on repo-server startup is that repo-server become available BEFORE the manifest repository is cloned. Creating a high cpu usage and also a massive pile-up of request to the restarting repo-server which then cause app to become "unknown" and paging our dev team. This makes argo-cd completely freeze up for 5-15mins everytime we do a node update, recycle a node or any update to repo-server itself.
More information and alternative discussed can be read on #7927 (comment)

To alleviate that situation, we run an init-container that pre-clones the repo before starting the repo-server process. Which makes repo-server startup much more stable and doesn't come with a freeze on argo-cd reconciliation

Checklist:

bunnyshell · 2024-11-26T14:44:31Z

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

🔵 /bns:start to start the environment
🚀 /bns:deploy to redeploy the environment
❌ /bns:delete to remove the environment

…key format Signed-off-by: Jean-Philippe Bélanger <30598824+jpbelangerupgrade@users.noreply.github.com>

jpbelangerupgrade · 2024-11-26T20:50:28Z

We use pre-cloned repository so if a repository is already cloned, before it would cache that repo location. Now because of the project key, it’s ignoring the existing cloned repository and cloning a new one.
There are also other using persistent volume for their repo-server to have long live repo. Since some apparently have monorepo in the 20+ gig size.

So before this change. we’d end up with twice the repo cloned.
After, the cache is use and our existing cloned repo is getting used again (same as 2.11.x)

andrii-korotkov-verkada · 2024-11-27T02:20:17Z

reposerver/repository/repository.go

+					return err
+				}
+
+				s.gitRepoPaths.Add(string(keyData), fullPath)


Shall we update getters as well? How does it work?

All other "Get" (GetPath and GetPathIfExists) have been updated already to use the Project with the composite key. Add was left behind apparently in the original PR

ddumaisupgrade · 2024-12-19T20:08:14Z

Hey @andrii-korotkov-verkada . Do you have think we can have an ETA on that ? We are waiting for that to be able to update.

andrii-korotkov-verkada · 2024-12-19T20:11:13Z

Unfortunately no. I'm not an approver myself and it's a holiday season. I suggest posting in #argo-cd-contributors in CNCF Slack, but still it can take time.

rumstead · 2025-01-07T16:43:23Z

@blakepettersson just as an FYI

crenshaw-dev · 2025-01-07T21:19:54Z

We should have a unit test that reproduces the bug.

jpbelangerupgrade · 2025-01-14T15:16:50Z

@crenshaw-dev As much as I'd like to, this whole method uses os package and binds itself to the filesystem. This would be extremely hard to mock and actually do a valid test case.
Any chance we can move forward as-is to fix the regression at least. That will allow us to update argo-cd that's lagging being 2 major release right now. As the behavior makes argo-cd unusable for us given the size of our repo.

codecov · 2025-01-14T16:12:14Z

Codecov Report

Attention: Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 55.27%. Comparing base (c3600d2) to head (cc2a012).
Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
reposerver/repository/repository.go	40.00%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #20962      +/-   ##
==========================================
+ Coverage   55.21%   55.27%   +0.05%     
==========================================
  Files         337      337              
  Lines       56840    56843       +3     
==========================================
+ Hits        31387    31419      +32     
+ Misses      22757    22736      -21     
+ Partials     2696     2688       -8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

leoluz

Hey JP! Long time no see :)

@jpbelangerupgrade Can you please clarify in the PR description the real impact of this supposed regression?
It seems that you are trying to pre-clone the monorepo and have the repo-server using it. Can you pls confirm? Why is this a hard requirement for monorepos? I'd like to understand how impactful this regression really is.

reposerver/repository/repository.go

leoluz · 2025-01-16T20:08:24Z

reposerver/repository/repository.go

@@ -172,7 +172,12 @@ func (s *Service) Init() error {
 		closer := s.gitRepoInitializer(fullPath)
 		if repo, err := gogit.PlainOpen(fullPath); err == nil {
 			if remotes, err := repo.Remotes(); err == nil && len(remotes) > 0 && len(remotes[0].Config().URLs) > 0 {
-				s.gitRepoPaths.Add(git.NormalizeGitURL(remotes[0].Config().URLs[0]), fullPath)
+				keyData, err := json.Marshal(map[string]string{"url": git.NormalizeGitURL(remotes[0].Config().URLs[0]), "project": ""})


@blakepettersson Do you see an issue about setting the project with an empty string at this stage? User is claiming that the introduction of the project in the key broke this flow. Can you please confirm if this is indeed a regression of #18388?

Also updated PR description with more details and link to another issue we were discussing for alternative

@jpbelangerupgrade Can you pls confirm how you use AppProject at your company? Do you have a single AppProject for all your apps, 1:1 or everything goes in the default project??

To follow up on @leoluz's question, I presume you have a single (non-project scoped) credential for the repo (I presume yes since you have not yet upgraded to 2.12)?

My concern with this approach is that while this may solve your immediate predicament, this would break once/if a project scoped repo credential is added for the same repo url, since that would involve another git clone. Thinking out loud, it might be worthwhile having a way to configure these cache mappings in the repo-server which it would then use to initialize the cache mappings? Something like Repo URL -> n projects + paths? IMO this needs a bit more thought before saying yay nor nay.

Ultimately, since your repo is ginormous (20gb+) I think alternative ways should be considered (e.g #11198 and/or something more immediate, e.g this comment).

Shallow and sparse checkouts aren't a viable short-term option. I'm really not sure they're even a viable long-term option. We really should be able to support 20GB repos without too much pain.

I'm trying to make sure I understand why we don't have the project name available in Service.Init. It looks to me as if we're reconstructing the git paths cache from the repos found on disk. If we need additional info from storage (i.e. the project(s) associated with each repo dir), why don't we just write that info to disk so it's available when we need it?

Personally, I feel reasonably okay w/ multiple projects sharing the same cache dir. Even though different credentials could technically expose different contents, I think that would be super atypical. The permissions granularity in every SCM afaik is "clone the repo" or "not clone the repo."

Personally, I feel reasonably okay w/ multiple projects sharing the same cache dir. Even though different credentials could technically expose different contents, I think that would be super atypical. The permissions granularity in every SCM afaik is "clone the repo" or "not clone the repo."

@crenshaw-dev Do you mean, removing the project from the cache key?

My first comment is proposing keeping the cache key, but storing it on disk so it's available for cache restoration.

My second comment is, yep, might make sense to just remove the project from the cache key.

Which entity is the owner of the cached revision? Would it be the project or the credentials used to access the repo?

Strictly speaking, I think it makes sense to say that the credential owns the cached revision. But practically, I think project serves as a reasonable proxy for credentials which may or may not be different. At worst, we treat two projects as different when the credentials are actually the same.

But I'm not sure it makes sense to differentiate cache ownership by credential or project. I feel like it's safe to assume that each credential has the same read access to the repo (full clone access) and that any write operations performed by applications in different projects are protected by the concurrent access lock (i.e. one project can't lift a secret that was written to disk by a different project).

But I think it would be fair to review any write operations that occur during manifest generation (for example, modifications to files in the .git config repository) to ensure that 1) they're protected by the concurrent access lock and 2) sensitive changes are cleaned up between operations.

Signed-off-by: Jean-Philippe Bélanger <30598824+jpbelangerupgrade@users.noreply.github.com>

jpbelangerupgrade · 2025-01-16T20:32:04Z

Hey JP! Long time no see :)

@jpbelangerupgrade Can you please clarify in the PR description the real impact of this supposed regression? It seems that you are trying to pre-clone the monorepo and have the repo-server using it. Can you pls confirm? Why is this a hard requirement for monorepos? I'd like to understand how impactful this regression really is.

Long time no see indeed. Thank for taking a look. Updated the description with more details and impact on our side.

jpbelangerupgrade · 2025-02-03T20:55:27Z

I did a validation of before and after this change. One thing that @leoluz was mentioning was that my change would regress the current behavior of the project based auth.

Based on my test it doesn't:

If a user leverage project based repo clone, this PR change won't do anything. The repo will be added as default but also, the repo will get recloned for the required project on first use. So this will be of no impact for this use-case
If a user doesn't use projects, which is probably most people (like me), then this will behave as before and allow user to reuse existing clone instead of recloning on each startup of repo-server

@crenshaw-dev and @blakepettersson this might be of interest to figure out the next step here.

Running v2.13.1
Pre-Cloning enabled
drwx------. 30 argocd argocd 16384 Feb  3 15:07 461130fb-e5de-4983-a8b8-7f5484835df8
drwxr-xr-x.  3 argocd argocd    18 Feb  3 15:17 f6e0b016-48c9-4893-8601-771d433a06b7
d---------. 30 argocd root   16384 Feb  3 15:02 k8s-manifests

Default init
time="2025-02-03T15:04:03Z" level=info msg="Initializing https://github.com/credify/k8s-manifests.git to /tmp/_argocd-repo/461130fb-e5de-4983-a8b8-7f5484835df8"
time="2025-02-03T15:07:46Z" level=info msg=Trace args="[git fetch origin --tags --force --prune]" dir=/tmp/_argocd-repo/461130fb-e5de-4983-a8b8-7f5484835df8 operation_name="exec git" time_ms=222532.09260099998

Project based init
time="2025-02-03T15:17:48Z" level=info msg="Initializing https://github.com/credify/k8s-manifests to /tmp/_argocd-repo/f6e0b016-48c9-4893-8601-771d433a06b7"
level=info msg=Trace args="[git fetch origin --tags --force --prune]" dir=/tmp/_argocd-repo/f6e0b016-48c9-4893-8601-771d433a06b7 operation_name="exec git" time_ms=484082.929101


Results:
- Pre-cloned repo is unused. Associated to a cache item that is never able to be looked up
- f6e0b016-48c9-4893-8601-771d433a06b7 is a project based clone
- 461130fb-e5de-4983-a8b8-7f5484835df8 is the default (no project) clone

——
Updated to PR code w/ pre-cloning enabled

total 16
drwxr-xr-x.  3 argocd root    27 Feb  3 18:25 .
drwxrwxrwx.  3 root   root    58 Feb  3 18:32 ..
d---------. 30 argocd root 16384 Feb  3 18:31 k8s-manifests

time="2025-02-03T18:39:52Z" level=info msg="Initializing https://github.com/credify/k8s-manifests to /tmp/_argocd-repo/7131c06f-3577-4da1-95c8-6f72afbbc22f"
time="2025-02-03T18:46:35Z" level=info msg=Trace args="[git fetch origin --tags --force --prune]" dir=/tmp/_argocd-repo/7131c06f-3577-4da1-95c8-6f72afbbc22f operation_name="exec git" time_ms=402383.445823
time="2025-02-03T18:46:35Z" level=error msg="finished unary call with code Unknown" error="failed to initialize repository resources: rpc error: code = Internal desc = Failed to fetch default: `git fetch origin --tags --force --prune` failed timeout after 1m30s" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=repository.RepoServerService grpc.start_time="2025-02-03T18:39:52Z" grpc.time_ms=403061.62 span.kind=server system=grpc

No repo initializing done for “default” reused the pre-clone
total 32
drwxr-xr-x.  4 argocd root      71 Feb  3 18:39 .
drwxrwxrwx.  3 root   root      58 Feb  3 18:32 ..
drwx------. 30 argocd argocd 16384 Feb  3 18:56 7131c06f-3577-4da1-95c8-6f72afbbc22f
d---------. 30 argocd root   16384 Feb  3 18:31 k8s-manifests

Results:
- Pre-cloned repo was reused. And no other clone was done
- 7131c06f-3577-4da1-95c8-6f72afbbc22f is a project based clone

jpbelangerupgrade requested a review from a team as a code owner November 26, 2024 14:44

jpbelangerupgrade force-pushed the fix/repository-path-cache-key branch from fcb7feb to 2ed5acb Compare November 26, 2024 14:58

Fix repository.Init() to Add gitRepoPaths using the proper new cache …

c6534c0

…key format Signed-off-by: Jean-Philippe Bélanger <30598824+jpbelangerupgrade@users.noreply.github.com>

jpbelangerupgrade force-pushed the fix/repository-path-cache-key branch from 2ed5acb to c6534c0 Compare November 26, 2024 14:58

andrii-korotkov-verkada reviewed Nov 27, 2024

View reviewed changes

andrii-korotkov-verkada approved these changes Nov 27, 2024

View reviewed changes

andrii-korotkov-verkada added the ready-for-review label Nov 27, 2024

Merge branch 'master' into fix/repository-path-cache-key

cc2a012

leoluz requested changes Jan 16, 2025

View reviewed changes

reposerver/repository/repository.go Outdated Show resolved Hide resolved

leoluz reviewed Jan 16, 2025

View reviewed changes

Adding more context to err

47c5191

Signed-off-by: Jean-Philippe Bélanger <30598824+jpbelangerupgrade@users.noreply.github.com>

blakepettersson mentioned this pull request Jan 23, 2025

REQUEST: Promotion to Approver for blakepettersson argoproj/argoproj#350

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: repository.Init() to Add gitRepoPaths using the proper new cache key format #20962

fix: repository.Init() to Add gitRepoPaths using the proper new cache key format #20962

jpbelangerupgrade commented Nov 26, 2024 •

edited

Loading

bunnyshell bot commented Nov 26, 2024 •

edited

Loading

jpbelangerupgrade commented Nov 26, 2024 •

edited

Loading

andrii-korotkov-verkada Nov 27, 2024

jpbelangerupgrade Nov 27, 2024

ddumaisupgrade commented Dec 19, 2024

andrii-korotkov-verkada commented Dec 19, 2024

rumstead commented Jan 7, 2025

crenshaw-dev commented Jan 7, 2025

jpbelangerupgrade commented Jan 14, 2025

codecov bot commented Jan 14, 2025

leoluz left a comment •

edited

Loading

leoluz Jan 16, 2025 •

edited

Loading

jpbelangerupgrade Jan 16, 2025

leoluz Jan 16, 2025

blakepettersson Jan 17, 2025

crenshaw-dev Jan 17, 2025

crenshaw-dev Jan 21, 2025

leoluz Jan 23, 2025 •

edited

Loading

crenshaw-dev Jan 30, 2025

devopsjedi Jan 30, 2025

crenshaw-dev Jan 30, 2025 •

edited

Loading

jpbelangerupgrade commented Jan 16, 2025 •

edited

Loading

jpbelangerupgrade commented Feb 3, 2025

fix: repository.Init() to Add gitRepoPaths using the proper new cache key format #20962

Are you sure you want to change the base?

fix: repository.Init() to Add gitRepoPaths using the proper new cache key format #20962

Conversation

jpbelangerupgrade commented Nov 26, 2024 • edited Loading

bunnyshell bot commented Nov 26, 2024 • edited Loading

🔴 Preview Environment stopped on Bunnyshell

jpbelangerupgrade commented Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ddumaisupgrade commented Dec 19, 2024

andrii-korotkov-verkada commented Dec 19, 2024

rumstead commented Jan 7, 2025

crenshaw-dev commented Jan 7, 2025

jpbelangerupgrade commented Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

leoluz left a comment • edited Loading

Choose a reason for hiding this comment

leoluz Jan 16, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leoluz Jan 23, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crenshaw-dev Jan 30, 2025 • edited Loading

Choose a reason for hiding this comment

jpbelangerupgrade commented Jan 16, 2025 • edited Loading

jpbelangerupgrade commented Feb 3, 2025

jpbelangerupgrade commented Nov 26, 2024 •

edited

Loading

bunnyshell bot commented Nov 26, 2024 •

edited

Loading

jpbelangerupgrade commented Nov 26, 2024 •

edited

Loading

leoluz left a comment •

edited

Loading

leoluz Jan 16, 2025 •

edited

Loading

leoluz Jan 23, 2025 •

edited

Loading

crenshaw-dev Jan 30, 2025 •

edited

Loading

jpbelangerupgrade commented Jan 16, 2025 •

edited

Loading