Description
As part of #128, we want to implement "repo"-mode for envbuilder as an alternative to the current "filesystem"-mode.
Filesystem-mode works like this:
- Check if repository is cloned
- If no, clone the repository to the filesystem
- If yes, skip clone
- Read devcontainer/Dockerfile from filesystem
In that last step, the devcontainer or Dockerfile may be stale or locally modified, depending on what the user has done.
Repo-mode differs from filesystem-mode in that it will always clone the repo (to a temporary location) and use the devcontainer/Dockerfile from there. This way the resulting container is always in-sync with the repo.
For now, we can keep this simple and always clone the repo. This feature should be implemented as a package that can be used by both envbuilder and a future envbuilder Terraform provider.
In the future, we can improve performance by only reading the relevant files from the repo.
Activity
mtojek commented on Jun 6, 2024
@mafredri We're having trouble researching the idea, so I will put this on hold until we have a good understanding.
Envbuilder supports `git clone` of the repo with `go-git` today, and tags/branches are supported too. Could you please elaborate on the ideal solution you had in mind? Also, what kind of problem will this issue address? Monorepo and sparse checkout?
bpmct commented on Jun 6, 2024
Just read the RFC again. Will the filesystem mode have the same level of caching, just via a different architecture or is repo mode more performant in some scenarios?
johnstcn commented on Jun 12, 2024
As I understand, it entirely depends on whether the layers are present in the build cache.
The way it's described here, repo mode will always clone from the remote, so it may take slightly longer, depending on the size of the repo being cloned. The future performance improvement (only reading the relevant files directly from the repo) would reduce that cost.
bpmct commented on Jun 13, 2024
Understood 👍🏼
I feel like I'd personally use local mode so I can iterate on a devcontainer without having to commit and set a branch parameter.
mafredri commented on Jun 17, 2024
@mtojek what you describe here is "filesystem"-mode. The repo is cloned onto a persistent volume, and on the next start the local files are used to run envbuilder. The repo isn't cloned again or updated. This is the current behavior.
What we want with "repo"-mode is (in simple terms, disregarding optimizations):
The issue we're addressing is startup performance (cache utilization), first and foremost. Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo). By using "repo"-mode, the user can take advantage of the existing image even when their own local files are outdated.
The second issue we're addressing is building the functionality for use in a Terraform provider, so that startup performance can be further improved without having to run `envbuilder --get-cached-image` first (this logic will be performed by the Terraform provider).
For this issue, we simply want a full clone of the repo. In the future, we can improve performance further by only checking out the necessary files (like @johnstcn mentioned).
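The following hypothetical sketch illustrates why repo-mode helps cache utilization. Here the "registry" is a map from a content hash of `devcontainer.json` to a prebuilt image reference. Kaniko's real cache keys are derived differently, and `buildKey` and `prebuilt` are assumptions for illustration only. A lookup using the user's stale local file misses, as filesystem-mode would. A lookup using the freshly cloned file hits, as repo-mode would.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// buildKey is a hypothetical cache key: a SHA-256 of the devcontainer.json
// contents. Kaniko's real layer cache keys are computed differently; this
// only stands in for "derived from the build inputs".
func buildKey(devcontainerJSON []byte) string {
	sum := sha256.Sum256(devcontainerJSON)
	return hex.EncodeToString(sum[:])
}

// prebuilt simulates a registry holding images the repo owner pre-generated
// from the latest commit, keyed by buildKey.
type prebuilt map[string]string

// cachedImage reports the prebuilt image for the given build inputs, if any.
func (p prebuilt) cachedImage(devcontainerJSON []byte) (string, bool) {
	ref, ok := p[buildKey(devcontainerJSON)]
	return ref, ok
}
```

The point of the design is that the key is computed from the remote's current files. Because of that, a user whose workspace lags behind still resolves to the image the repo owner already built.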
mtojek commented on Jun 17, 2024
In case of "repo-mode", this would be `/tmp` or something similar?
Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits, because in both cases we have to check out the Git repo, and for large repos that will take some time unless we "snapshot" it in the image.
mafredri commented on Jun 17, 2024
Probably configurability is best, but `/.envbuilder` is a solid alternative to `/tmp`. OTOH, `/tmp` is probably fine too.
No, the git repo is only for building and for the get-cached-image step. We don't want it as part of the image.
mtojek commented on Jun 17, 2024
Ok, I think I got what you mean:
You want to pull the Git repo based on branch/commit/SHA, get `.devcontainer` out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit SHA? Should we tag the image built by Kaniko specially?
Right 👍 we don't want to keep redundant data in too many places.
mafredri commented on Jun 17, 2024
We can just rely on the image SHA produced by Kaniko. The get-cached-image step can deduce the correct SHA by cloning the repo. If we tag by git commit SHA, there needs to be a new build/tag for every commit, whereas clone + get cached image works without tagging. And if the files haven't changed, no additional images need to be pushed.
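The difference between tagging per commit and relying on build content can be sketched by counting builds over a commit history. This is a hypothetical model: `commit`, `contentKey`, and `countBuilds` are illustrative names, and hashing the build files stands in for whatever Kaniko actually computes. Two commits that change nothing the build depends on share a content key, so only one image is needed. Commit-SHA tags would require one image per commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
)

// commit is a hypothetical, simplified view of a git commit: its SHA and
// the files the devcontainer build depends on (path -> contents).
type commit struct {
	SHA   string
	Files map[string]string
}

// contentKey hashes the build files in sorted path order, with NUL
// separators, so the key does not depend on map iteration order.
func contentKey(files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		h.Write([]byte(p))
		h.Write([]byte{0})
		h.Write([]byte(files[p]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// countBuilds returns how many images would be built and pushed for a
// sequence of commits under each scheme: one tag per commit SHA versus
// reusing an image whenever the build inputs hash to an already-seen key.
func countBuilds(history []commit) (byCommitTag, byContent int) {
	seenTags := map[string]bool{}
	seenKeys := map[string]bool{}
	for _, c := range history {
		if !seenTags[c.SHA] {
			seenTags[c.SHA] = true
			byCommitTag++
		}
		k := contentKey(c.Files)
		if !seenKeys[k] {
			seenKeys[k] = true
			byContent++
		}
	}
	return byCommitTag, byContent
}
```

For example, consider three commits where the middle one only touches files outside the build inputs. Commit tagging yields three images, while the content-based scheme yields two.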
feat: implement repo-mode