Implement "repo-mode" to use the repo as source of truth for devcontainer/Dockerfile #218

Closed

Feature

#290

Assignees

mafredri

As part of #128, we want to implement "repo"-mode for envbuilder as an alternative to the current "filesystem"-mode.

Filesystem-mode works like this:

1. Check if the repository is cloned
   - If no, clone the repository to the filesystem
   - If yes, skip the clone
2. Read devcontainer/Dockerfile from the filesystem

In that last step, the devcontainer or Dockerfile may be old or locally modified. It depends on what the user has done.

Repo-mode differs from filesystem-mode in that it will always clone the repo (to a temporary location) and use the devcontainer/Dockerfile from there. This way the resulting container is always in-sync with the repo.

For now, we can keep this simple and always clone the repo. This feature should be implemented as a package that can be used by both envbuilder and a future envbuilder Terraform provider.

In the future, we can improve performance by only reading the relevant files from the repo.

mentioned this

Write RFC around performance/caching improvements #128

mtojek

mentioned this

on Jun 3, 2024

Envbuilder v1.0 release #132

mtojek

Member

@mafredri We're having trouble researching the idea, so I will put this on hold until we have a good understanding.

Envbuilder already clones the git repo with go-git today, and tags/branches are supported. Could you please elaborate on the ideal solution you had in mind?

Also, what kind of problem will this issue address? monorepo & sparse checkout?

bpmct

Member

Just read the RFC again. Will the filesystem mode have the same level of caching, just via a different architecture or is repo mode more performant in some scenarios?

johnstcn

Member

Will the filesystem mode have the same level of caching

As I understand, it entirely depends on whether the layers are present in the build cache.

is repo mode more performant in some scenarios?

The way it's described here, repo mode will always clone from the remote. Therefore it may take slightly longer, depending on the size of the repo to be cloned. The future performance improvement (only reading the relevant files directly from the repo) should reduce that cost.

bpmct

Member

Understood 👍🏼

I feel like I'd personally use local mode so I can iterate on a devcontainer without having to commit and set a branch parameter

mafredri

Member, Author

Envbuilder already clones the git repo with go-git today, and tags/branches are supported. Could you please elaborate on the ideal solution you had in mind?

@mtojek what you describe here is "filesystem"-mode. The repo is cloned onto a persistent volume, and on the next start the local files are used to run envbuilder. The repo isn't cloned or updated. This is the current behavior.

What we want with "repo"-mode is (in simple terms, disregarding optimizations):

1. Always clone a fresh copy of the repo as defined by env variables
2. Do not modify anything in the user's persistent files (if their checked-out repo is old, it should remain old; if it doesn't exist, we can copy over the repo from 1., etc.)
3. (Break repo-mode logic out into a public func/library so that it can be imported by a future envbuilder Terraform provider)

Also, what kind of problem will this issue address? monorepo & sparse checkout?

The issue we're addressing is startup performance (cache utilization) first and foremost. Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo). By using "repo"-mode, the user can take advantage of the existing image even when their own local files are old and outdated.

The second issue we're addressing is building the functionality for use in a Terraform provider so that startup performance can be further improved without having to run envbuilder --get-cached-image first (this logic will be performed by the Terraform provider).

For this issue, we simply want a full clone of the repo. In the future, we can improve performance further by only checking out the necessary files (like @johnstcn mentioned).

mtojek

Member

The repo is cloned onto a persistent volume, and on the next start the local files are used to run envbuilder.

In case of "repo-mode" this would be /tmp or something similar?

Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo).

Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to checkout the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

mafredri

Member, Author

In case of "repo-mode" this would be /tmp or something similar?

Probably configurability is best, but /.envbuilder is a solid alternative for /tmp. OTOH /tmp is probably fine too.

Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to checkout the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

mtojek

Member

Ok, I think I got what you mean:

You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

Right 👍 we don't want to keep redundant data in too many places.

mafredri

Member, Author

Ok, I think I got what you mean:

You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

We can just rely on the image sha produced by Kaniko. The get-cached-image step can deduce the correct sha by cloning the repo. If we tag with the git commit sha, there needs to be a new build/tag for every commit, whereas with clone + get cached image it works without tagging. And if the files haven't changed, no additional images need to be pushed.

mafredri

mentioned this

on Jul 15, 2024