Implement "repo-mode" to use the repo as source of truth for devcontainer/Dockerfile #218

@mafredri

Description

As part of #128, we want to implement "repo"-mode for envbuilder as an alternative to the current "filesystem"-mode.

Filesystem-mode works like this:

  • Check if repository is cloned
  • If no, clone the repository to the filesystem
  • If yes, skip clone
  • Read devcontainer/Dockerfile from filesystem

In that last step, the devcontainer or Dockerfile may be old or locally modified. It depends on what the user has done.
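The filesystem-mode steps above can be sketched roughly as follows. This is only an illustration, not envbuilder's actual code: the `clone` callback, the use of a `.git` directory as the "is cloned" check, and the `.devcontainer/devcontainer.json` location are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable


def filesystem_mode(workdir: Path, clone: Callable[[Path], None]) -> Path:
    """Return the devcontainer config path, cloning only on first start."""
    # Assumption: a .git directory means "the repository is cloned".
    if not (workdir / ".git").exists():
        clone(workdir)  # hypothetical callback that clones into workdir
    # On later starts the clone is skipped, so this file may be stale
    # or locally modified compared to the remote.
    return workdir / ".devcontainer" / "devcontainer.json"
```

Because the clone is skipped whenever the directory already exists, whatever the user left on disk is what gets built.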

Repo-mode differs from filesystem-mode in that it will always clone the repo (to a temporary location) and use the devcontainer/Dockerfile from there. This way the resulting container is always in sync with the repo.

For now, we can keep this simple and always clone the repo. This feature should be implemented as a package that can be used by both envbuilder and a future envbuilder Terraform provider.

In the future, we can improve performance by only reading the relevant files from the repo.

Activity

mtojek (Member) commented on Jun 6, 2024

@mafredri We're having trouble researching the idea, so I will put this on hold until we have a good understanding.

Envbuilder already supports cloning the Git repo with go-git today, and tags/branches are supported too. Could you please elaborate on the ideal solution you had in mind?

Also, what kind of problem will this issue address? monorepo & sparse checkout?

bpmct (Member) commented on Jun 6, 2024

Just read the RFC again. Will the filesystem mode have the same level of caching, just via a different architecture or is repo mode more performant in some scenarios?

johnstcn (Member) commented on Jun 12, 2024

> Will the filesystem mode have the same level of caching

As I understand, it entirely depends on whether the layers are present in the build cache.

> is repo mode more performant in some scenarios?

The way it's described here, repo mode will always clone from the remote. Therefore it may take slightly longer depending on the size of the repo to be cloned. The future performance improvement (only reading the relevant files directly from the repo) should reduce that overhead.

bpmct (Member) commented on Jun 13, 2024

Understood 👍🏼

I feel like I'd personally use local mode so I can iterate on a devcontainer without having to commit and set a branch parameter

mafredri (Member, Author) commented on Jun 17, 2024

> Envbuilder already supports cloning the Git repo with go-git today, and tags/branches are supported too. Could you please elaborate on the ideal solution you had in mind?

@mtojek what you describe here is "filesystem"-mode. The repo is cloned onto a persistent volume, and on next start the local files are used to run envbuilder. The repo isn't cloned or updated. This is the current behavior.

What we want with "repo"-mode is (in simple terms, disregarding optimizations):

  1. Always clone a fresh copy of the repo as defined by env variables
  2. Do not modify anything in the user's persistent files (if their checked-out repo is old, it should remain old; if it doesn't exist, we can copy over the repo from 1., etc.)
  3. (Break repo-mode logic out into public func/library so that it can be imported by future envbuilder Terraform provider)
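A minimal sketch of the repo-mode steps listed above, under stated assumptions: `repo_mode`, the `clone` callback, the temp directory prefix, and the `.devcontainer/devcontainer.json` path are hypothetical names chosen for illustration, not part of envbuilder.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def repo_mode(
    workdir: Path,
    clone: Callable[[Path], None],
    tmp_root: Optional[Path] = None,
) -> Path:
    """Always build from a fresh clone; never touch existing user files."""
    # Step 1: always clone into a new temporary directory.
    fresh = Path(tempfile.mkdtemp(prefix="repo-mode-", dir=tmp_root))
    clone(fresh)  # hypothetical callback that clones into `fresh`

    # Step 2: leave an existing checkout alone, even if it is old.
    # Only seed the persistent directory when it does not exist yet.
    if not workdir.exists():
        shutil.copytree(fresh, workdir)

    # The build reads its config from the fresh clone, not from workdir.
    return fresh / ".devcontainer" / "devcontainer.json"
```

Keeping the logic in one function that takes the clone step as a parameter is one way to make it importable from another program, which is the intent of step 3.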

> Also, what kind of problem will this issue address? monorepo & sparse checkout?

The issue we're addressing is startup performance (cache utilization) first and foremost. Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo). By using "repo"-mode, the user can take advantage of the existing image even when their own local files are old and outdated.

The second issue we're addressing is building the functionality for use in a Terraform provider so that startup performance can be further improved without having to run envbuilder --get-cached-image first (this logic will be performed by the Terraform provider).

For this issue, we simply want a full clone of the repo. In the future, we can improve performance further by only checking out the necessary files (like @johnstcn mentioned).

mtojek (Member) commented on Jun 17, 2024

> The repo is cloned onto a persistent volume, and on next start the local files are used to run envbuilder.

In case of "repo-mode" this would be /tmp or something similar?

> Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo).

Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to check out the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

mafredri (Member, Author) commented on Jun 17, 2024

> In case of "repo-mode" this would be /tmp or something similar?

Making it configurable is probably best, but /.envbuilder is a solid alternative to /tmp. OTOH /tmp is probably fine too.

> Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to check out the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

mtojek (Member) commented on Jun 17, 2024

Ok, I think I got what you mean:

You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

> No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

Right 👍 we don't want to keep redundant data in too many places.

mafredri (Member, Author) commented on Jun 17, 2024

> Ok, I think I got what you mean:
>
> You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

We can just rely on the image sha produced by Kaniko. The get cached image can deduce the correct sha by cloning the repo. If we tag the git commit sha there needs to be a new build/tag for every commit. Whereas with clone + get cached image it works without tagging. And if the files haven't changed, no additional images need to be pushed.
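To illustrate why a content-derived identifier avoids a new build for every commit, here is a small sketch. It is not how Kaniko computes its cache keys or image digests; `content_key` is a hypothetical helper that simply hashes the build-relevant files, so two commits with identical files map to the same key.

```python
import hashlib
from typing import Mapping


def content_key(files: Mapping[str, bytes]) -> str:
    """Hash build inputs by path and content, ignoring the commit sha."""
    digest = hashlib.sha256()
    for path in sorted(files):  # sorted for a stable, order-independent key
        data = files[path]
        # Length-prefix each field so different inputs cannot collide
        # by shifting bytes between the path and the content.
        digest.update(len(path.encode()).to_bytes(8, "big"))
        digest.update(path.encode())
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With a key like this, a commit that only touches unrelated files produces the same identifier, so an already-pushed image can be reused. Tagging by git commit sha would instead yield a new tag for every commit.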

added a commit that references this issue on Jul 30, 2024
022d9d4
