CLOUDP-333692: Re-design images building #303
Conversation
Force-pushed from d7ae339 to 6649987 with the following commits:
- Fix build scenario
- Remove create and push manifests
- Continue improvement to main
- Simplify main and build_context
- Pass Build Configuration object directly
- Use legacy and new pipeline
- Remove --include
- Rename MCO test image
- Multi platform builds, with buildx
- Implement is_release_step_executed()
- Fix init appdb image
- Import sort
- black formatting
- Some cleaning and version adjustments
- Adapt main to new build config
- Add buildscenario to buildconfig
- Handle build env
- Renaming, usage of high level config
- All images build pass on EVG
- Lint
- Explicit image type, support custom build_path
- Replace old by new pipeline in EVG
- Add documentation
- Split in multiple files, cleanup
- WIP, passing builds on staging temp + multi arch manifests
- Replace usage of sonar
- Remove namespace
- Remove pin_at and build_id
- Copied pipeline, removed daily builds and --exclude
This reverts commit 426e522.
@lucian-tosa I already have integration working -> #317. At least for PRs.
Maciej answered above, but I also plan to sync with him as soon as this is merged, to properly rebase and wire everything together.
```python
def infer_scenario_from_environment(cls) -> "BuildScenario":
    """Infer the build scenario from environment variables."""
    git_tag = os.getenv("triggered_by_git_tag")
    is_patch = os.getenv("is_patch", "false").lower() == "true"
```
So this is true only when triggering manual patches or PR commits? Will it be false on merges to master?
I made it more explicit, and we'll make proper use of the staging registry in a follow-up:
2ec7587
I was referring to the `is_patch` variable from Evergreen. This is what I found in the devprod docs:

> ${is_patch} is "true" if the running task is in a patch build and undefined if it is not.

Does this mean that `is_patch` will be false in a merge to master?
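For illustration, a minimal sketch of how the inference can branch on these variables, given the Evergreen semantics quoted above; the scenario names are assumptions, not necessarily the real enum members:

```python
import os
from enum import Enum


class BuildScenario(str, Enum):
    # Hypothetical members; the actual enum in the PR may differ.
    RELEASE = "release"
    PATCH = "patch"
    MASTER = "master"

    @classmethod
    def infer_scenario_from_environment(cls) -> "BuildScenario":
        """Infer the build scenario from environment variables."""
        git_tag = os.getenv("triggered_by_git_tag")
        is_patch = os.getenv("is_patch", "false").lower() == "true"
        if git_tag:   # tag-triggered build: an official release
            return cls.RELEASE
        if is_patch:  # manual patch or PR commit
            return cls.PATCH
        # is_patch is undefined on merges to master, so a merge lands here
        return cls.MASTER
```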
scripts/release/build_images.py
logger.info(f"Created new buildx builder: {builder_name}") | ||
except DockerException as e: | ||
# Check if this is a race condition (another process created the builder) | ||
if hasattr(e, 'stderr') and 'existing instance for' in str(e.stderr): |
@nammn @lucian-tosa @MaciejKaras
I'd like your opinion on two solutions here.
This function sometimes fails when building multiple agents concurrently. I tried two alternatives:

- Catching the exception
- Using `ensure_buildx_builder` at the top level, in `pipeline_main`

Neither is ideal: catching the exception is not the right way to handle concurrency, and making the main aware of buildx increases coupling.
My knowledge of Python multiprocessing is limited. I know that for multithreading it is easy to make a lock with a global variable and `threading.Lock()`. From what I've seen online, for multiprocessing we would need to pass the lock from the agent building method all the way down to here. I feel like that adds a lot of complexity and optional variables.
Which solution do you prefer? Do you see an alternative?
Here is the alternative solution, ensuring the builder in main:
1badff0
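For reference, a self-contained sketch of the exception-catching variant, assuming python_on_whales (the error-string match mirrors the excerpt above; the builder name is illustrative):

```python
from python_on_whales import docker
from python_on_whales.exceptions import DockerException


def ensure_buildx_builder(builder_name: str = "multiarch") -> str:
    """Create the buildx builder, tolerating a concurrent-creation race."""
    try:
        docker.buildx.create(name=builder_name, use=True)
    except DockerException as e:
        # Another concurrent agent build may have created the builder first;
        # anything else is a genuine failure and should propagate.
        if "existing instance for" not in str(getattr(e, "stderr", "")):
            raise
        docker.buildx.use(builder_name)
    return builder_name
```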
logger.info(f"Building {image_name}, dockerfile args: {build_args}") | ||
logger.debug(f"Build args: {build_args}") | ||
logger.debug(f"Building {image_name} for platforms={build_configuration.platforms}") |
Since we already have the span set up here, can we save these values as span attributes? We log them anyway.
I think it should really be done as a follow-up. It might look like one minute of work, but I have no idea how to properly set up spans, and the scope of this PR is getting quite large.
It's listed in the follow-up ticket:
https://jira.mongodb.org/browse/CLOUDP-335471
You don't have to set up anything; everything you need is already there. It's:

```python
span.set_attribute("mck.platforms", build_configuration.platforms)
span.set_attribute("mck.registry", registry)
```

and the same everywhere...
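In context, a sketch of what that could look like inside the build function, assuming an OpenTelemetry span is already active there:

```python
from opentelemetry import trace

# Attach the values we already log to the active span. OTel attribute
# values must be str/bool/int/float or sequences of them, so a list of
# platform strings can be passed as-is.
span = trace.get_current_span()
span.set_attribute("mck.platforms", build_configuration.platforms)
span.set_attribute("mck.registry", registry)
```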
```python
    return builder_name


def execute_docker_build(
```
Or here. I think adding tracing costs one minute and saves us a lot of trouble later.
Especially since you are logging the exact information we need as metrics anyway.
I think it should really be done as a follow-up. It might look like one minute of work, but I have no idea how to properly set up spans, and the scope of this PR is getting quite large.
…ros propagation (#328)

# Summary

Fixes an issue with conflicting `Dockerfile` files on master. Sonar is creating Dockerfiles in a directory where a new Dockerfile already existed. They were added here -> #289. Additionally, another PR #267 made changes to how the agent images were built, especially to the agent Dockerfile ARGs:
https://github.com/mongodb/mongodb-kubernetes/blob/89866b79233745e27ade51dc2d9fd0e73c6124e5/docker/mongodb-agent/Dockerfile#L3-L6
This conflicted with #303, where the new `atomic_pipeline.py` was still depending on the old Dockerfile structure:
https://github.com/mongodb/mongodb-kubernetes/blob/774bbaca930a1752060cfe5e5bf60f24cd6c999c/docker/mongodb-agent/Dockerfile#L25-L26

## Proof of Work

Passing CI (especially the agent build task)

## Checklist

- [ ] Have you linked a jira ticket and/or is the ticket in the title?
- [ ] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you added changelog file? - use `skip-changelog` label if not needed - refer to [Changelog files and Release Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes) section in CONTRIBUTING.md for more details
Re-design images building
Note for review:
Since `atomic_pipeline.py` is largely a refactored version of `pipeline.py`, it’s much clearer to review their side-by-side diff than to wade through GitHub’s “all new lines” view. Here’s the diff:
https://gist.github.com/Julien-Ben/3698d532d17bafb380f2e4f05b5db153
You can also take a look at the related TD Section
Changes
The PR refactors our Docker image build system, most notably by replacing `pipeline.py` along with other components, detailed below.

Usage of standalone Dockerfiles
Added in a previous PR, they eliminate the need for templating, and make it possible to retire Sonar once the Atomic Releases Epic is completed.
Building with docker buildx, multi-platform builds
In `build_images.py` we use Docker Buildx through a Python API. This eliminates the need to build images separately for each platform (ARM/AMD) and then manually bundle them into a manifest; a sketch of the idea follows below.
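A minimal sketch of a multi-platform build through the Python API, assuming python_on_whales; the wrapper function and argument values are illustrative, not the exact contents of `build_images.py`:

```python
from python_on_whales import docker


def build_multi_arch(image: str, context: str, build_args: dict[str, str]) -> None:
    # A single buildx invocation produces one manifest covering both
    # platforms, so there is no per-arch build + manual manifest step.
    docker.buildx.build(
        context,
        tags=[image],
        platforms=["linux/amd64", "linux/arm64"],
        build_args=build_args,
        push=True,
    )
```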
Handle build environments explicitly
We’ve introduced a framework that centralizes build configuration by scenario (e.g. local development, staging releases, etc.) so the pipeline automatically picks sensible defaults (registry, target platforms, signing flags, and more) based on where you’re running.
In `pipeline_main.py` (with support from `build_configuration.py` and `build_context.py`) we treat each execution context (local dev, merge to master, release, etc.) as an explicit, top-level environment. It infers defaults automatically but lets you override any value via CLI flags, ensuring all build parameters live in a single source of truth rather than being scattered through pipeline scripts. A sketch of the idea is shown below.
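A minimal sketch of scenario-scoped defaults with CLI overrides; the class, field, and registry names are illustrative assumptions, not the actual contents of `build_configuration.py`:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildConfiguration:
    scenario: str  # e.g. "development", "patch", "release"
    registry: str
    platforms: list[str] = field(default_factory=lambda: ["linux/amd64"])
    sign: bool = False


# Sensible defaults per scenario (registry URLs are placeholders).
DEFAULTS = {
    "release": BuildConfiguration("release", "registry.example.com/prod",
                                  ["linux/amd64", "linux/arm64"], sign=True),
    "patch": BuildConfiguration("patch", "registry.example.com/staging-temp"),
    "development": BuildConfiguration("development", "registry.example.com/dev"),
}


def build_configuration(scenario: str, registry_override: Optional[str] = None) -> BuildConfiguration:
    """Start from the scenario defaults; CLI flags may override any field."""
    config = DEFAULTS[scenario]
    if registry_override:
        config.registry = registry_override
    return config
```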
CLI usage
Proof of work
CI is building images with the new pipeline, and tests pass.
Note
For the duration of the Atomic Releases epic, both pipelines will be in the repository, until we are done with the staging and promotion process. This new pipeline will only be used for Evergreen patches.
This PR also heavily depends on changes introduced by the agent matrix removal and the multi-platform support epic.
The existing Evergreen function that uses `pipeline.py` has been renamed `legacy_pipeline`, and is used for release and periodic build tasks. A new one has been created, calling the new pipeline.
Once the Atomic Release Epic is complete, we'll be able to remove:
- `pipeline.py`
Follow-up ticket to this PR: https://jira.mongodb.org/browse/CLOUDP-335471
Checklist
- Have you added changelog file? - use `skip-changelog` label if not needed