
Scap fails with `docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock`
Closed, Resolved (Public)

Description

11:03:38 Finished scap prep auto (duration: 00m 10s)
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

11:03:39 Started scap sync-world: Backport for [[gerrit:1076715|Pin wgRevisionSlotsCacheExpiry to default (T183490)]], [[gerrit:1076451|Move closed wikis to group0 except aawiki]]
11:03:39 Started cache_git_info
11:03:40 Finished cache_git_info (duration: 00m 00s)
11:03:40 Started l10n-update
11:03:40 docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.

11:03:40 Finished l10n-update (duration: 00m 00s)
11:03:40 Unhandled error:
Traceback (most recent call last):
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/cli.py", line 660, in run
    exit_status = app.main(app.extra_arguments)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/main.py", line 1066, in main
    return super().main(*extra_args)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/main.py", line 140, in main
    self._update_caches()
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/main.py", line 1091, in _update_caches
    tasks.update_localization_cache(version, self)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/utils.py", line 330, in context_wrapper
    return func(*args, **kwargs)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/tasks.py", line 622, in update_localization_cache
    mw_runtime.run_shell("rm -f {}/*.tmp.*", cache_dir)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/mwscript.py", line 102, in run_shell
    return self._run("/bin/bash", ["-c", command.format(*args)], **kwargs)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/utils.py", line 330, in context_wrapper
    return func(*args, **kwargs)
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/mwscript.py", line 190, in _run
    raise err
  File "/var/lib/scap/scap/lib/python3.9/site-packages/scap/mwscript.py", line 184, in _run
    completed.check_returncode()
  File "/usr/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['docker', 'run', '--rm', '--attach', 'stdin', '--attach', 'stdout', '--attach', 'stderr', '--user', 'www-data', '--mount', 'type=bind,source=/srv/mediawiki-staging,target=/srv/mediawiki-staging', '--mount', 'type=bind,source=/tmp,target=/tmp', '--workdir', '/srv/mediawiki-staging', '--entrypoint', '/bin/bash', '--network', 'none', 'docker-registry.wikimedia.org/php7.4-fpm-multiversion-base', '-c', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.24/cache/l10n/*.tmp.*']' returned non-zero exit status 126.
11:03:40 scap failed: <CalledProcessError> Command '['docker', 'run', '--rm', '--attach', 'stdin', '--attach', 'stdout', '--attach', 'stderr', '--user', 'www-data', '--mount', 'type=bind,source=/srv/mediawiki-staging,target=/srv/mediawiki-staging', '--mount', 'type=bind,source=/tmp,target=/tmp', '--workdir', '/srv/mediawiki-staging', '--entrypoint', '/bin/bash', '--network', 'none', 'docker-registry.wikimedia.org/php7.4-fpm-multiversion-base', '-c', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.24/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.107.0-1) (duration: 00m 00s)
11:03:40 backport failed: <CalledProcessError> Command '['/usr/bin/scap', 'sync-world', '--pause-after-testserver-sync', '--notify-user=zabe', 'Backport for [[gerrit:1076715|Pin wgRevisionSlotsCacheExpiry to default (T183490)]], [[gerrit:1076451|Move closed wikis to group0 except aawiki]]']' returned non-zero exit status 1. (scap version: 4.107.0-1)
zabe@deploy2002:~$

Event Timeline

Zabe triaged this task as Unbreak Now! priority. (Sep 30 2024, 11:05 AM)
Zabe added projects: Scap, serviceops.
Zabe assigned this task to jnuche.

scap got rolled back to a non-broken version: https://sal.toolforge.org/log/-KWmQpIBFk7ipym_zJBT

jnuche removed jnuche as the assignee of this task.
jnuche added subscribers: dduvall, dancy, jnuche.

As a temporary measure, I rolled scap back to 4.104.0.

jnuche lowered the priority of this task from Unbreak Now! to High. (Sep 30 2024, 1:48 PM)

Previously, docker commands were run only by the mwbuilder user (via sudo) during image build. Now docker is executed by whoever is running scap. The docker group contains only a subset of the users in the deployment group, so a deployer who is not also in the docker group cannot connect to /var/run/docker.sock, which is why this error happened.
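To illustrate the failure mode described above, here is a minimal sketch of a preflight check a deployer could run before invoking docker. It is not part of scap; the function names `user_in_group` and `can_use_docker_socket` are hypothetical, and the group name `docker` and the socket path are taken from the error message in the description.

```python
import grp
import os
import pwd


def user_in_group(username, group_name="docker"):
    """Return True if `username` belongs to `group_name`.

    Hypothetical helper: checks both supplementary membership and the
    user's primary group. Unknown users or groups yield False.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False
    if username in group.gr_mem:
        return True
    try:
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False


def can_use_docker_socket(path="/var/run/docker.sock"):
    """Return True if the current process may read and write the socket.

    This mirrors the check that fails with "permission denied" when the
    invoking user lacks access to the Docker daemon socket.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
```

Under these assumptions, a user for whom `user_in_group(name)` is False and `can_use_docker_socket()` is False would be expected to hit the error shown in the log.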

Related to T303450: Add some users to the docker group on deployment servers.

The current docker group membership on the deployment servers is the SREs group + the RelEng group + the mwbuilder user. Expanding it to include the deployers group is a trivial patch to modules/admin/data/data.yaml in ops/puppet. The open question is whether to make that expansion, or instead to have sudo -u mwbuilder wrap the newer Docker usages that were added to scap as part of T369115: [WE6.2.1] Publish pre-train single version containers.
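The second option could look roughly like the sketch below. This is an illustration only, not scap's actual code: `docker_argv` is a hypothetical helper, and it assumes a sudoers rule exists that lets deployers run docker as mwbuilder.

```python
def docker_argv(command, in_docker_group, build_user="mwbuilder"):
    """Build the argv for a docker invocation.

    Hypothetical helper: if the invoking user is in the docker group,
    the command is returned unchanged; otherwise it is prefixed with
    `sudo -u <build_user> --` so that it runs as the build user.
    """
    command = list(command)
    if in_docker_group:
        return command
    # "--" stops sudo from interpreting docker's own flags as sudo options.
    return ["sudo", "-u", build_user, "--"] + command


# Example: the shape of the call that failed in the description, shortened.
argv = docker_argv(
    ["docker", "run", "--rm", "--network", "none", "some-image"],
    in_docker_group=False,
)
```

The trade-off is the one raised above: widening the docker group needs no changes in scap, while the sudo wrapper keeps the group small but requires every new Docker call site to go through such a wrapper.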