Skip to content

Use os.getpid() in configdir, to avoid multiprocess concurrency issues #15911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dan-stromberg
Copy link

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@anntzer
Copy link
Contributor

anntzer commented Dec 11, 2019

Why would it a problem that multiple subprocesses share the same config/cachedir? In fact that's normally the case (they all share ~/.config/matplotlib ~/.cache/matplotlib), why would it be different when using a tmpdir?

It's actually not really clear to me what this is trying to solve.

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 11, 2019 via email

@anntzer
Copy link
Contributor

anntzer commented Dec 11, 2019

fair point... but I think separate processes can't actually end up creating the same directory? mkdtemp is documented as

There are no race conditions in the directory’s creation.

and if you look at the implementation I think at the OS-level if there is a name collision this is handled by the FileExistsError case (https://github.com/python/cpython/blob/3.8/Lib/tempfile.py#L358).

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 11, 2019 via email

@anntzer
Copy link
Contributor

anntzer commented Dec 11, 2019

Can you check whether the fork occurred before or after the temporary directory was created and the atexit call registered? I'm still puzzled...

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 11, 2019 via email

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpret both the docs of mkdtemp and its implementation as guaranteeing that two calls to mkdtemp, even in separate (e.g., after-fork) processes, can never return the same directory.
Note that if that's not the case, then the proposed patch here can still fail if e.g. two separate OSes access the same tmpdir (e.g. on an NFS), so that's why I would like to understand what is really happening...

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 12, 2019 via email

@anntzer
Copy link
Contributor

anntzer commented Dec 12, 2019

Just to be sure:

  • what's the version of matplotlib you're using? does it happen with mpl master?
  • you are positive that adding the pid solves the problem?

@anntzer
Copy link
Contributor

anntzer commented Dec 12, 2019

Also, your log suggest the error is only occurring at shutdown (the line orig_st = os.lstat(path) only occurs in the implementation of shutil.rmtree, which is called by the atexit handler)?

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 12, 2019 via email

@anntzer
Copy link
Contributor

anntzer commented Dec 12, 2019

At least it would be nice to know if this only occurs with mpl2.2.4, which is quite old (and likely will go unsupported relatively soon, though it's not my call), or mpl3+, which has diverged quite a bit (due to the dropping of Py2 support).

@tacaswell tacaswell added this to the v3.3.0 milestone Dec 12, 2019
@anntzer
Copy link
Contributor

anntzer commented Dec 12, 2019

Actually I now have a repro, that fails even if including the pid in the tmpdir (i.e., this PR is insufficient to fix it):

import os
import time

import matplotlib as mpl
print(mpl.get_configdir())
print(mpl.get_cachedir())

if os.fork():  # parent exits immediately, destroying the tmp configdir.
    pass
else:
    time.sleep(1)  # child waits that parent quits, then crashes at exit...
    # or if later doing something that tries to access the cachedir, e.g.
    # import matplotlib.font_manager

(first do chmod a-x ~/.{config,cache}/matplotlib (temporarily) to force the creation of config/cachedirs in /tmp)

Looks like the following patch works, can you confirm?

diff --git i/lib/matplotlib/__init__.py w/lib/matplotlib/__init__.py
index 11968a34b..f62a55932 100644
--- i/lib/matplotlib/__init__.py
+++ w/lib/matplotlib/__init__.py
@@ -523,16 +523,6 @@ def get_home():
         return None
 
 
-def _create_tmp_config_or_cache_dir():
-    """
-    If the config or cache directory cannot be created, create a temporary one.
-    """
-    configdir = os.environ['MPLCONFIGDIR'] = (
-        tempfile.mkdtemp(prefix='matplotlib-'))
-    atexit.register(shutil.rmtree, configdir)
-    return configdir
-
-
 def _get_xdg_config_dir():
     """
     Return the XDG configuration directory, according to the `XDG
@@ -551,7 +541,29 @@ def _get_xdg_cache_dir():
     return os.environ.get('XDG_CACHE_HOME') or str(Path.home() / ".cache")
 
 
-def _get_config_or_cache_dir(xdg_base):
+# The following functions rely on getpid() to make sure 1) that a temporarily
+# created config/cachedir is not shared between parent and child processes (the
+# parent may exit and destroy it first), and 2) that child processes don't
+# inherit the parent's atexit handler.
+
+
+def _create_tmp_config_or_cache_dir():
+    """
+    If the config or cache directory cannot be created, create a temporary one.
+    """
+    configdir = tempfile.mkdtemp(prefix='matplotlib-')
+
+    @atexit.register
+    def cleanup(getpid=os.getpid, pid=os.getpid(),
+                rmtree=shutil.rmtree, configdir=configdir):
+        if getpid() == pid:
+            rmtree(configdir)
+
+    return configdir
+
+
+@functools.lru_cache()
+def _get_config_or_cache_dir(xdg_base, pid):
     configdir = os.environ.get('MPLCONFIGDIR')
     if configdir:
         configdir = Path(configdir).resolve()
@@ -569,7 +581,6 @@ def _get_config_or_cache_dir(xdg_base):
     return _create_tmp_config_or_cache_dir()
 
 
-@_logged_cached('CONFIGDIR=%s')
 def get_configdir():
     """
     Return the string representing the configuration directory.
@@ -586,10 +597,11 @@ def get_configdir():
        configuration directory.
     5. A writable directory could not be found or created; return None.
     """
-    return _get_config_or_cache_dir(_get_xdg_config_dir())
+    configdir = _get_config_or_cache_dir(_get_xdg_config_dir(), os.getpid())
+    _log.debug("CONFIGDIR=%s", configdir)
+    return configdir
 
 
-@_logged_cached('CACHEDIR=%s')
 def get_cachedir():
     """
     Return the location of the cache directory.
@@ -597,7 +609,9 @@ def get_cachedir():
     The procedure used to find the directory is the same as for
     _get_config_dir, except using `$XDG_CACHE_HOME`/`~/.cache` instead.
     """
-    return _get_config_or_cache_dir(_get_xdg_cache_dir())
+    cachedir = _get_config_or_cache_dir(_get_xdg_cache_dir(), os.getpid())
+    _log.debug("CACHEDIR=%s", cachedir)
+    return cachedir
 
 
 def _get_data_path():

@dan-stromberg
Copy link
Author

dan-stromberg commented Dec 13, 2019 via email

@anntzer
Copy link
Contributor

anntzer commented Dec 13, 2019

Actually, looking at this again, there's a much simpler solution: you can set the MPLCONFIGDIR environment variable (which, despite the name, applies to both the config and cache dirs) to a writable directory that you control. Would that work for you?
I think we should actually emit a warning in the case where we need to create tmp config/cache dirs asking the user to do so themselves (so that e.g. we don't have the regen the font cache on every import, which is a bit silly).
Edit: looks like emitting a warning was also considered when the tmpdir path was first added in #832.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants