Skip to content

FIX: Sanitize default filename. #9057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Aug 18, 2017

Fixes #9056.

I would actually think that we should not do any sanitization ("you get
what you asked for"), but given that there was some opposition last time
I tried to restore the space in the default filename ("Figure 1.png"),
even though the space is totally innocuous for the filesystem, we may as
well fix the filenames that are actually problematic.

As usual feel free to directly (force-)push minor fixes to the PR.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

["{}.{}".format(default_basename, default_filetype)],
("{}-{}.{}".format(default_basename, i, default_filetype)
for i in itertools.count(1)))
if not os.path.isfile(os.path.join(save_dir, filename)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat trick! Combining next() with a generator like that, I'll have to remember that one.

# Characters to be avoided in a NT path:
# https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#naming_conventions
removed_chars = \
{"posix": " /\0", "nt": r'<>:"/\|?*\0'}.get(os.name, "_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 'nt' case missing ' '?

Copy link
Contributor Author

@anntzer anntzer Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is intentional, because there's literally nothing wrong with spaces in filenames on Windows (for many users, most relevant files live in "C:\Documents and Settings..."). (I would argue that it's also the case on Linux but at least you could say well they interact poorly with badly written shell scripts.)

On the other hand I should have mentioned this in the PR doc, sorry about the omission.

I'm going to self-block the PR until we can agree about that change (or until you can convince me to change it back...). EDIT: Apparently I can't self-review a PR, so just going to mark it as WIP for now :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is technically ok it is still bad practice (see http://www2.stat.duke.edu/~rcs46/lectures_2015/01-markdown-git/slides/naming-slides/naming-slides.pdf slides from an R lecture, but the point stands). We should not by default be suggesting file names with spaces in them. We should also not suggest different default file names between windows and *nix.

I would not support sanitizing direct input from the user (ex the input to savefig). If users want to shoot them selves in their foot, that is their business, but we should not be shooting their feet for them with the defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument from the slides is

new to regular expressions and globbing? be kind to
yourself and avoid

  • spaces in file names
  • punctuation
  • accented characters
  • different files named “foo” and “Foo”

but I don't see how this is even relevant, because globbing occurs after word splitting (and, of course, languages other than shell are not affected at all).

sh-4.4$ mkdir test; echo 'foo' > 'test/with space and àccént & punc;tu:a>t<ion'; cat test/*
foo
sh-4.4$ python -c 'import glob; print(glob.glob("test/*"))'
['test/with space and àccént & punc;tu:a>t<ion']

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tacaswell if you have a look at #9056 you see that the user does not always want to shoot their own foot. Sometimes it is the interaction of several features, like in that case setting a title will affect the chosen name when you click the save button. And when matplotlib proposes a name which does not follow the naming conventions of an operating system, that is bad in my opinion.

@anntzer anntzer changed the title Sanitize default filename. [WIP] Sanitize default filename. Aug 18, 2017
Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter out ' 'on nt as well.

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Aug 27, 2017
@dstansby dstansby modified the milestones: 2.2 (next next feature release), 2.1 (next point release) Sep 6, 2017
@dstansby
Copy link
Member

dstansby commented Sep 6, 2017

Needs a rebase, but otherwise looks good.

@anntzer
Copy link
Contributor Author

anntzer commented Sep 6, 2017

@tacaswell do you want to do it? I don't really like the turn this has taken (see above review discussion), but won't prevent you from merging this of course.

@1kastner
Copy link

1kastner commented Oct 4, 2017

I'd love to see this merged, what happened to this pull request?

anntzer and others added 2 commits October 5, 2017 00:42
I would actually think that we should not do any sanitization ("you get
what you asked for"), but given that there was some opposition last time
I tried to restore the space in the default filename ("Figure 1.png"),
even though the space is totally innocuous for the filesystem, we may as
well fix the filenames that are *actually* problematic.
This makes the default suggested filename independent of the OS.
@tacaswell tacaswell force-pushed the sanitize-default-filename branch from 39379ea to 715f5da Compare October 5, 2017 07:44
@tacaswell tacaswell changed the title [WIP] Sanitize default filename. FIX: Sanitize default filename. Oct 5, 2017
@dstansby
Copy link
Member

dstansby commented Oct 19, 2017

Does this need an entry in API changes?

@tacaswell
Copy link
Member

👍 to an API changes entry.

@anntzer
Copy link
Contributor Author

anntzer commented Mar 22, 2018

Closing due to disagreement about handling of spaces.

@jat255
Copy link
Contributor

jat255 commented Apr 11, 2018

@tacaswell is there a reason why this did not get merged after the change to include spaces? I have no opinion one way or the other on the inclusion of ' ', but this issue has bitten us over at hyperspy/hyperspy#1884, and it seems more appropriate to apply the proper fix within matplotlib.

@anntzer anntzer deleted the sanitize-default-filename branch May 20, 2018 07:20
@story645 story645 removed this from the future releases milestone Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

default filename does not strip invalid characters on windows
7 participants