
Regression on initialization and new non-deterministic behavior issue in mimetypes #93417

@pombredanne


Bug report

Mimetypes should NOT include defaults when initialized with file(s)

This is a follow-up to https://bugs.python.org/issue4963, which led to the merged PR #3062 by @davidkhess.

I reported the issue in https://bugs.python.org/issue4963#msg384730 :

The changes introduced by this ticket in 9fc720e#r45794801 are problematic.

I discovered this when my tests started failing on Python 3.7 and up.

The bug is that calling mimetypes.init(files) will NOT use my files, but instead use both my files and knownfiles.
This was not the case before as knownfiles would be ignored as expected when I provide my own files list.

IMHO this is a breaking API change, and it introduces instability: even if I want to ignore knownfiles by providing my own list of files, knownfiles will always be added. This results in erratic and buggy behaviour, because the content of "knownfiles" is effectively random, depending on the OS version and other factors.
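To see which system files are involved on a given machine, a small check like the one below may help. It is only a sketch: `present_knownfiles` is a hypothetical helper name, not part of the module, and it relies solely on the documented `mimetypes.knownfiles` list.

```python
import mimetypes
import os


def present_knownfiles():
    """Return the entries of mimetypes.knownfiles that exist on this machine.

    Hypothetical helper for illustration only.
    """
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


if __name__ == "__main__":
    # The output differs between systems, so nothing is asserted here.
    for path in present_knownfiles():
        print(path)
```

Running this on two different machines, or on two OS versions, can return different lists, which is the source of the instability described above.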

The code I am using is here https://github.com/nexB/typecode/blob/ba07c04d23441d3469dc5de911376d408514ebd8/src/typecode/contenttype.py#L308

I think we should reopen to fix (or create a new ticket)

Actually this is problematic on multiple counts:

  1. the behaviour changes and this is a regression
  2. even if that new buggy behaviour was the one to use, it should not give preference to knownfiles over init-provided files, but at least take the provided files first and knownfiles second.

See also this thread 9fc720e#r45794801

pombredanne on Jan 9, 2021 Contributor

This introduces instability starting with Python 3.7: even if I want to ignore knownfiles by providing my own list of files, knownfiles will always be added. This results in erratic and buggy behaviour, because the content of "knownfiles" is effectively random, depending on the OS version and other factors.

adeadman 2 hours ago

@davidkhess this has resulted in an undocumented change to the behaviour of calling mimetypes.init(files=["mimetype_list"]) as now the defaults are included where they weren't before Python 3.7.

I'm not sure how this can be fixed without breaking compatibility or principle of least surprise somewhere - perhaps by adding a new kwarg to init() that ignores knownfiles?

pombredanne 1 hour ago Contributor

@adeadman re:

I'm not sure how this can be fixed without breaking compatibility or principle of least surprise somewhere - perhaps by adding a new kwarg to init() that ignores knownfiles?

IMHO the current undocumented behaviour is a bug, so fixing it would not be an API breakage.
Also the changelog is lying:

Fixed non-deterministic behavior related to mimetypes extension mapping and module reinitialization.

The behaviour used to be mostly deterministic and is now non-deterministic.

@adeadman also posted this in a comment on #3062:

This change has resulted in a difference in behaviour from previous versions of Python in that even when specifying files as a parameter to init(), the known files are loaded, which is not the case in previous versions. This causes problems when an application wishes to use the mimetypes database with an internal-only mapping of mimetypes, to work around inconsistencies with mimetypes in the known files list (which can vary between systems).

Unfortunately this change wasn't documented in the changelog either, nor in the docstring comments, so it can only be found by examining the commit history for the code in cpython.

Would it be appropriate to add another kwarg to init() indicating that knownfiles are to be ignored/excluded from the DB initialization?
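In the meantime, one possible workaround is to avoid the module-level `init()` and build a private `mimetypes.MimeTypes` instance instead. The sketch below assumes that clearing the instance's `types_map` and `types_map_inv` attributes is acceptable for your use case. `isolated_db` and its `drop_builtin` flag are hypothetical names used for illustration, not an existing or proposed API.

```python
import mimetypes


def isolated_db(files, drop_builtin=True):
    """Build a private MimeTypes database from `files` (hypothetical helper)."""
    db = mimetypes.MimeTypes()
    if drop_builtin:
        # Assumption: we want an internal-only mapping, so discard whatever
        # the constructor preloaded. Index 0 is non-strict, index 1 is strict.
        db.types_map = ({}, {})
        db.types_map_inv = ({}, {})
    for path in files:
        # read() parses a mime.types-style file: "type ext1 ext2 ...".
        db.read(path)
    return db
```

Lookups then go through the instance, for example `isolated_db(["my.types"]).guess_type("report.xyz")`, rather than through `mimetypes.guess_type`. The `suffix_map` and `encodings_map` tables are left untouched in this sketch.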


Labels: 3.10 (only security fixes), 3.11 (only security fixes), 3.12 (only security fixes), topic-email, type-bug (An unexpected behavior, bug, or error)
