Multiple test failures with OSError: [Errno 84] Invalid or incomplete multibyte or wide character on ZFS with utf8only=on #81765


Open
dimitern mannequin opened this issue Jul 13, 2019 · 9 comments
Labels
tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@dimitern
Mannequin

dimitern mannequin commented Jul 13, 2019

BPO 37584
Nosy @gpshead, @vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka, @dimitern
Files
  • cpython_test_output.log: Tests output
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-07-13.10:13:56.997>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'tests', 'expert-unicode']
    title = 'Multiple test failures with OSError: [Errno 84] Invalid or incomplete multibyte or wide character on ZFS with utf8only=on'
    updated_at = <Date 2021-12-13.02:23:38.746>
    user = 'https://github.com/dimitern'

    bugs.python.org fields:

    activity = <Date 2021-12-13.02:23:38.746>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Tests', 'Unicode']
    creation = <Date 2019-07-13.10:13:56.997>
    creator = 'dimitern'
    dependencies = []
    files = ['48475']
    hgrepos = []
    issue_num = 37584
    keywords = []
    message_count = 5.0
    messages = ['347794', '347801', '347998', '348006', '408420']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'serhiy.storchaka', 'dimitern']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'test needed'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37584'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @dimitern
    Mannequin Author

    dimitern mannequin commented Jul 13, 2019

    I'm running Ubuntu 19.04 on a ZFS mirrored pool, where my home partition is configured with the 'utf8only=on' property. I cloned cpython and ran the tests as described in devguide.python.org, and I get 11 test failures:

    == Tests result: FAILURE ==

    389 tests OK.

    11 tests failed:
    test_cmd_line_script test_httpservers test_imp test_import
    test_ntpath test_os test_posixpath test_socket test_unicode_file
    test_unicode_file_functions test_zipimport

    I've been looking for similar or matching reported issues, but could not find one. I'm on the EuroPython 2019 CPython sprint and we'll be looking into this with the help of some of the core devs.

    @dimitern dimitern mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error labels Jul 13, 2019
    @dimitern
    Mannequin Author

    dimitern mannequin commented Jul 13, 2019

    Here's some additional information I found for that specific attribute:

    From the documentation at
    http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html
    (link is dead, but here's where I found the section below: https://zfs-discuss.opensolaris.narkive.com/3NqQVG0H/utf8only-and-normalization-properties#post1)

    Property: utf8only
    Type: Boolean
    Default: Off
    This property indicates whether a file system should reject file names
    that include characters that are not present in the UTF-8 character code
    set. If this property is explicitly set to off, the normalization
    property must either not be explicitly set or be set to none. The
    default value for the utf8only property is off. This property cannot be
    changed after the file system is created.

    @ezio-melotti
    Member

    I think Dimiter was able to fix most of the failures, except test_unicode_file_functions.
    Yesterday during the sprints we were looking at it, and we did some tests using the following snippet:

    import os
    import unicodedata

    # U+03D4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
    upsilon_diaeresis_and_hook = "ϔ"

    for form in ["NFC", "NFD", "NFKC", "NFKD"]:
        # Each normalization form yields a different code point sequence.
        unicode_filename = unicodedata.normalize(form, upsilon_diaeresis_and_hook)
        with open(unicode_filename, "w") as f:
            f.write(form)
        print("N:", ascii(unicode_filename))
        print([ascii(filename) for filename in os.listdir('.')])

    On ext4 this creates 4 different files: ['\u03d4', '\u03d2\u0308', '\u03ab', '\u03a5\u0308']
    On ZFS with utf8only=on (and I believe normalization=formD), only 2 files are created, but each of the 4 filenames can be used to access one of the 2 files.
    This is also the default behavior on Mac.

    The test is already skipped on darwin (Lib/test/test_unicode_file_functions.py:120) and should be skipped for ZFS too (this might depend on the exact flags used). However, we weren't able to find a portable way to determine the filesystem and its flags.

    An alternative is to try creating the 4 files and skip the test if only 2 get created and all the names can be used to open these two files. However, this might mask other failures. Unless someone can come up with a better way to do this, I think this is the only option.

    In addition, different filesystems that don't exhibit this behavior can be used on Mac, so the test shouldn't be skipped in those cases.

    @vstinner
    Member

    """
    On ext4 this creates 4 different files: ['\u03d4', '\u03d2\u0308', '\u03ab', '\u03a5\u0308']
    On ZFS with utf8only=on (and I believe normalization=formD), only 2 files are created, but each of the 4 filenames can be used to access one of the 2 files.
    This is also the default behavior on Mac.

    The test is already skipped on darwin (Lib/test/test_unicode_file_functions.py:120), and should be skipped for ZFS too (might depend on the exact flags used), however we weren't able to find a portable way to determine the filesystem and flags.
    """

    I suggest creating a temporary directory, creating the 4 files, and checking how many files os.listdir() returns. If you get 4, the FS doesn't normalize anything. If you get fewer, it's likely that the FS normalizes names.
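A minimal sketch of the probe suggested above, assuming the default temporary directory lives on the filesystem of interest. The function name `count_distinct_normalized_names` and its `parent` parameter are hypothetical and not part of `test.support`.

```python
import os
import tempfile
import unicodedata


def count_distinct_normalized_names(char="\u03d4", parent=None):
    """Create one file per normalization form of `char`.

    Returns (number of distinct names requested, number of directory entries).
    `parent` is a hypothetical knob for choosing which filesystem to probe.
    """
    forms = ["NFC", "NFD", "NFKC", "NFKD"]
    names = {unicodedata.normalize(form, char) for form in forms}
    with tempfile.TemporaryDirectory(dir=parent) as tmpdir:
        for name in names:
            with open(os.path.join(tmpdir, name), "w", encoding="utf-8") as f:
                f.write(name)
        return len(names), len(os.listdir(tmpdir))


expected, found = count_distinct_normalized_names()
if found == expected:
    print("filesystem keeps all", expected, "names distinct")
else:
    print("filesystem likely normalizes names:", found, "of", expected, "entries")
```

Passing a directory on the ZFS dataset as `parent` probes that dataset rather than wherever `TMPDIR` points. As noted earlier in the thread, a skip based on such a probe could mask unrelated failures.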

    @gpshead
    Member

    gpshead commented Dec 13, 2021

    Confirmed.

    Repro: Do an Ubuntu 20.04 install and choose the "experimental ZFS" option during install (https://ubuntu.com/blog/zfs-focus-on-ubuntu-20-04-lts-whats-new). On such a ZFS filesystem, the following tests from a ./python -m test.regrtest run fail in 3.10:

    11 tests failed:
    test_cmd_line_script test_httpservers test_imp test_import
    test_ntpath test_os test_posixpath test_socket test_unicode_file
    test_unicode_file_functions test_zipimport

    Move the checkout over to a tmpfs and all but test_httpservers now pass. test_httpservers still fails because it tries to create such a path under /tmp:

    ======================================================================
    ERROR: test_undecodable_filename (test.test_httpservers.SimpleHTTPServerTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/greg/test/cpython/Lib/test/test_httpservers.py", line 400, in test_undecodable_filename
        with open(os.path.join(self.tempdir, filename), 'wb') as f:
    OSError: [Errno 84] Invalid or incomplete multibyte or wide character: '/tmp/tmpnt9ch98x/@test_124227_tmp\udce7w\udcf0.txt'

    I expect any filesystem mounted to reject non-UTF8 pathnames to cause similar failures. Our test suite needs to detect this environment and skip these tests there.
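One way such detection could look is sketched below, assuming a POSIX system where bytes paths are passed to the kernel unchanged. The helper name `fs_accepts_undecodable_names` is hypothetical. The byte sequence mirrors the `\udce7w\udcf0` part of the filename in the traceback above.

```python
import os
import tempfile


def fs_accepts_undecodable_names(parent=None):
    """Return True if a file whose name is not valid UTF-8 can be created.

    Hypothetical helper: `parent` selects the directory (and so the
    filesystem) to probe; None means the default temporary directory.
    """
    # b"\xe7w\xf0" is what '\udce7w\udcf0' encodes to with surrogateescape.
    raw_name = b"probe_\xe7w\xf0.txt"
    with tempfile.TemporaryDirectory(dir=parent) as tmpdir:
        path = os.path.join(os.fsencode(tmpdir), raw_name)
        try:
            with open(path, "wb"):
                pass
        except OSError:
            # For example errno 84 (EILSEQ on Linux) on ZFS with utf8only=on.
            return False
        return True


print("undecodable names accepted:", fs_accepts_undecodable_names())
```

Because the answer depends on the directory probed, a checkout on tmpfs and a `/tmp` on ZFS can give different results, which matches the test_httpservers failure described above.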

    @gpshead gpshead added 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 (EOL) end of life labels Dec 13, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    gentoo-bot pushed a commit to gentoo/cpython that referenced this issue May 21, 2024
    These tests fail on filesystems which disallow
    invalid UTF8, like ZFS with 'utf8only' on.
    
    Bug: https://bugs.python.org/issue37584
    Bug: python#81765
    Signed-off-by: Sam James <sam@gentoo.org>
    gentoo-bot pushed a commit to gentoo/cpython that referenced this issue Sep 19, 2024
    These tests fail on filesystems which disallow
    invalid UTF8, like ZFS with 'utf8only' on.
    
    Bug: https://bugs.python.org/issue37584
    Bug: python#81765
    Signed-off-by: Sam James <sam@gentoo.org>
    @erlend-aasland
    Contributor

    test_sqlite3 has multiple "undecodable path" tests that fail on ZFS. Marking the following issues as duplicates of this issue:

    @erlend-aasland erlend-aasland added 3.12 only security fixes 3.13 bugs and security fixes and removed 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes labels Oct 10, 2024
    @erlend-aasland
    Contributor

    erlend-aasland commented Oct 10, 2024

    ISTM we need a test support helper that can identify OS conditions such as this particular ZFS configuration. When that is in place, we can gate all "undecodable path" tests using this condition.
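As a rough illustration of that gating, the sketch below builds a skip decorator from a runtime probe. Both `_undecodable_names_supported` and `requires_undecodable_paths` are hypothetical names and not existing `test.support` helpers. The probe only checks the default temporary directory, so a real helper would need to probe wherever each test actually writes.

```python
import os
import tempfile
import unittest


def _undecodable_names_supported():
    # Hypothetical probe: try to create a file whose name is not valid UTF-8.
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            with open(os.path.join(os.fsencode(tmpdir), b"\xe7w\xf0"), "wb"):
                pass
        except OSError:
            return False
    return True


# Hypothetical decorator, evaluated once at import time.
requires_undecodable_paths = unittest.skipUnless(
    _undecodable_names_supported(),
    "filesystem rejects non-UTF-8 file names",
)


class UndecodablePathTests(unittest.TestCase):
    @requires_undecodable_paths
    def test_open_undecodable_path(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(os.fsencode(tmpdir), b"\xe7w\xf0.txt")
            with open(path, "wb") as f:
                f.write(b"data")
            self.assertTrue(os.path.exists(path))
```

With this shape, the test is reported as skipped rather than failed on filesystems such as ZFS with utf8only=on.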

    @igoose1

    igoose1 commented Oct 30, 2024

    Temporary workarounds if you need to build Python on a system with ZFS:

    1. Build without optimizations.

    2. Or build with optimizations on a tmpfs:

    mkdir buildpy
    mount -t tmpfs -o size=512M tmpfs buildpy  # may require root
    # download source code, in my case it's "Python-3.13.0.tgz"
    tar xf /path/to/Python-3.13.0.tgz -C buildpy
    cd buildpy/Python-3.13.0  # your path can differ
    ./configure --enable-optimizations  # your flags can differ
    make -j16  # your flags can differ

    At the end, there's a ./python built with optimizations. This works because the build is done on a different file system. Don't forget to unmount the buildpy directory afterwards.

    @57194

    57194 commented Jun 4, 2025

    If anyone else ever comes here from pyenv, just want to provide a little help.

    1. Mostly follow the comment above (#81765 (comment)).
    2. Add TMPDIR=/path/to/tmpfs-mount to your run command
      • e.g.:
       env TMPDIR='/path/to/tmpfs-mount' PYTHON_CONFIGURE_OPTS='--enable-optimizations --with-lto' PYTHON_CFLAGS='-march=native -mtune=native' PROFILE_TASK='-m test.regrtest --pgo -j0' pyenv install --verbose 3.13.4
      

    Edit: you're also gonna need way more space than 512MB :)
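The `TMPDIR` setting matters because Python's `tempfile` module consults that variable when choosing its default directory, so tests that create temporary files end up on the tmpfs. A small sketch of that behaviour, using an ordinary temporary directory as a stand-in for the tmpfs mount (the variable name `fake_tmpfs` is illustrative only):

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as fake_tmpfs:
    # Run a child interpreter with TMPDIR pointing at the stand-in directory.
    env = dict(os.environ, TMPDIR=fake_tmpfs)
    out = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    # Compare while the directory still exists.
    redirected = os.path.samefile(out, fake_tmpfs)

print("child used TMPDIR:", redirected)
```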

    @gpshead gpshead removed 3.12 only security fixes 3.13 bugs and security fixes labels Jun 4, 2025