Skip to content

_zoneinfo.c incorrectly checks bounds of day variable in calenderrule_new #86826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pganssle opened this issue Dec 16, 2020 · 13 comments
Closed
Labels
3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@pganssle
Copy link
Member

pganssle commented Dec 16, 2020

BPO 42660
Nosy @serhiy-storchaka, @ammaraskar, @pganssle
PRs
  • gh-86826: Fix parsing TZ strings in zoneinfo module #23825
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-12-16.15:59:28.012>
    labels = ['type-bug', 'library', '3.9', '3.10']
    title = '_zoneinfo.c incorrectly checks bounds of `day` variable in calenderrule_new'
    updated_at = <Date 2020-12-17.21:16:01.113>
    user = 'https://github.com/pganssle'

    bugs.python.org fields:

    activity = <Date 2020-12-17.21:16:01.113>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-12-16.15:59:28.012>
    creator = 'p-ganssle'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42660
    keywords = ['patch']
    message_count = 10.0
    messages = ['383180', '383182', '383186', '383187', '383188', '383192', '383200', '383202', '383203', '383266']
    nosy_count = 3.0
    nosy_names = ['serhiy.storchaka', 'ammar2', 'p-ganssle']
    pr_nums = ['23825']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42660'
    versions = ['Python 3.9', 'Python 3.10']

    Linked PRs

    @pganssle
    Copy link
    Member Author

    This is a code style issue — in #23614, a regression was deliberately introduced to satisfy an overzealous compiler. The day variable has logical bounds 0 <= day <= 6. In the original code, both sides of this boundary condition were explicitly checked (since this logically documents the bounds of the variable).

    Some compilers complain about checking day < 0, because day is an unsigned type. It is not an immutable fact that day will always be an unsigned type, and implicitly relying on this fact makes the code both less readable and more fragile. This was changed over my objections and despite the fact that I had a less fragile solution available that also satisfied the overzealous compiler.

    In the short term, my preferred solution would be to add in a static assertion that day is an unsigned type — this does not have to work on every platform, it simply needs to serve as a notification to make the code less fragile and to document our assumptions to both readers and the compiler.

    In the long term, I think we need a way to solve the problem that it is apparently not possible to disable any compiler warnings even if they don't apply to the situation!

    @pganssle pganssle added 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 16, 2020
    @serhiy-storchaka
    Copy link
    Member

    I do not think there is any problem *now*. The type of day is uint8_t, it cannot be negative.

    BTW, why is it uint8_t? Would not be easier to use type int by default? I doubt that it saves memory or makes the code faster.

    @pganssle
    Copy link
    Member Author

    As I mentioned, it's a style issue. Yes this did not introduce any user-observable bugs, nor should it have changed the machine code emitted by the assembly on any competent compiler.

    The issue is that I had deliberately chosen to do a "redundant" check to improve the readability and to be robust against someone saying, "Why is this a unit8_t instead of an int? I don't think it makes anything faster..." or some other justification for changing all the uint8_t fields to ints.

    Previous to this, we had something where the compiler would fix any bugs caused by that by itself by emitting an additional bounds check. In my proposed fix to the compiler warnings, there was a static assertion where the machine would alert us if the situation changed. The problem is that we now have a non-machine-checked comment for something that would be difficult to issue tests for.

    Frankly, I think that even the worst case scenario here isn't that bad — TZif files malformed in a very specific way would be accepted as valid. As it's coded now, this would probably just lead to a crash later, or possibly some weird results in time zone math. Still, I think we need a solution to this problem because our blind adherence to an invalid compiler warning is leading us to write more fragile code, whic h is especially problematic in C, which is notoriously dangerous and problematic to write.

    @ammaraskar
    Copy link
    Member

    Some compilers complain about checking day < 0, because day is an unsigned type

    Just my two cents, this isn't just "some compilers". Everything from gcc, msvc, C# to the rust compiler complain about this sort of code. As they should, this is effectively dead code.

    I think the more pragmatic way to enforce and document this assumption would be to have a unit test that actually checks that the constructor fails with "negative" days. It'll continue to fail right now as its interpretation as an unsigned int will be large and it will start failing if someone changes this to a signed type.

    @pganssle
    Copy link
    Member Author

    Just my two cents, this isn't just "some compilers". Everything from gcc, msvc, C# to the rust compiler complain about this sort of code. As they should, this is effectively dead code.

    They complain because most of the time it's a sign that people were intending to use a signed integer, and it's an indicator that they may be susceptible to integer overflow.

    In this case, it was a deliberate choice to include the extra check knowing it's dead code, because it is essentially a machine-checked documentation of the bounds of that particular variable.

    I think the more pragmatic way to enforce and document this assumption would be to have a unit test that actually checks that the constructor fails with "negative" days. It'll continue to fail right now as its interpretation as an unsigned int will be large and it will start failing if someone changes this to a signed type.

    This would be helpful, but it's not an either-or situation. This particular thing would be tricky to write a targeted unit test for, because it's a very deep implementation detail. Such a unit test would also be quite physically separated from the code in question, whereas if we left the code as-is, we'd never see this type of bug, and if we added a static assertion we'd be told at compile time exactly what is wrong.

    The biggest problem here is the fact that we cannot disable compiler warnings when complying with them makes the code less readable and/or less robust. This makes compiler warnings less useful, since it changes the calculus around whether or not it's worth it to introduce a new compiler warning.

    @serhiy-storchaka
    Copy link
    Member

    I propose to change the type of day to int. It will remove any objections againsy the range check and make the code less error-prone for integer overflow and sign loss.

    @pganssle
    Copy link
    Member Author

    There are at least two issues with this:

    1. This is a constructor for a struct, and the struct would really unnecessarily balloon in size if we switched everything over to be 32-bit integers (which I think is your suggestion?). This is not a major concern because in typical operation there are not likely to be more than 500 of these structs in existence at any one time (due to the cache), but it's still a minor annoyance.

    I would guess that accepting int in the constructor function and converting it to uint8_t as part of building the struct would be maybe neutral to negative, since it involves casting a large signed type to a small unsigned one. This could cause more problems with overflow, not less.

    1. In the current implementation day is unsigned by its nature — it represents a value indexed at 0 for which negative indices have no meaning. If we leave day as a uint8_t, then that tells the compiler at least something about the valid range of the value. By turning day (and presumably the other elements of this struct) into a less-suitable type, we're depriving ourselves of possibly useful compiler warnings.

    Barring a solution where we have a simple and robust way to suppress warnings, I think the next-best solution is to add a static assertion about the signedness of the type for this particular case. I'd be happy to write it in a relatively general way so that it can be re-used elsewhere in CPython for the same purpose.

    @serhiy-storchaka
    Copy link
    Member

    I propose to change types of function parameters and local variables. In any case the constructor checks the range of parameters, so there is no problem with packing them into compact structure.

    This will help to avoid errors of implicit conversion between different integer types. Also it can help to avoid code duplication in parsing integers of different size and signedness.

    Adding a static assertion about the signedness of uint8_t looks meaningless to me.

    @pganssle
    Copy link
    Member Author

    Adding a static assertion about the signedness of uint8_t looks meaningless to me.

    My proposal is to add a static assertion that day is an unsigned type. In the code it would look something like it does in the backports.zoneinfo code (https://github.com/pganssle/zoneinfo/blob/07ec80ad5dc7e7e4b4f861ddbb61a9b71e9f27c7/lib/zoneinfo_module.c#L1247-L1255):

        // The actual bounds of day are (day >= 0 && day <= 6), but since day is an
        // unsigned variable, day >= 0 is always true. To ensure that a bug is not
        // introduced in the event that someone changes day to a signed type, we
        // will assert that it is an unsigned type.
        Py_ASSERT_VAR_UNSIGNED(day);
        if (day > 6) {
            PyErr_Format(PyExc_ValueError, "Day must be in [0, 6]");
            return -1;
        }
    

    I propose to change types of function parameters and local variables. In any case the constructor checks the range of parameters, so there is no problem with packing them into compact structure.

    This will help to avoid errors of implicit conversion between different integer types. Also it can help to avoid code duplication in parsing integers of different size and signedness.

    This is not entirely unreasonable in my opinion, though in this case everything is determined by the POSIX standard and an RFC, so it is unlikely that we'll ever see anything outside of 8 bit integers for these things. If you'd like to refactor the parsing code to use signed integers unconditionally and have them converted as part of the constructor that seems reasonable.

    @serhiy-storchaka
    Copy link
    Member

    PR 23825 refactors parsing. There is now a single function for parsing numbers. It saves around 70 lines of code and makes easy to implement support of full range of transition hours.

    It could be possible to represent transition time as 32-bit offset in seconds in CalendarRule and DayRule instead of 16-bit hour, and 8-bit minute and second. In any case we need that value instead of separate values of hour, minute and second.

    @StanFromIreland
    Copy link
    Contributor

    This is complete with Serhiy's pr, no?

    @picnixz
    Copy link
    Member

    picnixz commented Mar 13, 2025

    I think so. Thus I'm closing the issue as completed. Feel free to reopen @serhiy-storchaka if there were still things to do.

    @serhiy-storchaka
    Copy link
    Member

    Thanks, @picnixz. I forgot about this issue.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Archived in project
    Development

    No branches or pull requests

    5 participants