Skip to content

Expose _PyCoreConfig structure to Python #78100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
warsaw opened this issue Jun 20, 2018 · 19 comments
Closed

Expose _PyCoreConfig structure to Python #78100

warsaw opened this issue Jun 20, 2018 · 19 comments
Assignees
Labels
3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@warsaw
Copy link
Member

warsaw commented Jun 20, 2018

BPO 33919
Nosy @warsaw, @ncoghlan, @vstinner, @tiran, @ericsnowcurrently, @emilyemorehouse
PRs
  • bpo-33919: Expose hash_seed and use_hash_seed in sys.hash_info #7867
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/warsaw'
    closed_at = <Date 2019-05-27.23:40:30.967>
    created_at = <Date 2018-06-20.20:06:15.391>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Expose _PyCoreConfig structure to Python'
    updated_at = <Date 2019-05-27.23:40:30.966>
    user = 'https://github.com/warsaw'

    bugs.python.org fields:

    activity = <Date 2019-05-27.23:40:30.966>
    actor = 'vstinner'
    assignee = 'barry'
    closed = True
    closed_date = <Date 2019-05-27.23:40:30.967>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2018-06-20.20:06:15.391>
    creator = 'barry'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33919
    keywords = ['patch']
    message_count = 19.0
    messages = ['320107', '320108', '320109', '320111', '320198', '320211', '320250', '320265', '320267', '320269', '320270', '320285', '320289', '320299', '320345', '320347', '320438', '342530', '343706']
    nosy_count = 6.0
    nosy_names = ['barry', 'ncoghlan', 'vstinner', 'christian.heimes', 'eric.snow', 'emilyemorehouse']
    pr_nums = ['7867']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33919'
    versions = ['Python 3.8']

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 20, 2018

    The _PyCoreConfig structure in pystate.h has some interesting fields that I don't think are exposed anywhere else to Python-land. I was particularly interested recently in hash_seed and use_hash_seed. I'm thinking that it may be useful to expose this structure in the sys module.

    @warsaw warsaw added the 3.8 (EOL) end of life label Jun 20, 2018
    @warsaw warsaw self-assigned this Jun 20, 2018
    @warsaw warsaw added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jun 20, 2018
    @tiran
    Copy link
    Member

    tiran commented Jun 20, 2018

    hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 20, 2018

    On Jun 20, 2018, at 13:28, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes <lists@cheimes.de> added the comment:

    hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)

    That was the first place I looked too :)

    @vstinner
    Copy link
    Member

    Is is still a secret seed if it's public? :)

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 21, 2018

    I think the basic implementation problem is that by the time you get to get_hash_info() in sysmodule.c, you no longer have access to the _PyCoreConfig object, nor the _PyMain object that it's generally attached to.

    @vstinner
    Copy link
    Member

    I think the basic implementation problem is that by the time you get to get_hash_info() in sysmodule.c, you no longer have access to the _PyCoreConfig object, nor the _PyMain object that it's generally attached to.

    An interpreter now keeps a copy of _PyCoreConfig and _PyMainInterpreterConfig.

    See for example make_flags() in sysmodule.c:

    _PyCoreConfig *core_config = &_PyGILState_GetInterpreterStateUnsafe()->core_config;
    ...
    PyStructSequence_SET_ITEM(seq, pos++, PyBool_FromLong(core_config->dev_mode));

    The interpreter really owns the copy of these configs and they are kept until the interpreter object is destroyed.

    Another example:

    static PyObject *
    import_find_and_load(PyObject *abs_name)
    {
        ...
        int import_time = interp->core_config.import_time;
        ...
    }

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    Thanks for the hint! I had a feeling there had to be an API to get at it, but I couldn’t find it. Maybe we should start documenting the Python Secret Underscore API? :)

    On Jun 22, 2018, at 00:04, STINNER Victor <report@bugs.python.org> wrote:

    _PyCoreConfig *core_config = &_PyGILState_GetInterpreterStateUnsafe()->core_config;
    ...
    PyStructSequence_SET_ITEM(seq, pos++, PyBool_FromLong(core_config->dev_mode));

    The interpreter really owns the copy of these configs and they are kept until the interpreter object is destroyed.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    I think there's another thing I'd like to change, and it seems like it's "just" an implementation detail. In _Py_HashRandomization_Init(), if use_hash_seed is 0, then we directly inject the random bits into the buffer, and then there's no hash_seed. I'd like to change that so that if use_hash_seed is 0, then we create a random hash seed first, and then call lcg_urandom() for the hash secret. That way, even if Python itself uses a random hash seed, we'll have a record of that in the runtime that can be used to reproduce the hashing. In this case, I'd still leave use_hash_seed == 0, and that would tell you what combinations of env vars were used.

    @vstinner
    Copy link
    Member

    Nick plans to finish his PEP-432 for Python 3.8 and make the API public.
    See with him? The PEP should document these structures but I was ahead and
    made changes which were not scheduled and the PEP is now outdated.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    Nosying Nick. I agree there's some overlap with Python startup restructuring, but it feels kind of orthogonal too. I really am only exposing (some elements) of that structure to Python.

    What might be interesting though would be if we want to expose the entire structure and not just the hash seeds, as I'm leaning toward here (given that we already have sys.hash_info).

    @vstinner
    Copy link
    Member

    Barry: generating a 32 bit seed gives less entropy and so makes Python
    easier to crash. If you need reproducible Python: generate a seed and set
    the env var before starting Python. Tox does that. Regrtest should do that.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    We could make the hash_seed 64 bits.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    Although I guess that would require modifications to lcg_urandom(). I don't feel qualified to change that function.

    @vstinner
    Copy link
    Member

    We could make the hash_seed 64 bits.

    On my 64-bit Linux, _Py_HashSecret_t takes 24 bytes (192 bits).

    @ncoghlan
    Copy link
    Contributor

    I'm thoroughly open to co-author requests for PEP-432 - the "Let's implement it as a private API for Python 3.7 and see what we learn from the experience" plan worked beautifully, but it *also* means the PEP text is now woefully out of date with reality :)

    The pieces that are missing are:

    • bring it up to date with what we actually did for 3.7
    • decide which of those pieces we want to make public as-is, and which we want to tweak before making them generally available (e.g. does the "ConfigureMainInterpreter" naming still make sense? Or should we go back to the earlier "BeginInitialization" and "EndInitialization" pair?)
    • now that we store this state in a more coherent way, what do we want to make public at the Python layer, and where should we make it public to avoid causing too many problems for other implementations?

    However, while I'd definitely be able to make time to review a PR to the PEP, I can't make any promises as to when I'd be able to sit down and actually draft that update myself.

    @ncoghlan
    Copy link
    Contributor

    Back on the original hash seed topic:

    1. The exact size of the seed ranges from 128 bits (SIPHash) to 32-bits depending on exactly which hash algorithm you're talking about (https://www.python.org/dev/peps/pep-0456/#hash-secret)

    2. While PEP-456 doesn't state it explicitly, my recollection is that omitting the exact hash seed value from the Python level API was a deliberate decision, since one of the *purposes* of PEP-456 was to protect against seed recovery attacks like https://131002.net/siphash/poc.py. Being able to read the seed directly from the sys modules would rather simplify the task of seed recovery :)

    Only exposing a forced_hash_seed (and hiding randomly generated ones as forced_hash_seed=None) seems reasonable though, since those can already be read from os.environ anyway.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 25, 2018

    On Jun 23, 2018, at 20:21, Nick Coghlan report@bugs.python.org wrote:

    Only exposing a forced_hash_seed (and hiding randomly generated ones as forced_hash_seed=None) seems reasonable though, since those can already be read from os.environ anyway.

    Only mirroring $PYTHONHASHSEED probably makes the whole ask less useful. Maybe I should abandon the PR, although it may still make sense to export the full _PyCoreConfig structure.

    @vstinner
    Copy link
    Member

    Update. I implemented _testinternalcapi.get_configs() which exports *all* Python configuration used to initialize Python. It contains the hash seed for example. The function is only written for tests.

    Moreover, I proposed the PEP-587 to expose the new _PyCoreConfig as a public C API.

    @vstinner
    Copy link
    Member

    So far, there is no clear agreement to expose C PyConfig structure in Python, so I close the issue.

    My PEP-587 has been accepted. I chose to not expose PyConfig in Python in the PEP. But I'm open to revisit this idea later, especially to move towards PEP-432: implement multi-phase initialization (only partially supported in my PEP-587).

    But I would prefer to a different rationale than exposing hash_seed. For hash_seed alone, I don't think that it's worth it. Moreover, Christian wrote:

    hash_seed and use_hash_seed could be added to sys.hash_info. This would be the first place I'd look for the information. After all I implemented it. :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants