Skip to content

gh-91248: Add PyFrame_GetVar() function #95712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 8, 2022
Merged

gh-91248: Add PyFrame_GetVar() function #95712

merged 1 commit into from
Nov 8, 2022

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Aug 5, 2022

Add PyFrame_GetVar() and PyFrame_GetVarString() functions to get a
frame variable by its name.

Move PyFrameObject C API tests from test_capi to test_frame.

@vstinner
Copy link
Member Author

vstinner commented Aug 5, 2022

Since PyCode_GetVarnames(), PyCode_GetCellvars() and PyCode_GetFreevars() got added to Python 3.11 (PR #95008), I propose a concrete implementation of my proposed PyFrame_GetVar() API. The PR also adds a PyFrame_GetVarString() API (use const char* type rather than PyObject* for the variable name) which should be more convenient to use in C, but a little bit slower (create a temporary Unicode object, decode the short bytes string from UTF-8).

For now, my implementation uses PyFrame_GetLocals() + PyDict_GetItemWithError() which is not optimum. The implementation can be optimized later. I would prefer to get an agreement on the proposed API and land this PR, and then optimize it.

@Fidget-Spinner @gvanrossum: Would you mind to review my PR?

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'd like to have a discussion about the API design first.

@vstinner
Copy link
Member Author

vstinner commented Aug 8, 2022

Sorry, I'd like to have a discussion about the API design first.

Would you prefer int PyFrame_GetVar(PyFrameObject *frame, PyObject *name, PyObject **value) API which would write into *value instead? The problem with such API is to decide what to do with the *value old value before overriding it. If we consider that it's undefined and it must not be read, the API can be misused and it can introduce a reference leak. Like:

value = Py_NewRef(Py_None); // default value
(void)PyFrame_GetVar(frame, name, &value);
// leak a reference on None singleton

That sounds like a confusing API that we don't need more of.

I'm not a big fan of having to check for PyErr_Occurred() to know if an exception was raised when NULL was returned. But the advantage of such API is to avoid the question of what doing with the *value value.

About this specific API, I'm fine with raising NameError if the frame has no variable called name. This API should not be used to check if a variable exists or not. And checking for NameError is trivial in C with PyErr_ExceptionMatches().


In short, I propose: PyObject *PyFrame_GetVar(PyFrameObject *frame, PyObject *name)

  • Return non-NULL value on success
  • Raise NameError and return NULL if the variable doesn't exist
  • Raise an exception and return NULL on error

In short, NULL indicates an error, and non-NULL is a success.

@gvanrossum
Copy link
Member

gvanrossum commented Aug 8, 2022

In short, I propose: PyObject *PyFrame_GetVar(PyFrameObject *frame, PyObject *name)

  • Return non-NULL value on success
  • Raise NameError and return NULL if the variable doesn't exist
  • Raise an exception and return NULL on error

In short, NULL indicates an error, and non-NULL is a success.

That sounds like the API I was asking for, so yes!

@markshannon
Copy link
Member

A few questions:

  • Who wants this function?
  • Why this API instead of the one I proposed in the issue: PyFrameStack_GetVar(int depth, PyObject *name)?
  • Why access the variable by name, not by index?

@vstinner
Copy link
Member Author

Who wants this function?

Well, first I want it :-)

As I explained in the issue #91248, this function adds again a feature indirectly removed in Python 3.11. Well, Python 3.11 adds PyFrame_GetLocals() which gives a similar feature, but it's just less efficient. The idea is to provide a similar feature than what we had in Python 3.10, with similar performance. My plan is to avoid the creation of a temporary dictionary for the final implementation, to have best performances.

Why this API instead of the one I proposed in the issue: PyFrameStack_GetVar(int depth, PyObject *name)?

I see projects working directly on frame objects and I don't think that they can easily move away from their design which worked well on Python 3.10 and older. For example, here is a PR for pyinstrument which gets a class name from a frame object: https://github.com/joerick/pyinstrument/pull/203/files

(...)
    SELF_STRING = PyUnicode_InternFromString("self");
    CLS_STRING = PyUnicode_InternFromString("cls");
(...)

#if PY_VERSION_HEX >= 0x030b0000 // Python 3.11.0
static const char *
_get_class_name_of_frame(PyFrameObject *frame, PyCodeObject *code) {
    PyObject *locals = PyFrame_GetLocals(frame);

    if (!PyDict_Check(locals)) { return NULL; }
    PyObject *self = PyDict_GetItem(locals, SELF_STRING);

    if (self) {
        Py_DECREF(locals);
        return _PyType_Name(self->ob_type);
    }

    PyObject *cls = PyDict_GetItem(locals, CLS_STRING);

    if (cls) {
        Py_DECREF(locals);
        if (!PyType_Check(cls)) {
            return NULL;
        }
        PyTypeObject *type = (PyTypeObject *)cls;
        return _PyType_Name(type);
    }

    Py_DECREF(locals);

    return NULL;
}
#else
(...)

Why access the variable by name, not by index?

I have no idea how to get a variable by its name. Also, in Python 3.10 and older, PyFrameObject.f_locals gave access to a dictionary where keys are variable names. For me, it's more natural to get a variable by its name, than by an index.

Maybe a separated API, even more efficient, can be added for best performance?

@markshannon
Copy link
Member

This API forces requires two steps instead of one, and forces the creation of the frame object.
To get the local variable "foo"

PyFrameObject *frame = PyThreadState_GetFrame(tstate) /* This is expensive */
PyObject foo = PyFrame_GetVar(frame, foo);
Py_DECREF(frame);

What it I would prefer:

PyObject *foo = PyFrameStack_GetVar(tstate, 0, foo);

@joerick
Copy link

joerick commented Aug 19, 2022

Hi all, pyinstrument maintainer here, thanks for this PR and discussion! I just thought I'd chime in from a profiler perspective on this API design question-

This API forces requires two steps instead of one, and forces the creation of the frame object.

In my case, in examining the stack from a setprofile callback, I already have a PyFrameObject. When I iterate down the stack to capture it, I use PyFrame_GetBack, which will create the frame objects anyway. I suppose it would be better not to, but I need to get at the PyCodeObject objects anyway, and PyFrame_GetCode is the only to get them with the public API.

So, for me, PyFrame_GetVar would be very useful. PyFrameStack_GetVar would only be useful if I could also get code objects with e.g. PyFrameStack_GetCode. Oh, and PyFrameStack_GetLineNumber... there might be an argument for some kind of coroutine/generator state inspection too, (@sumerc might have thoughts on this).

It would also be a decent refactor for me to move to a PyThreadState-based API, since all the code in the project uses frame objects currently. That isn't to say that it wouldn't be worth it however! I just don't know yet what the perf differences would be.

Another angle on this - I just read @markshannon's PEP 669 - from that standpoint, where PyFrameObject objects don't appear, I can see the logic of PyFrameStack_GetStuff. Still, I think I'd need more accessors than just PyFrameStack_GetVar to get all the info that I'd want to build a profiler without frame objects.

@sumerc
Copy link

sumerc commented Aug 22, 2022

I have written all the requirements I have for two tracing profilers I maintain (yappi and Blackfire) here:
https://discuss.python.org/t/python-3-11-frame-structure-and-various-changes/17895/16?u=sumerc and Mark Shannon commented that they already had plans that cover all these requirements. (See the next answer in the thread)

I don't have any strong opinions on function prototypes that are discussed here. I am just hoping for:

  • consistent prototypes and errors handling,
  • strong backward compat. on these APIs as they will take the burden off maintainers for every new Python release,
  • efficient.

I know from first hand how much performance difference can these functions make, especially in a tracing profiler where there are lots of calls happening.

Add PyFrame_GetVar() and PyFrame_GetVarString() functions to get a
frame variable by its name.

Move PyFrameObject C API tests from test_capi to test_frame.
@vstinner
Copy link
Member Author

vstinner commented Nov 4, 2022

@gvanrossum: I updated my PR to raise a NameError if the variable doesn't exist. Would you mind to review the updated PR?

@vstinner
Copy link
Member Author

vstinner commented Nov 4, 2022

I prepared python/pythoncapi-compat#46 which implements PyFrame_GetVar() and PyFrame_GetVarString() on Python 3.11 and older. I also wrote it to prove that it's possible to implement this function on old Python versions.

@vstinner
Copy link
Member Author

vstinner commented Nov 4, 2022

Reminder:

For now, my implementation uses PyFrame_GetLocals() + PyDict_GetItemWithError() which is not optimum. The implementation can be optimized later. I would prefer to get an agreement on the proposed API and land this PR, and then optimize it.

@vstinner vstinner merged commit 4d5fcca into python:main Nov 8, 2022
@vstinner vstinner deleted the frame_getvar branch November 8, 2022 16:40
@vstinner
Copy link
Member Author

vstinner commented Nov 8, 2022

Ok, I merged my PR.

As I wrote, IMO it doesn't prevent adding PyFrameStack_GetVar(). But PyFrame_GetVar() fills a hole created in Python 3.11 C API when PyFrameObject structure was made internal.

Now we can see how to optimize PyFrame_GetVar() :-)

I prepared python/pythoncapi-compat#46 which implements PyFrame_GetVar() and PyFrame_GetVarString() on Python 3.11 and older

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants