Skip to content

Optimize for Runtime.InitializePlatformData #891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
amos402 opened this issue Jun 21, 2019 · 13 comments
Closed

Optimize for Runtime.InitializePlatformData #891

amos402 opened this issue Jun 21, 2019 · 13 comments

Comments

@amos402
Copy link
Member

amos402 commented Jun 21, 2019

The implementation for Runtime.InitializePlatformData just have some limitations.
The platform.py would import subprocess, but not all release binaries have it.
Need a another genernal way to get the architecture type.

Originally posted by @amos402 in #887 (comment)

@amos402
Copy link
Member Author

amos402 commented Jun 21, 2019

cpython3.7.1
platform
subprocess
@benoithudson _posixsubprocess or select are not the essential components for python core

@benoithudson
Copy link
Contributor

I import platform for two pieces of info.

Py_GetPlatform() gives the OS type. We should probably use that.

I didn't see a C API for the architecture; do you know of one?

@amos402
Copy link
Member Author

amos402 commented Jun 22, 2019

Me too, it may also not provided by any c# runtimes.
If there is no general API to do that, we can only use a native API by per system......

@Cronan
Copy link
Contributor

Cronan commented Jun 26, 2019

Architecture does this - worth replicating?

def architecture(executable=sys.executable, bits='', linkage=''):

    """ Queries the given executable (defaults to the Python interpreter
        binary) for various architecture information.
        Returns a tuple (bits, linkage) which contains information about
        the bit architecture and the linkage format used for the
        executable. Both values are returned as strings.
        Values that cannot be determined are returned as given by the
        parameter presets. If bits is given as '', the sizeof(pointer)
        (or sizeof(long) on Python version < 1.5.2) is used as
        indicator for the supported pointer size.
        The function relies on the system's "file" command to do the
        actual work. This is available on most if not all Unix
        platforms. On some non-Unix platforms where the "file" command
        does not exist and the executable is set to the Python interpreter
        binary defaults from _default_architecture are used.
    """
    # Use the sizeof(pointer) as default number of bits if nothing
    # else is given as default.
    if not bits:
        import struct
        size = struct.calcsize('P')
        bits = str(size * 8) + 'bit'

    # Get data from the 'file' system command
    if executable:
        fileout = _syscmd_file(executable, '')
    else:
        fileout = ''

    if not fileout and \
       executable == sys.executable:
        # "file" command did not return anything; we'll try to provide
        # some sensible defaults then...
        if sys.platform in _default_architecture:
            b, l = _default_architecture[sys.platform]
            if b:
                bits = b
            if l:
                linkage = l
        return bits, linkage

    if 'executable' not in fileout and 'shared object' not in fileout:
        # Format not supported
        return bits, linkage

    # Bits
    if '32-bit' in fileout:
        bits = '32bit'
    elif 'N32' in fileout:
        # On Irix only
        bits = 'n32bit'
    elif '64-bit' in fileout:
        bits = '64bit'

    # Linkage
    if 'ELF' in fileout:
        linkage = 'ELF'
    elif 'PE' in fileout:
        # E.g. Windows uses this format
        if 'Windows' in fileout:
            linkage = 'WindowsPE'
        else:
            linkage = 'PE'
    elif 'COFF' in fileout:
        linkage = 'COFF'
    elif 'MS-DOS' in fileout:
        linkage = 'MSDOS'
    else:
        # XXX the A.OUT format also falls under this class...
        pass

    return bits, linkage

@filmor
Copy link
Member

filmor commented Jun 26, 2019

See #897, if we drop support for .NET<4.7.1 we can use the respective .NET standard functions for this, which means we wouldn't even need to have Python loaded already to do the right thing. This would also be beneficial to @lostmsu's idea to use GetProcAddress instead of P/Invoke to get the API functions.

@Jeff17Robbins
Copy link
Contributor

Jeff17Robbins commented Dec 13, 2019

For Python3.8, this issue seems to be a blocker for embedding in dotnetcore on Linux. The import _posixsubprocess that InitializePlatformData causes fails with this kind of error:

/usr/local/lib/python3.8/lib-dynload/_posixsubprocess.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyTuple_Type

because the code in runtime.cs that loads the Python shared object library (which would prevent this error) requires InitializePlatformData to set OperatingSystem.

            InitializePlatformData();

            IntPtr dllLocal = IntPtr.Zero;
            var loader = LibraryLoader.Get(OperatingSystem);

The change in Python3.8 to start using the subprocess module was a breaking change for this pythonnet code.

@amos402
Copy link
Member Author

amos402 commented Dec 14, 2019

@Jeff17Robbins Runtime.InitializePlatformData can be remove in the future.
But your error is wired, why it tried to get PyTuple_Type from _posixsubprocess.so.

@Jeff17Robbins
Copy link
Contributor

Jeff17Robbins commented Dec 14, 2019

I'm assuming because this code in _posixsubprocess.c refers to that symbol in this code?

static PyObject *
subprocess_fork_exec(PyObject* self, PyObject *args)
{
    PyObject *gc_module = NULL;
    PyObject *executable_list, *py_fds_to_keep;
    PyObject *env_list, *preexec_fn;
    PyObject *process_args, *converted_args = NULL, *fast_args = NULL;
    PyObject *preexec_fn_args_tuple = NULL;
    int p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite;
    int errpipe_read, errpipe_write, close_fds, restore_signals;
    int call_setsid;
    PyObject *cwd_obj, *cwd_obj2;
    const char *cwd;
    pid_t pid;
    int need_to_reenable_gc = 0;
    char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
    Py_ssize_t arg_num;
    int need_after_fork = 0;
    int saved_errno = 0;

    if (!PyArg_ParseTuple(
            args, "OOpO!OOiiiiiiiiiiO:fork_exec",
            &process_args, &executable_list,
            &close_fds, &PyTuple_Type, &py_fds_to_keep,
            &cwd_obj, &env_list,
            &p2cread, &p2cwrite, &c2pread, &c2pwrite,
            &errread, &errwrite, &errpipe_read, &errpipe_write,
            &restore_signals, &call_setsid, &preexec_fn))
        return NULL;

I'm further assuming that since [DllImport] does NOT use the RTLD_GLOBAL flag on its dlopen() call, that the Linux loader can't resolve PyTuple_Type from the already loaded python .so?

I'm also assuming that this lack of RTLD_GLOBAL is why pythonnet goes to all this trouble to load
python explicitly in the first place?

            InitializePlatformData();

            IntPtr dllLocal = IntPtr.Zero;
            var loader = LibraryLoader.Get(OperatingSystem);

            if (_PythonDll != "__Internal")
            {
                dllLocal = loader.Load(_PythonDll);
            }

But I don't know the history of the project, and don't know specifically why we need this loader.Load(_PythonDll) other than my assumption about the lack of RTLD_GLOBAL in the implementation of [DllImport].

I would appreciate any info anyone has as to the rationale for all this startup code, or a pointer to any work going on to simplify it, at least for the dotnet core 3.x case.

@amos402
Copy link
Member Author

amos402 commented Dec 14, 2019

The symbol should in python's ELF or .so, _posixsubprocess.so just import from python runtime, so the symbol should be find.
You can use ldd to check is all the dependencies exist. Or you can try add the python's parent directory into LD_LIBRARY_PATH and run again.

The loader code you specified can be remove, see #880

@Jeff17Robbins
Copy link
Contributor

Jeff17Robbins commented Dec 14, 2019

I apologize in advance for being a somewhat ignorant newbie wrt Linux. But the scenario I am testing, I am embedding C Python into a dotnet core 3 C# app using pythonnet.

pythonnet uses [DllImport] in runtime.cs to bootstrap loading libpython3.8.so so it can call, e.g., Py_InitializeEx.

nm -D libpython3.8.so does indeed show that the python runtime symbols are exported, e.g.

0000000000353b20 D PyTuple_Type

But since [DllImport] does NOT set the RTLD_GLOBAL flag when it loads libpython3.8.so, my understanding is that subsequently loaded shared objects, like the C extension that subprocess.py imports (_posixsubprocess.cpython-38-x86_64-linux-gnu.so), those shared object libraries do not "see" the symbols from libpython3.8.so.

When I run ldd -d on _posixsubprocess...so I don't see it showing its dependence on libpython. But of course a builtin Python C extension (ships with Python) presumably "knows" what it is doing, or at least has an expectation that the "parent" libpython3.8.so is loaded.

And I assume that this expectation includes being loaded with the aforementioned RTLD_GLOBAL flag?

Or why is pythonnet taking all this trouble to explicitly load libpython3.8.so in the first place?

[EDIT] @amos402 , why do you believe

"The loader code you specified can be remove, see #880

I think this will break the embedding scenario on Linux dotnet core. Some code has to load libpython.so with RTLD_GLOBAL, doesn't it?

root@97617afd7064:/usr/local/lib/python3.8/lib-dynload# ldd -d _posixsubprocess.cpython-38-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffd4f9f4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9e02063000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9e01ea2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9e02096000)
undefined symbol: PyTuple_Type  (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)
undefined symbol: PyExc_RuntimeError    (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)
undefined symbol: PyExc_ValueError      (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)
undefined symbol: PyExc_OSError (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)
undefined symbol: _Py_NoneStruct        (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)
undefined symbol: Py_hexdigits  (./_posixsubprocess.cpython-38-x86_64-linux-gnu.so)

@Jeff17Robbins
Copy link
Contributor

I've been reading more about this issue and found this bit of folklore

cffi link

On some Linux distributions, notably Debian, the .so files of CPython C extension modules may be compiled without saying that they depend on libpythonX.Y.so. This makes such Python systems unsuitable for embedding if the embedder uses dlopen(..., RTLD_LOCAL). You get an undefined symbol error. See issue #264. A workaround is to first call dlopen("libpythonX.Y.so", RTLD_LAZY|RTLD_GLOBAL), which will force libpythonX.Y.so to be loaded first.

This led me to an official-looking Debian statement

Debian Python does not link extensions to libpython (as is done in some operating systems).

I don't see how to support Pythonnet Embedding on Debian unless someone calls dlopen with the RTLD_GLOBAL flag...

@Jeff17Robbins
Copy link
Contributor

This issue is not restricted to Pythonnet, even regular C embedding of Python on certain Linux distributions face the same RTLD_GLOBAL issue:

https://bugs.python.org/issue21536#msg340817

libpython must always be loaded with RTLD_GLOBAL.

This was referenced Dec 15, 2019
@lostmsu lostmsu mentioned this issue Jul 2, 2020
4 tasks
amos402 added a commit to amos402/pythonnet that referenced this issue Jul 15, 2020
amos402 added a commit to amos402/pythonnet that referenced this issue Jul 15, 2020
@lostmsu
Copy link
Member

lostmsu commented Feb 21, 2021

InitializePlatformData is gone. To solve the problem with embedding DLL one must set Runtime.PythonDLL before making any other calls.

@lostmsu lostmsu closed this as completed Feb 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants