Skip to content

getproxies() in python3.10/urllib/request.py crashes on MacOS #104578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
marcelogaio-groovinads opened this issue May 17, 2023 · 17 comments
Closed
Labels
docs Documentation in the Doc dir OS-mac stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@marcelogaio-groovinads
Copy link

marcelogaio-groovinads commented May 17, 2023

Bug report

There is a bug regarding the getproxies() function in the python3.10/urllib/request.py module. The issue occurs specifically when running the function on MacOS with an M1 or M2 CPU, where it results in a crash.

To address this problem, I propose a simple solution involving a try-except block to handle the potential errors.

Currently, the code looks as follows:

def getproxies():
    return getproxies_environment() or getproxies_macosx_sysconf()

Here's the proposed solution:

def getproxies():
    try:
        return getproxies_environment()
    except Exception as e:
        try:
            return getproxies_macosx_sysconf()
        except Exception as e:
            return {}

By implementing this change, the getproxies() function will first attempt to retrieve the proxies using getproxies_environment(). If an exception is raised, indicating a failure on MacOS with M1/M2 CPU, it will then fall back to getproxies_macosx_sysconf(). In case of any additional exceptions, an empty dictionary {} will be returned.

Environment

  • Python 3.10
  • MacOS Ventura 13.3.1
@marcelogaio-groovinads marcelogaio-groovinads added the type-bug An unexpected behavior, bug, or error label May 17, 2023
@JelleZijlstra
Copy link
Member

What is the error you are seeing?

We should avoid solutions that catch and ignore any Exception, as they can easily hide bugs. Instead, we should catch only those exceptions that are specifically problematic.

@hugovk
Copy link
Member

hugovk commented May 17, 2023

No exceptions for me on macOS Ventura 13.3.1 with an M2 chip:

Python 3.11.3 (v3.11.3:f3909b8bc8, Apr  4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> urllib.request.getproxies()
{}
>>> urllib.request.getproxies_environment()
{}
>>> urllib.request.getproxies_macosx_sysconf()
{}
>>>
Python 3.10.11 (main, Apr  7 2023, 07:24:53) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> urllib.request.getproxies()
{}
>>> urllib.request.getproxies_environment()
{}
>>> urllib.request.getproxies_macosx_sysconf()
{}
>>>

@marcelogaio-groovinads
Copy link
Author

marcelogaio-groovinads commented May 17, 2023

I am using Airflow with oauth.
The exception catching is not implemented in the best of ways

Calling "return getproxies_environment() or getproxies_macosx_sysconf()" causes the python thread to crash (crazy!) some times, other times it just hangs there.

Happens both on Macbook airs with M1 and M2

@arhadthedev arhadthedev added stdlib Python modules in the Lib dir OS-mac labels May 17, 2023
@ned-deily
Copy link
Member

I believe what you are seeing is the issue described in #75999. Basically, the problem is that the Python _scproxy helper module to get proxy configurations on macOS uses macOS system frameworks that are documented as not being safe when called in a forked process without an exec. There are suggestions there for how to work around the problem, like either call getproxies in the main thread so that the result is cached or disable proxy lookup by an env variable.

@ned-deily
Copy link
Member

That said, there doesn't seem to be any warning about this in the urllib documentation. It would probably be a good idea to add something there.

@JelleZijlstra
Copy link
Member

Also, is it possible to have this condition throw a Python exception instead of aborting the process? That would make it a bit easier to understand the problem for users.

@marcelogaio-groovinads
Copy link
Author

@ned-deily Yes!! I am looking again into it and it seems it is related to that! It is quite difficult to follow because calls are all over the place, but it definitely seems like it.
I'm sorry to have jumped so fast into a conclusion before.

@marcelogaio-groovinads
Copy link
Author

@JelleZijlstra I'm guessing that would help A LOT, definitely

@ned-deily
Copy link
Member

Also, is it possible to have this condition throw a Python exception instead of aborting the process?

AFAIK, no. I believe the abort occurs in a lower-level macOS framework and we can't safely recover from it.

@ned-deily
Copy link
Member

AFAIK, no. I believe the abort occurs in a lower-level macOS framework and we can't safely recover from it.

@ronaldoussoren ?

@marcelogaio-groovinads
Copy link
Author

@ned-deily I found there IS an option to avoid the proxy check by setting an environment variable called "NO_PROXY" (also works in lowercase) set to any value...
Thanks to all for the help. You might want to close this issue!

@ned-deily
Copy link
Member

Yes, using the env varuabke should avoid the problem. But I think we should at least add something to the urllib documentation about this macOS issue. It keeps biting people.

@ned-deily ned-deily added the docs Documentation in the Doc dir label May 17, 2023
@ronaldoussoren
Copy link
Contributor

Also, is it possible to have this condition throw a Python exception instead of aborting the process?

AFAIK, no. I believe the abort occurs in a lower-level macOS framework and we can't safely recover from it.

That's correct, this is not something we can recover from.

@JelleZijlstra
Copy link
Member

Could we detect it beforehand, though? I imagine the macOS code does something like if this_process_has_forked(): abort(). So perhaps in our code, before calling into the low-level code we can do if this_process_has_forked(): raise RuntimeError.

@ronaldoussoren
Copy link
Contributor

ronaldoussoren commented May 19, 2023

3.12 will at least warn about using os.fork in a multithreaded program, e.g.:

import os
import urllib.request

urllib.request.getproxies()
if os.fork() == 0: raise SystemExit("child")

This prints DeprecationWarning: This process (pid=7541) is multi-threaded, use of fork() may lead to deadlocks in the child. in 3.12.7a7 (but happens not to crash on my machine).

Part of the problem with these crashes is that they don't happen consistently for all users, and can happen or not depending on code paths. In particular, the crash likely won't happen if the getproxies() is only called after os.fork() and not before.

Something we could do is document the issue:

  • Add a warning to urllib.request.getproxies about the issue, with the documented for users that don't care about using system proxies
  • Document problems with os.fork when using higher level system APIs on macOS, with a link to the warning in urllib.request (similarly to how there's a link to a warning in ssl).

@ronaldoussoren
Copy link
Contributor

I added a number of warnings about this platform limitation in #105912, amongst others in the documentation for urllib.request.

That is all we can do until Apple fixes their platform :-(.

@DenisOgr
Copy link

@ned-deily I found there IS an option to avoid the proxy check by setting an environment variable called "NO_PROXY" (also works in lowercase) set to any value... Thanks to all for the help. You might want to close this issue!

This is works on MacOS 13.2.1, python3.10 (using Apache Airflow and S3 operators).
Thanks a lot, I spent several hours for foxing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir OS-mac stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

7 participants