Skip to content

Conversation

tnoff
Copy link

@tnoff tnoff commented Jul 22, 2025

Description

If a non latin1 character is passed as a URL, you can see the following error

Traceback (most recent call last):
  File "/home/tnorth/.envs/website/lib/python3.12/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnorth/.envs/website/lib/python3.12/site-packages/opentelemetry/instrumentation/django/middleware/otel_middleware.py", line 91, in __call__
    self.process_request(request)
  File "/home/tnorth/.envs/website/lib/python3.12/site-packages/opentelemetry/instrumentation/django/middleware/otel_middleware.py", line 217, in process_request
    attributes = collect_request_attributes(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnorth/.envs/website/lib/python3.12/site-packages/opentelemetry/instrumentation/wsgi/__init__.py", line 374, in collect_request_attributes
    wsgiref_util.request_uri(environ)
  File "/usr/lib/python3.12/wsgiref/util.py", line 61, in request_uri
    path_info = quote(environ.get('PATH_INFO',''), safe='/;=,', encoding='latin1')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/parse.py", line 917, in quote
    string = string.encode(encoding, errors)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-5: ordinal not in range(256)

The latin1 bit seems to be hardcoded in the underlying library

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added unit test

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Copy link

linux-foundation-easycla bot commented Jul 22, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@xrmx
Copy link
Contributor

xrmx commented Jul 22, 2025

Puzzled by this, if it is wsgiref raising non-latin1 chars if django allows them maybe we should use something else instead of wsgiref in the first place? Have you looked at what django is doing? PEP-3333 also seems to support what we are doing https://peps.python.org/pep-3333/#unicode-issues

@tnoff
Copy link
Author

tnoff commented Jul 23, 2025

@xrmx If I'm reading this all correctly it looks like django has a lot of string handling under the hood to handle this sort of thing

https://github.com/django/django/blob/77d455ae73b177a32102f0b248828b5d63c0aa24/django/http/request.py#L215
https://github.com/django/django/blob/main/django/utils/encoding.py#L191

@xrmx
Copy link
Contributor

xrmx commented Jul 23, 2025

@tnoff
Copy link
Author

tnoff commented Jul 23, 2025

@xrmx yeah I updated yesterday with some bits pretty close to the repercent_broken_unicode function https://github.com/django/django/blob/main/django/utils/encoding.py#L213

@tnoff
Copy link
Author

tnoff commented Aug 4, 2025

@xrmx sorry for ping but did you get a chance to check out the revisions I made?

@xrmx xrmx moved this to Ready for review in @xrmx's Python PR digest Aug 4, 2025
@xrmx
Copy link
Contributor

xrmx commented Aug 4, 2025

@xrmx sorry for ping but did you get a chance to check out the revisions I made?

No, sorry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for review
Development

Successfully merging this pull request may close these issues.

2 participants