Python 3.14+: `python: Objects/unicodeobject.c:10387: _PyUnicode_JoinArray: Assertion 'res_data == PyUnicode_1BYTE_DATA(res) + kind * PyUnicode_GET_LENGTH(res)' failed.` in sqlglot #134889
Comments
I can try to reduce it further, but I really need to work on $dayjob right now, so probably no earlier than the weekend. |
After quite a lot of tracing of the sqlglot code base, I believe this is a minimal reproducer of the root cause of this issue:

    def broken():
        variable = f"{1}"
        variable = f"{variable}"
        return variable

ASAN Output

Python 3.15.0a0 (heads/main-dirty:d96343679fd, May 30 2025, 01:50:46) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def broken():
...     variable = f"{1}"
...     variable = f"{variable}"
...     return variable
...
>>> broken()
=================================================================
==3741==ERROR: AddressSanitizer: heap-use-after-free on address 0x00010a8489d0 at pc 0x000102b588dc bp 0x00016d6e8910 sp 0x00016d6e8908
READ of size 4 at 0x00010a8489d0 thread T0
#0 0x102b588d8 in _PyEval_EvalFrameDefault generated_cases.c.h:10576
#1 0x102b2d24c in PyEval_EvalCode ceval.c:866
#2 0x102b23968 in builtin_exec bltinmodule.c.h:568
#3 0x102b45ffc in _PyEval_EvalFrameDefault generated_cases.c.h:2383
#4 0x102b2d90c in _PyEval_Vector ceval.c:1975
#5 0x1027f27ec in _PyVectorcall_Call call.c:285
#6 0x102ce0924 in pymain_start_pyrepl main.c:310
#7 0x102cde040 in Py_RunMain main.c:772
#8 0x102cdf010 in pymain_main main.c:802
#9 0x102cdf53c in Py_BytesMain main.c:826
#10 0x1881820dc (<unknown module>)
0x00010a8489d0 is located 0 bytes inside of 42-byte region [0x00010a8489d0,0x00010a8489fa)
freed by thread T0 here:
#0 0x103e3f380 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x53380)
#1 0x1029ee280 in unicode_dealloc unicodeobject.c:1801
#2 0x1028ee0c0 in _Py_Dealloc object.c:3194
#3 0x102b3b098 in _PyEval_EvalFrameDefault generated_cases.c.h:11209
#4 0x102b2d24c in PyEval_EvalCode ceval.c:866
#5 0x102b23968 in builtin_exec bltinmodule.c.h:568
#6 0x102b45ffc in _PyEval_EvalFrameDefault generated_cases.c.h:2383
#7 0x102b2d90c in _PyEval_Vector ceval.c:1975
#8 0x1027f27ec in _PyVectorcall_Call call.c:285
#9 0x102ce0924 in pymain_start_pyrepl main.c:310
#10 0x102cde040 in Py_RunMain main.c:772
#11 0x102cdf010 in pymain_main main.c:802
#12 0x102cdf53c in Py_BytesMain main.c:826
#13 0x1881820dc (<unknown module>)
previously allocated by thread T0 here:
#0 0x103e3f244 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x53244)
#1 0x10299423c in PyUnicode_New unicodeobject.c:1417
#2 0x10287c178 in long_to_decimal_string_internal longobject.c:2157
#3 0x102885dfc in long_to_decimal_string longobject.c:2247
#4 0x1028dfe54 in PyObject_Str object.c:822
#5 0x102b2fe0c in _PyEval_EvalFrameDefault generated_cases.c.h:5664
#6 0x102b2d24c in PyEval_EvalCode ceval.c:866
#7 0x102b23968 in builtin_exec bltinmodule.c.h:568
#8 0x102b45ffc in _PyEval_EvalFrameDefault generated_cases.c.h:2383
#9 0x102b2d90c in _PyEval_Vector ceval.c:1975
#10 0x1027f27ec in _PyVectorcall_Call call.c:285
#11 0x102ce0924 in pymain_start_pyrepl main.c:310
#12 0x102cde040 in Py_RunMain main.c:772
#13 0x102cdf010 in pymain_main main.c:802
#14 0x102cdf53c in Py_BytesMain main.c:826
#15 0x1881820dc (<unknown module>)
SUMMARY: AddressSanitizer: heap-use-after-free generated_cases.c.h:10576 in _PyEval_EvalFrameDefault
Shadow bytes around the buggy address:
0x00010a848700: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848780: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x00010a848980: fa fa fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd
0x00010a848a00: fa fa 00 00 00 00 00 03 fa fa fd fd fd fd fd fa
0x00010a848a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848b80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00010a848c00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==3741==ABORTING
[1] 3741 abort ./python.exe

As far as I can tell, this reproducer does not trigger the assertion itself; the assertion was a side effect of the underlying use-after-free. |
I suspect something is being borrowed where it shouldn't be. |
Same idea, I'm trying to trace it. |
Looks like this block of code is the root cause: Lines 4854 to 4868 in 053c285
When the ...

The reason sqlglot triggers this use-after-free is this block of code here:

    def fetch_sql(self, expression: exp.Fetch) -> str:
        direction = expression.args.get("direction")
        direction = f" {direction}" if direction else ""
        count = self.sql(expression, "count")
        count = f" {count}" if count else ""
        limit_options = self.sql(expression, "limit_options")
        limit_options = f"{limit_options}" if limit_options else " ROWS ONLY"
        return f"{self.seg('FETCH')}{direction}{count}{limit_options}"
|
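To make the link between the sqlglot method and the reduced reproducer easier to see, here is a minimal stand-alone sketch of the same pattern. `fetch_suffix` and `fetch_suffix_plain` are hypothetical stand-ins, not sqlglot API; the first only mirrors the `limit_options` line, where a local is rebound to an f-string containing nothing but itself. The second drops the redundant f-string, which is shown only as one possible way to avoid the pattern, not as the fix that was applied upstream.

```python
def fetch_suffix(limit_options: str) -> str:
    # Hypothetical stand-in for the tail of sqlglot's fetch_sql.
    # Same shape as the reproducer: the local is rebound to an f-string
    # whose only content is the local itself.
    limit_options = f"{limit_options}" if limit_options else " ROWS ONLY"
    return f"FETCH{limit_options}"


def fetch_suffix_plain(limit_options: str) -> str:
    # Same result for str inputs, without the redundant f-string.
    limit_options = limit_options or " ROWS ONLY"
    return f"FETCH{limit_options}"


if __name__ == "__main__":
    for value in ("", " WITH TIES"):
        print(repr(fetch_suffix(value)), repr(fetch_suffix_plain(value)))
```

On an unaffected interpreter both functions return the same string for any `str` argument.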
The issue is that |
…FAST` (python#134958) We were incorrectly handling a few opcodes that leave their operands on the stack. Treat all of these conservatively; assume that they always leave operands on the stack. (cherry picked from commit 6b77af2)
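One way to see which load instructions the compiler emitted for the reproducer is to disassemble it. This is only a sketch: it lists opcode names with `dis` and picks out any whose name contains `BORROW` (Python 3.14 introduced `LOAD_FAST_BORROW`; on older versions that list is simply empty). The helper name `opnames` is made up for illustration. Nothing here calls `broken()`, so the function is compiled and inspected but never executed.

```python
import dis


def broken():
    variable = f"{1}"
    variable = f"{variable}"
    return variable


def opnames(func):
    # Collect the opcode name of every instruction in func's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)]


names = opnames(broken)
# Borrowing loads, if this interpreter version emits any.
borrowed = [name for name in names if "BORROW" in name]

if __name__ == "__main__":
    print(names)
    print("borrowing loads:", borrowed)
```

Comparing the output between an affected build and one that includes the fix should show whether the load feeding the second f-string is still a borrowing one; that expectation is an assumption based on the commit message, not something verified here.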
Crash report
What happened?
The pure Python code in the sqlglot package manages to trigger an assertion in CPython:

Unfortunately, this is as far as I've been able to reduce it:
I can reproduce with 3.14.0b2 and 4109a9c, built with --with-assertions (but for some reason, it doesn't happen if I build with --with-pydebug), against sqlglot 26.23.0, i.e.:

(that's just my guesswork of what to print)
CPython versions tested on:
3.14, CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.15.0a0 (heads/main:51910dc5620, May 29 2025, 16:12:29) [GCC 14.3.0]
Linked PRs
LOAD_FAST #134958
LOAD_FAST (#134958) #135187