Bug report
Bug description:
Hello, I'm writing a thesis on free-threaded Python, so I'm testing 3.13.0b1 built with --disable-gil.
I installed it with pyenv using this command:
env PYTHON_CONFIGURE_OPTS='--disable-gil' pyenv install 3.13.0b1
I didn't specify --enable-optimizations or --with-lto because the build failed with them.
Now I'm writing a benchmark to compare free-threaded Python with earlier versions of regular Python, and also with the nogil Python 3.9.10 fork.
Here's the problem. The benchmark is a simple matrix-matrix multiplication script that splits the matrix into rows and distributes the rows to a specified number of threads. This is the complete code:
import threading
import time
import random


def multiply_row(A, B, row_index, result):
    # Compute the row result: result[row_index][j] = sum over k of A[row_index][k] * B[k][j]
    num_columns_B = len(B[0])
    num_columns_A = len(A[0])
    for j in range(num_columns_B):
        total = 0  # renamed from `sum` to avoid shadowing the built-in
        for k in range(num_columns_A):
            total += A[row_index][k] * B[k][j]
        result[row_index][j] = total


def parallel_matrix_multiplication(a, b, result, row_indices):
    # Worker body: each thread fills in its own subset of rows
    for row_index in row_indices:
        multiply_row(a, b, row_index, result)


def multi_threaded_matrix_multiplication(a, b, num_threads):
    num_rows = len(a)
    result = [[0] * len(b[0]) for _ in range(num_rows)]
    row_chunk = num_rows // num_threads
    threads = []
    for i in range(num_threads):
        # Contiguous block of rows; the last thread also takes the remainder
        start_row = i * row_chunk
        end_row = (i + 1) * row_chunk if i != num_threads - 1 else num_rows
        thread = threading.Thread(
            target=parallel_matrix_multiplication,
            args=(a, b, result, range(start_row, end_row)),
        )
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    return result


# Helper function to create a random matrix
def create_random_matrix(rows, cols):
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


def main():
    size = 500  # Define matrix size
    a = create_random_matrix(size, size)
    b = create_random_matrix(size, size)
    num_threads = 8  # Define number of threads
    start = time.perf_counter()
    result = multi_threaded_matrix_multiplication(a, b, num_threads)
    print("Matrix multiplication completed.", time.perf_counter() - start, "seconds.")


if __name__ == "__main__":
    main()
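For reference, the chunking in `multi_threaded_matrix_multiplication` can be pulled out into a small helper to see how the rows are divided. `row_ranges` is a hypothetical name, not part of the script above. With 500 rows and 8 threads, seven threads get 62 rows each and the last one gets 66, so the work itself is split fairly evenly.

```python
def row_ranges(num_rows, num_threads):
    """Return the (start, end) row range each thread is given.

    Mirrors the chunking in multi_threaded_matrix_multiplication: every
    thread gets num_rows // num_threads rows and the last thread also
    absorbs the remainder.
    """
    row_chunk = num_rows // num_threads
    ranges = []
    for i in range(num_threads):
        start_row = i * row_chunk
        end_row = (i + 1) * row_chunk if i != num_threads - 1 else num_rows
        ranges.append((start_row, end_row))
    return ranges


if __name__ == "__main__":
    for i, (start, end) in enumerate(row_ranges(500, 8)):
        print(f"thread {i}: rows {start}..{end - 1} ({end - start} rows)")
```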
When I ran this code with the other versions (3.9.10, nogil-3.9.10, 3.10.13, 3.11.8, 3.12.2), the longest run took ~13 seconds (regular 3.9.10) and the shortest took ~5 seconds (nogil 3.9.10).
When I run it with 3.13.0b1, the time skyrockets to ~48 seconds.
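To see whether 3.13.0b1 gets any speedup at all from adding threads, the same workload could be timed at several thread counts instead of comparing a single 8-thread timing across versions. This is a sketch: `multiply_rows`, `timed_run` and `sweep` are hypothetical names, and the default 200x200 size is only chosen to keep each run short.

```python
import random
import threading
import time


def multiply_rows(a, b, rows, result):
    # Same naive triple loop as the benchmark, limited to the given rows
    num_columns_b = len(b[0])
    num_columns_a = len(a[0])
    for i in rows:
        for j in range(num_columns_b):
            total = 0
            for k in range(num_columns_a):
                total += a[i][k] * b[k][j]
            result[i][j] = total


def timed_run(a, b, num_threads):
    """Split rows across num_threads threads; return (wall seconds, result)."""
    n = len(a)
    result = [[0] * len(b[0]) for _ in range(n)]
    chunk = n // num_threads
    start_time = time.perf_counter()
    threads = []
    for t in range(num_threads):
        start = t * chunk
        end = (t + 1) * chunk if t != num_threads - 1 else n
        th = threading.Thread(target=multiply_rows, args=(a, b, range(start, end), result))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - start_time, result


def sweep(size=200, thread_counts=(1, 2, 4, 8)):
    """Time each thread count; speedup is relative to the first count in the tuple."""
    a = [[random.random() for _ in range(size)] for _ in range(size)]
    b = [[random.random() for _ in range(size)] for _ in range(size)]
    speedups = {}
    baseline = None
    for n in thread_counts:
        elapsed, _ = timed_run(a, b, n)
        if baseline is None:
            baseline = elapsed
        speedups[n] = baseline / elapsed
        print(f"{n:2d} threads: {elapsed:6.2f} s  speedup {speedups[n]:.2f}x")
    return speedups
```

Calling `sweep()` prints one line per thread count. On a build where threads really run in parallel, the speedup would be expected to grow with the thread count; a speedup stuck near 1.0x (or below it) would match the behaviour described above.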
I tried profiling the code with cProfile, but under 3.13 it hangs and never prints any output (it works with the other versions). The process never finishes unless I kill it, and CPU usage stays at 100%, which makes me think it isn't using multiple cores, since nogil 3.9 goes above 600%.
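Since cProfile hangs on this build, one lighter-weight way to test the "only one core" suspicion is to compare process CPU time with wall-clock time for a CPU-bound threaded run. The sketch relies on `time.process_time()` reporting CPU time for the whole process (all threads combined); `busy_loop` and `cpu_to_wall_ratio` are hypothetical names.

```python
import threading
import time


def busy_loop(n):
    # Pure-Python CPU-bound work with no I/O, so threads can only overlap
    # if the interpreter lets them execute bytecode at the same time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cpu_to_wall_ratio(num_threads=8, iterations=2_000_000):
    """Return process CPU time divided by wall-clock time for one threaded run.

    time.process_time() counts CPU time of the whole process, so a ratio
    near 1.0 suggests roughly one core was busy, while a ratio approaching
    num_threads suggests the threads ran in parallel.
    """
    threads = [threading.Thread(target=busy_loop, args=(iterations,))
               for _ in range(num_threads)]
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall
```

Printing `cpu_to_wall_ratio()` under each interpreter gives a single number per build that can be compared directly with what Activity Monitor or `top` shows.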
The basic Fibonacci test works like a charm, so I know the --disable-gil build succeeded.
All of this is done on a MacBook Air M1 with 16 GB of RAM and 8 CPU cores.
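A more direct check is to ask the interpreter itself whether it is a free-threaded build and whether the GIL is actually off at runtime; in 3.13 free-threaded builds the GIL can be turned back on, for example through the `PYTHON_GIL` environment variable or by importing an extension module that has not declared free-threading support. This is a sketch: `describe_build` is a hypothetical helper, and because `sys._is_gil_enabled()` is a private function available in 3.13, the code falls back gracefully when it is missing.

```python
import sys
import sysconfig


def describe_build():
    # Py_GIL_DISABLED is 1 on builds configured with --disable-gil;
    # it is 0 or None on regular builds and on older versions.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() is private and only present in 3.13+,
    # so assume the GIL is enabled when the function does not exist.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "version": sys.version.split()[0],
        "free_threaded_build": free_threaded_build,
        "gil_enabled_at_runtime": is_gil_enabled(),
    }


if __name__ == "__main__":
    print(describe_build())
```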
CPython versions tested on:
3.9, 3.10, 3.11, 3.12, 3.13
Operating systems tested on:
macOS