Description
What version of gRPC and what language are you using?
`grpcio` `1.57.0` and `1.57.1`
What operating system (Linux, Windows,...) and version?
various
What runtime / compiler are you using (e.g. python version or version of gcc)
python 3.X (various)
What did you do?
This issue represents reports from Dagster users dagster-io/dagster#18997 .
What did you expect to see?
When upgrading the `grpcio` python package in the system processes that are grpc clients from `1.57.0` to `1.57.1`, memory utilization should not change drastically.
What did you see instead?
For some users, memory consumption in `1.57.1` grows continuously for the life of the process. Downgrading to `1.57.0` brings memory consumption back to normal.
Anything else we should know about your project / environment?
A Dagster deployment involves a GRPC server process that loads users' Dagster code artifacts and two system processes (a webserver and a daemon) that communicate with the GRPC server. These system processes regularly communicate with the GRPC server to fetch various pieces of information. We use a threaded grpc server, and the client system processes make blocking requests, not `asyncio`.
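To make the "grows continuously" observation easier to reproduce outside of Dagster, here is a minimal sketch of a sampling harness for a blocking client loop. It is not Dagster's code: `make_request` is a hypothetical stand-in for whatever blocking stub call the client process performs, and the RSS reader assumes Linux (`/proc/self/statm`).

```python
import os
from typing import Callable, List


def current_rss_bytes() -> int:
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def sample_rss_during_calls(
    make_request: Callable[[], object],  # hypothetical blocking client call
    iterations: int,
    sample_every: int,
) -> List[int]:
    """Call make_request repeatedly and record RSS every sample_every calls."""
    samples = [current_rss_bytes()]  # baseline before any request
    for i in range(1, iterations + 1):
        make_request()
        if i % sample_every == 0:
            samples.append(current_rss_bytes())
    return samples


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with a real blocking stub call.
    rss = sample_rss_during_calls(lambda: None, iterations=1000, sample_every=100)
    print([r // 1024 for r in rss])  # KiB per sample
```

Running the same loop once with `grpcio` `1.57.0` installed and once with `1.57.1` should show whether the samples keep climbing in only one of the two environments.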
A previous report #36117 was closed under the assumption that the issue was related to a resolved cpython `asyncio` bug python/cpython#111246. Since we observe this in a non-`asyncio` setup, I don't believe that is an accurate assessment.
When viewing the changes between `1.57.0` and `1.57.1` (v1.57.0...v1.57.1), the only meaningful change is #34557, a backport of #34549 that introduced a new heap allocation: https://github.com/grpc/grpc/pull/34549/files#r1341637876 . It seems extremely likely that this change is related to the observed changes in memory utilization.
This is well outside my area of expertise, but cross-referencing https://github.com/abseil/abseil-cpp/blob/master/absl/strings/cord.h#L214-L252 and trying to make some guesses:
- a reference counting issue is preventing the releaser from running
- the releaser is running, but the heap allocations are causing fragmentation preventing memory from being freed back to the OS
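One way to narrow these guesses down is to check whether the growth is visible to Python's own allocator at all. The sketch below compares `tracemalloc` totals with process RSS around a workload. `tracemalloc` only sees allocations made through Python's memory allocators, so RSS growth with a flat traced total would point at native (C/C++) memory. It does not by itself separate a reference counting problem from fragmentation. `workload` is a hypothetical stand-in for the blocking client call, and the RSS reader assumes Linux.

```python
import os
import tracemalloc
from typing import Callable, Dict


def rss_bytes() -> int:
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def growth_breakdown(workload: Callable[[], object], iterations: int) -> Dict[str, int]:
    """Run workload repeatedly; report Python-traced growth and RSS growth."""
    tracemalloc.start()
    py_before, _peak = tracemalloc.get_traced_memory()
    rss_before = rss_bytes()

    for _ in range(iterations):
        workload()  # hypothetical: e.g. one blocking gRPC request

    py_after, _peak = tracemalloc.get_traced_memory()
    rss_after = rss_bytes()
    tracemalloc.stop()

    return {
        # Bytes still held by allocations made through Python's allocators.
        "python_heap_growth": py_after - py_before,
        # Change in resident memory, including native allocations.
        "rss_growth": rss_after - rss_before,
    }


if __name__ == "__main__":
    retained = []  # simulate a Python-level leak for demonstration
    print(growth_breakdown(lambda: retained.append(bytes(100_000)), iterations=50))
```

In the demonstration the leak is deliberately Python-level, so `python_heap_growth` is large. With a real client call, a small `python_heap_growth` next to a steadily increasing `rss_growth` would be consistent with either guess above and would justify moving to native heap tooling.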