Skip to content

asyncio.start_unix_server maybe shouldn’t default to cleanup_socket=True when sock parameter is passed #133354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Hawk777 opened this issue May 3, 2025 · 19 comments
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@Hawk777
Copy link
Contributor

Hawk777 commented May 3, 2025

Bug report

Bug description:

When asyncio.start_unix_server is called and the sock parameter is passed (rather than path), a server is wrapped around an existing socket. In this case, the fact that cleanup_socket defaults to True is very weird and potentially broken: it means that asyncio will try to delete a socket that someone else created (either other code in the same process, or, in the case of e.g. systemd socket passing, a socket created by a different program).

This is problematic for a couple of reasons when using systemd socket passing:

  • If the server succeeds at deleting the socket, it breaks socket activation. Socket activation means systemd creates the socket, and when the first connection arrives, it starts the server. The way it’s intended to work is that, if the server terminates, it leaves the socket in place, and systemd will start the server again if another connection comes in, allowing virtually seamless service restarts. If the server deletes the listening socket, this breaks.
  • If systemd passes the listening socket to the server, the server may not even have filesystem permission to see the socket—after all, only clients should need that. Then the os.stat call in create_unix_server blows up.

So this is definitely a nontrivial backwards compatibility break from 3.12 (which didn’t have cleanup logic at all and didn’t have the parameter) to 3.13, despite not being mentioned in the release notes. It also seems like a bit of a footgun in general, not to mention the fact that in implementation start_unix_server takes arbitrary kwargs and forwards them to create_unix_server, but the documentation instead mentions each parameter explicitly and doesn’t mention cleanup_socket at all.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@Hawk777 Hawk777 added the type-bug An unexpected behavior, bug, or error label May 3, 2025
@github-project-automation github-project-automation bot moved this to Todo in asyncio May 3, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label May 4, 2025
@dmitya26
Copy link

dmitya26 commented May 4, 2025

Hey, I can take this issue.

Before creating a PR though, I'd want to confirm whether this was intended or not, because it seems like something that was intentional.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

Doing cleanup by default when path is passed is a change in behaviour too, but it seems like a good idea IMO (in that case the server is responsible for creating the socket so it’s logical it should also be responsible for deleting it). Only when sock is passed does it seem like it probably shouldn’t be done by default.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

Agreed that getting an opinion, ideally from whoever made the change initially, would be good.

@dmitya26
Copy link

dmitya26 commented May 4, 2025

Haha yea I was searching the github PRs tab, grepping, listing via gh-cli, and I genuinely couldn't find it. I was able to narrow it down to before python 3.13 and after python 3.12.

Is there a standardized way to find who initially made a change?

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

git log -p, git log -G, and git blame can all help. In this case git log -p -G cleanup_socket quickly revealed that the parameter and the code behind it were added in 74b868f, which mentions gh-111246 and gh-111483.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

The discussion there mentions the use of the sock parameter, but only in the sense of “it’s harder to get the name if someone uses that, but hey we can use getsockname”; I don’t see any discussion of the idea that maybe it shouldn’t be removed at all in that case.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

@CendioOssman perhaps you have some thoughts on this, as the author of the original change?

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 4, 2025

(Side note, if you want to get the inode number for the listening socket, why are you doing getsockname plus stat rather than fstat, or, equivalently, stat on the descriptor rather than the name? That would fix a race if some other process replaces the socket after you bind but before you stat.)

@CendioOssman
Copy link
Contributor

The idea was that since this creates a server object, it implies a great deal of ownership. E.g. accepting and creating new sockets.

Secondly, removing all traces of the socket when it is closed is how most sockets behave, except for Unix sockets.

Hence, having the cleanup be enabled by default seemed like the most useful and least surprising option.

Admittedly, systemd socket activation was not considered.

@CendioOssman
Copy link
Contributor

(Side note, if you want to get the inode number for the listening socket, why are you doing getsockname plus stat rather than fstat, or, equivalently, stat on the descriptor rather than the name? That would fix a race if some other process replaces the socket after you bind but before you stat.)

I seem to recall that the socket's inode number was from the magical socket file system, and not from the file on disk. So this is a race, yes, but no alternative was found.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 5, 2025

Ah, good point. I didn’t actually test it; it’s possible that fstat on a socket doesn’t give you the inode number you want.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 5, 2025

Philosophically, I think the issue regarding ownership is that any socket (UNIX-domain or otherwise) ought to be cleaned up once the socket no longer exists. The problem is, with any socket (UNIX-domain or otherwise), the socket no longer exists once the last file descriptor referring to it is closed, not once this particular file descriptor referring to it is closed. With non-UNIX-in-filesystem sockets (i.e. non-UNIX sockets or UNIX sockets outside the filesystem), that’s trivial: the kernel cleans up the socket once the last descriptor is closed. With UNIX-in-filesystem sockets, if you get the path and create the descriptor yourself, you can be reasonably certain nobody else has a copy of the descriptor (at least, code would have to work quite hard to get it out intentionally and share it with someone); on the other hand, if you get the descriptor via sock, there’s no such guarantee (even if you take ownership of, and responsibility for closing, that descriptor, you can’t assume that nobody else has another descript0r>

@CendioOssman
Copy link
Contributor

Indeed, the library code cannot know for sure and has to be told by the application. So the question was really what the most useful default should be.

I made the assumption that the vast majority of cases (but not all) would have the asyncio server as the sole owner of the socket. Hence, that cleanup should be on by default. That would reduce the risk of developers overlooking the setting. And it keeps application code cleaner, as they don't need to explicitly specify this.

I think it would be a big mistake if the default was changed when path is used, but that doesn't seem to be in question. For sock, I think the usage might be more mixed and the default could be questioned. I'm not convinced the majority of those users have shared socket, though.

Given that the usage of sock is unclear, I would still vote for keeping the current default for the simple reason that it keeps the API understandable. The default of cleanup_socket would be clear and not conditional on other details.

@dmitya26
Copy link

dmitya26 commented May 7, 2025

What would be the issue with making cleanup_socket a required argument as opposed to having a default?

Requiring the argument would arguably make the function more clear IMO.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 7, 2025

I made the assumption that the vast majority of cases (but not all) would have the asyncio server as the sole owner of the socket. Hence, that cleanup should be on by default. That would reduce the risk of developers overlooking the setting. And it keeps application code cleaner, as they don't need to explicitly specify this.

If this were new, I’d agree with you more strongly. I’m mostly bothered because it’s a compatibility break from Python 3.12. And not even one that’s nice to work around, given that I have to write this ugliness:

if sys.hexversion >= 0x030D00F0:
    async def start_unix_server_from_socket(sock: socket.socket) -> asyncio.Server:
        return await asyncio.start_unix_server(handle_connection, sock=sock, cleanup_socket=False)
else:
    async def start_unix_server_from_socket(sock: socket.socket) -> asyncio.Server:
        return await asyncio.start_unix_server(handle_connection, sock=sock)

What would be the issue with making cleanup_socket a required argument as opposed to having a default?

That would arguably be nice, but would break backwards compatibility with 3.12 even more.

@dmitya26
Copy link

dmitya26 commented May 7, 2025

I understand where Mr. Ossman is coming from and I think his defaults were sane. I am a little uneasy about the compatibility break and the problem with socket activation, but a tradeoff needed to be made and IMHO it's a fair trade off, given you can always undo it with an additional argument, albiet with additional code.

Python, at least to my knowledge, is very philosophically implicit, and a lot of the features of the language will abstract lower level logic or do things implicitly (the type system, pointers, large amounts of metadata that you sometimes can't control) that can sometimes make the language easier or more difficult to work in, but will usually provide a very fast and seamless experience developing in the language with the tradeoff of taking away control. That's why it's so widely used in prototyping and research. So I feel like the discussion earlier about giving the programmer less flexibility by default is largely irrelivent. The main issue at this point is the compatibility break, but I've already spoken on that.

I definitely do think we need to open a separate issue to deal with the problem with the documentation. I can do that if you guys want.

@Hawk777
Copy link
Contributor Author

Hawk777 commented May 7, 2025

I was planning on making a PR for the docs myself (assuming you mean the missing parameter on start_unix_server), once someone decided what was going to happen code-wise, if that PR didn’t already include the docs change. But, feel free to do it yourself if you want to!

@dmitya26
Copy link

dmitya26 commented May 7, 2025

Yea, I can take it.

@dmitya26
Copy link

dmitya26 commented May 7, 2025

What is your opinion on factoring out cleanup_socket logic into a separate function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

5 participants