Skip to content

gh-109765: Detect TLS handshake attempt in HTTP server #130041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

donBarbos
Copy link
Contributor

@donBarbos donBarbos commented Feb 12, 2025

I set self.requestline = "[TLS handshake bytes]" because BaseHTTPRequestHandler print self.requestline as a log, but it is bytes and to keep requestline readable I use "[TLS handshake bytes]" string value

@9001
Copy link

9001 commented Feb 12, 2025

just to offer an alternative take on this,

in a python webserver where I support both http and https on the same port, i went with the following approach for identifying (presumably) TLS traffic:

HTTP_VERB = re.compile(br"[A-Z]{3}[A-Z ]")

def handle_connection(client_socket):
    # obtain the first 4 bytes of the requestline somehow, then...
    is_https = not HTTP_VERB.match(first_4_bytes)

this has the advantage that it also detects SSLv3 connections, which at the time was still relevant.

in my case I'm using client_socket.recv(4, socket.MSG_PEEK) to check if the connection is TLS or not, but in this case it's indeed sufficient to check the raw_requestline.

@donBarbos
Copy link
Contributor Author

@9001 thanks for your suggestion. It seems it really covers more cases and is more universal but i would not like to increase importtime without serious reasons especially since i would like to get rid of unnecessary imports or somehow reduce this time.

But we can add a check method without the re module, like:

def is_http_verb(s: bytes) -> bool:
    return s[:3].isalpha() and s[:3].isupper() and (s[3].isupper() or s[3] == b' '[0])

And still I would prefer to wait for an answer from one of the core members. what they will say

@hramrach
Copy link

The HTTP protocol is extensible and the list of verbs is not meant to be exhaustive. The http.server documentation even shows an example of custom SPAM verb. However, there is some limit on what the verb name can contain, and the TLS handshake is outside of that. There is some validation step that is performed on the request line, and this check should be probably integrated in that.

The TLS handshake is unlikely to change any time soon due to protocol ossification but sampling one byte from the request line and assuming it's TLS is a bit underwhelming for a heuristic. Even if it matches the first few bytes of a TLS handshake perfectly it can still be random garbage. Write something along the lines 'maybe HTTPS'.

Handling random garbage in the request line somewhat gracefully should work for non-TLS garbage as well.

And some users might want to diagnose what garbage they are actually getting, and will want at least some of it printed, even if it happens to look like TLS.

From the linked issue it looks like the newline (and probably other control characters) is not escaped in the log causing misformatting of the log files. This should be probably addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants