HTTP: Support Apache-based servers with Negotiate #5286
Conversation
I meant to open this as a draft. It’s incomplete; I just wanted to see the CI builds for this approach.
Force-pushed from 17aae90 to c14c44d
Force-pushed from 5fb1541 to da07958
So no need to do reviews yet?
Force-pushed from 07bdb22 to fe24296
OK, this is ready for review. I started this PR by adding support for both expect/continue and sending probe packets. And it was bad. The business logic of dealing with the HTTP flow was rather tangled up in things like the callbacks used by the HTTP parser library. Some of the code was shared between the standard request/response handling, proxy handling, and then the new expect/continue and probe packet handling, but some of it was copypasta. It was rather brittle. And so after seeing this report in #5236:
I realized that if I had just broken something, it would be hard to diagnose, and probably harder to fix. So I decided to tease things apart -- I've added a new `httpclient` that is separate from the git smart protocol handling. @jonathanturcotte, could you test this? I hope that you're not seeing segmentation faults anymore. 🤞
Also, apologies for the number of commits. A lot of the changes to …
Force-pushed from 72589f6 to 08ed83b
Whew, that's been quite a read. I like this quite a lot: tearing apart the HTTP and Git-related protocols.
One thing I noted is that this series seems to build the same set of features twice: first for the legacy transports, then the same thing again for the new HTTP client, removing the first attempt along the way. I understand if this is asking too much, but would it be possible to discard that first set of commits (which get discarded later on anyway), starting with simple bits like tracing, then creating the new httpclient, and afterwards converting the callers? That would make this series much easier to review in future iterations.
Thanks. I'm really happy with it overall. Due to the sort of weird way the transports work (the "read" / "write" lifecycle), it took a lot of aborted attempts to get it right.
Yeah, I can drop the early commits that get replaced later on.
Move the redirect handling into `git_net_url` for consistency.
Provide a mechanism to add a path and query string to an existing url so that we can easily append `/info/refs?...` type url segments to a url given to us by a user.
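As a rough illustration, appending a smart-protocol suffix to a user-supplied URL might look like this (a sketch against the internal `git_net_url` API; treat the joining helper's exact name, signature, and query-string handling as assumptions):

```c
git_net_url base = GIT_NET_URL_INIT, full = GIT_NET_URL_INIT;

/* parse the URL the user gave us */
if (git_net_url_parse(&base, "https://example.com/project.git") < 0)
	return -1;

/* append the "/info/refs?..." segment to the existing path */
if (git_net_url_joinpath(&full, &base, "info/refs?service=git-upload-pack") < 0)
	return -1;

/* full now refers to ".../project.git/info/refs?service=git-upload-pack" */
git_net_url_dispose(&base);
git_net_url_dispose(&full);
```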
Allow users to consume a buffer by the number of bytes, not just to an ending pointer.
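For instance (a minimal sketch; the byte-count variant's name follows the buffer API described above, but treat it as an assumption):

```c
git_buf buf = GIT_BUF_INIT;

git_buf_puts(&buf, "001e# service=git-upload-pack\n");

/* previously a caller needed an end pointer: git_buf_consume(&buf, ptr);
 * now a known-length prefix can be dropped directly by byte count */
git_buf_consume_bytes(&buf, 4);  /* drop the 4-byte pkt-length prefix */

git_buf_dispose(&buf);
```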
Introduce a new http client implementation that can GET and POST to remote URLs. Consumers can use `git_http_client_init` to create a new client, `git_http_client_send_request` to send a request to the remote server and `git_http_client_read_response` to read the response. The http client implementation will perform the I/O with the remote server (http or https) but does not understand the git smart transfer protocol. This allows us to split the concerns of the http subtransport from the actual http implementation.
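A rough usage sketch of the split, using the function names above (the request/response structures and exact signatures are assumptions for illustration):

```c
git_http_client *client = NULL;
git_http_request request = {0};
git_http_response response = {0};

/* the client knows HTTP(S) I/O, but nothing about the git protocol;
 * opts is a client options structure configured elsewhere */
if (git_http_client_init(&client, &opts) < 0)
	return -1;

request.method = GIT_HTTP_METHOD_GET;
request.url = &url;  /* a parsed git_net_url */

if (git_http_client_send_request(client, &request) < 0 ||
    git_http_client_read_response(&response, client) < 0)
	return -1;

/* the caller (eg, the http subtransport) interprets the body itself */
```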
Teach httpclient how to support chunking when POSTing request bodies.
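Chunked bodies frame each write as a hex length, CRLF, the data, and a trailing CRLF, with a zero-length chunk ("0\r\n\r\n") ending the message. A minimal sketch of writing one chunk (the stream-write helper is illustrative):

```c
static int write_chunk(git_stream *stream, const char *data, size_t len)
{
	char header[16];
	int header_len;

	/* frame: "<hex length>\r\n<data>\r\n" */
	header_len = snprintf(header, sizeof(header), "%zx\r\n", len);

	if (git_stream__write_full(stream, header, header_len, 0) < 0 ||
	    git_stream__write_full(stream, data, len, 0) < 0 ||
	    git_stream__write_full(stream, "\r\n", 2, 0) < 0)
		return -1;

	return 0;
}
```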
The CLAR_TRACE_LEVEL environment variable was supported when building with GIT_TRACE. Now we always build with GIT_TRACE, but that variable is not provided to tests. Simply support clar tracing always.
When sending a new request, ensure that we got the entirety of the response body. Our caller may have decided that they were done reading. If we were not at the end of the message, this means that we need to tear down the connection and cannot do keep-alive. However, if the caller read all of the message, but we still have a final end-of-response chunk signifier (ie, "0\r\n\r\n") on the socket, then we should consider that the response was successfully completed. If we're asked to send a new request, try to read from the socket, just to clear out that end-of-chunk message, marking ourselves as disconnected on any errors.
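In outline, that reuse check could look like this (the field and helper names here are hypothetical):

```c
/* before reusing a keep-alive connection for the next request */
if (!client->response_complete) {
	if (client->chunked && only_trailer_remains(client)) {
		/* read and discard the final "0\r\n\r\n" */
		if (drain_remaining(client) < 0)
			client->connected = 0;  /* any error: tear it down */
	} else {
		/* mid-body: this connection cannot be kept alive */
		client->connected = 0;
	}
}
```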
Introduce a function to format the path and query string for a URL, suitable for creating an HTTP request.
Store the last-seen credential challenges (eg, all the 'WWW-Authenticate' headers in a response message). Given some credentials, find the best (first) challenge whose mechanism supports these credentials. (eg, 'Basic' supports username/password credentials, 'Negotiate' supports default credentials). Set up an authentication context for this mechanism and these credentials. Continue exchanging challenge/responses until we're authenticated.
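A sketch of the challenge-selection step (the types, fields, and helpers here are hypothetical; `credtype` mirrors libgit2's credential type flags):

```c
/* given the stored WWW-Authenticate challenges, pick the first
 * mechanism that can consume the credentials we were handed */
static int setup_auth_context(http_server *server, git_credential *cred)
{
	size_t i;

	for (i = 0; i < server->challenges.length; i++) {
		const char *challenge = server->challenges.contents[i];
		auth_mechanism *mech = mechanism_for_challenge(challenge);

		/* eg, Basic accepts username/password; Negotiate accepts
		 * default credentials */
		if (mech && (mech->credtypes & cred->credtype) != 0)
			return mech->init_context(&server->auth_context, cred);
	}

	return -1;  /* no mechanism supports these credentials */
}
```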
Detect responses that are sent with Transfer-Encoding: chunked, and record that information so that we can consume the entire message body.
Fully support HTTP proxies, in particular CONNECT proxies, that allow us to speak TLS through a proxy.
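For reference, the CONNECT exchange that makes TLS-through-proxy possible, sketched as the wire traffic:

```c
/*
 * Client -> proxy (cleartext):
 *
 *     CONNECT example.com:443 HTTP/1.1
 *     Host: example.com:443
 *
 * Proxy -> client:
 *
 *     HTTP/1.1 200 Connection established
 *
 * After the 200, the client runs its TLS handshake over the proxy
 * socket and speaks HTTPS to example.com through the tunnel; the
 * proxy only relays opaque bytes from then on.
 */
```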
Allow users to opt-in to expect/continue handling when sending a POST and we're authenticated with a "connection-based" authentication mechanism like NTLM or Negotiate. If the response is a 100, return to the caller (to allow them to post their body). If the response is *not* a 100, buffer the response for the caller. HTTP expect/continue is generally safe, but some legacy servers have not implemented it correctly. Require it to be opt-in.
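From the caller's side, the opt-in flow might look like this (the `expect_continue` flag and the helper names are assumptions):

```c
request.method = GIT_HTTP_METHOD_POST;
request.expect_continue = 1;  /* opt-in; only with NTLM/Negotiate */

if (git_http_client_send_request(client, &request) < 0 ||
    git_http_client_read_response(&response, client) < 0)
	return -1;

if (response.status == 100) {
	/* the server agreed: now stream the request body */
	send_request_body(client);
} else {
	/* a final response (eg, a 401) was buffered for the caller instead */
	handle_final_response(&response);
}
```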
Untangle the notion of the http transport from the actual http implementation. The http transport now uses the httpclient.
When we're authenticating with a connection-based authentication scheme (NTLM, Negotiate), we need to make sure that we're still connected between the initial GET where we did the authentication and the POST that we're about to send. Our keep-alive session may have not kept alive, but more likely, some servers do not authenticate the entire keep-alive connection and may have "forgotten" that we were authenticated, namely Apache and nginx. Send a "probe" packet, that is an HTTP POST request to the upload-pack or receive-pack endpoint, that consists of an empty git pkt ("0000"). If we're authenticated, we'll get a 200 back. If we're not, we'll get a 401 back, and then we'll resend that probe packet with the first step of our authentication (asking to start authentication with the given scheme). We expect _yet another_ 401 back, with the authentication challenge. Finally, we will send our authentication response with the actual POST data. This will allow us to authenticate without draining the POST data in the initial request that gets us a 401.
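Laid out as an exchange, the probe dance looks like this:

```c
/*
 * 1. POST /repo.git/git-upload-pack            body: "0000"
 *    <- 401, WWW-Authenticate: Negotiate        (not authenticated)
 *
 * 2. POST ..., Authorization: Negotiate <initial token>, body: "0000"
 *    <- 401, WWW-Authenticate: Negotiate <challenge>
 *
 * 3. POST ..., Authorization: Negotiate <response token>, body: <real data>
 *    <- 200                                      (authenticated)
 *
 * If step 1 returns 200 instead, the keep-alive connection was still
 * authenticated and the real POST can be sent immediately.
 */
```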
When tracing is disabled, don't let `git_trace__level` return void, since that can't be compared against.
Download poxygit, a debugging git server, and clone from it using NTLM, both IIS-style (with connection affinity) and Apache-style ("broken", requiring constant reauthentication).
Disambiguate between general network problems and HTTP problems in error codes.
Force-pushed from 77316e9 to e9cef7c
Thanks @pks-t for the review!
Most non-Microsoft web servers handle NTLM and Negotiate authentication differently than Microsoft web servers.
In theory, NTLM and Negotiate authenticate a connection. Since they are challenge/response mechanisms, authenticating requires multiple round-trips and is inefficient to repeat. To avoid that repeated back-and-forth, the entirety of a keep-alive socket is deemed authenticated once authentication succeeds.
This requires that the server keep track of the authentication for the lifetime of the keep-alive socket, which some servers (eg, Apache) simply do not do. Instead, they require re-authentication for each request. This is costly but sadly unavoidable.
libgit2 was built with the premise that we would do a `GET` to the git endpoint, which we would authenticate. Subsequent `POST` requests would remain authenticated, either because we were using HTTP Basic and would re-send credentials each time, or because we were using NTLM or Negotiate authentication on a kept-alive socket that would remain authenticated through its lifetime.

Sadly, this assumption is poor. Apache does not abide by these rules. So we need to teach our HTTP stack to authenticate POST requests.
Regrettably, this leads to other difficulties - when we POST, we need to deliver a payload in our request. If we get a 401 back, we need to send that payload again. This is - at best - inefficient since we're sending the payload twice, but - at worst - is impossible since we may be streaming data to the server and we cannot rewind the stream.
One attempt is to use expect/continue. Instead of delivering the payload, we send "Expect: 100-continue", and only deliver the headers. We can then do the authentication dance, and when the server responds with a "100 Continue" response, we can deliver the payload. Although expect/continue is incredibly useful, it's not necessarily widely supported on ancient servers, and it's not supported by winhttp. As a result, although I'm adding expect/continue support here, I don't think it's the proper solution. It's behind an option.
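Roughly, the expect/continue variant on the wire (the challenge exchange repeats as needed before the 100 arrives):

```c
/*
 * POST /repo.git/git-upload-pack HTTP/1.1
 * Expect: 100-continue
 * (headers only; the payload is withheld)
 *
 * <- 401 / authentication challenge exchange, repeated as needed
 * <- HTTP/1.1 100 Continue    once the server is satisfied
 *
 * Only now is the payload delivered, exactly once.
 */
```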
I'll add the "probe packet" support from #2529, as this seems like the most correct way of handling this.