HTTP: Support Apache-based servers with Negotiate #5286


Merged (36 commits) Jan 24, 2020

Conversation

ethomson
Member

Most non-Microsoft web servers handle NTLM and Negotiate authentication differently than Microsoft web servers.

In theory, NTLM and Negotiate authenticate a connection rather than an individual request. Since they are challenge/response mechanisms, authenticating is expensive, requiring multiple round-trips. To avoid that repeated back-and-forth on every request, the entirety of a keep-alive socket is deemed authenticated once authentication succeeds.

This requires that the server honor the authentication for the lifetime of the keep-alive socket, which some servers (eg, Apache) simply do not. Instead, they require re-authentication for each request. This is costly but sadly unavoidable.

libgit2 was built on the premise that we would do a GET to the git endpoint, which we would authenticate. Subsequent POST requests would remain authenticated, either because we were using HTTP Basic and would re-send credentials each time, or because we were using NTLM or Negotiate authentication on a kept-alive socket that would remain authenticated throughout its lifetime.

Sadly, this assumption is poor: Apache does not abide by these rules. So we need to teach our HTTP stack to authenticate POST requests as well.

Regrettably, this leads to other difficulties - when we POST, we need to deliver a payload in our request. If we get a 401 back, we need to send that payload again. This is - at best - inefficient since we're sending the payload twice, but - at worst - is impossible since we may be streaming data to the server and we cannot rewind the stream.

One attempt is to use expect/continue. Instead of delivering the payload, we send "Expect: 100-continue", and only deliver the headers. We can then do the authentication dance, and when the server responds with a "100 Continue" response, we can deliver the payload. Although expect/continue is incredibly useful, it's not necessarily widely supported on ancient servers, and it's not supported by winhttp. As a result, although I'm adding expect/continue support here, I don't think it's the proper solution. It's behind an option.

I'll add the "probe packet" support from #2529, as this seems like the most correct way of handling this.

@ethomson
Member Author

I meant to open this as a draft. It’s incomplete, I just wanted to see the CI builds for this approach.

@ethomson ethomson mentioned this pull request Nov 1, 2019
@ethomson ethomson force-pushed the ethomson/gssapi branch 8 times, most recently from 5fb1541 to da07958 Compare November 27, 2019 12:28
@pks-t
Member

pks-t commented Nov 28, 2019

I meant to open this as a draft. It’s incomplete, I just wanted to see the CI builds for this approach.

So no need to do reviews yet?

@ethomson ethomson changed the title HTTP: Support Apache-based servers with Negotiate WIP: HTTP: Support Apache-based servers with Negotiate Dec 5, 2019
@ethomson ethomson force-pushed the ethomson/gssapi branch 8 times, most recently from 07bdb22 to fe24296 Compare January 1, 2020 21:44
@ethomson
Member Author

ethomson commented Jan 1, 2020

OK, this is ready for review. I started this PR by adding support for both expect/continue and sending probe packets. And it was bad. The business logic of dealing with the HTTP flow was rather tangled up in things like the callbacks used by the HTTP parser library. Some of the code was shared between the standard request/response handling, proxy handling, and then the new expect/continue and probe packet handling, but some of it was copypasta. It was rather brittle.

And so after seeing this report in #5236:

Unfortunately however I get segfaults in libssl when cloning over https 😩 SSH and Negotiate over HTTP work fine. We had this code working with 28.2, but I have a feeling that maybe something has changed in this branch and we need to change how we're calling the api. Here is the backtrace, if you have any ideas on what to check first that would be really helpful!

I realized that if I had just broken something, it would be hard to diagnose, and probably harder to fix.

So I decided to tease things apart -- I've added a git_http_client, which actually handles sending a request and receiving a response, and made the existing http transport use that. I think that the result is much, much simpler and easier to reason about.

@jonathanturcotte could you test this? I hope that you're not seeing segmentation faults anymore. 🤞

@ethomson
Member Author

ethomson commented Jan 1, 2020

Also, apologies for the number of commits. A lot of the changes to http.c were refactorings to better understand how things were working in preparation for creating the httpclient. I hope that this isn't too bearish of a review. 😓

@ethomson ethomson force-pushed the ethomson/gssapi branch 4 times, most recently from 72589f6 to 08ed83b Compare January 2, 2020 05:21
Member

@pks-t pks-t left a comment


Whew, that's been quite a read. I like this quite a lot, tearing apart HTTP and Git related protocols.

One thing I noted is that this series seems to build the same set of features twice, first for the legacy transports and then the same stuff for the new HTTP client, removing the first try again. I can relate if this is asking too much, but would it be possible to kind of discard the first set of commits that get discarded later on anyway, starting with simple bits like tracing, then creating the new httpclient and afterwards converting callers? Would make it much easier to review this series in future iterations.

@ethomson
Member Author

ethomson commented Jan 8, 2020

Whew, that's been quite a read. I like this quite a lot, tearing apart HTTP and Git related protocols.

Thanks. I'm really happy with it overall. Due to the sort of weird way the transports work (the "read" / "write" lifecycle), it took a lot of aborted attempts to get it right.

One thing I noted is that this series seems to build the same set of features twice, first for the legacy transports and then the same stuff for the new HTTP client, removing the first try again. I can relate if this is asking too much, but would it be possible to kind of discard the first set of commits that get discarded later on anyway, starting with simple bits like tracing, then creating the new httpclient and afterwards converting callers? Would make it much easier to review this series in future iterations.

Yeah, I can drop the early http.c refactoring. I left those in since that's literally the process that I followed - but they're certainly pointless in the end. I'll respond to your review and fix this up over the weekend.

Move the redirect handling into `git_net_url` for consistency.
Provide a mechanism to add a path and query string to an existing url
so that we can easily append `/info/refs?...` type url segments to a url
given to us by a user.
Allow users to consume a buffer by the number of bytes, not just to an
ending pointer.
Introduce a new http client implementation that can GET and POST to
remote URLs.

Consumers can use `git_http_client_init` to create a new client,
`git_http_client_send_request` to send a request to the remote server
and `git_http_client_read_response` to read the response.

The http client implementation will perform the I/O with the remote
server (http or https) but does not understand the git smart transfer
protocol.  This allows us to split the concerns of the http subtransport
from the actual http implementation.
Teach httpclient how to support chunking when POSTing request bodies.
The CLAR_TRACE_LEVEL environment variable was supported when building
with GIT_TRACE.  Now we always build with GIT_TRACE, but that variable
is not provided to tests.  Simply support clar tracing always.
When sending a new request, ensure that we got the entirety of the
response body.  Our caller may have decided that they were done reading.
If we were not at the end of the message, this means that we need to
tear down the connection and cannot do keep-alive.

However, if the caller read all of the message, but we still have a
final end-of-response chunk signifier (ie, "0\r\n\r\n") on the socket,
then we should consider that the response was successfully completed.

If we're asked to send a new request, try to read from the socket, just
to clear out that end-of-chunk message, marking ourselves as
disconnected on any errors.
Introduce a function to format the path and query string for a URL,
suitable for creating an HTTP request.
Store the last-seen credential challenges (eg, all the
'WWW-Authenticate' headers in a response message).  Given some
credentials, find the best (first) challenge whose mechanism supports
these credentials.  (eg, 'Basic' supports username/password credentials,
'Negotiate' supports default credentials).

Set up an authentication context for this mechanism and these
credentials.  Continue exchanging challenge/responses until we're
authenticated.
Detect responses that are sent with Transfer-Encoding: chunked, and
record that information so that we can consume the entire message body.
Fully support HTTP proxies, in particular CONNECT proxies, that allow us
to speak TLS through a proxy.
Allow users to opt-in to expect/continue handling when sending a POST
and we're authenticated with a "connection-based" authentication
mechanism like NTLM or Negotiate.

If the response is a 100, return to the caller (to allow them to post
their body).  If the response is *not* a 100, buffer the response for
the caller.

HTTP expect/continue is generally safe, but some legacy servers
have not implemented it correctly.  Require it to be opt-in.
Untangle the notion of the http transport from the actual http
implementation.  The http transport now uses the httpclient.
When we're authenticating with a connection-based authentication scheme
(NTLM, Negotiate), we need to make sure that we're still connected
between the initial GET where we did the authentication and the POST
that we're about to send.  Our keep-alive session may not have been
kept alive, but more likely, some servers (namely Apache and nginx) do
not authenticate the entire keep-alive connection and may have
"forgotten" that we were authenticated.

Send a "probe" packet, that is an HTTP POST request to the upload-pack
or receive-pack endpoint, that consists of an empty git pkt ("0000").
If we're authenticated, we'll get a 200 back.  If we're not, we'll get a
401 back, and then we'll resend that probe packet with the first step of
our authentication (asking to start authentication with the given
scheme).  We expect _yet another_ 401 back, with the authentication
challenge.

Finally, we will send our authentication response with the actual POST
data.  This will allow us to authenticate without draining the POST data
in the initial request that gets us a 401.
When tracing is disabled, don't let `git_trace__level` return a void,
since that can't be compared against.
Download poxygit, a debugging git server, and clone from it using NTLM,
both IIS-style (with connection affinity) and Apache-style ("broken",
requiring constant reauthentication).
Disambiguate between general network problems and HTTP problems in error
codes.
@ethomson
Member Author

Thanks @pks-t for the review!
