Skip to content

Fix issues with Proxy Authentication after httpclient refactor #5852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 16, 2021

Conversation

implausible
Copy link
Contributor

@implausible implausible commented Apr 30, 2021

Issue 1
When skipping the body, we used to wait for the on_message_complete callback to have been triggered, which set parse_finished. After the refactor with httpclient, we stopped relying on this value, and thus we no longer consume the entire body when skipping it in the client. Here's where we used to consume the body

while (!bytes_read && !t->parse_finished) {

This shows up as an issue primarily when working with the proxy connection workflow, where we need to discard the entire body (completely drain the socket of the response) before we start the retry with credentials.

Issue 2
We used to mutate proxy_opts.type to GIT_PROXY_NONE after successfully connecting the tunnel. After the httpclient refactor, we continue to apply proxy-authorization headers to all requests even after we've already built a tunnel. This is fixed by checking the status of the proxy tunnel before applying proxy credentials. I think this roughly translates to the same behavior as before.

This is where we used to mutate the proxy_opts.type before

t->proxy_opts.type = GIT_PROXY_NONE;
And this is where we used to rely on that mutation to stop attaching proxy credentials
if (t->proxy_opts.type != GIT_PROXY_NONE &&
apply_credentials(buf, &t->proxy, AUTH_HEADER_PROXY) < 0)
return -1;

@implausible implausible force-pushed the httpclient/skip-entire-body branch 2 times, most recently from 68ed80b to 1e0dfbb Compare April 30, 2021 22:29
@implausible
Copy link
Contributor Author

Sorry about the force pushes, there was a slight debate over whether it should be !error && client->state != DONE, but we're certain it should be error >= 0 && client->state != DONE, because before the loop cared about bytes_read from the socket, and now client_read_and_parse is returning the bytes_parsed. These are fundamentally different (we can parse 0 bytes, but not have read 0).

@implausible
Copy link
Contributor Author

ehhh... We found another unrelated proxy issue with the httpclient refactor. I'm going to update this PR with that fix as well.

@implausible implausible changed the title Ensure we've consumed entire body in git_http_client_skip_body Fix issues with Proxy Authentication after httpclient refactor May 8, 2021
@implausible
Copy link
Contributor Author

I've updated the original description with the second issue. This seems to alleviate the issue we're seeing.

@@ -734,7 +734,7 @@ static int generate_request(
git_buf_printf(buf, "Expect: 100-continue\r\n");

if ((error = apply_server_credentials(buf, client, request)) < 0 ||
(error = apply_proxy_credentials(buf, client, request)) < 0)
(!client->proxy_connected && (error = apply_proxy_credentials(buf, client, request)) < 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So thinking through this. You mention:

we continue to apply proxy-authorization headers to all requests even after we've already built a tunnel

which describes the CONNECT case. From what you're saying, I agree that we're definitely doing the wrong thing here: we would be sending Proxy-Authentication through the tunnel to the git server, which is clearly wrong.

But this auth step is for the non-CONNECT proxy case - when you're sending a GET http://foo.com/ through the proxy itself. If you're authenticating with a mechanism like HTTP Basic to the proxy, then you would need to supply the credentials on every proxy request.

The CONNECT proxy case would have supplied the credentials when it sent the CONNECT request itself, in generate_connect_request. So this is explicitly just for non-CONNECT proxies.

(I realize that it's remarkably uncommon to not speak over a CONNECT proxy, but here we are.)

I think that proxy_connected is not the right test here for that case. proxy_connected is set by proxy_connect which is only called in http_client_connect when:

use_proxy = client->proxy.url.host &&
            !strcmp(client->server.url.scheme, "https");

So I think that proxy_connected will always be false when we're talking to a non-CONNECT proxy.

So my reading of this is that we're no longer passing proxy authentication to non-CONNECT proxies.

(But I'm surprised that we don't test this in our proxy integration tests. 😢 I know that speaking to non-https git servers is uncommon, but it's not a thing that is never done, and I think that we should support it - and support it through a proxy, so that means that we should test it. So I want to think through this a little bit more before I claim that I'm super confident of this analysis.)

I think that we might want to pull out the use_proxy test in http_client_connect into its own function, eg:

static bool use_connect_proxy(git_http_client *client)
{
    return client->proxy.url.host && !strcmp(client->server.url.scheme, "https");
}

And make this line:

if ((error = apply_server_credentials(buf, client, request)) < 0 ||
    (!use_connect_proxy(client) && (error = apply_proxy_credentials(buf, client, request)) < 0))

But - again - I'm not 100% on my analysis. Does this sound right to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that proxy_connected will always be false when we're talking to a non-CONNECT proxy.

So my reading of this is that we're no longer passing proxy authentication to non-CONNECT proxies.

I believe if proxy_connected is always false, then we always pass the Proxy-Authentication headers through, because we perform !client->proxy_connected aka stop sending the Proxy-Authentication headers if we're already connected.

I am not positive with your analysis, but I will go back to our test bed to confirm. I am not positive, because prior to the httpclient refactor, we always attached Proxy-Authentication headers unless we had successfully connected to the proxy. I believe that is what I've updated httpclient to do as well, since proxy_connected is not set until we have established a connection to the proxy.

My intuition with your suggestion is that we would actually regress again, because use_connect_proxy would always be set for an httpclient that has a connect proxy, therefore we would continue sending the Proxy-Authentication headers through after connect has occurred. However, I don't understand the area fully, so maybe the use_connect_proxy will return false on the direct-line to the git server and that's where I am confused with your analysis.

I'll get back to you today.

Copy link
Contributor Author

@implausible implausible May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I'll be damned. Yea, that totally works in testing. In fact, it works better than what I had originally wrote in that there is less traffic to the git server. This change you suggested removes one round trip... Basically:

In my PR as written:

"GET http://github.com/libgit2/TestGitRepository/info/refs?service=git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: */*\r\n\r\n"
"GET http://github.com/libgit2/TestGitRepository/info/refs?service=git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: */*\r\n\r\n"
"CONNECT github.com:443 HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: {omitted}\r\n\r\n"
"CONNECT github.com:443 HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: {omitted}\r\nProxy-Authorization: Negotiate {omitted}\r\n\r\n"
"GET /libgit2/TestGitRepository/info/refs?service=git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: */*\r\n\r\n"
"POST /libgit2/TestGitRepository/git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: application/x-git-upload-pack-result\r\nContent-Type: application/x-git-upload-pack-request\r\nContent-Length: 429\r\n\r\n"

After your suggestion

"GET http://github.com/libgit2/TestGitRepository/info/refs?service=git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: */*\r\n\r\n"
"CONNECT github.com:443 HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: {omitted}\r\n\r\n"
"CONNECT github.com:443 HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: {omitted}\r\nProxy-Authorization: Negotiate {omitted}\r\n\r\n"
"GET /libgit2/TestGitRepository/info/refs?service=git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: */*\r\n\r\n"
"POST /libgit2/TestGitRepository/git-upload-pack HTTP/1.1\r\nUser-Agent: git/2.0 (libgit2 1.1.0)\r\nHost: github.com\r\nAccept: application/x-git-upload-pack-result\r\nContent-Type: application/x-git-upload-pack-request\r\nContent-Length: 429\r\n\r\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to try and get an intuition for that now :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition with your suggestion is that we would actually regress again, because use_connect_proxy would always be set for an httpclient that has a connect proxy

Yes I would expect use_connect_proxy to always be true when set for an httpclient that had a connect proxy. My thinking is that we should always set proxy credentials in this method when not using a connect proxy and never set proxy credentials in this method when we are.

Copy link
Contributor Author

@implausible implausible May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for explaining that. I understand and agree with your take on this. We leverage our proxy credentials in a connect proxy in generate_connect_request, and thus don't need to ever add them in this method when we are a connect proxy. This makes sense. (looks like I'm the one who missed a ! 😄).

I still think I standby this approach I took (though I can see how it is inferior), because proxy_connected will never be set for non CONNECT proxies, therefore we will always send the Proxy-Authentication when in generate_request. I believe that it works for the same reason that the original flow worked (there is external mutable state that signals whether we have leveraged CONNECT or not).

Are you certain that the approach I took will have a regression for non-CONNECT proxies? (asking, because we're shipping soon and debating on whether we need to take your suggestions before or if they can be taken later 😄)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the ping, I edited that last one and want to make sure that you get the notification @ethomson.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that the approach I took will have a regression for non-CONNECT proxies?

🤔 No, I'm not certain. I'm armchair reviewing from my phone. 🙃

I appreciate you explaining this, I think that you're right and that your approach should work. I had trouble reasoning about proxy_connected.

I'd like to spin up some tests to play with this more but I wouldn't have too many hesitations about shipping hours as if if I were you. Worst case scenario: http (not https) auth breaks to git servers through a proxy. I'm actually not even sure that we support auth over http at the layer above this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had trouble reasoning about proxy_connected.

Mutable state was easy to reason about? 😂

I am going to take your solution, because it ultimately relies not on whether a connection was ever established, but on the fact that if one was ever established, that check must also hold. I prefer the non mutable approach because it's much easier to reason about.

Thanks for the prompt reply, that helped us make a decision.

@implausible implausible force-pushed the httpclient/skip-entire-body branch from a2cb816 to 3473a08 Compare May 12, 2021 18:48
@ethomson
Copy link
Member

Thanks for this!

@ethomson ethomson merged commit b5dcdad into libgit2:main May 16, 2021
@implausible implausible deleted the httpclient/skip-entire-body branch May 17, 2021 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants