Skip to content

[HttpClient][Contracts] introduce component and related contracts #30413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 7, 2019

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Mar 1, 2019

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #28628
License MIT
Doc PR -

This PR introduces new HttpClient contracts and
component. It makes no compromises between DX, performance, and design.
Its surface should be very simple to use, while still flexible enough
to cover most advanced use cases thanks to streaming+laziness.

Common existing HTTP clients for PHP rely on PSR-7, which is complex
and orthogonal to the way Symfony is designed. More reasons we need
this in core are the package principles: if we want to be able to keep our
BC+deprecation promises, we have to build on more stable and more
abstract dependencies than Symfony itself. And we need an HTTP client
for e.g. Symfony Mailer or #27738.

The existing state-of-the-art puts a quite high bar in terms of features we must
support if we want any adoption. The code in this PR aims at implementing an
even better HTTP client for PHP than existing ones, with more (useful) features
and a better architecture. What a pitch :)

Two full implementations are provided:

  • NativeHttpClient is based on the native "http" stream wrapper.
    It's the most portable one but relies on a blocking fopen().
  • CurlHttpClient relies on the curl extension. It supports full
    concurrency and HTTP/2, including server push.

Here are some examples that work with both clients.

For simple cases, all the methods on responses are synchronous:

$client = new NativeHttpClient();

$response = $client->get('https://google.com');

$statusCode = $response->getStatusCode();
$headers = $response->getHeaders();
$content = $response->getContent();

By default, clients follow redirects. On 3xx, 4xx or 5xx, the getHeaders() and getContent() methods throw an exception, unless their $throw argument is set to false.
This is part of the "failsafe" design of the component. Another example of this
failsafe property is that broken dechunk or gzip streams always trigger an exception,
unlike most other HTTP clients who can silently ignore the situations.

An array of options allows adjusting the behavior when sending requests.
They are documented in HttpClientInterface.

When several responses are 1) first requested in batch, 2) then accessed
via any of their public methods, requests are done concurrently while
waiting for one.

For more advanced use cases, when streaming is needed:

Streaming the request body is possible via the "body" request option.
Streaming the response content is done via client's stream() method:

$client = new CurlHttpClient();

$response = $client->request('GET', 'http://...');

$output = fopen('output.file', 'w');

foreach ($client->stream($response) as $chunk) {
    fwrite($output, $chunk->getContent());
}

The stream() method also works with multiple responses:

$client = new CurlHttpClient();
$pool = [];

for ($i = 0; $i < 379; ++$i) {
    $uri = "https://http2.akamai.com/demo/tile-$i.png";
    $pool[] = $client->get($uri);
}

$chunks = $client->stream($pool);

foreach ($chunks as $response => $chunk) {
    // $chunk is a ChunkInterface object
    if ($chunk->isLast()) {
        $content = $response->getContent();
    }
}

The stream() method accepts a second $timeout argument: responses that
are inactive for longer than the timeout will emit an empty chunk to signal
it. Providing 0 as timeout allows monitoring responses in a non-blocking way.

Implemented:

  • flexible contracts for HTTP clients
  • fopen() + curl-based clients with close feature parity
  • gzip compression enabled when possible
  • streaming multiple responses concurrently
  • base_uri option for scoped clients
  • progress callback with detailed info and able to cancel the request
  • more flexible options for precise behavior control
  • flexible timeout management allowing e.g. server sent events
  • public key pinning
  • auto proxy configuration via env vars
  • transparent IDN support
  • HttpClient::create() factory
  • extensive error handling, e.g. on broken dechunk/gzip streams
  • time stats, primary_ip and other info inspired from curl_getinfo()
  • transparent HTTP/2-push support with authority validation
  • Psr18Client for integration with libs relying on PSR-18
  • free from memory leaks by avoiding circular references
  • fixed handling of redirects when using the fopen-based client
  • DNS cache pre-population with resolve option

Help wanted (can be done after merge):

  • FrameworkBundle integration: autowireable alias + semantic configuration for default options
  • add TraceableHttpClient and integrate with the profiler
  • logger integration
  • add a mock client

More ideas:

  • record/replay like CsaGuzzleBundle
  • use raw sockets instead of the HTTP stream wrapper
  • cookie_jar option
  • HTTP/HSTS cache
  • using the symfony CLI binary to test ssl-related options, HTTP/2-push, etc.
  • add "auto" mode to the "buffer" option, based on the content-type? or array of content-types to buffer
  • etc.

@gnugat
Copy link
Contributor

gnugat commented Mar 1, 2019

It seems like a shame for HttpClientInterface not to extend PSR-18 ClientInterface, and go for a Psr18Client adapter instead, but that seems consistent with the rest of the codebase so that sounds fine.

A pity HttpClientInterface defines send(string $method, string $url, iterable $options): ResponseInterface rather than send(RequestInterface $request): ResponseInterface. That design choice seems strange when compared to the HttpKernelInterface approach.

Finally it seems odd to define a new ResponseInterface rather than use the HttpFoundation one, but that's arguable (HttpFoundation might be more Server oriented, defining a new Response allows for an optimal client usage).

Sounds like this component was tailored for Symfony's internal use only (to be used by the Mail component), and is being made available for anyone who would want to use it, but given the choice between this component and Guzzle I think Guzzle would still be the go to solution.

Copy link
Contributor

@Taluu Taluu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the options, why not use the OptionsResolver component ?

], static function ($v) { return null !== $v; }),
'socket' => [
'bindto' => $options['bindto'],
'tcp_nodelay' => true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this option should be set to false, true is very useful on small request, but on medium to large stream (like http) we want to avoid congestion by default.

Copy link
Member Author

@nicolas-grekas nicolas-grekas Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fopen and curl already buffer as much as they can before writing to the network. This means the TCP Nagle algorithm is not needed and only degrades performance.

@nicolas-grekas nicolas-grekas force-pushed the contracts-http-stream branch from 5c49831 to 5b07802 Compare March 1, 2019 11:14
@nicolas-grekas
Copy link
Member Author

@fancyweb thanks, comments addressed.

why not use the OptionsResolver component

@Taluu because that would add a dependency and we prefer keeping the component standalone.

Sounds like this component was tailored for Symfony's internal use only

@gnugat not the case at all. The component is standalone and relies on decoupled interfaces. Your choice to use Guzzle. I'm going to replace it personnaly - in "Symfony apps" or "non-Symfony" ones.

@nicolas-grekas nicolas-grekas force-pushed the contracts-http-stream branch 4 times, most recently from d9622bf to 49fa0b4 Compare March 1, 2019 11:50
@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Mar 6, 2019

@joelwurtz, unfortunately, there is no way to implement the steps you describe: e.g. curl has no concept like that and it would be extremely hard to recreate it. fopen also is missing a lot of steps but I reimplemented the major missing ones in fact.

Copy link
Member

@weaverryan weaverryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After playing with this for more than a day now, it's a 👍 from me! The experience is wonderful and now that we've solved a few issues (like removing the getStatusCode() mutation), the "lazy" responses also yield an excellent experience when you just want to make one, sync request.

A few notes:

A) Responses are lazy. It will be important to document/teach that. In practice, it simply means that if you want to wrap your code in a try-catch, you should wrap all of your logic that deals with the response, not just the $client->response() call.

B) Validation on the options is awesome and the options are reasonable easy to find because the defaults are stored on a constant that is documented above every request() method

C) If want to do multiple requests in parallel, that works really well. And by leveraging the user_data option, you can basically "label" each request so that you can handle the responses appropriately.

D) Streaming a single large file also works well - I streamed a large file directly to a local file without any isuses or memory bump.

E) About decoration - @joelwurtz did an awesome job challenging this. For basic cases like logging, the on_progress option can be leveraged to log the status code from the $info that's passed to that callback. More complex decorators are possible - they may not be super easy, due to the async of the responses, but they're possible. And if you're building a decorator for your app and you don't care about async, decorators are super easy.

So, I think this is ready!

@arnaud-lb
Copy link
Contributor

Since I came here to say what I disliked (and what I liked as well), I also need to say it when those things have changed or if I changed my mind.

The component actually does a great job at hiding an asynchronous internal and exposing
a synchronous-friendly API.

With the removal of ChunkInterface::__toString() and the getStatusCode() mutation, all the easy traps I could think of have disappeared.

The fact that other requests are running when waiting for a single request (even when not explicitly multiplexing with HttpClientInterface::stream()) is something that I initially overlooked (read too fast, or this was not in the initial PR description, I don't know). With this, having lazy responses completely make sense. I could think of way to avoid them, but not without making the API more complex, or requiring more steps.

@nicolas-grekas nicolas-grekas force-pushed the contracts-http-stream branch from 4c6539e to f64f2ab Compare March 7, 2019 16:15
@nicolas-grekas nicolas-grekas force-pushed the contracts-http-stream branch from f64f2ab to fc83120 Compare March 7, 2019 16:16
@fabpot
Copy link
Member

fabpot commented Mar 7, 2019

Thank you @nicolas-grekas.

@fabpot fabpot merged commit fc83120 into symfony:master Mar 7, 2019
fabpot added a commit that referenced this pull request Mar 7, 2019
…d contracts (nicolas-grekas)

This PR was squashed before being merged into the 4.3-dev branch (closes #30413).

Discussion
----------

[HttpClient][Contracts] introduce component and related contracts

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #28628
| License       | MIT
| Doc PR        | -

This PR introduces new `HttpClient` contracts and
component. It makes no compromises between DX, performance, and design.
Its surface should be very simple to use, while still flexible enough
to cover most advanced use cases thanks to streaming+laziness.

Common existing HTTP clients for PHP rely on PSR-7, which is complex
and orthogonal to the way Symfony is designed. More reasons we need
this in core are the [package principles](https://en.wikipedia.org/wiki/Package_principles): if we want to be able to keep our
BC+deprecation promises, we have to build on more stable and more
abstract dependencies than Symfony itself. And we need an HTTP client
for e.g. Symfony Mailer or #27738.

The existing state-of-the-art puts a quite high bar in terms of features we must
support if we want any adoption. The code in this PR aims at implementing an
even better HTTP client for PHP than existing ones, with more (useful) features
and a better architecture. What a pitch :)

Two full implementations are provided:
 - `NativeHttpClient` is based on the native "http" stream wrapper.
   It's the most portable one but relies on a blocking `fopen()`.
 - `CurlHttpClient` relies on the curl extension. It supports full
   concurrency and HTTP/2, including server push.

Here are some examples that work with both clients.

For simple cases, all the methods on responses are synchronous:

```php
$client = new NativeHttpClient();

$response = $client->get('https://google.com');

$statusCode = $response->getStatusCode();
$headers = $response->getHeaders();
$content = $response->getContent();
```

By default, clients follow redirects. On `3xx`, `4xx` or `5xx`, the `getHeaders()` and `getContent()` methods throw an exception, unless their `$throw` argument is set to `false`.
This is part of the "failsafe" design of the component. Another example of this
failsafe property is that broken dechunk or gzip streams always trigger an exception,
unlike most other HTTP clients who can silently ignore the situations.

An array of options allows adjusting the behavior when sending requests.
They are documented in `HttpClientInterface`.

When several responses are 1) first requested in batch, 2) then accessed
via any of their public methods, requests are done concurrently while
waiting for one.

For more advanced use cases, when streaming is needed:

Streaming the request body is possible via the "body" request option.
Streaming the response content is done via client's `stream()` method:

```php
$client = new CurlHttpClient();

$response = $client->request('GET', 'http://...');

$output = fopen('output.file', 'w');

foreach ($client->stream($response) as $chunk) {
    fwrite($output, $chunk->getContent());
}
```

The `stream()` method also works with multiple responses:

```php
$client = new CurlHttpClient();
$pool = [];

for ($i = 0; $i < 379; ++$i) {
    $uri = "https://http2.akamai.com/demo/tile-$i.png";
    $pool[] = $client->get($uri);
}

$chunks = $client->stream($pool);

foreach ($chunks as $response => $chunk) {
    // $chunk is a ChunkInterface object
    if ($chunk->isLast()) {
        $content = $response->getContent();
    }
}
```

The `stream()` method accepts a second `$timeout` argument: responses that
are *inactive* for longer than the timeout will emit an empty chunk to signal
it. Providing `0` as timeout allows monitoring responses in a non-blocking way.

Implemented:
 - flexible contracts for HTTP clients
 - `fopen()` + `curl`-based clients with close feature parity
 - gzip compression enabled when possible
 - streaming multiple responses concurrently
 - `base_uri` option for scoped clients
 - progress callback with detailed info and able to cancel the request
 - more flexible options for precise behavior control
 - flexible timeout management allowing e.g. server sent events
 - public key pinning
 - auto proxy configuration via env vars
 - transparent IDN support
 - `HttpClient::create()` factory
 - extensive error handling, e.g. on broken dechunk/gzip streams
 - time stats, primary_ip and other info inspired from `curl_getinfo()`
 - transparent HTTP/2-push support with authority validation
 - `Psr18Client` for integration with libs relying on PSR-18
 - free from memory leaks by avoiding circular references
 - fixed handling of redirects when using the `fopen`-based client
 - DNS cache pre-population with `resolve` option

Help wanted (can be done after merge):
 - `FrameworkBundle` integration: autowireable alias + semantic configuration for default options
 - add `TraceableHttpClient` and integrate with the profiler
 - logger integration
 - add a mock client

More ideas:
 - record/replay like CsaGuzzleBundle
 - use raw sockets instead of the HTTP stream wrapper
 - `cookie_jar` option
 - HTTP/HSTS cache
 - using the symfony CLI binary to test ssl-related options, HTTP/2-push, etc.
 - add "auto" mode to the "buffer" option, based on the content-type? or array of content-types to buffer
 - *etc.*

Commits
-------

fc83120 [HttpClient] Add Psr18Client - aka a PSR-18 adapter
8610668 [HttpClient] introduce the component
d2d63a2 [Contracts] introduce HttpClient contracts
@nicolas-grekas nicolas-grekas deleted the contracts-http-stream branch March 7, 2019 19:28
@nicolas-grekas
Copy link
Member Author

Thank you @fabpot and thank you to everyone involved. The discussion has been sometimes passionate, but I'm overall highly impressed by the technical level and quality of the reviews. This component is already the work of the community and not mine anymore. Thank you all!

@nicolas-grekas nicolas-grekas mentioned this pull request Mar 9, 2019
18 tasks
javiereguiluz added a commit to symfony/symfony-docs that referenced this pull request Apr 29, 2019
This PR was squashed before being merged into the master branch (closes #11071).

Discussion
----------

Documented the new HttpClient component

Docs for symfony/symfony#30413.

Commits
-------

0390bad Documented the new HttpClient component
@nicolas-grekas nicolas-grekas modified the milestones: next, 4.3 Apr 30, 2019
@fabpot fabpot mentioned this pull request May 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.