Skip to content

[BrowserKit] Add support for HttpClient #30602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 23, 2019

Conversation

fabpot
Copy link
Member

@fabpot fabpot commented Mar 19, 2019

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets part of #30502
License MIT
Doc PR not yet

When combining the power of the new HttpClient component with the BrowserKit and Mime components, we can makes something really powerful... a full/better/awesome replacement for https://github.com/FriendsOfPHP/Goutte.

So, this PR is about integrating the HttpClient component with BrowserKit to give users a high-level interface to ease usages in the most common use cases.

Scraping websites can be done like this:

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
$browser = new HttpBrowser($client);

$browser->request('GET', 'https://example.com/');
$browser->clickLink('Log In');
$browser->submitForm('Sign In', ['username' => 'me', 'password' => 'pass']);
$browser->clickLink('Subscriptions')->filter('table tr:nth-child(2) td:nth-child(2)')->each(function ($node) {
    echo trim($node->text())."\n";
});

And voilà! Nice, isn't?

Want to add HTTP cache? Sure:

use Symfony\Component\HttpKernel\HttpCache\Store;

$client = HttpClient::create();
$store = new Store(sys_get_temp_dir().'/http-cache-store');

$browser = new HttpBrowser($client, $store);

// ...

Want logging and debugging of HTTP Cache? Yep:

use Psr\Log\AbstractLogger;

class EchoLogger extends AbstractLogger
{
    public function log($level, $message, array $context = [])
    {
        echo $message."\n";
    }
}

$browser = new HttpBrowser($client, $store, new EchoLogger());

The first time you run your code, you will get an output similar to:

Request: GET https://twig.symfony.com/
Response: 200 https://twig.symfony.com/
Cache: GET /: miss, store
Request: GET https://twig.symfony.com/doc/2.x/
Response: 200 https://twig.symfony.com/doc/2.x/
Cache: GET /doc/2.x/: miss, store

But then:

Cache: GET /: fresh
Cache: GET /doc/2.x/: fresh

Limit is the sky here as you get the full power of all the Symfony ecosystem.

Under the hood, these examples leverage HttpFoundation, HttpKernel (with HttpCache),
DomCrawler, BrowserKit, CssSelector, HttpClient, Mime, ...

Excited?

P.S. : Tests need to wait for the HttpClient Mock class to land into master.

@fabpot fabpot force-pushed the http-with-browserkit branch 2 times, most recently from 8bde4ee to a15b707 Compare March 19, 2019 12:58
Copy link
Contributor

@ro0NL ro0NL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit is the sky

You mean "Sky's the limit" i suppose :D

@nicolas-grekas nicolas-grekas changed the title [HttpClient] Add support for BrowserKit [BrowserKit] Add support for HttpClient Mar 21, 2019
fabpot added a commit that referenced this pull request Mar 21, 2019
…HttpClientInterface (fabpot)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[HttpKernel] add RealHttpKernel: handle requests with HttpClientInterface

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

This commit is directly extracted from #30602 by @fabpot

Commits
-------

b579b02 [HttpKernel] add RealHttpKernel: handle requests with HttpClientInterface
@fabpot fabpot force-pushed the http-with-browserkit branch from 8443c88 to feebfee Compare March 21, 2019 14:59
@nicolas-grekas nicolas-grekas force-pushed the http-with-browserkit branch 3 times, most recently from 380a517 to a25fe3f Compare March 22, 2019 10:48
Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Green in a minute, with tests provided by @ktherage. The caching part is now separated into #30629, and logging has been removed. It should be reintroduced in HttpClient directly.

class HttpBrowser extends AbstractBrowser
{
private $client;
private $httpKernelBrowser;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed thanks

@fabpot
Copy link
Member Author

fabpot commented Mar 23, 2019

Thank you THERAGE Kévin.

@fabpot fabpot merged commit b5b2a25 into symfony:master Mar 23, 2019
fabpot added a commit that referenced this pull request Mar 23, 2019
…GE Kévin)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[BrowserKit] Add support for HttpClient

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no <!-- don't forget to update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | part of #30502
| License       | MIT
| Doc PR        | not yet

When combining the power of the new HttpClient component with the BrowserKit and Mime components, we can makes something really powerful... a full/better/awesome replacement for https://github.com/FriendsOfPHP/Goutte.

So, this PR is about integrating the HttpClient component with BrowserKit to give users a high-level interface to ease usages in the most common use cases.

Scraping websites can be done like this:

```php
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

$client = HttpClient::create();
$browser = new HttpBrowser($client);

$browser->request('GET', 'https://example.com/');
$browser->clickLink('Log In');
$browser->submitForm('Sign In', ['username' => 'me', 'password' => 'pass']);
$browser->clickLink('Subscriptions')->filter('table tr:nth-child(2) td:nth-child(2)')->each(function ($node) {
    echo trim($node->text())."\n";
});
```

And voilà! Nice, isn't?

Want to add HTTP cache? Sure:

```php
use Symfony\Component\HttpKernel\HttpCache\Store;

$client = HttpClient::create();
$store = new Store(sys_get_temp_dir().'/http-cache-store');

$browser = new HttpBrowser($client, $store);

// ...
```

Want logging and debugging of HTTP Cache? Yep:

```php
use Psr\Log\AbstractLogger;

class EchoLogger extends AbstractLogger
{
    public function log($level, $message, array $context = [])
    {
        echo $message."\n";
    }
}

$browser = new HttpBrowser($client, $store, new EchoLogger());
```

The first time you run your code, you will get an output similar to:

```
Request: GET https://twig.symfony.com/
Response: 200 https://twig.symfony.com/
Cache: GET /: miss, store
Request: GET https://twig.symfony.com/doc/2.x/
Response: 200 https://twig.symfony.com/doc/2.x/
Cache: GET /doc/2.x/: miss, store
```

But then:

```
Cache: GET /: fresh
Cache: GET /doc/2.x/: fresh
```

Limit is the sky here as you get the full power of all the Symfony ecosystem.

Under the hood, these examples leverage HttpFoundation, HttpKernel (with HttpCache),
DomCrawler, BrowserKit, CssSelector, HttpClient, Mime, ...

Excited?

P.S. : Tests need to wait for the HttpClient Mock class to land into master.

Commits
-------

b5b2a25 Add tests for HttpBrowser
dd55845 [BrowserKit] added support for HttpClient
fabpot added a commit that referenced this pull request Mar 23, 2019
This PR was merged into the 4.3-dev branch.

Discussion
----------

[HttpClient] added CachingHttpClient

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

The proposed `CachingHttpClient` uses `HttpCache` from the HttpKernel component to provide an HTTP-compliant cache.

If this is accepted, it could replace the corresponding part in #30602

Commits
-------

dae5686 [HttpClient] added CachingHttpClient
@nicolas-grekas nicolas-grekas modified the milestones: next, 4.3 Apr 30, 2019
@fabpot fabpot mentioned this pull request May 9, 2019
@fabpot fabpot deleted the http-with-browserkit branch September 12, 2019 12:56
javiereguiluz added a commit to symfony/symfony-docs that referenced this pull request Jan 2, 2020
This PR was merged into the 4.3 branch.

Discussion
----------

Replace references to Goutte with HttpBrowser

With the introduction of [HttpBrowser](symfony/symfony#30602) there's no need for Goutte any more, but the docs currently mention both, which is a little bit confusing. Also, the current version of Goutte actuality [extends HttpBrowser](https://github.com/FriendsOfPHP/Goutte/blob/v4.0.0/Goutte/Client.php), but doesn't add any functionality to it. It seems to me that Goutte is being completely replaced with HttpBrowser so there's no point in mentioning it in the docs any more.

Commits
-------

50898ae Replace references to Goutte with HttpBrowser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants