Skip to content

[DomCrawler] Allow pipe (|) character in link tags when using Xpath expressions #20235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 18, 2016

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Oct 18, 2016

Q A
Branch? master
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #20229
License MIT
Doc PR -

@klausi could you please validate this patch? Is it an improvement over yours? (sorry I don't have the proper use case to test.)

@nicolas-grekas nicolas-grekas changed the title Domcrawler pipe [DomCrawler] Allow pipe (|) character in link tags when using Xpath expressions Oct 18, 2016
@nicolas-grekas nicolas-grekas changed the base branch from master to 2.7 October 18, 2016 08:29
@nicolas-grekas nicolas-grekas force-pushed the domcrawler-pipe branch 2 times, most recently from 151af3f to 6af53a6 Compare October 18, 2016 08:38
continue 2;
case '[':
case ']':
$openedBrackets += '[' === $xpath[$i] ? 1 : -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it more readable to use 2 different cases rather than redoing a check inside it:

    case '[':
        ++$openedBrackets;
        continue 2;

    case '[':
        --$openedBrackets;
        continue 2;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

$parenthesis = '';
$xpathLen = strlen($xpath);
$openedBrackets = 0;
$lastUnion = strspn($xpath, " \t\n\r\0\x0B");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain what you are looking for here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespaces, chars list comes from trim()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$startPosition may be a better name than $lastUnion for this variable, as it actually stores the position of the beginning of the current expression (which is never the position of the last pipe, but a position after it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@stof
Copy link
Member

stof commented Oct 18, 2016

@nicolas-grekas this is a bugfix IMO, because the current code breaks on valid XPath.

Btw, this is a bug affecting Drupal

switch ($xpath[$i]) {
case '"':
case "'":
if (false === $i = strpos($xpath, $xpath[$i], $i + 1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do assignments in the if condition, I think you can move the assignment just one line before.

Copy link
Member Author

@nicolas-grekas nicolas-grekas Oct 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the contrary, I like dealing with the return value of strpos on the same line than it is called, to make things more readable as far as var types is concerned... so unless there is a strong push to change it, I'd personally prefer to keep it this way...

@stof
Copy link
Member

stof commented Oct 18, 2016

@nicolas-grekas I updated the gist at https://gist.github.com/stof/5131c79c576600800aac7313b182d44d to include your implementation, as well as an implementation short-circuiting XPath splitting when there is no pipe in the string (the benchmark method fits well the Mink use-case of returning split expressions. The short-circuiting needs to be adapted for BrowserKit here as the loop is also doing the processing directly instead of doing a separate loop over the split XPath for processing).
the short-circuiting makes it win over the regex implementation when there is no union (which can have a huge impact if the XPath is long, as shown by the third and fourth cases)

@klausi
Copy link
Contributor

klausi commented Oct 18, 2016

I can confirm that this PR fixes the issue for Drupal.

Luckily the patch almost applies to Symfony 2.7. It was still a bit hard to actually apply it, for testing locally I did it with this hack (documenting for myself for next time):

$ cd vendor/symfony/dom-crawler/
$ wget https://patch-diff.githubusercontent.com/raw/symfony/symfony/pull/20235.diff
$ patch -p3 < 20235.diff 
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/src/Symfony/Component/DomCrawler/Crawler.php b/src/Symfony/Component/DomCrawler/Crawler.php
|index 37822e5..df16e21 100644
|--- a/src/Symfony/Component/DomCrawler/Crawler.php
|+++ b/src/Symfony/Component/DomCrawler/Crawler.php
--------------------------
File to patch: Crawler.php
patching file Crawler.php
Hunk #1 succeeded at 853 (offset -3 lines).
Hunk #2 succeeded at 903 (offset -3 lines).
Hunk #3 succeeded at 921 (offset -3 lines).
Hunk #4 succeeded at 929 (offset -3 lines).
can't find file to patch at input line 103
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php b/src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php
|index 45bbb2f..65e2a90 100755
|--- a/src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php
|+++ b/src/Symfony/Component/DomCrawler/Tests/CrawlerTest.php
--------------------------
File to patch: d
d: No such file or directory
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored

So the test file was not patched because Drupal removes it, but that is fine for testing with Drupal.

@stof
Copy link
Member

stof commented Oct 18, 2016

@klausi when you install individual components, you indeed need to remap paths to apply the patch, as the patch is for the main repo.

@nicolas-grekas performance for your version is indeed much better than #20229

@nicolas-grekas
Copy link
Member Author

@stof thanks for the bench! Then this is ready to merge/vote, isn't it?

@stof
Copy link
Member

stof commented Oct 18, 2016

👍

@stof
Copy link
Member

stof commented Oct 18, 2016

@nicolas-grekas FYI, I plan to reuse your faster implementation in Mink as well, (as it needs splitting union parts too, and this is where @klausi submitted the implementation proposed here in #20229). You can send the PR yourselves if you prefer. the code is at https://github.com/minkphp/Mink/blob/master/src/Selector/Xpath/Manipulator.php

@nicolas-grekas
Copy link
Member Author

Please do

@nicolas-grekas
Copy link
Member Author

Thank you @klausi.

@nicolas-grekas nicolas-grekas merged commit 17757d8 into symfony:2.7 Oct 18, 2016
nicolas-grekas added a commit that referenced this pull request Oct 18, 2016
…ing Xpath expressions (klausi, nicolas-grekas)

This PR was merged into the 2.7 branch.

Discussion
----------

[DomCrawler] Allow pipe (|) character in link tags when using Xpath expressions

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #20229
| License       | MIT
| Doc PR        | -

@klausi could you please validate this patch? Is it an improvement over yours? (sorry I don't have the proper use case to test.)

Commits
-------

17757d8 [DomCrawler] Optimize DomCrawler::relativize()
5b26e33 [DomCrawler] Allow pipe (|) character in link tags when using Xpath expressions
@nicolas-grekas nicolas-grekas deleted the domcrawler-pipe branch October 18, 2016 15:34
This was referenced Oct 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants