DomCrawler not getting text #8105
Comments
Thanks for your answer, lazyhammer, that sure clears up a lot. The source being crawled is not always under the control of the programmer using the crawler, so I'm relieved that the crawler doesn't break on markup that violates the HTML spec but instead tries to fix it. To get the "fixed" HTML:
from: http://stackoverflow.com/a/9567835
I won't close this issue, as I find it important enough that this specific feature gets documented better, even though it comes from libxml rather than from DomCrawler itself. The Symfony docs sometimes reference "3rd party" functionality with a short explanation, which would certainly be useful here.
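For readers following along, one way to inspect the markup after libxml has repaired it is to load it into a DOMDocument and serialize it again. This is a minimal sketch and not the snippet from the linked answer: the input string is a hypothetical example of invalid markup, the variable names are illustrative, and the exact repaired output depends on the libxml2 version in use.

```php
<?php
// Hypothetical input: a <div> nested inside a <p>, which HTML does not allow.
$html = '<p><div>Hello World!</div></p>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Serialize the document as libxml understood it, repairs included.
$fixed = $doc->saveHTML();

libxml_clear_errors();

echo $fixed;
```

Comparing the printed markup with the original input shows where elements were moved or closed early, which in turn explains why a selector may match a node with different text than expected.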
@flip111 note that the behavior of fixing the HTML is even part of the HTML5 spec
@flip111 would you mind sending a PR to the https://github.com/symfony/symfony-docs to document this behaviour?
@jakzal I tried (as you can see), but I don't really get why it says: "flip111 wants to merge 1,614 commits into symfony:2.0 from flip111:patch-1". 1,614 commits! I just made a little change ...
@flip111 Looks like you created a branch from master but you're trying to send a PR against 2.0.
Ok, I tried again with 2.1.
@flip111 Your PR against 2.1 is still messed up, as you haven't changed your branch, which is still based on master.
Ok, I'll try again ... hopefully this time everything is okay.
Closing, as there is an issue for the docs now.
As far as I can understand from the documentation, ->text() should return everything between the tags, with inner tags stripped. So:
input = <p><b>hello</b> world</p>
A)
$crawler->filter('p')->text() = "hello world"
B)
$crawler->filter('p')->html() = "<b>hello</b> world"
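The difference between the two calls can be reproduced with PHP's built-in DOM extension, which DomCrawler builds on. The following is a minimal sketch under that assumption, not DomCrawler's own code; the variable names are illustrative.

```php
<?php
// Sketch using only PHP's bundled DOM extension (libxml), not DomCrawler itself.
$doc = new DOMDocument();
$doc->loadHTML('<p><b>hello</b> world</p>');

$p = $doc->getElementsByTagName('p')->item(0);

// Roughly what text() reports: the node's text with inner tags stripped.
$text = $p->textContent;

// Roughly what html() reports: the serialized child nodes of the element.
$innerHtml = '';
foreach ($p->childNodes as $child) {
    $innerHtml .= $doc->saveHTML($child);
}

echo $text, "\n";
echo $innerHtml, "\n";
```

For well-formed input like this, both values match expectations A) and B); the two only become surprising when the parser has to repair the markup first.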
Assuming this is in fact the way it's supposed to work, there is a bug here:
Returns: "\r\n "
Expected: Hello World! (with some spaces on the left and right)
If this is not a bug, then please regard it as: