Skip to content

[DomCrawler] normalizeWhitespace should be true by default #34151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 29, 2019

Conversation

dunglas
Copy link
Member

@dunglas dunglas commented Oct 28, 2019

Q A
Branch? 4.4
Bug fix? yes
New feature? no
Deprecations? no
Tickets n/a
License MIT
Doc PR n/a

According to the DOM and WebDriver specs, browsers always return the normalized text. In Panther, because of WebDriver, it's not even possible without dirty hacks to retrieve the "non normalized" text.

For compatibility with Panther it's mandatory to set this new parameter (introduced in 4.4) to true by default.

I propose to change the default value to true in 5.0, it has the benefit of:

  • being spec-compliant (in 5.0, text will be normalized by default)
  • being cleaner when using Panther ($node->text() instead of $node->text(null, true), passing true is mandatory because Panther doesn't support retrieving the non-normalized text)

For backward compatible with 4.x versions, if no argument is passed and the returned text isn't the same than the normalized one, a notice is triggered.

Copy link
Member

@javiereguiluz javiereguiluz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this proposal. Checking the exact original content (including all white spaces) is an ultra rare edge-case in my own experience.

@@ -614,6 +614,14 @@ public function text(/* string $default = null, bool $normalizeWhitespace = fals

$text = $this->getNode(0)->nodeValue;

if (\func_num_args() <= 1) {
if (trim(preg_replace('/(?:\s{2,}+|[^\S ])/', ' ', $text)) !== $text) {
@trigger_error(sprintf('"%s()" will normalize whitespaces by default in Symfony 5.0, set the 2nd "$normalizeWhitespace" argument to false to retrieve the non-normalized version of the text.', __METHOD__), E_USER_DEPRECATED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd -> second ?

Also, do we mention the argument name explicitly in other error messages? If we don't, we could remove the mention to $normalizeWhitespace here. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a quick search, I found many examples following the same pattern (2nd and the name of the argument). Example: https://github.com/symfony/symfony/blob/4.4/src/Symfony/Component/PropertyAccess/PropertyAccessor.php#L855

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, let me know what's the preferred pattern and I'll update this one :)

@dunglas dunglas force-pushed the normalize-by-default branch from f6d8560 to 54d46ee Compare October 29, 2019 11:38
@dunglas dunglas closed this in 3b11a76 Oct 29, 2019
@dunglas dunglas merged commit 54d46ee into symfony:4.4 Oct 29, 2019
@dunglas dunglas deleted the normalize-by-default branch October 29, 2019 11:39
This was referenced Nov 12, 2019
@dmaicher
Copy link
Contributor

@dunglas should this be mentioned in https://github.com/symfony/symfony/blob/master/UPGRADE-4.4.md?

I'm just upgrading one of my apps from 4.3 to 4.4 and I only noticed this one via the phpunit bridge as its not mentioned in the upgrade docs

@ssanders
Copy link

ssanders commented Oct 4, 2022

There should be a way to set this (back to) false globally.

@stof
Copy link
Member

stof commented Oct 4, 2022

A global setting would be the worse API either, because no reusable code could omit the argument as they would not be able to rely on the default value (there is a reason why we deprecated relying on the old default instead of just changing it directly: it changes the behavior)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants