Skip to content

[DomCrawler] add a value() method, normalize whitespaces #24412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

xabbuh
Copy link
Member

@xabbuh xabbuh commented Oct 3, 2017

Q A
Branch? 4.1
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #21302, #23626
License MIT
Doc PR TODO

{
$this->assertEquals("one two\nthree ", $this->createTestCrawler()->filterXPath('//div[@id="text-whitespaces"]')->value(), '->value() returns the node value of the first element of the node list');

$this->createTestCrawler()->filterXPath('//ol')->value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a separate test like testValueThrowsExceptionWhenNodeListEmpty. if the above call aleady fails with InvalidArgumentException, the test would still pass.

@vlakoff
Copy link
Contributor

vlakoff commented Oct 3, 2017

So, on 3.4 people will have to update their codebase to text(true), then in 4.0 they will have to update it again... We're not very kind to them.

Couldn't we drop the 3.4 step and just make the 4.0 change? I don't really see the benefits of the 3.4 step, apart from adding work.

@xabbuh
Copy link
Member Author

xabbuh commented Oct 3, 2017

There is no need to update the code in 4.0 again. You could omit the argument then, but it's not necessary.

UPGRADE-3.4.md Outdated
----------

* added an argument to the `text()` method to opt-in normalizing whitespaces, not passing `true` is deprecated, for the
previous behavior of the `text()` method, use the new `value()` method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 * Added an argument to the `text()` method to opt-in normalizing whitespaces, not passing `true` is deprecated. For the
   previous behavior of the `text()` method, use the new `value()` method.

? (see dots)

UPGRADE-4.0.md Outdated
DomCrawler
----------

* the `text()` method now normalizes whitespaces of the node text value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. Compared to above/below blocks this should be sentences no?

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Oct 4, 2017

I'm sorry I don't get the migration path at all.
I understand we're fighting adding a flag... By adding a flag...
As far as the migration path is concerned, this will require ppl to change their code twice: once when moving to 4.0, by adding a flag (failed goal already)
And twice in 4.1 to remove the deprecation...

If we must add a flag because there is no other migration path, let's settle on having a flag and keep it. This would mean zero deprecation (but for ppl extending the class, but that's edgy.)

@nicolas-grekas
Copy link
Member

Oh, and we could then, in 4.1, deprecate the flag and replace it by this value() method.
That'd mean one deprecation, thus one migration.
Better, isn't it?

public function text($normalizeWhitespaces = false)
{
if (!$normalizeWhitespaces) {
@trigger_error(sprintf('Not normalizing whitespace characters when using the %s() method is deprecated since version 3.4. The %s() method will return the first node\'s value with whitespaces being normalized in Symfony 4.0. Use the value() method instead if you do not want to have whitespaces normalized.', __METHOD__, __METHOD__), E_USER_DEPRECATED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deprecation is not required and should be removed.
Also, the added arg is a bc break right now, should be done using func_get_arg()

@xabbuh
Copy link
Member Author

xabbuh commented Oct 4, 2017

Fair enough, I reverted the deprecation.

Status: Needs Review

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's still strange, sorry :)
What about deprecating test(), and add a $normalizeWhitespace = true flag to value()
The flag is a requirement to me, I can't find any better path that would be migration-path-friendly.
But by defaulting to "true", value could just be good by default, WDYT?

@xabbuh
Copy link
Member Author

xabbuh commented Oct 6, 2017

I understand what you mean, but if we did that, value() would have to get passed false to behave as I would have expected. I mean the name indicates returning the raw value, doesn't it? So if we want to go the way with the flag, I suggest to just add it to text() and then decide somewhere along the 4.x if we deprecate the flag and introduce something else or just live with the flag.

@stof
Copy link
Member

stof commented Oct 6, 2017

I the context of an HTML crawler, I think that value() would be confusing it. If I have a crawler holding an input element or a select option, I would understand ->value() as returning the value of the input (which is in an attribute), not the inner text (which is what ->nodeValue does).
FYI, the JS property corresponding to this property is textContent.

The normalization you are doing here for text corresponds to part of the behavior non-standard innerText property of JS DOM APIs (we cannot reproduce the full behavior, as this involves removing text from non-rendered children and we are not doing rendering, and reproducing other parts cannot be done with a simple regex: https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute)

@xabbuh xabbuh modified the milestones: 3.4, 4.1 Oct 6, 2017
@xabbuh
Copy link
Member Author

xabbuh commented Oct 6, 2017

I suggest to postpone this PR for Symfony 4.1 to not rush anything into core with which we may not be completely satisfied.

@xabbuh xabbuh changed the base branch from 3.4 to master November 22, 2017 10:10
@nicolas-grekas nicolas-grekas modified the milestones: 4.1, next Apr 20, 2018
@fabpot
Copy link
Member

fabpot commented Oct 10, 2018

@xabbuh Time to resurrect this PR?

@fabpot
Copy link
Member

fabpot commented Feb 21, 2019

@xabbuh Still interested in finishing this PR?

@vlakoff
Copy link
Contributor

vlakoff commented Feb 22, 2019

To avoid all that upgrade path mess with the method parameters, I'd suggest just adding a new method as I proposed initially.

@Simperfit
Copy link
Contributor

@xabbuh do you want me to finish this one ?

@xabbuh
Copy link
Member Author

xabbuh commented Jul 6, 2019

@Simperfit If you have some time, please feel free to take over. 👍

@fabpot
Copy link
Member

fabpot commented Jul 8, 2019

@Simperfit Is this PR still on your radar?

@Simperfit
Copy link
Contributor

Simperfit commented Jul 8, 2019 via email

@xabbuh
Copy link
Member Author

xabbuh commented Jul 8, 2019

closing in favour of #32440

@xabbuh xabbuh closed this Jul 8, 2019
@xabbuh
Copy link
Member Author

xabbuh commented Jul 8, 2019

Thank you for taking over @Simperfit!

@xabbuh xabbuh deleted the issue-21302 branch July 8, 2019 17:02
nicolas-grekas added a commit that referenced this pull request Sep 27, 2019
…t() method (Simperfit)

This PR was merged into the 4.4 branch.

Discussion
----------

[DomCrawler] add a normalizeWhitespace argument to text() method

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes <!-- please update src/**/CHANGELOG.md files -->
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no <!-- please update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #21302, #23626  <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | todo <!-- required for new features -->

<!--
Replace this notice by a short README for your feature/bugfix. This will help people
understand your PR and can be used as a start for the documentation.

Additionally (see https://symfony.com/roadmap):
 - Bug fixes must be submitted against the lowest maintained branch where they apply
   (lowest branches are regularly merged to upper ones so they get the fixes too).
 - Features and deprecations must be submitted against branch 4.4.
 - Legacy code removals go to the master branch.
-->

I've taken #24412, rebased with 4.4, fixed the comments in the PR.

cc @xabbuh @stof @nicolas-grekas

Commits
-------

1746718 [DomCrawler] add a value() method, normalize whitespaces
@nicolas-grekas nicolas-grekas modified the milestones: next, 4.4 Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants