Skip to content

[DomCrawler] Add method flatText() #23626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

vlakoff
Copy link
Contributor

@vlakoff vlakoff commented Jul 22, 2017

Q A
Branch? 3.4 (to be rebased from master)
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #21302
License MIT
Doc PR symfony/symfony-docs#...

Similar to text(), with some normalization applied to the value: surrounding spaces/newlines are removed, and consecutive spaces/newlines are reduced to one. Helpful if you don't want your code to choke because of the HTML source formatting.

This is a follow-up to #11470 and #21302. ping @javiereguiluz, @jpetitcolas.


Quoting #21302 (comment):

Personally I'm extending the Crawler class, I couldn't anymore clutter my code all over with this:

$ref = preg_replace('/\s+/', ' ', trim($node
    ->filter('div.numero_ref')
    ->text()));

Now I just have to do:

$ref = $node
    ->filter('div.numero_ref')
    ->cleanText();

@vlakoff
Copy link
Contributor Author

vlakoff commented Jul 22, 2017

What version number should be used for adding an entry to the changelog?

@nicolas-grekas
Copy link
Member

3.4 (and the PR be rebased + retargetted to branch 3.4 if you can.)

@nicolas-grekas nicolas-grekas added this to the 3.4 milestone Jul 25, 2017
@vlakoff vlakoff changed the base branch from master to 3.4 July 25, 2017 13:54
@vlakoff
Copy link
Contributor Author

vlakoff commented Jul 25, 2017

Added changelog entry, and rebased to 3.4 (I had followed the instructions, they tell to submit to master…)

@stof
Copy link
Member

stof commented Jul 25, 2017

@vlakoff these instructions are true for 18 months when we don't develop x.4 and x+1.0 in parallel. during the 6 months where we prepare the next minor and the next major, we have 2 active development branch (and anything which is backward compatible goes into the next minor)

@vlakoff
Copy link
Contributor Author

vlakoff commented Jul 25, 2017

Another thing, should there be a PR for adding this method to the documentation here, or is better to not add clutter and have it as a "hidden feature"?

@javiereguiluz
Copy link
Member

@vlakoff there should be a PR for docs ... but only if this proposal is finally approved. So let's wait a bit.

public function flatText()
{
if (!$this->nodes) {
throw new \InvalidArgumentException('The current node list is empty.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's OK to throw an exception in this case. Would it be better to simply return an empty string?

Copy link
Contributor

@ro0NL ro0NL Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the exception =/ same as text() does.

Note this method can be a simple one-liner return trim(preg_replace('/\s+/', ' ', $this->text()));

Copy link
Contributor Author

@vlakoff vlakoff Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing an exception is consistent with the rest of this class, which indeed I think is quite strict in practice (in my code I have to add a lot of tests for possibly absent nodes, instead of just getting null).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't it be an argument to the existing text() method instead?

*
* @throws \InvalidArgumentException When current node is empty
*/
public function flatText()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the proposed function name (flatText()) because this is usually a verb ("flatten") and here we're proposing to use an adjective ("flat"). I've been looking for in the standard libraries of some programming languages (Go, JavaScript, Java, C#, ...) but nobody defines a function like that (they only implement something like our trim(), ltrim(), rtrim(), etc.) Even Laravel doesn't define anything like this in Illuminate\Str.

Maybe trimAll() would be a better name? Or flatten()? Or trimText(), flattenText()? I'm not sure.

Copy link
Contributor Author

@vlakoff vlakoff Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more a getter than an action, so I think a noun rather than a verb is fine (and actually better).

By the way, I had thought about flattenedText, but it's too long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're flattening text().. it's an action :) i'd also favor flattenText() here.

@ro0NL
Copy link
Contributor

ro0NL commented Jul 26, 2017

Note JS does this by default (element.innerText || element.textContent) :) maybe we can patch text() instead, and optionally add rawText() as well.

@stof
Copy link
Member

stof commented Jul 26, 2017

We cannot change the behavior of text(). This would be a BC break.

@ro0NL
Copy link
Contributor

ro0NL commented Jul 26, 2017

True. Just saying it might be better to update it's behavior instead, if this is about DX.

We could bridge the BC with a temp. arg text($flatten = false) i guess,, not sure if worth it ;)

@vlakoff
Copy link
Contributor Author

vlakoff commented Aug 2, 2017

Thoughts about new method vs. adding a parameter ? (notice the latter isn't strictly BC, if user were extending the method)

@nicolas-grekas
Copy link
Member

Turning my -1 into a -0, ping @symfony/deciders it's up to us now.

@javiereguiluz
Copy link
Member

I'm 👍 about the feature (truly useful and needed very often) but I'm still 👎 about the method name (flatText()). It doesn't feel OK to me.

@nicolas-grekas
Copy link
Member

@javiereguiluz any suggestion to help move forward?

@javiereguiluz
Copy link
Member

I've asked in Symfony Slack, and the community has suggested these alternative names:

  • stripSpace()
  • trimSpaces()
  • strip()
  • spaceless()

@javiereguiluz
Copy link
Member

Maybe we can follow @ro0NL proposal and wait until 4.0 to make a BC break? We could have:

  • text() would work like this proposal and return you just the text contents
  • rawText() (originalText() ?) which is like the current text() and returns things unchanged

@xabbuh
Copy link
Member

xabbuh commented Sep 28, 2017

@javiereguiluz But even with a BC break in 4.0 we would need an upgrade path. We could do so with a new $stripSpaces argument defaulting to false that would keep the current behaviour while passing true would strip spaces. We could then change the default value to true in 4.0 and entirely deprecate the argument in 4.1. The current behaviour of text() could then be mimicked by a new method like rawText() or so.

@vlakoff
Copy link
Contributor Author

vlakoff commented Sep 28, 2017

It seems like things are getting complicated only because we can't figure out a proper name for the function...

I'm unhappy with some of the proposals above, because the function does two things, it doesn't only trim. About reversing the function behaviors, this still can be done later.

I just thought of a new one, how about tidyText()?

@Pierstoval
Copy link
Contributor

Maybe clean(ed?)Text() ?

@jakzal
Copy link
Contributor

jakzal commented Oct 3, 2017

We could do so with a new $stripSpaces argument defaulting to false that would keep the current behaviour while passing true would strip spaces. We could then change the default value to true in 4.0 and entirely deprecate the argument in 4.1. The current behaviour of text() could then be mimicked by a new method like rawText() or so.

As much as I'm usually against such flags, in this case I agree it's our best choice.

@fabpot
Copy link
Member

fabpot commented Oct 3, 2017

The flag sounds like the best idea, indeed.

@vlakoff
Copy link
Contributor Author

vlakoff commented Oct 3, 2017

Added a commit to change to the "flag" way.

Comments welcome about any rewording, nitpicking, etc.
Then I could squash it, and rebase it if needed.

@Tobion
Copy link
Contributor

Tobion commented Oct 3, 2017

Don't use a flag for this unless for deprecation.

  • text($stripSpaces = false) with deprecating $stripSpaces = false
  • value() for old behavior because it's also nodeValue

@vlakoff
Copy link
Contributor Author

vlakoff commented Oct 3, 2017

I understood your point about using the flag only for deprecation.

But TBH, handling all this transitional code is getting incredibly boring to me. So I'm closing this, if someone wants to take it over, he would be very welcome.

@xabbuh
Copy link
Member

xabbuh commented Oct 3, 2017

see #24412 as a replacement

nicolas-grekas added a commit that referenced this pull request Sep 27, 2019
…t() method (Simperfit)

This PR was merged into the 4.4 branch.

Discussion
----------

[DomCrawler] add a normalizeWhitespace argument to text() method

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes <!-- please update src/**/CHANGELOG.md files -->
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no <!-- please update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #21302, #23626  <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | todo <!-- required for new features -->

<!--
Replace this notice by a short README for your feature/bugfix. This will help people
understand your PR and can be used as a start for the documentation.

Additionally (see https://symfony.com/roadmap):
 - Bug fixes must be submitted against the lowest maintained branch where they apply
   (lowest branches are regularly merged to upper ones so they get the fixes too).
 - Features and deprecations must be submitted against branch 4.4.
 - Legacy code removals go to the master branch.
-->

I've taken #24412, rebased with 4.4, fixed the comments in the PR.

cc @xabbuh @stof @nicolas-grekas

Commits
-------

1746718 [DomCrawler] add a value() method, normalize whitespaces
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.