Skip to content

DOMCrawler: removing extra break lines and spaces? #21302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jpetitcolas opened this issue Jan 16, 2017 · 4 comments
Closed

DOMCrawler: removing extra break lines and spaces? #21302

jpetitcolas opened this issue Jan 16, 2017 · 4 comments
Labels
DomCrawler DX DX = Developer eXperience (anything that improves the experience of using Symfony) Feature

Comments

@jpetitcolas
Copy link

jpetitcolas commented Jan 16, 2017

Q A
Bug report? no
Feature request? yes
BC Break report? no
RFC? yes
Symfony version 3.1.7

We are currently implementing a functional test, testing content of the following code:

<p class="availability">
    <span class="available-from">
        {{ 'video.available.from'|trans }}
        {{ video.rights.begin|localizeddate('long', 'none') }}
    </span>

    <span class="available-to">
        {{ 'video.available.to'|trans }}
        {{ video.rights.end|localizeddate('long', 'none') }}
    </span>
</p>

We need to test content of the .availability element. Hence, we use something like:

$this->assertEquals($expectedAvailability, $crawler->filter('.availability')->text());

However, as there are some line breaks, this test fails as following:

-'Available from January 6, 2017 to June 14, 2017'
+'
+    
+        Available from
+        January 6, 2017
+    
+        to
+        June 14, 2017
+    
+'

We are used to fix this output using a code like:

$text = $element->text();
$text = preg_replace("/(\n|\t)/", '', $text);
$text = preg_replace("/ {2,}/", ' ', $text);
$this->assertEquals($expectedOutput, trim($text));

Yet, it may be a good idea to add a new $cleanedOutput = false parameter to the text() method?

@javiereguiluz
Copy link
Member

This was proposed, and rejected, a few years ago in #11470. Two methods were proposed: trim() and flatten() (the last one is what you are asking for in this issue).

I still think it'd be a good idea. Let's ping @symfony/deciders to see if they've changed their opinion about this. Thanks!

@javiereguiluz javiereguiluz added DomCrawler Feature DX DX = Developer eXperience (anything that improves the experience of using Symfony) labels Jan 16, 2017
@vlakoff
Copy link
Contributor

vlakoff commented May 4, 2017

Adding these trim() and flatten() methods would require changing the text() return type from a plain string to a stringable object. I wouldn't bend towards this change.

Personally I'm extending the Crawler class, I couldn't anymore clutter my code all over with this:

$ref = preg_replace('/\s+/', ' ', trim($node
    ->filter('div.numero_ref')
    ->text()));

Now I just have to do:

$ref = $node
    ->filter('div.numero_ref')
    ->cleanText();

@HeahDude
Copy link
Contributor

It's very common to trim strings in tests, so 👍 for me but this should be opt-in with a new method or a defaulted argument in text($trim = false).

@Simperfit
Copy link
Contributor

Can we move forward on this ?

nicolas-grekas added a commit that referenced this issue Sep 27, 2019
…t() method (Simperfit)

This PR was merged into the 4.4 branch.

Discussion
----------

[DomCrawler] add a normalizeWhitespace argument to text() method

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes <!-- please update src/**/CHANGELOG.md files -->
| BC breaks?    | no     <!-- see https://symfony.com/bc -->
| Deprecations? | no <!-- please update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tests pass?   | yes    <!-- please add some, will be required by reviewers -->
| Fixed tickets | #21302, #23626  <!-- #-prefixed issue number(s), if any -->
| License       | MIT
| Doc PR        | todo <!-- required for new features -->

<!--
Replace this notice by a short README for your feature/bugfix. This will help people
understand your PR and can be used as a start for the documentation.

Additionally (see https://symfony.com/roadmap):
 - Bug fixes must be submitted against the lowest maintained branch where they apply
   (lowest branches are regularly merged to upper ones so they get the fixes too).
 - Features and deprecations must be submitted against branch 4.4.
 - Legacy code removals go to the master branch.
-->

I've taken #24412, rebased with 4.4, fixed the comments in the PR.

cc @xabbuh @stof @nicolas-grekas

Commits
-------

1746718 [DomCrawler] add a value() method, normalize whitespaces
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DomCrawler DX DX = Developer eXperience (anything that improves the experience of using Symfony) Feature
Projects
None yet
Development

No branches or pull requests

6 participants