Skip to content

[Yaml] Allow tabs as separators between tokens #40514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 23, 2021
Merged

[Yaml] Allow tabs as separators between tokens #40514

merged 1 commit into from
Apr 23, 2021

Conversation

bertramakers
Copy link
Contributor

@bertramakers bertramakers commented Mar 18, 2021

Q A
Branch? 4.4
Bug fix? yes
New feature? no
Deprecations? no
Tickets Fix #40507
License MIT
Doc PR None

As described in #40507, the Yaml spec allows tabs as whitespace characters between tokens. However, the Yaml parser crashes on this as it only expects spaces after the colon. https://yaml.org/spec/1.2/spec.html#id2778241

While I'm not a huge fan of it personally, it's an issue when a different linter tells us that a given YAML file with content that we have little control over has valid syntax in an unrelated check, and then our app crashes because it cannot be parsed after all.

@carsonbot
Copy link

Hey!

I see that this is your first PR. That is great! Welcome!

Symfony has a contribution guide which I suggest you to read.

In short:

  • Always add tests
  • Keep backward compatibility (see https://symfony.com/bc).
  • Bug fixes must be submitted against the lowest maintained branch where they apply (see https://symfony.com/releases)
  • Features and deprecations must be submitted against the 5.x branch.

Review the GitHub status checks of your pull request and try to solve the reported issues. If some tests are failing, try to see if they are failing because of this change.

When two Symfony core team members approve this change, it will be merged and you will become an official Symfony contributor!
If this PR is merged in a lower version branch, it will be merged up to all maintained branches within a few days.

I am going to sit back now and wait for the reviews.

Cheers!

Carsonbot

@bertramakers bertramakers changed the title Issue 40507 tabs as whitespace in scalar content Fix issue 40507: Tabs as separators between tokens Mar 18, 2021
@bertramakers
Copy link
Contributor Author

bertramakers commented Mar 18, 2021

Not sure how I can fix the coding standard suggestions reported on fabbot.io, since it changes the tab characters that are needed to actually test this.
Edit: I misread and it suggested to use single quotes instead of double quotes, so I did that in 5f35e89.

The exception message formatting suggested on fabbot.io is unrelated to my changes. Not sure if I should fix that then.
Edit: I don't think the suggested exception message formatting changes make sense in this context. For example, the exception message Numeric keys are not supported. [...] would become "Numeric" keys are not supported. [...]

@bertramakers
Copy link
Contributor Author

The tests failing on Appveyor also seem unrelated to me, so again I'm not sure how to make those green.

@bertramakers bertramakers changed the title Fix issue 40507: Tabs as separators between tokens [Yaml] Fix issue 40507: Tabs as separators between tokens Mar 18, 2021
@carsonbot
Copy link

Hey!

Nice work here, I this makes me happy.

I think @NamelessCoder has recently worked with this code. Maybe they can help review this?

Cheers!

Carsonbot

@stof
Copy link
Member

stof commented Mar 19, 2021

The exception message formatting suggested on fabbot.io is unrelated to my changes. Not sure if I should fix that then.
Edit: I don't think the suggested exception message formatting changes make sense in this context. For example, the exception message Numeric keys are not supported. [...] would become "Numeric" keys are not supported. [...]

The check about the formatting of exception message has some false positive (where we explicitly decide not to wrap in quotes as we are using placeholders for a different reason).

'foo: bar', // space and tab after colon
];

foreach ($yamls as $yaml) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be better implemented as 2 separate tests IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to splitting it up, but I based it on the existing test for tabs as indentation. Should I also change that one then for consistency?

And would a dataprovider also be okay, or do you prefer separate tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a data provider is fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests of the YAML component are quite old (that component actually comes from symfony 1 originally), and so they don't really follow PHPUnit best practices. But ideally, new tests should follow them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks for clarifying! I haven't had much time to change it so far but I still intend to follow up on this somewhere in the coming days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why should tests not have a : void return type?

@carsonbot carsonbot changed the title [Yaml] Fix issue 40507: Tabs as separators between tokens Fix issue 40507: Tabs as separators between tokens Apr 9, 2021
@xabbuh xabbuh changed the title Fix issue 40507: Tabs as separators between tokens [Yaml] Allow tabs as separators between tokens Apr 23, 2021
@xabbuh
Copy link
Member

xabbuh commented Apr 23, 2021

Thank you @bertramakers.

@xabbuh xabbuh merged commit df6b1eb into symfony:4.4 Apr 23, 2021
This was referenced May 1, 2021
@bertramakers bertramakers deleted the issue-40507-tabs-as-whitespace-in-scalar-content branch May 12, 2021 07:56
@@ -200,7 +200,7 @@ private function doParse(string $value, int $flags)
array_pop($this->refsBeingParsed);
}
} elseif (
self::preg_match('#^(?P<key>(?:![^\s]++\s++)?(?:'.Inline::REGEX_QUOTED_STRING.'|(?:!?!php/const:)?[^ \'"\[\{!].*?)) *\:( ++(?P<value>.+))?$#u', rtrim($this->currentLine), $values)
self::preg_match('#^(?P<key>(?:![^\s]++\s++)?(?:'.Inline::REGEX_QUOTED_STRING.'|(?:!?!php/const:)?[^ \'"\[\{!].*?)) *\:(( |\t)++(?P<value>.+))?$#u', rtrim($this->currentLine), $values)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why group (( |\t)) insteand of list ([ \t])? Lists are a bit faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific reason except that I didn't know that there was a performance difference. Is it significant though?
I have no strong opinion about one or the other, but this PR is merged so it'd require a new PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bertramakers I did a benchmark, and list is 1% faster. I do not know if it is considered a micro-optimization, but since it's a parser, maybe it's something significant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea either, I just discovered this bug while using the Yaml component in my own app and reported and fixed it to the best of my ability 😄 I think you can always open another PR to optimize it if you think it's worth it!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bertramakers yeah, no problem! The fix works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants