Skip to content

[Yaml] Add tags support #21194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

[Yaml] Add tags support #21194

wants to merge 4 commits into from

Conversation

GuilhemN
Copy link
Contributor

@GuilhemN GuilhemN commented Jan 7, 2017

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? yes
Tests pass? yes
Fixed tickets #21185
License MIT
Doc PR

This PR adds custom tags support to the Yaml component.
Symfony tags (!!binary, !str, etc.) are still managed in the parser to have a lighter diff but we'll be able to convert them later if we want to.

The primary addition of this PR is the TagInterface:

interface TagInterface
{
    public function construct(mixed $value): mixed;
}

It can be used to register custom tags. An example that could be used to convert the syntax =iterator to a tag:

final class IteratorTag implements TagInterface
{
    public function construct(mixed $value): mixed
    {
        return new IteratorArgument($value);
    }
}

$parser = new Parser(['iterator' => new IteratorTag()]);

If you think this is too complex, @nicolas-grekas proposed an alternative to my proposal externalizing this support by introducing a new class TaggedValue.

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Jan 7, 2017

Note that the simpler TaggedValue proposal is still compatible with having the Yaml component hydrate things. It would just require a separate class whose responsibility would be solely to hydrate the TaggedValue objects to something else. This would remove this responsibility from the Yaml parser classes, which looks like a good idea to me.

}

if (null !== self::$tagResolver && $scalar && '!' === $scalar[0]) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a particular big fan of putting conditions like '&& $scalar' because it takes away some readability. There are a lot of ways of validating '$scalar' (empty(), is_null(), etc...) instead of making me believe (at first sight) that it is a boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we can just remove it, this case is already managed earlier. Thanks!

{
private $tags = array();

public function addTag(TagInterface $tag, $priority = 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a good idea to add some phpDoc about the parameters here.


private function prefixTag($tag)
{
if ($tag && '!' === $tag[0]) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is $tag a bool? If you validate $tag and it is a string I think it makes more sense to use any of the available methods just for improved readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to leverage this for DI =iterator & =closure-proxy in the same PR so that we can juge on an actual use case.

/**
* @author Guilhem N. <egetick@gmail.com>
*
* @internal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this should be internal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be part of the api in 4.0 but as it never ends in userland for now, making it internal allows us to change its name (or maybe just remove it in favor of a simple ParseException?).

try {
return self::$tagResolver->resolve($scalar, $tag);
} catch (UnsupportedTagException $e) {
// Ignored for bc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should trigger a deprecation here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I removed the exception in the last commit (and its catch), I think we can do it later to focus on the feature.

@@ -44,6 +48,16 @@ public function __construct($offset = 0, $totalNumberOfLines = null, array $skip
$this->offset = $offset;
$this->totalNumberOfLines = $totalNumberOfLines;
$this->skippedLineNumbers = $skippedLineNumbers;
$this->tagResolver = new TagResolver();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a constructor argument instead? (default null, no need for any instance when not using tags)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @param TagInterface $tag
* @param int $priority
*/
public function addTag(TagInterface $tag, $priority = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this method, requiring a tagResolver as constructor is enough imho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked complicated to use to me as there are already 3 internal arguments in the constructor (that should be removed imo but not sure we can do that because of bc).
People would be forced to do new Parser(0, null, array(), new TagResolver());.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #21230

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, then w know how to make that the first arg: with a BC layer

try {
return $this->doParse($value, $flags);
} finally {
Inline::$tagResolver = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be restored to the previous value of Inline::$tagResolver

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woudln't it be better to pass it as argument instead of forcing to maintain some state externally ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-grekas this is only done for bc (Inline wasn't internal in 3.0) but it must not be used in userland so I think it's fine to always remove the previous value.

@stof it is following this. I think it would be better to instantiate Inline and pass the TagResolver as a constructor argument, wdyt?

try {
return $this->tagResolver->resolve($data, $tag);
} catch (UnsupportedTagException $e) {
// ignored for bc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecation

*
* @internal
*/
final class TagResolver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at this closer, we might not need it: we could simply feed Yaml\Parser with an associative array of tags-to-tagresolvers:
new Parser(..., ..., array('iterator' => new IteratorTagResolver()) (prefix excluded, or auto added)

we don't need to handle the "priority" + "supports()" logic, it just adds complixity to me, no flexibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-grekas imo #21194 (comment) is a big locker in using a constructor argument. I'll submit a PR to see if we can do something about it.

we don't need to handle the "priority" + "supports()" logic, it just adds complixity to me, no flexibility

It could be useful to allow implicit tags (foo becoming !bar foo for example, based on regexes and on its path) but I'm not sure this is very useful in userland so will try :)

$tagName = '!'.substr($tagName, 18);
}

$this->tags[$tagName] = $tag;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that $tag has the required interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should indeed be safer


foreach ($tags as $tagName => $tag) {
if (!$tag instanceof TagInterface) {
throw new \InvalidArgumentException(sprintf('Expected tag of type "%s", given "%s".', TagInterface::class, get_class($tag) ?: gettype($tag)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_class triggers a warning when given a non-object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks !

@GuilhemN
Copy link
Contributor Author

Should I add mappings/sequences support here to be able to implement !iterator or do you prefer me to do it later in another PR?

@fabpot
Copy link
Member

fabpot commented Jan 11, 2017

It would be good to have support in this PR I think.

UPGRADE-4.0.md Outdated
* The default value of the strict option of the `Choice` Constraint has been
changed to `true` as of 4.0. If you need the the previous behaviour ensure to
changed to `true` as of 4.0. If you need the the previous behaviour ensure to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double "the" while changing :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, but not related ;) see #21250

try {
return $this->doParse($value, $flags);
} finally {
Inline::$tags = array();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, when reading this line, I think we should restore to the previous value. I know you explained it's useless, but this requires contextual knowledge, and thus with no guarantee. Better make the code logic local.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you were right, it was creating a bug when entering a block (because of nested parsers).

*
* @return mixed
*/
public function construct($value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pass $tag and $value, so that one could create a single object to manage several tags if required.
$tag should match the key used when registering if possible (no namespace added/removed).
About "construct' => "createValue"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@GuilhemN GuilhemN mentioned this pull request Jan 12, 2017
nicolas-grekas added a commit that referenced this pull request Jan 12, 2017
This PR was merged into the 3.2 branch.

Discussion
----------

Fix a typo

| Q             | A
| ------------- | ---
| Branch?       | 3.2
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #21194 (comment)
| License       | MIT
| Doc PR        |

<!--
- Bug fixes must be submitted against the lowest branch where they apply
  (lowest branches are regularly merged to upper ones so they get the fixes too).
- Features and deprecations must be submitted against the master branch.
- Please fill in this template according to the PR you're about to submit.
- Replace this comment by a description of what your PR is solving.
-->

Commits
-------

2dcb22e Fix a typo
@@ -20,7 +22,7 @@
*/
class Parser
{
const TAG_PATTERN = '((?P<tag>![\w!.\/:-]+) +)?';
const TAG_PATTERN = '!(?P<tag>[\w!.\/:-]+)';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I create a new constant/inline it instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need imo

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait to see the DI part here also :)

$data = array();
$context = null;
$allowOverwrite = false;
while ($this->moveToNextLine()) {

if(null !== ($tag = $this->getLineTag($this->currentLine))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space after if, no need for brackets around assignment

@@ -129,6 +174,15 @@ public function parse($value, $flags = 0)
// array
if (!isset($values['value']) || '' == trim($values['value'], ' ') || 0 === strpos(ltrim($values['value'], ' '), '#')) {
$data[] = $this->parseBlock($this->getRealCurrentLineNb() + 1, $this->getNextEmbedBlock(null, true), $flags);
} elseif (null !== ($subTag = $this->getLineTag(ltrim($values['value'], ' ')))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for brackets

if ($mergeNode) {
// Merge keys
} elseif (!isset($values['value']) || '' == trim($values['value'], ' ') || 0 === strpos(ltrim($values['value'], ' '), '#')) {
} elseif (!isset($values['value']) || '' == trim($values['value'], ' ') || 0 === strpos(ltrim($values['value'], ' '), '#') || null !== ($subTag = $this->getLineTag(ltrim($values['value'], ' ')))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brackets


private function getLineTag($value)
{
if ('' === $value || '!' !== $value[0] || 1 !== preg_match('/^'.self::TAG_PATTERN.' *( +#.*)?$/', $value, $matches)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"m" modifier missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values passed to this function do not contain newlines, so no need for "m".

{
/**
* @param mixed $value
* @param string $tag the tag name used to register the tag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The + missing alignment (The should start one space after $value, isn't it?)

@GuilhemN
Copy link
Contributor Author

@nicolas-grekas thanks, i'll fix your comments this evening.
I'm locking on the inline part for now and also on how to dump objects, should we add supportsDump/normalize methods to dump DI iterators? We could also dump iterators as simple arrays but that's a bit annoying.

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Jan 13, 2017

Last commits added inline support (!foo {quz: bar}), a IteratorTag in the DependencyInjection component and a way to dump tags (DumpableTagInterface) because of DI YamlDumper.

Fabbot failure is normal (the current yaml parser can't parse the updated yaml files).

*
* @internal
*/
final class IteratorTag implements TagInterface, DumpableTagInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also "=closure_proxy" which could benefit from yaml tags.
which means we could have only one class - eg "ArgumentTag" for the DI component, able to deal with all tags it'll need to support (these 2 today and more in the future, no forward compat issue here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

$value = array('=iterator' => $value->getValues());
} elseif ($value instanceof ClosureProxyArgument) {
$value = array('=closure_proxy' => $value->getValues());
if ($value instanceof IteratorArgument || $value instanceof ClosureProxyArgument) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instanceof ArgumentInterface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know the tag corresponding to other instances so I think it's better this way here.

*
* @internal
*/
final class ArgumentTag implements TagInterface, DumpableTagInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for final, especially if @internal

return new IteratorArgument($value);
}

if (!is_array($value) || array(0, 1) !== array_keys($value) || !is_string($value[0]) || !is_string($value[1]) || 0 !== strpos($value[0], '@') || 0 === strpos($value[0], '@@')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be safe and explicit, this should be in an if ('closure_proxy' === $tag) { block, and the method should throw otherwise

$value = array('=closure_proxy' => $value->getValues());
if ($value instanceof IteratorArgument || $value instanceof ClosureProxyArgument) {
$argument = clone $value;
$argument->setValues($this->dumpValue($argument->getValues()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a code smell to me:
this creates an instance of IteratorArgument that is filled with a non usable state. Let's imagine that "setValues" does some validation, which it could very well do - that would make the current design fail.
I think we need a more robust design here, that preserves the semantic of objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there are ways to fix this with the current design.
I still think the "TaggedValue" proposal could be a good idea to try :)

@@ -29,6 +29,7 @@ class Yaml
const DUMP_OBJECT_AS_MAP = 64;
const DUMP_MULTI_LINE_LITERAL_BLOCK = 128;
const PARSE_CONSTANT = 256;
const PARSE_CUSTOM_TAGS = 512;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PARSE_TAGGED_VALUE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if people won't think that built-in tags are also converted to a TaggedValue with this name. Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me, keep as is

@trigger_error(sprintf('Using the unquoted scalar value "%s" is deprecated since version 3.3 and will be considered as a tagged value in 4.0. You must quote it.', $scalar), E_USER_DEPRECATED);
}

// Optimise for returning strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize

$this->totalNumberOfLines = $totalNumberOfLines;
$this->skippedLineNumbers = $skippedLineNumbers;
if (func_num_args() > 0) {
@trigger_error(sprintf('Constructor arguments $offset, $totalNumberOfLines, $skippedLineNumbers of %s are deprecated and will be removed in 4.0', self::class), E_USER_DEPRECATED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this deprecation makes sense. But this PR is already quite big and this change is not really related to the ability to parse YAML tags. Couldn't we move this to a new PR instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is equal to me but I can reopen #21230 if you prefer.

* @param mixed $value
* @param string $tag
*/
public function __construct($value, $tag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about swapping arguments. IMO we usally have key/value pairs, but not value/key pairs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally see the tag as a complement of the value, not as the key; a value could be represented without a tag (a simple plain value).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt the same also - swapping would be my preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

<<<YAML
- !foo
- yaml
- !!quz [bar]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tags starting with two exclamation marks are tags defined by the YAML spec. I do not think we should support them through a flag that indicates to parse custom tags (and if we want to parse built-in tags, we will have to validate their names IMO).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we throw an exception then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @xabbuh - we need to keep BC of course

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

$nextOffset += strspn($value, ' ', $nextOffset);

// Is followed by a scalar
if (!isset($value[$nextOffset]) || !in_array($value[$nextOffset], array('[', '{'))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass true as the third argument?

return;
}

if ($flags & Yaml::PARSE_CUSTOM_TAGS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everywhere else it's the other way around: YAML::PARSE_CUSTOM_TAGS & $flags

@@ -107,6 +109,10 @@ public static function parse($value, $flags = 0, $references = array())
$result = self::parseScalar($value, $flags, null, $i, true, $references);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that we can always evaluate the scalar even when we are parsing the value of a tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xabbuh tags aren't supported on scalars yet anyway but updated to make sure we don't forget it in 4.0. thanks!

xabbuh added a commit that referenced this pull request Jan 21, 2017
This PR was squashed before being merged into the 3.3-dev branch (closes #21350).

Discussion
----------

[Yaml] Remove internal arguments from the api

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #21230
| License       | MIT
| Doc PR        |

Reopening of #21230 because of [@xabbuh's comment](#21194 (comment)).

> This PR removes internal constructor arguments of the `Parser` class.
> This should break nothing as they are only used to build errors message and are clearly meant for internal uses.
>
> This would allow to have a nicer api for #21194 (comment).

Commits
-------

ebae4ff [Yaml] Remove internal arguments from the api
@nicolas-grekas
Copy link
Member

rebase needed

@xabbuh
Copy link
Member

xabbuh commented Feb 9, 2017

I think we should flag the feature as experimental. @GuilhemN Can you do that?

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Feb 9, 2017

@xabbuh done.

Copy link
Member

@xabbuh xabbuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Status: Reviewed

@nicolas-grekas
Copy link
Member

Thank you @GuilhemN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants