Skip to content

[Serializer] Add an option to the XmlEncoder allowing to decode tags as collection #58604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: 7.3
Choose a base branch
from

Conversation

AlexandreGerault
Copy link

@AlexandreGerault AlexandreGerault commented Oct 19, 2024

Q A
Branch? 7.3
Bug fix? no
New feature? yes
Deprecations? no
Issues no
License MIT

Context

I was working on a project where the Serializer component really makes our live easier: we have to work with an XML file as source of truth. The serializer allows us to work on PHP objects really easily and that's a real pleasure to work with it.

However, we have a specific part that is not doing exactly what we would like, and I didn't find a proper solution for it.

The issue

Given this XML content:

<order_rows>
    <order_row>
        <id>1</id>
        <price>1200</price>
    </order_row>
    <order_row>
        <id>2</id>
        <price>2400</price>
    </order_row>
</order_rows>

When we decode it to an array, we have this:

[
  'order_row' => [
    0 => ['id' => 1, 'price' => 1200],
    1 => ['id' => 2, 'price' => 2400],
  ],
]

And that is what we actually expect. However, if we have only one item (one order_row here) we end up with this:

<order_rows>
    <order_row>
        <id>1</id>
        <price>1200</price>
    </order_row>
</order_rows>

And the PHP array:

[
  'order_row' => ['id' => 1, 'price' => 1200],
]

A possible solution

I thought I could use the AS_COLLECTION option with the context to do what we expected but it seems that it is not an option for this use case after all. So I was thinking about introducing an option FORCE_COLLECTION that would actually fix it for the given fields. At this point I'm not sure what should be the possible values of this option, so for the moment I made it quite simple by just using the tag name (order_rows here).

I might missed something while I was doing my research as I didn't get a lot of results (maybe be wrong keywords or vocabulary).

It would be usable like this:

$source = <<<'XML'
<order_rows>
    <order_row>
        <id>1</id>
        <price>1200</price>
    </order_row>
</order_rows>
XML;

$this->encoder->decode($source, 'xml', [XmlEncoder::FORCE_COLLECTION => ['order_rows']]);

/*
[
  'order_row' => [
    0 => ['id' => 1, 'price' => 1200],
  ],
]
*/

Possible improvements

My proposal is quite simple for the moment as I don't think I understand the context of the serializer well enough and I would like to get some feedbacks from maintainers, contributors or anyone with some knowledge about the serializer component.

It might lack good support, for nested properties etc maybe (might be better to support xpath instead of a tag name and so on).

Also I might just have the wrong approach of the problem, I'm open to any feedback ahah

@carsonbot
Copy link

Hey!

I see that this is your first PR. That is great! Welcome!

Symfony has a contribution guide which I suggest you to read.

In short:

  • Always add tests
  • Keep backward compatibility (see https://symfony.com/bc).
  • Bug fixes must be submitted against the lowest maintained branch where they apply (see https://symfony.com/releases)
  • Features and deprecations must be submitted against the 7.2 branch.

Review the GitHub status checks of your pull request and try to solve the reported issues. If some tests are failing, try to see if they are failing because of this change.

When two Symfony core team members approve this change, it will be merged and you will become an official Symfony contributor!
If this PR is merged in a lower version branch, it will be merged up to all maintained branches within a few days.

I am going to sit back now and wait for the reviews.

Cheers!

Carsonbot

@AlexandreGerault AlexandreGerault force-pushed the feature/serializer/add_force_collection_when_decoding_xml branch from 590ad7f to 46e9197 Compare October 21, 2024 06:56
@AlexandreGerault AlexandreGerault force-pushed the feature/serializer/add_force_collection_when_decoding_xml branch from 93e0e76 to dff07d4 Compare October 21, 2024 10:59
@OskarStark OskarStark changed the title [Serializer] Add an option to the XmlEncoder allowing to decode tags as collection [Serializer] Add an option to the XmlEncoder allowing to decode tags as collection Oct 22, 2024
@AlexandreGerault
Copy link
Author

Thanks @nicolas-grekas for the reviews. I applied the suggested changes. I also realized it uses $this->defaultContext instead of raw values so I refactored it a bit at this point (I hope it doesn't bother 😅 )

Also could you share your thoughts about the goal of this PR please? 🙏 I'm quite unsure this is the right approach for this problem.

I'm also unsure whether this should be handled by the decoder (right now we have a workaround with a custom normalizer looking at specific keys).

@nicolas-grekas
Copy link
Member

we have a workaround with a custom normalizer looking at specific keys

So you figured out a way that doesn't require changing the component? then it might be the best way, because as you spotter, defining which tags in the xml tree would need this arrayification is tricky. A normalizer looks like a better place to me.

@AlexandreGerault
Copy link
Author

AlexandreGerault commented Oct 23, 2024

We found a workaround, but that's not exactly what I would expect from the component.

Here's the workaround we ended with, just as information:

public function denormalize(mixed $data, string $type, ?string $format = null, array $context = []): mixed
{
    // If the first element is not an array, wrap it in an array
    if (! is_array(reset($data['plans']))) {
        $data['plans'] = [$data['plans']];
    }

    // The return thing
}

It works in our specific use case (which is why we didn't make a generic normalizer with context option and all), but I find it odd that there are no ways of telling "this XML tag should always be treated as a collection". I mean, it makes more sense to me to be handled by a decoder option, as it is (at least it's what I understand for now) directly related to the XML format. I might be wrong still.

However if I'm right (that it should be handled by the decoder), I can also understand that it would be too tricky to be handled in the decoder and that the maintainers refuses to maintain such a feature.

On the other hand, if you consider it can be added, I'd gladly work on a better implementation and look for more edge cases etc.

EDIT: What I'm trying to say is that, as I user, I'd like to have an easy option for this when dealing with XML. But I also understand that, for maintaining reason, it would be better handled in userland code as it could lead to some unwanted behavior.

@thomasLecler
Copy link

I was also looking for something like this, great 👍
Hope this will be implemented in the coming Laravel updates

@AlexandreGerault
Copy link
Author

I was also looking for something like this, great 👍 Hope this will be implemented in the coming Laravel updates

Not related to Laravel. However you can still use it as a dependency as the component is just like any other library.

@thomasLecler
Copy link

I was also looking for something like this, great 👍 Hope this will be implemented in the coming Laravel updates

Not related to Laravel. However you can still use it as a dependency as the component is just like any other library

I know,

Indeed this is my case 👍

Comment on lines +239 to +245
if (\is_array($value)
&& ($childNodeName = $node->firstChild?->nodeName)
&& 1 === \count($value)
&& \in_array($nodeName, $context[self::FORCE_COLLECTION] ?? $this->defaultContext[self::FORCE_COLLECTION], true)
) {
return [$childNodeName => [$value[$childNodeName]]];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition can go after the !\is_array($value) if.
Then, you'll be able to drop the is_array check.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it can't according to the test suite 👀
Looks like it actually passes the block below:

if (!\count($data)) {
    return $value;
}

if (\is_array($value)
&& ($childNodeName = $node->firstChild?->nodeName)
&& 1 === \count($value)
&& \in_array($nodeName, $context[self::FORCE_COLLECTION] ?? $this->defaultContext[self::FORCE_COLLECTION], true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be great to accept the value * as an "every tag" value, WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, however I'm still unsure how this PR would be welcome. I'm okay to continue working on it, but I'd like to get a green light before investing too much time on it 😅

But if it get a positive feedback (which doesn't seem to be the case for the moment) I can consider adding a test case for this, maybe using xpath also 🤔

@Hanmac
Copy link
Contributor

Hanmac commented Nov 4, 2024

You are using only the encode/decode functions, right?

Because for my project when i got something like

<order_rows>
    <order_row></order_row>
</order_rows>

I use #[SerializedPath("[order_rows][order_row]")]

@AlexandreGerault
Copy link
Author

@Hanmac what if your XML ends up looking like this:

<order_rows>
</order_rows>

Does it handle it well? 🤔

@Hanmac
Copy link
Contributor

Hanmac commented Nov 5, 2024

@Hanmac what if your XML ends up looking like this:

<order_rows>
</order_rows>

Does it handle it well? 🤔

That should be skipped

the only problem i have with this approach, (or in general) are empty collections when normalizing/encoding:

  #[SerializedPath("[order_rows][order_row]")]
  protected array $rows = [];

if for some reason, the rows are empty, than current AbstractObjectNormalizer does normalize it to this:

[
'order_rows' => [
  'order_row' => []
]
]

which then the Encoder does turn into this:

<order_rows>
    <order_row />
</order_rows>

for this to work better, i added a Normalizer that turns empty arrays into null and then i have AbstractObjectNormalizer::SKIP_NULL_VALUES enabled to remove them.

i think i mentioned them in another ticket before, might need to create a new one => #58768

@fabpot fabpot modified the milestones: 7.2, 7.3 Nov 20, 2024
@AlexandreGerault
Copy link
Author

Hey
I waited a bit and forgot about this PR. I notice the milestones have been updated since. Any feedback or interest in going further (working on how to define related tags for example, as suggested by @mtarld )?
I'm still up to work on this 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants