Skip to content

[Serializer] Add a NormalizerDumper #22051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

GuilhemN
Copy link
Contributor

@GuilhemN GuilhemN commented Mar 18, 2017

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes, no new tests for now
Fixed tickets #17516
License MIT
Doc PR

This PR adds a functional NormalizerDumper to limit the cost of the ObjectNormalizer.
I decided to only support ->normalize() for now as it is the most used method and I think this is already a big step.

For this class:

class Foo
{
    /**
     * @Groups({"foo", "bar"})
     * @MaxDepth(2)
     */
    public $foo;
    public $bar;

    public function getFoo() {
        return $this->foo;
    }
}

It will generate:

use Symfony\Component\Serializer\Normalizer\NormalizerAwareInterface;
use Symfony\Component\Serializer\Normalizer\NormalizerAwareTrait;
use Symfony\Component\Serializer\Normalizer\NormalizerInterface;
use Symfony\Component\Serializer\Normalizer\ObjectNormalizer;

/**
 * This class is generated.
 * Please do not update it manually.
 */
class FooNormalizer implements NormalizerInterface, NormalizerAwareInterface
{
    use NormalizerAwareTrait;

    public function normalize($object, $format = null, array $context = array())
    {

        $objectHash = spl_object_hash($object);
        if (isset($context[ObjectNormalizer::CIRCULAR_REFERENCE_LIMIT][$objectHash])) {
            throw new CircularReferenceException('A circular reference has been detected (configured limit: 1).');
        } else {
            $context[ObjectNormalizer::CIRCULAR_REFERENCE_LIMIT][$objectHash] = 1;
        }

        $groups = isset($context[ObjectNormalizer::GROUPS]) && is_array($context[ObjectNormalizer::GROUPS]) ? $context[ObjectNormalizer::GROUPS] : null;

        $output = array();
        if (isset($context[ObjectNormalizer::ENABLE_MAX_DEPTH])) {
            if (!isset($context['depth_Foo::foo'])) {
                $context['depth_Foo::foo'] = 1;
            } elseif (2 !== $context['depth_Foo::foo']) {
                ++$context['depth_Foo::foo'];
            }
        }

        if ((null === $groups || count(array_intersect($groups, array('foo', 'bar')))) && (!isset($context['attributes']) || isset($context['attributes']['foo']) || (is_array($context['attributes']) && in_array('foo', $context['attributes'], true))) && (!isset($context['depth_Foo::foo']) || 2 !== $context['depth_Foo::foo'])) {
            if (is_scalar($object->getFoo())) {
                $output['foo'] = $object->getFoo();
            } else {
                $subContext = $context;
                if (isset($context['attributes']['foo'])) {
                    $subContext['attributes'] = $context['attributes']['foo'];
                }

                $output['foo'] = $this->normalizer->normalize($object->getFoo(), $format, $context);
            }
        }
        if ((null === $groups) && (!isset($context['attributes']) || isset($context['attributes']['bar']) || (is_array($context['attributes']) && in_array('bar', $context['attributes'], true)))) {
            if (is_scalar($object->bar)) {
                $output['bar'] = $object->bar;
            } else {
                $subContext = $context;
                if (isset($context['attributes']['bar'])) {
                    $subContext['attributes'] = $context['attributes']['bar'];
                }

                $output['bar'] = $this->normalizer->normalize($object->bar, $format, $context);
            }
        }

        return $output;
    }

    public function supportsNormalization($data, $format = null, array $context = array())
    {
        return $data instanceof \Foo;
    }
}

I didn't write the tests yet, I'm waiting for feedbacks first.
You can try it using this.

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Mar 18, 2017

I ran some benchmarks with blackfire:

The GetSetMethodNormalizer should be faster than the ObjectNormalizer so we have at less a gain of 30%.

The benchmark of the generated normalizer is https://blackfire.io/profiles/c0303264-3e68-444c-8687-3b47a8a2205b/graph, according to it FooNormalizer::normalize() takes 22.2 µs.
The benchmark using the GetSetMethodNormalizer is https://blackfire.io/profiles/eb6c3ce8-b44b-4b09-9093-167b3b350638/graph, according to it GetSetMethodNormalizer::normalize() takes 127 µs.

@GuilhemN GuilhemN changed the title [Serializer] Add a NormalizerGenerator [Serializer] Add a NormalizerDumper Mar 18, 2017
@nicolas-grekas nicolas-grekas added this to the 3.x milestone Mar 18, 2017
@soyuka
Copy link
Contributor

soyuka commented Mar 20, 2017

It's better without php-parser, it's not really useful here!

How would an Array of Foo's be normalized here? I assume it'd be a chain of ObjectNormalizer and FooNormalizer?

Is eval better (in term of performance) than require_once in your example?
On a real world example, how would the Normalizers be generated so that they can be easily reused?

I've done some hacking on my end, I'll do some bench! Nice job there!

@GuilhemN
Copy link
Contributor Author

@soyuka the built-in serializer supports array.

Is eval better (in term of performance) than require_once in your example?

I think require_once is better to benefit from the optimisation and opcache, and eval is not available for some hostings.

On a real world example, how would the Normalizers be generated so that they can be easily reused?

We can decide that later but I would use a compiler pass that would register classes based on a glob pattern (defined in the config).

@soyuka
Copy link
Contributor

soyuka commented Mar 20, 2017

I think require_once is better to benefit from the optimisation and opcache, and eval is not available for some hostings.

Then there's missing a php open tag <?php in the generated classes.

We can decide that later but I would use a compiler pass that would register classes based on a glob pattern (defined in the config).

Might be long to generate every normalizer, from what I understand there your normalizers are groups-independent, so that's a nice thing. Though, I tried to generate data with related entities and I only get the root entity. Take this gist for example, maybe I missed something!

@dunglas
Copy link
Member

dunglas commented Mar 20, 2017

Can't we move max depth / circular reference and other features shared with non-dumped normalizers in traits to ease the maintenance?

@soyuka
Copy link
Contributor

soyuka commented Mar 20, 2017

I made some more benchmarks with big collections where each item has 4 relations to another item.

This is in dev mode, with a forced apcu cache. Times are from the profiler, can't run Blackfire on it because it's taking too long.

  • Without the normalizers the script is spending 35720ms in the serializer
  • With the cached normalizers, this time is reduced to 5897 ms

Another one is getting an 8000 collection of simple objects:

  • Without the normalizers we're spending 3112 ms in the serializer
  • With the cached normalizers 236 ms

We can see that this speed things a lot!

Here is a gist with the cache warmer and the compiler pass (made them in a hurry).

Edit:
One of our resource with many relations was spending ~300 ms in the serializer. Now it has been reduced to ~30ms.


Note that we could improve performances by avoiding the $this->normalizer->normalize calls. I'm not sure that we have the required metadata to do so for complex data, but we could avoid some calls for the simple data types (boolean, string etc.).

Also, it'd be nice to have a way to improve the generated normalizers (for example if we want metadata fields, for jsonld it could be @type and @id).

/Edit: For the metadata that's being used to generate the class, WDYT about using property-info instead of the classMetadataFactory? It'll get you more informations about the data Type which will lead to performances improvements (avoid the $this->normalizer->normalize call for simple data).

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Mar 20, 2017

Note that we could improve performances by avoiding the $this->normalizer->normalize calls. I'm not sure that we have the required metadata to do so for complex data, but we could avoid some calls for the simple data types (boolean, string etc.).

@soyuka that's planed but the PR is already pretty big and I preferred delaying it and having something simpler (certainly a bit slower) but still working.

Can't we move max depth / circular reference and other features shared with non-dumped normalizers in traits to ease the maintenance?

@dunglas what do you think we can share? The code is generated here so except by using eval it seems complicated to me to share code.

@soyuka
Copy link
Contributor

soyuka commented Mar 20, 2017

@soyuka that's planed but the PR is already pretty big and I preferred delaying it and having something simpler (certainly a bit slower) but still working.

Isn't property-info a part of the serializer? I see no harm in already replacing the classMetadataFactory, or maybe that this class should in fact be using property-info would be easier 😄. Nevermind.

The code is generated here so except by using eval it seems complicated to me to share code.

+1 this is hard because the generated code is hard-coding attributes, which most of the code there relies on. One may think it'd be easier to maintain with a php-parser but I'm actually glad you removed it, it's overkill for such thing.

@soyuka
Copy link
Contributor

soyuka commented Mar 21, 2017

I think we can do a Trait like this one to remove most of the logic from the actual normalizers:

Trait example ```php use Symfony\Component\Serializer\Exception\CircularReferenceException;
use Symfony\Component\Serializer\Normalizer\ObjectNormalizer;

trait NormalizerTrait
{
public function checkCircularReference($object, &$context)
{
$objectHash = spl_object_hash($object);
if (isset($context[ObjectNormalizer::CIRCULAR_REFERENCE_LIMIT][$objectHash])) {
throw new CircularReferenceException('A circular reference has been detected (configured limit: 1).');
} else {
$context[ObjectNormalizer::CIRCULAR_REFERENCE_LIMIT][$objectHash] = 1;
}
}

public function isAllowedAttribute($property, &$context) {
    $groups = isset($context[ObjectNormalizer::GROUPS]) && is_array($context[ObjectNormalizer::GROUPS]) ? $context[ObjectNormalizer::GROUPS] : null;

    if ((null === $groups) && (!isset($context['attributes']) || isset($context['attributes'][$property]) || (is_array($context['attributes']) && in_array($property, $context['attributes'], true)))) {
        return true;
    }

    return false;
}

public function getValue($value, $property, $format, &$context) {
    if (is_scalar($value)) {
        return [$property => $value];
    } else {
        return [$property => $this->normalizer->normalize($value, $format, $context));
    }
}

}

@dunglas is this what you had in mind?

https://github.com/GuilhemN/symfony/pull/2
</details>

@dunglas
Copy link
Member

dunglas commented Mar 21, 2017

@soyuka indeed

{
$reflectionClass = new \ReflectionClass($class);
if (!isset($context['class'])) {
$context['class'] = $reflectionClass->getShortName().'Normalizer';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we should introduce a ClassNameConverter here, I made one on my fork that's just doing str_replace('\\', '_', $reflectionClass->getName()) to avoid collisions.

Copy link
Contributor Author

@GuilhemN GuilhemN Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the namespace option is enough (maybe we should use it by default though).

Copy link
Contributor

@soyuka soyuka Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes in fact I saw this option while working with it but I didn't try it. I'll, and yes I think we should make it default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rethinking about it, I think we should make the class option mandatory: in any case, the code using the NormalizerDumper will also have to generate the normalizer name to be able to call it.
I also don't expect the dumper to automatically put my normalizers in the same vendor than the model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct. If it then works within a CompilerPass, it would need this same class name to create a Definition.

@GuilhemN
Copy link
Contributor Author

I think we can do a Trait like this one to remove most of the logic from the actual normalizers:

I don't really see what we gain except having one more thing to maintain. I don't think it fits everyone's usecase, it is very generic while when you write a custom normalizer, you want something very specific imo.

@javiereguiluz
Copy link
Member

javiereguiluz commented Mar 24, 2017

I don't have experiene with the Serializer component, so maybe there are strong reasons to merge this feature. But, for an outsider like me, this looks wrong for two reasons:

  1. If this is done because the Serializer is slow, then we should focus on making it faster ... and not add a PHP code generator to create some optimized runtime code.
  2. If this is done because the Serializer is verbose, then we should focus on making it less verbose ... and not add a PHP code generator to create the verbose code faster.

@soyuka
Copy link
Contributor

soyuka commented Mar 24, 2017

If this is done because the Serializer is slow, then we should focus on making it faster ... and not add a PHP code generator to create some optimized runtime code.

IMHO this is exactly what's being done here, making the serializer faster (because yes it's slow).
The serializer is really well optimized but the PropertyAccessor will always be slower than calling getters directly. In fact, there is nothing faster than using generated accessors to serialize an object.
What bother's you is that this is generating code?

What are you meaning by verbose?

@GuilhemN GuilhemN force-pushed the SERIALIZERAST branch 2 times, most recently from c3801fd to fd4b628 Compare February 19, 2018 21:12
@xabbuh
Copy link
Member

xabbuh commented Feb 23, 2018

Normalizers are generated at runtime, the first time the user tries to normalize an object.

Does this mean that one cannot use this feature when using a cache that should be read-only? Would it be possible to generate normalizers on warmup (maybe as an opt-in feature)?

@GuilhemN
Copy link
Contributor Author

Does this mean that one cannot use this feature when using a cache that should be read-only?

Indeed

Would it be possible to generate normalizers on warmup (maybe as an opt-in feature)?

Is it worth it ? I don't think that's really common and it would be easier to implement on the user's side (to implement it in the core we need an efficient way to filter the classes accepted, it would require at least a patterns and an exclude fields imo).

At least I think we should first wait to have a few feedbacks and if that's really a recurrent request only then implement it.

@soyuka
Copy link
Contributor

soyuka commented Feb 23, 2018

Is it worth it ?

Exactly. It's feasible to do it in a custom cache warmer and it let you control how to choose the class to be normalized.
I must add that when you change your groups, you need to dump normalizers again.

@GuilhemN
Copy link
Contributor Author

Exactly. It's feasible to do it in a custom cache warmer and it let you control how to choose the class to be normalized.

Lacking time to do this right now, I'd rather leave it to later as the PR is mergeable as is. Else this will unfortunately delay even more this PR.

I must add that when you change your groups, you need to dump normalizers again.

No need, serialization groups are supported by generated normalizers.

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Nov 3, 2018

PR rebased on master. Is it a no go merging this without a cache warmer?

@fbourigault
Copy link
Contributor

Do you have some up-to-date benchmark since you rebased? I'm curious to see how much it may improve the ObjectNormalizer perfs compared to #28926.

@fbourigault
Copy link
Contributor

Although this looks interesting if we only look at the benchmark figures, it looks hard to maintain and does not fit well in what @dunglas exposed in #19374 (comment).

What about following the Router path and leverage the PHP7 static array cache (see #28865 and #25909 by @nicolas-grekas).

We could leverage the PHP7 static array cache by decorating the PropertyAccessorInterface which is already used by ObjectNormalizer and also the PropertyListExtractorInterface when #28775 will be accepted and merged.

Also, I wrote a RFC to improve the performances of the ObjectNormalizer property access part (see #28926).

As the ObjectNormalizer is becoming more SOLID, I think it's better to improve performances by small steps instead of hiding real performance problems behind dumped code which will improve the performance of the Serializer but not the ObjectNormalizer. Those small improvements will also benefit to anyone using the PropertyAccess and PropertyInfo components either as standalone components or with the FrameworkBundle.

@nicolas-grekas
Copy link
Member

Is this still relevant? There has been a lot of progress on the topic and we might prefer another approach now? Honestly I don't know, I'm just asking :)

@GuilhemN
Copy link
Contributor Author

GuilhemN commented Apr 3, 2019

You're right, new proposals emerged lately and things seem to be accelerating on their side.

Maintaining this is indeed complicated compared to other solutions. I don't think we'll reach the same performance but if the difference is neglictible and other aspects are in favor of the other solutions it's ok :)

Let's close this and give other proposals a shot! (this weekend eventually)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.