Skip to content

[AstGenerator] [WIP] New Component, add normalizer generator #17516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

joelwurtz
Copy link
Contributor

Q A
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets
License MIT
Doc PR

This new component brings the concept of generating code in an AST representation into Symfony.

The original idea of doing it comes from the following discussion : api-platform/schema-generator#42

The first focus is to be able to generate Normalizer from existing class by extracting property and types with the PropertyInfo Component, so the following class :

class Foo
{
     /**
      * @var string
      */
     private $a;

     /**
      * @var array
      */
     private $array;

     /**
      * @var Dummy Dummy link
      */
     private $c;

     // Classic set / get operations
     ...
}

will generate the following normalizer:

class FooNormalizer implements NormalizerInterface, DenormalizerInterface
{
    public function supportsDenormalization($data, $type, $format = null)
    {
        if ($type !== 'Foo') {
            return false;
        }
        return true;
    }

    public function supportsNormalization($data, $format = null)
    {
        if ($data instanceof Foo) {
            return true;
        }
        return false;
    }

    public function denormalize($data, $class, $format = null, array $context = [])
    {
        $object = new Foo();
        $object->setA($data->{'a'});
        $object->setB($data->{'b'});
        $object->setC($this->serializer->denormalize($data->{'c'}, 'Dummy'));
        return $object;
    }

    public function normalize($object, $format = null, array $context = [])
    {
        $data = new \stdClass();
        $data->{'a'} = $object->getA();
        $data->{'b'} = $object->getB();
        $data->{'c'} = $this->serializer->normalize($object->getC());
        return $data;
    }
}

Why ?

There is some PR and discussion about making the serialization faster, i believe, by generating such a normalizer, the speed could not be faster as all the logic will be computed in the generation process and not a the runtime.

What is done

For the moment this library generate 2 things:

  • Normalizer class with its methods
  • Hydration statements which will be used by the normalizer generator for the body of normalize and denormalize methods. Hydration as been separated as i believe it can serve other component / library than Normalization (like doctrine, elasticsearch, everything that is data / model related).

What is left

There is still some works to do, but this is a first preview so we can talk about it more without going too far, here is what's left from my point of view:

  • Add generation for object type in hydration with inline mode (in the same context) but would be subject to circular loop
  • Add generation for object type in hydration with call mode (like using the serializer or other services)
  • Add the discriminant type to PropertyInfo (so when there is an inheritance model, normalizer can know which class to denormalize with a type variable)
  • Link to the Symfony Framework (other PR ?)

Issues to address

There is some issues than need to be adressed to have a robust generation:

Interface

This library declare a new interface AstGeneratorInterface it's enough generic to be used for each kind of generation and also with the usage of AstGeneratorChain it will be easy for user to hook into the generation process.

Hydration

On the hydration part the plan is to be able to transform:

  • from object to array
  • from object to stdClass instance
  • from array to object
  • from stdClass instance to object

Even if array is the standard in the serializer component, stdClass is needed IMO, as it's the only possible solution to distinct between an empty object or an empty array in the JSON wrold ({} vs []}

Others ideas

I have plenty of ideas for this component but i want to keep the scope small at the moment and will do others PR:

  • Ast Optimisation: By using AST it would be possible to add a layer of optimisation when generating the code, like direct resolving of values if the argument is static (should be done in a new component IMO)
  • Class Extractor / Generator: Ability to extract classes from a model (Think of schema.org, json schema, doctrine mapping file, ...) and generate the associated class with its getter / setter
  • Validator Generator: Ability to generate a plain validator php file for a class

@joelwurtz joelwurtz changed the title [WIP] New symfony component : AstGenerator [AstGenerator] [WIP] New Component, add normalizer generator Jan 25, 2016
$nodes = [];

foreach ($this->generators as $generator) {
if ($generator instanceof AstGeneratorInterface && $generator->supportsGeneration($object)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $generator instanceof AstGeneratorInterface test should be moved in the constructor to avoid doing it at every call to generate()

@dunglas
Copy link
Member

dunglas commented Jan 27, 2016

What about making the AstGenerator and the normalizer generator two separate components? I'm very enthusiast about the generic AstGenerator and I think about a lot of usages for it. However, I'm less convinced by the normalizer generator:

  • changes already made in Serializer and PropertyAccess components (especially the cache) will drastically improve their performance and I'm not sure that generating a normalizer will help so much
  • every time the class related to the normalizer will be edited, the normalizer will require to be edited as well
  • there is some drawbacks with the example of generator you give: How can I update an existing object ($context['object_to_populate']) instead of creating a new one? What if the submitted data have missing keys? How to support circular references, name converters, @Groups, @MaxDepth and so on... You'll probably need to introduce some more complexity in generated normalizers and it will be come to be a maintenance nightmare (especially if generated normalizers have been edited).

@joelwurtz
Copy link
Contributor Author

What about making the AstGenerator and the normalizer generator two separate components?

Don't want to add too many components for generator, but i was also wondering if generator about a specific component should be directly in the component (and then we can propose AstGenerator as a suggested dependency ) or if all generators are inside AstGenerator ?

I'm very enthusiast about the generic AstGenerator and I think about a lot of usages for it.

Same, the way i write AstGeneratorInterface is, IMO, enough generic to be usable in other places.

changes already made in Serializer and PropertyAccess components (especially the cache) will drastically improve their performance and I'm not sure that generating a normalizer will help so much

I agree it may be better but not sure and i would like to add some benchmarks to this, is there already a place with some benchmarks in tests or a prefered way to do this ?

Also don't forget that we cab also add ast optimization (by adding a visitor layer on top of the generation) which can improve performance even more.

every time the class related to the normalizer will be edited, the normalizer will require to be edited as well

My intent is more to generate the normalizer into cache directory for the Framework and then in dev environment it will generate each time (generation is enough fast for dev IMO)

there is some drawbacks with the example of generator you give: How can I update an existing object ($context['object_to_populate']) instead of creating a new one? What if the submitted data have missing keys? How to support circular references, name converters, @Groups, @MaxDepth and so on... You'll probably need to introduce some more complexity in generated normalizers and it will be come to be a maintenance nightmare (especially if generated normalizers have been edited).

Yes there is many use case to handle, and want to keep the scope small for a first preview, but i definitely want to add them.

However i don't want to offer the possibilty to edit generated normalizer. To add custom code to the normalizer the way would be to add a custom generator by using the AstGeneratorChain class.

I also think we can heavily drop the maintenance by mutualising logic of all this use case for generators and for existing class.

Maybe the normalizer generator is a scope too large for a first preview of this component, do you have something in mind for reduced implementation as a first example of this generator ?

For me components that can profit of this (not exhaustive list) :

  • CssSelector (but more with an AstOptimiser than AstGenerator)
  • DependencyInjection
  • ExpressionLanguage
  • PropertyAccess (not sure)
  • Routing
  • Serializer (like here)
  • Validator

@javiereguiluz
Copy link
Member

@joelwurtz thanks for proposing a new component for Symfony. Although @dunglas's comments are interesting, I recommend you to stop working on this pull request until a decision is made about integrating this component in Symfony or not. It would be a shame to make you waste time on something that ultimately won't be merged.

@fabpot
Copy link
Member

fabpot commented Jan 28, 2016

@javiereguiluz We've talked about this new component with @joelwurtz and @dunglas in the original discussion linked in the description of this PR.

@joelwurtz
Copy link
Contributor Author

Also @javiereguiluz most of the code comes from an existing library, so there is no so much work involved. The point is @dunglas and me have use case for generating code from existing, but different, schema. One of the objective was to mutualise the generation of code by using a common lib base on the PropertyInfo component. So i will definitely write something like this, having it part of symfony is a proposition :)

To bounce back on @dunglas comments:

there is some drawbacks with the example of generator you give: How can I update an existing object ($context['object_to_populate']) instead of creating a new one? What if the submitted data have missing keys? How to support circular references, name converters, @Groups, @MaxDepth and so on... You'll probably need to introduce some more complexity in generated normalizers and it will be come to be a maintenance nightmare (especially if generated normalizers have been edited).

By giving more thoughts i think my answer was not correct, actually for most of this features, this generator doesn't care:

  • $context['object_to_populate'] would just be separating the instanciation from hydration, no more works, just better decoupling
  • For missing keys, that is indeed needed (already done it in jane but was not a fan because on how we treat null value)
  • Circular Reference can be handle in a upper normalizer chain class not subject to generator
  • Name converters should be done at PropertyInfo Component level not here
  • @Groups is already done in the PropertyInfo Component, so we don't care about this in the generator
  • @MaxDepth can also be done in a upper normalizer chain class, as long as the context is correctly merged each time in the generated normalizers

I will try to make some sorts of summary of all existing features of normalizer and show what need to be done in a more concret way.

However i do not think there is much work to do here, AST can be scary when you have never do this, but in reality it's relatively easy, it's just very verbose :)

@DavidBadura
Copy link
Contributor

the hydration part looks a bit like https://github.com/Ocramius/GeneratedHydrator

@joelwurtz
Copy link
Contributor Author

It's close but not really the same, https://github.com/Ocramius/GeneratedHydrator hydrate the object by binding a generated closure to the model (so it can write to private / protected properties), here we only write / read if the property is public or a method is accessible (no magic hydration) also the hydration is not binded to the model class.

@joelwurtz
Copy link
Contributor Author

To clarify my view on this component, here is more explanations

Goal for a generic AstGenerator Component

One of the goal, despite generated normalizers, is to mutualise and share code generation in Symfony with a better approach than templating (replace, in the long term, the router dumper by example)

Workflow

astgeneratoroverview

  1. User write like before its code and configuration for the framework or others components
  2. Then classic component extract metadatas from it (like it's already done), all the logic for specific features is computed in these metadatas
  3. This metadatas are written into AST but no logic should be involved here to reduce complexity on the generator side (it's just create Expr object)
  4. Then the AST is written into php files in the cache directory, and only this files will be used at runtime (no more calling to extraction, metadatas or others logic)

AstGenerator Internal workflow

For the generation part the following thing need to be achieved:

astgeneratorinsight

  1. Computing possibilities: In fact a user code or configuration can give several possibilities, (like the @Groups metadatas in the PropertyInfo component), here the goal is to create a matrice will all possiblities for metadatas.

  2. The possibility is transformed into AST (so we can even more reduce the logic and have better decoupling for AstGeneratorInterface implementations), each possiblity is represented by a unique hash used in the class name and / or file name

  3. Once the AST has been written it can have a AST Optimisation Pass: the goal of this is to remove uneeded calculation and method calls in the generated call:

    As an example we might generate the following: strtoupper('A Static String'), since the method is stateless and the argument is static we can replace this code directly with the following string A STATIC STRING.

  4. Then it is print to a file with the corresponding hash from the possibility

Runtime workflow

Then at runtime the following thing will happen:

runtimeastgenerator

  1. For an use case when the generator has been involved (routing, normalizer, validator, ....), the hash of the desired functionnality is created, this hash should be the same as one of the possibility created by the AstGenerator
  2. Then it retrieve the file corresponding to the hash and execute it

I believe this will allow to have a common workflow for generated code in symfony, also as there will be a unique pipeline for generated code, many tools can be added without duplication:

  • AstOptimisation like said before
  • Better debugging for generated code (Symfony Toolbar, in dev env, showing different possibilities)
  • Also using this would allow to use PHP new features from next versions without upgrading requirements in the composer.json, like we can detect than the current platform is on PHP7 so we can generate strict type for methods parameters and return.

I hope this give a better vision of this component.

@pyrech
Copy link
Contributor

pyrech commented Jan 28, 2016

Nice summary! 👍 for centralizing the code generation across Symfony.
Possibilities like optimization pass and tweaking the code according to the platform look really exciting!

@Ocramius
Copy link
Contributor

The idea seems neat, but the AstGeneratorInterface interface for it seems really naive right now, and reading the interface doesn't really tell anything about how it works or why it accepts any object.

Shouldn't you dump a TransformationDefinition or similar (for example) into AST instead? This would clearly split the responsibility of analyzing the code (and the transformations required to hydrate/extract data) from the responsibility of generating the code for it, and would make the component actually reusable.

@joelwurtz
Copy link
Contributor Author

Hey @Ocramius nice to you see there !

Shouldn't you dump a TransformationDefinition or similar (for example) into AST instead?

Your are talking about renaming the Interface to TransformationDefinition ? Because i clearly want AstGeneratorInterface to only have the responsability to generate the code (AST) and nothing else.

The analyze part is not done in this repository as it's the reponsability of the others components which extract metadata.

*
* @return \PhpParser\Node[] An array of statements (AST Node)
*/
public function generate($object, array $context = []);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you think that we should expect a clear contract here (object *Context vs array)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrays are very flexible and have proven to be very good extension points in Serializer and PropertyInfo components. I'm in favor of keeping this as is.

@dunglas
Copy link
Member

dunglas commented Feb 1, 2016

@joelwurtz What do you think about rewriting generators provided by SensioGeneratorBundle with this new component as a "demo" of its possibilities?

@Ocramius
Copy link
Contributor

Ocramius commented Feb 1, 2016

@joelwurtz What do you think about rewriting generators provided by SensioGeneratorBundle with this new component as a "demo" of it's possibilities?

I strongly endorse it. I tried various generator approaches, and moving to AST-only caused an explosion in complexity for my use-cases (which is why I kept using Zend\Code instead). Having a real-world use-case scenario would show the advantages and pitfalls of this. Doesn't need to be a "final" version, but many concerns aren't exposed in this PR

@jakubkulhan
Copy link
Contributor

Regarding generic AstGenerator component, AST node manipulation, is quite complex and also really verbose. It suits well compilers and transpilers, however, for this use-case I think it is much easier to use something like Zend\Code or nette/php-generator - something that only manages file layout (use statements, classes, constants, properties, methods, ...), but lets you compose method bodies from chunks of code. It would be nice if a code generator allowed to easily handle indentation in method bodies (eg. protoc-gen-go generator does that).

The problem I've always had with Symfony-generated classes is that they're treated only as a "cache". If new code generator component replacing router dumper etc. would be introduced, I would like to see ability to better manipulate output destination, so the generated code can be committed to VCS alongside the hand-written code.

As for generated normalizer, I have implemented skrz/meta which does something similiar, however, its approach is generated-code-only - everything is generated ahead-of-time and generated code in runtime has almost none dependencies on the library. We saw great performance improvements compared to any kind of (however optimized) serialization code based on runtime instrospection (so although Serializer & PropertyAccess components might have seen great performance gains I bet generated code will always trump them), I'm looking forward to that Serializer component is moving to generated code.

But once the normalizer/serializer code is being generated, there is really no use for normalization phase before serialization phase. skrz/meta generates specific serialization/deserialization code for every input/output format - currently JSON (because PHP doesn't have useful fast streaming parser/encoder for JSON, this serialization still needs to be piped through array object normalization), XML, and Protocol Buffers. I think generated serializers should also generate specific code for each format and skip normalization phase whatsoever.

Also regarding recent emphasis on client-facing API contracts and that client-side should drive returned fields from the server (i.e. Facebook's GraphQL, or Netlix's Falcor), for Serializer component to stay relevant to current client-side needs, I think it's imperative to implement some kind of filtering for serialized object properties, @Groups are not enough. (For inspiration, skrz/meta has pending pull request regarding this issue - skrz/meta#13).

@dunglas
Copy link
Member

dunglas commented Feb 8, 2016

Why @Groups isn't enough? Can you give a use case not doable with the current implementation?

class ArrayHydrateGeneratorTest extends \PHPUnit_Framework_TestCase
{
/** @var Standard */
protected $printer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be private.

@jakubkulhan
Copy link
Contributor

@Groups do not give field-by-field granularity, also they do not nest - it makes them unusable for filtering as seen in:

Field-by-field filtering results in big response payload size reduction.

@dunglas
Copy link
Member

dunglas commented Feb 8, 2016

They can nest: https://api-platform.com/doc/1.0/api-bundle/serialization-groups-and-relations#embedding-relations

Field by field granularity can be easily implemented using groups: one group per field and a the controller create the context based on the query parameter. It can also be easily implemented (without @Groups using a custom normalizer). It can even be added easily to Symfony core by patching the new AbstractObjectNormalizer class (PR welcome).

@GuilhemN
Copy link
Contributor

What is locking here ?

@joelwurtz
Copy link
Contributor Author

What is locking here ?

Nothing in particular, was waiting for some input and didn't have the time the past week to look into this, will try to respond to all of the questions and advance this in the end of this week

@GuilhemN
Copy link
Contributor

Ok thank you a lot for your work @joelwurtz

@joelwurtz
Copy link
Contributor Author

For @jakubkulhan

Regarding generic AstGenerator component, AST node manipulation, is quite complex and also really verbose. It suits well compilers and transpilers, however, for this use-case I think it is much easier to use something like Zend\Code or nette/php-generator - something that only manages file layout (use statements, classes, constants, properties, methods, ...), but lets you compose method bodies from chunks of code. It would be nice if a code generator allowed to easily handle indentation in method bodies (eg. protoc-gen-go generator does that).

I disagree, this may be more complex for simple use case (however it's very subjective) but it's easier for complexe use case (like generating Normalizer), i have build this library https://github.com/jolicode/jane and my first implementation was using Zend\Code and similar library not dealign with Ast, and it was an absolute crap because i have to know when i have a set method, or only a object (not the same assignation), i needed to do many tiny templates files which where hard to debug.

On the other hand working with AST is much simpler as everything is an Expr, so yes in some case it can bring bugs (as not every Expr can work with other Expr), but PHP-Parser was good enough for me to detect this bugs.

Also working with AST you don't have to deal with formatting your code, only the logic part, and you can deal with formatting latter when printing your AST, so it's easier from this POV as it completly remove this concern when developing.

Futhermore, i want to keep a centralized and unique approach for generating code in Symfony for better reusability across components, and i strongly believe that using PHP-Parser is a better choice than Zend\Code. In fact this library will be more a concurrent of the Zend\Code, but with a different approach : focusing on "functional" POV but not technical (i.e. there will be generators for normalization, hydration or others classic things, but no generators for methods / class / ...)

The problem I've always had with Symfony-generated classes is that they're treated only as a "cache". If new code generator component replacing router dumper etc. would be introduced, I would like to see ability to better manipulate output destination, so the generated code can be committed to VCS alongside the hand-written code.

This is not the subject for this PR, and this decision should be in a futur PR, however yes it's a first step towards this. Also the purpose of having a well designed generic interface is to allow users injecting they own generator, so specific logic for some third party bundles (like in the router component) can also be generated.

As for generated normalizer, I have implemented skrz/meta which does something similiar, however, its approach is generated-code-only - everything is generated ahead-of-time and generated code in runtime has almost none dependencies on the library. We saw great performance improvements compared to any kind of (however optimized) serialization code based on runtime instrospection (so although Serializer & PropertyAccess components might have seen great performance gains I bet generated code will always trump them), I'm looking forward to that Serializer component is moving to generated code.

But once the normalizer/serializer code is being generated, there is really no use for normalization phase before serialization phase. skrz/meta generates specific serialization/deserialization code for every input/output format - currently JSON (because PHP doesn't have useful fast streaming parser/encoder for JSON, this serialization still needs to be piped through array object normalization), XML, and Protocol Buffers. I think generated serializers should also generate specific code for each format and skip normalization phase whatsoever.

Not so sure about this, only dealing with normalizer is best for me, as i don't have to think to the differents formats, also the (de)serialization part from/to simple array to/from json, xml, ... cannot be optimized IMO with the current library available from PHP.

Also regarding recent emphasis on client-facing API contracts and that client-side should drive returned fields from the server (i.e. Facebook's GraphQL, or Netlix's Falcor), for Serializer component to stay relevant to current client-side needs, I think it's imperative to implement some kind of filtering for serialized object properties, @groups are not enough. (For inspiration, skrz/meta has pending pull request regarding this issue - skrz/meta#13).

I think just having conditions around each field would be enough (so if field present we normalize, otherwise we don't treat it), take a look at the generated code on this lib https://github.com/jolicode/jane-openapi it's close to what i intend to generate for normalizers.

For @dunglas

@joelwurtz What do you think about rewriting generators provided by SensioGeneratorBundle with this new component as a "demo" of its possibilities?

Not sure if this is the right pick, as it's not present in this lib (and would induce more time for crossed dependencies work)

My plan right know is to make a benchmark with the current component, the PR about optimisation on Serializer, and the generated code from this library https://github.com/jolicode/jane so we can have a idea if it's worth it, is that ok with you ?

@joelwurtz joelwurtz force-pushed the feature/AstGeneratorComponent branch from 7fed221 to de38c7b Compare August 5, 2016 21:30
@nicolas-grekas
Copy link
Member

nicolas-grekas commented Aug 9, 2016

It's kind of leaking to have the AstGeneratorInterface in the Symfony namespace but have its methods return objects in the PhpParser namespace. This is a major issue to me because it prevents us from enforcing the Symfony policies on it (BC promise, deprecations, etc.).

I think that as @joelwurtz wrote elsewhere, this should be moved into a subpart of the Serializer component, dedicated to generating normalizers. Using PhpParser should be handled as an internal implementation "detail" (thus tagging @internal where it leaks).

@joelwurtz
Copy link
Contributor Author

Yes, i agree, we should do a first internal implementation that works out of the box, then, if they are need, allow user / bundle customization.

If i remember well @dunglas got this problem with the PropertyPath (or PropertyAccess ?) component, where there was too many things with public API and it was very hard to maintain BC on this. We should not repeat the same error.

@dunglas
Copy link
Member

dunglas commented Aug 9, 2016

Not sure of what you mean. IIRC both PropertyAcess and PropertyInfo components don't use external dependencies in their APIs.

@GuilhemN
Copy link
Contributor

GuilhemN commented Aug 9, 2016

Maybe the first version of the generator should be kind of hardcoded (i mean coupled to the serializer component) and the user not able to hook into it which shouldn't be a problem if we make it internal.
WDYT ?

@nicolas-grekas nicolas-grekas added this to the 3.x milestone Dec 6, 2016
@nicolas-grekas
Copy link
Member

Closing for housekeeping - don't hesitate to submit again when ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.