Skip to content

[RFC] [Form] Improve handling of domain objects in forms #37597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alcaeus opened this issue Jul 16, 2020 · 16 comments · Fixed by #37968
Closed

[RFC] [Form] Improve handling of domain objects in forms #37597

alcaeus opened this issue Jul 16, 2020 · 16 comments · Fixed by #37968
Labels
Feature Form RFC RFC = Request For Comments (proposals about features that you want to be discussed)

Comments

@alcaeus
Copy link
Contributor

alcaeus commented Jul 16, 2020

Description

This issue originated as a discussion thread in the Symfony Slack. To preserve the conversation and get more input from maintainers and the community, I'm creating this RFC.

People have been arguing for working with domain transfer objects (DTOs) in the form documentation, as there are a few problem to directly bind the form to a managed object (e.g. a Doctrine entity):

  • Data is written to the entity without checking for validity first
  • Subsequent validation may flag the form as invalid, but doesn't restore original data
  • Subsequently flushing the managed object in the manager (e.g. EntityManager when using Doctrine) may write invalid data to the database and potentially affect unrelated managed objects
  • The PropertyAccess only supports accessing properties through accessor methods that are rather hardcoded.
  • Using separate methods for write and read operations (e.g. Category::getName() and Category::rename($name) is not possible

To solve this, users can create a DTO to hold the data and apply it to the managed object after it has been validated. However, this potentially adds a bunch of boilerplate code (e.g. a class with public properties and a apply method to apply changed data to the form). In many instances, this can be avoided.

There is a bundle that adds read_property_path and write_property_path to the form component (Example). However, @linaori was quick to point out that using strings can cause issues with IDE integration, e.g. by a refactoring not catching the usage in a form as it's not a code reference. @javiereguiluz then suggested the following:

Example

$builder->add('name', TextType::class, [
    'get' => function (?Category $category): ?string { return $category !== null ? $category->getName() : null; },
    'set' => function (Category $category, $value): void { $category->rename($value); },
]);

Note that the getter in this case could be omitted since it replicates the default behaviour.

With short closures, the example gets even more concise:

$builder->add('name', TextType::class, [
    'get' => fn (?Category $category): ?string =>  $category !== null ? $category->getName() : null,
    'set' => fn (Category $category, $value): void => $category->rename($value),
]);

This does not solve the problem of validation only being run after modifying the object, but it gives people more freedom when designing the public API for their domain objects, and may avoid the need for a DTO in some instances.

Future scope

Exception handling

Exceptions while calling any of the closures will lead to a 500 error. This could be avoided by expanding the closure to catch the exception, but then the question is how this can be communicated to the user. The bundle mentioned above introduces a expected_exception form option. If an exception of the given type is thrown, the exception is caught and it is added as a form error. While this may offer more flexibility, I think it introduces yet another form option that people have to understand. An alternative would be to always catch a ValidationExceptionInterface interface provided by the form component. The ValidationException class would provide a generic exception for people to use. This allows doing the following:

  • Users can implement the interface in their domain exceptions:
// Exception class
class NameNotAllowedException extends \RuntimeException implements ValidationExceptionInterface
{
    // ...
}

// Domain object
class Category
{
    public function rename(string $name): void
    {
        if (...) {
            throw NameNotAllowedException::notAllowed($name);
        }

        ...
    }
}
  • Alternatively, users can handle a specific exception in the closure:
$builder->add('name', TextType::class, [
    'set' => function (Category $category, $value): void {
        try {
            $category->rename($value);
        } catch (DomainException $e) {
            throw ValidationException::fromPrevious($e);
        }
    },
]);
  • The functionality above could also be provided by a closure:
// DomainValidationHandler.php
public function wrap(string $exceptionClass, Closure $closure): Closure
{
    $code = <<<'CODE'
        return function (object $object, $value): void {
            try {
                $closure($object, $value);
            } catch (%s $e) {
                throw ValidationException::fromPrevious($e);
            }
        };
        CODE;

    return eval(sprintf($code, $exceptionClass));
}

// In form
$builder->add('name', TextType::class, [
    'set' => DomainValidationHandler::wrap(
        DomainException::class,
        fn (Category $category, $value): void => $category->rename($value)
    ),
]);

Change form validation rules

When binding a form, form data is sent through the data transformers, then applied to the object, and last but not least validated by the validation component. This reads constraints from the form as well as from the object and validates all constraints. It may be sensible to change the order when binding:

  • Take data from form and run through data transformers. Store transformed data
  • Run constraints defined on form fields on model data. Any validation errors are added to the form. If errors are found, binding the form is aborted without changing the object (or creating one if none was passed)
  • Apply data to the object. This ensures that any early constraints (e.g. invalid types in the form) can be caught beforehand. Exception handling as described above is optional here.
  • Run validation constraints defined on the object. Again, errors are added to the form.

The order above would allow people to keep form-specific validation logic in the form (which would otherwise be put in a DTO) while doing other, potentially more complex validation after data is applied to the object:

$builder->add('name', TextType::class, [
    'set' => DomainValidationHandler::wrap(
        DomainException::class,
        fn (Category $category, $value): void => $category->rename($value)
    ),
    'constraints' => [
        new UniqueEntity([
            'entityClass' => Category::class,
            'fields' => ['name'],
            'id' => // ID of entity being edited will have to be passed from options
    ],
]);

Next steps

For now, I'm looking for feedback from people that have previously created DTOs for their form <-> entity mappings, as well as from users using the bundle mentioned above. I've used the bundle myself, but my last work with Symfony Forms was almost 2 years ago, so I'm a bit outdated when it comes to form. After that, I could offer to build a prototype for the functionality outlined above.

@javiereguiluz
Copy link
Member

Andreas, thanks for creating this detailed issue! I hope it results in some improvements related to this.

As a minor comment, I suggested using get and set as the names because that is what others use:

@xabbuh
Copy link
Member

xabbuh commented Jul 16, 2020

For the records, as far as I can see parts of this proposal could currently be achieved by using RichModelFormsBundle. If there is a common ground of what we would like to change/add, I am sure that there is code that could be reused in core.

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 16, 2020

Thanks for mentioning the bundle directly, @xabbuh! I referenced the readme above but forgot to add the name.

@javiereguiluz javiereguiluz added Feature Form RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Jul 16, 2020
@javiereguiluz javiereguiluz changed the title [RFC] [form] Improve handling of domain objects in forms [RFC] [Form] Improve handling of domain objects in forms Jul 18, 2020
@garak
Copy link
Contributor

garak commented Jul 19, 2020

I don't get the point of this issue.
If you're using a DTO (that stands for Data Transfer Object, by the way), you should bind your form to it, not to your entity.
Then, after validation, entity should be created from DTO.

@jkufner
Copy link

jkufner commented Jul 19, 2020

One more thing to consider are immutable objects. A form receives an immutable object, and when the form is submitted and valid, it returns another immutable object that contains the submitted data.

Such an immutable object could be as simple as this:

class Tag {
    private $id;
    private $name;
    function __construct($id, $name) { $this->id = $id; $this->name = $name; }
    function getId() { return $this->id; }
    function getName() { return $this->name; }
}

So far I've had some success with a custom data mapper class (that implements DataMapperInterface), but it somewhat lacks elegance (and includes a code generator).

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 20, 2020

@garak
If you're using a DTO (that stands for Data Transfer Object, by the way), you should bind your form to it, not to your entity.
Then, after validation, entity should be created from DTO.

Let's consider the following Entity and DTO: https://gist.github.com/alcaeus/0b66968ee505ab24bdb71fe7c88325a7.

As you can see, the only job the DTO has is to map the form to the Category API, as this is not supported by the Form framework. I've left out validation on purpose, which I'll explain later down the line. The form for this DTO is simple, if we ignore the fact that we can't tell the form that it should never try to call a factory if no DTO was provided. A combination of data validation before applying to the entity (kind of a built-in DTO) as well as making the set and get accessors for a field customisable (to correctly map the value to the rename method) makes the entire DTO obsolete. Granted, this is a relatively simple DTO, but in my experience most DTOs are this simple or only slightly more complex. For very complex cases, it is still feasible to create a testable DTO instead of relying on the form component to do a sensible data handling.

As for validation, there is one big reason I've tried to keep validation out of the entity: it makes people think that the entity is validated before it is stored. This is obviously incorrect, as the entity manager will not use the validator component to validate an entity when it is persisted. Validation is triggered by the form framework, so if an entity is created without the input of a form, validation is entirely bypassed and invalid data could be stored. There are a few ways to work around this issue:

  • Trigger validation in an ORM listener: for example, a listener could catch an onFlush event and validate every entity that is marked to be stored to the database. If an entity fails validation, this aborts the entire flush without storing any data to the database. This has a few drawbacks: for one, data that is perfectly fine would be skipped, as it's very difficult for a generic event listener to decide whether the UnitOfWork is in a correct state without the invalid entity being applied to the database. Further to the point, we've previously used explicit change tracking, where you need to call persist on a managed entity to mark it as needing persisting to the database. The UnitOfWork will no longer check each managed object for changes, which significantly speeds up flush operations on a large UnitOfWork (>100 managed entities). This means that we generally flushed the entity manager not in a controller action, but rather in a shutdown listener. Controllers marked entities as needing persisting, as would event listeners, state machine transactions, and so on. When everything was said and done, all data belonging to the API call that was just made was either persisted or thrown away completely while logging lots of data to an error log. If validation was only triggered then, this is too late for the code to try and recover from the invalid data.
  • One could go ahead and validate each entity separately when it is being persisted. The problem here is that persist operations need to cascade (to account for relationships between entities), but validation should occur after those entities have been handled as well. This means that if validation fails, all changes that were made to the UnitOfWork will have to be reverted so as to not jeopardise the flush operation that will invariably follow on shutdown.

@jkufner
One more thing to consider are immutable objects. A form receives an immutable object, and when the form is submitted and valid, it returns another immutable object that contains the submitted data.

Such an immutable object could be as simple as this: [...]

When thinking about entities, immutable objects make little sense: if the identifier of the entity stays the same, then you are not really dealing with an immutable entity. Even worse, an immutable entity like this would cause two entities with the same identifier, where the UnitOfWork on persist needs to replace a managed object with another, potentially modified version. This isn't feasible at all. As for dealing with an immutable DTO, I don't think this makes sense, as the DTO is created for the sole purpose of holding data for the form and aid in mapping this data to the entity. It is valid for a single form bind only, thus negating any need for immutability.

For the sake of the argument, when creating an immutable entity from the Category example above, I'd always make the API behave like for ImmutableDateTime vs. DateTime, where the accessors are the same but the accessor returns a modified object:

final class Category
{
  private string $name;

  public static function fromName(string $name): self
  {
    return new self($name);
  }

  private function __construct(string $name)
  {
    $this->name = $name;
  }

  public function getName(): string
  {
    return $this->name;
  }

  public function rename(string $name): self
  {
    return self::fromName($name);
  }
}

Granted, the form system could handle these better (e.g. by replacing its own data with the returned object instead of applying nested fields individually), but I feel this would expand the scope of this RFC too much. Nonetheless, this is still a valid improvement for the form component that could make working with such objects a lot easier.

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 20, 2020

For the benefit of preserving the discussion, here are some points that came up in Slack (all brought up by the great @linaori)

"Change form validation rules"
I think this should be made broader:

  • Form takes data object and reads the metadata + structure
  • Initial data is copied to a data structure
  • Any form related stuff such as applying changes from Request, validation etc
  • When form is valid apply data to original object
    Though this would be a big change and event listeners wouldn't be able to properly deal with this. It would solve the whole issue with persistance though!

So the reason I didn't talk about a DTO is because this works both with entities and DTOs. When using a DTO, the form will validate data before applying it to the DTO, taking away the need to do basic validation (e.g. are you using a valid email address). The DTO then gets a valid set of data and can do additional work with that data, such as more complex validation that should be done in a second step. The fact that the DTO then needs to be applied to the original entity is no longer the form's concern - it's only job is to map request data to an object, then let the user do something with the object. Remember, the DTO could also be the data that is given to a special entity manager that is handled by the entity manager:

$entityManager->apply($entity, $dto);

This would take the entity, apply data from the DTO according to some mapping/contract, then persist the entity to mark it for changes. The advantage is that depending on how this is built, changeset computation becomes significantly cheaper, as the dto basically defines what data changes. This does away with the need for specific comparison. However, this is far outside of the scope of this, but a great addition to the whole persistence layer (good luck getting that to play nice with Doctrine though...)

Another thing: set and get are clear, but what about adding/removing to a collection?
The collection form type could add additional properties for add and remove. More importantly, the API may be very different from what the form expects. Currently, changes are applied like this:

$data->collectionField->add(...);

However, the API would very often look like this:

$data->addCollectionItem(...);

Reason for this is that the Doctrine collections that are passed here are always mutable, so users may not want to expose the collection object to avoid users modifying it at will. On the other hand, exposing all items in an array is similarly dangerous, as it negates any performance benefits one would get from using lazy collections (e.g. only fetch and hydrate data when required to do so). To cite an example, MongoDB returns a cursor with a single batch of data (the size of which can be configured when running the query), so a result collection based on a cursor (e.g. an inverse reference) would not need to be initialised completely and wouldn't consume lots of memory. Only when items are needed will they be fetched from the server as necessary and add to the memory footprint. Hence, I would opt for allowing to change the add and remove accessors, and saving the rest for later:

$builder->add('collection', CollectionType::class, [
    'set' => null, // Or anything else that disallows calling the setter for the field (important to not replace managed collections)
    'add' => fn (Category $category, Item $item): void => $category->addItem($item),
    'remove' => fn (Category $category, item $item): void => $category->collection->remove($item), // Silly, I know, but I wanted to should the default and that the accessors could be significantly different.
]);

@garak
Copy link
Contributor

garak commented Jul 20, 2020

Let's consider the following Entity and DTO: https://gist.github.com/alcaeus/0b66968ee505ab24bdb71fe7c88325a7.

Still, I don't see the problem with a form in proposed DTO.
And I don't understand why you keep talking about entity validation, when validation should be applied to DTO. Entity should be always valid.

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 20, 2020

The idea is to make the DTO part of the form component, so users only have to add small bits of logic (e.g. a custom setter) to integrate it into a boilerplate dto. Call it a DTOBuilder similar to the FormBuilder if you must. When that is the case, the entity will always be valid, but could theoretically still be validated again after applying data from the DTO (which was proven valid earlier).

@ro0NL
Copy link
Contributor

ro0NL commented Jul 20, 2020

honestly, IMHO we should get rid of mapped, data_class, data, empty_data options in favor of e.g. a single handler callback option. Making factorization and hydration much more explicit.

furthermore, a usecase i find myself often implementing is combining N form fields to hydrate a single object field, e.g. setName($form['firstName'], $form['lastName'])

another, IMHO valid point, was raised in slack to make model / view normalization more explicit. Currently there's a "normalized data" concept in betweens, whereas i think it can work directly like model2view + view2model.

last but least, having a "post submit and validation" event would be nice. To encapsulate more logic in the form type, rather then in the controller after isValid().

@jkufner
Copy link

jkufner commented Jul 20, 2020

@alcaeus Don't assume there is only Doctrine and its Unit of Work, please. The difficulty with Symfony Forms is because of this assumption that provides some expectations of how the model layer behaves.

Consider the following (simplified) example of a controller:

    function controller(Request $request, $id) {
        $entityObject = $repository->findById($id);
        $origImmutableDto = ImmutableDto::fromEntity($entityObject);
        $form = $this->createForm(ImmutableDtoType::class, $origImmutableDto);
        $form->handleRequest($request);
        if ($form->isSubmitted() && $form->isValid()) {
            $newImmutableDto = $form->getData();
            // Do something wiith $origImmutableDto and $newImmutableDto
            $newImmutableDto->applyToEntity($entityObject);
        }
    }

In this example we used the immutable objects to communicate with the form and then to update the mutable entity object. However, the repository may not use ORM and the Unit of Work. The following example shows the use of data access object that does not track entities; it only calls select and update queries:

    function controller(Request $request, $id) {
        $origImmutableDto = $daoRepository->findById($id)
        $form = $this->createForm(ImmutableDtoType::class, $origImmutableDto);
        $form->handleRequest($request);
        if ($form->isSubmitted() && $form->isValid()) {
            $newImmutableDto = $form->getData();
            // Do something wiith $origImmutableDto and $newImmutableDto
            $daoRepository->update($origImmutableDto, $newImmutableDto);
        }
    }

The form in the both cases behaves exactly in the same way — it receives an immutable object and returns a new immutable object. The immutability shields the form from the repository internals and entity behavior.

@jkufner
Copy link

jkufner commented Jul 20, 2020

As for the validation: I really really like the concept that the form should return valid data only — it is an extremely important change because of the typed properties in PHP 7.4.

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 20, 2020

Everyone, thanks for your input. I've taken some of it and incorporated it into the original idea, and I've come up with a Proof-of-concept pull request: #37614. The PR describes basic and advanced usage, so you can see if your scenario would be covered. I encourage you all to leave your feedback, as it is highly appreciated!

To address some other points:

@ro0NL
honestly, IMHO we should get rid of mapped, data_class, data, empty_data options in favor of e.g. a single handler callback option. Making factorization and hydration much more explicit.

Yes, and no. A single handler IMO is too much, but I agree that the options above are questionable to say the least. mapped could be done away with when using accessors (I have an idea but want to try it out in the POC before bringing it up).

furthermore, a usecase i find myself often implementing is combining N form fields to hydrate a single object field, e.g. setName($form['firstName'], $form['lastName'])

This is one of the examples shown in the PR as advanced usage. The RichModelFormsBundle does this by using the same property path for multiple fields, which doesn't work here. I believe the use-case is covered, but I'd like to hear your opinion.


@jkufner
another, IMHO valid point, was raised in slack to make model / view normalization more explicit. Currently there's a "normalized data" concept in betweens, whereas i think it can work directly like model2view + view2model.

I believe this is an important idea, but far outside of the scope here: this proposals changes nothing about how view, norm, and model data are handled. I do believe it is worthy of its own discussion though, so please feel free to create a separate issue.

last but least, having a "post submit and validation" event would be nice. To encapsulate more logic in the form type, rather then in the controller after isValid().

I agree. The current proof of concept allows for simple errors to be handled. A TypeError when using strict types is caught, as are any Symfony\Component\Form\Exception\ExceptionInterface exceptions. This allows you to either throw an exception in the accessor based on your own logic, or you can implement this exception in your domain exception. I'm thinking about how to easily make this list extensible.

Don't assume there is only Doctrine and its Unit of Work, please. The difficulty with Symfony Forms is because of this assumption that provides some expectations of how the model layer behaves.

Apologies, I should know better than that. This PR assumes that the "model layer" of the form is something an accessor can be called on. You can receive a scalar value to access (e.g. a string that has to be run through a decoder to make it accessible), or you can receive an object. You can pass an immutable object, or you can even combine this with a DTO for some extra sugar. Let's take your first example, where the DTO needs to be applied to the entity. In this example, the DTO is mutable:

    function controller(Request $request, $id) {
        $entityObject = $repository->findById($id);
        $dto = MutableDto::fromEntity($entityObject);
        $form = $this->createForm(MutableDtoType::class, $dto);
        $form->handleRequest($request);
        if ($form->isSubmitted() && $form->isValid()) {
            $entityManager->persist($entityObject);
            $entityManager->flush();
        }
    }

In this case, "all" you have to do is to add a listener on the form that checks if the form was valid on POST_SUBMIT and apply the DTO to the entity. Any exceptions while applying can still be added as form errors, triggering the isValid check after handleRequest. This makes your DTO behave like an entity for all intents and purposes, and allows multiple layers of validation:

  • When binding data to the form by specifying accessors that handle type errors and domain exceptions
  • When the dto is validated using the form validator extension
  • When the dto is applied to the original entity

This allows for maximum flexibility and leaves your looking like any regular form and treat it like it handled the entity directly.

Your second example spices this up by passing the original and modified DTO. Here, you would change all 'set' accessors to return the new object, triggering the immutability handling. In a POST_SET_DATA listener you store a copy of this immutable data for later use. In the POST_SUBMIT listener you again take the new DTO (which is a new object), but instead of calling apply on the DTO you pass it to the $daoRepository::update method, along with the original data that you've stored in the previous listener. The form for an entity can know about this, but you can also inject this behaviour in a controller.

@jkufner
Copy link

jkufner commented Jul 20, 2020

furthermore, a usecase i find myself often implementing is combining N form fields to hydrate a single object field, e.g. setName($form['firstName'], $form['lastName'])

I see four cases here:

  • 1:1 mapping: One field is mapped to one property of a DTO. Getter and setter may differ, but it is a simple and very common situation. It is important to properly handle nulls and uninitialized typed properties. Target DTO may not be the same as the source DTO.
  • M:1 mapping: Multiple fields form a single property of a DTO. Difficulty is how to specify which fields shoud be passed to the setter. A generic but uncomfortable way is to use a DataMapper. Another way could be to group these fields together into a nested type/form/something; then we can set a single setter for this group.
  • 1:N mapping: Typically a collection, where a single property of a DTO creates a list of fields. Similarly to N:1 case, a nested type represents each item in the collection.
  • M:N mapping: Not sure what this case is; just use a DataMapper or better think again.

@alcaeus
Copy link
Contributor Author

alcaeus commented Jul 20, 2020

furthermore, a usecase i find myself often implementing is combining N form fields to hydrate a single object field, e.g. setName($form['firstName'], $form['lastName'])

I see four cases here:

For these cases, I would suggest the following:

  • 1:1 mapping: both mutable and immutable DTO are supported. Accessors can be used to add custom logic when reading and writing values (e.g. calling strrev or redacting values).
  • M:1 mapping: can be done by changing the accessors on the form itself instead of doing so on nested form fields.
  • 1:N mapping: this is a collection. CollectionType (or whatever that list type is called) would get support for 'add' and 'remove' accessors in a second step. Logic and behaviour is the same.
  • M:N mapping: this would always require a custom data mapper with more logic. Would not recommend.

@jkufner
Copy link

jkufner commented Jul 20, 2020

M:1 mapping: can be done by changing the accessors on the form itself instead of doing so on nested form fields.

The question is, which object should know how to map form fields to DTO properties. The form itself should only know that there are nested form fields for the DTO, nothing more. DTO should not be bound to the particular form — it is the form that uses the DTO, not the other way around. Therefore, the form fields are the only remaining thing that should know about the mapping. Or am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Form RFC RFC = Request For Comments (proposals about features that you want to be discussed)
Projects
None yet
6 participants