Skip to content

[Serializer][FrameworkBundle] Add a CSV encoder #19197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

dunglas
Copy link
Member

@dunglas dunglas commented Jun 27, 2016

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets n/a
License MIT
Doc PR todo

Usage:

use Symfony\Component\Serializer\Serializer;
use Symfony\Component\Serializer\Encoder\CsvEncoder;
use Symfony\Component\Serializer\Normalizer\ObjectNormalizer;

$serializer = new Serializer(array(new ObjectNormalizer()), array(new CsvEncoder()));
// or $serializer = $container->get('serializer'); when using the full stack framework
$serializer->encode($something, 'csv');
$serializer->decode(<<<'CSV'
id,name
1,Kévin
CSV
, 'csv');

CSV files must contain a header line with property names as keys.

ping @clementtalleu @Simperfit @gorghoa

@@ -977,6 +978,13 @@ private function registerSerializerConfiguration(array $config, ContainerBuilder
$definition->addTag('serializer.normalizer', array('priority' => -900));
}

if (class_exists('Symfony\Component\Serializer\Encoder\CsvEncoder')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use ::class here also

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird (accessing a constant on a class that doesn't exist) but I confirm it works.

@Simperfit
Copy link
Contributor

LGTM 👍

$result = array();
$this->flatten($value, $result);

if (!$headersAdded) {
Copy link

@gorghoa gorghoa Jun 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dunglas, instead of this condition, why don’t you just set csv headers before the foreach ?

forget it, you need to extract the keys from the value… sorry.

@gorghoa
Copy link

gorghoa commented Jun 28, 2016

@dunglas , thanks for your work ;)

Still, I may be wrong, but I think this case may be problematic with your implementation:

Given an entity (Person) with an 0,n attribute association (Addresses).
Then, Person instances may have an array of Addresses of different length.
Hence, the length of the results once filled by the flatten method may be different from one row to another.
Since you are guessing headers form the first row. Headers and rows may be incoherent.

I know it’s a pretty annoying use case but I think that flattening in this is case may be more a con that a pro. A workaround could be a kind of serialization of the related entities and merge them together in one csv column. Not very elegant but maybe more robust in term of cohesion of the generated file.


$this->assertEquals($expected, $this->encoder->decode(<<<'CSV'
foo,bar.baz.bat
a,b,c
Copy link
Contributor

@Simperfit Simperfit Jun 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why there is c here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test that it is silently ignored.

@dunglas
Copy link
Member Author

dunglas commented Jun 28, 2016

@gorghoa another solution can be to generate something like that:

id,name,address.0,address.1,address.2,address.3,address.4,gender
1,Kévin,A,B,,,,m
2,Rodrigue,A,B,C,D,E,m

WDYT?

@gorghoa
Copy link

gorghoa commented Jun 28, 2016

@dunglas , that may work but I still see two flaws:

1st, we’ll need to iterate through all the rows first in order to generate the headers (EDIT, we may compute them during the rows iterations and put them as first line of our file at the end of the loop. Loosing the possibility to stream content as it goes, though).

2nd, If another 0,n attribute (say phones) is present, we can’t use the trick of the last facultatives columns (or we have to fill empty columns with null value).

Finally, I think that, with this dynamic set of column, it will be more complex for consumer systems (apart from excel and associates) to handle the csv representation.

return $result;
}

// If there is only one data line in the document, return it is not considered has a collection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand well the last part of this phrase: [...] return it is not considered has a collection

Copy link
Contributor

@greg0ire greg0ire Jun 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be // If there is only one data line in the document, return it (the line), the result is not considered as a collection (has => as). Not sure this is very good design : if you can't be sure you can't get one-line long documents, you are going to have to write a special case just for this in client code. I think there should be a dedicated method for this use case if it is really needed, that would throw an exception if more than one line is found. BTW, this does not look like it is unit tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this is to allow encoding single items (e.g. /product/1) as a CSV document containing 1 one line without having to tune the normalizer. And this is unit tested (testEncode/testEncodeCollection / testDecode/testDecodeCollection).

@dunglas
Copy link
Member Author

dunglas commented Jun 28, 2016

It's an implementation detail for 1 and 2 (and I've an idea of how do that, even if it will decrease a bit performances).

IMO this approach is still better - even for naive clients - than generating and output not covered by the RFC (like columns in the column).

@dunglas
Copy link
Member Author

dunglas commented Jun 28, 2016

@gorghoa I've looked at how various projects handle that (Sales Force, Drupal) and they all use a solution similar to yours. So after a thinking again, I'll implement what you suggest.

Probably something like that:

id,name,groups
1,Kévin,"a,b,c"

@javiereguiluz
Copy link
Member

@dunglas maybe we should also ask for the opinion of Sonata guys (mostly @soullivaneuh and @greg0ire) because they have experience creating CSV writers (they even have a separate project for that: https://github.com/sonata-project/exporter)

fwrite($handle, $data);
rewind($handle);

$headers = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this var is intended to store an array later, it can be initialized as an empty array IMO. I don't see any special case null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get your point @roukmoute. Can you please show where is this happening?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep is as is to have a code as close as possible in encode and decode (and in encode $headers cannot be initialized as an empty array because the normalized data can contain an empty array).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phansys: sorry I write a mistake.

@dunglas
Copy link
Member Author

dunglas commented Jun 28, 2016

IIRC, the Sonata Exporter has no special handling for multiple values. You need to handle this case yourself by adding a special method to the entity returning a string.

@gorghoa
Copy link

gorghoa commented Jun 28, 2016

@dunglas: Can’t argue against the RFC argument ;) I’m ok then with the elastic columns set. Sorry, I’ve missed your final comment… 👍

/**
* {@inheritdoc}
*/
public function encode($data, $format, array $context = array())
Copy link
Contributor

@greg0ire greg0ire Jun 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if $data is huge ? It would be great to have an interator here instead be able to use an iterator here. Not sure line 50 is compatible with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be an array because of the current design of the Serializer (Normalizers return arrays).

@greg0ire
Copy link
Contributor

@dunglas maybe we should also ask for the opinion of Sonata guys (mostly @soullivaneuh and @greg0ire) because they have experience creating CSV writers (they even have a separate project for that: https://github.com/sonata-project/exporter)

Not speaking for @soullivaneuh here, as far as I'm concerned, I'm just maintaining these writers, I didn't actually create them. I think a good inspiration source could be https://github.com/ddeboer/data-import/blob/master/src/Writer/CsvWriter.php cc @sagikazarmark @Baachi @ddeboer (input from those guys would be great IMO)

@dunglas
Copy link
Member Author

dunglas commented Jun 28, 2016

There is another issue with the solution proposed by @gorghoa. If it's a collection of complex structures (like a collection of associative arrays), it becomes impossible to display it (or it will require to include a full CSV document inside the cell, and it looks bad to me). However, the flatten column solution works as expected:

name,address.0.line1,address.0.line2,address.1.line1,address.1.line2

Thought?

}

rewind($handle);
$value = stream_get_contents($handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here, what if $data is huge? $value will be too. Not sure you can do anything against that though, it depends if Symfony can work with a stream rather than a string…

Copy link
Member Author

@dunglas dunglas Jun 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently, but it impacts all other decoders as well. However it would be a nice addition to the component to be able to deal with streams.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed! Php has yield now, which should make this kind of thing easier to work with (What if normalizers yielded values instead of returning a huge array they build ?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be done in Symfony 3 because the minimal PHP version is now 5.5. However encoders must stay backward compatible with normalizers returning raw arrays (they must support both cases).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I create a new issue about this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding that to my todo list!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19238

@greg0ire
Copy link
Contributor

greg0ire commented Jun 29, 2016

There is another issue with the solution proposed by @gorghoa. If it's a collection of complex structures (like a collection of associative arrays), it becomes impossible to display it (or it will require to include a full CSV document inside the cell, and it looks bad to me). However, the flatten column solution works as expected:

I think CSV is not the best format to represent deep data structures, and that you should simply throw an exception if the input data goes to deep. If people want to flatten things, they can still do it before feeding the result to your writer, which should IMO stay focused on doing one thing and doing it well.

If you still feel a strong need however, then you should probably write a flattener class that would be able to wrap this one (which would be best with a stream friendly serializer :trollface: )

@dunglas
Copy link
Member Author

dunglas commented Jul 1, 2016

@greg0ire's idea implemented (It's by far the best idea. Thanks.) and other comments fixed.

@greg0ire
Copy link
Contributor

greg0ire commented Jul 1, 2016

I don't get it… Is it me or is your last push is 4 days old ? 😕 I don't see anything new, but then again github has been having strange caching issues lately.

@Simperfit
Copy link
Contributor

@greg0ire he rebased, this is why ^^

@chalasr
Copy link
Member

chalasr commented Jul 1, 2016

@greg0ire My guess is that the change is here.

$headers = array_keys($result);
fputcsv($handle, $headers, $this->delimiter, $this->enclosure, $this->escapeChar);
} elseif (array_keys($result) !== $headers) {
throw new InvalidArgumentException('To use the CSV encoder, each line in the data array must have the same structure. You may want to use a custom normalizer class to normalize the data format before passing it to the CSV encoder.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe linebreak this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you convinced me !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabpot even if I'm personnaly convinced about no linebreak for method and class signatures, I don't really understand why we can't linebreak this thing? It's a new class, no merge conflict possible here.

Could you tell us why? I'm curious. :-) Maybe a coding standard I missed?

I think we can at least put the string argument onto a new line.

Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the code to be dense vertically. Nowadays, screens have a lot of horizontal space. And anyway, this is just an error message, I don't want exceptions to take too much screen space when reading the code.

Copy link
Contributor

@jvasseur jvasseur Jul 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should be keep on one line to allow searching them more easily in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowadays, screens have a lot of horizontal space

That's why we disagree apparently : I use the fact that my screen is big to split windows and often have 4 files on the same screen. I think this is the very first factor of speed on my workflow, because it makes navigating files way easier than browsing through tabs IMO, but I know many people seem to like to use their screen for one file at a time, while they often need to switch between a controller, a service, a view and a translation file in applications, a class and its test in libraries.

Other things to take into consideration is that Github (for instance) will not display this nicely, and also long lines are harder to edit : people scroll way faster vertically than horizontally.

But anyway, your repo, your rules, so thanks for taking the time to give your point of view even if you're probably very busy.

Copy link
Contributor

@greg0ire greg0ire Jul 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should be keep on one line to allow searching them more easily in the code.

That's if you linebreak them in the middle of a sentence / proposition. This is still quite greppable :

throw new InvalidArgumentException(
    'To use the CSV encoder, 
each line in the data array must have the same structure. 
You may want to use a custom normalizer class to normalize the data format 
before passing it to the CSV encoder.'
);

@greg0ire
Copy link
Contributor

greg0ire commented Jul 1, 2016

@greg0ire he rebased, this is why ^^

Rebasing means you have to push again and rewrite commits, but ok…

@jvasseur
Copy link
Contributor

jvasseur commented Jul 1, 2016

@greg0ire but rewriting commits can keep the original commit date, and github use the commit date, not the push date.

@greg0ire
Copy link
Contributor

greg0ire commented Jul 1, 2016

@greg0ire but rewriting commits can keep the original commit date, and github use the commit date, not the push date.

Makes sense 💡 Thanks!

@dunglas
Copy link
Member Author

dunglas commented Jul 1, 2016

@rybakit

@dunglas What about my comment for the case when $data = []? The check in line #50 will do comparison against range(0, -1) which evaluates into [0, -1].

I've added 2 unit tests for empty arrays and empty documents (and I fixed a bug in the decoder thanks to this test).

@dunglas
Copy link
Member Author

dunglas commented Jul 10, 2016

Rebased.

ping @symfony/deciders

@dunglas
Copy link
Member Author

dunglas commented Jul 17, 2016

ping @symfony/deciders

* @param array $result
* @param string $parentKey
*/
private function flatten(array $array, array &$result, $parentKey = '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be moved after public methods

@mcfedr
Copy link
Contributor

mcfedr commented Jul 27, 2016

Idea that might be useful - a way of customising the column names.
In most cases a Normalizer would allow you to use custom column names, but for nested columns you will always have the "." between the names.

There might be value in being able to pass a mapping of names in the serializer context

An example, you might have had this output:

name,address.street,address.city
Fred,A Street,A City

But with a name mapping:

[
   'address.street' = 'Street',
   'address.city' = 'City
]

Changes to

name,Street,City
Fred,A Street,A City

It would be a simple look up when the headers are set, but allow much more flexible CSV generation.

@dunglas
Copy link
Member Author

dunglas commented Aug 16, 2016

@mcfedr good idea but IMO it deserves its own PR. @symfony/deciders are you ok to merge this one?

@fabpot
Copy link
Member

fabpot commented Sep 14, 2016

Thank you @dunglas.

@fabpot fabpot closed this in 47657e5 Sep 14, 2016
@fabpot fabpot mentioned this pull request Oct 27, 2016
xabbuh added a commit to symfony/symfony-docs that referenced this pull request Dec 12, 2016
This PR was submitted for the master branch but it was merged into the 3.2 branch instead (closes #7040).

Discussion
----------

[Serializer] Docs for CSV and YAML encoders

Docs for symfony/symfony#19197 and symfony/symfony#19326.

Commits
-------

5687972 [Serializer] Docs for CSV and YAML encoders.
}

// If there is only one data line in the document, return it (the line), the result is not considered as a collection
return $result[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouterj added a commit to symfony/symfony-docs that referenced this pull request Oct 13, 2017
…r (GuilhemN)

This PR was submitted for the 3.2 branch but it was merged into the 3.3 branch instead (closes #7654).

Discussion
----------

[Serializer] Document the CsvEncoder and the YamlEncoder

Document the CsvEncoder and the YamlEncoder introduced in symfony/symfony#19197 and symfony/symfony#19326.

Commits
-------

6944ea1 [Serializer] Document the CsvEncoder and the YamlEncoder
@dunglas dunglas deleted the csv_encoder branch December 6, 2017 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.