Skip to content

[Serializer][CSV] Add context options to handle BOM #33896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 13, 2019

Conversation

malarzm
Copy link
Contributor

@malarzm malarzm commented Oct 7, 2019

Q A
Branch? 4.4
Bug fix? no
New feature? yes
Deprecations? no
Tickets Fix #33684
License MIT
Doc PR symfony/symfony-docs#12461

This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature.

Personally I'd propose to make SKIP_INPUT_BOM default to false in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it.

@malarzm malarzm requested a review from dunglas as a code owner October 7, 2019 22:15
@malarzm malarzm changed the title Add context options to handle BOM [Serializer][CSV] Add context options to handle BOM Oct 7, 2019
@nicolas-grekas nicolas-grekas added this to the next milestone Oct 8, 2019
Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When decoding a value, we should always deal with the BOM to me, per the robustness principle.
Then, does it really make sense to add an option to prefix the BOM in the output, when the alternative is just a concatention away? I'm not sure...

@stof
Copy link
Member

stof commented Oct 8, 2019

I agree with @nicolas-grekas that we should always handle BOM in the input.
And I'm not sure we need the ByteOrderMask class.

@malarzm
Copy link
Contributor Author

malarzm commented Oct 8, 2019

@nicolas-grekas so you're proposing we always add BOM to output and handle it in input (if present)?

@nicolas-grekas
Copy link
Member

I'm proposing to never add the BOM in the output, and always handle it in the input.

@malarzm
Copy link
Contributor Author

malarzm commented Oct 8, 2019

I'm 👎 for never adding BOM in the output as then I'm as a consumer responsible for checking whether data is serialized to CSV and if so, add BOM to it.

@stof
Copy link
Member

stof commented Oct 8, 2019

@malarzm as @nicolas-grekas said, you can still concatenate the BOM before the serialized data.

@malarzm
Copy link
Contributor Author

malarzm commented Oct 8, 2019

@nicolas-grekas @stof I'm not sure when exactly you want to manually concatenate BOM, may I ask for an example how do you see it?

@malarzm malarzm force-pushed the csv-bom branch 2 times, most recently from ebe7fd3 to f6b1405 Compare October 8, 2019 19:01
@malarzm
Copy link
Contributor Author

malarzm commented Oct 8, 2019

@nicolas-grekas I've removed the ByteOrderMask class keeping only UTF-8 BOM in the encoder itself, the BOM is always stripped while decoding and, for now, I've left a bool switch to include BOM in encoded string as I believe this is encoder's responsibility.

Copy link
Member

@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo

@@ -5,6 +5,7 @@ CHANGELOG
-----

* deprecated the `XmlEncoder::TYPE_CASE_ATTRIBUTES` constant, use `XmlEncoder::TYPE_CAST_ATTRIBUTES` instead
* added option to output an UTF-8 BOM in CSV encoder via `CsvEncoder::OUTPUT_UTF8_BOM_KEY` context option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a UTF-8

@@ -114,6 +119,14 @@ public function encode($data, $format, array $context = [])
$value = stream_get_contents($handle);
fclose($handle);

if ($outputBom) {
if (!preg_match('//u', $value)) {
throw new UnexpectedValueException('You are trying to add an UTF-8 BOM to a non UTF-8 text.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a UTF-8

@malarzm
Copy link
Contributor Author

malarzm commented Oct 13, 2019

@fabpot thanks for noticing, fixed :)

@fabpot
Copy link
Member

fabpot commented Oct 13, 2019

Thank you @malarzm.

fabpot added a commit that referenced this pull request Oct 13, 2019
…alarzm)

This PR was merged into the 4.4 branch.

Discussion
----------

[Serializer][CSV] Add context options to handle BOM

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #33684
| License       | MIT
| Doc PR        | symfony/symfony-docs#12461

This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature.

Personally I'd propose to make `SKIP_INPUT_BOM` default to `false` in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it.

Commits
-------

3eb3668 Add context options to handle BOM
@fabpot fabpot merged commit 3eb3668 into symfony:4.4 Oct 13, 2019
@malarzm malarzm deleted the csv-bom branch October 13, 2019 19:47
@nicolas-grekas nicolas-grekas modified the milestones: next, 4.4 Oct 27, 2019
This was referenced Nov 12, 2019
wouterj added a commit to symfony/symfony-docs that referenced this pull request Oct 3, 2020
This PR was merged into the 4.4 branch.

Discussion
----------

List CSV encoder's context options

I needed to add new option `output_utf8_bom` introduced in symfony/symfony#33896 and since context options weren't documented I figured I'll list them all. Format and text is based on XML's part of the docs.

Commits
-------

b600aa0 List CSV encoder's context options
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serializer] Add BOM support to CSV Encoder
6 participants