-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Serializer][CSV] Add context options to handle BOM #33896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/Symfony/Component/Serializer/Tests/Encoder/CsvEncoderTest.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When decoding a value, we should always deal with the BOM to me, per the robustness principle.
Then, does it really make sense to add an option to prefix the BOM in the output, when the alternative is just a concatention away? I'm not sure...
I agree with @nicolas-grekas that we should always handle BOM in the input. |
@nicolas-grekas so you're proposing we always add BOM to output and handle it in input (if present)? |
I'm proposing to never add the BOM in the output, and always handle it in the input. |
I'm 👎 for never adding BOM in the output as then I'm as a consumer responsible for checking whether data is serialized to CSV and if so, add BOM to it. |
@malarzm as @nicolas-grekas said, you can still concatenate the BOM before the serialized data. |
@nicolas-grekas @stof I'm not sure when exactly you want to manually concatenate BOM, may I ask for an example how do you see it? |
ebe7fd3
to
f6b1405
Compare
@nicolas-grekas I've removed the ByteOrderMask class keeping only UTF-8 BOM in the encoder itself, the BOM is always stripped while decoding and, for now, I've left a bool switch to include BOM in encoded string as I believe this is encoder's responsibility. |
93602d0
to
35d8e61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo
@@ -5,6 +5,7 @@ CHANGELOG | |||
----- | |||
|
|||
* deprecated the `XmlEncoder::TYPE_CASE_ATTRIBUTES` constant, use `XmlEncoder::TYPE_CAST_ATTRIBUTES` instead | |||
* added option to output an UTF-8 BOM in CSV encoder via `CsvEncoder::OUTPUT_UTF8_BOM_KEY` context option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a UTF-8
@@ -114,6 +119,14 @@ public function encode($data, $format, array $context = []) | |||
$value = stream_get_contents($handle); | |||
fclose($handle); | |||
|
|||
if ($outputBom) { | |||
if (!preg_match('//u', $value)) { | |||
throw new UnexpectedValueException('You are trying to add an UTF-8 BOM to a non UTF-8 text.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a UTF-8
@fabpot thanks for noticing, fixed :) |
Thank you @malarzm. |
…alarzm) This PR was merged into the 4.4 branch. Discussion ---------- [Serializer][CSV] Add context options to handle BOM | Q | A | ------------- | --- | Branch? | 4.4 | Bug fix? | no | New feature? | yes | Deprecations? | no | Tickets | Fix #33684 | License | MIT | Doc PR | symfony/symfony-docs#12461 This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature. Personally I'd propose to make `SKIP_INPUT_BOM` default to `false` in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it. Commits ------- 3eb3668 Add context options to handle BOM
This PR was merged into the 4.4 branch. Discussion ---------- List CSV encoder's context options I needed to add new option `output_utf8_bom` introduced in symfony/symfony#33896 and since context options weren't documented I figured I'll list them all. Format and text is based on XML's part of the docs. Commits ------- b600aa0 List CSV encoder's context options
This allows BOM handling in en/decoded CSV files. To keep current behaviour intact both skipping BOM at the beginning of the CSV and outputting BOM are an opt-in feature.
Personally I'd propose to make
SKIP_INPUT_BOM
default tofalse
in 5.0 so the BOM is transparent and people that for some reasons expect BOM characters to be present in the parsed text explicitly opt-out of trimming it.