-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
CsvEncoder handling variable structures and custom header order #24256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0719b76
to
3c2aa91
Compare
fyi: the header merging of variable array structures is specifically optimized for collection like substructures, to maintain good ordering. the testcase included is not such a case |
3c2aa91
to
dd8de08
Compare
@dunglas would be nice if you could review, since the original impl is done by you |
@backbone87 I changed the base branch to 3.4 because that's the correct target. Could you please rebase on 3.4 on your side? |
dd8de08
to
d5060ac
Compare
@nicolas-grekas done |
@@ -177,7 +180,7 @@ public function supportsDecoding($format) | |||
* @param string $keySeparator | |||
* @param string $parentKey | |||
*/ | |||
private function flatten(array $array, array &$result, $keySeparator, $parentKey = '') | |||
protected function flatten(array $array, array &$result, $keySeparator, $parentKey = '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid adding protected
methods? Decoration is usually a better extension point (apply for all methods), and it's easier to maintain for us...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the flatten function is pure and can be made public static
about the options: you have no way to access the default settings of the CsvEncoder, which requires duplicating the settings (in code and DI) in either case (inheritance or composition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would be happy to make it private, because i totally understand your reasoning for it, but compositions or subtypes need to be able to access the settings, imho
edit: i actually tried to extend the CsvEncoder in my current project and add the features from this PR via inheritance, but in the end i needed to copy over the whole encoder. one could actually use decoration and prepare the data acoordingly, so the current encoder doesnt error out, but i still need to copy at least the key_separator setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, making things protected means that we have to maintain backward compatibility for it forever (well, until the next major version). And inheritance-based extension points are the most painful ones when dealing with BC layers.
So we are very careful about introducing them. Please keep this method private for now. Making it accessible to the outside should be a feature request on its own (with arguments justifying the increased maintenance cost)
$this->flatten($value, $flattened, $keySeparator); | ||
$value = $flattened; | ||
} | ||
unset($value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of unsetting this var explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a cleanup of the dangling reference var. if one would do $value = 'foo'
after the loop, he would overwrite the last iterated item in the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but $value
is not accessed after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its more of a safety for futher modifications (i do it always as a best practice), this gets lost fast and could cause confusion (though in this case is covered by testcases)
} elseif (array_keys($result) !== $headers) { | ||
throw new InvalidArgumentException('To use the CSV encoder, each line in the data array must have the same structure. You may want to use a custom normalizer class to normalize the data format before passing it to the CSV encoder.'); | ||
} | ||
$headers = array_merge(array_values($headers), array_diff($this->extractHeaders($data), $headers)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computing the header list dynamically like you've done is very interesting but it can be very costly on large datasets. Shouldn't it introduced as an option? It would be nice to be able to choose between the previous behavior and your implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be in the same complexity class as the rest of the encoder. we can reduce the cost a little bit by removing the array_keys
in the loop (that indeed gets executed for each item). but other than that its just n*m isset calls for the previously supported cases, which compared to normalization and actual CSV encoding, is negliable.
public function testEncodeNonFlattenableStructure() | ||
public function testEncodeVariableStructure() | ||
{ | ||
$value = array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
array(
'a' => array('a', 'b'),
'a' => 'foo',
)
This case looks not easy to handle, it's one of the reasons of the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this one gets handled just fine, but the result may be a little bit unexpected:
a.0,a.1,a
a,b,
,,foo
i dont think we need to handle this differently in the encoder, because that one is indeed a custom normalizers job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output is a bit weird but why not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw this is more a problem of the flattening algorithm, which is untouched by the PR.
compare it with the previously support case of:
[
['a' => ['foo', 'bar'], 'a.0' => 'baz'],
]
d5060ac
to
16c1988
Compare
16c1988
to
4aad92a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good improvements @backbone87!
Can you add a note in the component CHANGELOG before we merge? Thanks. |
@fabpot done, but i guess, i need to rebase? |
Thank you @backbone87. |
…der order (Oliver Hoff) This PR was squashed before being merged into the 3.4 branch (closes #24256). Discussion ---------- CsvEncoder handling variable structures and custom header order | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | yes | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | #23278 | License | MIT | Doc PR | TBD This PR improves the CsvEncoder to handle variable nesting structures and adds a context option that allows custom csv header order. Commits ------- d173494 CsvEncoder handling variable structures and custom header order
This PR improves the CsvEncoder to handle variable nesting structures and adds a context option that allows custom csv header order.