Skip to content

CsvEncoder handling variable structures and custom header order #24256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

backbone87
Copy link
Contributor

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #23278
License MIT
Doc PR TBD

This PR improves the CsvEncoder to handle variable nesting structures and adds a context option that allows custom csv header order.

@backbone87
Copy link
Contributor Author

fyi: the header merging of variable array structures is specifically optimized for collection like substructures, to maintain good ordering. the testcase included is not such a case

@backbone87
Copy link
Contributor Author

@dunglas would be nice if you could review, since the original impl is done by you

@nicolas-grekas nicolas-grekas changed the base branch from master to 3.4 September 20, 2017 05:25
@nicolas-grekas
Copy link
Member

@backbone87 I changed the base branch to 3.4 because that's the correct target. Could you please rebase on 3.4 on your side?

@nicolas-grekas nicolas-grekas added this to the 3.4 milestone Sep 20, 2017
@backbone87
Copy link
Contributor Author

@nicolas-grekas done

@@ -177,7 +180,7 @@ public function supportsDecoding($format)
* @param string $keySeparator
* @param string $parentKey
*/
private function flatten(array $array, array &$result, $keySeparator, $parentKey = '')
protected function flatten(array $array, array &$result, $keySeparator, $parentKey = '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid adding protected methods? Decoration is usually a better extension point (apply for all methods), and it's easier to maintain for us...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the flatten function is pure and can be made public static
about the options: you have no way to access the default settings of the CsvEncoder, which requires duplicating the settings (in code and DI) in either case (inheritance or composition)

Copy link
Contributor Author

@backbone87 backbone87 Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would be happy to make it private, because i totally understand your reasoning for it, but compositions or subtypes need to be able to access the settings, imho

edit: i actually tried to extend the CsvEncoder in my current project and add the features from this PR via inheritance, but in the end i needed to copy over the whole encoder. one could actually use decoration and prepare the data acoordingly, so the current encoder doesnt error out, but i still need to copy at least the key_separator setting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, making things protected means that we have to maintain backward compatibility for it forever (well, until the next major version). And inheritance-based extension points are the most painful ones when dealing with BC layers.
So we are very careful about introducing them. Please keep this method private for now. Making it accessible to the outside should be a feature request on its own (with arguments justifying the increased maintenance cost)

$this->flatten($value, $flattened, $keySeparator);
$value = $flattened;
}
unset($value);
Copy link
Member

@dunglas dunglas Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of unsetting this var explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its a cleanup of the dangling reference var. if one would do $value = 'foo' after the loop, he would overwrite the last iterated item in the array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but $value is not accessed after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its more of a safety for futher modifications (i do it always as a best practice), this gets lost fast and could cause confusion (though in this case is covered by testcases)

} elseif (array_keys($result) !== $headers) {
throw new InvalidArgumentException('To use the CSV encoder, each line in the data array must have the same structure. You may want to use a custom normalizer class to normalize the data format before passing it to the CSV encoder.');
}
$headers = array_merge(array_values($headers), array_diff($this->extractHeaders($data), $headers));
Copy link
Member

@dunglas dunglas Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computing the header list dynamically like you've done is very interesting but it can be very costly on large datasets. Shouldn't it introduced as an option? It would be nice to be able to choose between the previous behavior and your implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be in the same complexity class as the rest of the encoder. we can reduce the cost a little bit by removing the array_keys in the loop (that indeed gets executed for each item). but other than that its just n*m isset calls for the previously supported cases, which compared to normalization and actual CSV encoding, is negliable.

public function testEncodeNonFlattenableStructure()
public function testEncodeVariableStructure()
{
$value = array(
Copy link
Member

@dunglas dunglas Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

array(
  'a' => array('a', 'b'),
  'a' => 'foo',
)

This case looks not easy to handle, it's one of the reasons of the current behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this one gets handled just fine, but the result may be a little bit unexpected:

a.0,a.1,a
a,b,
,,foo

i dont think we need to handle this differently in the encoder, because that one is indeed a custom normalizers job

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output is a bit weird but why not.

Copy link
Contributor Author

@backbone87 backbone87 Sep 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw this is more a problem of the flattening algorithm, which is untouched by the PR.
compare it with the previously support case of:

[
  ['a' => ['foo', 'bar'], 'a.0' => 'baz'],
]

Copy link
Member

@dunglas dunglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good improvements @backbone87!

@fabpot
Copy link
Member

fabpot commented Sep 27, 2017

Can you add a note in the component CHANGELOG before we merge? Thanks.

@backbone87
Copy link
Contributor Author

@fabpot done, but i guess, i need to rebase?

@fabpot
Copy link
Member

fabpot commented Sep 27, 2017

Thank you @backbone87.

fabpot added a commit that referenced this pull request Sep 27, 2017
…der order (Oliver Hoff)

This PR was squashed before being merged into the 3.4 branch (closes #24256).

Discussion
----------

CsvEncoder handling variable structures and custom header order

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #23278
| License       | MIT
| Doc PR        | TBD

This PR improves the CsvEncoder to handle variable nesting structures and adds a context option that allows custom csv header order.

Commits
-------

d173494 CsvEncoder handling variable structures and custom header order
@fabpot fabpot closed this Sep 27, 2017
This was referenced Oct 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants