Skip to content

Fix #42357: fputcsv() has an optional parameter for line endings #6403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 29, 2021

Conversation

CameronHall
Copy link
Contributor

@CameronHall CameronHall commented Nov 6, 2020

Hi all,

This is my first PR to the PHP source and the "bug" I'm trying to fix is also much more of a "feature". Hence why I'm targeting master instead of the lowest supported branch - please advise whether or not this is the correct thing to do.

fputcsv does not terminate lines correctly as per RFC 4810[1]. After adding a new parameter fputcsv it may now use a user defined line ending. In order to maintain backwards compatibility fputcsv() still terminates lines with "\n" by default.

Also fixes: #46367[2], #62770[3]
Ref: #42357[4]

[1] https://tools.ietf.org/html/rfc4180
[2] https://bugs.php.net/bug.php?id=46367
[3] https://bugs.php.net/bug.php?id=62770
[4] https://bugs.php.net/bug.php?id=42357

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this needs an RFC or not

Just for reference I started working on a new CSV extension because the current functions are pretty broken IMHO: https://gitlab.com/Girgias/csv-php-extension/

@CameronHall
Copy link
Contributor Author

CameronHall commented Nov 6, 2020

Thanks for your keen eye @Girgias.

I don't know if this needs an RFC or not

I was going to ask about that in my original post. I suppose we can wait for some opinions :)

@cmb69
Copy link
Member

cmb69 commented Nov 12, 2020

IMHO it's bad to add more paramters to fputcsv() (especially wrt. $escape_character which was a bad idea in the first place). Anyhow, I suggest to write to the internals mailing list for better visibility.

@nyamsprod
Copy link
Contributor

If you are adding this extra parameter on fputcsv wouldn't it be more appropriate to make sure fgetcsv and it's implementation in SplFileObject gets the same treatment which means adding the functionality also when reading a csv 🤔.
I also share the concern expressed by @cmb69 we are adding maybe to many parameters to fput/fget(csv)

@cmb69
Copy link
Member

cmb69 commented Nov 17, 2020

If you are adding this extra parameter on fputcsv wouldn't it be more appropriate to make sure fgetcsv and it's implementation in SplFileObject gets the same treatment which means adding the functionality also when reading a csv

fgetcsv() already happily accepts either LF or CRLF (can be even mixed for a single file). :)

@nyamsprod
Copy link
Contributor

I also share the concern expressed by @cmb69 we are adding maybe to many parameters to fput/fget(csv)

A counterpoint to this would be that PHP8 will introduced named arguments so developers could well do

<?php

$resource = fopen('file.csv', 'w');
$fields = ['aaa', 'bbb', 'ccc', 'dddd'];
$endOfLine = "\r\n";

fputcsv( handle: $resource, fields: $fields, eol: $endOfLine);

Not my favorite thing but named arguments could reduce the issue 🤔

@CameronHall
Copy link
Contributor Author

Thanks for your responses everyone :)

As @cmb69 suggested, I've emailed internals[1] to see if we can get any further input from the community/team. I'm merely stating this for reference, and it's not an attempt to hurry things along - I'm content with things moving at their own pace :)

[1] https://externals.io/message/112248

@CameronHall CameronHall force-pushed the PHP-8.0 branch 2 times, most recently from 5a97b45 to 084260c Compare February 28, 2021 12:50
@CameronHall CameronHall requested a review from Girgias March 17, 2021 04:20
@Girgias
Copy link
Member

Girgias commented Mar 17, 2021

Might make more sense to use a zend_string * instead of a char* although inconsistent with the rest of the API it does make using a NUL byte for the EOL sequence possible (and also add a test for that)

@CameronHall
Copy link
Contributor Author

Thanks @Girgias. Would you mind teaching me how to do that? I tried working with zend_string initially but had a lot of trouble.

@Girgias
Copy link
Member

Girgias commented Mar 17, 2021

Thanks @Girgias. Would you mind teaching me how to do that? I tried working with zend_string initially but had a lot of trouble.

Did you have a look at https://www.phpinternalsbook.com/php7/internal_types/strings/zend_strings.html ? Otherwise follow-up by email. :-)

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks rather good, but still don't know if the list is going to be more responsive now.

@CameronHall
Copy link
Contributor Author

Looks rather good, but still don't know if the list is going to be more responsive now.

I'll bump the list one more time, but perhaps this is a lost cause and I should shift my focus to your ext-csv and then RFC that into the core - assuming that's your plan?

@Girgias
Copy link
Member

Girgias commented Mar 18, 2021

Looks rather good, but still don't know if the list is going to be more responsive now.

I'll bump the list one more time, but perhaps this is a lost cause and I should shift my focus to your ext-csv and then RFC that into the core - assuming that's your plan?

I would expect this to take longer as a process for ext-csv to be accepted as a bundled extension, so if nobody replies (other than me) I'll take it that we can merge this as a convenience.

@cmb69
Copy link
Member

cmb69 commented Mar 18, 2021

I don't really know whether I like this change or not. Following a standard which is informational only (and seriously underspecified anyway) per se is not a good argument. That some software may require CRLF is, though. Still, I don't like the additional parameter, and adding it will make it harder to get rid of $escape_char, but that may not happen ever, so …

@Girgias
Copy link
Member

Girgias commented Mar 18, 2021

I don't really know whether I like this change or not. Following a standard which is informational only (and seriously underspecified anyway) per se is not a good argument. That some software may require CRLF is, though. Still, I don't like the additional parameter, and adding it will make it harder to get rid of $escape_char, but that may not happen ever, so …

I mean that's one reason why I went and started from scratch, if there was a way to specify that this can only be used as a named param this would allow us to drop $escape_char whenever we want, but I don't think we have any thing of that sort?

@cmb69
Copy link
Member

cmb69 commented Mar 18, 2021

The point is rather that some may rely on that $escape_char stuff, so removal may face resistance. It may indeed be better to introduce an alternative, such as ext/csv.

@CameronHall CameronHall requested a review from Girgias March 29, 2021 02:10
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you squash-rebase onto master?

I'll make a follow-up commit to clean-up the char/length and zend_string* disparity

fputcsv does not terminate lines correctly as per RFC 41801[1]. After adding a new parameter fputcsv may now use a user defined line ending,. In order to maintain backwards compatibility fputcsv() still terminates lines with "\n" by default.

Also fixes: #46367[2], #62770[3]
Ref: #42357[4]

[1] <https://tools.ietf.org/html/rfc4180>
[2] <https://bugs.php.net/bug.php?id=46367>
[3] <https://bugs.php.net/bug.php?id=62770>
[4] <https://bugs.php.net/bug.php?id=42357>
@CameronHall
Copy link
Contributor Author

Thanks @Girgias all done 😄

@Girgias Girgias merged commit 5b29eba into php:master Mar 29, 2021
jrfnl added a commit to PHPCompatibility/PHPCompatibility that referenced this pull request Mar 9, 2022
…tcsv()` function

> **Standard**
>
> `fputcsv()` now accepts a new `eol` argument which allows to define a custom End of Line sequence, the default remains the same and is `"\n"`.

Includes unit tests.

Refs:
* https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.standard
* php/php-src#6403
* php/php-src@5b29eba
@theking2
Copy link

This most probably is not the right place, anymore But the RFC 4180 specifies

  1. The last record in the file may or may not have an ending line
    break.

I ran in a problem where a upstream system specifically did not accept a record separator after the last record and I ended up programming the fputcsv in php. I would be swell to have a flag to omit the last record separator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants