Skip to content

DOMAttr unescapes character reference #8388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
t-tera opened this issue Apr 17, 2022 · 10 comments
Closed

DOMAttr unescapes character reference #8388

t-tera opened this issue Apr 17, 2022 · 10 comments

Comments

@t-tera
Copy link

t-tera commented Apr 17, 2022

Description

The following code:

<?php
$node = new DOMAttr('foo','bar');
$node->nodeValue = 'xx&#x31;yy';
var_dump($node->nodeValue);

Resulted in this output:

xx1yy

But I expected this output instead:

xx&#x31;yy

As shown above, setter of nodeValue unescapes character references.
Because of this, I need to do the following:

$node->nodeValue = htmlspecialchars("xx&#x31;yy");
var_dump($node->nodeValue);

This outputs xx&#x31;yy. The setter should not unescape character references.

PHP Version

PHP 7.4.3

Operating System

Ubuntu 20.04

@t-tera t-tera changed the title DOMAttr unescapes HTML reference DOMAttr unescapes character reference Apr 17, 2022
@iluuu1994
Copy link
Member

iluuu1994 commented Apr 17, 2022

I'm on the phone so I can't verify, but I don't think this has anything to do with DOMAttr but rather just the fact that var_dump doesn't escape its output and thus the browser converts it.

Edit: Also note that PHP 7.4 is unsupported.

@t-tera
Copy link
Author

t-tera commented Apr 17, 2022

No. I'm testing it on CLI.

# php -r '$node = new DOMAttr("foo","bar"); $node->nodeValue = "xx&#x31;yy"; var_dump($node->nodeValue);'
string(5) "xx1yy"

@damianwadley
Copy link
Member

damianwadley commented Apr 17, 2022

Bug confirmed https://3v4l.org/Xf27Y , given that Attrs in a browser don't work the same way and that I can't see either the HTML 4 or Living Standard specifying any transformations should happen, but this is likely an issue with libxml and not PHP.

@cmb69
Copy link
Member

cmb69 commented Apr 17, 2022

Writing the nodeValue is implemented by calling xmlNodeSetContentLen(), and its documentation states:

Replace the content of a node. NOTE: @content is supposed to be a piece of XML CDATA, so it allows entity references, but XML special chars need to be escaped first by using xmlEncodeEntitiesReentrant() resp. xmlEncodeSpecialChars().

Our implementation does not escape, though.

Reading the nodeValue is implemented by calling xmlNodeGetContent(), and its documentation states:

Entity references are substituted.

Changing this would cause a pretty serious BC break.

@cmb69
Copy link
Member

cmb69 commented Apr 20, 2022

I think this qualifies as bug, but due to the BC break when we fix it, I'm reluctant to do this.

@beberlei, thoughts?

@nielsdos
Copy link
Member

nielsdos commented Jun 5, 2023

Fixed via GH-10132.

@nielsdos nielsdos closed this as completed Jun 5, 2023
@tstarling
Copy link
Contributor

Do you think it needs a changelog table entry in the manual?

@nielsdos
Copy link
Member

nielsdos commented Jun 6, 2023

Yes. This should happen for us by the docs maintainers because there's an UPGRADING entry. But I opened PR php/doc-en#2520 already so it will certainly not be forgotten.

@nielsdos
Copy link
Member

Reverted the fix because of quite big BC breaks. See #11469

@nielsdos nielsdos reopened this Jun 19, 2023
@nielsdos
Copy link
Member

nielsdos commented Mar 9, 2024

This is fixed in the new opt-in spec-compliance mode, which was merged in #13031.
For more information about the opt-in mode, please see https://wiki.php.net/rfc/opt_in_dom_spec_compliance.
In short: the behaviour of the old DOM classes are unaffected for backwards-compatibility reasons. There are new DOM classes where your code is correctly handled.

@nielsdos nielsdos closed this as completed Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants