-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Set DOMAttr::$value and DOMAttr::$nodeValue without expanding entities #10132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I ran all the ext/dom tests under valgrind. There were three existing failures against master: ext/dom/tests/dom003.phpt ext/dom/tests/dom_set_attr_node.phpt ext/dom/tests/domelement.phpt . No new failures. The unlink loop is tedious, but at least it modifies the tree using a stable API, and doesn't crash. We can't do exactly what xmlNodeSetContentLen() is doing, because the UPDATE_LAST_CHILD_AND_PARENT() macro is private. Downstream bug: https://phabricator.wikimedia.org/T324408 |
While As such I would prefer if we introduce a new method instead that reaches the spec behavior, DOMAttr::setValue for example. If we consider changing DOMAttr::$value write behavior at the very least we should first emit a warning that the behavior changes in PHP 8 and then make the change in 9. |
Note that the property does not round-trip at present, it reads back a different value to what you put in.
That seems like a bug to me. It may lead to security vulnerabilities in applications, because if you sanitize JS or CSS and then put it in an attribute this way, and then serialize the result, user input may escape from a quoted string inside the JS or CSS attribute. My suspicion is that approximately nobody uses this, that's why nobody has reported this before. But if you're really sure it's not a bug, and you're offering to merge DOMAttr::setValue(), then I will do that instead. |
I'd like to get @cmb69 input on what to do |
I tend to agree that the current behavior is a bug, although we probably should not refer to a working draft of a W3C specification, but rather to the WHATWG living standard (but I have no idea how the latter expects to handle that). Anyhow, even the libxml2 documentation on
Actually, using <?php
$doc = new DOMDocument;
$elt = $doc->createElement('elt');
$doc->appendChild($elt);
$elt->setAttribute('a','&');
$attr = $elt->getAttributeNode('a');
$attr->value = "1\x002";
var_dump($doc->saveXML($elt)); outputs:
So the NUL byte truncated the attribute's value anyway. Note that <?php
$doc = new DOMDocument;
$elt = $doc->createElement('elt', 'foo & bar & &');
var_dump($doc->saveXML($elt)); outputs:
This has been reported ages ago, and finally fixed as doc bug only. Maybe we should handle this issue in the same way, or apply one of @beberlei's suggestions for BC. |
I've read the whatwg HTML and DOM specs several times and entities should:
In the spec for replacing data of CharacterData nodes, you won't find anything about entities. In the spec for setting an attribute's value, you won't find anything either. So the current behaviour is clearly a bug. |
Agreed that the behavior is a bug according to DOM spec. Using an up-to-date browser console as an executable version of "the latest WHATWG spec":
I don't think setting a value has ever decoded entities in any version of the DOM spec. Entities are only decoded in the |
I did read that doc comment. Using xmlEncodeSpecialChars() is a bad idea for the following additional reasons:
In the old DOM specs up to DOM Level 3 Core, an Attr node in an XML document has an arbitrary number of children, whereas an Attr node in an HTML document just has a string value. The WHATWG DOM favours the HTML data model. It specifies that the value of an Attr is a string, and removes the old discussion of XML Attr children. In the old DOM specs, assigning to the value of an Attr node creates a single Text node and replaces all children of the Attr with that Text node. In the WHATWG DOM spec, assigning to the value of an Attr node just updates the string associated with the Attr. Never in any DOM spec is it suggested that assigning to the value of an Attr will create an Entity node child for every ampersand contained in the value. But that is what xmlNodeSetContent(xmlEncodeSpecialChars()) would do. In theory it would be nice if PHP followed the WHATWG data model, but in practice it is tied to libxml2 in which an Attr node has an arbitrary number of children. The nearest available approximation to the WHATWG spec is to remove all children of the Attr and insert a single Text node child with the specified value string. That is what my patch does. |
To assess the need for bugwards compatibility, I did the following searches:
Findings:
In summary, the code review supports my intuition that fixing this bug in PHP will be beneficial for users, implicitly fixing bugs in applications which depend on the broken feature. |
1aa2c16
to
14e4bef
Compare
The situation with DOMAttr::$nodeValue is very much the same, so I added the fix to the PR. |
Ok, then lets consider it a bug and fix it. Thank you for the changes @tstarling |
I wonder why AppVeyorCI has not been triggered. By the way, the missing documentation is lying around for a long time: php/doc-en#923. Still, it makes sense to fix it in the next minor version. |
It seems that |
@ThomasWeinert, oh, that is likely a regression due to b2954c6. Could you please open another ticket? |
Related to GH-8388. @tstarling Are you willing to rebase this? The code looks good. |
The manual refers to the DOM Level 3 Core spec which says: "On setting, this creates a Text node with the unparsed contents of the string. I.e. any characters that an XML processor would recognize as markup are instead treated as literal text." PHP is expanding entities when DOMAttr::value is set, which is non-compliant and is a difference in behaviour compared to browser DOM implementations. So, when value is set, remove all children of the attribute node. Then create a single text node and insert that as the only child of the attribute. Add tests.
A few callers remove all children of a node. The way it was done in node.c was unsafe, because it left nodep->last dangling. It just happens to not crash if xmlNodeSetContent() is called immediately afterwards.
The length is passed to xmlStrndup(), which also adds 1, and adds a null terminator past the end. It worked because the length is not actually stored. Strings in libxml2 are null terminated. Passing the length just avoids a call to strlen().
Done. |
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
This is an adapted re-introduction of phpGH-10132. Co-authored-by: Tim Starling <tstarling@wikimedia.org>
The manual refers to the DOM Level 3 Core spec which says:
"On setting, this creates a Text node with the unparsed contents of the
string. I.e. any characters that an XML processor would recognize as
markup are instead treated as literal text."
PHP is expanding entities when DOMAttr::value is set, which is
non-compliant and is a difference in behaviour compared to browser DOM
implementations.
So, when value is set, remove all children of the attribute node. Then
create a single text node and insert that as the only child of the
attribute.
Add tests.