Skip to content

Handle libxml2 OOM more consistently #11927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 4, 2023
Merged

Handle libxml2 OOM more consistently #11927

merged 3 commits into from
Dec 4, 2023

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Aug 9, 2023

See php/doc-en#1741 for some background

This is a continuation of commit c2a58ab, in which several OOM error handling was converted to throwing an INVALID_STATE_ERR DOMException. Some places were missed and they still returned false without an exception, or threw a PHP_ERR DOMException.
Convert all of these to INVALID_STATE_ERR DOMExceptions. This also reduces confusion of users going through documentation [1].

Unfortunately, not all node creations are checked for a NULL pointer. Some places therefore will not do anything if an OOM occurs (well, except crash).
On the one hand it's nice to handle these OOM cases. On the other hand, this adds some complexity and it's very unlikely to happen in the real world. But then again, "unlikely" situations have caused trouble before. Ideally all cases should be checked.

[1] php/doc-en#1741

NOTE: since we don't handle all OOM cases, maybe the wording in UPGRADING should be relaxed.

@nielsdos
Copy link
Member Author

nielsdos commented Aug 9, 2023

Looks like this also got rid of the last use of PHP_ERR (which is non-standard, and vague).

@nielsdos
Copy link
Member Author

nielsdos commented Aug 9, 2023

More weirdness: this got rid of the last way to return false from PHP_METHOD(DOMImplementation, createDocument). However, it looks like it should be able to return false because there's still another error condition, but that ignores the strict error property:

if (errorcode != 0) {
if (localname != NULL) {
xmlFree(localname);
}
php_dom_throw_error(errorcode, 1);
RETURN_THROWS();
}

Notice the hardcoded 1 in php_dom_throw_error. I guess that should be changed, no?

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I suppose the PHP_ERR constant can be added to the deprecation chop board.

Comment on lines +872 to 870
xmlFreeNode(nodep);
php_dom_throw_error(errorcode, dom_get_strict_error(intern->document));
RETURN_FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably makes sense to rewrite this code prior to the if(errorcode == 0) and use this as a guard as the only other case that seems to fail as far as I can see is on line 862.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also nsptr = dom_get_ns(nodep, uri, &errorcode, prefix); that has an out-parameter that can set the errorcode. And on line 844 the error code can also be set initially. I think I'm going to keep it as-is because it has a single easy-to-spot error handling block for the errorcode this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right 👍

if (errorcode != 0) {
if (nodep != NULL) {
xmlFreeProp((xmlAttrPtr) nodep);
}
xmlFreeProp((xmlAttrPtr) nodep);
php_dom_throw_error(errorcode, dom_get_strict_error(intern->document));
RETURN_FALSE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reply as above ^^ :)

@Girgias
Copy link
Member

Girgias commented Aug 10, 2023

More weirdness: this got rid of the last way to return false from PHP_METHOD(DOMImplementation, createDocument). However, it looks like it should be able to return false because there's still another error condition, but that ignores the strict error property:

if (errorcode != 0) {
if (localname != NULL) {
xmlFree(localname);
}
php_dom_throw_error(errorcode, 1);
RETURN_THROWS();
}

Notice the hardcoded 1 in php_dom_throw_error. I guess that should be changed, no?

Probably? I need to go I'll have another think

@Girgias
Copy link
Member

Girgias commented Aug 10, 2023

Okay, I think the throw is correct for createDocument().

Looking at the spec, the interesting part is step 3, which says:

If qualifiedName is not the empty string, then set element to the result of running the internal createElementNS steps, given document, namespace, qualifiedName, and an empty dictionary.

And considering the implementation is simply:

The createElementNS(namespace, qualifiedName, options) method steps are to return the result of running the internal createElementNS steps, given this, namespace, qualifiedName, and options.

However, this can only return false if the DOM document sets strictErrorChecking to false, but by default it is true and the empty document is created during the call, so it is impossible to toggle this off.

It seems the strictErrorChecking property is a DOM Core Level 3 feature that has been deprecated and removed from browsers, and I think we should put this on the chopping board ASAP as it would clean up the API quite a lot.

@nielsdos
Copy link
Member Author

However, this can only return false if the DOM document sets strictErrorChecking to false, but by default it is true and the empty document is created during the call, so it is impossible to toggle this off.

Hmm. The first step of createElementNS is https://dom.spec.whatwg.org/#validate-and-extract . This specifies things like the conditions for the NAMESPACE_ERR that we throw. It is executed before the creation of the new document in step 4, seemingly within the context of the current document (because we don't have a new one yet, so which other context could it be?). It's a bit unclear though and I'm not convinced my reasoning is right.
Of course, the strict error thing doesn't exist in the live spec. And looking at the DOM3 spec: https://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/DOM3-Core.html#core-Level-2-Core-DOM-createDocument is even more vague. So this doesn't clear things up either.

It seems the strictErrorChecking property is a DOM Core Level 3 feature that has been deprecated and removed from browsers, and I think we should put this on the chopping board ASAP as it would clean up the API quite a lot.

Yeah it would be a lot easier for sure. I'm not sure this is an easy deprecation though, I have no idea about how widely people use this.

As a side note, we must be careful deprecating things that were removed for the live spec. The current live spec basically only takes into account HTML. So XML properties such as xmlVersion and xmlStandalone for example don't exist in the live spec anymore. However, ext/dom supports both HTML and XML documents. Therefore we must keep these things alive such that XML functionality is not lost.

@nielsdos
Copy link
Member Author

Sorry, I clearly am not awake anymore. True facepalm moment here.
Your reasoning is right, the context is the new document. Therefore, it must throw. So discard the reasoning above please.
The part I wrote about deprecations still holds though :)

@staabm
Copy link
Contributor

staabm commented Aug 11, 2023

Sorry for asking this question in between: reading all these comments I get the feeling DOM could run out-of-memory but PHP would still work (and handle this error) without running out of memory itself?

is PHP Userland code expected to handle a possible DOM out-of-memory error?

@Girgias
Copy link
Member

Girgias commented Aug 11, 2023

Sorry for asking this question in between: reading all these comments I get the feeling DOM could run out-of-memory but PHP would still work (and handle this error) without running out of memory itself?

is PHP Userland code expected to handle a possible DOM out-of-memory error?

In theory yes, as the ext/dom memory usage does not count against the PHP memory limit (due to limitations with libxml not allowing to pass a custom allocator). So an allocation could fail, return false/throw an exception and pass back to PHP if it hasn't reached its own memory limit.

@nielsdos
Copy link
Member Author

In theory they should be handled. In practice, if you're OOM in libxml2, then you're probably screwed anyway because the system allocator ran out of memory. Similarly for other libraries that PHP uses under the hood.

The goal is unifying all of the OOM conditions into exceptions is to make the return value semantics easier, and more correct w.r.t. the DOM spec. Exceptions will also make sure you can more easily gracefully shut down the request, which should probably be what happens when you run out of memory.

@staabm
Copy link
Contributor

staabm commented Aug 11, 2023

I see thx. the same is true for preg_* then, right?

refs phpstan/phpstan-src#2560 (comment)

@nielsdos
Copy link
Member Author

@Girgias do I change the return type of createDocument in this PR? It's with a @return, so IIRC this is a docs-only annotation that does not actually enforce a return type?
The thing is though that I didn't check for other potential spec-compliance issues. Perhaps I should leave the return type as-is and only change it when I (hopefully) do my spec fix work in 8.4?

@nielsdos
Copy link
Member Author

I see thx. the same is true for preg_* then, right?

refs phpstan/phpstan-src#2560 (comment)

That memory is also allocated via the system allocator, so you're probably screwed there too if you run OOM.
The reason we can change it in ext/dom to an exception is because we follow a specification in ext/dom, whereas we don't have a specification of how the methods should look like in ext/pcre.

@Girgias
Copy link
Member

Girgias commented Aug 11, 2023

@Girgias do I change the return type of createDocument in this PR? It's with a @return, so IIRC this is a docs-only annotation that does not actually enforce a return type? The thing is though that I didn't check for other potential spec-compliance issues. Perhaps I should leave the return type as-is and only change it when I (hopefully) do my spec fix work in 8.4?

Would not make this change and leave it for later as currently it's not even a tentative type. But maybe @kocsismate can clarify?

@@ -693,14 +694,13 @@ int dom_node_prefix_write(dom_object *obj, zval *newval)
}
if (ns == NULL) {
ns = xmlNewNs(nsnode, nodep->ns->href, (xmlChar *)prefix);
if (UNEXPECTED(ns == NULL)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: recheck if OOM is the only way in this path that this can return NULL.

@@ -172,7 +172,8 @@ PHP_METHOD(DOMImplementation, createDocument)
if (localname != NULL) {
xmlFree(localname);
}
RETURN_FALSE;
php_dom_throw_error(INVALID_STATE_ERR, /* strict */ true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it possible to add the tentative DOMDocument return type :)

This is a continuation of commit c2a58ab, in which several OOM error
handling was converted to throwing an INVALID_STATE_ERR DOMException.
Some places were missed and they still returned false without an
exception, or threw a PHP_ERR DOMException.
Convert all of these to INVALID_STATE_ERR DOMExceptions. This also
reduces confusion of users going through documentation [1].

Unfortunately, not all node creations are checked for a NULL pointer.
Some places therefore will not do anything if an OOM occurs (well,
except crash).
On the one hand it's nice to handle these OOM cases.
On the other hand, this adds some complexity and it's very unlikely to
happen in the real world. But then again, "unlikely" situations have
caused trouble before. Ideally all cases should be checked.

[1] php/doc-en#1741
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants