The following steps form the HTML fragment serialization algorithm. The algorithm takes as input a DOM Element or Document, referred to as the node, and either returns a string or raises an exception.
This algorithm serializes the children of the node being serialized, not the node itself.
Let s be a string, and initialise it to the empty string.
For each child node of the node, in tree order, run the following steps:
Let current node be the child node being processed.
Append the appropriate string from the following list to s:
Element
Append a U+003C LESS-THAN SIGN (<) character, followed by the element's tag name. (For nodes created by the HTML parser, Document.createElement(), or Document.renameNode(), the tag name will be lowercase.)
For each attribute that the element has, append a U+0020 SPACE character, the attribute's name (which, for attributes set by the HTML parser or by Element.setAttributeNode() or Element.setAttribute(), will be lowercase), a U+003D EQUALS SIGN (=) character, a U+0022 QUOTATION MARK (") character, the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK (") character.
While the exact order of attributes is UA-defined, and may depend on factors such as the order in which the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.
Append a U+003E GREATER-THAN SIGN (>) character.
If current node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, link, meta, param, spacer, or wbr element, then continue on to the next child node at this point.
If current node is a pre, textarea, or listing element, append a U+000A LINE FEED (LF) character.
Append the value of running the HTML fragment serialization algorithm on the current node element (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN (<) character, a U+002F SOLIDUS (/) character, the element's tag name again, and finally a U+003E GREATER-THAN SIGN (>) character.
Text or CDATASection node
If one of the ancestors of current node is a style, script, xmp, iframe, noembed, noframes, noscript, or plaintext element, then append the value of current node's data DOM attribute literally.
Otherwise, append the value of current node's data DOM attribute, escaped as described below.
Comment
Append the literal string <!-- (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of current node's data DOM attribute, followed by the literal string --> (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
ProcessingInstruction
Append the literal string <? (U+003C LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value of current node's target DOM attribute, followed by a single U+0020 SPACE character, followed by the value of current node's data DOM attribute, followed by a single U+003E GREATER-THAN SIGN character (>).
DocumentType
Append the literal string <!DOCTYPE (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE), followed by the value of current node's name DOM attribute, followed by the literal string > (U+003E GREATER-THAN SIGN).
Other node types (e.g. Attr) cannot occur as children of elements. If, despite this, they somehow do occur, this algorithm must raise an INVALID_STATE_ERR exception.
The result of the algorithm is the string s.
Escaping a string (for the purposes of the algorithm above) consists of replacing any occurrences of the "&" character by the string "&amp;", any occurrences of the "<" character by the string "&lt;", any occurrences of the ">" character by the string "&gt;", any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;", and, if the algorithm was invoked in the attribute mode, any occurrences of the U+0022 QUOTATION MARK (") character by the string "&quot;".
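The escaping rules can be sketched in Python with `str.translate`, which looks up each input character exactly once. That sidesteps the ordering trap of chaining `str.replace` calls, where "&" has to be replaced before anything else or the newly inserted entities would be escaped a second time. The function name `escape_string` is hypothetical.

```python
# Single-pass translation tables: every character is mapped independently,
# so an "&" produced by one replacement is never escaped again.
_TEXT_TABLE = str.maketrans({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", "\u00a0": "&nbsp;"})
_ATTRIBUTE_TABLE = str.maketrans({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", "\u00a0": "&nbsp;", '"': "&quot;"})

def escape_string(s: str, attribute_mode: bool = False) -> str:
    """Escape `s` for the fragment serializer; quotation marks only in attribute mode."""
    return s.translate(_ATTRIBUTE_TABLE if attribute_mode else _TEXT_TABLE)
```

For instance, `escape_string('say "hi"', attribute_mode=True)` returns `say &quot;hi&quot;`, while the same call without attribute mode leaves the quotation marks untouched.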
Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. For instance, if a textarea element to which a Comment node has been appended is serialized and the output is then reparsed, the comment will end up being displayed in the text field. Similarly, if, as a result of DOM manipulation, an element contains a comment that contains the literal string "-->", then when the result of serializing the element is parsed, the comment will be truncated at that point and the rest of the comment will be interpreted as markup. Other examples include a script element containing a text node with the string "</script>", or a p element that contains a ul element (since the ul element's start tag implies the end tag for the p).
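The comment case can be made concrete with a small Python model. It is deliberately simplified: it assumes only that a reparsed comment ends at the first "-->" after the opening "<!--", and it ignores other tokenizer edge cases (for example, comment data that begins with ">"). Both function names are hypothetical.

```python
from typing import Tuple

def serialize_comment(data: str) -> str:
    # The serializer writes comment data verbatim between "<!--" and "-->".
    return "<!--" + data + "-->"

def reparse_comment(markup: str) -> Tuple[str, str]:
    """Simplified model: the comment ends at the first "-->" after "<!--".

    Returns (comment data, leftover text that the parser would treat as markup).
    """
    if not markup.startswith("<!--"):
        raise ValueError("not a comment")
    end = markup.index("-->", 4)  # first "-->" after the opening delimiter
    return markup[4:end], markup[end + 3:]
```

Under this model, `reparse_comment(serialize_comment("a-->b<i>c"))` yields the comment data `"a"` and leaves `"b<i>c-->"` to be parsed as ordinary content, so the i element would appear in the reparsed tree.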
The following steps form the HTML fragment parsing algorithm. The algorithm takes as input a DOM Element, referred to as the context element, which gives the context for the parser, as well as input, a string to parse, and returns a list of zero or more nodes.
Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm. The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.
Create a new Document node, and mark it as being an HTML document.
Create a new HTML parser, and associate it with the just created Document node.
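Set the HTML parser's tokenisation stage's content model flag according to the context element, as follows:
title or textarea element
Set the content model flag to the RCDATA state.
style, script, xmp, iframe, noembed, or noframes element
Set the content model flag to the CDATA state.
noscript element
If the scripting flag is enabled, set the content model flag to the CDATA state; otherwise, set it to the PCDATA state.
plaintext element
Set the content model flag to the PLAINTEXT state.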
Let root be a new html element with no attributes.
Append the element root to the Document node created above.
Set up the parser's stack of open elements so that it contains just the single element root.
Reset the parser's insertion mode appropriately.
The parser will reference the context element as part of that algorithm.
Set the parser's form element pointer to the nearest node to the context element that is a form element (going straight up the ancestor chain, and including the element itself, if it is a form element), or, if there is no such form element, to null.
Place the input into the input stream of the HTML parser just created.
Start the parser and let it run until it has consumed all the characters just inserted into the input stream.
Return all the child nodes of root, preserving the document order.
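The two setup steps that depend on the context element (choosing the tokeniser's initial content model flag and initialising the form element pointer) can be sketched in Python as below. The Element dataclass with a parent link, the ContentModel enum, and both function names are hypothetical. The flag values assume the tokeniser's RCDATA, CDATA, PCDATA, and PLAINTEXT states, with title and textarea mapping to RCDATA, style, script, xmp, iframe, noembed, and noframes mapping to CDATA, plaintext mapping to PLAINTEXT, and noscript depending on whether scripting is enabled; any other context element is assumed to leave the flag in PCDATA.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ContentModel(Enum):
    PCDATA = "PCDATA"
    RCDATA = "RCDATA"
    CDATA = "CDATA"
    PLAINTEXT = "PLAINTEXT"

@dataclass
class Element:
    # Hypothetical minimal element: a tag name plus a link to its parent.
    tag_name: str
    parent: Optional["Element"] = None

def initial_content_model(context: Element, scripting_enabled: bool) -> ContentModel:
    """Pick the tokeniser's starting state from the context element (assumed mapping)."""
    name = context.tag_name
    if name in ("title", "textarea"):
        return ContentModel.RCDATA
    if name in ("style", "script", "xmp", "iframe", "noembed", "noframes"):
        return ContentModel.CDATA
    if name == "noscript":
        return ContentModel.CDATA if scripting_enabled else ContentModel.PCDATA
    if name == "plaintext":
        return ContentModel.PLAINTEXT
    return ContentModel.PCDATA  # assumed default for all other elements

def initial_form_pointer(context: Element) -> Optional[Element]:
    """Nearest form element, walking up from (and including) the context element."""
    node: Optional[Element] = context
    while node is not None:
        if node.tag_name == "form":
            return node
        node = node.parent
    return None
```

Walking the parent chain starting at the context element itself matches the "including the element itself" wording; returning None plays the role of null when no form ancestor exists.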