HTML 5

W3C

A vocabulary and associated APIs for HTML and XHTML

4.8 Session history and navigation

4.8.1 The session history of browsing contexts

The sequence of Documents in a browsing context is its session history.

History objects provide a representation of the pages in the session history of browsing contexts. Each browsing context has a distinct session history.

Each Document object in a browsing context's session history is associated with a unique instance of the History object, although they all must model the same underlying session history.

The history attribute of the Window interface must return the object implementing the History interface for that Window object's active document.

History objects represent their browsing context's session history as a flat list of session history entries. Each session history entry consists of either a URI or a state object, or both, and may in addition have a title, a Document object, form data, a scroll position, and other information associated with it.

This does not imply that the user interface need be linear. See the notes below.

URIs without associated state objects are added to the session history as the user (or script) navigates from page to page.

A state object is an object representing a user interface state.

Pages can add state objects between their entry in the session history and the next ("forward") entry. These are then returned to the script when the user (or script) goes back in the history, thus enabling authors to use the "navigation" metaphor even in one-page applications.

Every Document in the session history is defined to have a last activated entry, which is the state object entry associated with that Document which was most recently activated. Initially, the last activated entry of a Document must be the first entry for the Document, representing the fact that no state object entry has yet been activated.

At any point, one of the entries in the session history is the current entry. This is the entry representing the active document of the browsing context. The current entry is usually an entry for the location of the Document. However, it can also be one of the entries for state objects added to the history by that document.

Entries that consist of state objects share the same Document as the entry for the page that was active when they were added.

Contiguous entries that differ just by fragment identifier also share the same Document.

All entries that share the same Document (and that are therefore merely different states of one particular document) are contiguous by definition.

User agents may discard the DOMs of entries other than the current entry that are not referenced from any script, reloading the pages afresh when the user or script navigates back to such pages. This specification does not specify when user agents should discard pages' DOMs and when they should cache them. See the section on the load and unload events for more details.

Entries that have had their DOM discarded must, for the purposes of the algorithms given below, act as if they had not. When the user or script navigates back or forwards to a page which has no in-memory DOM objects, any other entries that shared the same Document object with it must share the new object as well.

When state object entries are added, a URI can be provided. This URI is used to replace the state object entry if the Document is evicted.

When a user agent discards the DOM from an entry in the session history, it must also discard all the entries that share that Document but do not have an associated URI (i.e. entries that only have a state object). Entries that shared that Document object but had a state object and have a different URI must then have their state objects removed. Removed entries are not recreated if the user or script navigates back to the page. If there are no state object entries for that Document object then no entries are removed.

When an entry is discarded, any frozen timers, intervals, XMLHttpRequests, database transactions, etc., must be killed.
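
The discard rules above can be sketched as a small model. This is only an illustration under stated assumptions: the Entry class, its doc_id field, and the discard_dom function are hypothetical names, and every entry that shares the Document and carries a URI is treated as one that keeps its URI and loses its state object.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Entry:
    doc_id: int                  # hypothetical identifier for the shared Document
    uri: Optional[str] = None    # None models an entry with only a state object
    state: Any = None            # None models "no state object"


def discard_dom(history: List[Entry], doc_id: int) -> List[Entry]:
    """Return the session history after the DOM for doc_id is discarded."""
    kept = []
    for entry in history:
        if entry.doc_id != doc_id:
            kept.append(entry)                      # other Documents are untouched
        elif entry.uri is None:
            continue                                # state-only entry: removed
        else:
            kept.append(Entry(entry.doc_id, entry.uri, None))  # state removed
    return kept


history = [
    Entry(1, "http://example.com/a"),
    Entry(1, None, {"step": 1}),
    Entry(1, "http://example.com/a?step=2", {"step": 2}),
    Entry(2, "http://example.com/b"),
]
after = discard_dom(history, 1)
```

In this sketch the state-only entry disappears, while the entry that was pushed with a URI survives as a plain URI entry that can be reloaded.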

4.8.2 The History interface

interface History {
  readonly attribute long length;
  void go(in long delta);
  void go();
  void back();
  void forward();
  void pushState(in DOMObject data, in DOMString title);
  void pushState(in DOMObject data, in DOMString title, in DOMString url);
  void clearState();
};

The length attribute of the History interface must return the number of entries in this session history.

The actual entries are not accessible from script.

The go(delta) method causes the UA to move the number of steps specified by delta in the session history.

If the index of the current entry plus delta is less than zero or greater than or equal to the number of items in the session history, then the user agent must do nothing.

If the delta is zero, then the user agent must act as if the location.reload() method was called instead.

Otherwise, the user agent must cause the current browsing context to traverse the history to the specified entry. The specified entry is the one whose index equals the index of the current entry plus delta.

When the user navigates through a browsing context, e.g. using a browser's back and forward buttons, the user agent must translate this action into the equivalent invocations of the history.go(delta) method on the various affected window objects.

Some of the other members of the History interface are defined in terms of the go() method, as follows:

Member      Definition
go()        Must do the same as go(0)
back()      Must do the same as go(-1)
forward()   Must do the same as go(1)
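
The bounds check and the zero-delta case for go(delta) can be illustrated as follows. This is a minimal sketch; the function name resolve_go and its string return values are assumptions made for the example, not part of the interface.

```python
def resolve_go(current_index: int, length: int, delta: int = 0) -> str:
    """Classify the effect of go(delta) on a session history of `length` entries."""
    target = current_index + delta
    if target < 0 or target >= length:
        return "nothing"              # out of range: the user agent does nothing
    if delta == 0:
        return "reload"               # same as location.reload()
    return f"traverse:{target}"       # traverse the history to that entry


def resolve_back(current_index: int, length: int) -> str:
    return resolve_go(current_index, length, -1)    # back() is go(-1)


def resolve_forward(current_index: int, length: int) -> str:
    return resolve_go(current_index, length, 1)     # forward() is go(1)
```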

The pushState(data, title, url) method adds a state object to the history.

When this method is invoked, the user agent must first check the third argument. If a third argument is specified, then the user agent must verify that the third argument is a valid URI or IRI (as defined by RFC 3986 and 3987), and if so, that, after resolving it to an absolute URI, it is either identical to the document's URI, or that it differs from the document's URI only in the <query>, <abs_path>, and/or <fragment> parts, as applicable (the <query> and <abs_path> parts can only be the same if the document's URI uses a hierarchical <scheme>). If the verification fails (either because the argument is syntactically incorrect, or differs in a way not described as acceptable in the previous sentence) then the user agent must raise a security exception. [RFC3986] [RFC3987]
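
The verification of the third argument might look roughly like the sketch below. It assumes that "differs only in the path, query, and/or fragment" can be approximated by requiring the scheme and authority to match after resolution. SecurityError and verify_push_state_url are hypothetical names, and neither full RFC 3986/3987 syntax validation nor the special case for non-hierarchical schemes is attempted.

```python
from urllib.parse import urljoin, urlsplit


class SecurityError(Exception):
    """Stand-in for the security exception the text requires."""


def verify_push_state_url(document_uri: str, url: str) -> str:
    """Resolve url against the document's URI and check what may differ."""
    try:
        absolute = urljoin(document_uri, url)
        doc, new = urlsplit(document_uri), urlsplit(absolute)
    except ValueError as exc:
        # Treated here as the "syntactically incorrect" failure case.
        raise SecurityError("argument is not a parseable URI") from exc
    if (doc.scheme, doc.netloc) != (new.scheme, new.netloc):
        raise SecurityError("URI differs in more than path, query, or fragment")
    return absolute
```

For a document at http://example.com/app/index.html, a relative argument such as "?page=2" passes, while an absolute URI on another host raises the exception.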

If the third argument passes its verification step, or if the third argument was omitted, then the user agent must remove from the session history any entries for that Document from the entry after the current entry up to the last entry in the session history that references the same Document object, if any. If the current entry is the last entry in the session history, or if there are no entries after the current entry that reference the same Document object, then no entries are removed.

Then, the user agent must add a state object entry to the session history, after the current entry, with the specified data as the state object, the given title as the title, and, if the third argument is present, the given url as the URI of the entry.

Then, the user agent must set this new entry as being the last activated entry for the Document.

Finally, the user agent must update the current entry to be this newly added entry.

The title is purely advisory. User agents might use the title in the user interface.

User agents may limit the number of state objects added to the session history per page. If a page hits the UA-defined limit, user agents must remove the entry immediately after the first entry for that Document object in the session history after having added the new entry. (Thus the state history acts as a FIFO buffer for eviction, but as a LIFO buffer for navigation.)
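
The pushState() steps and the eviction rule can be modelled together. The sketch below is illustrative only: SessionHistory, Entry, doc_id, and max_states are hypothetical names, the limit of 3 is arbitrary, and every entry after a Document's first entry is counted as a state object entry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Entry:
    doc_id: int                  # hypothetical identifier for the shared Document
    uri: Optional[str] = None
    state: Any = None
    title: str = ""


@dataclass
class SessionHistory:
    entries: List[Entry]
    current: int = 0
    max_states: int = 3          # hypothetical UA-defined limit, assumed >= 1
    last_activated: Dict[int, Entry] = field(default_factory=dict)

    def push_state(self, data: Any, title: str, url: Optional[str] = None) -> None:
        doc = self.entries[self.current].doc_id
        # Remove the entries after the current one that share its Document.
        end = self.current + 1
        while end < len(self.entries) and self.entries[end].doc_id == doc:
            end += 1
        del self.entries[self.current + 1:end]
        # Add the new entry after the current entry, then make it the
        # last activated entry and the current entry.
        entry = Entry(doc, url, data, title)
        self.current += 1
        self.entries.insert(self.current, entry)
        self.last_activated[doc] = entry
        # Over the limit: evict the entry right after the Document's first entry.
        first = next(i for i, e in enumerate(self.entries) if e.doc_id == doc)
        states = sum(1 for e in self.entries if e.doc_id == doc) - 1
        if states > self.max_states:
            del self.entries[first + 1]
            self.current -= 1


history = SessionHistory([Entry(1, "http://example.com/")])
for n in range(1, 5):
    history.push_state({"n": n}, f"step {n}")
```

After the fourth push the oldest state (n = 1) has been evicted, while the newest state is still the current entry, which is the FIFO-for-eviction, LIFO-for-navigation behaviour described above.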

The clearState() method removes all the state objects for the Document object from the session history.

When this method is invoked, the user agent must remove from the session history all the entries from the first state object entry for that Document object up to the last entry that references that same Document object, if any.

Then, if the current entry was removed in the previous step, the current entry must be set to the last entry for that Document object in the session history.
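
A minimal sketch of clearState() follows, assuming entries are modelled as (doc_id, state) tuples in which a state of None means "not a state object entry", and assuming the first entry for a Document is never a state object entry. The function name clear_state is hypothetical.

```python
from typing import Any, List, Tuple

EntryT = Tuple[int, Any]   # (hypothetical Document id, state object or None)


def clear_state(entries: List[EntryT], current: int) -> Tuple[List[EntryT], int]:
    """Remove the state object entries for the current entry's Document."""
    doc = entries[current][0]
    doc_indexes = [i for i, (d, _) in enumerate(entries) if d == doc]
    first_state = next((i for i in doc_indexes if entries[i][1] is not None), None)
    if first_state is None:
        return entries, current                     # nothing to remove
    last = doc_indexes[-1]
    # Remove from the first state object entry up to the Document's last entry.
    remaining = entries[:first_state] + entries[last + 1:]
    if current >= first_state:
        # The current entry was removed: fall back to the Document's last
        # remaining entry, which sits just before the removed range.
        current = first_state - 1
    return remaining, current


entries = [(1, None), (2, None), (2, "a"), (2, "b"), (3, None)]
cleared, new_current = clear_state(entries, 3)
```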

4.8.3 Activating state object entries

When an entry in the session history is activated (which happens during session traversal, as described above), the user agent must run the following steps:

  1. First, the user agent must set this new entry as being the last activated entry for the Document to which the entry belongs.

  2. If the entry is a state object entry, let state be that state object. Otherwise, the entry is the first entry for the Document; let state be null.

  3. The user agent must then fire a popstate event in no namespace on the body element using the PopStateEvent interface, with the state attribute set to the value of state. This event bubbles but is not cancelable and has no default action.

interface PopStateEvent : Event {
  readonly attribute DOMObject state;
  void initPopStateEvent(in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMObject stateArg);
  void initPopStateEventNS(in DOMString namespaceURIArg, in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMObject stateArg);
};

The initPopStateEvent() and initPopStateEventNS() methods must initialise the event in a manner analogous to the similarly-named methods in the DOM3 Events interfaces. [DOM3EVENTS]

The state attribute represents the context information for the event, or null, if the state represented is the initial state of the Document.
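
The activation steps can be sketched as below. The event is modelled as a plain dictionary and dispatch as a loop over listener callables; both are simplifications, and Entry, activate, and is_state_entry are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Entry:
    doc_id: int                  # hypothetical identifier for the Document
    is_state_entry: bool = False
    state: Any = None


def activate(entry: Entry,
             last_activated: Dict[int, Entry],
             listeners: List[Callable[[dict], None]]) -> dict:
    # Step 1: record the entry as the last activated entry for its Document.
    last_activated[entry.doc_id] = entry
    # Step 2: the state is null unless this is a state object entry.
    state = entry.state if entry.is_state_entry else None
    # Step 3: fire popstate; it bubbles but is not cancelable.
    event = {"type": "popstate", "bubbles": True, "cancelable": False, "state": state}
    for listener in listeners:
        listener(event)
    return event


seen: List[dict] = []
last_activated: Dict[int, Entry] = {}
activate(Entry(1, True, {"tab": "inbox"}), last_activated, [seen.append])
activate(Entry(1), last_activated, [seen.append])
```

The second call models returning to the Document's first entry, so its listener receives a null state.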

4.8.4 The Location interface

Each Document object in a browsing context's session history is associated with a unique instance of a Location object.

The location attribute of the HTMLDocument interface must return the Location object for that Document object.

The location attribute of the Window interface must return the Location object for that Window object's active document.

Location objects provide a representation of the URI of their document, and allow the current entry of the browsing context's session history to be changed, by adding or replacing entries in the history object.

interface Location {
  readonly attribute DOMString href;
  void assign(in DOMString url);
  void replace(in DOMString url);
  void reload();

  // URI decomposition attributes 
           attribute DOMString protocol;
           attribute DOMString host;
           attribute DOMString hostname;
           attribute DOMString port;
           attribute DOMString pathname;
           attribute DOMString search;
           attribute DOMString hash;
};

The href attribute returns the address of the page represented by the associated Document object, as an absolute IRI reference.

On setting, the user agent must act as if the assign() method had been called with the new value as its argument.

When the assign(url) method is invoked, the UA must navigate the browsing context to the specified url.

When the replace(url) method is invoked, the UA must navigate the browsing context to the specified url with replacement enabled.

Navigation for the assign() and replace() methods must be done with the browsing context of the Window object that is the script execution context of the script that invoked the method as the source browsing context.

If the script execution context of a script isn't a Window object, then it can't ever get to a Location object to call these methods.

Relative url arguments for assign() and replace() must be resolved relative to the base URI of the script that made the method call.

The Location interface also has the complement of URI decomposition attributes, protocol, host, port, hostname, pathname, search, and hash. These must follow the rules given for URI decomposition attributes, with the input being the address of the page represented by the associated Document object, as an absolute IRI reference (same as the href attribute), and the common setter action being the same as setting the href attribute to the new output value.
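
As an illustration of what the decomposition attributes expose, the sketch below splits an address into the seven parts. It assumes the conventional results (protocol with its trailing colon, search and hash with their leading delimiters), since the decomposition rules themselves are defined outside this section. The function name decompose is hypothetical, and userinfo in the authority is not handled.

```python
from urllib.parse import urlsplit


def decompose(href: str) -> dict:
    """Split an absolute address into Location-style decomposition attributes."""
    parts = urlsplit(href)
    return {
        "protocol": parts.scheme + ":",
        "host": parts.netloc,                        # hostname plus port, if any
        "hostname": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "pathname": parts.path,
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


parts = decompose("http://example.com:8080/docs/index.html?q=1#top")
```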

4.8.4.1 Security

User agents must raise a security exception whenever any of the members of a Location object are accessed by scripts whose effective script origin is not the same as the Location object's associated Document's effective script origin, with the following exceptions:

User agents must not allow scripts to override the href attribute's setter.

4.8.5 Implementation notes for session history

This section is non-normative.

The History interface is not meant to place restrictions on how implementations represent the session history to the user.

For example, session history could be implemented in a tree-like manner, with each page having multiple "forward" pages. This specification doesn't define how the linear list of pages in the history object is derived from the actual session history as seen from the user's perspective.

Similarly, a page containing two iframes has a history object distinct from the iframes' history objects, despite the fact that typical Web browsers present the user with just one "Back" button, with a session history that interleaves the navigation of the two inner frames and the outer page.

Security: It is suggested that to avoid letting a page "hijack" the history navigation facilities of a UA by abusing pushState(), the UA provide the user with a way to jump back to the previous page (rather than just going back to the previous state). For example, the back button could have a drop down showing just the pages in the session history, and not showing any of the states. Similarly, an aural browser could have two "back" commands, one that goes back to the previous state, and one that jumps straight back to the previous page.

In addition, a user agent could ignore calls to pushState() that are invoked on a timer, or from event handlers that do not represent a clear user action, or that are invoked in rapid succession.
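
One way a user agent could apply such a heuristic is sketched here. The class name, the one-second interval, and the user_initiated flag are all assumptions chosen for illustration; nothing in this section prescribes them.

```python
from typing import Optional


class PushStateThrottle:
    """Hypothetical filter deciding whether a pushState() call is honoured."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval      # seconds between accepted calls
        self._last_accepted: Optional[float] = None

    def allow(self, now: float, user_initiated: bool) -> bool:
        if not user_initiated:
            return False                      # timer or non-user event: ignore
        if (self._last_accepted is not None
                and now - self._last_accepted < self.min_interval):
            return False                      # rapid succession: ignore
        self._last_accepted = now
        return True


throttle = PushStateThrottle(1.0)
decisions = [
    throttle.allow(0.0, True),
    throttle.allow(0.5, True),
    throttle.allow(2.0, False),
    throttle.allow(2.0, True),
]
```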

4.9 Browsing the Web

Certain actions cause the browsing context to navigate. Navigation always involves a source browsing context, which is the browsing context which was responsible for starting the navigation.

For example, following a hyperlink, form submission, and the window.open() and location.assign() methods can all cause a browsing context to navigate.

A user agent may also provide various ways for the user to explicitly cause a browsing context to navigate.

When a browsing context is navigated, the user agent must run the following steps:

  1. Cancel any preexisting attempt to navigate the browsing context.

  2. If the new resource is the same as the current resource, but a fragment identifier has been specified, changed, or removed, then navigate to that fragment identifier and abort these steps.

  3. If the new resource is to be handled by displaying some sort of inline content, e.g. an error message because the specified scheme is not one of the supported protocols, or an inline prompt to allow the user to select a registered handler for the given scheme, then display the inline content and abort these steps.

  4. If the new resource is to be handled using a mechanism that does not affect the browsing context, then abort these steps and proceed with that mechanism instead.

  5. If the new resource is to be fetched using HTTP GET or equivalent, and if the browsing context being navigated is a top-level browsing context, then check if there are any application caches that have a manifest with the same origin as the URI in question, and that have this URI as one of their entries (excluding entries marked as foreign), and that already contain their manifest, categorized as a manifest. If so, then the user agent must then fetch the resource from the most appropriate application cache of those that match.

    Otherwise, start fetching the specified resource in the appropriate manner (e.g. performing an HTTP GET or POST operation, or reading the file from disk, or executing script in the case of a javascript: URI). If this results in a redirect, return to step 2 with the new resource.

    For example, imagine an HTML page with an associated application cache displaying an image and a form, where the image is also used by several other application caches. If the user right-clicks on the image and chooses "View Image", then the user agent could decide to show the image from any of those caches, but it is likely that the most useful cache for the user would be the one that was used for the aforementioned HTML page. On the other hand, if the user submits the form, and the form does a POST submission, then the user agent will not use an application cache at all; the submission will be made to the network.

  6. Wait for one or more bytes to be available or for the user agent to establish that the resource in question is empty. During this time, the user agent may allow the user to cancel this navigation attempt or start other navigation attempts.

  7. If the resource was not fetched from an application cache, and was to be fetched using HTTP GET or equivalent, and its URI matches the opportunistic caching namespace of one or more application caches, and the user didn't cancel the navigation attempt during the previous step, then:

    If the browsing context being navigated is a top-level browsing context, and the navigation attempt failed (e.g. the server returned a 4xx or 5xx status code or equivalent, or there was a DNS error)

    Let candidate be the fallback resource specified for the opportunistic caching namespace in question. If multiple application caches match, the user agent must use the fallback of the most appropriate application cache of those that match.

    If candidate is not marked as foreign, then the user agent must discard the failed load and instead continue along these steps using candidate as the resource.

    For the purposes of session history (and features that depend on session history, e.g. bookmarking) the user agent must use the URI of the resource that was requested (the one that matched the opportunistic caching namespace), not the fallback resource. However, the user agent may indicate to the user that the original page load failed, that the page used was a fallback resource, and what the URI of the fallback resource actually is.

    Otherwise

    Once the download is complete, if there were no errors and the user didn't cancel the request, the user agent must cache the resource in all the application caches that have a matching opportunistic caching namespace, categorized as opportunistically cached entries. Meanwhile, the user agent must continue along these steps.

  8. If the document's out-of-band metadata (e.g. HTTP headers), not counting any type information (such as the Content-Type HTTP header), requires some sort of processing that will not affect the browsing context, then perform that processing and abort these steps.

    Such processing might be triggered by, amongst other things, the following:

    • HTTP status codes (e.g. 204 No Content or 205 Reset Content)
    • HTTP Content-Disposition headers
    • Network errors

  9. Let type be the sniffed type of the resource.

  10. If the user agent has been configured to process resources of the given type using some mechanism other than rendering the content in a browsing context, then skip this step. Otherwise, if the type is one of the following types, jump to the appropriate entry in the following list, and process the resource as described there:

    "text/html"
    Follow the steps given in the HTML document section, and abort these steps.
    Any type ending in "+xml"
    "application/xml"
    "text/xml"
    Follow the steps given in the XML document section. If that section determines that the content is not to be displayed as a generic XML document, then proceed to the next step in this overall set of steps. Otherwise, abort these steps.
    "text/plain"
    Follow the steps given in the plain text file section, and abort these steps.
    A supported image type
    Follow the steps given in the image section, and abort these steps.
    A type that will use an external application to render the content in the browsing context
    Follow the steps given in the plugin section, and abort these steps.

  11. Otherwise, the document's type is such that the resource will not affect the browsing context, e.g. because the resource is to be handed to an external application. Process the resource appropriately.
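
The type dispatch in steps 10 and 11 can be sketched as a lookup. The sets of image and plugin types below are hypothetical placeholders for whatever a given user agent supports, the returned strings are just labels for the sections to follow, and the case where the XML section hands control back to step 11 is not modelled.

```python
from typing import FrozenSet

SUPPORTED_IMAGE_TYPES = {"image/png", "image/gif", "image/jpeg"}   # hypothetical
PLUGIN_TYPES = {"application/x-shockwave-flash"}                   # hypothetical


def dispatch(sniffed_type: str,
             externally_handled: FrozenSet[str] = frozenset()) -> str:
    """Pick the page load processing model for a sniffed type."""
    t = sniffed_type.lower()
    if t in externally_handled:
        return "non-document"         # configured to use some other mechanism
    if t == "text/html":
        return "html"
    if t.endswith("+xml") or t in ("application/xml", "text/xml"):
        return "xml"
    if t == "text/plain":
        return "text"
    if t in SUPPORTED_IMAGE_TYPES:
        return "image"
    if t in PLUGIN_TYPES:
        return "plugin"
    return "non-document"             # step 11: does not affect the browsing context
```

Because the list is consulted in order, a type such as image/svg+xml is routed to the XML section before the image entry is considered.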

Some of the sections below, to which the above algorithm defers in certain cases, require the user agent to update the session history with the new page. When a user agent is required to do this, it must follow the set of steps given below that is appropriate for the situation at hand. From the point of view of any script, these steps must occur atomically.

  1. Pause for scripts.

  2. Fire onbeforeunload and, if a handler is present, set a flag indicating that the document will be killed.

  3. Fire onunload and, if a handler is present, set a flag indicating that the document will be killed.

  4. If the flag is set: reset timers, empty the event queue, kill any pending transactions, kill XMLHttpRequests, etc., and set things up so that the document will be discarded as soon as possible.

  5. If the navigation was initiated for entry update of an entry
    1. Replace the entry being updated with a new entry representing the new resource and its Document object and related state. The user agent may propagate state from the old entry to the new entry (e.g. scroll position).

    2. Traverse the history to the new entry.

    Otherwise
    1. Remove all the entries after the current entry in the browsing context's Document object's History object.

      This doesn't necessarily have to affect the user agent's user interface.

    2. Append a new entry at the end of the History object representing the new resource and its Document object and related state.

    3. Traverse the history to the new entry.

    4. If the navigation was initiated with replacement enabled, remove the entry immediately before the new current entry in the session history.
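
The two branches of step 5 can be sketched over a plain list of entries. The function name update_history and its keyword parameters are hypothetical, entries are represented by arbitrary values, and the unload handling in steps 1 to 4 is omitted.

```python
from typing import Any, List, Optional, Tuple


def update_history(entries: List[Any], current: int, new_entry: Any, *,
                   entry_update_index: Optional[int] = None,
                   replacement: bool = False) -> Tuple[List[Any], int]:
    """Return (entries, current index) after adding new_entry."""
    if entry_update_index is not None:
        # Entry update: swap the entry in place, then traverse to it.
        updated = list(entries)
        updated[entry_update_index] = new_entry
        return updated, entry_update_index
    # Otherwise: drop everything after the current entry and append.
    updated = entries[:current + 1] + [new_entry]
    current = len(updated) - 1
    if replacement and current > 0:
        # Replacement enabled: remove the entry just before the new one.
        del updated[current - 1]
        current -= 1
    return updated, current
```

With entries a, b, c and b current, a normal navigation to d yields a, b, d, whereas the same navigation with replacement enabled yields a, d.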

4.9.2 Page load processing model for HTML files

When an HTML document is to be loaded in a browsing context, the user agent must create a Document object, mark it as being an HTML document, create an HTML parser, associate it with the document, and begin to use the bytes provided for the document as the input stream for that parser.

The input stream converts bytes into characters for use in the tokeniser. This process relies, in part, on character encoding information found in the real Content-Type metadata of the resource; the "sniffed type" is not used for this purpose.

When no more bytes are available, an EOF character is implied, which eventually causes a load event to be fired.

After creating the Document object, but potentially before the page has finished parsing, the user agent must update the session history with the new page.

Application cache selection happens in the HTML parser.

4.9.3 Page load processing model for XML files

When faced with displaying an XML file inline, user agents must first create a Document object, following the requirements of the XML and Namespaces in XML recommendations, RFC 3023, DOM3 Core, and other relevant specifications. [XML] [XMLNS] [RFC3023] [DOM3CORE]

The actual HTTP headers and other metadata, not the headers as mutated or implied by the algorithms given in this specification, are the ones that must be used when determining the character encoding according to the rules given in the above specifications. Once the character encoding is established, the document's character encoding must be set to that character encoding.

If the root element, as parsed according to the XML specifications cited above, is found to be an html element with an attribute manifest, then, as soon as the element is inserted into the DOM, the user agent must run the application cache selection algorithm with the value of that attribute, resolved relative to the element's base URI, as the manifest URI. Otherwise, as soon as the root element is inserted into the DOM, the user agent must run the application cache selection algorithm with no manifest.

Because the processing of the manifest attribute happens only once the root element is parsed, any URIs referenced by processing instructions before the root element (such as <?xml-stylesheet?> and <?xbl?> PIs) will be fetched from the network and cannot be cached.

User agents may examine the namespace of the root Element node of this Document object to perform namespace-based dispatch to alternative processing tools, e.g. determining that the content is actually a syndication feed and passing it to a feed handler. If such processing is to take place, abort the steps in this section, and jump to step 10 in the navigate steps above.

Otherwise, with the newly created Document, the user agent must update the session history with the new page. User agents may do this before the complete document has been parsed (thus achieving incremental rendering).

Error messages from the parse process (e.g. namespace well-formedness errors) may be reported inline by mutating the Document.

4.9.4 Page load processing model for text files

When a plain text document is to be loaded in a browsing context, the user agent should create a Document object, mark it as being an HTML document, create an HTML parser, associate it with the document, act as if the tokeniser had emitted a start tag token with the tag name "pre", set the tokenisation stage's content model flag to PLAINTEXT, and begin to pass the stream of characters in the plain text document to that tokeniser.

The rules for how to convert the bytes of the plain text document into actual characters are defined in RFC 2046, RFC 2646, and subsequent versions thereof. [RFC2046] [RFC2646]

The document's character encoding must be set to the character encoding used to decode the document.

Upon creation of the Document object, the user agent must run the application cache selection algorithm with no manifest.

When no more characters are available, an EOF character is implied, which eventually causes a load event to be fired.

After creating the Document object, but potentially before the page has finished parsing, the user agent must update the session history with the new page.

User agents may add content to the head element of the Document, e.g. linking to a stylesheet or an XBL binding, providing script, giving the document a title, etc.

4.9.5 Page load processing model for images

When an image resource is to be loaded in a browsing context, the user agent should create a Document object, mark it as being an HTML document, append an html element to the Document, append a head element and a body element to the html element, append an img to the body element, and set the src attribute of the img element to the address of the image.

Then, the user agent must act as if it had stopped parsing.

Upon creation of the Document object, the user agent must run the application cache selection algorithm with no manifest.

After creating the Document object, but potentially before the page has finished fully loading, the user agent must update the session history with the new page.

User agents may add content to the head element of the Document, or attributes to the img element, e.g. to link to a stylesheet or an XBL binding, to provide a script, to give the document a title, etc.
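
The synthesized document for an image can be sketched with a generic element tree standing in for the DOM. The function name synthesize_image_document and the example address are assumptions, and xml.etree.ElementTree is used purely as a convenient tree structure, not as an HTML DOM.

```python
import xml.etree.ElementTree as ET


def synthesize_image_document(image_address: str) -> ET.Element:
    """Build html > (head, body > img[src]) for an image resource."""
    html = ET.Element("html")
    ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "img", {"src": image_address})
    return html


doc = synthesize_image_document("http://example.com/cat.png")
img = doc.find("body/img")
```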

4.9.6 Page load processing model for content that uses plugins

When a resource that requires an external resource to be rendered is to be loaded in a browsing context, the user agent should create a Document object, mark it as being an HTML document, append an html element to the Document, append a head element and a body element to the html element, append an embed to the body element, and set the src attribute of the embed element to the address of the resource.

Then, the user agent must act as if it had stopped parsing.

Upon creation of the Document object, the user agent must run the application cache selection algorithm with no manifest.

After creating the Document object, but potentially before the page has finished fully loading, the user agent must update the session history with the new page.

User agents may add content to the head element of the Document, or attributes to the embed element, e.g. to link to a stylesheet or an XBL binding, or to give the document a title.

If the sandboxed plugins browsing context flag is set on the browsing context, the synthesized embed element will fail to render the content.

4.9.7 Page load processing model for inline content that doesn't have a DOM

When the user agent is to display a user agent page inline in a browsing context, the user agent should create a Document object, mark it as being an HTML document, and then either associate that Document with a custom rendering that is not rendered using the normal Document rendering rules, or mutate that Document until it represents the content the user agent wants to render.

Once the page has been set up, the user agent must act as if it had stopped parsing.

Upon creation of the Document object, the user agent must run the application cache selection algorithm with no manifest.

After creating the Document object, but potentially before the page has been completely set up, the user agent must update the session history with the new page.

4.9.8 Navigating to a fragment identifier

When a user agent is supposed to navigate to a fragment identifier, then the user agent must update the session history with the new page, where "the new page" has the same Document as before but with the URI having the newly specified fragment identifier.

Part of that algorithm involves the user agent having to scroll to the fragment identifier, which is the important part for this step.

When the user agent is required to scroll to the fragment identifier, it must change the scrolling position of the document, or perform some other action, such that the indicated part of the document is brought to the user's attention. If there is no indicated part, then the user agent must not scroll anywhere.

The indicated part of the document is the one that the fragment identifier, if any, identifies. The semantics of the fragment identifier in terms of mapping it to a specific DOM Node are defined by the MIME type specification of the document's MIME Type (for example, the processing of fragment identifiers for XML MIME types is the responsibility of RFC3023).

For HTML documents (and the text/html MIME type), the following processing model must be followed to determine what the indicated part of the document is.

  1. Let fragid be the <fragment> part of the URI. [RFC3987]

  2. If fragid is the empty string, then the indicated part of the document is the top of the document.

  3. If there is an element in the DOM that has an ID exactly equal to fragid, then the first such element in tree order is the indicated part of the document; stop the algorithm here.

  4. If there is an a element in the DOM that has a name attribute whose value is exactly equal to fragid, then the first such element in tree order is the indicated part of the document; stop the algorithm here.

  5. Otherwise, there is no indicated part of the document.
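
The five steps above can be sketched against a generic element tree. This is an illustration only: xml.etree.ElementTree stands in for the DOM, TOP_OF_DOCUMENT is a hypothetical sentinel for "the top of the document", and the sample markup is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

TOP_OF_DOCUMENT = "top-of-document"    # hypothetical sentinel value


def indicated_part(root: ET.Element, fragid: str) -> Optional[Union[str, ET.Element]]:
    """Return the indicated part for fragid, or None if there is none."""
    if fragid == "":
        return TOP_OF_DOCUMENT
    for element in root.iter():            # iter() walks in tree order
        if element.get("id") == fragid:
            return element                 # first element with a matching ID
    for anchor in root.iter("a"):
        if anchor.get("name") == fragid:
            return anchor                  # first a element with a matching name
    return None


root = ET.fromstring(
    '<html><body><a name="intro">x</a><p id="intro">y</p>'
    '<a name="end">z</a></body></html>'
)
```

In the sample tree, "intro" resolves to the p element even though an a element with that name comes first, because IDs are checked before name attributes.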

For the purposes of the interaction of HTML with Selectors' :target pseudo-class, the target element is the indicated part of the document, if that is an element; otherwise there is no target element. [SELECTORS]

4.9.9 History traversal

When a user agent is required to traverse the history to a specified entry, the user agent must act as follows:

  1. If there is no longer a Document object for the entry in question, the user agent must navigate the browsing context to the location for that entry to perform an entry update of that entry, and abort these steps. The "navigate" algorithm reinvokes this "traverse" algorithm to complete the traversal, at which point there is a Document object and so this step gets skipped. The navigation must be done using the same source browsing context as was used the first time this entry was created.

  2. If appropriate, update the current entry in the browsing context's Document object's History object to reflect any state that the user agent wishes to persist.

    For example, some user agents might want to persist the scroll position, or the values of form controls.

  3. If the specified entry has a different Document object than the current entry then the user agent must run the following substeps:

    1. Freeze any timers, intervals, XMLHttpRequests, database transactions, etc.
    2. The user agent must move any properties that have been added to the browsing context's default view's Window object to the active document's Document's list of added properties.
    3. If the browsing context is a top-level browsing context (and not an auxiliary browsing context), and the origin of the Document of the specified entry is not the same as the origin of the Document of the current entry, then the following sub-sub-steps must be run:
      1. The current browsing context name must be stored with all the entries in the history that are associated with Document objects with the same origin as the active document and that are contiguous with the current entry.
      2. The browsing context's browsing context name must be unset.
    4. The user agent must make the specified entry's Document object the active document of the browsing context. (If it is a top-level browsing context, this might change which application cache it is associated with.)
    5. If the specified entry has a browsing context name stored with it, then the following sub-sub-steps must be run:
      1. The browsing context's browsing context name must be set to the name stored with the specified entry.
      2. Any browsing context name stored with the entries in the history that are associated with Document objects with the same origin as the new active document, and that are contiguous with the specified entry, must be cleared.
    6. The user agent must move any properties that have been added to the active document's Document's list of added properties to the browsing context's default view's Window object.
    7. Unfreeze any timers, intervals, XMLHttpRequests, database transactions, etc.
  4. If there are any entries with state objects between the last activated entry for the Document of the specified entry and the specified entry itself (not inclusive), then the user agent must iterate through every entry between that last activated entry and the specified entry, starting with the entry closest to the current entry, and ending with the one closest to the specified entry. For each entry, if the entry is a state object, the user agent must activate the state object.

  5. If the specified entry is a state object or the first entry for a Document, the user agent must activate that entry.

  6. If the specified entry has a URI that differs from the current entry's only by its fragment identifier, and the two share the same Document object, then fire a simple event with the name hashchanged at the body element, and, if the new URI has a fragment identifier, scroll to the fragment identifier.

  7. User agents may also update other aspects of the document view when the location changes in this way, for instance the scroll position, values of form fields, etc.

  8. The current entry is now the specified entry.

how does the changing of the global attributes affect .watch() when seen from other Windows?

4.10 Determining the type of a new resource in a browsing context

It is imperative that the rules in this section be followed exactly. When a user agent uses different heuristics for content type detection than the server expects, security problems can occur. For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a Web browser believes the content to be HTML (and thus execute any scripts contained therein), the end user can be exposed to malicious content, making the user vulnerable to cookie theft attacks and other cross-site scripting attacks.

The sniffed type of a resource must be found as follows:

  1. Let official type be the type given by the Content-Type metadata for the resource (in lowercase, ignoring any parameters). If there is no such type, jump to the unknown type step below.

  2. If the user agent is configured to strictly obey Content-Type headers for this resource, then jump to the last step in this set of steps.

  3. If the resource was fetched over an HTTP protocol and there is an HTTP Content-Type header and the value of the first such header has bytes that exactly match one of the following lines:

    Bytes in Hexadecimal Textual representation
    74 65 78 74 2f 70 6c 61 69 6e text/plain
    74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53 4f 2d 38 38 35 39 2d 31 text/plain; charset=ISO-8859-1
    74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73 6f 2d 38 38 35 39 2d 31 text/plain; charset=iso-8859-1
    74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 55 54 46 2d 38 text/plain; charset=UTF-8

    ...then jump to the text or binary section below.

  4. If official type is "unknown/unknown" or "application/unknown", jump to the unknown type step below.

  5. If official type ends in "+xml", or if it is either "text/xml" or "application/xml", then the sniffed type of the resource is official type; return that and abort these steps.

  6. If official type is an image type supported by the user agent (e.g. "image/png", "image/gif", "image/jpeg", etc), then jump to the images section below.

  7. If official type is "text/html", then jump to the feed or HTML section below.

  8. The sniffed type of the resource is official type.
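
As a rough sketch, the dispatch performed by the steps above might look like the following Python. The function name next_step, its parameters, and the ("type", ...) / ("jump", ...) return convention are all invented for this example. It also simplifies step 1 by treating "no Content-Type metadata" as None and by extracting the official type with a plain split on ";".

```python
from typing import FrozenSet, Optional, Tuple

# The four exact header values from step 3 that trigger text-or-binary sniffing.
TEXT_PLAIN_TRIGGERS = {
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
}


def next_step(raw_content_type: Optional[bytes],
              via_http: bool = True,
              strict: bool = False,
              supported_images: FrozenSet[str] = frozenset(
                  {"image/png", "image/gif", "image/jpeg"}),
              ) -> Tuple[str, str]:
    """Return ("type", sniffed_type) or ("jump", section_name)."""
    # Step 1: no type information at all.
    if raw_content_type is None:
        return ("jump", "unknown type")
    # Official type: lowercase, parameters ignored (simplified parsing).
    official = raw_content_type.decode("latin-1").split(";", 1)[0].strip().lower()
    # Step 2: configured to obey Content-Type strictly.
    if strict:
        return ("type", official)
    # Step 3: byte-for-byte match against the listed header values.
    if via_http and raw_content_type in TEXT_PLAIN_TRIGGERS:
        return ("jump", "text or binary")
    # Step 4.
    if official in ("unknown/unknown", "application/unknown"):
        return ("jump", "unknown type")
    # Step 5: XML types are never sniffed further.
    if official.endswith("+xml") or official in ("text/xml", "application/xml"):
        return ("type", official)
    # Step 6.
    if official in supported_images:
        return ("jump", "image")
    # Step 7.
    if official == "text/html":
        return ("jump", "feed or HTML")
    # Step 8.
    return ("type", official)
```

The comparison in step 3 is deliberately made against the raw header bytes, so a value such as "text/plain; charset=utf-8" (lowercase "utf-8") does not trigger text-or-binary sniffing.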

4.10.1 Content-Type sniffing: text or binary

  1. The user agent may wait for 512 or more bytes of the resource to be available.

  2. Let n be the smaller of either 512 or the number of bytes already available.

  3. If n is 4 or more, and the first bytes of the file match one of the following byte sets:

    Bytes in Hexadecimal Description
    FE FF UTF-16BE BOM
    FF FE UTF-16LE BOM
    EF BB BF UTF-8 BOM

    ...then the sniffed type of the resource is "text/plain".

  4. Otherwise, if any of the first n bytes of the resource are in one of the following byte ranges:

    ...then the sniffed type of the resource is "application/octet-stream".

    maybe we should invoke the "Content-Type sniffing: image" section now, falling back on "application/octet-stream".

  5. Otherwise, the sniffed type of the resource is "text/plain".
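
A minimal Python sketch of these steps is given below. Because the byte ranges for step 4 are not reproduced here, the sketch takes them as a caller-supplied parameter, binary_bytes; that parameter and the function name are assumptions made purely for illustration.

```python
from typing import FrozenSet

# Byte order marks from the table in step 3.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def sniff_text_or_binary(available: bytes, binary_bytes: FrozenSet[int]) -> str:
    # Step 2: n is the smaller of 512 and the number of bytes available.
    n = min(512, len(available))
    head = available[:n]
    # Step 3: a leading BOM means text, provided at least 4 bytes are available.
    if n >= 4 and head.startswith(BOMS):
        return "text/plain"
    # Step 4: any byte in the "binary" ranges (supplied by the caller) means binary.
    if any(b in binary_bytes for b in head):
        return "application/octet-stream"
    # Step 5.
    return "text/plain"
```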

4.10.2 Content-Type sniffing: unknown type

  1. The user agent may wait for 512 or more bytes of the resource to be available.

  2. Let stream length be the smaller of either 512 or the number of bytes already available.

  3. For each row in the table below:

    If the row has no "WS" bytes:
    1. Let pattern length be the length of the pattern (number of bytes described by the cell in the second column of the row).
    2. If stream length is smaller than pattern length then skip this row.
    3. Apply the "and" operator to the first pattern length bytes of the resource and the given mask (the bytes in the cell of the first column of that row), and let the result be the data.
    4. If the bytes of the data match the given pattern bytes exactly, then the sniffed type of the resource is the type given in the cell of the third column in that row; abort these steps.
    If the row has a "WS" byte:
    1. Let indexpattern be an index into the mask and pattern byte strings of the row.

    2. Let indexstream be an index into the byte stream being examined.

    3. Loop: If indexstream points beyond the end of the byte stream, then this row doesn't match, skip this row.

    4. Examine the indexstreamth byte of the byte stream as follows:

      If the indexpatternth byte of the pattern is a normal hexadecimal byte and not a "WS" byte:

      If the "and" operator, applied to the indexstreamth byte of the stream and the indexpatternth byte of the mask, yields a value different from the indexpatternth byte of the pattern, then skip this row.

      Otherwise, increment indexpattern to the next byte in the mask and pattern and indexstream to the next byte in the byte stream.

      Otherwise, if the indexpatternth byte of the pattern is a "WS" byte:

      "WS" means "whitespace", and allows insignificant whitespace to be skipped when sniffing for a type signature.

      If the indexstreamth byte of the stream is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then increment only the indexstream to the next byte in the byte stream.

      Otherwise, increment only the indexpattern to the next byte in the mask and pattern.

    5. If indexpattern does not point beyond the end of the mask and pattern byte strings, then jump back to the loop step in this algorithm.

    6. Otherwise, the sniffed type of the resource is the type given in the cell of the third column in that row; abort these steps.

  4. As a last-ditch effort, jump to the text or binary section.

Mask (bytes in hexadecimal) | Pattern (bytes in hexadecimal) | Sniffed type | Comment
FF FF DF DF DF DF DF DF DF FF DF DF DF DF | 3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C | text/html | The string "<!DOCTYPE HTML" in US-ASCII or compatible encodings, case-insensitively.
FF FF DF DF DF DF | WS 3C 48 54 4D 4C | text/html | The string "<HTML" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
FF FF DF DF DF DF | WS 3C 48 45 41 44 | text/html | The string "<HEAD" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
FF FF DF DF DF DF DF DF | WS 3C 53 43 52 49 50 54 | text/html | The string "<SCRIPT" in US-ASCII or compatible encodings, case-insensitively, possibly with leading spaces.
FF FF FF FF FF | 25 50 44 46 2D | application/pdf | The string "%PDF-", the PDF signature.
FF FF FF FF FF FF FF FF FF FF FF | 25 21 50 53 2D 41 64 6F 62 65 2D | application/postscript | The string "%!PS-Adobe-", the PostScript signature.
FF FF FF FF FF FF | 47 49 46 38 37 61 | image/gif | The string "GIF87a", a GIF signature.
FF FF FF FF FF FF | 47 49 46 38 39 61 | image/gif | The string "GIF89a", a GIF signature.
FF FF FF FF FF FF FF FF | 89 50 4E 47 0D 0A 1A 0A | image/png | The PNG signature.
FF FF FF | FF D8 FF | image/jpeg | A JPEG SOI marker followed by the first byte of another marker.
FF FF | 42 4D | image/bmp | The string "BM", a BMP signature.
FF FF FF FF | 00 00 01 00 | image/vnd.microsoft.icon | A 0 word followed by a 1 word, a Windows Icon file format signature.

User agents may support further types if desired, by implicitly adding to the above table. However, user agents should not use any other patterns for types already mentioned in the table above, as this could then be used for privilege escalation (where, e.g., a server uses the above table to determine that content is not HTML and thus safe from XSS attacks, but then a user agent detects it as HTML anyway and allows script to execute).
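
The mask-and-pattern matching used for unknown types can be sketched in Python as below. Only three rows of the table are included, and the names row_matches, sniff_unknown, and ROWS are invented for the example. The sketch assumes a row without "WS" bytes is skipped when fewer bytes are available than the pattern needs, and that for "WS" rows the pattern index decides whether the current position is a "WS" byte.

```python
from typing import List, Optional

WS = None  # marker for a "WS" (whitespace) pattern byte
WHITESPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}

# (mask, pattern, sniffed type): a subset of the table, for illustration only.
ROWS = [
    ("FF FF DF DF DF DF", "WS 3C 48 54 4D 4C", "text/html"),
    ("FF FF FF FF FF", "25 50 44 46 2D", "application/pdf"),
    ("FF FF FF FF FF FF", "47 49 46 38 39 61", "image/gif"),
]


def row_matches(stream: bytes, mask: bytes, pattern: List[Optional[int]]) -> bool:
    if WS not in pattern:
        # Plain row: mask the first len(pattern) bytes and compare exactly.
        if len(stream) < len(pattern):
            return False
        return all((s & m) == p for s, m, p in zip(stream, mask, pattern))
    index_pattern = index_stream = 0
    while index_pattern < len(pattern):
        if index_stream >= len(stream):
            return False  # ran out of bytes: the row does not match
        if pattern[index_pattern] is not WS:
            if (stream[index_stream] & mask[index_pattern]) != pattern[index_pattern]:
                return False
            index_pattern += 1
            index_stream += 1
        elif stream[index_stream] in WHITESPACE:
            index_stream += 1   # skip insignificant whitespace
        else:
            index_pattern += 1  # whitespace run is over; move past "WS"
    return True


def sniff_unknown(available: bytes) -> Optional[str]:
    stream = available[:512]
    for mask_hex, pattern_hex, mime in ROWS:
        mask = bytes.fromhex(mask_hex)
        pattern = [WS if t == "WS" else int(t, 16) for t in pattern_hex.split()]
        if row_matches(stream, mask, pattern):
            return mime
    return None  # caller falls back to the text or binary section
```

The DF mask bytes clear bit 0x20, which folds lowercase ASCII letters onto their uppercase forms, so "<html" and "<HTML" both match the same pattern.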

4.10.3 Content-Type sniffing: image

If the first bytes of the file match one of the byte sequences in the first column of the following table, then the sniffed type of the resource is the type given in the corresponding cell in the second column on the same row:

Bytes in Hexadecimal | Sniffed type | Comment
47 49 46 38 37 61 | image/gif | The string "GIF87a", a GIF signature.
47 49 46 38 39 61 | image/gif | The string "GIF89a", a GIF signature.
89 50 4E 47 0D 0A 1A 0A | image/png | The PNG signature.
FF D8 FF | image/jpeg | A JPEG SOI marker followed by the first byte of another marker.
42 4D | image/bmp | The string "BM", a BMP signature.
00 00 01 00 | image/vnd.microsoft.icon | A 0 word followed by a 1 word, a Windows Icon file format signature.

User agents must ignore any rows for image types that they do not support.

Otherwise, the sniffed type of the resource is the same as its official type.
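
The image rules amount to a prefix match with a fallback to the official type, as in the following Python sketch. The function name sniff_image and the supported parameter (the set of image types the user agent supports) are assumptions for the example.

```python
from typing import FrozenSet

# Signatures from the table above, in table order.
IMAGE_SIGNATURES = [
    (bytes.fromhex("47 49 46 38 37 61"), "image/gif"),
    (bytes.fromhex("47 49 46 38 39 61"), "image/gif"),
    (bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"), "image/png"),
    (bytes.fromhex("FF D8 FF"), "image/jpeg"),
    (bytes.fromhex("42 4D"), "image/bmp"),
    (bytes.fromhex("00 00 01 00"), "image/vnd.microsoft.icon"),
]

ALL_TYPES = frozenset(mime for _, mime in IMAGE_SIGNATURES)


def sniff_image(data: bytes, official_type: str,
                supported: FrozenSet[str] = ALL_TYPES) -> str:
    for signature, mime in IMAGE_SIGNATURES:
        # Rows for unsupported image types are ignored entirely.
        if mime in supported and data.startswith(signature):
            return mime
    # No row matched: keep the official type.
    return official_type
```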

4.10.4 Content-Type sniffing: feed or HTML

  1. The user agent may wait for 512 or more bytes of the resource to be available.

  2. Let s be the stream of bytes, and let s[i] represent the byte in s with position i, treating s as zero-indexed (so the first byte is at i=0).

  3. If at any point this algorithm requires the user agent to determine the value of a byte in s which is not yet available, or which is past the first 512 bytes of the resource, or which is beyond the end of the resource, the user agent must stop this algorithm, and assume that the sniffed type of the resource is "text/html".

    User agents are allowed, by the first step of this algorithm, to wait until the first 512 bytes of the resource are available.

  4. Initialise pos to 0.

  5. If s[0] is 0xEF, s[1] is 0xBB, and s[2] is 0xBF, then set pos to 3. (This skips over a leading UTF-8 BOM, if any.)

  6. Loop start: Examine s[pos].

    If it is 0x09 (ASCII tab), 0x20 (ASCII space), 0x0A (ASCII LF), or 0x0D (ASCII CR)
    Increase pos by 1 and repeat this step.
    If it is 0x3C (ASCII "<")
    Increase pos by 1 and go to the next step.
    If it is anything else
    The sniffed type of the resource is "text/html". Abort these steps.
  7. If the bytes with positions pos to pos+2 in s are exactly equal to 0x21, 0x2D, 0x2D respectively (ASCII for "!--"), then:

    1. Increase pos by 3.
    2. If the bytes with positions pos to pos+2 in s are exactly equal to 0x2D, 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump back to the previous step (the step labeled loop start) in the overall algorithm in this section.
    3. Otherwise, increase pos by 1.
    4. Return to step 2 in these substeps.
  8. If s[pos] is 0x21 (ASCII "!"):

    1. Increase pos by 1.
    2. If s[pos] equals 0x3E, then increase pos by 1 and jump back to the step labeled loop start in the overall algorithm in this section.
    3. Otherwise, return to step 1 in these substeps.
  9. If s[pos] is 0x3F (ASCII "?"):

    1. Increase pos by 1.
    2. If s[pos] and s[pos+1] equal 0x3F and 0x3E respectively, then increase pos by 1 and jump back to the step labeled loop start in the overall algorithm in this section.
    3. Otherwise, return to step 1 in these substeps.
  10. Otherwise, if the bytes in s starting at pos match any of the sequences of bytes in the first column of the following table, then the user agent must follow the steps given in the corresponding cell in the second column of the same row.

    Bytes in Hexadecimal | Requirement | Comment
    72 73 73 | The sniffed type of the resource is "application/rss+xml"; abort these steps | The three ASCII characters "rss"
    66 65 65 64 | The sniffed type of the resource is "application/atom+xml"; abort these steps | The four ASCII characters "feed"
    72 64 66 3A 52 44 46 | Continue to the next step in this algorithm | The ASCII characters "rdf:RDF"

    If none of the byte sequences above match the bytes in s starting at pos, then the sniffed type of the resource is "text/html". Abort these steps.

  11. If, before the next ">", you find two xmlns* attributes with http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/rss/1.0/ as the namespaces, then the sniffed type of the resource is "application/rss+xml"; abort these steps. (maybe we only need to check for http://purl.org/rss/1.0/ actually)

  12. Otherwise, the sniffed type of the resource is "text/html".

For efficiency reasons, implementations may wish to implement this algorithm and the algorithm for detecting the character encoding of HTML documents in parallel.

4.10.5 Content-Type metadata

What explicit Content-Type metadata is associated with the resource (the resource's type information) depends on the protocol that was used to fetch the resource.

For HTTP resources, only the first Content-Type HTTP header, if any, contributes any type information; the explicit type of the resource is then the value of that header, interpreted as described by the HTTP specifications. If the Content-Type HTTP header is present but the value of the first such header cannot be interpreted as described by the HTTP specifications (e.g. because its value doesn't contain a U+002F SOLIDUS ('/') character), then the resource has no type information (even if there are multiple Content-Type HTTP headers and one of the other ones is syntactically correct). [HTTP]
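
The "first header only" rule for HTTP can be illustrated with the short Python sketch below. The function name and the representation of headers as (name, value) pairs are assumptions. The sketch also approximates "cannot be interpreted as described by the HTTP specifications" by the single check mentioned in the example above (a missing "/" character); a real implementation would need a full parse.

```python
from typing import Iterable, Optional, Tuple


def http_type_information(headers: Iterable[Tuple[str, str]]) -> Optional[str]:
    for name, value in headers:
        if name.lower() == "content-type":
            # Only the first Content-Type header counts. If it is unusable,
            # later headers are NOT consulted.
            return value if "/" in value else None
    return None  # no Content-Type header at all
```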

For resources fetched from the file system, user agents should use platform-specific conventions, e.g. operating system extension/type mappings.

Extensions must not be used for determining resource types for resources fetched over HTTP.

For resources fetched over most other protocols, e.g. FTP, there is no type information.

The algorithm for extracting an encoding from a Content-Type, given a string s, is as follows. It either returns an encoding or nothing.

  1. Find the first seven characters in s that are a case-insensitive match for the word 'charset'. If no such match is found, return nothing.

  2. Skip any U+0009, U+000A, U+000B, U+000C, U+000D, or U+0020 characters that immediately follow the word 'charset' (there might not be any).

  3. If the next character is not a U+003D EQUALS SIGN ('='), return nothing.

  4. Skip any U+0009, U+000A, U+000B, U+000C, U+000D, or U+0020 characters that immediately follow the equals sign (there might not be any).

  5. Process the next character as follows:

    If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 QUOTATION MARK ('"') in s
    If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 APOSTROPHE ("'") in s

    Return the string between this character and the next earliest occurrence of this character.

    If it is an unmatched U+0022 QUOTATION MARK ('"')
    If it is an unmatched U+0027 APOSTROPHE ("'")
    If there is no next character

    Return nothing.

    Otherwise

    Return the string from this character to the first U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, or U+003B character or the end of s, whichever comes first.

The above algorithm is a willful violation of the HTTP specification. [RFC2616]
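
The extraction algorithm can be sketched in Python as follows. The function name extract_encoding is an assumption, and None stands in for "nothing". The case-insensitive search is restricted to ASCII so that only the letters of "charset" in either case are matched.

```python
import re
from typing import Optional

# U+0009, U+000A, U+000B, U+000C, U+000D, U+0020
WHITESPACE = "\t\n\x0b\x0c\r "


def extract_encoding(s: str) -> Optional[str]:
    # Step 1: first case-insensitive occurrence of "charset".
    match = re.search("charset", s, re.IGNORECASE | re.ASCII)
    if match is None:
        return None
    i = match.end()
    # Step 2: skip whitespace after the word.
    while i < len(s) and s[i] in WHITESPACE:
        i += 1
    # Step 3: the next character must be "=".
    if i >= len(s) or s[i] != "=":
        return None
    i += 1
    # Step 4: skip whitespace after the equals sign.
    while i < len(s) and s[i] in WHITESPACE:
        i += 1
    # Step 5: no next character.
    if i >= len(s):
        return None
    if s[i] in "\"'":
        # Quoted value: needs a matching closing quote, else return nothing.
        end = s.find(s[i], i + 1)
        return None if end < 0 else s[i + 1:end]
    # Unquoted value: runs to whitespace, ";" (U+003B), or the end of s.
    j = i
    while j < len(s) and s[j] not in WHITESPACE + ";":
        j += 1
    return s[i:j]
```

For instance, extract_encoding('text/html; charset = "ISO-8859-1"') returns "ISO-8859-1", while a value with an unmatched opening quote yields None.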