feat: improve content type read performance #6077

Eomm · 2025-04-25T06:16:13Z

This PR is a follow-up of

fix: treat space as a delimiter in content-type parsing #6064

This PR aims to improve the content-type parsing by implementing the RFC9110 (more details in the code).

ATM, this PR is a bit slower (see bench here).. checking how to speed it up a bit.

getContentType [application/json] x 22,476,414 ops/sec ±0.30% (97 runs sampled)
getContentTypeLoop [application/json] x 20,122,993 ops/sec ±0.72% (91 runs sampled)
getContentType [application/json; charset=utf-8] x 20,025,598 ops/sec ±1.05% (97 runs sampled)
getContentTypeLoop [application/json; charset=utf-8] x 17,639,137 ops/sec ±0.29% (98 runs sampled)
Fastest is getContentType [application/json]

Bench run

const Benchmark = require('benchmark')
const suite = new Benchmark.Suite()

const headers = ['application/json', 'application/json; charset=utf-8']

function getContentType (header) {
  if (!header) return ''
  return header.split(/[ ;]/, 1)[0].trim().toLowerCase()
}
function getContentTypeLoop (header) {
  if (!header) return ''

  // The Content-Type header field indicates the media type of the resource
  // The media type format is:
  // media-type = type "/" subtype parameters
  // type       = token
  // subtype    = token
  // The type/subtype MAY be followed by semicolon-delimited parameters
  // https://datatracker.ietf.org/doc/html/rfc9110#section-8.3.1

  // Tokens are case-insensitive and consist of:
  // https://datatracker.ietf.org/doc/html/rfc9110#name-tokens

  //   tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
  //                  / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
  //                  / DIGIT / ALPHA
  //                  ; any VCHAR, except delimiters
  // Many HTTP field values are defined using common syntax components, separated by whitespace or specific delimiting characters.
  // Delimiters are chosen from the set of US-ASCII visual characters not allowed in a token (DQUOTE and "(),/:;<=>?@[\]{}").

  header = header.trim().toLowerCase()
  let firstToken = true
  for (let i = 0; i < header.length; i++) {
    const char = header.codePointAt(i)
    const isDelimiterChar = isDelimiter(char)
    if (!isDelimiterChar) {
      continue
    }

    if (firstToken) {
      if (char !== 47) {
        // The first delimiter MUST be a slash
        throw new Error('Invalid Content-Type header')
      } else {
        firstToken = false
      }
    } else { // Look for the second delimiter
      if (char !== 59 && char !== 32) {
        // The second delimiter MUST be a semicolon or a space
        throw new Error('Invalid Content-Type header')
      }
      return header.slice(0, i)
    }
  }

  // If we reach here, it means we didn't find a semicolon
  return header
}

function isDelimiter (charCode) {
  // Delimiters are defined by the RFC 9110: (DQUOTE and "(),/:;<=>?@[\]{}")
  return charCode <= 32 || // NOT any VCHAR (visible char): these must be not included in token
    charCode === 34 || // DQUOTE
    charCode === 40 || // (
    charCode === 41 || // )
    charCode === 44 || // ,
    charCode === 47 || // /
    charCode === 58 || // :
    charCode === 59 || // ;
    charCode === 60 || // <
    charCode === 61 || // =
    charCode === 62 || // >
    charCode === 63 || // ?
    charCode === 64 || // @
    charCode === 91 || // [
    charCode === 92 || // \
    charCode === 93 || // ]
    charCode === 123 || // {
    charCode === 125 // }
}

for (const h of headers) {
  suite.add(`getContentType [${h}]`, getContentType.bind(null, h))
  suite.add(`getContentTypeLoop [${h}]`, getContentTypeLoop.bind(null, h))
}

suite.on('cycle', function (event) {
  console.log(String(event.target))
})
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'))
  })
  .run()

jsumners · 2025-04-25T11:07:43Z

I spent some time yesterday adding my suggestion, but ended up deleting the branch. The code in question is only used when a schema is defined that tries to verify content types. The baseline code for every incoming request merely checks that the header is set and has a value of something other than an empty string. So I decided it wasn't worth it.

Eomm · 2025-04-25T15:54:16Z

I think we can propagate the code to the content-type parser as well - the target here was to be compliant with the RFC compliant first, avoid any potential bad actors and (luckily) improve the performance

jsumners · 2025-04-25T16:35:46Z

Indeed. I just couldn't find the benefit in my change.

feat: improve content type read performance

9c422f3

Eomm added performances Everything related to performances internals Change that won't impact the surface API. labels Apr 25, 2025

Eomm added 2 commits May 13, 2025 09:28

Merge branch 'main' into perf-header

958a671

chore: speed up

6d905e3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: improve content type read performance #6077

feat: improve content type read performance #6077

Uh oh!

Eomm commented Apr 25, 2025

Uh oh!

jsumners commented Apr 25, 2025

Uh oh!

Eomm commented Apr 25, 2025

Uh oh!

jsumners commented Apr 25, 2025

Uh oh!

Uh oh!

Uh oh!

feat: improve content type read performance #6077

Are you sure you want to change the base?

feat: improve content type read performance #6077

Uh oh!

Conversation

Eomm commented Apr 25, 2025

Uh oh!

jsumners commented Apr 25, 2025

Uh oh!

Eomm commented Apr 25, 2025

Uh oh!

jsumners commented Apr 25, 2025

Uh oh!

Uh oh!