Skip to content

feat: improve content type read performance #6077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

feat: improve content type read performance #6077

wants to merge 3 commits into from

Conversation

Eomm
Copy link
Member

@Eomm Eomm commented Apr 25, 2025

This PR is a follow-up of

This PR aims to improve the content-type parsing by implementing the RFC9110 (more details in the code).

ATM, this PR is a bit slower (see bench here).. checking how to speed it up a bit.

getContentType [application/json] x 22,476,414 ops/sec ±0.30% (97 runs sampled)
getContentTypeLoop [application/json] x 20,122,993 ops/sec ±0.72% (91 runs sampled)
getContentType [application/json; charset=utf-8] x 20,025,598 ops/sec ±1.05% (97 runs sampled)
getContentTypeLoop [application/json; charset=utf-8] x 17,639,137 ops/sec ±0.29% (98 runs sampled)
Fastest is getContentType [application/json]
Bench run
const Benchmark = require('benchmark')
const suite = new Benchmark.Suite()

const headers = ['application/json', 'application/json; charset=utf-8']

function getContentType (header) {
  if (!header) return ''
  return header.split(/[ ;]/, 1)[0].trim().toLowerCase()
}
function getContentTypeLoop (header) {
  if (!header) return ''

  // The Content-Type header field indicates the media type of the resource
  // The media type format is:
  // media-type = type "/" subtype parameters
  // type       = token
  // subtype    = token
  // The type/subtype MAY be followed by semicolon-delimited parameters
  // https://datatracker.ietf.org/doc/html/rfc9110#section-8.3.1

  // Tokens are case-insensitive and consist of:
  // https://datatracker.ietf.org/doc/html/rfc9110#name-tokens

  //   tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
  //                  / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
  //                  / DIGIT / ALPHA
  //                  ; any VCHAR, except delimiters
  // Many HTTP field values are defined using common syntax components, separated by whitespace or specific delimiting characters.
  // Delimiters are chosen from the set of US-ASCII visual characters not allowed in a token (DQUOTE and "(),/:;<=>?@[\]{}").

  header = header.trim().toLowerCase()
  let firstToken = true
  for (let i = 0; i < header.length; i++) {
    const char = header.codePointAt(i)
    const isDelimiterChar = isDelimiter(char)
    if (!isDelimiterChar) {
      continue
    }

    if (firstToken) {
      if (char !== 47) {
        // The first delimiter MUST be a slash
        throw new Error('Invalid Content-Type header')
      } else {
        firstToken = false
      }
    } else { // Look for the second delimiter
      if (char !== 59 && char !== 32) {
        // The second delimiter MUST be a semicolon or a space
        throw new Error('Invalid Content-Type header')
      }
      return header.slice(0, i)
    }
  }

  // If we reach here, it means we didn't find a semicolon
  return header
}

function isDelimiter (charCode) {
  // Delimiters are defined by the RFC 9110: (DQUOTE and "(),/:;<=>?@[\]{}")
  return charCode <= 32 || // NOT any VCHAR (visible char): these must be not included in token
    charCode === 34 || // DQUOTE
    charCode === 40 || // (
    charCode === 41 || // )
    charCode === 44 || // ,
    charCode === 47 || // /
    charCode === 58 || // :
    charCode === 59 || // ;
    charCode === 60 || // <
    charCode === 61 || // =
    charCode === 62 || // >
    charCode === 63 || // ?
    charCode === 64 || // @
    charCode === 91 || // [
    charCode === 92 || // \
    charCode === 93 || // ]
    charCode === 123 || // {
    charCode === 125 // }
}

for (const h of headers) {
  suite.add(`getContentType [${h}]`, getContentType.bind(null, h))
  suite.add(`getContentTypeLoop [${h}]`, getContentTypeLoop.bind(null, h))
}

suite.on('cycle', function (event) {
  console.log(String(event.target))
})
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'))
  })
  .run()

@Eomm Eomm added performances Everything related to performances internals Change that won't impact the surface API. labels Apr 25, 2025
@jsumners
Copy link
Member

I spent some time yesterday adding my suggestion, but ended up deleting the branch. The code in question is only used when a schema is defined that tries to verify content types. The baseline code for every incoming request merely checks that the header is set and has a value of something other than an empty string. So I decided it wasn't worth it.

@Eomm
Copy link
Member Author

Eomm commented Apr 25, 2025

I think we can propagate the code to the content-type parser as well - the target here was to be compliant with the RFC compliant first, avoid any potential bad actors and (luckily) improve the performance

@jsumners
Copy link
Member

Indeed. I just couldn't find the benefit in my change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internals Change that won't impact the surface API. performances Everything related to performances
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants