Skip to content

Conversation

iansan5653
Copy link

@iansan5653 iansan5653 commented Aug 27, 2025

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and discussions and couldn’t find anything or linked relevant results below
  • I made sure the docs are up to date
  • I included tests (or that’s not needed)

Description of changes

TLDR: Adds support for extensions' enter and exit handlers to return false to fall back to earlier extensions or the default logic.

Context

I am working on a project where I'd like to attach additional information to nodes based on the information already in the tokenization output. In this particular case, I'm trying to record data about the specific syntax that formed a node in the tree (essentially building a sort of concrete syntax tree on top of the AST).

While building an extension that provides handlers for the relevant syntax tokens and attaches them to nodes in the stack, I noticed that I was breaking the built-in parsing functionality. Upon digging I realized that this was because my enter handlers were overriding and disabling the default handlers, even though I only wanted to encounter the tokens without doing the parsing and tree building myself.

I realized that a very useful feature of extension handlers would be the ability to 'fall through' to the default behavior if desired. To me the most intuitive approach for this was to support returning false to indicate that a token was not handled by the handler. This is a flexible API that allows for handlers to fall through conditionally, allowing them to choose to only take over the handling of a token in certain circumstances.

Since the code change here is fairly straightforward, I chose to go ahead and open a PR rather than start with an issue. I am very open to feedback here and happy to change the approach or reject the change entirely if you don't think it's useful.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Aug 27, 2025
@wooorm
Copy link
Member

wooorm commented Aug 28, 2025

Hey!

Can you expand on your actual use case? Sounds like a lot of work to overwrite all handlers.
And, what if there is no handler before?
And, what for extensions that add more handlers?
How would you actually use this?

Why not reach for a beforeEnter/afterEnter/beforeExit/afterExit, something like that?

@iansan5653
Copy link
Author

iansan5653 commented Aug 28, 2025

Sure!

Can you expand on your actual use case? Sounds like a lot of work to overwrite all handlers.

My specific use case is that I am working on a Markdown editor and so I need to know the locations in code of many token types. I could just use micromark tokens directly, but the output from mdast-util-from-markdown is also useful as I also want to integrate some information about the document structure. Parsing out the boundaries of a listItem from the tokens, for example, is pretty difficult and I'd rather not duplicate that logic.

I could handle the mdast tree and the micromark tokens completely separately, but that would require inefficiently tokenizing the document twice and working with two data structures in parallel that would be hard to link together. I'd rather combine all the information I need into one tree by hooking into the existing functionality.

And, what if there is no handler before?

The combineHandlers logic I've defined would handle this case by simply returning the right handler.

And, what for extensions that add more handlers?

If extensions defined after this one define handlers for the same tokens, they would fall through in the same way:

  • If they return undefined, previously-defined handlers would not be called
  • If they return false, previously-defined handlers would still be called

How would you actually use this?

Here's a very simplified example (in practice I will handle many more token types and have a more generic handler):

function handleQuoteMarker(token) {
  this.stack.findLast(n => n.type === 'quote')?.children.push({
    type: "quoteMarker",
    position: {
      end: token.end,
      start: token.start,
    },
  });

  return false // allow the parser to still handle this token, if necessary, to build quote nodes
};

const tree = fromMarkdown(markdown, "utf8", {
  mdastExtensions: [{
    exit: {
      blockQuoteMarker: handleQuoteMarker
    }
  }],
});

Why not reach for a beforeEnter/afterEnter/beforeExit/afterExit, something like that?

That would also work by introducing new events that are less likely to conflict, although I would still be concerned about the possibility that the package internals or other extensions might start to depend on these in the future. I still wouldn't want my extension to override any default functionality, just augment it.

Perhaps the design for these new events could specify that all registered handlers for a token are always called in a consistent order, regardless of conflicts across extensions? Would it be confusing though if these work in a different way from enter and exit? But maybe that's fine since they would act more like "events" than the enter/exit handlers do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

Successfully merging this pull request may close these issues.

2 participants