Reference Manual
The lezer system consists of multiple modules, each distributed as a separate package on npm.
@lezer/common: The data structures for the syntax tree and the types shared between all parser implementations.
@lezer/lr: The LR parser runtime.
@lezer/highlight: A system for attaching highlighting information to syntax trees and using that to highlight code.
@lezer/generator: The parser generator, an offline build tool to create parse tables from a grammar description.
@lezer/common module
This package provides common data structures used by all Lezer-related parsing—those related to syntax trees and the generic interface of parsers. Their main use is the LR parsers generated by the parser generator, but for example the Markdown parser implements a different parsing algorithm using the same interfaces.
Trees
Lezer syntax trees are not abstract: they just tell you which nodes were parsed where, without providing additional information about their role or relation (beyond parent-child relations). This makes them rather unsuited for some purposes, but quick to construct and cheap to store.
-
class Tree
A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way.
Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient.
However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the TreeCursor or SyntaxNode interface instead, which provides a view on some part of this data structure, and can be used to move around to adjacent nodes.
-
new Tree()
Construct a new tree. See also Tree.build.
-
props
Per-node node props to associate with this node.
-
-
type: NodeType
The type of the top node.
-
children: readonly (Tree | TreeBuffer)[]
This node's child nodes.
-
positions: readonly number[]
The positions (offsets relative to the start of this tree) of the children.
-
length: number
The total length of this tree
-
cursor(mode?: IterMode = 0 as IterMode) → TreeCursor
Get a tree cursor positioned at the top of the tree. Mode can be used to control which nodes the cursor visits.
-
cursorAt() → TreeCursor
Get a tree cursor pointing into this tree at the given position and side (see moveTo).
-
topNode: SyntaxNode
Get a syntax node object for the top of the tree.
-
resolve(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode
Get the syntax node at the given position. If side is -1, this will move into nodes that end at the position. If 1, it'll move into nodes that start at the position. With 0, it'll only enter nodes that cover the position from both sides.
Note that this will not enter overlays, and you often want resolveInner instead.
-
resolveInner(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode
Like resolve, but will enter overlaid nodes, producing a syntax node pointing into the innermost overlaid tree at the given position (with parent links going through all parent structure, including the host trees).
-
resolveStack(pos: number, side?: -1 | 0 | 1 = 0) → NodeIterator
In some situations, it can be useful to iterate through all nodes around a position, including those in overlays that don't directly cover the position. This method gives you an iterator that will produce all nodes, from small to big, around the given position.
-
iterate()
Iterate over the tree and its children, calling enter for any node that touches the from/to region (if given) before running over such a node's children, and leave (if given) when leaving the node. When enter returns false, that node will not have its children iterated over (or leave called).
-
prop<T>(prop: NodeProp<T>) → T | undefined
Get the value of the given node prop for this node. Works with both per-node and per-type props.
-
propValues: readonly [number | NodeProp<any>, any][]
Returns the node's per-node props in a format that can be passed to the Tree constructor.
-
balance(config?: Object = {}) → Tree
Balance the direct children of this tree, producing a copy which may have its children grouped into subtrees with type NodeType.none.
-
config
-
makeTree?: fn() → Tree
Function to create the newly balanced subtrees.
-
-
-
static empty: Tree
The empty tree
-
static build(data: Object) → Tree
Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.
-
data
-
buffer: BufferCursor | readonly number[]
The buffer or buffer cursor to read the node data from.
When this is an array, it should contain four values for every node in the tree.
- The first holds the node's type, as a node ID pointing into the given NodeSet.
- The second holds the node's start offset.
- The third the end offset.
- The fourth the amount of space taken up in the array by this node and its children. Since there are four values per node, this is the total number of nodes inside this node (children and transitive children) plus one for the node itself, times four.
Parent nodes should appear after child nodes in the array. As an example, a node of type 10 spanning positions 0 to 4, with two children, of type 11 and 12, might look like this:
[11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]
-
nodeSet: NodeSet
The node types to use.
-
topID: number
The id of the top node type.
-
start?: number
The position the tree should start at. Defaults to 0.
-
bufferStart?: number
The position in the buffer where the function should stop reading. Defaults to 0.
-
length?: number
The length of the wrapping node. The end offset of the last child is used when not provided.
-
maxBufferLength?: number
The maximum buffer length to use. Defaults to DefaultBufferLength.
-
reused?: readonly Tree[]
An optional array holding reused nodes that the buffer can refer to.
-
minRepeatType?: number
The first node type that indicates repeat constructs in this grammar.
-
-
-
-
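The postfix layout expected by Tree.build can be hard to picture from the prose alone. The following self-contained sketch does not use @lezer/common; DecodedNode, decodeAt, and decodePostfix are hypothetical names. It decodes such an array back into nested nodes by reading from the end, as a BufferCursor does, and uses each node's size value to find where its subtree begins.

```typescript
// Hypothetical, self-contained model of the postfix buffer format accepted by
// Tree.build: four numbers (type, start, end, size) per node, children first.
interface DecodedNode {
  type: number;
  from: number;
  to: number;
  children: DecodedNode[];
}

// Decode the node whose quad ends at index `end` (exclusive).
// Returns the node and the index where its whole subtree begins.
function decodeAt(buf: readonly number[], end: number): [DecodedNode, number] {
  const type = buf[end - 4], from = buf[end - 3], to = buf[end - 2], size = buf[end - 1];
  const start = end - size; // first index belonging to this subtree
  const children: DecodedNode[] = [];
  let pos = end - 4; // the children occupy [start, end - 4)
  while (pos > start) {
    const [child, childStart] = decodeAt(buf, pos);
    children.unshift(child); // reading right to left, so prepend
    pos = childStart;
  }
  return [{ type, from, to, children }, start];
}

function decodePostfix(buf: readonly number[]): DecodedNode[] {
  const roots: DecodedNode[] = [];
  let pos = buf.length;
  while (pos > 0) {
    const [node, start] = decodeAt(buf, pos);
    roots.unshift(node);
    pos = start;
  }
  return roots;
}

// The example array from the text: type 10 over 0-4 with children 11 and 12.
const [top] = decodePostfix([11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]);
console.log(top.type, top.children.map((c) => c.type)); // 10 [ 11, 12 ]
```

Because children precede their parent, the decoder walks backwards and prepends each child it finds, which restores source order.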
interface SyntaxNodeRef
The set of properties provided by both SyntaxNode and TreeCursor. Note that, if you need an object that is guaranteed to stay stable in the future, you need to use the node accessor.
-
from: number
The start position of the node.
-
to: number
The end position of the node.
-
type: NodeType
The type of the node.
-
name: string
The name of the node (.type.name).
-
tree: Tree | null
Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.
-
node: SyntaxNode
Retrieve a stable syntax node at this position.
-
matchContext(context: readonly string[]) → boolean
Test whether the node matches a given context—a sequence of direct parent nodes. Empty strings in the context array act as wildcards, other strings must match the ancestor node's name.
-
-
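The following sketch models matchContext on plain objects, to show how wildcards work. ParentLinked and matchContext here are hypothetical stand-ins, and the sketch assumes that the last context entry is compared with the direct parent, the one before it with the grandparent, and so on. It also ignores any special handling the library may apply to anonymous nodes.

```typescript
// Hypothetical minimal node shape: just a name and a parent link.
interface ParentLinked {
  name: string;
  parent: ParentLinked | null;
}

// Assumption: the context is read from its end, so the last entry is matched
// against the direct parent. An empty string matches any ancestor name.
function matchContext(node: ParentLinked, context: readonly string[]): boolean {
  let p: ParentLinked | null = node.parent;
  for (let i = context.length - 1; i >= 0; i--) {
    if (p === null) return false; // ran out of ancestors
    if (context[i] !== "" && context[i] !== p.name) return false;
    p = p.parent;
  }
  return true;
}

const callNode: ParentLinked = { name: "CallExpression", parent: null };
const argList: ParentLinked = { name: "ArgList", parent: callNode };
const arg: ParentLinked = { name: "Number", parent: argList };

console.log(matchContext(arg, ["CallExpression", "ArgList"])); // true
console.log(matchContext(arg, ["", "ArgList"]));               // true
console.log(matchContext(arg, ["Program", "", ""]));           // false
```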
interface SyntaxNode extends SyntaxNodeRef
A syntax node provides an immutable pointer to a given node in a tree. When iterating over large amounts of nodes, you may want to use a mutable cursor instead, which is more efficient.
-
parent: SyntaxNode | null
The node's parent node, if any.
-
firstChild: SyntaxNode | null
The first child, if the node has children.
-
lastChild: SyntaxNode | null
The node's last child, if available.
-
childAfter(pos: number) → SyntaxNode | null
The first child that ends after pos.
-
childBefore(pos: number) → SyntaxNode | null
The last child that starts before pos.
-
enter() → SyntaxNode | null
Enter the child at the given position. If side is -1 the child may end at that position, when 1 it may start there.
This will by default enter overlaid mounted trees. You can set overlays to false to disable that.
Similarly, when buffers is false this will not enter buffers, only nodes (which is mostly useful when looking for props, which cannot exist on buffer-allocated nodes).
-
nextSibling: SyntaxNode | null
This node's next sibling, if any.
-
prevSibling: SyntaxNode | null
This node's previous sibling.
-
cursor(mode?: IterMode) → TreeCursor
A tree cursor starting at this node.
-
resolve(pos: number, side?: -1 | 0 | 1) → SyntaxNode
Find the node around, before (if side is -1), or after (side is 1) the given position. Will look in parent nodes if the position is outside this node.
-
resolveInner(pos: number, side?: -1 | 0 | 1) → SyntaxNode
Similar to resolve, but enter overlaid nodes.
-
enterUnfinishedNodesBefore(pos: number) → SyntaxNode
Move the position to the innermost node before pos that looks like it is unfinished (meaning it ends in an error node or has a child ending in an error node right at its end).
-
toTree() → Tree
Get a tree for this node. Will allocate one if it points into a buffer.
-
getChild() → SyntaxNode | null
Get the first child of the given type (which may be a node name or a group name). If before is non-null, only return children that occur somewhere after a node with that name or group. If after is non-null, only return children that occur somewhere before a node with that name or group.
-
getChildren() → SyntaxNode[]
Like getChild, but return all matching children, not just the first.
-
-
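To illustrate the side argument of resolve, here is a minimal model over a plain tree. Span and resolveSketch are hypothetical; the sketch only descends from the given root, and ignores overlays, buffers, and the lookup in parent nodes that the real method performs.

```typescript
// Hypothetical plain tree used only to illustrate the `side` argument.
interface Span {
  name: string;
  from: number;
  to: number;
  children: Span[];
}

// Descend into the innermost node around `pos`. With side 0 a child must
// cover pos strictly; side -1 also accepts children that end at pos, and
// side 1 also accepts children that start at pos.
function resolveSketch(root: Span, pos: number, side: -1 | 0 | 1 = 0): Span {
  let node = root;
  for (;;) {
    const inner = node.children.find((c) =>
      (side > 0 ? c.from <= pos : c.from < pos) &&
      (side < 0 ? c.to >= pos : c.to > pos));
    if (!inner) return node;
    node = inner;
  }
}

const callSpan: Span = {
  name: "CallExpression", from: 0, to: 7,
  children: [
    { name: "Name", from: 0, to: 3, children: [] },
    { name: "ArgList", from: 3, to: 7, children: [] },
  ],
};

console.log(resolveSketch(callSpan, 3, -1).name); // Name
console.log(resolveSketch(callSpan, 3, 1).name);  // ArgList
console.log(resolveSketch(callSpan, 3, 0).name);  // CallExpression
```

At a boundary between two siblings, side 0 stays in the parent, which is why code that reacts to a cursor position often picks -1 or 1 explicitly.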
type NodeIterator
Represents a sequence of nodes.
-
node: SyntaxNode
-
next: NodeIterator | null
-
-
class TreeCursor implements SyntaxNodeRef
A tree cursor object focuses on a given node in a syntax tree, and allows you to move to adjacent nodes.
-
type: NodeType
The node's type.
-
name: string
Shorthand for .type.name.
-
from: number
The start source offset of this node.
-
to: number
The end source offset.
-
firstChild() → boolean
Move the cursor to this node's first child. When this returns false, the node has no child, and the cursor has not been moved.
-
lastChild() → boolean
Move the cursor to this node's last child.
-
childAfter(pos: number) → boolean
Move the cursor to the first child that ends after pos.
-
childBefore(pos: number) → boolean
Move to the last child that starts before pos.
-
enter() → boolean
Move the cursor to the child around pos. If side is -1 the child may end at that position, when 1 it may start there. This will also enter overlaid mounted trees unless overlays is set to false.
-
parent() → boolean
Move to the node's parent node, if this isn't the top node.
-
nextSibling() → boolean
Move to this node's next sibling, if any.
-
prevSibling() → boolean
Move to this node's previous sibling, if any.
-
next(enter?: boolean = true) → boolean
Move to the next node in a pre-order traversal, going from a node to its first child or, if the current node is empty or enter is false, its next sibling or the next sibling of the first parent node that has one.
-
prev(enter?: boolean = true) → boolean
Move to the next node in a last-to-first pre-order traversal. A node is followed by its last child or, if it has none, its previous sibling or the previous sibling of the first parent node that has one.
-
moveTo(pos: number, side?: -1 | 0 | 1 = 0) → TreeCursor
Move the cursor to the innermost node that covers pos. If side is -1, it will enter nodes that end at pos. If it is 1, it will enter nodes that start at pos.
-
node: SyntaxNode
Get a syntax node at the cursor's current position.
-
tree: Tree | null
Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.
-
iterate()
Iterate over the current node and all its descendants, calling enter when entering a node and leave, if given, when leaving one. When enter returns false, any children of that node are skipped, and leave isn't called for it.
-
matchContext(context: readonly string[]) → boolean
Test whether the current node matches a given context—a sequence of direct parent node names. Empty strings in the context array are treated as wildcards.
-
-
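The traversal order of next can be modeled with just firstChild, nextSibling, and parent. This sketch uses a hypothetical PlainCursor over plain objects, not the library's TreeCursor, which works on the packed tree representation and supports IterMode options.

```typescript
// Hypothetical plain node type standing in for a syntax tree node.
interface PlainNode {
  name: string;
  children: PlainNode[];
}

// Minimal cursor: keeps the path from the root to the current node, plus
// each node's index within its parent's children.
class PlainCursor {
  path: PlainNode[];
  indices: number[] = [];

  constructor(root: PlainNode) {
    this.path = [root];
  }

  get node(): PlainNode { return this.path[this.path.length - 1]; }
  get name(): string { return this.node.name; }

  firstChild(): boolean {
    const kids = this.node.children;
    if (kids.length === 0) return false; // no child: the cursor does not move
    this.path.push(kids[0]);
    this.indices.push(0);
    return true;
  }

  nextSibling(): boolean {
    if (this.indices.length === 0) return false; // the top node has no siblings
    const parent = this.path[this.path.length - 2];
    const i = this.indices[this.indices.length - 1] + 1;
    if (i >= parent.children.length) return false;
    this.path[this.path.length - 1] = parent.children[i];
    this.indices[this.indices.length - 1] = i;
    return true;
  }

  parent(): boolean {
    if (this.indices.length === 0) return false;
    this.path.pop();
    this.indices.pop();
    return true;
  }

  // Pre-order step: first child (unless enter is false), otherwise the next
  // sibling of this node or of the nearest ancestor that has one.
  // Returns false once the traversal is finished.
  next(enter = true): boolean {
    if (enter && this.firstChild()) return true;
    for (;;) {
      if (this.nextSibling()) return true;
      if (!this.parent()) return false;
    }
  }
}

const root: PlainNode = {
  name: "Script",
  children: [
    { name: "Call", children: [{ name: "Name", children: [] }, { name: "Args", children: [] }] },
    { name: "Semi", children: [] },
  ],
};
const cursor = new PlainCursor(root);
const seen: string[] = [cursor.name];
while (cursor.next()) seen.push(cursor.name);
console.log(seen.join(" ")); // Script Call Name Args Semi
```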
enum IterMode
Options that control iteration. Can be combined with the | operator to enable multiple ones.
ExcludeBuffers
When enabled, iteration will only visit Tree objects, not nodes packed into TreeBuffers.
IncludeAnonymous
Enable this to make iteration include anonymous nodes (such as the nodes that wrap repeated grammar constructs into a balanced tree).
IgnoreMounts
By default, regular mounted nodes replace their base node in iteration. Enable this to ignore them instead.
IgnoreOverlays
This option only applies in enter-style methods. It tells the library to not enter mounted overlays if one covers the given position.
-
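Since these options are bit flags, combining and testing them is ordinary bitwise arithmetic. The sketch below uses a hypothetical local Mode object instead of importing IterMode, and its numeric values are chosen for illustration rather than taken from the library. With the real enum, an expression such as IterMode.IncludeAnonymous | IterMode.IgnoreMounts would be passed as the mode argument of cursor.

```typescript
// Hypothetical local stand-in for IterMode; values picked for this example.
const Mode = {
  ExcludeBuffers: 1,
  IncludeAnonymous: 2,
  IgnoreMounts: 4,
  IgnoreOverlays: 8,
} as const;

// Combining options with `|` sets several bits in one number.
const mode = Mode.IncludeAnonymous | Mode.IgnoreMounts;

// Testing for a single option is a bitwise `&` against its flag.
function hasFlag(m: number, flag: number): boolean {
  return (m & flag) !== 0;
}

console.log(mode);                                 // 6
console.log(hasFlag(mode, Mode.IncludeAnonymous)); // true
console.log(hasFlag(mode, Mode.ExcludeBuffers));   // false
```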
class NodeWeakMap<T>
Provides a way to associate values with pieces of trees. As long as that part of the tree is reused, the associated values can be retrieved from an updated tree.
-
set(node: SyntaxNode, value: T)
Set the value for this syntax node.
-
get(node: SyntaxNode) → T | undefined
Retrieve the value for this syntax node, if it exists in the map.
-
cursorSet(cursor: TreeCursor, value: T)
Set the value for the node that a cursor currently points to.
-
cursorGet(cursor: TreeCursor) → T | undefined
Retrieve the value for the node that a cursor currently points to.
-
Node types
-
class NodeType
Each node in a syntax tree has a node type associated with it.
-
name: string
The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node set should play the same semantic role.
-
id: number
The id of this node in its set. Corresponds to the term ids used in the parser.
-
prop<T>(prop: NodeProp<T>) → T | undefined
Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.
-
isTop: boolean
True when this is the top node of a grammar.
-
isSkipped: boolean
True when this node is produced by a skip rule.
-
isError: boolean
Indicates whether this is an error node.
-
isAnonymous: boolean
When true, this node type doesn't correspond to a user-declared named node, for example because it is used to cache repetition.
-
is(name: string | number) → boolean
Returns true when this node's name or one of its groups matches the given string.
-
static define(spec: Object) → NodeType
Define a node type.
-
spec
-
id: number
The ID of the node type. When this type is used in a set, the ID must correspond to its index in the type array.
-
name?: string
The name of the node type. Leave empty to define an anonymous node.
-
props?: readonly (NodePropSource | [NodeProp<any>, any])[]
Node props to assign to the type. The value given for any given prop should correspond to the prop's type.
-
top?: boolean
Whether this is a top node.
-
error?: boolean
Whether this node counts as an error node.
-
skipped?: boolean
Whether this node is a skipped node.
-
-
-
static none: NodeType
An empty dummy node type to use when no actual type is available.
-
static match<T>(map: Object<T>) → fn(node: NodeType) → T | undefined
Create a function from node types to arbitrary values by specifying an object whose property names are node or group names. Often useful with NodeProp.add. You can put multiple names, separated by spaces, in a single property name to map multiple node names to a single value.
-
-
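A minimal sketch of the matching rules described for NodeType.match, using a hypothetical TypeLike shape with an explicit list of groups (in the library, groups come from the NodeProp.group prop). The function name matchTypes is hypothetical, and the name-before-group lookup order is a choice made in this sketch.

```typescript
// Hypothetical minimal node type: a name plus the groups it belongs to.
interface TypeLike {
  name: string;
  groups: readonly string[];
}

// Keys may hold several space-separated node or group names. This sketch
// checks the node's own name first, then its groups.
function matchTypes<T>(map: Record<string, T>): (type: TypeLike) => T | undefined {
  const direct: Record<string, T> = Object.create(null);
  for (const key of Object.keys(map)) {
    for (const name of key.split(" ")) direct[name] = map[key];
  }
  return (type: TypeLike): T | undefined => {
    if (type.name in direct) return direct[type.name];
    for (const group of type.groups) {
      if (group in direct) return direct[group];
    }
    return undefined;
  };
}

const kindOf = matchTypes({ "Block ArrayExpression": "fold", Comment: "comment" });
console.log(kindOf({ name: "Block", groups: [] }));                // fold
console.log(kindOf({ name: "LineComment", groups: ["Comment"] })); // comment
console.log(kindOf({ name: "Number", groups: ["Expression"] }));   // undefined
```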
class NodeSet
A node set holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a numeric array. Each parser has a node set, and tree buffers can only store collections of nodes from the same set. A set can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.
-
new NodeSet(types: readonly NodeType[])
Create a set with the given types. The id property of each type should correspond to its position within the array.
-
types: readonly NodeType[]
The node types in this set, by id.
-
extend(...props: NodePropSource[]) → NodeSet
Create a copy of this set with some node properties added. The arguments to this method can be created with NodeProp.add.
-
-
class NodeProp<T>
Each node type or individual tree can have metadata associated with it in props. Instances of this class represent prop names.
-
new NodeProp(config?: Object = {})
Create a new node prop type.
-
config
-
deserialize?: fn(str: string) → T
The deserialize function to use for this prop, used for example when directly providing the prop from a grammar file. Defaults to a function that raises an error.
-
perNode?: boolean
By default, node props are stored in the node type. It can sometimes be useful to directly store information (usually related to the parsing algorithm) in nodes themselves. Set this to true to enable that for this prop.
-
-
-
perNode: boolean
Indicates whether this prop is stored per node type or per tree node.
-
deserialize(str: string) → T
A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file.
-
add() → NodePropSource
This is meant to be used with NodeSet.extend or LRParser.configure to compute prop values for each node type in the set. Takes a match object or function that returns undefined if the node type doesn't get this prop, and the prop's value if it does.
-
static closedBy: NodeProp<readonly string[]>
Prop that is used to describe matching delimiters. For opening delimiters, this holds an array of node names (written as a space-separated string when declaring this prop in a grammar) for the node types of closing delimiters that match it.
-
static openedBy: NodeProp<readonly string[]>
The inverse of closedBy. This is attached to closing delimiters, holding an array of node names of types of matching opening delimiters.
-
static group: NodeProp<readonly string[]>
Used to assign node types to groups (for example, all node types that represent an expression could be tagged with an "Expression" group).
-
static isolate: NodeProp<"rtl" | "ltr" | "auto">
Attached to nodes to indicate these should be displayed in a bidirectional text isolate, so that direction-neutral characters on their sides don't incorrectly get associated with surrounding text. You'll generally want to set this for nodes that contain arbitrary text, like strings and comments, and for nodes that appear inside arbitrary text, like HTML tags. When not given a value, in a grammar declaration, defaults to "auto".
-
static contextHash: NodeProp<number>
The hash of the context that the node was parsed in, if any. Used to limit reuse of contextual nodes.
-
static lookAhead: NodeProp<number>
The distance beyond the end of the node that the tokenizer looked ahead for any of the tokens inside the node. (The LR parser only stores this when it is larger than 25, for efficiency reasons.)
-
static mounted: NodeProp<MountedTree>
This per-node prop is used to replace a given node, or part of a node, with another tree. This is useful to include trees from different languages in mixed-language parsers.
-
-
type NodePropSource = fn(type: NodeType) → [NodeProp<any>, any] | null
Type returned by NodeProp.add. Describes whether a prop should be added to a given node type in a node set, and what value it should have.
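The add/extend pattern can be illustrated with small stand-ins. TypeStub, PropStub, SourceStub, and extendTypes are hypothetical and far simpler than NodeType, NodeProp, and NodeSet; the point is only that a prop source is a function from a node type to a [prop, value] pair or null, and that extending produces new types rather than modifying the existing ones.

```typescript
// Hypothetical stand-ins for NodeType, NodeProp, and NodePropSource.
class TypeStub {
  readonly name: string;
  readonly props: ReadonlyMap<PropStub<any>, any>;
  constructor(name: string, props: ReadonlyMap<PropStub<any>, any> = new Map()) {
    this.name = name;
    this.props = props;
  }
  prop<T>(p: PropStub<T>): T | undefined {
    return this.props.get(p);
  }
}

type SourceStub = (type: TypeStub) => [PropStub<any>, any] | null;

class PropStub<T> {
  // Like NodeProp.add with a name-to-value map: yields [prop, value] for
  // matching node types and null for all others.
  add(values: Record<string, T>): SourceStub {
    return (type: TypeStub): [PropStub<any>, any] | null =>
      Object.prototype.hasOwnProperty.call(values, type.name) ? [this, values[type.name]] : null;
  }
}

// Like NodeSet.extend: build new types carrying the extra props, leaving the
// original types untouched.
function extendTypes(types: readonly TypeStub[], ...sources: SourceStub[]): TypeStub[] {
  return types.map((type) => {
    const props = new Map(type.props);
    for (const source of sources) {
      const found = source(type);
      if (found) props.set(found[0], found[1]);
    }
    return new TypeStub(type.name, props);
  });
}

const closedBy = new PropStub<readonly string[]>();
const base = [new TypeStub("OpenBrace"), new TypeStub("Number")];
const extended = extendTypes(base, closedBy.add({ OpenBrace: ["CloseBrace"] }));
console.log(extended[0].prop(closedBy)); // [ 'CloseBrace' ]
console.log(extended[1].prop(closedBy)); // undefined
console.log(base[0].prop(closedBy));     // undefined (the original is unchanged)
```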
Buffers
Buffers are an optimization in the way Lezer trees are stored.
-
class TreeBuffer
Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).
-
new TreeBuffer()
Create a tree buffer.
-
buffer: Uint16Array
The buffer's content.
-
length: number
The total length of the group of nodes in the buffer.
-
set: NodeSet
The node set used in this buffer.
-
-
DefaultBufferLength: 1024
The default maximum length of a TreeBuffer node.
-
interface BufferCursor
This is used by Tree.build as an abstraction for iterating over a tree buffer. A cursor initially points at the very last element in the buffer. Every time next() is called it moves on to the previous one.
-
pos: number
The current buffer position (four times the number of nodes remaining).
-
id: number
The node ID of the next node in the buffer.
-
start: number
The start position of the next node in the buffer.
-
end: number
The end position of the next node.
-
size: number
The size of the next node (the number of nodes inside, counting the node itself, times 4).
-
next()
Moves this.pos down by 4.
-
fork() → BufferCursor
Create a copy of this cursor.
-
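An array-backed object implementing the documented BufferCursor members can be quite small. ArrayCursor is a hypothetical name, and this sketch only reads the (type, start, end, size) quads described for Tree.build.

```typescript
// Hypothetical array-backed cursor with the documented BufferCursor members.
// `pos` starts at the end of the array and moves back 4 per node.
class ArrayCursor {
  readonly buffer: readonly number[];
  pos: number;

  constructor(buffer: readonly number[], pos: number = buffer.length) {
    this.buffer = buffer;
    this.pos = pos;
  }

  // The quad just before `pos` describes the next node to read.
  get id(): number { return this.buffer[this.pos - 4]; }
  get start(): number { return this.buffer[this.pos - 3]; }
  get end(): number { return this.buffer[this.pos - 2]; }
  get size(): number { return this.buffer[this.pos - 1]; }

  next(): void { this.pos -= 4; }

  // An independent copy, so one position can be saved while reading on.
  fork(): ArrayCursor { return new ArrayCursor(this.buffer, this.pos); }
}

const reader = new ArrayCursor([11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]);
console.log(reader.id, reader.start, reader.end, reader.size); // 10 0 4 12
const saved = reader.fork();
reader.next();
console.log(reader.id, saved.id); // 12 10
```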
Parsing
-
abstract class Parser
A superclass that parsers should extend.
-
abstract createParse() → PartialParse
Start a parse for a single tree. This is the method concrete parser implementations must implement. Called by startParse, with the optional arguments resolved.
-
startParse() → PartialParse
Start a parse, returning a partial parse object. fragments can be passed in to make the parse incremental.
By default, the entire input is parsed. You can pass ranges, which should be a sorted array of non-empty, non-overlapping ranges, to parse only those ranges. The tree returned in that case will start at ranges[0].from.
-
parse() → Tree
Run a full parse, returning the resulting tree.
-
-
interface Input
This is the interface parsers use to access the document. To run Lezer directly on your own document data structure, you have to write an implementation of it.
-
length: number
The length of the document.
-
chunk(from: number) → string
Get the chunk after the given position. The returned string should start at from and, if that isn't the end of the document, may be of any length greater than zero.
-
lineChunks: boolean
Indicates whether the chunks already end at line breaks, so that client code that wants to work by-line can avoid re-scanning them for line breaks. When this is true, the result of chunk() should either be a single line break, or the content between from and the next line break.
-
read(from: number, to: number) → string
Read the part of the document between the given positions.
-
-
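As an example of implementing Input for a custom document structure, this sketch assumes a document stored as an array of lines without their line breaks. LineArrayInput is a hypothetical name; it sets lineChunks to true because every chunk it returns is either a single line break or the rest of one line.

```typescript
// Hypothetical Input-style class over an array of lines (without breaks).
// Member names follow the documented Input interface.
class LineArrayInput {
  readonly lines: readonly string[];
  readonly length: number;
  // Every chunk returned below ends at a line boundary.
  readonly lineChunks = true;

  constructor(lines: readonly string[]) {
    this.lines = lines;
    // Text length plus one "\n" between consecutive lines.
    this.length = lines.reduce((sum, line) => sum + line.length, 0) + Math.max(0, lines.length - 1);
  }

  // Returns either a single "\n" or the rest of the line starting at `from`.
  chunk(from: number): string {
    let start = 0;
    for (let i = 0; i < this.lines.length; i++) {
      const end = start + this.lines[i].length;
      if (from < end) return this.lines[i].slice(from - start);
      if (from === end && i < this.lines.length - 1) return "\n";
      start = end + 1; // skip over the line break
    }
    return ""; // end of document
  }

  read(from: number, to: number): string {
    let result = "";
    let pos = from;
    while (pos < to) {
      const text = this.chunk(pos);
      if (text === "") break; // ran past the end of the document
      result += text.slice(0, to - pos);
      pos += text.length;
    }
    return result;
  }
}

const input = new LineArrayInput(["let x = 1", "x + 2"]);
console.log(input.length);       // 15
console.log(input.read(10, 15)); // x + 2
```

Scanning the line array from the start on every call keeps the sketch short; a real implementation over a large document would more likely cache the last line it visited.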
interface PartialParse
Interface used to represent an in-progress parse, which can be moved forward piece-by-piece.
-
advance() → Tree | null
Advance the parse state by some amount. Will return the finished syntax tree when the parse completes.
-
parsedPos: number
The position up to which the document has been parsed. Note that, in multi-pass parsers, this will stay back until the last pass has moved past a given position.
-
stopAt(pos: number)
Tell the parse to not advance beyond the given position. advance will return a tree when the parse has reached the position. Note that, depending on the parser algorithm and the state of the parse when stopAt was called, that tree may contain nodes beyond the position. It is an error to call stopAt with a higher position than its current value.
-
stoppedAt: number | null
Reports whether stopAt has been called on this parse.
-
-
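The intended calling pattern for a PartialParse is a loop around advance, possibly interrupted and later resumed. Since a real parser needs a grammar, this sketch uses a hypothetical ChunkedParse that pretends to parse a fixed number of characters per step and returns a string instead of a Tree; PartialParseLike and runParse are also hypothetical.

```typescript
// Hypothetical shape mirroring the documented PartialParse members, with a
// generic result type instead of Tree.
interface PartialParseLike<T> {
  advance(): T | null;
  readonly parsedPos: number;
  stopAt(pos: number): void;
  readonly stoppedAt: number | null;
}

// Toy parse that "parses" `step` characters per advance() call.
class ChunkedParse implements PartialParseLike<string> {
  parsedPos = 0;
  stoppedAt: number | null = null;
  text: string;
  step: number;

  constructor(text: string, step: number) {
    this.text = text;
    this.step = step;
  }

  advance(): string | null {
    const end = this.stoppedAt ?? this.text.length;
    this.parsedPos = Math.min(end, this.parsedPos + this.step);
    // Only produce a result once the (possibly reduced) end is reached.
    return this.parsedPos >= end ? `tree(0-${this.parsedPos})` : null;
  }

  stopAt(pos: number): void {
    // As documented, the stop position may not be moved forward.
    if (this.stoppedAt !== null && pos > this.stoppedAt)
      throw new RangeError("stopAt cannot move the stop position forward");
    this.stoppedAt = pos;
  }
}

// Call advance() until a result appears or the step budget runs out, for
// example to spread a parse over several idle callbacks.
function runParse<T>(parse: PartialParseLike<T>, budget: number): T | null {
  for (let i = 0; i < budget; i++) {
    const result = parse.advance();
    if (result !== null) return result;
  }
  return null;
}

const parse = new ChunkedParse("abcdefghij", 3);
console.log(runParse(parse, 2), parse.parsedPos); // null 6
parse.stopAt(8);
console.log(runParse(parse, 10));                 // tree(0-8)
```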
type ParseWrapper = fn() → PartialParse
Parse wrapper functions are supported by some parsers to inject additional parsing logic.
Incremental Parsing
Efficient reparsing happens by reusing parts of the original parsed structure.
-
class TreeFragment
Tree fragments are used during incremental parsing to track parts of old trees that can be reused in a new parse. An array of fragments is used to track regions of an old tree whose nodes might be reused in new parses. Use the static applyChanges method to update fragments for document changes.
-
new TreeFragment()
Construct a tree fragment. You'll usually want to use addTree and applyChanges instead of calling this directly.
-
from: number
The start of the unchanged range pointed to by this fragment. This refers to an offset in the updated document (as opposed to the original tree).
-
to: number
The end of the unchanged range.
-
tree: Tree
The tree that this fragment is based on.
-
offset: number
The offset between the fragment's tree and the document that this fragment can be used against. Add this when going from document to tree positions, subtract it to go from tree to document positions.
-
openStart: boolean
Whether the start of the fragment represents the start of a parse, or the end of a change. (In the second case, it may not be safe to reuse some nodes at the start, depending on the parsing algorithm.)
-
openEnd: boolean
Whether the end of the fragment represents the end of a full-document parse, or the start of a change.
-
static addTree() → readonly TreeFragment[]
Create a set of fragments from a freshly parsed tree, or update an existing set of fragments by replacing the ones that overlap with a tree with content from the new tree. When partial is true, the parse is treated as incomplete, and the resulting fragment has openEnd set to true.
-
static applyChanges() → readonly TreeFragment[]
Apply a set of edits to an array of fragments, removing or splitting fragments as necessary to remove edited ranges, and adjusting offsets for fragments that moved.
-
-
interface ChangedRange
The TreeFragment.applyChanges method expects changed ranges in this format.
-
fromA: number
The start of the change in the start document
-
toA: number
The end of the change in the start document
-
fromB: number
The start of the replacement in the new document
-
toB: number
The end of the replacement in the new document
-
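To show how ChangedRange values relate fragment positions in the old and new documents, here is a deliberately simplified version of what applyChanges is described as doing: cut edited regions out of each fragment and shift the surviving pieces, adjusting offset so that adding it to a new-document position still yields a tree position. Fragment and applyChangesSketch are hypothetical, and the sketch ignores openStart, openEnd, and any other safety margins the library may apply.

```typescript
interface ChangedRange { fromA: number; toA: number; fromB: number; toB: number; }

// Hypothetical simplified fragment: an unchanged range [from, to) in the
// current document, with treePos = docPos + offset.
interface Fragment { from: number; to: number; offset: number; }

// `changes` must be sorted and non-overlapping. Positions A are in the old
// document, positions B in the new one.
function applyChangesSketch(fragments: readonly Fragment[], changes: readonly ChangedRange[]): Fragment[] {
  const result: Fragment[] = [];
  for (const f of fragments) {
    let gapA = 0;  // start of the current unchanged gap, in old positions
    let shift = 0; // new position minus old position inside this gap
    for (let i = 0; i <= changes.length; i++) {
      const ch = i < changes.length ? changes[i] : null;
      const gapEndA = ch ? ch.fromA : Infinity;
      const from = Math.max(f.from, gapA), to = Math.min(f.to, gapEndA);
      // Keep the part of the fragment that falls inside this unchanged gap.
      if (from < to) result.push({ from: from + shift, to: to + shift, offset: f.offset - shift });
      if (ch) { gapA = ch.toA; shift = ch.toB - ch.toA; }
    }
  }
  return result;
}

// Replace old positions 5-8 with a single character.
const frags = applyChangesSketch([{ from: 0, to: 20, offset: 0 }],
  [{ fromA: 5, toA: 8, fromB: 5, toB: 6 }]);
console.log(JSON.stringify(frags));
// [{"from":0,"to":5,"offset":0},{"from":6,"to":18,"offset":2}]
```

In the second fragment, new position 6 plus offset 2 gives tree position 8, which is where that text sat in the old tree.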
Mixed Parsing
-
parseMixed() → ParseWrapper
Create a parse wrapper that, after the inner parse completes, scans its tree for mixed language regions with the nest function, runs the resulting inner parses, and then mounts their results onto the tree.
-
interface NestedParse
Objects returned by the function passed to parseMixed should conform to this interface.
-
parser: Parser
The parser to use for the inner region.
-
overlay?: readonly {from: number, to: number}[] |
When this property is not given, the entire node is parsed with this parser, and it is mounted as a non-overlay node, replacing its host node in tree iteration.
When an array of ranges is given, only those ranges are parsed, and the tree is mounted as an overlay.
When a function is given, that function will be called for descendant nodes of the target node, not including child nodes that are covered by another nested parse, to determine the overlay ranges. When it returns true, the entire descendant is included, otherwise just the range given. The mixed parser will optimize range-finding in reused nodes, which means it's a good idea to use a function here when the target node is expected to have a large, deep structure.
-
-
class MountedTree
A mounted tree, which can be stored on a tree node to indicate that parts of its content are represented by another tree.
-
new MountedTree()
-
tree: Tree
The inner tree.
-
overlay: readonly {from: number, to: number}[] |
If this is null, this tree replaces the entire node (it will be included in the regular iteration instead of its host node). If not, only the given ranges are considered to be covered by this tree. This is used for trees that are mixed in a way that isn't strictly hierarchical. Such mounted trees are only entered by resolveInner and enter.
-
parser: Parser
The parser used to create this subtree.
-
@lezer/lr module
This package provides an implementation of a GLR parser that works with the parse tables generated by the parser generator.
Parsing
-
class LRParser extends Parser
Holds the parse tables for a given grammar, as generated by lezer-generator, and provides methods to parse content with.
-
nodeSet: NodeSet
The nodes used in the trees emitted by this parser.
-
configure(config: ParserConfig) → LRParser
Configure the parser. Returns a new parser instance that has the given settings modified. Settings not provided in config are kept from the original parser.
-
hasWrappers() → boolean
Tells you whether any parse wrappers are registered for this parser.
-
getName(term: number) → string
Returns the name associated with a given term. This will only work for all terms when the parser was generated with the --names option. By default, only the names of tagged terms are stored.
-
topNode: NodeType
The type of top node produced by the parser.
-
-
interface ParserConfig
Configuration options when reconfiguring a parser.
-
props?: readonly NodePropSource[]
Node prop values to add to the parser's node set.
-
top?: string
The name of the @top declaration to parse from. If not specified, the first top rule declaration in the grammar is used.
-
dialect?: string
A space-separated string of dialects to enable.
-
tokenizers?: {from: ExternalTokenizer, to: ExternalTokenizer}[]
Replace the given external tokenizers with new ones.
-
specializers?: {}[]
Replace external specializers with new ones.
-
contextTracker?: ContextTracker<any>
Replace the context tracker with a new one.
-
strict?: boolean
When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.
-
wrap?: ParseWrapper
Add a wrapper, which can extend parses created by this parser with additional logic (usually used to add mixed-language parsing).
-
bufferLength?: number
The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.
-
-
class Stack
A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.
-
pos: number
The input position up to which this stack has parsed.
-
context: any
The stack's current context value, if any. Its type will depend on the context tracker's type parameter, or it will be null if there is no context tracker.
-
canShift(term: number) → boolean
Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.
-
parser: LRParser
Get the parser used by this stack.
-
dialectEnabled(dialectID: number) → boolean
Test whether a given dialect (by numeric ID, as exported from the terms file) is enabled.
-
Tokenizers
-
class InputStream
Tokenizers interact with the input through this interface. It presents the input as a stream of characters, tracking lookahead and hiding the complexity of ranges from tokenizer code.
-
next: number
The character code of the next code unit in the input, or -1 when the stream is at the end of the input.
-
pos: number
The current position of the stream. Note that, due to parses being able to cover non-contiguous ranges, advancing the stream does not always mean its position moves a single unit.
-
peek(offset: number) → number
Look at a code unit near the stream position. .peek(0) equals .next, .peek(-1) gives you the previous character, and so on.
Note that looking around during tokenizing creates dependencies on potentially far-away content, which may reduce the effectiveness of incremental parsing (when looking forward) or even cause invalid reparses when looking backward more than 25 code units, since the library does not track lookbehind.
-
acceptToken(token: number, endOffset?: number = 0)
Accept a token. By default, the end of the token is set to the current stream position, but you can pass an offset (relative to the stream position) to change that.
-
acceptTokenTo(token: number, endPos: number)
Accept a token ending at a specific given position.
-
advance(n?: number = 1) → number
Move the stream forward N (defaults to 1) code units. Returns the new value of next.
-
-
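The following sketch models the documented InputStream members on a single string, and pairs it with a token function of the kind passed as the first argument to new ExternalTokenizer. StringStream, isHexDigit, hexToken, and the token id HexNumber are hypothetical. Unlike the real stream, this model has no range handling and does not reset its position when a tokenizer fails.

```typescript
// Hypothetical, simplified stream over one string with the documented
// next / pos / peek / advance / acceptToken members.
class StringStream {
  pos = 0;
  next: number;
  token: { type: number; end: number } | null = null;
  text: string;

  constructor(text: string) {
    this.text = text;
    this.next = this.peek(0);
  }

  // Code unit at pos + offset, or -1 outside the input.
  peek(offset: number): number {
    const i = this.pos + offset;
    return i >= 0 && i < this.text.length ? this.text.charCodeAt(i) : -1;
  }

  advance(n = 1): number {
    this.pos += n;
    this.next = this.peek(0);
    return this.next;
  }

  // The token ends at the current position plus endOffset.
  acceptToken(type: number, endOffset = 0): void {
    this.token = { type, end: this.pos + endOffset };
  }
}

const HexNumber = 1; // hypothetical token id

function isHexDigit(ch: number): boolean {
  return (ch >= 48 && ch <= 57) || (ch >= 97 && ch <= 102) || (ch >= 65 && ch <= 70);
}

// Recognize "0x" or "0X" followed by at least one hex digit.
function hexToken(input: StringStream): void {
  if (input.next !== 48 || (input.peek(1) !== 120 && input.peek(1) !== 88)) return;
  input.advance(2);
  let digits = 0;
  while (isHexDigit(input.next)) { input.advance(); digits++; }
  if (digits > 0) input.acceptToken(HexNumber);
}

const stream = new StringStream("0x1fZ");
hexToken(stream);
console.log(stream.token); // { type: 1, end: 4 }
```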
class ExternalTokenizer
@external tokens declarations in the grammar should resolve to an instance of this class.
-
new ExternalTokenizer()
Create a tokenizer. The first argument is the function that, given an input stream, scans for the types of tokens it recognizes at the stream's position, and calls acceptToken when it finds one.
-
options
-
contextual?: boolean
When set to true, mark this tokenizer as depending on the current parse stack, which prevents its result from being cached between parser actions at the same positions.
-
fallback?: boolean
By default, when a tokenizer returns a token, that prevents tokenizers with lower precedence from even running. When fallback is true, the tokenizer is allowed to run when a previous tokenizer returned a token that didn't match any of the current state's actions.
-
extend?: boolean
When set to true, tokenizing will not stop after this tokenizer has produced a token. (But it will still fail to reach this one if a higher-precedence tokenizer produced a token.)
-
-
-
-
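As an illustration of the InputStream methods and ExternalTokenizer above, here is a minimal sketch that builds a parser in memory with buildParser from @lezer/generator. The grammar, the Shout token, and the shoutTokenizer name are hypothetical. The tokenizer scans ahead with peek and then accepts the token with an end offset, instead of advancing the stream first.

```typescript
import {buildParser} from "@lezer/generator"
import {ExternalTokenizer} from "@lezer/lr"

// Hypothetical grammar: lowercase words come from the built-in tokenizer,
// while Shout tokens (uppercase letters followed by "!") come from an
// external tokenizer declared under the name shoutTokenizer.
const grammar = `
@top Program { (Word | Shout)* }
@external tokens shoutTokenizer from "./tokens" { Shout }
@skip { space }
@tokens {
  Word { @asciiLowercase+ }
  space { @whitespace+ }
}
`

const EXCLAMATION = 33 // char code of "!"
const isUpper = (ch: number) => ch >= 65 && ch <= 90 // "A" to "Z"

function shoutTokenizer(shoutTerm: number): ExternalTokenizer {
  return new ExternalTokenizer(input => {
    // Look ahead with peek() without moving the stream. peek() returns -1
    // past the end of the input, which isUpper rejects.
    let len = 0
    while (isUpper(input.peek(len))) len++
    // Only accept the run if it is followed by "!". The end offset is
    // relative to the current position, so no advance() is needed.
    if (len > 0 && input.peek(len) == EXCLAMATION) {
      input.acceptToken(shoutTerm, len + 1)
    }
  })
}

const parser = buildParser(grammar, {
  // Called for each @external tokens declaration, with a table that maps
  // term names to term ids.
  externalTokenizer: (name, terms) => {
    if (name != "shoutTokenizer") throw new Error(`Unexpected tokenizer ${name}`)
    return shoutTokenizer(terms.Shout)
  }
})

const tree = parser.parse("hi HEY! there")
console.log(tree.toString()) // Program(Word,Shout,Word)
```

Using acceptToken with an offset keeps the scan and the decision in one place; calling advance(len + 1) followed by acceptToken(shoutTerm) would be an equivalent alternative.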
class
ContextTracker<T>
Context trackers are used to track stateful context (such as indentation in the Python grammar, or parent elements in the XML grammar) needed by external tokenizers. You declare them in a grammar file as @context exportName from "module".
Context values should be immutable, and can be updated (replaced) on shift or reduce actions.
The export used in a @context declaration should be of this type.
-
new ContextTracker(spec: Object)
Define a context tracker.
-
spec
-
start: T
The initial value of the context at the start of the parse.
-
shift?: fn() → T
Update the context when the parser executes a shift action.
-
reduce?: fn() → T
Update the context when the parser executes a reduce action.
-
reuse?: fn() → T
Update the context when the parser reuses a node from a tree fragment.
-
hash?: fn(context: T) → number
Reduce a context value to a number (for cheap storage and comparison). Only needed for strict contexts.
-
strict?: boolean
By default, nodes can only be reused during incremental parsing if they were created in the same context as the one in which they are reused. Set this to false to disable that check (and the overhead of storing the hashes).
-
-
-
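A minimal sketch of a context tracker feeding an external tokenizer, assuming a hypothetical grammar in which words get a different node type inside parentheses. The tracker counts nesting depth on shift actions, and the tokenizer (marked contextual, since it reads stack.context) picks a token type based on that value. This sketch assumes a @lezer/generator version whose contextTracker build option also accepts a function from the term table to a tracker; with a version that only accepts a tracker instance, the Open and Close term ids would have to be obtained another way.

```typescript
import {buildParser} from "@lezer/generator"
import {ContextTracker, ExternalTokenizer} from "@lezer/lr"

// Hypothetical grammar: TopWord outside parentheses, InnerWord inside.
const grammar = `
@context depthTracker from "./context"
@external tokens wordTokenizer from "./tokens" { TopWord, InnerWord }
@top Program { (TopWord | Group)* }
Group { Open (InnerWord | Group)* Close }
@skip { space }
@tokens {
  Open { "(" }
  Close { ")" }
  space { @whitespace+ }
}
`

type Terms = {[name: string]: number}
const isLower = (ch: number) => ch >= 97 && ch <= 122 // "a" to "z"

// The context value is the current parenthesis depth. A number is
// immutable, and each shift returns a new value instead of mutating.
const depthTracker = (terms: Terms) => new ContextTracker<number>({
  start: 0,
  shift(depth, term) {
    if (term == terms.Open) return depth + 1
    if (term == terms.Close) return depth - 1
    return depth
  },
  // A small integer can serve as its own hash.
  hash: depth => depth
})

const wordTokenizer = (terms: Terms) => new ExternalTokenizer((input, stack) => {
  let len = 0
  while (isLower(input.peek(len))) len++
  if (len == 0) return
  // stack.context holds the value maintained by depthTracker.
  input.acceptToken(stack.context > 0 ? terms.InnerWord : terms.TopWord, len)
}, {contextual: true}) // result depends on the stack, so don't cache it

const parser = buildParser(grammar, {
  externalTokenizer: (_name, terms) => wordTokenizer(terms),
  // Assumption: this option accepts a function of the term table.
  contextTracker: terms => depthTracker(terms)
})

console.log(parser.parse("a (b (c)) d").toString())
// Program(TopWord,Group(Open,InnerWord,Group(Open,InnerWord,Close),Close),TopWord)
```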
@lezer/highlight module
This package provides a vocabulary for syntax-highlighting code based on a Lezer syntax tree.
-
class
Tag Highlighting tags are markers that denote a highlighting category. They are associated with parts of a syntax tree by a language mode, and then mapped to an actual CSS style by a highlighter.
Because syntax tree node types and highlight styles have to be able to talk the same language, CodeMirror uses a mostly closed vocabulary of syntax tags (as opposed to traditional open string-based systems, which make it hard for highlighting themes to cover all the tokens produced by the various languages).
It is possible to define your own highlighting tags for system-internal use (where you control both the language package and the highlighter), but such tags will not be picked up by regular highlighters (though you can derive them from standard tags to allow highlighters to fall back to those).
-
set: Tag[]
The set of this tag and all its parent tags, starting with this one itself and sorted in order of decreasing specificity.
-
toString() → string
-
static define(name?: string, parent?: Tag) → Tag
Define a new tag. If parent is given, the tag is treated as a sub-tag of that parent, and highlighters that don't mention this tag will try to fall back to the parent tag (or grandparent tag, etc.).
-
static defineModifier(name?: string) → fn(tag: Tag) → Tag
Define a tag modifier, which is a function that, given a tag, will return a tag that is a subtag of the original. Applying the same modifier to the same tag twice will return the same value (m1(t1) == m1(t1)), and applying multiple modifiers will, regardless of order, produce the same tag (m1(m2(t1)) == m2(m1(t1))).
When multiple modifiers are applied to a given base tag, each smaller set of modifiers is registered as a parent, so that for example m1(m2(m3(t1))) is a subtype of m1(m2(t1)), m1(m3(t1)), and so on.
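A short sketch of Tag.define and Tag.defineModifier that checks the properties described above. The tag and modifier names (markup, fancyMarkup, loud, quiet) are made up for illustration.

```typescript
import {Tag} from "@lezer/highlight"

// A custom parent tag and a sub-tag of it.
const markup = Tag.define("markup")
const fancyMarkup = Tag.define("fancyMarkup", markup)

// set starts with the tag itself, followed by its ancestors.
console.log(fancyMarkup.set[0] === fancyMarkup, fancyMarkup.set.includes(markup)) // true true

const loud = Tag.defineModifier("loud")
const quiet = Tag.defineModifier("quiet")

// Modified tags are shared instances, and modifier order does not matter.
console.log(loud(markup) === loud(markup)) // true
console.log(loud(quiet(markup)) === quiet(loud(markup))) // true

// Every smaller set of the applied modifiers is registered as a parent.
console.log(loud(quiet(markup)).set.includes(loud(markup))) // true
```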
-
The default set of highlighting tags.
This collection is heavily biased towards programming languages, and necessarily incomplete. A full ontology of syntactic constructs would fill a stack of books, and be impractical to write themes for. So try to make do with this set. If all else fails, open an issue to propose a new tag, or define a local custom tag for your use case.
Note that it is not obligatory to always attach the most specific tag possible to an element—if your grammar can't easily distinguish a certain type of element (such as a local variable), it is okay to style it as its more general variant (a variable).
For tags that extend some parent tag, the documentation links to the parent.
A comment.
A line comment.
A block comment.
A documentation comment.
Any kind of identifier.
The name of a variable.
A type name.
A tag name (subtag of typeName).
A property or field name.
An attribute name (subtag of propertyName).
The name of a class.
A label name.
A namespace name.
The name of a macro.
A literal value.
A string literal.
A documentation string.
A character literal (subtag of string).
An attribute value (subtag of string).
A number literal.
An integer number literal.
A floating-point number literal.
A boolean literal.
Regular expression literal.
An escape literal, for example a backslash escape in a string.
A color literal.
A URL literal.
A language keyword.
The keyword for the self or this object.
The keyword for null.
A keyword denoting some atomic value.
A keyword that represents a unit.
A modifier keyword.
A keyword that acts as an operator.
A control-flow related keyword.
A keyword that defines something.
A keyword related to defining or interfacing with modules.
An operator.
An operator that dereferences something.
Arithmetic-related operator.
Logical operator.
Bit operator.
Comparison operator.
Operator that updates its operand.
Operator that defines something.
Type-related operator.
Control-flow operator.
Program or markup punctuation.
Punctuation that separates things.
Bracket-style punctuation.
Angle brackets (usually < and > tokens).
Square brackets (usually [ and ] tokens).
Parentheses (usually ( and ) tokens). Subtag of bracket.
Braces (usually { and } tokens). Subtag of bracket.
Content, for example plain text in XML or markup documents.
Content that represents a heading.
A level 1 heading.
A level 2 heading.
A level 3 heading.
A level 4 heading.
A level 5 heading.
A level 6 heading.
A prose content separator (such as a horizontal rule).
Content that represents a list.
Content that represents a quote.
Content that is emphasized.
Content that is styled strong.
Content that is part of a link.
Content that is styled as code or monospace.
Content that has a strike-through style.
Inserted text in a change-tracking format.
Deleted text.
Changed text.
An invalid or unsyntactic element.
Metadata or meta-instruction.
Metadata that applies to the entire document.
Metadata that annotates or adds attributes to a given syntactic element.
Processing instruction or preprocessor directive. Subtag of meta.
Modifier that indicates that a given element is being defined. Expected to be used with the various name tags.
Modifier that indicates that something is constant. Mostly expected to be used with variable names.
Modifier used to indicate that a variable or property name is being called or defined as a function.
Modifier that can be applied to names to indicate that they belong to the language's standard environment.
Modifier that indicates a given name is local to some scope.
A generic variant modifier that can be used to tag language-specific alternative variants of some common tag. It is recommended for themes to define special forms of at least the string and variable name tags, since those come up a lot.
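The parent relations in this vocabulary are what let a highlighter fall back to a more general style. A small sketch, using only the exported tags object and each tag's set property, shows this for plain sub-tags and for a modified name tag.

```typescript
import {tags} from "@lezer/highlight"

// lineComment is a sub-tag of comment, so a highlighter that only styles
// comment also covers line comments.
console.log(tags.lineComment.set.includes(tags.comment)) // true

// integer is a sub-tag of number, which is a sub-tag of literal. The set
// lists the tag itself first, then its ancestors.
console.log(tags.integer.set[0] === tags.integer, tags.integer.set.includes(tags.literal)) // true true

// A modified name tag falls back to the unmodified tag and its parents.
const varDef = tags.definition(tags.variableName)
console.log(varDef.set.includes(tags.variableName), varDef.set.includes(tags.name)) // true true
```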
-
styleTags(spec: Object<Tag | readonly Tag[]>) → NodePropSource
This function is used to add a set of tags to a language syntax via NodeSet.extend or LRParser.configure.
The argument object maps node selectors to highlighting tags or arrays of tags.
Node selectors may hold one or more (space-separated) node paths. Such a path can be a node name, or multiple node names (or * wildcards) separated by slash characters, as in "Block/Declaration/VariableName". Such a path matches the final node but only if its direct parent nodes are the other nodes mentioned. A * in such a path matches any parent, but only a single level (wildcards that match multiple parents aren't supported, both for efficiency reasons and because Lezer trees make it rather hard to reason about what they would match).
A path can be ended with /... to indicate that the tag assigned to the node should also apply to all child nodes, even if they match their own style (by default, only the innermost style is used).
When a path ends in !, as in Attribute!, no further matching happens for the node's child nodes, and the entire node gets the given style.
In this notation, node names that contain /, !, *, or ... must be quoted as JSON strings.
For example:
import {styleTags, tags} from "@lezer/highlight"

// `parser` is an existing LRParser; configure() returns a new parser.
const highlightedParser = parser.configure({props: [
  styleTags({
    // Style Number and BigNumber nodes
    "Number BigNumber": tags.number,
    // Style Escape nodes whose parent is String
    "String/Escape": tags.escape,
    // Style anything inside Attributes nodes
    "Attributes!": tags.meta,
    // Add a style to all content inside Italic nodes
    "Italic/...": tags.emphasis,
    // Style InvalidString nodes as both `string` and `invalid`
    "InvalidString": [tags.string, tags.invalid],
    // Style the node named "/" as punctuation
    '"/"': tags.punctuation
  })
]})
-
getStyleTags(node: SyntaxNodeRef) → {tags: readonly Tag[], opaque: boolean, inherit: boolean} |
Match a syntax node's highlight rules. If there's a match, return its set of tags, and whether it is opaque (uses a !) or applies to all child nodes (/...).
-
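A sketch of getStyleTags applied to nodes from a tiny, hypothetical grammar that is built in memory with buildParser. The Identifier rule uses the ! suffix, so its match reports itself as opaque, while the top node matches no rule at all.

```typescript
import {buildParser} from "@lezer/generator"
import {getStyleTags, styleTags, tags} from "@lezer/highlight"

// Hypothetical grammar with Identifier and Number tokens.
const grammar = `
@top Program { item* }
item { Identifier | Number }
@skip { space }
@tokens {
  Identifier { @asciiLetter+ }
  Number { @digit+ }
  space { @whitespace+ }
}
`

const parser = buildParser(grammar).configure({
  props: [styleTags({
    Number: tags.number,
    // The trailing ! makes this rule opaque.
    "Identifier!": tags.variableName
  })]
})

const tree = parser.parse("foo 42")
const ident = tree.topNode.firstChild! // the Identifier node
const num = tree.topNode.lastChild!    // the Number node

const identMatch = getStyleTags(ident)
console.log(identMatch?.tags[0] === tags.variableName, identMatch?.opaque) // true true
const numMatch = getStyleTags(num)
console.log(numMatch?.tags[0] === tags.number, numMatch?.opaque, numMatch?.inherit) // true false false

// No rule mentions Program, so there is no match for the top node.
console.log(getStyleTags(tree.topNode)) // null
```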
interface
Highlighter
A highlighter defines a mapping from highlighting tags and language scopes to CSS class names. They are usually defined via tagHighlighter or some wrapper around that, but it is also possible to implement them from scratch.
-
style(tags: readonly Tag[]) → string | null
Get the set of classes that should be applied to the given set of highlighting tags, or null if this highlighter doesn't assign a style to the tags.
-
scope?: fn(node: NodeType) → boolean
When given, the highlighter will only be applied to trees on whose top node this predicate returns true.
-
-
tagHighlighter() → Highlighter
Define a highlighter from an array of tag/class pairs. Classes associated with more specific tags will take precedence.
-
options
-
scope?: fn(node: NodeType) → boolean
By default, highlighters apply to the entire document. You can scope them to a single language by providing a predicate here that is called with the tree's top node type.
-
all?: string
Add a style to all tokens. Probably only useful in combination with scope.
-
-
-
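A sketch of tagHighlighter with made-up class names. It shows that a more specific tag wins regardless of the order of the pairs, that tags which are not mentioned fall back to their parents, and what the all option adds.

```typescript
import {tagHighlighter, tags} from "@lezer/highlight"

// The literal entry is listed first on purpose: precedence comes from tag
// specificity, not from the order of the array.
const highlighter = tagHighlighter([
  {tag: tags.literal, class: "lit"},
  {tag: tags.number, class: "num"},
  {tag: [tags.keyword, tags.operator], class: "kw"}
])

console.log(highlighter.style([tags.integer]))  // num (integer → number → literal)
console.log(highlighter.style([tags.string]))   // lit (falls back to literal)
console.log(highlighter.style([tags.operator])) // kw
console.log(highlighter.style([tags.comment]))  // null (no style assigned)

// With all, every token gets the extra class, styled or not.
const withAll = tagHighlighter([{tag: tags.number, class: "num"}], {all: "tok"})
console.log(withAll.style([tags.number]))  // tok num
console.log(withAll.style([tags.comment])) // tok
```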
highlightCode(
putBreak: fn(),)
Highlight the given tree with the given highlighter, calling putText for every piece of text, either with a set of classes or with the empty string when unstyled, and putBreak for every line break.
-
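A minimal sketch of rendering highlighted code to an HTML string with highlightCode and classHighlighter. The grammar is a hypothetical two-token language built in memory with buildParser, and the HTML escaping is deliberately minimal.

```typescript
import {buildParser} from "@lezer/generator"
import {classHighlighter, highlightCode, styleTags, tags} from "@lezer/highlight"

// Hypothetical grammar with Identifier and Number tokens.
const grammar = `
@top Program { item* }
item { Identifier | Number }
@skip { space }
@tokens {
  Identifier { @asciiLetter+ }
  Number { @digit+ }
  space { @whitespace+ }
}
`

const parser = buildParser(grammar).configure({
  props: [styleTags({Identifier: tags.variableName, Number: tags.number})]
})

// Minimal escaping for text content.
const escapeHTML = (s: string) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")

function toHTML(code: string): string {
  let out = ""
  highlightCode(code, parser.parse(code), classHighlighter,
    // putText: wrap styled text in a span, emit unstyled text as-is.
    (text, classes) => {
      out += classes ? `<span class="${classes}">${escapeHTML(text)}</span>` : escapeHTML(text)
    },
    // putBreak: called for each line break in the code.
    () => { out += "<br>" })
  return out
}

console.log(toHTML("foo\n42"))
// <span class="tok-variableName">foo</span><br><span class="tok-number">42</span>
```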
highlightTree()
Highlight the given tree with the given highlighter. Often, the higher-level highlightCode function is easier to use.
-
putStyle(from: number, to: number, classes: string)
Assign styling to a region of the text. Will be called, in order of position, for any ranges where more than zero classes apply. classes is a space-separated string of CSS classes.
-
from
The start of the range to highlight.
-
to
The end of the range.
-
-
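A sketch of what highlightTree reports for a hypothetical two-token grammar: one putStyle call per styled range, in document order, with unstyled gaps (here, the space) skipped.

```typescript
import {buildParser} from "@lezer/generator"
import {classHighlighter, highlightTree, styleTags, tags} from "@lezer/highlight"

// Hypothetical grammar with Identifier and Number tokens.
const grammar = `
@top Program { item* }
item { Identifier | Number }
@skip { space }
@tokens {
  Identifier { @asciiLetter+ }
  Number { @digit+ }
  space { @whitespace+ }
}
`

const parser = buildParser(grammar).configure({
  props: [styleTags({Identifier: tags.variableName, Number: tags.number})]
})

const tree = parser.parse("foo 42")
const ranges: [number, number, string][] = []
highlightTree(tree, classHighlighter, (from, to, classes) => {
  ranges.push([from, to, classes])
})

console.log(JSON.stringify(ranges))
// [[0,3,"tok-variableName"],[4,6,"tok-number"]]
```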
classHighlighter: Highlighter
This is a highlighter that adds stable, predictable classes to tokens, for styling with external CSS.
The following tags are mapped to their name prefixed with "tok-" (for example "tok-comment"):
link
heading
emphasis
strong
keyword
atom
bool
url
labelName
inserted
deleted
literal
string
number
variableName
typeName
namespace
className
macroName
propertyName
operator
comment
meta
punctuation
invalid
In addition, these mappings are provided:
regexp, escape, and special(string) are mapped to "tok-string2"
special(variableName) to "tok-variableName2"
local(variableName) to "tok-variableName tok-local"
definition(variableName) to "tok-variableName tok-definition"
definition(propertyName) to "tok-propertyName tok-definition"
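Since classHighlighter is an ordinary Highlighter, its style method can be called directly to see which class a tag would get. This sketch shows fallback to parent tags (lineComment to comment, integer to number) alongside some of the special mappings listed above.

```typescript
import {classHighlighter, tags} from "@lezer/highlight"

console.log(classHighlighter.style([tags.comment]))     // tok-comment
console.log(classHighlighter.style([tags.lineComment])) // tok-comment (parent fallback)
console.log(classHighlighter.style([tags.integer]))     // tok-number (parent fallback)
console.log(classHighlighter.style([tags.regexp]))      // tok-string2
console.log(classHighlighter.style([tags.definition(tags.variableName)]))
// tok-variableName tok-definition
```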
@lezer/generator module
The parser generator is usually run through its command-line interface, but can also be invoked as a JavaScript function.
-
type
BuildOptions -
fileName?: string
The name of the grammar file.
-
warn?: fn(message: string)
A function that should be called with warnings. The default is to call console.warn.
-
includeNames?: boolean
Whether to include term names in the output file. Defaults to false.
-
moduleStyle?: string
Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module), defaults to "es".
-
typeScript?: boolean
Set this to true to output TypeScript code instead of plain JavaScript.
-
exportName?: string
The name of the export that holds the parser in the output file. Defaults to "parser".
-
externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer
When calling buildParser, this can be used to provide placeholders for external tokenizers.
-
externalPropSource?: fn(name: string) → NodePropSource
Used by buildParser to resolve external prop sources.
-
externalSpecializer?: fn(name: string, terms: Object<number>) → fn(value: string, stack: Stack) → number
Provide placeholders for external specializers when using buildParser.
-
externalProp?: fn(name: string) → NodeProp<any>
If given, will be used to initialize external props in the parser returned by buildParser.
-
contextTracker?: ContextTracker<any> |
If given, will be used as context tracker in a parser built with buildParser.
-
-
buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string}
Build the code that represents the parser tables for a given grammar description. The parser property in the return value holds the main file that exports the LRParser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, holding their ids as values. This is useful when external code, such as a tokenizer, needs to be able to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.
-
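A sketch of calling buildParserFile directly instead of through the command-line tool. The grammar, the file name, and the tinyParser export name are hypothetical, and writing the two returned strings to disk is left out.

```typescript
import {buildParserFile} from "@lezer/generator"

// Hypothetical grammar with Identifier and Number tokens.
const grammar = `
@top Program { item* }
item { Identifier | Number }
@skip { space }
@tokens {
  Identifier { @asciiLetter+ }
  Number { @digit+ }
  space { @whitespace+ }
}
`

const {parser, terms} = buildParserFile(grammar, {
  fileName: "tiny.grammar", // hypothetical grammar file name
  exportName: "tinyParser"  // instead of the default "parser"
})

// parser is the source of an ES module (the default moduleStyle) that
// imports from @lezer/lr and exports tinyParser.
console.log(parser.includes("tinyParser")) // true

// terms declares a constant for each named term, such as Identifier and Number.
console.log(terms)
```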
buildParser(text: string, options?: BuildOptions = {}) → LRParser
Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers, you'll have to provide the externalTokenizer option for the returned parser to be able to parse anything.
-
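Because buildParser needs no build step, it suits tests. A minimal sketch with a hypothetical grammar, which parses a string and checks the result both through Tree.toString and by walking the tree with Tree.iterate.

```typescript
import {buildParser} from "@lezer/generator"

// Hypothetical grammar with Identifier and Number tokens.
const grammar = `
@top Program { item* }
item { Identifier | Number }
@skip { space }
@tokens {
  Identifier { @asciiLetter+ }
  Number { @digit+ }
  space { @whitespace+ }
}
`

const parser = buildParser(grammar)
const tree = parser.parse("foo 42 bar")
console.log(tree.toString()) // Program(Identifier,Number,Identifier)

// Collect node names in document order, for example to assert on them.
const names: string[] = []
tree.iterate({enter: node => { names.push(node.name) }})
console.log(names.join(" ")) // Program Identifier Number Identifier
```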
class
GenError extends Error
The type of error raised when the parser generator finds an issue.
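A sketch of telling grammar errors apart from other failures when building a parser in memory. The grammar texts are made up; the broken one references a rule that is never defined.

```typescript
import {buildParser, GenError} from "@lezer/generator"

// Returns "ok" for a valid grammar, or the generator's message for a
// grammar error. Any other exception is rethrown.
function checkGrammar(text: string): string {
  try {
    buildParser(text)
    return "ok"
  } catch (e) {
    if (e instanceof GenError) return `grammar error: ${e.message}`
    throw e
  }
}

const valid = `
@top Program { Word }
@tokens { Word { @asciiLetter+ } }
`
console.log(checkGrammar(valid)) // ok

// Missing is referenced but never defined, which the generator rejects.
console.log(checkGrammar(`@top Program { Missing }`))
```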