Commit 6518e55

javascript regexp cheatsheet
1 parent 5adf78c commit 6518e55

3 files changed: +393 -0 lines changed

@@ -0,0 +1,393 @@
---
title: "JavaScript regular expressions cheatsheet and examples"
categories:
- cheatsheet
- javascript
tags:
- javascript
- regular-expressions
- examples
date: 2019-12-06T14:37:42
---

This blog post gives an overview of regular expression syntax and features supported by JavaScript. Examples were tested on the Chrome/Chromium console (version 78+) and include features that may not be available in other browsers and platforms. Assume ASCII character set unless otherwise specified. This post is an excerpt from my [JavaScript RegExp](https://github.com/learnbyexample/learn_js_regexp) book.

## Cheatsheet

| Note | Description |
| ------- | ----------- |
| [MDN: Regular Expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) | MDN documentation for JavaScript regular expressions |
| `/pat/` | a RegExp object |
| `const pet = /dog/` | save regexp in a variable for reuse, clarity, etc |
| `/pat/.test(s)` | check if given pattern is present anywhere in input string |
| | returns `true` or `false` |
| `i` | flag to ignore case when matching letters |
| `g` | flag to match all occurrences |
| `new RegExp('pat', 'i')` | construct RegExp from a string |
| | second argument specifies flags |
| | use backtick strings with `${}` for interpolation |
| `source` | property to convert RegExp object to string |
| | helps to insert a RegExp inside another RegExp |
| `flags` | property to get flags of a RegExp object |
| `s.replace(/pat/, 'repl')` | method for search and replace |
| `s.search(/pat/)` | gives starting location of the match or `-1` |
| `s.split(/pat/)` | split a string based on regexp |

| Anchors | Description |
| ------------- | ----------- |
| `^` | restricts the match to start of string |
| `$` | restricts the match to end of string |
| `\n` | line separator |
| `m` | flag to match the start/end of line with `^` and `$` anchors |
| `\b` | restricts the match to start/end of words |
| | word characters: letters, digits, underscore |
| `\B` | matches wherever `\b` doesn't match |

`^`, `$` and `\` are **metacharacters** in the above table, as these characters have special meaning. Prefix a `\` character to remove the special meaning and match such characters literally. For example, `\^` will match a `^` character instead of acting as an anchor.

| Feature | Description |
| ------------- | ----------- |
| `pat1|pat2|pat3` | multiple regexps combined as OR conditional |
| | each alternative can have independent anchors |
| `(pat)` | group pattern(s), also a capturing group |
| `a(b|c)d` | same as `abd|acd` |
| `(?:pat)` | non-capturing group |
| `(?<name>pat)` | named capture group |
| `.` | match any character except `\r` and `\n` characters |
| `[]` | character class, matches one character among many |

| Greedy Quantifiers | Description |
| ------------- | ----------- |
| `?` | match `0` or `1` times |
| `*` | match `0` or more times |
| `+` | match `1` or more times |
| `{m,n}` | match `m` to `n` times |
| `{m,}` | match at least `m` times |
| `{n}` | match exactly `n` times |
| `pat1.*pat2` | any number of characters between `pat1` and `pat2` |
| `pat1.*pat2|pat2.*pat1` | match both `pat1` and `pat2` in any order |

**Greedy** here means that the above quantifiers will match as much as possible while still allowing the overall regexp to succeed. Appending a `?` to greedy quantifiers makes them **non-greedy**, i.e. match as *little* as possible. Quantifiers can be applied to literal characters, groups, backreferences and character classes.

| Character class | Description |
| ------------- | ----------- |
| `[ae;o]` | match **any** of these characters once |
| `[3-7]` | **range** of characters from `3` to `7` |
| `[^=b2]` | **negated set**, match other than `=` or `b` or `2` |
| `[a-z-]` | `-` should be first/last or escaped using `\` to match literally |
| `[+^]` | `^` should not be the first character, or should be escaped using `\` |
| `[\]\\]` | `]` and `\` should be escaped using `\` |
| `\w` | similar to `[A-Za-z0-9_]` for matching word characters |
| `\d` | similar to `[0-9]` for matching digit characters |
| `\s` | similar to `[ \t\n\r\f\v]` for matching whitespace characters |
| | use `\W`, `\D`, and `\S` for their opposites respectively |
| `u` | flag to enable Unicode matching |
| `\p{}` | Unicode character sets |
| `\P{}` | negated Unicode character sets |
| | see [MDN: Unicode property escapes](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes) for details |
| `\u{}` | specify Unicode characters using codepoints |

| Lookarounds | Description |
| ------- | ----------- |
| lookarounds | allow you to create custom positive/negative assertions |
| | zero-width like anchors and not part of matching portions |
| `(?!pat)` | negative lookahead assertion |
| `(?<!pat)` | negative lookbehind assertion |
| `(?=pat)` | positive lookahead assertion |
| `(?<=pat)` | positive lookbehind assertion |
| | variable length lookbehind is allowed |
| `(?!pat1)(?=pat2)` | multiple assertions can be specified next to each other in any order |
| | as they mark a matching location without consuming characters |
| `((?!pat).)*` | negates a regexp pattern |

| Matched portion | Description |
| ------------- | ----------- |
| `m = s.match(/pat/)` | assuming `g` flag isn't used and regexp succeeds, |
| | returns an array with matched portion and 3 properties |
| | `index` property gives the starting location of the match |
| | `input` property gives the input string `s` |
| | `groups` property gives dictionary of named capture groups |
| `m[0]` | for above case, gives entire matched portion |
| `m[N]` | matched portion of Nth capture group |
| `s.match(/pat/g)` | returns only the matched portions, no properties |
| `s.matchAll(/pat/g)` | returns an iterator containing details for |
| | each matched portion and its properties |
| Backreference | gives matched portion of Nth capture group |
| | use `$1`, `$2`, `$3`, etc in replacement section |
| | `$&` gives entire matched portion |
| | `` $` `` gives string before the matched portion |
| | `$'` gives string after the matched portion |
| | use `\1`, `\2`, `\3`, etc within regexp definition |
| `$$` | insert `$` literally in replacement section |
| `$0N` | same as `$N`, lets you separate the backreference from digits that follow |
| `\N\xhh` | lets you separate the backreference from digits in regexp definition |
| `(?<name>pat)` | named capture group |
| | use `\k<name>` for backreferencing in regexp definition |
| | use `$<name>` for backreferencing in replacement section |

## Examples

* `test` method

```js
> let sentence = 'This is a sample string'

> /is/.test(sentence)
< true
> /xyz/.test(sentence)
< false

> if (/ring/.test(sentence)) {
      console.log('mission success')
  }
< mission success
```

* `new RegExp()` constructor

```js
> new RegExp('dog', 'i')
< /dog/i

> new RegExp('123\\tabc')
< /123\tabc/

> let greeting = 'hi'
> new RegExp(`${greeting.toUpperCase()} there`)
< /HI there/
```

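The `source` and `flags` properties listed in the cheatsheet can be tried out along the following lines. This is only a minimal sketch, written as a plain script rather than console input; the names `year`, `month`, `isoMonth`, `pet` and `petOnce` are made up for illustration.

```js
// hypothetical building blocks, named only for this illustration
const year = /\d{4}/
const month = /\d{2}/

// 'source' gives the pattern text without the enclosing slashes or flags,
// which makes it possible to insert one RegExp inside another
const isoMonth = new RegExp(`^${year.source}-${month.source}$`)

console.log(isoMonth.source)           // ^\d{4}-\d{2}$
console.log(isoMonth.test('2019-12'))  // true
console.log(isoMonth.test('19-12'))    // false

// 'flags' gives the flags of a RegExp object as a string
const pet = /dog/gi
console.log(pet.flags)                 // gi

// reuse the pattern of an existing RegExp with a different set of flags
const petOnce = new RegExp(pet.source, 'i')
console.log('Dog dog DOG'.replace(petOnce, 'cat'))  // cat dog DOG
```
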
* string and line anchors

```js
// string anchors
> /^cat/.test('cater')
< true
> ['surrender', 'unicorn', 'newer', 'door'].filter(w => /er$/.test(w))
< ["surrender", "newer"]

// use 'm' flag to change string anchors to line anchors
> /^par$/m.test('spare\npar\nera\ndare')
< true

// escape metacharacters to match them literally
> /b\^2/.test('a^2 + b^2 - C*3')
< true
```

* `replace` method and word boundaries

```js
> let items = 'catapults\nconcatenate\ncat'
> console.log(items.replace(/^/gm, '* '))
< * catapults
  * concatenate
  * cat

> let sample = 'par spar apparent spare part'
// replace 'par' only at the start of word
> sample.replace(/\bpar/g, 'X')
< "X spar apparent spare Xt"
// replace 'par' at the end of word but not whole word 'par'
> sample.replace(/\Bpar\b/g, 'X')
< "par sX apparent spare part"
```

* alternations and grouping

```js
// replace either 'cat' at start of string or 'cat' at end of word
> 'catapults concatenate cat scat'.replace(/^cat|cat\b/g, 'X')
< "Xapults concatenate X sX"

// same as: /\bpark\b|\bpart\b/g
> 'park parked part party'.replace(/\bpar(k|t)\b/g, 'X')
< "X parked X party"
```

* [MDN: Regular Expressions doc](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) provides an `escapeRegExp` function, useful for automatically escaping metacharacters.
* See also the [XRegExp](https://github.com/slevithan/xregexp) utility, which provides the [XRegExp.escape](http://xregexp.com/api/#escape) and [XRegExp.union](http://xregexp.com/api/#union) methods. The union method additionally allows a mix of strings and RegExp literals, and takes care of renumbering backreferences.

```js
> function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  }

> function unionRegExp(arr) {
    return arr.map(w => escapeRegExp(w)).join('|')
  }

> new RegExp(unionRegExp(['c^t', 'dog$', 'f|x']), 'g')
< /c\^t|dog\$|f\|x/g
```

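To see why escaping matters, here is a small sketch, written as a plain script, that compares an unescaped and an escaped search term. It repeats the `escapeRegExp` function from above so that it runs on its own. The variable names and sample strings are made up for illustration. Sorting the alternatives by length is an extra step that is not part of the MDN snippet; it is shown only because the leftmost alternative wins when several could match at the same position.

```js
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// hypothetical search term containing a metacharacter
const term = 'a.b'
const unsafe = new RegExp(term, 'g')
const safe = new RegExp(escapeRegExp(term), 'g')

// unescaped '.' matches any character
console.log('a.b axb a-b'.replace(unsafe, 'X'))  // X X X
// escaped '.' matches only a literal dot
console.log('a.b axb a-b'.replace(safe, 'X'))    // X axb a-b

// alternatives are tried left to right, so put longer strings first
const terms = ['hand', 'handy', 'handful']
const asGiven = new RegExp(terms.map(w => escapeRegExp(w)).join('|'), 'g')
const longestFirst = new RegExp(
  [...terms].sort((a, b) => b.length - a.length)
            .map(w => escapeRegExp(w))
            .join('|'),
  'g'
)

console.log('hands handful handy'.replace(asGiven, 'X'))       // Xs Xful Xy
console.log('hands handful handy'.replace(longestFirst, 'X'))  // Xs X X
```
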
* dot metacharacter and quantifiers

```js
// matches character '2', any character and then character '3'
> '42\t33'.replace(/2.3/, '8')
< "483"
// 's' flag will allow newline character to be matched as well
> 'Hi there\nHave a Nice Day'.replace(/the.*ice/s, 'X')
< "Hi X Day"

// same as: /part|parrot|parent/g
> 'par part parrot parent'.replace(/par(en|ro)?t/g, 'X')
< "par X X X"

> ['abc', 'ac', 'abbc', 'xabbbcz', 'abbbbbc'].filter(w => /ab{1,4}c/.test(w))
< ["abc", "abbc", "xabbbcz"]
```

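The cheatsheet notes that appending `?` to a greedy quantifier makes it non-greedy. The sketch below, written as a plain script, puts the two behaviors side by side; the sample strings and variable names are made up for illustration.

```js
const line = 'green:3.14:teal::brown:oh!:blue'

// greedy: .* grabs as much as possible, so the match ends at the last ':'
const greedy = line.replace(/:.*:/, 'X')
console.log(greedy)     // greenXblue

// non-greedy: .*? stops at the earliest ':' that lets the regexp succeed
const nonGreedy = line.replace(/:.*?:/, 'X')
console.log(nonGreedy)  // greenXteal::brown:oh!:blue

// the same applies to {m,n}, which prefers the larger count unless '?' is added
const digits = '123456789'
console.log(digits.match(/\d{2,5}/)[0])   // 12345
console.log(digits.match(/\d{2,5}?/)[0])  // 12
```
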
* `match` method

```js
// entire matched portion
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)[0]
< "abc ac adc a"
// matched portion of 2nd capture group
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)[2]
< "c a"
// get location of matching portion
> 'cat and dog'.match(/dog/).index
< 8

// get all matching portions with 'g' flag, no properties or group portions
> 'par spar apparent spare part'.match(/\bs?par[et]\b/g)
< ["spare", "part"]

// useful for debugging purposes as well before using 'replace'
> 'that is quite a fabricated tale'.match(/t.*?a/g)
< ["tha", "t is quite a", "ted ta"]
```

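The `input` and `groups` properties and the `search` method appear in the cheatsheet but not in the examples above. Here is a minimal sketch as a plain script; the sample string and the group names `year`, `month` and `day` are made up for illustration.

```js
const entry = 'released: 2019-12-06'
const m = entry.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)

console.log(m[0])           // 2019-12-06
console.log(m.index)        // 10
console.log(m.input)        // released: 2019-12-06
// named capture groups are available through the 'groups' property
console.log(m.groups.year)  // 2019
console.log(m.groups.day)   // 06

// 'search' gives only the starting location, or -1 when there is no match
console.log(entry.search(/\d/))   // 10
console.log(entry.search(/xyz/))  // -1

// 'match' gives null when the regexp fails, so check before indexing
const missing = entry.match(/xyz/)
console.log(missing === null)     // true
```
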
* `matchAll` method

```js
// same as: match(/ab*c/g)
> Array.from('abc ac adc abbbc'.matchAll(/ab*c/g), m => m[0])
< ["abc", "ac", "abbbc"]
// get index for each match
> Array.from('abc ac adc abbbc'.matchAll(/ab*c/g), m => m.index)
< [0, 4, 11]

// get only capture group portions as an array for each match
> Array.from('xx:yyy x: x:yy :y'.matchAll(/(x*):(y*)/g), m => m.slice(1))
< (4) [Array(2), Array(2), Array(2), Array(2)]
    0: (2) ["xx", "yyy"]
    1: (2) ["x", ""]
    2: (2) ["x", "yy"]
    3: (2) ["", "y"]
    length: 4
    __proto__: Array(0)
```

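Since `matchAll` returns an iterator, it can also be consumed with a `for...of` loop instead of `Array.from`. The sketch below, written as a plain script, assumes named capture groups `left` and `right`, which are made up for illustration.

```js
const text = 'xx:yyy x: x:yy :y'
const pairs = []

// each item is a match array with 'index', 'input' and 'groups' properties
for (const m of text.matchAll(/(?<left>x*):(?<right>y*)/g)) {
  pairs.push(`${m.index}: ${m.groups.left}|${m.groups.right}`)
}

console.log(pairs)
// ["0: xx|yyy", "7: x|", "10: x|yy", "15: |y"]
```
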
* function/dictionary in replacement section

```js
> function titleCase(m, g1, g2) {
    return g1.toUpperCase() + g2.toLowerCase()
  }
> 'aBc ac ADC aBbBC'.replace(/(a)(.*?c)/ig, titleCase)
< "Abc Ac Adc Abbbc"

> '1 42 317'.replace(/\d+/g, m => m*2)
< "2 84 634"

> let swap = { 'cat': 'tiger', 'tiger': 'cat' }
> 'cat tiger dog tiger cat'.replace(/cat|tiger/g, k => swap[k])
< "tiger cat dog cat tiger"
```

* `split` method

```js
// split based on one or more digit characters
> 'Sample123string42with777numbers'.split(/\d+/)
< ["Sample", "string", "with", "numbers"]
// use capture group to include the portion that caused the split as well
> 'Sample123string42with777numbers'.split(/(\d+)/)
< ["Sample", "123", "string", "42", "with", "777", "numbers"]

// split based on digit or whitespace characters
> '**1\f2\n3star\t7 77\r**'.split(/[\d\s]+/)
< ["**", "star", "**"]

// use non-capturing group if capturing is not needed
> '123handed42handy777handful500'.split(/hand(?:y|ful)?/)
< ["123", "ed42", "777", "500"]
```

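The character class and Unicode entries from the cheatsheet can be explored with a sketch like the one below, written as a plain script. The sample strings and variable names are made up for illustration, and `\p{}` needs an environment that supports Unicode property escapes.

```js
// negated character class: runs of characters other than ','
const fields = 'tea,42,coffee'.match(/[^,]+/g)
console.log(fields)       // ["tea", "42", "coffee"]

// range: keep the items that contain a digit from 3 to 7
const ids = ['a1', 'b5', 'c9', 'd37'].filter(w => /[3-7]/.test(w))
console.log(ids)          // ["b5", "d37"]

const mixed = 'cat:кот,dog:собака'

// \w covers only ASCII word characters
const asciiWords = mixed.match(/\w+/g)
console.log(asciiWords)   // ["cat", "dog"]

// \p{L} matches letters from any script and requires the 'u' flag
const allWords = mixed.match(/\p{L}+/gu)
console.log(allWords)     // ["cat", "кот", "dog", "собака"]

// \P{L} is the negated set
const spaced = mixed.replace(/\P{L}+/gu, ' ')
console.log(spaced)       // cat кот dog собака

// \u{} specifies a character by codepoint, here U+043A (к)
console.log(/\u{43a}/u.test(mixed))  // true
```
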
* backreferencing with normal/non-capturing/named capture groups

```js
// remove any number of consecutive duplicate words separated by space
> 'aa a a a 42 f_1 f_1 f_13.14'.replace(/\b(\w+)( \1)+\b/g, '$1')
< "aa a 42 f_1 f_13.14"

// add something around the entire matched portion
> '52 apples and 31 mangoes'.replace(/\d+/g, '($&)')
< "(52) apples and (31) mangoes"

// duplicate first field and add it as last field
> 'fork,42,nice,3.14'.replace(/,.+/, '$&,$`')
< "fork,42,nice,3.14,fork"

// use non-capturing groups when backreferencing isn't needed
> '1,2,3,4,5,6,7'.replace(/^((?:[^,]+,){3})([^,]+)/, '$1($2)')
< "1,2,3,(4),5,6,7"

// named capture groups, same as: replace(/(\w+),(\w+)/g, '$2,$1')
> 'good,bad 42,24'.replace(/(?<fw>\w+),(?<sw>\w+)/g, '$<sw>,$<fw>')
< "bad,good 24,42"
```

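The examples above use backreferences mostly in the replacement section. The sketch below, written as a plain script, shows `\1` and `\k<name>` inside the regexp definition itself; the word list and the group names `ch` and `word` are made up for illustration.

```js
const samples = ['effort', 'flee', 'facade', 'oddball', 'rat', 'tool']

// \1 refers to whatever the first capture group matched,
// so this finds words with a character repeated twice in a row
const doubled = samples.filter(w => /(\w)\1/.test(w))
console.log(doubled)       // ["effort", "flee", "oddball", "tool"]

// same check with a named capture group and \k<name>
const doubledNamed = samples.filter(w => /(?<ch>\w)\k<ch>/.test(w))
console.log(doubledNamed)  // ["effort", "flee", "oddball", "tool"]

// \k<name> in the regexp definition, $<name> in the replacement section
const deduped = 'aa a a a 42 f_1 f_1 f_13.14'
  .replace(/\b(?<word>\w+)( \k<word>)+\b/g, '$<word>')
console.log(deduped)       // aa a 42 f_1 f_13.14
```
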
* examples for lookarounds

```js
// change 'foo' only if it is not followed by a digit character
// note that end of string satisfies the given assertion
// 'foofoo' has two matches as the assertion doesn't consume characters
> 'hey food! foo42 foot5 foofoo'.replace(/foo(?!\d)/g, 'baz')
< "hey bazd! foo42 bazt5 bazbaz"

// change whole word only if it is not preceded by : or --
> ':cart apple --rest ;tea'.replace(/(?<!:|--)\b\w+/g, 'X')
< ":cart X --rest ;X"

// extract digits only if it is preceded by - and followed by , or ;
> '42 foo-5, baz3; x83, y-20; f12'.match(/(?<=-)\d+(?=[;,])/g)
< ["5", "20"]

// words containing all vowels in any order
> let words = ['sequoia', 'subtle', 'questionable', 'exhibit', 'equation']
> words.filter(w => /(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u/.test(w))
< ["sequoia", "questionable", "equation"]

// replace only 3rd occurrence of 'cat'
> 'cat scatter cater scat'.replace(/(?<=(cat.*?){2})cat/, 'X')
< "cat scatter Xer scat"

// match if 'do' is not there between 'at' and 'par'
> /at((?!do).)*par/.test('fox,cat,dog,parrot')
< false
```

## Debugging and Visualization tools

As your regexp gets more complicated, it can become difficult to debug when you run into issues. Building your regexp step by step from scratch and testing it against input strings will go a long way toward correcting the problem. To aid in such a process, you could use [various online regexp tools](https://news.ycombinator.com/item?id=20614847).

[regex101](https://regex101.com/r/HSeO0z/1) is a popular site to test your regexp. You'll first have to choose JavaScript as the flavor. Then you can add your regexp and input strings, and choose flags and an optional replacement string. Matching portions will be highlighted and an explanation is offered in separate panes. There's also a quick reference and other features like sharing, code generator, quiz, etc.

![regex101 example]({{ '/images/books/regex101.png' | absolute_url }}){: .align-center}

Another useful tool is [jex: regulex](https://jex.im/regulex/#!flags=&re=%5Cbpar(en%7Cro)%3Ft%5Cb), which converts your regexp to a railroad diagram, thus providing a visual aid to understanding the pattern.

![regulex example]({{ '/images/books/regulex.png' | absolute_url }}){: .align-center}

## JavaScript RegExp book

Visit my repo [learn_js_regexp](https://github.com/learnbyexample/learn_js_regexp) for details about the book I wrote on JavaScript regular expressions. The ebook uses plenty of examples to explain the concepts from the basics and includes [exercises](https://github.com/learnbyexample/learn_js_regexp/blob/master/Exercises.md) to test your understanding. The cheatsheet and examples presented in this post are based on the contents of this book.

![JavaScript cover image]({{ '/images/books/js_regexp.png' | absolute_url }}){: .align-center}

images/books/regex101.png (26.5 KB)

images/books/regulex.png (5.11 KB)
