Commit 5528d24

Docs: Fixed the backslash escaping on the pattern analyzer page
Closes elastic#11099
1 parent a6ccd68 commit 5528d24

1 file changed (+72 −73 lines changed)

docs/reference/analysis/analyzers/pattern-analyzer.asciidoc

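The gist of the fix: the old curl-based examples carried an extra level of backslash escaping (e.g. `\\\\s+`), while the new console-style snippets use the plain JSON form `\\s+`, which decodes to the regex `\s+`. A minimal sketch of that decoding, using Python's `json` and `re` modules as stand-ins (the analyzer itself uses Java regexes; the assumption here is that `\s+` splits the same way in both engines):

```python
import json
import re

# In JSON source text, one regex backslash is written as two:
# the JSON escape \\ decodes to a single backslash.
body = '{"pattern": "\\\\s+"}'          # the literal JSON text {"pattern": "\\s+"}
pattern = json.loads(body)["pattern"]
assert pattern == r"\s+"                # decoded value: the two-character regex \s+

# That regex splits on runs of whitespace, as the corrected docs show.
assert re.split(pattern, "foo,bar baz") == ["foo,bar", "baz"]
```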
@@ -7,16 +7,13 @@ via a regular expression. Accepts the following settings:
 The following are settings that can be set for a `pattern` analyzer
 type:
 
-[cols="<,<",options="header",]
-|===================================================================
-|Setting |Description
-|`lowercase` |Should terms be lowercased or not. Defaults to `true`.
-|`pattern` |The regular expression pattern, defaults to `\W+`.
-|`flags` |The regular expression flags.
-|`stopwords` |A list of stopwords to initialize the stop filter with.
-Defaults to an 'empty' stopword list Check
-<<analysis-stop-analyzer,Stop Analyzer>> for more details.
-|===================================================================
+[horizontal]
+`lowercase`:: Should terms be lowercased or not. Defaults to `true`.
+`pattern`:: The regular expression pattern, defaults to `\W+`.
+`flags`:: The regular expression flags.
+`stopwords`:: A list of stopwords to initialize the stop filter with.
+Defaults to an 'empty' stopword list Check
+<<analysis-stop-analyzer,Stop Analyzer>> for more details.
 
 *IMPORTANT*: The regular expression should match the *token separators*,
 not the tokens themselves.
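The settings in this hunk can be illustrated with a toy stand-in for the analyzer's default behaviour (`pattern` defaulting to `\W+`, `lowercase` to `true`). `pattern_analyze` is a hypothetical helper, and Python's `re` stands in for the Java regex engine the analyzer actually uses:

```python
import re

def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    """Toy stand-in for a `pattern` analyzer with the documented defaults."""
    # The pattern matches the *token separators*, not the tokens themselves,
    # so the text is split on it and empty pieces are dropped.
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_analyze("Foo,Bar baz"))  # ['foo', 'bar', 'baz']
```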
@@ -29,60 +26,62 @@ Pattern API] for more details about `flags` options.
 ==== Pattern Analyzer Examples
 
 In order to try out these examples, you should delete the `test` index
-before running each example:
-
-[source,js]
---------------------------------------------------
-curl -XDELETE localhost:9200/test
---------------------------------------------------
+before running each example.
 
 [float]
 ===== Whitespace tokenizer
 
 [source,js]
 --------------------------------------------------
-curl -XPUT 'localhost:9200/test' -d '
-{
-  "settings":{
-    "analysis": {
-      "analyzer": {
-        "whitespace":{
-          "type": "pattern",
-          "pattern":"\\\\s+"
-        }
-      }
-    }
+DELETE test
+
+PUT /test
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "whitespace": {
+          "type": "pattern",
+          "pattern": "\\s+"
         }
-}'
+      }
+    }
+  }
+}
 
-curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
-# "foo,bar", "baz"
+GET /test/_analyze?analyzer=whitespace&text=foo,bar baz
+
+# "foo,bar", "baz"
 --------------------------------------------------
+// AUTOSENSE
 
 [float]
 ===== Non-word character tokenizer
 
 [source,js]
 --------------------------------------------------
-
-curl -XPUT 'localhost:9200/test' -d '
-{
-  "settings":{
-    "analysis": {
-      "analyzer": {
-        "nonword":{
-          "type": "pattern",
-          "pattern":"[^\\\\w]+"
-        }
-      }
-    }
+DELETE test
+
+PUT /test
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "nonword": {
+          "type": "pattern",
+          "pattern": "[^\\w]+" <1>
         }
-}'
+      }
+    }
+  }
+}
 
-curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
-# "foo,bar baz" becomes "foo", "bar", "baz"
+GET /test/_analyze?analyzer=nonword&text=foo,bar baz
+# "foo,bar baz" becomes "foo", "bar", "baz"
 
-curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
-# "type_1","type_4"
+GET /test/_analyze?analyzer=nonword&text=type_1-type_4
+# "type_1","type_4"
 --------------------------------------------------
+// AUTOSENSE
+
 
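The two examples in this hunk are easy to sanity-check outside Elasticsearch; the sketch below replays them with Python's `re.split` standing in for the Java regex engine (assuming `\s` and `\w` match the same characters in both engines):

```python
import re

# Whitespace tokenizer: \s+ as the separator keeps "foo,bar" together.
assert [t for t in re.split(r"\s+", "foo,bar baz") if t] == ["foo,bar", "baz"]

# Non-word tokenizer: [^\w]+ splits on "," and " " but keeps "_" (a word
# character), so type_1-type_4 yields the two full identifiers.
assert [t for t in re.split(r"[^\w]+", "foo,bar baz") if t] == ["foo", "bar", "baz"]
assert [t for t in re.split(r"[^\w]+", "type_1-type_4") if t] == ["type_1", "type_4"]
```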
@@ -89,41 +88,41 @@
 [float]
 ===== CamelCase tokenizer
 
 [source,js]
 --------------------------------------------------
-
-curl -XPUT 'localhost:9200/test?pretty=1' -d '
-{
-  "settings":{
-    "analysis": {
-      "analyzer": {
-        "camel":{
-          "type": "pattern",
-          "pattern":"([^\\\\p{L}\\\\d]+)|(?<=\\\\D)(?=\\\\d)|(?<=\\\\d)(?=\\\\D)|(?<=[\\\\p{L}&&[^\\\\p{Lu}]])(?=\\\\p{Lu})|(?<=\\\\p{Lu})(?=\\\\p{Lu}[\\\\p{L}&&[^\\\\p{Lu}]])"
-        }
-      }
-    }
+DELETE test
+
+PUT /test?pretty=1
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "camel": {
+          "type": "pattern",
+          "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
         }
-}'
+      }
+    }
+  }
+}
 
-curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
-MooseX::FTPClass2_beta
-'
-# "moose","x","ftp","class","2","beta"
+GET /test/_analyze?analyzer=camel&text=MooseX::FTPClass2_beta
+# "moose","x","ftp","class","2","beta"
 --------------------------------------------------
+// AUTOSENSE
 
 The regex above is easier to understand as:
 
 [source,js]
 --------------------------------------------------
 
-([^\\p{L}\\d]+)                # swallow non letters and numbers,
-| (?<=\\D)(?=\\d)              # or non-number followed by number,
-| (?<=\\d)(?=\\D)              # or number followed by non-number,
-| (?<=[ \\p{L} && [^\\p{Lu}]]) # or lower case
-  (?=\\p{Lu})                  # followed by upper case,
-| (?<=\\p{Lu})                 # or upper case
-  (?=\\p{Lu}                   # followed by upper case
-  [\\p{L}&&[^\\p{Lu}]]         # then lower case
-  )
+([^\p{L}\d]+)                  # swallow non letters and numbers,
+| (?<=\D)(?=\d)                # or non-number followed by number,
+| (?<=\d)(?=\D)                # or number followed by non-number,
+| (?<=[ \p{L} && [^\p{Lu}]])   # or lower case
+  (?=\p{Lu})                   # followed by upper case,
+| (?<=\p{Lu})                  # or upper case
+  (?=\p{Lu}                    # followed by upper case
+  [\p{L}&&[^\p{Lu}]]           # then lower case
+  )
 --------------------------------------------------
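The camelCase pattern leans on Java-only regex syntax (`\p{...}` combined with `&&` character-class intersection), which Python's `re` does not support. The sketch below is therefore an ASCII-only approximation of the same separator regex (an assumption, since the Java original also handles non-ASCII letters), and needs Python 3.7+ so that `re.split` can split on the zero-width lookaround matches:

```python
import re

# ASCII approximation of the Java camelCase separator regex:
#   \p{L} -> [A-Za-z],  \p{Lu} -> [A-Z],  [\p{L}&&[^\p{Lu}]] -> [a-z]
camel = (
    r"(?:[^A-Za-z\d]+)"           # swallow non letters and numbers,
    r"|(?<=\D)(?=\d)"             # or non-number followed by number,
    r"|(?<=\d)(?=\D)"             # or number followed by non-number,
    r"|(?<=[a-z])(?=[A-Z])"       # or lower case followed by upper case,
    r"|(?<=[A-Z])(?=[A-Z][a-z])"  # or upper case followed by upper then lower.
)

tokens = [t.lower() for t in re.split(camel, "MooseX::FTPClass2_beta") if t]
print(tokens)  # ['moose', 'x', 'ftp', 'class', '2', 'beta']
```

The non-capturing `(?:...)` group keeps `re.split` from echoing the matched separators back into the result list.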
