[[regexp-syntax]]
== Regular expression syntax

A https://en.wikipedia.org/wiki/Regular_expression[regular expression] is a way to
match patterns in data using placeholder characters, called operators.

{es} supports regular expressions in the following queries:

* <<query-dsl-regexp-query, `regexp`>>
* <<query-dsl-query-string-query, `query_string`>>

{es} uses https://lucene.apache.org/core/[Apache Lucene]'s regular expression
engine to parse these queries.

[discrete]
[[regexp-reserved-characters]]
=== Reserved characters

Lucene's regular expression engine supports all Unicode characters. However, the
following characters are reserved as operators:

....
. ? + * | { } [ ] ( ) " \
....

Depending on the <<regexp-optional-operators, optional operators>> enabled, the
following characters may also be reserved:

....
# @ & < > ~
....

To use one of these characters literally, escape it with a preceding
backslash or surround it with double quotes. For example:

....
\@                  # renders as a literal '@'
\\                  # renders as a literal '\'
"john@smith.com"    # renders as 'john@smith.com'
....
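If a pattern is built from user-supplied text, escaping every reserved character programmatically avoids accidental operators. The Python sketch below applies the backslash rule described above. `escape_lucene_regexp` is a hypothetical helper, not part of any {es} client, and it escapes the optional-operator characters by default on the assumption that a backslash before them is harmless when those operators are disabled.

```python
# Characters listed above as always reserved.
STANDARD_RESERVED = frozenset('.?+*|{}[]()"\\')
# Characters reserved only when the matching optional operator is enabled.
OPTIONAL_RESERVED = frozenset("#@&<>~")


def escape_lucene_regexp(text: str, include_optional: bool = True) -> str:
    """Return `text` with each reserved character prefixed by a backslash.

    Hypothetical helper. Escaping the optional-operator characters even when
    those operators are disabled is an assumption of this sketch.
    """
    reserved = (STANDARD_RESERVED | OPTIONAL_RESERVED) if include_optional else STANDARD_RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in text)


print(escape_lucene_regexp("john@smith.com"))  # john\@smith\.com
print(escape_lucene_regexp("C++ (v2)"))        # C\+\+ \(v2\)
```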

[discrete]
[[regexp-standard-operators]]
=== Standard operators

Lucene's regular expression engine does not use the
https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl
Compatible Regular Expressions (PCRE)] library, but it does support the
following standard operators.

`.`::
+
--
Matches any single character. For example:

....
ab.     # matches 'aba', 'abb', 'abz', etc.
....
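A quick way to see the whole-term behavior of `.` is to try the example against a few terms. This Python sketch uses `re.fullmatch`, which, like a `regexp` query, succeeds only when the pattern covers the entire string. Python's `re` module is a stand-in here, not Lucene's engine, and `whole_term_match` is a hypothetical helper name.

```python
import re


def whole_term_match(pattern: str, term: str) -> bool:
    """True only if `pattern` matches all of `term` (Python re, not Lucene)."""
    return re.fullmatch(pattern, term) is not None


print(whole_term_match(r"ab.", "abz"))   # True
print(whole_term_match(r"ab.", "ab"))    # False: '.' must consume one character
print(whole_term_match(r"ab.", "abcd"))  # False: nothing covers the final 'd'
```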
--

`?`::
+
--
Repeat the preceding character zero or one times. Often used to make the
preceding character optional. For example:

....
abc?    # matches 'ab' and 'abc'
....
--

`+`::
+
--
Repeat the preceding character one or more times. For example:

....
ab+     # matches 'ab', 'abb', 'abbb', etc.
....
--

`*`::
+
--
Repeat the preceding character zero or more times. For example:

....
ab*     # matches 'a', 'ab', 'abb', 'abbb', etc.
....
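`?`, `+`, and `*` differ only in how many copies of the preceding character they allow. The Python sketch below runs all three against the same terms, with `re.fullmatch` standing in for Lucene's whole-term matching (an approximation: Python's `re` is not Lucene's engine, though it treats these operators the same way for these inputs). `matching_terms` is a hypothetical helper name.

```python
import re

TERMS = ["a", "ab", "abb"]


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that `pattern` matches from start to end."""
    return [t for t in terms if re.fullmatch(pattern, t)]


for pattern in ["ab?", "ab+", "ab*"]:
    print(pattern, matching_terms(pattern, TERMS))
# ab? ['a', 'ab']
# ab+ ['ab', 'abb']
# ab* ['a', 'ab', 'abb']
```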
--

`{}`::
+
--
Minimum and maximum number of times the preceding character can repeat. For
example:

....
a{2}    # matches 'aa'
a{2,4}  # matches 'aa', 'aaa', and 'aaaa'
a{2,}   # matches 'a' repeated two or more times
....
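Bounded repetition can be read as shorthand for `?` and `*`: `a{2,4}` is two required copies of `a` followed by two optional ones, and `a{2,}` is two required copies followed by `a*`. The Python sketch below builds that expansion with a hypothetical `expand_repeat` helper and uses `re.fullmatch` (a stand-in for Lucene's engine) to check that both forms accept the same runs of `a`.

```python
import re
from typing import Optional


def expand_repeat(atom: str, low: int, high: Optional[int]) -> str:
    """Rewrite atom{low,high} using only concatenation, '?' and '*'.

    Hypothetical helper. `atom` is assumed to be a single character or a
    parenthesized group; high=None stands for an open upper bound (a{2,}).
    """
    if low < 0 or (high is not None and high < low):
        raise ValueError("need 0 <= low <= high")
    required = atom * low
    if high is None:
        return required + atom + "*"
    return required + (atom + "?") * (high - low)


for low, high, braces in [(2, 2, "a{2}"), (2, 4, "a{2,4}"), (2, None, "a{2,}")]:
    expanded = expand_repeat("a", low, high)
    # Both forms should accept exactly the same run lengths (checked up to 7).
    same = all(
        (re.fullmatch(braces, "a" * n) is None) == (re.fullmatch(expanded, "a" * n) is None)
        for n in range(8)
    )
    print(braces, "->", expanded, same)
# a{2} -> aa True
# a{2,4} -> aaa?a? True
# a{2,} -> aaa* True
```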
--

`|`::
+
--
OR operator. The match will succeed if the longest pattern on either the left
side OR the right side matches. For example:

....
abc|xyz  # matches 'abc' and 'xyz'
....
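Each side of `|` extends as far as it can, so `abc|xyz` means `(abc)|(xyz)` rather than `ab(c|x)yz`. The Python check below illustrates this with `re.fullmatch` as a stand-in for Lucene's whole-term matching; `either_side` is a hypothetical helper name.

```python
import re


def either_side(term: str) -> bool:
    """Whole-term check of abc|xyz, using Python's re as a stand-in."""
    return re.fullmatch(r"abc|xyz", term) is not None


print(either_side("abc"), either_side("xyz"))  # True True
# 'abxyz' fits ab(c|x)yz, but '|' splits the whole pattern into abc and xyz.
print(either_side("abxyz"))                     # False
print(re.fullmatch(r"ab(c|x)yz", "abxyz") is not None)  # True
```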
--

`( … )`::
+
--
Forms a group. You can use a group to treat part of the expression as a single
character. For example:

....
abc(def)?  # matches 'abc' and 'abcdef' but not 'abcd'
....
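Grouping decides what a following operator applies to. In this Python sketch, which uses `re.fullmatch` as a stand-in for Lucene's whole-term matching, `?` after `(def)` makes the whole group optional, while the `?` in `abcdef?` affects only the final `f`. `hits` is a hypothetical helper name.

```python
import re

TERMS = ["abc", "abcd", "abcde", "abcdef"]


def hits(pattern: str) -> list[str]:
    """Terms from TERMS that `pattern` matches in full."""
    return [t for t in TERMS if re.fullmatch(pattern, t)]


print(hits(r"abc(def)?"))  # ['abc', 'abcdef']: the whole group is optional
print(hits(r"abcdef?"))    # ['abcde', 'abcdef']: only 'f' is optional
```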
--

`[ … ]`::
+
--
Matches one of the characters in the brackets. For example:

....
[abc]    # matches 'a', 'b', or 'c'
....

Inside the brackets, `-` indicates a range unless `-` is the first character or
escaped. For example:

....
[a-c]    # matches 'a', 'b', or 'c'
[-abc]   # '-' is first character. Matches '-', 'a', 'b', or 'c'
[abc\-]  # Escapes '-'. Matches 'a', 'b', 'c', or '-'
....

A `^` immediately after the opening bracket negates all of the characters and
ranges in the brackets. For example:

....
[^abc]    # matches any character except 'a', 'b', or 'c'
[^a-c]    # matches any character except 'a', 'b', or 'c'
[^-abc]   # matches any character except '-', 'a', 'b', or 'c'
[^abc\-]  # matches any character except 'a', 'b', 'c', or '-'
....
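The bracket rules above can be modeled in a few lines. The Python sketch below parses the text between `[` and `]` into inclusive character ranges and then tests one character against them. `parse_bracket` and `bracket_matches` are hypothetical names, and the sketch covers only the rules stated in this section (ranges, a leading or escaped `-`, and a leading `^`), not every detail of Lucene's parser.

```python
def parse_bracket(body: str) -> tuple[bool, list[tuple[str, str]]]:
    """Parse the text inside [...] into (negated, inclusive ranges)."""
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # Tokenize, remembering whether each character was backslash-escaped.
    tokens: list[tuple[str, bool]] = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            tokens.append((body[i + 1], True))
            i += 2
        else:
            tokens.append((body[i], False))
            i += 1

    ranges: list[tuple[str, str]] = []
    j = 0
    while j < len(tokens):
        char = tokens[j][0]
        # An unescaped '-' with a token on each side forms a range. The first
        # token is always taken as a character, so a leading '-' is literal.
        if j + 2 < len(tokens) and tokens[j + 1] == ("-", False):
            ranges.append((char, tokens[j + 2][0]))
            j += 3
        else:
            ranges.append((char, char))
            j += 1
    return negated, ranges


def bracket_matches(body: str, char: str) -> bool:
    """True if the single character `char` matches the bracket expression."""
    negated, ranges = parse_bracket(body)
    inside = any(low <= char <= high for low, high in ranges)
    return inside != negated


print(bracket_matches("a-c", "b"))     # True
print(bracket_matches("-abc", "-"))    # True
print(bracket_matches("abc\\-", "-"))  # True
print(bracket_matches("^a-c", "b"))    # False
print(bracket_matches("^-abc", "x"))   # True
```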
--

[discrete]
[[regexp-optional-operators]]
=== Optional operators

You can use the `flags` parameter to enable more optional operators for
Lucene's regular expression engine.

To enable multiple operators, use a `|` separator. For example, a `flags` value
of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators.

[discrete]
==== Valid values

`ALL` (Default)::
Enables all optional operators.

`COMPLEMENT`::
+
--
Enables the `~` operator. You can use `~` to negate the shortest following
pattern. For example:

....
a~bc   # matches 'adc' and 'aec' but not 'abc'
....
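Under the reading that `~b` means any string other than exactly `b` (including the empty string and longer strings such as `bb`), `a~bc` is `a`, then anything except `b`, then `c`. Python's `re` has no complement operator, so the sketch below hand-translates this one pattern into a negative lookahead. It illustrates that assumed semantics and is not a general implementation of `~`.

```python
import re

# Hand translation of a~bc: 'a', then any string except exactly 'b', then 'c'.
# (?!bc\Z) rejects the single case where the middle part is exactly 'b'.
# re.DOTALL lets '.' cover any character in the middle part.
A_NOT_B_C = re.compile(r"a(?!bc\Z).*c", re.DOTALL)

for term in ["adc", "aec", "abc", "ac", "abbc"]:
    print(term, A_NOT_B_C.fullmatch(term) is not None)
# adc True
# aec True
# abc False
# ac True
# abbc True
```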
--

`INTERVAL`::
+
--
Enables the `<>` operators. You can use `<>` to match a numeric range. For
example:

....
foo<1-100>   # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
foo<01-100>  # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
....
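The sketch below approximates `foo<1-100>` in Python with a hypothetical `interval_matches` helper that reads the trailing digits as a number and checks the bounds. It assumes that any run of ASCII digits whose value falls inside the range matches, so leading zeros are accepted; it does not try to reproduce Lucene's exact zero-padding rules, which the examples above do not spell out.

```python
import re


def interval_matches(prefix: str, low: int, high: int, term: str) -> bool:
    """Approximate prefix<low-high>: prefix followed by digits in [low, high]."""
    m = re.fullmatch(re.escape(prefix) + r"([0-9]+)", term)
    # Compare numerically, so 'foo01' and 'foo1' both count as 1 (an assumption).
    return m is not None and low <= int(m.group(1)) <= high


for term in ["foo1", "foo99", "foo100", "foo01", "foo0", "foo101", "foo"]:
    print(term, interval_matches("foo", 1, 100, term))
# foo1 True
# foo99 True
# foo100 True
# foo01 True
# foo0 False
# foo101 False
# foo False
```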
--

`INTERSECTION`::
+
--
Enables the `&` operator, which acts as an AND operator. The match succeeds
if the patterns on both the left side AND the right side match. For example:

....
aaa.+&.+bbb  # matches 'aaabbb'
....
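Because `&` requires each side to match the entire term, it can be emulated with two whole-term checks. The Python sketch below does that for `aaa.+&.+bbb`, with `re.fullmatch` standing in for Lucene's matching; `both_match` is a hypothetical helper, and this is not how Lucene evaluates the pattern internally.

```python
import re


def both_match(left: str, right: str, term: str) -> bool:
    """Emulate left&right: the term must fully match each side."""
    return (re.fullmatch(left, term, re.DOTALL) is not None
            and re.fullmatch(right, term, re.DOTALL) is not None)


for term in ["aaabbb", "aaaxbbb", "aaab", "abbb"]:
    print(term, both_match(r"aaa.+", r".+bbb", term))
# aaabbb True
# aaaxbbb True
# aaab False
# abbb False
```

The two sides may overlap: in `aaabbb`, the `.+` of `aaa.+` consumes `bbb`, and the `.+` of `.+bbb` consumes `aaa`.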
--

`ANYSTRING`::
+
--
Enables the `@` operator. You can use `@` to match any entire string.

You can combine the `@` operator with the `&` and `~` operators to create
"everything except" logic. For example:

....
@&~(abc.+)  # matches everything except 'abc' followed by one or more characters
....
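Since `@` matches every string, intersecting with it leaves the other side unchanged, so `@&~(abc.+)` accepts exactly the terms that `abc.+` rejects. The Python sketch below expresses that as a negated whole-term check, with `re.fullmatch` standing in for Lucene's matching; `everything_except_abc_plus` is a hypothetical helper name. Because `.+` needs at least one character after `abc`, the term `abc` itself is accepted.

```python
import re

# abc.+ : 'abc' followed by at least one more character.
ABC_PLUS = re.compile(r"abc.+", re.DOTALL)


def everything_except_abc_plus(term: str) -> bool:
    """Emulate @&~(abc.+): accept every term that abc.+ does not fully match."""
    return ABC_PLUS.fullmatch(term) is None


for term in ["abcd", "abcdef", "abc", "xabc"]:
    print(term, everything_except_abc_plus(term))
# abcd False
# abcdef False
# abc True
# xabc True
```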
--
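As an illustration of how these values combine, the Python sketch below builds a `regexp` query body whose `flags` string joins two operators with `|`. The field name `user.id`, the helper `build_flags`, and its restriction to the values listed in this section are assumptions for the example; the body is only printed as JSON, not sent to a cluster.

```python
import json

# Values listed in this section; other versions may accept more.
VALID_FLAGS = {"ALL", "COMPLEMENT", "INTERVAL", "INTERSECTION", "ANYSTRING"}


def build_flags(*names: str) -> str:
    """Join optional-operator names with '|' after checking each one."""
    unknown = [n for n in names if n not in VALID_FLAGS]
    if unknown:
        raise ValueError(f"unknown regexp flags: {unknown}")
    return "|".join(names)


query = {
    "query": {
        "regexp": {
            "user.id": {  # hypothetical field name
                "value": "foo<1-100>",
                "flags": build_flags("COMPLEMENT", "INTERVAL"),
            }
        }
    }
}

print(json.dumps(query, indent=2))
```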

[discrete]
[[regexp-unsupported-operators]]
=== Unsupported operators

Lucene's regular expression engine does not support anchor operators, such as
`^` (beginning of line) or `$` (end of line). To match a term, the regular
expression must match the entire term, as if it were anchored at both ends.
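Because a pattern must cover the whole term, finding terms that merely contain or start with a substring means padding the pattern with `.*`. The Python comparison below uses `re.fullmatch`, which has the same whole-string behavior, as a stand-in for Lucene. Unlike Lucene's engine, Python's `re` does treat `^` and `$` as anchors, so Python patterns that rely on them need to be rewritten in this padded form.

```python
import re

TERMS = ["abc", "xabc", "abcx", "xabcx", "xyz"]

# Whole-term matching: the pattern alone only finds the exact term.
exact = [t for t in TERMS if re.fullmatch(r"abc", t)]
# Padding with .* on both sides finds terms that contain 'abc'.
contains = [t for t in TERMS if re.fullmatch(r".*abc.*", t, re.DOTALL)]
# Padding only on the right finds terms that start with 'abc'.
starts = [t for t in TERMS if re.fullmatch(r"abc.*", t, re.DOTALL)]

print(exact)     # ['abc']
print(contains)  # ['abc', 'xabc', 'abcx', 'xabcx']
print(starts)    # ['abc', 'abcx']
```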