- [[regexp-syntax]]
- == Regular expression syntax
- A https://en.wikipedia.org/wiki/Regular_expression[regular expression] is a way to
- match patterns in data using placeholder characters, called operators.
- {es} supports regular expressions in the following queries:
- * <<query-dsl-regexp-query, `regexp`>>
- * <<query-dsl-query-string-query, `query_string`>>
- {es} uses https://lucene.apache.org/core/[Apache Lucene]'s regular expression
- engine to parse these queries.
- [discrete]
- [[regexp-reserved-characters]]
- === Reserved characters
- Lucene's regular expression engine supports all Unicode characters. However, the
- following characters are reserved as operators:
- ....
- . ? + * | { } [ ] ( ) " \
- ....
- Depending on the <<regexp-optional-operators, optional operators>> enabled, the
- following characters may also be reserved:
- ....
- # @ & < > ~
- ....
- To use one of these characters literally, escape it with a preceding
- backslash or surround it with double quotes. For example:
- ....
- \@ # renders as a literal '@'
- \\ # renders as a literal '\'
- "john@smith.com" # renders as 'john@smith.com'
- ....
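The backslash escaping described above can be sketched as a small helper. This is a minimal illustration, not part of {es} or Lucene: the function name `escape_lucene_regexp` and its `optional_operators` parameter are hypothetical, and the character sets are copied from the two lists above.

```python
# Characters that are always reserved, per the first list above.
ALWAYS_RESERVED = set('.?+*|{}[]()"\\')
# Characters that are reserved only when optional operators are enabled.
OPTIONALLY_RESERVED = set('#@&<>~')


def escape_lucene_regexp(text: str, optional_operators: bool = True) -> str:
    """Backslash-escape every reserved character in `text` (hypothetical helper)."""
    reserved = set(ALWAYS_RESERVED)
    if optional_operators:
        reserved |= OPTIONALLY_RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in text)
```

For instance, `escape_lucene_regexp("john@smith.com")` yields `john\@smith\.com`, so both `@` and `.` are treated literally.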
-
- [discrete]
- [[regexp-standard-operators]]
- === Standard operators
- Lucene's regular expression engine does not use the
- https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl
- Compatible Regular Expressions (PCRE)] library, but it does support the
- following standard operators.
- `.`::
- +
- --
- Matches any character. For example:
- ....
- ab. # matches 'aba', 'abb', 'abz', etc.
- ....
- --
- `?`::
- +
- --
- Repeat the preceding character zero or one times. Often used to make the
- preceding character optional. For example:
- ....
- abc? # matches 'ab' and 'abc'
- ....
- --
- `+`::
- +
- --
- Repeat the preceding character one or more times. For example:
- ....
- ab+ # matches 'ab', 'abb', 'abbb', etc.
- ....
- --
- `*`::
- +
- --
- Repeat the preceding character zero or more times. For example:
- ....
- ab* # matches 'a', 'ab', 'abb', 'abbb', etc.
- ....
- --
- `{}`::
- +
- --
- Minimum and maximum number of times the preceding character can repeat. For
- example:
- ....
- a{2} # matches 'aa'
- a{2,4} # matches 'aa', 'aaa', and 'aaaa'
- a{2,} # matches 'a' repeated two or more times
- ....
- --
- `|`::
- +
- --
- OR operator. The match will succeed if the longest pattern on either the left
- side OR the right side matches. For example:
- ....
- abc|xyz # matches 'abc' and 'xyz'
- ....
- --
- `( … )`::
- +
- --
- Forms a group. You can use a group to treat part of the expression as a single
- character. For example:
- ....
- abc(def)? # matches 'abc' and 'abcdef' but not 'abcd'
- ....
- --
- `[ … ]`::
- +
- --
- Match one of the characters in the brackets. For example:
- ....
- [abc] # matches 'a', 'b', or 'c'
- ....
- Inside the brackets, `-` indicates a range unless `-` is the first character or
- escaped. For example:
- ....
- [a-c] # matches 'a', 'b', or 'c'
- [-abc] # '-' is first character. Matches '-', 'a', 'b', or 'c'
- [abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-'
- ....
- A `^` before a character in the brackets negates the character or range. For
- example:
- ....
- [^abc] # matches any character except 'a', 'b', or 'c'
- [^a-c] # matches any character except 'a', 'b', or 'c'
- [^-abc] # matches any character except '-', 'a', 'b', or 'c'
- [^abc\-] # matches any character except 'a', 'b', 'c', or '-'
- ....
- --
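The examples for the standard operators can be checked locally with a rough sketch in Python. Note the assumptions: Python's `re` module is a different engine from Lucene's, and it is used here only because it behaves the same way for these basic operators. `re.fullmatch` stands in for Lucene's whole-term matching. The matching terms come from the examples above; the non-matching terms are added for illustration.

```python
import re

# (pattern, terms expected to match, terms expected not to match)
EXAMPLES = [
    ("ab.", ["aba", "abz"], ["ab", "abzz"]),
    ("abc?", ["ab", "abc"], ["abcc"]),
    ("ab+", ["ab", "abbb"], ["a"]),
    ("ab*", ["a", "abb"], ["b"]),
    ("a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    ("abc|xyz", ["abc", "xyz"], ["abcxyz"]),
    ("abc(def)?", ["abc", "abcdef"], ["abcd"]),
    ("[^a-c]", ["d", "-"], ["a", "b"]),
]


def matches(pattern: str, term: str) -> bool:
    """Return True only if the pattern matches the entire term."""
    return re.fullmatch(pattern, term) is not None


def check_examples() -> bool:
    """Verify every expected match and non-match in EXAMPLES."""
    return all(
        all(matches(pattern, term) for term in expected)
        and not any(matches(pattern, term) for term in rejected)
        for pattern, expected, rejected in EXAMPLES
    )
```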
- [discrete]
- [[regexp-optional-operators]]
- === Optional operators
- You can use the `flags` parameter to enable more optional operators for
- Lucene's regular expression engine.
- To enable multiple operators, use a `|` separator. For example, a `flags` value
- of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators.
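As an illustration of the `|` separator, the following sketch builds a `regexp` query body with a combined `flags` value. The helper name `regexp_query_body`, the field name `user.id`, and the pattern are placeholders chosen for this example.

```python
import json
from typing import Iterable


def regexp_query_body(field: str, pattern: str, flags: Iterable[str]) -> dict:
    """Build a `regexp` query body whose `flags` value joins the names with '|'."""
    return {
        "query": {
            "regexp": {
                field: {
                    "value": pattern,
                    "flags": "|".join(flags),
                }
            }
        }
    }


# Hypothetical field and pattern, used only to show the resulting JSON.
body = regexp_query_body("user.id", "foo<1-100>", ["COMPLEMENT", "INTERVAL"])
request_json = json.dumps(body, indent=2)
```

Here `body["query"]["regexp"]["user.id"]["flags"]` is the string `COMPLEMENT|INTERVAL`.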
- [discrete]
- ==== Valid values
- `ALL` (Default)::
- Enables all optional operators.
- `COMPLEMENT`::
- +
- --
- Enables the `~` operator. You can use `~` to negate the shortest following
- pattern. For example:
- ....
- a~bc # matches 'adc' and 'aec' but not 'abc'
- ....
- --
- `INTERVAL`::
- +
- --
- Enables the `<>` operators. You can use `<>` to match a numeric range. For
- example:
- ....
- foo<1-100> # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
- foo<01-100> # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
- ....
- --
- `INTERSECTION`::
- +
- --
- Enables the `&` operator, which acts as an AND operator. The match will succeed
- if the patterns on both the left side AND the right side match. For example:
- ....
- aaa.+&.+bbb # matches 'aaabbb'
- ....
- --
- `ANYSTRING`::
- +
- --
- Enables the `@` operator. You can use `@` to match any entire
- string.
- You can combine the `@` operator with the `&` and `~` operators to create
- "everything except" logic. For example:
- ....
- @&~(abc.+) # matches everything except terms beginning with 'abc'
- ....
- --
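The logic behind the `&` operator and the `@&~(...)` idiom can be emulated in Python to reason about which terms a pattern accepts. This is only a sketch under stated assumptions: Python's `re` module does not support these operators, so the functions below (hypothetical names) combine ordinary whole-string matches instead, and they do not call Lucene.

```python
import re


def full(pattern: str, term: str) -> bool:
    """Whole-term match, mirroring how Lucene matches the entire string."""
    return re.fullmatch(pattern, term) is not None


def intersection(left: str, right: str, term: str) -> bool:
    """Emulates `left&right`: both sides must match the whole term."""
    return full(left, term) and full(right, term)


def any_string_except(excluded: str, term: str) -> bool:
    """Emulates `@&~(excluded)`: '@' accepts every term, so only the
    complement of `excluded` decides the result."""
    return not full(excluded, term)
```

With these helpers, `intersection("aaa.+", ".+bbb", "aaabbb")` is true, and `any_string_except("abc.+", "abcd")` is false. The term `abc` itself is still accepted by the second helper, because `abc.+` requires at least one character after `abc`.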
- [discrete]
- [[regexp-unsupported-operators]]
- === Unsupported operators
- Lucene's regular expression engine does not support anchor operators, such as
- `^` (beginning of line) or `$` (end of line). To match a term, the regular
- expression must match the entire string.
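Because the expression must match the entire string, a pattern written for an engine with anchors usually needs `.*` where an anchor was absent. The helper below is a rough sketch of that translation. `to_whole_term_pattern` is a hypothetical name, and the function handles only a leading `^` and a trailing unescaped `$`. It does not handle anchors inside groups or alternations, or a `$` that follows an escaped backslash.

```python
def to_whole_term_pattern(pattern: str) -> str:
    """Rewrite an anchor-style pattern so it matches whole terms (sketch only)."""
    anchored_start = pattern.startswith("^")
    # A trailing '$' preceded by a backslash is treated as an escaped literal.
    anchored_end = pattern.endswith("$") and not pattern.endswith("\\$")

    body = pattern[1:] if anchored_start else pattern
    if anchored_end:
        body = body[:-1]

    # Where no anchor was present, allow any surrounding characters.
    prefix = "" if anchored_start else ".*"
    suffix = "" if anchored_end else ".*"
    return prefix + body + suffix
```

For example, `to_whole_term_pattern("abc")` returns `.*abc.*`, which matches any term containing `abc`, while `to_whole_term_pattern("^abc$")` returns `abc`.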